ACK/Cmnt: [xenial][PATCH] UBUNTU: SAUCE: nvme-pci: fix memory barrier at nvme_ext_write_doorbell

Kleber Souza kleber.souza at canonical.com
Thu Sep 27 15:27:38 UTC 2018


On 09/27/18 17:15, Stefan Bader wrote:
> Subject: [Xenial SRU] nvme-pci: add a memory barrier to
> nvme_dbbuf_update_and_check_event
> 
> On 11.09.2018 21:50, Marcelo Henrique Cerri wrote:
>> BugLink: https://bugs.launchpad.net/bugs/1788222
>>
> 
> -- cut ---
>> That's a custom backport for Xenial of the commit 07e860202d18
>> ("nvme-pci: add a memory barrier to nvme_dbbuf_update_and_check_event"
>> from git://git.infradead.org/nvme.git).
>>
>> Original description:
> -- cut --
>>
>> In many architectures loads may be reordered with older stores to
>> different locations.  In the nvme driver the following two operations
>> could be reordered:
>>
>>     - Write shadow doorbell (dbbuf_db) into memory.
>>     - Read EventIdx (dbbuf_ei) from memory.
>>
>> This can result in a race condition between the driver and the VM host
>> processing requests (if the given virtual NVMe controller supports the
>> shadow doorbell).  If that occurs, the NVMe controller may decide to
>> wait for an MMIO doorbell from the guest operating system, while the
>> guest driver may decide not to issue an MMIO doorbell on any of the
>> subsequent commands.
>>
>> This issue is a purely timing-dependent one, so there is no easy way to
>> reproduce it. Currently the easiest known approach is to run "Oracle IO
>> Numbers" (orion), which is shipped with Oracle DB:
>>
>> orion -run advanced -num_large 0 -size_small 8 -type rand -simulate \
>> 	concat -write 40 -duration 120 -matrix row -testname nvme_test
>>
>> Where nvme_test is a .lun file that contains a list of NVMe block
>> devices to run the test against. Limiting the number of vCPUs assigned
>> to a given VM instance seems to increase the chances of this bug
>> occurring. On a test environment with a VM that had 4 NVMe drives and
>> 1 vCPU assigned, the virtual NVMe controller hang could be observed
>> within 10-20 minutes. That corresponds to about 400-500k IO operations
>> processed (or about 100GB of IO reads/writes).
>>
>> The orion tool was used for validation and set to run in a loop for 36
>> hours (equivalent to pushing 550M IO operations). No issues were
>> observed, which suggests that the patch fixes the issue.
> 
> Fixes: f9f38e33389c ("nvme: improve performance for virtual NVMe devices")
> Signed-off-by: Michal Wnukowski <wnukowski at google.com>
> Reviewed-by: Keith Busch <keith.busch at intel.com>
> Reviewed-by: Sagi Grimberg <sagi at grimberg.me>
> [hch: updated changelog and comment a bit]
> Signed-off-by: Christoph Hellwig <hch at lst.de>
>>
>> Fixes: 4a4037f212ae ("UBUNTU: SAUCE: nvme: improve performance for virtual NVMe devices")
> (backported from commit 07e860202d187d35ad24f0100caa0d244d4df2af
> http://git.infradead.org/nvme.git)

The fix has landed upstream in v4.19-rc2 as
f1ed3df20d2d223e0852cc4ac1f19bba869a7e3c. We can fix up the commit
reference when applying the patch.

>> Signed-off-by: Marcelo Henrique Cerri <marcelo.cerri at canonical.com>
> Acked-by: Stefan Bader <stefan.bader at canonical.com>
>> ---
> 
> Since this is, at least to some degree, a backport, it should preserve the
> original commit subject and message and only add the bug reference and a new
> s-o-b area, as I tried to indicate with my inline comments.
> 
>>  drivers/nvme/host/pci.c | 8 +++++++-
>>  1 file changed, 7 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
>> index 5e5f065bf729..3752052ae20a 100644
>> --- a/drivers/nvme/host/pci.c
>> +++ b/drivers/nvme/host/pci.c
>> @@ -375,7 +375,13 @@ static void nvme_ext_write_doorbell(u16 value, u32 __iomem* q_db,
>>  	old_value = *db_addr;
>>  	*db_addr = value;
>>  
>> -	rmb();
>> +	/*
>> +	 * Ensure that the doorbell is updated before reading the event
>> +	 * index from memory.  The controller needs to provide similar
>> +	 * ordering to ensure the event index is updated before reading
>> +	 * the doorbell.
>> +	 */
>> +	mb();
>>  	if (!nvme_ext_need_event(*event_idx, value, old_value))
>>  		goto no_doorbell;
>>  
>>