APPLIED: [SRU][Bionic][PATCH 1/1] nvme-pci: add a memory barrier to nvme_dbbuf_update_and_check_event

Khaled Elmously khalid.elmously at canonical.com
Thu Aug 23 05:21:55 UTC 2018


..to bionic/master-next - also updated the buglink as Kleber pointed out.

This patch will need to be backported for xenial as it cannot be cherry-picked - that will have to be done as a follow-up.

On 2018-08-17 14:27:53 , Khalid Elmously wrote:
> From: Michal Wnukowski <wnukowski at google.com>
> 
> BugLink: https://bugs.launchpad.net/bugs/1787635
> 
> In many architectures loads may be reordered with older stores to
> different locations.  In the nvme driver the following two operations
> could be reordered:
> 
>  - Write shadow doorbell (dbbuf_db) into memory.
>  - Read EventIdx (dbbuf_ei) from memory.
> 
> This can result in a potential race condition between the driver and the
> VM host processing requests (if the given virtual NVMe controller has
> support for the shadow doorbell).  If that occurs, the NVMe controller
> may decide to wait for an MMIO doorbell from the guest operating system,
> while the guest driver decides not to issue an MMIO doorbell on any
> subsequent command.
> 
> This issue is a purely timing-dependent one, so there is no easy way to
> reproduce it.  Currently the easiest known approach is to run "Oracle IO
> Numbers" (orion), which is shipped with Oracle DB:
> 
> orion -run advanced -num_large 0 -size_small 8 -type rand -simulate \
> 	concat -write 40 -duration 120 -matrix row -testname nvme_test
> 
> Here nvme_test is a .lun file that contains a list of NVMe block
> devices to run the test against.  Limiting the number of vCPUs assigned
> to a given VM instance seems to increase the chances of this bug
> occurring.  In a test environment with a VM that had 4 NVMe drives and
> 1 vCPU assigned, the virtual NVMe controller hang could be observed
> within 10-20 minutes.  That corresponds to about 400-500k IO operations
> processed (or about 100GB of IO reads/writes).
> 
> The Orion tool was then used for validation, set to run in a loop for 36
> hours (equivalent to pushing 550M IO operations).  No issues were
> observed, which suggests that the patch fixes the issue.
> 
> Fixes: f9f38e33389c ("nvme: improve performance for virtual NVMe devices")
> Signed-off-by: Michal Wnukowski <wnukowski at google.com>
> Reviewed-by: Keith Busch <keith.busch at intel.com>
> Reviewed-by: Sagi Grimberg <sagi at grimberg.me>
> [hch: updated changelog and comment a bit]
> Signed-off-by: Christoph Hellwig <hch at lst.de>
> (cherry-picked from 07e860202d187d35ad24f0100caa0d244d4df2af git://git.infradead.org/nvme.git )
> Signed-off-by: Khalid Elmously <khalid.elmously at canonical.com>
> ---
>  drivers/nvme/host/pci.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index 9bd6ab79b851..af1773d5332e 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -305,6 +305,14 @@ static bool nvme_dbbuf_update_and_check_event(u16 value, u32 *dbbuf_db,
>  		old_value = *dbbuf_db;
>  		*dbbuf_db = value;
>  
> +		/*
> +		 * Ensure that the doorbell is updated before reading the event
> +		 * index from memory.  The controller needs to provide similar
> +		 * ordering to ensure the event index is updated before reading
> +		 * the doorbell.
> +		 */
> +		mb();
> +
>  		if (!nvme_dbbuf_need_event(*dbbuf_ei, value, old_value))
>  			return false;
>  	}
> -- 
> 2.17.1
> 
