NACK: [xenial][PATCH] UBUNTU: SAUCE: nvme-pci: fix memory barrier at nvme_ext_write_doorbell
Marcelo Henrique Cerri
marcelo.cerri at canonical.com
Thu Sep 13 14:02:04 UTC 2018
On Wed, Sep 12, 2018 at 10:26:29AM +0200, Stefan Bader wrote:
> On 11.09.2018 21:50, Marcelo Henrique Cerri wrote:
> > BugLink: https://bugs.launchpad.net/bugs/1788222
> --- cut --
> > That's a custom backport for Xenial of the commit 07e860202d18
> > ("nvme-pci: add a memory barrier to nvme_dbbuf_update_and_check_event"
> > from git://git.infradead.org/nvme.git).
> > Original description:
> -- cut end --
> > In many architectures loads may be reordered with older stores to
> > different locations. In the nvme driver the following two operations
> > could be reordered:
> > - Write shadow doorbell (dbbuf_db) into memory.
> > - Read EventIdx (dbbuf_ei) from memory.
> > This can result in a potential race condition between driver and VM host
> > processing requests (if given virtual NVMe controller has a support for
> > shadow doorbell). If that occurs, then the NVMe controller may decide to
> > wait for MMIO doorbell from guest operating system, and guest driver may
> > decide not to issue MMIO doorbell on any of subsequent commands.
> > This issue is purely timing-dependent one, so there is no easy way to
> > reproduce it. Currently the easiest known approach is to run "Oracle IO
> > Numbers" (orion) that is shipped with Oracle DB:
> > orion -run advanced -num_large 0 -size_small 8 -type rand -simulate \
> > concat -write 40 -duration 120 -matrix row -testname nvme_test
> > Where nvme_test is a .lun file that contains a list of NVMe block
> > devices to run test against. Limiting number of vCPUs assigned to given
> > VM instance seems to increase chances for this bug to occur. On test
> > environment with VM that got 4 NVMe drives and 1 vCPU assigned the
> > virtual NVMe controller hang could be observed within 10-20 minutes.
> > That correspond to about 400-500k IO operations processed (or about
> > 100GB of IO read/writes).
> > Orion tool was used as a validation and set to run in a loop for 36
> > hours (equivalent of pushing 550M IO operations). No issues were
> > observed. That suggest that the patch fixes the issue.
> > Fixes: 4a4037f212ae ("UBUNTU: SAUCE: nvme: improve performance for virtual NVMe devices")
> (backported from commit ??? git://git.infradead.org/nvme.git)
> [mhcerry: <rough description of what had to be changed>]
I can add these lines describing the origin and changes.
> > Signed-off-by: Marcelo Henrique Cerri <marcelo.cerri at canonical.com>
> > ---
> I cannot find the commit you refer to via the git web interface. Also, even
> reading through the comments, I am not sure this really is a backport of some
> upstream patch (in which case I would expect the commit message to be the same
> as the source of the backport.
What commit couldn't you find? That fix was crafted by me because
xenial carries a version of the original patchset that is completely
different from upstream.
I didn't keep the original title because it refers to the upstream
function that doesn't exist in Xenial.
> But the way it is now, I have no clue where this comes from and what was done to
> it before submission.
> > drivers/nvme/host/pci.c | 8 +++++++-
> > 1 file changed, 7 insertions(+), 1 deletion(-)
> > diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> > index 5e5f065bf729..3752052ae20a 100644
> > --- a/drivers/nvme/host/pci.c
> > +++ b/drivers/nvme/host/pci.c
> > @@ -375,7 +375,13 @@ static void nvme_ext_write_doorbell(u16 value, u32 __iomem* q_db,
> > old_value = *db_addr;
> > *db_addr = value;
> > - rmb();
> > + /*
> > + * Ensure that the doorbell is updated before reading the event
> > + * index from memory. The controller needs to provide similar
> > + * ordering to ensure the envent index is updated before reading
> > + * the doorbell.
> > + */
> > + mb();
> > if (!nvme_ext_need_event(*event_idx, value, old_value))
> > goto no_doorbell;
-------------- next part --------------
A non-text attachment was scrubbed...
Size: 488 bytes
Desc: not available
More information about the kernel-team