[SRU X] [PATCH 0/1] NVMe polling on timeout

Kleber Souza kleber.souza at canonical.com
Fri Dec 21 13:59:42 UTC 2018

On 12/7/18 10:28 PM, Guilherme G. Piccoli wrote:
> BugLink: https://launchpad.net/bugs/1807393
> [Impact]
> * NVMe controllers could potentially fail to send an interrupt, especially
> due to bugs in virtual devices (which are common these days - qemu has its
> own NVMe virtual device, and so does AWS). This is a difficult situation
> to debug, because the NVMe driver only reports the request timeout,
> not the reason.
> * The upstream patch proposed for SRU here was designed to provide more
> information in these cases, by pro-actively polling the CQEs on request
> timeouts, to check whether the specific request was completed and some issue
> (probably a missed interrupt) prevented the driver from noticing, or whether
> the request really wasn't completed, which indicates more severe issues.
> * Besides being quite useful for debugging, this patch could help mitigate
> issues in cloud environments like AWS, in case there is jitter in
> request completion and the I/O timeout was set to a low value, or even
> in case of atypical bugs in the virtual NVMe controller. With this patch,
> if polling succeeds the NVMe driver will continue working instead of
> attempting a controller reset procedure, which may lead to failures in the
> rootfs - refer to https://launchpad.net/bugs/1788035.
> [Test Case]
> * It's a bit tricky to artificially create a missed-interrupt situation;
> one idea was to implement a small hack in the NVMe qemu
> virtual device that, given a trigger in the guest kernel, induces the
> virtual device to skip an interrupt. The hack patch is present in a
> Launchpad comment, along with instructions to reproduce the issue.
> [Regression Potential]
> * There are no clear risks in adding such a polling mechanism to the NVMe
> driver; one bad thing that was never reported but could happen with this
> patch is that the device could be in a bad IRQ-wise state that a reset
> would fix, yet the patch causes all requests to complete via polling,
> which prevents the adapter reset. This is, however, a very hypothetical
> situation, which could also happen with the mainline kernel (since it
> carries the patch).

Hi Guilherme,

Have we run any I/O stress tests to make sure it doesn't introduce at
least an easily triggered regression?



> Keith Busch (1):
>   nvme/pci: Poll CQ on timeout
>  drivers/nvme/host/pci.c | 21 ++++++++++++++++++---
>  1 file changed, 18 insertions(+), 3 deletions(-)
