ACK: [Bionic][PATCH 1/1] nvme-pci: Fix EEH failure on ppc

Colin Ian King colin.king at canonical.com
Thu Mar 15 16:00:46 UTC 2018


On 14/03/18 22:03, Joseph Salisbury wrote:
> From: Wen Xiong <wenxiong at linux.vnet.ibm.com>
> 
> BugLink: http://bugs.launchpad.net/bugs/1753371
> 
> Triggering PPC EEH detection and handling requires a memory mapped read
> failure. The NVMe driver removed the periodic health check MMIO, so
> there's no early detection mechanism to trigger the recovery. Instead,
> the detection now happens when the nvme driver handles an IO timeout
> event. This takes the pci channel offline, so we do not want the driver
> to proceed with escalating its own recovery efforts that may conflict
> with the EEH handler.
> 
> This patch ensures the driver will observe the channel was set to offline
> after a failed MMIO read and resets the IO timer so the EEH handler has
> a chance to recover the device.
> 
> Signed-off-by: Wen Xiong <wenxiong at linux.vnet.ibm.com>
> [updated change log]
> Signed-off-by: Keith Busch <keith.busch at intel.com>
> 
> (cherry picked from commit 651438bb0af5213f1f70d66e75bf11d08cb5537a)
> Signed-off-by: Joseph Salisbury <joseph.salisbury at canonical.com>
> ---
>  drivers/nvme/host/pci.c | 13 +++++++------
>  1 file changed, 7 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index 4276ebf..3a0fcb7 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -1148,12 +1148,6 @@ static bool nvme_should_reset(struct nvme_dev *dev, u32 csts)
>  	if (!(csts & NVME_CSTS_CFS) && !nssro)
>  		return false;
>  
> -	/* If PCI error recovery process is happening, we cannot reset or
> -	 * the recovery mechanism will surely fail.
> -	 */
> -	if (pci_channel_offline(to_pci_dev(dev->dev)))
> -		return false;
> -
>  	return true;
>  }
>  
> @@ -1184,6 +1178,13 @@ static enum blk_eh_timer_return nvme_timeout(struct request *req, bool reserved)
>  	struct nvme_command cmd;
>  	u32 csts = readl(dev->bar + NVME_REG_CSTS);
>  
> +	/* If PCI error recovery process is happening, we cannot reset or
> +	 * the recovery mechanism will surely fail.
> +	 */
> +	mb();
> +	if (pci_channel_offline(to_pci_dev(dev->dev)))
> +		return BLK_EH_RESET_TIMER;
> +
>  	/*
>  	 * Reset immediately if the controller is failed
>  	 */
> 

Clean upstream cherry pick, looks good to me.
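
For anyone reading along, the ordering the patch sets up in nvme_timeout()
can be sketched as a small userspace model. This is only an illustration
under stated assumptions, not the driver code: every model_* / MODEL_* name
is hypothetical, the channel_offline field stands in for
pci_channel_offline(), the csts field stands in for the readl() of
NVME_REG_CSTS, and the reset test is reduced to the CFS bit alone.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for NVME_CSTS_CFS (controller fatal status). */
#define MODEL_CSTS_CFS (1u << 1)

/* Hypothetical model of the state the timeout handler looks at. */
struct model_dev {
	bool channel_offline;	/* stands in for pci_channel_offline() */
	uint32_t csts;		/* value the CSTS MMIO read returned */
};

/* Hypothetical outcomes; only the first maps to BLK_EH_RESET_TIMER. */
enum model_action {
	MODEL_RESET_TIMER,	/* back off, let PCI error recovery run */
	MODEL_RESET_CONTROLLER,	/* driver escalates its own reset */
	MODEL_CONTINUE		/* rest of the timeout path (not modelled) */
};

static enum model_action model_timeout(const struct model_dev *dev)
{
	/* The register read comes first; on ppc a failed read is what
	 * lets EEH detect the error and mark the channel offline.
	 */
	uint32_t csts = dev->csts;

	/* With the patch, the offline check precedes any reset decision. */
	if (dev->channel_offline)
		return MODEL_RESET_TIMER;

	/* Simplified reset test: controller reports fatal status. */
	if (csts & MODEL_CSTS_CFS)
		return MODEL_RESET_CONTROLLER;

	return MODEL_CONTINUE;
}
```

A failed MMIO read typically comes back as all ones, which has the CFS bit
set, so in this model { .channel_offline = true, .csts = 0xffffffff }
yields MODEL_RESET_TIMER only because the offline check runs first.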

Acked-by: Colin Ian King <colin.king at canonical.com>




