ACK: [PATCH 0/2][SRU][E][F] Fix logic flaw in PCIe error recovery

Andrea Righi andrea.righi at canonical.com
Mon Apr 20 08:21:52 UTC 2020


On Fri, Apr 17, 2020 at 05:30:10PM -0700, Jay Vosburgh wrote:
> 
> SRU Justification
> 
> Impact:
> 
> During PCI Express Downstream Port Containment (DPC) recovery,
> certain types of failures do not recover due to a logic flaw
> in pcie_do_recovery().
> 
> The upstream git commit log explains the change:
> 
> PCI/ERR: Update error status after reset_link()
> Commit bdb5ac85777d ("PCI/ERR: Handle fatal error recovery") uses
> reset_link() to recover from fatal errors.  But during fatal error
> recovery, if the initial value of error status is PCI_ERS_RESULT_DISCONNECT
> or PCI_ERS_RESULT_NO_AER_DRIVER then even after successful recovery (using
> reset_link()) pcie_do_recovery() will report the recovery result as
> failure.  Update the status of error after reset_link().
> 
> You can reproduce this issue by triggering a SW DPC using "DPC Software
> Trigger" bit in "DPC Control Register".  You should see recovery failed
> dmesg log as below:
> 
>   pcieport 0000:00:16.0: DPC: containment event, status:0x1f27 source:0x0000
>   pcieport 0000:00:16.0: DPC: software trigger detected
>   pci 0000:04:00.0: AER: can't recover (no error_detected callback)
>   pcieport 0000:00:16.0: AER: device recovery failed
> 
> Fixes: bdb5ac85777d ("PCI/ERR: Handle fatal error recovery")
> Link: https://lore.kernel.org/r/a255fcb3a3fdebcd90f84e08b555f1786eb8eba2.1585000084.git.sathyanarayanan.kuppuswamy@linux.intel.com
> [bhelgaas: split pci_channel_io_frozen simplification to separate patch]
> Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy at linux.intel.com>
> Signed-off-by: Bjorn Helgaas <bhelgaas at google.com>
> Acked-by: Keith Busch <keith.busch at intel.com>
> Cc: Ashok Raj <ashok.raj at intel.com>
> 
> Note that a second prerequisite patch is necessary as well.  This patch,
> 
> commit b5dfbeacf74865a8d62a4f70f501cdc61510f8e0
> Author: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy at linux.intel.com>
> Date:   Fri Mar 27 17:33:24 2020 -0500
> 
>     PCI/ERR: Combine pci_channel_io_frozen cases
> 
> is a code readability change, and makes no functional changes.
> 
> 
> Testcase:
> 
> On a system with DPC enabled, setpci may be used to set the DPC Software
> Trigger bit (bit 6, value 0x40) in the DPC Control register of a suitable
> PCIe device (a PCIe bridge, for example).
> 
> On a system lacking the fix, the output will be as shown above (i.e.,
> culminating in the "device recovery failed" message).  With the fix
> applied, the device successfully recovers, resulting in a message of the
> form
> 
> pcieport 0000:d9:01.0: AER: Device recovery successful
> 
> 
> Regression Potential:
> 
> The risk of regression is low, as (a) the path in question currently does
> not work, and (b) the changes are minimal, comprising only a housekeeping
> change and the logically correct updating of a status variable that did
> not previously occur.

Looks good to me.

Acked-by: Andrea Righi <andrea.righi at canonical.com>



More information about the kernel-team mailing list