ACK: [SRU] [L/M/Unstable] [PATCH 0/2] Fix numerous AER related issues

Tim Gardner tim.gardner at canonical.com
Mon Aug 28 12:24:28 UTC 2023


On 8/25/23 2:19 AM, Kai-Heng Feng wrote:
> BugLink: https://bugs.launchpad.net/bugs/2033025
> 
> [Impact]
> Numerous issues triggered by the AER/DPC services
> 
> - When the AER IRQ is shared with PME, cutting power to the device can
>    trigger an AER IRQ. Since the IRQ is shared with PME, it's treated like
>    a wakeup source, preventing the system from entering sleep.
> 
> - When the system resumes from S3, a device can reset itself and start
>    sending PTM messages, triggering AER and resetting the entire
>    hierarchy. Since the hardware/firmware starts before software, the
>    kernel can never apply a band-aid soon enough.
> 
> - Following on from the above, device firmware restarts before the
>    kernel resumes; if DPC is triggered then, the device is gone with no
>    way to recover it. We really want to prevent that from happening.
> 
> [Fix]
> Disable and re-enable AER and DPC services on suspend and resume,
> respectively. Right now the PCI mailing list doesn't have a
> consensus on which PCI state (D3hot vs D3cold) the AER/DPC services
> should be disabled in, so re-instate the old workaround for now.
> 
> [Test]
> Once the workaround is applied, the symptoms described above can no
> longer be observed.
> 
> [Where problems could occur]
> Theoretically some "real" issues could go unnoticed while AER is
> temporarily disabled, but the benefit far outweighs the downside.
> 
> Kai-Heng Feng (2):
>    UBUNTU: SAUCE: PCI/AER: Disable AER service during suspend, again
>    UBUNTU: SAUCE: PCI/DPC: Disable DPC service during suspend, again
> 
>   drivers/pci/pcie/aer.c | 18 ++++++++++++++++
>   drivers/pci/pcie/dpc.c | 49 ++++++++++++++++++++++++++++++++----------
>   2 files changed, 56 insertions(+), 11 deletions(-)
> 
Acked-by: Tim Gardner <tim.gardner at canonical.com>
-- 
-----------
Tim Gardner
Canonical, Inc



