APPLIED[L]: [SRU] [L/M/Unstable] [PATCH 0/2] Fix numerous AER related issues

Roxana Nicolescu roxana.nicolescu at canonical.com
Fri Sep 1 09:21:13 UTC 2023


On 25/08/2023 10:19, Kai-Heng Feng wrote:
> BugLink: https://bugs.launchpad.net/bugs/2033025
>
> [Impact]
> Numerous issues are triggered by the AER/DPC services:
>
> - When the AER IRQ is shared with PME, cutting power to the device can
>    trigger an AER IRQ. Since the AER IRQ is shared with PME, it's treated
>    like a wakeup source, preventing the system from entering sleep.
>
> - When the system resumes from S3, a device can reset itself and start
>    sending PTM messages, triggering AER and resetting the entire
>    hierarchy. Since the hardware/firmware starts before software, the
>    kernel can never apply a band-aid soon enough.
>
> - Following on from the above, device firmware restarts before the
>    kernel resumes; if DPC is triggered then, the device is gone with no
>    way to recover it. We really want to prevent that from happening.
>
> [Fix]
> Disable and re-enable AER and DPC services on suspend and resume,
> respectively.  Right now the PCI mailing list doesn't have a
> consensus on which PCI state (D3hot vs D3cold) the AER/DPC services
> should be disabled in, so re-instate the old workaround for now.
>
> [Test]
> Once the workaround is applied, the symptoms described above can no
> longer be observed.
>
> [Where problems could occur]
> Theoretically, some "real" issues could go unnoticed while AER is
> temporarily disabled, but the benefit far outweighs the downside.
>
> Kai-Heng Feng (2):
>    UBUNTU: SAUCE: PCI/AER: Disable AER service during suspend, again
>    UBUNTU: SAUCE: PCI/DPC: Disable DPC service during suspend, again
>
>   drivers/pci/pcie/aer.c | 18 ++++++++++++++++
>   drivers/pci/pcie/dpc.c | 49 ++++++++++++++++++++++++++++++++----------
>   2 files changed, 56 insertions(+), 11 deletions(-)
>
Applied to lunar:master-next. Thanks!

Roxana
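
Note: the [Fix] in the quoted cover letter (mask AER reporting on suspend,
restore it on resume) can be illustrated with a small userspace model. This
is only a sketch under stated assumptions, not the code from the patches:
the struct and every model_* function are hypothetical names invented here.
Only the three enable-bit values are taken from the kernel's
PCI_ERR_ROOT_CMD_*_EN definitions. A real driver would read and write the
AER Root Error Command register through PCI config-space accessors.

```c
#include <assert.h>
#include <stdint.h>

/* Same bit values as PCI_ERR_ROOT_CMD_{COR,NONFATAL,FATAL}_EN in pci_regs.h. */
#define ERR_ROOT_CMD_COR_EN      0x00000001u
#define ERR_ROOT_CMD_NONFATAL_EN 0x00000002u
#define ERR_ROOT_CMD_FATAL_EN    0x00000004u
#define ERR_ROOT_CMD_INTR_MASK \
	(ERR_ROOT_CMD_COR_EN | ERR_ROOT_CMD_NONFATAL_EN | ERR_ROOT_CMD_FATAL_EN)

/* Hypothetical stand-in for a root port; not a kernel structure. */
struct model_root_port {
	uint32_t root_command; /* models the AER Root Error Command register */
	unsigned int irq_count; /* interrupts the model has raised so far */
};

/* Suspend hook: clear the enable bits so error messages raise no IRQ. */
void model_aer_suspend(struct model_root_port *port)
{
	port->root_command &= ~ERR_ROOT_CMD_INTR_MASK;
}

/* Resume hook: set the enable bits again so errors are reported. */
void model_aer_resume(struct model_root_port *port)
{
	port->root_command |= ERR_ROOT_CMD_INTR_MASK;
}

/* Deliver one error message; returns 1 if it raised an interrupt, else 0. */
int model_report_error(struct model_root_port *port, uint32_t severity_bit)
{
	if (port->root_command & severity_bit) {
		port->irq_count++;
		return 1;
	}
	return 0;
}

/* Walk one sleep cycle and return how many interrupts were raised. */
unsigned int model_sleep_cycle_irqs(void)
{
	struct model_root_port port = { ERR_ROOT_CMD_INTR_MASK, 0 };

	model_report_error(&port, ERR_ROOT_CMD_FATAL_EN);    /* raised */
	model_aer_suspend(&port);
	model_report_error(&port, ERR_ROOT_CMD_COR_EN);      /* masked */
	model_aer_resume(&port);
	model_report_error(&port, ERR_ROOT_CMD_NONFATAL_EN); /* raised */
	return port.irq_count;
}
```

In this model, an error that arrives between the suspend and resume hooks
raises no interrupt, so it cannot act as a spurious wakeup source. The
errors before suspend and after resume are still reported. It also shows the
trade-off noted under [Where problems could occur]: anything reported while
the bits are cleared goes unseen.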


