[SRU] [L/M/Unstable] [PATCH 0/2] Fix numerous AER related issues
Kai-Heng Feng
kai.heng.feng at canonical.com
Fri Aug 25 08:19:43 UTC 2023
BugLink: https://bugs.launchpad.net/bugs/2033025
[Impact]
Numerous issues are triggered by the AER/DPC services:
- When the AER IRQ is shared with PME, cutting power to the device can
trigger the AER IRQ. Because the IRQ is shared with PME, it is treated as
a wakeup source, preventing the system from entering sleep.
- When the system resumes from S3, the device can reset itself and start
sending PTM messages, triggering AER and resetting the entire hierarchy.
Since the hardware/firmware starts before software, the kernel can never
apply a band-aid soon enough.
- Following from the above, the device firmware restarts before the kernel
resumes. If DPC is triggered at that point, the device is gone with no
means of recovery. We really want to prevent that from happening.
[Fix]
Disable and re-enable the AER and DPC services on suspend and resume,
respectively. Right now the PCI mailing list has no consensus on which
PCI state (D3hot vs D3cold) the AER/DPC services should be disabled in,
so reinstate the old workaround for now.
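The ordering described in [Fix] can be illustrated with a small user-space
model. This is only a sketch under stated assumptions, not the code in the
two patches: struct model_port and every model_* function are hypothetical
names invented for illustration, and the booleans stand in for the real
AER/DPC control registers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a root port; not a kernel structure. */
struct model_port {
	bool aer_enabled;          /* AER interrupt generation is on */
	bool dpc_enabled;          /* DPC trigger is armed */
	bool irq_shared_with_pme;  /* AER and PME share one IRQ */
	bool suspended;            /* system is in or entering sleep */
};

/* Suspend path of the workaround: turn both services off. */
void model_suspend(struct model_port *p)
{
	assert(p != NULL);
	p->aer_enabled = false;
	p->dpc_enabled = false;
	p->suspended = true;
}

/* Resume path of the workaround: turn both services back on. */
void model_resume(struct model_port *p)
{
	assert(p != NULL);
	p->suspended = false;
	p->aer_enabled = true;
	p->dpc_enabled = true;
}

/*
 * First symptom in [Impact]: an error raised while suspended looks like
 * a wakeup event when the AER IRQ is shared with PME.
 */
bool model_error_blocks_sleep(const struct model_port *p)
{
	return p->suspended && p->aer_enabled && p->irq_shared_with_pme;
}

/* Third symptom in [Impact]: an error while DPC is armed takes the device away. */
bool model_error_triggers_dpc(const struct model_port *p)
{
	return p->dpc_enabled;
}
```

In this model, an error that arrives between model_suspend() and
model_resume() neither blocks sleep nor triggers DPC, which is the intended
effect of the workaround. It also makes the trade-off in [Where problems
could occur] visible: any error in that window is simply not reported.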
[Test]
Once the workaround is applied, the symptoms described above can no longer
be observed.
[Where problems could occur]
Theoretically, some "real" issues can go unnoticed while AER is
temporarily disabled, but the benefit far outweighs the downside.
Kai-Heng Feng (2):
  UBUNTU: SAUCE: PCI/AER: Disable AER service during suspend, again
  UBUNTU: SAUCE: PCI/DPC: Disable DPC service during suspend, again
 drivers/pci/pcie/aer.c | 18 ++++++++++++++++
 drivers/pci/pcie/dpc.c | 49 ++++++++++++++++++++++++++++++++----------
 2 files changed, 56 insertions(+), 11 deletions(-)
--
2.34.1