APPLIED[M]: [SRU] [L/M/Unstable] [PATCH 0/2] Fix numerous AER related issues

Andrea Righi andrea.righi at canonical.com
Mon Aug 28 06:45:38 UTC 2023


On Fri, Aug 25, 2023 at 04:19:43PM +0800, Kai-Heng Feng wrote:
> BugLink: https://bugs.launchpad.net/bugs/2033025
> 
> [Impact]
> Numerous issues are triggered by the AER/DPC services:
> 
> - When the AER IRQ is shared with PME, cutting power to the device can
>   trigger the AER IRQ. Since it is shared with PME, it's treated like a
>   wakeup source, preventing the system from entering sleep.
> 
> - When the system resumes from S3, a device can reset itself and start
>   sending PTM messages, triggering AER and resetting the entire
>   hierarchy. Since the hardware/firmware starts before software, the
>   kernel can never apply a band-aid soon enough.
> 
> - Related to the above: because device firmware restarts before the
>   kernel resumes, once DPC is triggered the device is gone with no way
>   to recover it. We really want to prevent that from happening.
> 
> [Fix]
> Disable and re-enable AER and DPC services on suspend and resume,
> respectively.  Right now the PCI mailing list has no consensus on
> which PCI state (D3hot vs D3cold) the AER/DPC services should be
> disabled in, so re-instate the old workaround for now.
> 
> [Test]
> Once the workaround is applied, the symptoms described above can no
> longer be observed.
> 
> [Where problems could occur]
> Theoretically, some "real" issues could go unnoticed while AER is
> temporarily disabled, but the benefit far outweighs the downside.

Applied to mantic/linux, thanks!

-Andrea


