ACK: [SRU][F][PATCH 0/1] zPCI DMA tables and bitmap leak on hard unplug (PCI Event 0x0304) (LP: 1896216)
Colin Ian King
colin.king at canonical.com
Mon Sep 21 14:53:50 UTC 2020
On 21/09/2020 15:43, frank.heimes at canonical.com wrote:
> BugLink: https://bugs.launchpad.net/bugs/1896216
>
> SRU Justification:
>
> [Impact]
>
> * Since zpci_dma_exit_device() is never called on a zPCI device that is hot unplugged via PEC 0x0304, its DMA tables and bitmaps can leak.
>
> * This is because commit "s390/pci: adapt events for zbus" removed the zpci_disable_device() call for a zPCI event with PEC 0x0304 (i.e. on hot unplug).
>
> * The call is skipped only on hot unplug with event type PEC 0x0304, the event with which Linux is informed that the device is already gone instead of being asked to deconfigure it.
>
> * With that event type, an enabled device is therefore expected to always leak them.
>
> [Fix]
>
> * afdf9550e54627fcf4dd609bdc1153059378cdf5 afdf9550e546 "s390/pci: fix leak of DMA tables on hard unplug"
>
> [Test Case]
>
> * Have an IBM Z LPAR that has PCIe devices (like RoCE adapters) assigned and Ubuntu Server 20.04 installed.
>
> * Disable and re-enable one (or more) of the assigned PCIe cards using hotplug; on an LPAR this can be triggered with the 'Reassign I/O Path' function on the HMC/SE.
>
> * Monitor the DMA tables and bitmaps for any leaks.
>
> * Since these tables are vmalloc-ed memory, it is sufficient to monitor /proc/meminfo: without the fix, reassigning a device back and forth makes the memory usage grow continuously.
>
> * The test and verification need to be conducted by IBM.
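
A minimal sketch of the /proc/meminfo check described above, in C. It assumes the VmallocUsed field is the one to watch (the test case only says /proc/meminfo), and the helper names meminfo_field_kb() and read_vmalloc_used_kb() are hypothetical, not part of any existing tool.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Parse the value (in kB) of a field such as "VmallocUsed" from text in
 * /proc/meminfo format ("Name:   <value> kB" per line).
 * Returns 0 on success, -1 if the field is missing or has no number.
 */
int meminfo_field_kb(const char *text, const char *field, long *out_kb)
{
    size_t len = strlen(field);
    const char *line = text;

    while (line && *line) {
        /* Match the exact field name followed by ':' at line start */
        if (strncmp(line, field, len) == 0 && line[len] == ':') {
            char *end;
            /* strtol skips the padding spaces before the number */
            long val = strtol(line + len + 1, &end, 10);
            if (end == line + len + 1)
                return -1; /* no digits after the colon */
            *out_kb = val;
            return 0;
        }
        /* Advance to the start of the next line, if any */
        line = strchr(line, '\n');
        if (line)
            line++;
    }
    return -1;
}

/*
 * Read /proc/meminfo and return VmallocUsed in kB, or -1 on error.
 * Assumption: VmallocUsed is the field that reflects the leaked
 * vmalloc-ed DMA tables and bitmaps.
 */
long read_vmalloc_used_kb(void)
{
    static char buf[16384];
    FILE *f = fopen("/proc/meminfo", "r");
    if (!f)
        return -1;
    size_t n = fread(buf, 1, sizeof(buf) - 1, f);
    fclose(f);
    buf[n] = '\0';

    long kb;
    if (meminfo_field_kb(buf, "VmallocUsed", &kb) != 0)
        return -1;
    return kb;
}
```

Sampling read_vmalloc_used_kb() before and after each round of reassigning the device should show the value climbing steadily if the leak is present, and staying roughly flat with the fix applied.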
>
> [Regression Potential]
>
> * The regression risk can be considered moderate, because:
>
> * only a call to zpci_disable_device(zdev) was reintroduced (plus some comment lines).
>
> * Since __zpci_event_availability gets modified, the zPCI event handling could be broken,
>
> * which could cause issues regarding the availability of zPCI devices
>
> * and in worst case make zPCI devices unusable.
>
> * But only one switch case of the function is modified, and every case ends with a break, so only PEC 0x0304 should be affected.
>
> * In addition, the code changes are minimal, and the zPCI code is limited to the s390x architecture.
>
> * On top of that, test kernels were built and shared for further testing.
>
> [Other]
>
> * Since this commit needs to land in groovy too, but groovy is still in development (so the SRU process does not apply to it yet), I've sent a separate patch request for groovy.
>
> Niklas Schnelle (1):
> From: Niklas Schnelle <schnelle at linux.ibm.com>
>
> arch/s390/pci/pci.c | 4 ++++
> arch/s390/pci/pci_event.c | 2 ++
> 2 files changed, 6 insertions(+)
>
Same as my notes for the Groovy variant of this fix.
Acked-by: Colin Ian King <colin.king at canonical.com>