[SRU][F][PATCH 0/1] NULL pointer dereference when configuring multi-function with devfn != 0 before devfn == 0 (LP: 1903682)

frank.heimes at canonical.com
Thu Nov 12 09:44:28 UTC 2020


BugLink: https://bugs.launchpad.net/bugs/1903682

SRU Justification:

[Impact]

* While handling multifunction devices in zPCI the UID of the PCI function with function number 0 (that always exists according to the PCI spec) is taken as domain number.

* Therefore, if functions with a function number larger than 0 are hot plugged before function 0, they need to be held in standby until the domain and bus can be created.

* This has been tested during development of this feature using a patched QEMU and in DPM, but unfortunately never in classic/traditional HMC mode.

* On a machine in classic/traditional mode with a multi-function device, after a hot plug ("Reassign I/O Path") of the FID of the second port of the LPAR, any additional hot plug (and even just deconfiguring a PCI device) will hang, and the hot plug makes the entire Linux instance unresponsive.

* The reason for this is a NULL pointer dereference when configuring a multi-function device with devfn != 0 before devfn == 0.

* This issue was introduced with the topology-aware PCI enumeration code.

[Fix]

* 0b2ca2c7d0c9e2731d01b6c862375d44a7e13923 0b2ca2c7d0c9 "s390/pci: fix hot-plug of PCI function missing bus"

[Test Case]

* IBM Z or LinuxONE hardware, equipped with hot-pluggable, multi-functional PCIe cards (like for example RoCE Express 2 adapters) in classic/traditional mode.

* An Ubuntu OS running in LPAR, that comes with a kernel that includes the topology-aware PCI enumeration code (like for example 20.04.1 w/o further updates or 20.10 GA kernel).

* Now on a system that is in classic/traditional mode, hot plug ("Reassign I/O Path") a multi-function device, but using the FID of the second port.

[Regression Potential]

* There is at least some regression risk, but I consider it low, because:

* Even if the modification is just a single if statement (that spans two lines) in 'zpci_event_availability', it could harm the zPCI event management even more. In the worst case it could break hot plug not only for systems in classic/traditional mode, but also for systems in DPM mode (making the system hang), or for all ports.

* In such a case no enabling / disabling of devices would be possible.

* But the fix is very simple and straightforward: it checks whether zdev->zbus->bus is NULL and, if so, breaks out of the function instead of calling the PCI common code pci_scan_single_device() with the NULL pointer.

* PCIe devices are usually more optional devices on s390x (compared to CCW and OSA devices for network) and this affects the zPCI subsystem only, which is unique to s390x.

[Other]

* The patch was accepted upstream in kernel v5.10-rc3, hence it will land sooner or later in Hirsute.

* The patch has also been tagged for the upstream stable v5.8 series, hence it will land in Groovy (based on the kernel team's regular 'Groovy update: v5.8.x upstream stable release' LP bug).

* Hence requesting this Kernel SRU for Focal only, since Ubuntu releases older than Focal do not have the topology-aware zPCI enumeration code.

Niklas Schnelle (1):
  s390/pci: fix hot-plug of PCI function missing bus

 arch/s390/pci/pci_event.c | 4 ++++
 1 file changed, 4 insertions(+)

-- 
2.25.1



