[SRU][G][PATCH 0/1] NULL pointer dereference when configuring multi-function with devfn != 0 before devfn == 0 (LP: 1903682)

frank.heimes at canonical.com frank.heimes at canonical.com
Wed Dec 2 16:48:09 UTC 2020


BugLink: https://bugs.launchpad.net/bugs/1903682

SRU Justification:

[Impact]

* While handling multifunction devices in zPCI, the UID of the PCI function with function number 0 (which always exists according to the PCI spec) is used as the domain number.

* Therefore, if functions with a function number larger than 0 are hot plugged before function 0, they need to be held in standby until the domain and bus have been created.

* This has been tested during development of this feature using a patched QEMU and in DPM, but unfortunately never in classic/traditional HMC mode.

* On a machine in classic/traditional mode with a multi-function device, after a hot plug ("Reassign I/O Path") of the FID of the second port to the LPAR, any additional hot plug (and even just deconfiguring a PCI device) will hang, and hot plug then makes the entire Linux instance unresponsive.

* The reason for this is a NULL pointer dereference when configuring a multi-function device with devfn != 0 before devfn == 0.

* This issue was introduced with the topology-aware PCI enumeration code.

[Fix]

* 0b2ca2c7d0c9e2731d01b6c862375d44a7e13923 0b2ca2c7d0c9 "s390/pci: fix hot-plug of PCI function missing bus"

[Test Case]

* IBM Z or LinuxONE hardware, equipped with hot-pluggable, multi-functional PCIe cards (like for example RoCE Express 2 adapters) in classic/traditional mode.

* An Ubuntu OS running in LPAR, that comes with a kernel that includes the topology-aware PCI enumeration code (like for example 20.04.1 w/o further updates or 20.10 GA kernel).

* Now on a system that is in classic/traditional mode, hot plug ("Reassign I/O Path") a multi-function device, but using the FID of the second port.

[Regression Potential]

* There is at least some regression risk, but I consider it low, because:

* Even if the modification is only a single if statement (that spans two lines) in 'zpci_event_availability', it could harm the zPCI event management even more. In the worst case it could break hot plug not only for systems in classic/traditional mode but also for systems in DPM mode (making the system hang), or for all ports.

* In such a case no enabling / disabling of devices would be possible.

* But the fix is very simple and straightforward: it checks whether zdev->zbus->bus is NULL and, if so, breaks out early instead of calling the PCI common code pci_scan_single_device() with the NULL pointer.

* PCIe devices are usually more optional devices on s390x (compared to CCW and OSA devices for network) and this affects the zPCI subsystem only, which is unique to s390x.
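
The guard described above can be illustrated with a small user-space sketch. This is not the kernel patch itself: the structs below are simplified, hypothetical stand-ins that only borrow the field names mentioned in this mail (zdev->zbus->bus, devfn), and mock_scan_single_device() is an invented placeholder for the PCI common code pci_scan_single_device(). It assumes that the bus pointer stays NULL until function 0 has appeared.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified stand-ins for the kernel structures.
 * Only the fields needed to show the guard are modelled. */
struct pci_bus {
    int domain;
};

struct zpci_bus {
    struct pci_bus *bus;    /* assumed NULL until function 0 has appeared */
};

struct zpci_dev {
    struct zpci_bus *zbus;
    unsigned int devfn;
    int configured;
};

/* Invented placeholder for pci_scan_single_device(): like the real
 * common code, it dereferences the bus pointer it is given. */
static int mock_scan_single_device(struct pci_bus *bus, unsigned int devfn)
{
    printf("scanning devfn %u in domain 0x%x\n", devfn, bus->domain);
    return 1;
}

/* Returns 1 if the function was scanned, 0 if scanning was deferred
 * because the bus does not exist yet. */
static int configure_function(struct zpci_dev *zdev)
{
    assert(zdev != NULL && zdev->zbus != NULL);

    zdev->configured = 1;

    /* The guard: without it, a NULL bus would be passed on and
     * dereferenced by the scan routine. */
    if (!zdev->zbus->bus)
        return 0;

    return mock_scan_single_device(zdev->zbus->bus, zdev->devfn);
}
```

Calling configure_function() for a function with devfn 1 while zbus.bus is still NULL marks it configured and returns 0 without touching the bus. Once the bus pointer has been set, the same call reaches the scan routine.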

[Other]

* The patch was accepted upstream with kernel v5.10-rc3, hence it will land sooner or later in Hirsute.

* It was initially planned to address groovy via a 5.8 upstream stable update, and in fact the patch was already marked for this, but it didn't make it because 5.8 had already reached its EOL.

* Hence in addition to the already submitted SRU for focal, this is now a separate SRU for groovy.

Niklas Schnelle (1):
  s390/pci: fix hot-plug of PCI function missing bus

 arch/s390/pci/pci_event.c | 4 ++++
 1 file changed, 4 insertions(+)

-- 
2.25.1



