<div dir="ltr"><div>After reaching out to the initial bug reporter and condensing the (probably too detailed and too long) description, I think this is a brief summary:</div><div>I hope this fits the Impact section of the SRU Justification better. I have also updated the justification in the LP bug description.</div><div><br></div>[Impact]<br><br>* Mellanox CX5 port multi-pathing is broken on s390x due to the non-standard topology of PCI IDs (physical and virtual):<br><br>* The Mellanox ConnectX-5 PCI driver (mlx5) implements multi-path, which can be used to combine multiple networking ports to improve performance and reliability.<br><br>* For that purpose, the mlx5 driver combines PCI functions based on topology information (the function number) as determined by their PCI ID.<br><br>* Currently the Linux on Z PCI bus does not reflect PCI topology information in the PCI ID. As a result, the mlx5 multi-path function is broken and cannot be activated.<br clear="all"></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, May 19, 2020 at 10:04 AM Stefan Bader <<a href="mailto:stefan.bader@canonical.com">stefan.bader@canonical.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On 18.05.20 20:24, <a href="mailto:frank.heimes@canonical.com" target="_blank">frank.heimes@canonical.com</a> wrote:<br>
> Buglink: <a href="https://bugs.launchpad.net/bugs/1874056" rel="noreferrer" target="_blank">https://bugs.launchpad.net/bugs/1874056</a><br>
> <br>
> SRU Justification:<br>
> <br>
> [Impact]<br>
<br>
Somehow the impact section should give a quick overview of the issue that one<br>
attempts to fix. Right now it sounds more like a nice-to-have, which would not<br>
really be a reason to change that much code.<br>
<br>
Maybe it would be different if this prevented virtual functions of physical<br>
adapters from being passed reliably into VM guests, and that was part of the<br>
release and is now found not to work as promised...<br>
<br>
And if that gets said in simple words in the bug report's justification,<br>
maybe this would raise enough interest in even looking at the patches...<br>
<br>
-Stefan<br>
<br>
> <br>
> * On s390x the enumeration of PCI functions does not reflect which functions belong to which physical adapter.<br>
> <br>
> * Layout of a PCI function address on Linux:<br>
> 0000:00:00.0<br>
> &lt;root complex&gt;:&lt;bus&gt;:&lt;device&gt;.&lt;function&gt;<br>
> <br>
> * On s390x, each function is presented as an individual root complex today, e.g.:<br>
> PCHID 0100 VF1 0000:00:00.0<br>
> PCHID 0100 VF23 0001:00:00.0<br>
> PCHID 0200 VF1 0002:00:00.0<br>
> PCHID 0100 VF17 0003:00:00.0<br>
> <br>
> * On other platforms, the addresses correctly reflect the actual HW configuration.<br>
> <br>
> * Some device drivers (like mlx5, for Mellanox adapters) group functions of one physical adapter by checking which PCI functions have identical values for &lt;root complex&gt;:&lt;bus&gt;:&lt;device&gt;.<br>
> <br>
> * We need to use the same enumeration scheme to achieve this functionality on s390x.<br>
> <br>
> * In this case, the two physical functions of a Mellanox adapter need to get function number 0 and 1,<br>
> and all virtual functions need to use the same &lt;root complex&gt;:&lt;bus&gt; numbers with function/device numbers counting up.<br>
> <br>
> * Required result (example with 4 VFs per PF):<br>
> PCHID 0100 PF 0 0000:00:00.0<br>
> PCHID 0100 PF 1 0000:00:00.1<br>
> PCHID 0100 PF 0 VF 0 0000:00:00.2<br>
> PCHID 0100 PF 0 VF 1 0000:00:00.3<br>
> PCHID 0100 PF 0 VF 2 0000:00:00.4<br>
> PCHID 0100 PF 0 VF 3 0000:00:00.5<br>
> PCHID 0100 PF 1 VF 0 0000:00:00.6<br>
> PCHID 0100 PF 1 VF 1 0000:00:00.7<br>
> PCHID 0100 PF 1 VF 2 0000:00:00.8<br>
> PCHID 0100 PF 1 VF 3 0000:00:00.9<br>
> PCHID 0200 PF 0 0001:00:00.0<br>
> <br>
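The grouping rule described in the [Impact] section above can be illustrated with a minimal sketch. This is not the mlx5 code; the function names parse_pci_address and group_by_adapter are hypothetical, and the address lists are copied from the two examples above (the function numbers 8 and 9 are taken over from the schematic example as-is).<br>

```python
from collections import defaultdict


def parse_pci_address(address):
    """Split 'DDDD:BB:dd.f' into (domain, bus, device, function) integers."""
    domain, bus, rest = address.split(":")
    device, function = rest.split(".")
    return int(domain, 16), int(bus, 16), int(device, 16), int(function, 16)


def group_by_adapter(addresses):
    """Group addresses that share root complex, bus and device."""
    groups = defaultdict(list)
    for address in addresses:
        domain, bus, device, function = parse_pci_address(address)
        groups[(domain, bus, device)].append(function)
    return {key: sorted(functions) for key, functions in groups.items()}


# Enumeration on s390x today: every function sits in its own root complex.
current = ["0000:00:00.0", "0001:00:00.0", "0002:00:00.0", "0003:00:00.0"]

# Required result: PCHID 0100 exposes functions 0..9, PCHID 0200 function 0.
required = ["0000:00:00.%x" % fn for fn in range(10)] + ["0001:00:00.0"]

if __name__ == "__main__":
    print(group_by_adapter(current))
    print(group_by_adapter(required))
```

With the current enumeration every group contains a single function, so a driver applying this rule never sees two functions of the same adapter together. With the required enumeration all ten functions of PCHID 0100 fall into one group.<br>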
> [Fix]<br>
> <br>
> * Backport 1: <a href="https://launchpadlibrarian.net/479699471/0001-s390-pci-Improve-handling-of-unset-UID.patch" rel="noreferrer" target="_blank">https://launchpadlibrarian.net/479699471/0001-s390-pci-Improve-handling-of-unset-UID.patch</a><br>
> <br>
> * Backport 2: <a href="https://launchpadlibrarian.net/479699482/0002-s390-pci-embedding-hotplug_slot-in-zdev.patch" rel="noreferrer" target="_blank">https://launchpadlibrarian.net/479699482/0002-s390-pci-embedding-hotplug_slot-in-zdev.patch</a><br>
> <br>
> * Backport 3: <a href="https://launchpadlibrarian.net/479699492/0003-s390-pci-Expose-new-port-attribute-for-PCIe-function.patch" rel="noreferrer" target="_blank">https://launchpadlibrarian.net/479699492/0003-s390-pci-Expose-new-port-attribute-for-PCIe-function.patch</a><br>
> <br>
> * Backport 4: <a href="https://launchpadlibrarian.net/479699497/0004-s390-pci-adaptation-of-iommu-to-multifunction.patch" rel="noreferrer" target="_blank">https://launchpadlibrarian.net/479699497/0004-s390-pci-adaptation-of-iommu-to-multifunction.patch</a><br>
> <br>
> * Backport 5: <a href="https://launchpadlibrarian.net/479700706/0005-s390-pci-define-kernel-parameters-for-PCI-multifunct.patch" rel="noreferrer" target="_blank">https://launchpadlibrarian.net/479700706/0005-s390-pci-define-kernel-parameters-for-PCI-multifunct.patch</a><br>
> <br>
> * Backport 6: <a href="https://launchpadlibrarian.net/479700712/0006-s390-pci-define-RID-and-RID-available.patch" rel="noreferrer" target="_blank">https://launchpadlibrarian.net/479700712/0006-s390-pci-define-RID-and-RID-available.patch</a><br>
> <br>
> * Backport 7: <a href="https://launchpadlibrarian.net/479700739/0007-s390-pci-create-zPCI-bus.patch" rel="noreferrer" target="_blank">https://launchpadlibrarian.net/479700739/0007-s390-pci-create-zPCI-bus.patch</a><br>
> <br>
> * Backport 8: <a href="https://launchpadlibrarian.net/479700769/0008-s390-pci-adapt-events-for-zbus.patch" rel="noreferrer" target="_blank">https://launchpadlibrarian.net/479700769/0008-s390-pci-adapt-events-for-zbus.patch</a><br>
> <br>
> * Backport 9: <a href="https://launchpadlibrarian.net/479700786/0009-s390-pci-Handling-multifunctions.patch" rel="noreferrer" target="_blank">https://launchpadlibrarian.net/479700786/0009-s390-pci-Handling-multifunctions.patch</a><br>
> <br>
> * Backport 10: <a href="https://launchpadlibrarian.net/479700794/0010-s390-pci-Do-not-disable-PF-when-VFs-exist.patch" rel="noreferrer" target="_blank">https://launchpadlibrarian.net/479700794/0010-s390-pci-Do-not-disable-PF-when-VFs-exist.patch</a><br>
> <br>
> * Backport 11: <a href="https://launchpadlibrarian.net/479700798/0011-s390-pci-Documentation-for-zPCI.patch" rel="noreferrer" target="_blank">https://launchpadlibrarian.net/479700798/0011-s390-pci-Documentation-for-zPCI.patch</a><br>
> <br>
> * Backport 12: <a href="https://launchpadlibrarian.net/479700799/0012-s390-pci-removes-wrong-PCI-multifunction-assignment.patch" rel="noreferrer" target="_blank">https://launchpadlibrarian.net/479700799/0012-s390-pci-removes-wrong-PCI-multifunction-assignment.patch</a><br>
> <br>
> [Test Case]<br>
> <br>
> * Prepare an IBM z13 or LinuxONE III (or newer) system with two or more RoCE Express PCI 2(.1) adapters.<br>
> <br>
> * Assign the adapters (and their virtual functions) to an LPAR.<br>
> <br>
> * Verify whether the physical and virtual functions are grouped in arbitrary order or in consecutive order - physical first (for example with lspci -t ...)<br>
> <br>
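As a rough aid for the verification step above, the following sketch lists the PCI addresses that Linux exposes under /sys/bus/pci/devices and reports, per root complex, bus and device, whether the function numbers count up from 0 without gaps. The helper names are hypothetical, and the sketch does not distinguish physical from virtual functions, so it complements rather than replaces the lspci check.<br>

```python
import os


def list_pci_addresses(sysfs_dir="/sys/bus/pci/devices"):
    """Return the PCI addresses found as entries in sysfs, sorted."""
    if not os.path.isdir(sysfs_dir):
        return []
    return sorted(os.listdir(sysfs_dir))


def summarize(addresses):
    """Map 'domain:bus:device' to the sorted function numbers found under it."""
    summary = {}
    for address in addresses:
        slot, function = address.rsplit(".", 1)
        summary.setdefault(slot, []).append(int(function, 16))
    return {slot: sorted(functions) for slot, functions in summary.items()}


def is_consecutive(functions):
    """True if the function numbers are 0, 1, 2, ... without gaps."""
    return functions == list(range(len(functions)))


if __name__ == "__main__":
    for slot, functions in sorted(summarize(list_pci_addresses()).items()):
        state = "consecutive" if is_consecutive(functions) else "gaps"
        print(slot, functions, state)
```

On a kernel without the patches one would expect each slot to list only function 0, matching the one-root-complex-per-function layout described in the [Impact] section.<br>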
> [Regression Potential] <br>
> <br>
> * The regression potential can be considered as moderate, since:<br>
> <br>
> * It is purely s390x specific code (arch/s390/*, drivers/iommu/s390-iommu.c and drivers/pci/hotplug/s390_pci_hpc.c, plus some doc adjustments).<br>
> <br>
> * It largely affects zPCI, the s390x specific PCI code layer.<br>
> <br>
> * PCI cards available for s390x are optional cards (RoCE and zEDC) and not very widespread.<br>
> <br>
> * The situation described above affects the RoCE adapters only (Mellanox based).<br>
> <br>
> * The patches are also accepted upstream and available via linux-next, but to apply them to the focal 5.4 kernel the above backports are needed.<br>
> <br>
> * However, the code is modified by several patches (12), hence there is a chance to break zPCI with them.<br>
> <br>
> * For upfront testing a PPA was created with a focal (master-next) kernel that includes all the above patches.<br>
> <br>
> Alexander Schmidt (1):<br>
> s390/pci: Expose new port attribute for PCIe functions<br>
> <br>
> Niklas Schnelle (1):<br>
> s390/pci: Improve handling of unset UID<br>
> <br>
> Pierre Morel (10):<br>
> s390/pci: embedding hotplug_slot in zdev<br>
> s390/pci: adaptation of iommu to multifunction<br>
> s390/pci: define kernel parameters for PCI multifunction<br>
> s390/pci: define RID and RID available<br>
> s390/pci: create zPCI bus<br>
> s390/pci: adapt events for zbus<br>
> s390/pci: Handling multifunctions<br>
> s390/pci: Do not disable PF when VFs exist<br>
> s390/pci: Documentation for zPCI<br>
> s390/pci: removes wrong PCI multifunction assignment<br>
> <br>
> .../admin-guide/kernel-parameters.txt | 2 +<br>
> Documentation/s390/index.rst | 1 +<br>
> Documentation/s390/pci.rst | 126 +++++++++<br>
> MAINTAINERS | 1 +<br>
> arch/s390/include/asm/pci.h | 45 +++-<br>
> arch/s390/include/asm/pci_clp.h | 12 +-<br>
> arch/s390/pci/Makefile | 3 +-<br>
> arch/s390/pci/pci.c | 198 ++++++--------<br>
> arch/s390/pci/pci_bus.c | 255 ++++++++++++++++++<br>
> arch/s390/pci/pci_bus.h | 31 +++<br>
> arch/s390/pci/pci_clp.c | 6 +-<br>
> arch/s390/pci/pci_event.c | 39 ++-<br>
> arch/s390/pci/pci_sysfs.c | 4 +-<br>
> drivers/iommu/s390-iommu.c | 8 +-<br>
> drivers/pci/hotplug/s390_pci_hpc.c | 105 +++-----<br>
> 15 files changed, 618 insertions(+), 218 deletions(-)<br>
> create mode 100644 Documentation/s390/pci.rst<br>
> create mode 100644 arch/s390/pci/pci_bus.c<br>
> create mode 100644 arch/s390/pci/pci_bus.h<br>
> <br>
<br>
<br>
</blockquote></div>