[Acked/cmt] [SRU X][PATCH 0/3] Add kernel parameter 'pci=clearmsi' to clear MSI(X)s early on boot

Andy Whitcroft apw at canonical.com
Thu Nov 8 10:11:54 UTC 2018


On Wed, Nov 07, 2018 at 11:12:17PM -0200, Mauricio Faria de Oliveira wrote:
> BugLink: https://bugs.launchpad.net/bugs/1797990
> 
> (note: the patch set for later releases will be sent soon)
> 
> [Impact]
> 
>  * A kexec/crash kernel might get stuck and fail to boot
>    (for crash kernel, kdump fails to collect a crashdump)
>    if a PCI device is buggy/stuck/looping and triggers a
>    continuous flood of MSI(X) interrupts (that the kernel
>    does not yet know about).
> 
>  * This fix allowed to obtain crashdumps when debugging a
>    heavy-load scenario, in which a (heavy-loaded) network
>    adapter wouldn't stop triggering MSI-X interrupts ever
>    after panic()->kdump kicked in.
> 
>  * This fix disables MSI(X) in all PCI devices on early
>    boot (this is OK as it's (re-)enabled normally later)
>    with a kernel cmdline parameter (disabled by default).
> 
> [Test Case]
> 
>  * A synthetic test-case is not yet available, however,
>    this particular system/workload triggered the problem
>    consistently, and it was used for development/testing.
> 
>  * We'll update this bug once a synthetic test-case is
>    available; we're working on patching QEMU for this.
> 
>  * $ dmesg | grep 'Clearing MSI'
>    [    0.000000] Clearing MSI/MSI-X enable bits early in boot (quirk)
> 
>  * The comparison of 'dmesg -t | sort' has been reviewed
>    between option disabled/enabled on boot & kexec modes,
>    and only expected differences found (MHz, PIDs, MIPS).
> 
> [Regression Potential] 
> 
>  * The potential area for regressions is early boot,
>    particularly effects of applying quirks during PCI
>    bus scan, which is changed/broader w/ these patches.
> 
>  * However, all quirks are applied based on PCI ID
>    matching, so would only apply if actually targeting
>    a new device.
> 
>  * Moreover, the new quirk is only applied based on
>    a kernel cmdline parameter that is disabled by
>    default, which constraints even more when this
>    is actually in effect.
> 
> [Other Info]
>  
>  * The patch series is still under review/discussion
>    upstream, but it's relatively important for Ubuntu
>    users at this point, and after internal discussions
>    we decided to submit it for SRU.
> 
>  * These are links to the linux-pci archive with the
>    patches [1, 2, 3]
> 
>    [1] [PATCH 1/3] x86/quirks: Scan all busses for early PCI quirks
>        https://lore.kernel.org/linux-pci/20181018183721.27467-1-gpiccoli@canonical.com/
> 
>    [2] [PATCH 2/3] x86/PCI: Export find_cap() to be used in early PCI code
>        https://lore.kernel.org/linux-pci/20181018183721.27467-2-gpiccoli@canonical.com/
> 
>    [3] [PATCH 3/3] x86/quirks: Add parameter to clear MSIs early on boot
>        https://lore.kernel.org/linux-pci/20181018183721.27467-3-gpiccoli@canonical.com/
> 
> Guilherme G. Piccoli (3):
>   UBUNTU: SAUCE: x86/quirks: Scan all busses for early PCI quirks
>   UBUNTU: SAUCE: x86/PCI: Export find_cap() to be used in early PCI code
>   UBUNTU: SAUCE: x86/quirks: Add parameter to clear MSIs early on boot
> 
>  Documentation/kernel-parameters.txt |  6 ++++
>  arch/x86/include/asm/pci-direct.h   |  2 ++
>  arch/x86/kernel/aperture_64.c       | 30 ++-----------------
>  arch/x86/kernel/early-quirks.c      | 45 ++++++++++++++++++++++++-----
>  arch/x86/pci/common.c               |  4 +++
>  arch/x86/pci/early.c                | 25 ++++++++++++++++
>  6 files changed, 77 insertions(+), 35 deletions(-)
> 

As the new functionality is off by default this should be a low risk.
The only obvious remaining risk is that we scan all of the pci busses
where we did not necessarily do so before.  And as the patch itself
mentions this might have been done to avoid bugs in Nvidia chipsets.
Likely some nvidia based and specific testing is warrented before this
cycle releases.

Acked-by: Andy Whitcroft <apw at canonical.com>

-apw



More information about the kernel-team mailing list