ACK: [SRU][N/O][PATCH 0/1] PCI: Batch BAR sizing operations
Jacob Martin
jacob.martin at canonical.com
Wed Feb 5 23:48:26 UTC 2025
On 2/5/25 9:52 AM, Mitchell Augustin wrote:
> BugLink: https://bugs.launchpad.net/bugs/2097389
>
> SRU Justification:
>
> [ Impact ]
>
> VM guests that have large-BAR GPUs passed through to them
> take 2x as long to initialize those devices' BARs without
> this patch.
>
> [ Test Plan ]
>
> I verified that this patch applies cleanly to the Noble kernel
> at 6.8.0-53.55 and resolves the bug on DGX H100 and DGX A100.
> I observed no regressions. This can be verified on any machine
> that has a GPU with a sufficiently large BAR and is capable of
> passing it through to a VM using vfio.
>
> ppa:mitchellaugustin/linux-generic-pci-redundancy-fix contains
> the noble-generic kernel with this patch applied and can be
> used to validate this patch.
>
> To verify no regressions, I installed the kernel from that PPA
> in the guest VM, rebooted, and confirmed that:
> 1. The measured PCI initialization time on boot was ~50% of that
> on the unmodified kernel
> 2. Relevant parts of /proc/iomem mappings, the PCI init section
> of dmesg output, and lspci -vv output remained unchanged between
> the system with the unmodified kernel and with the patched kernel
> 3. The Nvidia driver still successfully loaded and was shown via
> nvidia-smi after the patch was applied
>
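
For anyone repeating check 2 above, here is a minimal sketch of one way to compare output captured under each kernel. It is not the procedure Mitchell used; the function name `changed_lines` and the labels "unmodified" and "patched" are hypothetical. It uses Python's standard difflib and works on text you have already saved (for example from /proc/iomem or lspci -vv).

```python
import difflib


def changed_lines(before: str, after: str) -> list[str]:
    """Return the added/removed lines between two captured outputs.

    `before` and `after` are the full text captured on the unmodified
    and patched kernels respectively (hypothetical inputs).
    """
    diff = list(
        difflib.unified_diff(
            before.splitlines(),
            after.splitlines(),
            fromfile="unmodified",
            tofile="patched",
            lineterm="",
            n=0,  # no context lines, only the changes themselves
        )
    )
    # The first two lines are the '---' / '+++' file headers; hunk
    # headers start with '@@' and are filtered out below.
    return [line for line in diff[2:] if line.startswith(("+", "-"))]
```

An empty result means the two captures are identical. dmesg output carries per-boot timestamps, so those would need to be stripped before comparing, otherwise every line will show up as changed.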
> [ Fix ]
>
> Roughly half of the time-consuming device configuration operations
> invoked during the PCI probe function can be eliminated by
> rearranging the memory and I/O disable/enable calls so that
> they occur once per device rather than once per BAR. This is what
> the upstream patch does, and it reliably eliminates roughly half
> of the excess initialization time during VM boot.
>
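
To illustrate the per-BAR versus per-device distinction described above, here is a toy model. It is not the kernel code and not the upstream patch: `ToyDevice`, `size_per_bar` and `size_batched` are hypothetical names. The only real values are the two decode-enable bits of the PCI command register. The model counts command-register writes only and says nothing about how long each write takes, so it should not be read as reproducing the ~50% figure.

```python
PCI_COMMAND_IO = 0x1      # I/O space decode enable bit
PCI_COMMAND_MEMORY = 0x2  # memory space decode enable bit
DECODE_MASK = PCI_COMMAND_IO | PCI_COMMAND_MEMORY


class ToyDevice:
    """Hypothetical stand-in for a PCI function; counts command writes."""

    def __init__(self, num_bars: int) -> None:
        self.num_bars = num_bars
        self.command = DECODE_MASK  # decoding starts out enabled
        self.command_writes = 0

    def write_command(self, value: int) -> None:
        self.command = value
        self.command_writes += 1

    def size_bar(self, index: int) -> None:
        # In this model, sizing is only allowed while decode is off.
        assert self.command & DECODE_MASK == 0, "decode must be disabled"


def size_per_bar(dev: ToyDevice) -> int:
    """Disable and restore decode around every single BAR."""
    for bar in range(dev.num_bars):
        orig = dev.command
        dev.write_command(orig & ~DECODE_MASK)
        dev.size_bar(bar)
        dev.write_command(orig)
    return dev.command_writes


def size_batched(dev: ToyDevice) -> int:
    """Disable decode once, size every BAR, then restore once."""
    orig = dev.command
    dev.write_command(orig & ~DECODE_MASK)
    for bar in range(dev.num_bars):
        dev.size_bar(bar)
    dev.write_command(orig)
    return dev.command_writes
```

In this model the per-BAR variant issues two command writes for each BAR, while the batched variant issues two in total regardless of the BAR count. Both leave the command register at its original value.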
> [ Where problems could occur ]
>
> I do not expect any regressions. The only callers of ABIs changed
> by this patch are also adjusted within this patch, and the functional
> change only removes entirely redundant calls to disable/enable PCI
> memory/IO. With that said, the main altered function is the PCI
> probe function, which is heavily used across Ubuntu deployments, so
> we should pay attention to any user reports regarding PCI device
> initialization in case they are related.
>
> [ Additional Context ]
>
> Upstream patch: https://lore.kernel.org/all/20250111210652.402845-1-alex.williamson@redhat.com/
> Upstream bug report: https://lore.kernel.org/all/CAHTA-uYp07FgM6T1OZQKqAdSA5JrZo0ReNEyZgQZub4mDRrV5w@mail.gmail.com/
>
>
>
> Alex Williamson (1):
>   PCI: Batch BAR sizing operations
>
>  drivers/pci/iov.c   |  8 +++-
>  drivers/pci/pci.h   |  4 +-
>  drivers/pci/probe.c | 93 +++++++++++++++++++++++++++++++++------------
>  3 files changed, 78 insertions(+), 27 deletions(-)
>
Acked-by: Jacob Martin <jacob.martin at canonical.com>