ACK: [SRU][N/O][PATCH 0/1] PCI: Batch BAR sizing operations

Jacob Martin jacob.martin at canonical.com
Wed Feb 5 23:48:26 UTC 2025


On 2/5/25 9:52 AM, Mitchell Augustin wrote:
> BugLink: https://bugs.launchpad.net/bugs/2097389
> 
> SRU Justification:
> 
> [ Impact ]
> 
> VM guests that have large-BAR GPUs passed through to them
> take twice as long to initialize those devices' BARs without
> this patch.
> 
> [ Test Plan ]
> 
> I verified that this patch applies cleanly to the Noble kernel
> at 6.8.0-53.55 and resolves the bug on DGX H100 and DGX A100.
> I observed no regressions. This can be verified on any machine
> that has a GPU with a sufficiently large BAR and that can pass
> the GPU through to a VM using vfio.
> 
> ppa:mitchellaugustin/linux-generic-pci-redundancy-fix contains
> the noble-generic kernel with this patch applied and can be
> used to validate this patch.
> 
> To verify no regressions, I installed the kernel from that PPA
> in the guest VM, then rebooted and confirmed that:
> 1. The measured PCI initialization time on boot was ~50% of the
> time measured with the unmodified kernel
> 2. Relevant parts of /proc/iomem mappings, the PCI init section
> of dmesg output, and lspci -vv output remained unchanged between
> the system with the unmodified kernel and with the patched kernel
> 3. The Nvidia driver still successfully loaded and was shown via
> nvidia-smi after the patch was applied
> 
> [ Fix ]
> 
> Roughly half of the time-consuming device configuration operations
> invoked during the PCI probe function can be eliminated by
> rearranging the memory and I/O disable/enable calls such that
> they occur once per device rather than once per BAR. This is what
> the upstream patch does, and it reliably eliminates roughly half
> of the excess initialization time during VM boot.
> 
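
For anyone skimming the thread, the reordering described above can be
illustrated with a toy model. This is only a sketch under stated
assumptions, not the upstream code: struct fake_dev, write_command(),
size_one_bar(), make_dev() and both size_bars_*() helpers are hypothetical
names, every BAR is treated as a plain 32-bit memory BAR, and the only
"cost" modelled is the number of writes to the command register that
disable or restore memory and I/O decoding.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_BARS        6
#define CMD_DECODE_MASK 0x3u /* I/O space enable (bit 0) | memory space enable (bit 1) */

/* Hypothetical stand-in for a PCI function; not a kernel structure. */
struct fake_dev {
	uint16_t command;             /* simulated command register */
	uint32_t bar[NUM_BARS];       /* simulated BAR registers */
	uint32_t size_mask[NUM_BARS]; /* writable address bits; 0 = BAR not implemented */
	unsigned cmd_writes;          /* how many times the command register was written */
};

static void write_command(struct fake_dev *d, uint16_t value)
{
	d->command = value;
	d->cmd_writes++;
}

/* Size one BAR: write all ones, read back the writable bits, restore. */
static uint32_t size_one_bar(struct fake_dev *d, int i)
{
	uint32_t saved = d->bar[i];
	uint32_t probed;

	d->bar[i] = 0xffffffffu & d->size_mask[i];
	probed = d->bar[i];
	d->bar[i] = saved;

	/* Low 4 bits are flag bits; an unimplemented BAR wraps to size 0. */
	return ~(probed & ~0xfu) + 1;
}

/* Old ordering: decoding is disabled and restored around every BAR. */
static void size_bars_per_bar(struct fake_dev *d, uint32_t *sizes)
{
	for (int i = 0; i < NUM_BARS; i++) {
		uint16_t orig = d->command;

		write_command(d, (uint16_t)(orig & ~CMD_DECODE_MASK));
		sizes[i] = size_one_bar(d, i);
		write_command(d, orig);
	}
}

/* Batched ordering: decoding is disabled and restored once per device. */
static void size_bars_batched(struct fake_dev *d, uint32_t *sizes)
{
	uint16_t orig = d->command;

	write_command(d, (uint16_t)(orig & ~CMD_DECODE_MASK));
	for (int i = 0; i < NUM_BARS; i++)
		sizes[i] = size_one_bar(d, i);
	write_command(d, orig);
}

/* Example device: a 4 KiB BAR, a 64 KiB BAR, four unimplemented BARs. */
static struct fake_dev make_dev(void)
{
	struct fake_dev d = { .command = 0x0007 };

	d.size_mask[0] = 0xfffff000u;
	d.size_mask[1] = 0xffff0000u;
	return d;
}
```

In this model both orderings report the same BAR sizes and leave the
command register as they found it; the only difference is how often
decoding is toggled (two command writes per BAR versus two per device).
How much boot time that saves on a real guest depends on what the
hypervisor does on each toggle, which the sketch does not attempt to
model.
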
> [ Where problems could occur ]
> 
> I do not expect any regressions. The only callers of ABIs changed
> by this patch are also adjusted within this patch, and the functional
> change only removes entirely redundant calls to disable/enable PCI
> memory/IO. That said, the main altered function is the PCI
> probe function, which is heavily used across Ubuntu deployments, so
> we should pay attention to any user reports regarding PCI device
> initialization in case they are related.
> 
> [ Additional Context ]
> 
> Upstream patch: https://lore.kernel.org/all/20250111210652.402845-1-alex.williamson@redhat.com/
> Upstream bug report: https://lore.kernel.org/all/CAHTA-uYp07FgM6T1OZQKqAdSA5JrZo0ReNEyZgQZub4mDRrV5w@mail.gmail.com/
> 
> 
> 
> Alex Williamson (1):
>    PCI: Batch BAR sizing operations
> 
>   drivers/pci/iov.c   |  8 +++-
>   drivers/pci/pci.h   |  4 +-
>   drivers/pci/probe.c | 93 +++++++++++++++++++++++++++++++++------------
>   3 files changed, 78 insertions(+), 27 deletions(-)
> 

Acked-by: Jacob Martin <jacob.martin at canonical.com>



