ACK: [SRU][P][PATCH 0/6] vfio: Improve DMA mapping performance for huge pfnmaps
Edoardo Canepa
edoardo.canepa at canonical.com
Tue Jun 10 07:20:06 UTC 2025
On 29/05/25 00:10, Mitchell Augustin wrote:
> BugLink: https://bugs.launchpad.net/bugs/2111861
>
> SRU Justification:
>
> [ Impact ]
>
> Due to an inefficiency in the way older host kernels manage pfnmaps for guest VM memory ranges[1], guests with large-BAR GPUs passed through have a very long (multiple minutes) initialization time when the MMIO window advertised by OVMF is sufficiently sized for the passed-through BARs (i.e., the correct OVMF behavior).
>
> We have already integrated a partial efficiency improvement [2], transparent to the user, into 6.8+ kernels, as well as an OVMF-based approach [3] that allows the user to force Jammy-like, faster boot speeds via fw_cfg, but the patch series outlined in this report is the full fix for the underlying cause of the issue on kernels that support huge pfnmaps.
>
> With this series [0] applied to both the host and guest of an impacted system, BAR initialization times are reduced substantially: in the commonly achieved optimal case, pfn lookups are reduced by a factor of 256k. On a local test system, an overhead of ~1s for DMA mapping a 32GB PCI BAR is reduced to sub-millisecond (8M page-sized operations reduced to 32 PUD-sized operations).
>
> [ Test Plan ]
>
> On a machine with GPUs with sufficiently sized BARs:
> 1. Create a virtual machine with 4 GPUs passed through and CPU host-passthrough enabled. (We use DGX H100 or A100, typically)
> 2. Observe that, on an unaltered 6.14 kernel, the VM boot time exceeds 5 minutes.
> 3. After applying this series to both the host and guest kernels (applied in ppa:mitchellaugustin/pcihugepfnmapfixes-plucky-kernel [4]), boot the guest and observe that the VM boot time is under 30 seconds, with the BAR initialization steps occurring significantly faster in dmesg output.
>
> I have verified this with the series applied to both the plucky kernel and the linux-nvidia-6.14 kernel on DGX H100.
>
> [ Fix ]
>
> This series attempts to fully address the issue by leveraging the huge
> pfnmap support added in v6.12. When we insert pfnmaps using pud and pmd
> mappings, we can later take advantage of the knowledge of the mapping
> level page mask to iterate on the relevant mapping stride.
>
> [ Where problems could occur ]
>
> I do not expect any regressions. The only callers of ABIs changed by this series are also adjusted within this series.
>
> [ Additional Context ]
>
> [0]: https://lore.kernel.org/all/20250218222209.1382449-1-alex.williamson@redhat.com/
> [1]: https://lore.kernel.org/all/CAHTA-uYp07FgM6T1OZQKqAdSA5JrZo0ReNEyZgQZub4mDRrV5w@mail.gmail.com/
> [2]: https://bugs.launchpad.net/bugs/2097389
> [3]: https://bugs.launchpad.net/bugs/2101903
> [4]: https://launchpad.net/~mitchellaugustin/+archive/ubuntu/pcihugepfnmapfixes-plucky-kernel/
>
>
> Alex Williamson (6):
> mm: Provide address mask in struct follow_pfnmap_args
> vfio/type1: Convert all vaddr_get_pfns() callers to use vfio_batch
> vfio/type1: Catch zero from pin_user_pages_remote()
> vfio/type1: Use vfio_batch for vaddr_get_pfns()
> vfio/type1: Use consistent types for page counts
> vfio/type1: Use mapping page mask for pfnmaps
>
> drivers/vfio/vfio_iommu_type1.c | 123 ++++++++++++++++++++------------
> include/linux/mm.h | 2 +
> mm/memory.c | 1 +
> 3 files changed, 80 insertions(+), 46 deletions(-)
>
Acked-by: Edoardo Canepa <edoardo.canepa at canonical.com>