NAK: [SRU][J/N][PATCH 0/4] Fix memremap_pages failures on x86 systems with large PCIe BAR addresses
Jacob Martin
jacob.martin at canonical.com
Tue Aug 12 22:03:00 UTC 2025
On 8/12/25 4:51 PM, Jacob Martin wrote:
> BugLink: https://bugs.launchpad.net/bugs/2120209
>
> SRU Justification
>
> [Impact]
>
> On some x86 systems, it is possible for PCIe device BAR addresses to exceed the
> range reserved by KASLR for direct mappings. This causes attempts to map the
> impacted BAR region using devm_memremap_pages() to fail. These memmap-backed
> mappings are required for multiple use-cases, including P2PDMA, and CUDA with
> Heterogeneous Memory Management (HMM) enabled.
>
> [Fix]
>
> This is resolved upstream by commit 7ffb791423c7 ("x86/kaslr: Reduce KASLR
> entropy on most x86 systems"). It changes the behavior of KASLR to not shrink
> direct mapping space when CONFIG_PCI_P2PDMA is enabled. The consequence of this
> is that there is less room for KASLR to maneuver, and thus the amount of
> entropy in the randomized layout is reduced. In discussion on the upstream
> patch submission [1], it is noted that on the submitter's system this reduces
> entropy from 16 bits down to 15 bits.
>
> Cherry-picking this commit allows CUDA with HMM enabled and P2PDMA to
> function on the systems described above: the direct mapping space is no
> longer shrunk, so all BAR regions fall within its bounds and the
> devm_memremap_pages() operation succeeds.
>
> Additionally, the commit 7170130e4c72 ("x86/mm/init: Handle the special
> case of device private pages in add_pages(), to not increase max_pfn and
> trigger dma_addressing_limited() bounce buffers") addresses a
> performance regression revealed by applying commit 7ffb791423c7
> ("x86/kaslr: Reduce KASLR entropy on most x86 systems").
>
> Jammy 5.15 has CONFIG_PCI_P2PDMA set to n, so a cherry-pick alone will not
> resolve the issue. In addition to the cherry-pick, set CONFIG_PCI_P2PDMA=y.
>
> Jammy: 7ffb791423c7 already in-tree. Cherry-pick of 7170130e4c72 and
> CONFIG_PCI_P2PDMA=y needed.
> Noble: Cherry-pick of both commits mentioned above needed.
> Plucky: Not affected, fix commits already in tree and
> CONFIG_PCI_P2PDMA=y.
> Questing: Not affected, fix commits already in tree and
> CONFIG_PCI_P2PDMA=y.
>
> [Test Case]
>
> The issue only occurs on systems with PCIe BAR addresses located outside of the
> current minimum address range of [0, ceil(max_pfn / 1TiB) +
> CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING (10 TiB)].
>
> With the NVIDIA Container Toolkit installed and enabled for Docker, the
> following reproduces the issue on affected systems where one or more NVIDIA
> GPUs have BAR addresses outside of the current minimum range:
>
> $ sudo docker run --runtime nvidia --rm -it nvcr.io/nvidia/pytorch:25.03-py3
> ERROR: The NVIDIA Driver is present, but CUDA failed to initialize. GPU
> functionality will not be available.
> [[ Initialization error (error 3) ]]
>
> [Where things could go wrong]
>
> This reduces the entropy of the memory layouts KASLR generates on most x86
> systems. A bug would likely show up as misbehavior of KASLR.
>
> On Jammy, this changeset also enables CONFIG_PCI_P2PDMA, which could have
> additional side-effects. There is an LP bug [2] noting the change of
> CONFIG_PCI_P2PDMA in newer kernels.
>
> [Other Notes]
>
> [1] https://lore.kernel.org/lkml/202502061145.8AFAF053E4@keescook/
> [2] https://bugs.launchpad.net/bugs/1987394
>
Re-sending with updated Jammy config change. The initial one would fail
to build on some arches where CONFIG_PCI_P2PDMA is unsupported.
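
For reference, the range check described under [Test Case] can be sketched
roughly as below. This is only an illustration of the arithmetic, not the
kernel's code: the function names are hypothetical, max_pfn has to be
supplied by hand (it is not read from the running system), a 4 KiB page
size is assumed, and treating a BAR as "outside" when its end exceeds the
limit is an assumption about the boundary handling.

```python
PAGE_SIZE = 4096   # x86 base page size in bytes (assumed)
TIB = 1 << 40      # bytes per TiB


def direct_map_limit(max_pfn, padding_tib=10):
    """Upper bound in bytes of the shrunk direct-map range.

    Mirrors the formula quoted in [Test Case]: physical memory rounded up
    to a whole number of TiB, plus the padding (10 TiB per the cover letter).
    """
    mem_bytes = max_pfn * PAGE_SIZE
    mem_tib = -(-mem_bytes // TIB)  # ceiling division
    return (mem_tib + padding_tib) * TIB


def bar_outside_range(bar_start, bar_size, max_pfn, padding_tib=10):
    """True if any part of the BAR lies beyond the direct-map limit."""
    return bar_start + bar_size > direct_map_limit(max_pfn, padding_tib)


if __name__ == "__main__":
    # Hypothetical machine: 512 GiB of RAM, one 32 GiB BAR placed at 32 TiB.
    max_pfn = (512 << 30) // PAGE_SIZE
    print(hex(direct_map_limit(max_pfn)))
    print(bar_outside_range(32 << 40, 32 << 30, max_pfn))
```

On a real system the BAR start and size would come from something like
lspci or /proc/iomem; the values above are made up for illustration.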