[SRU][J/N][PATCH 0/4] Fix memremap_pages failures on x86 systems with large PCIe BAR addresses

Jacob Martin jacob.martin at canonical.com
Tue Aug 12 21:51:37 UTC 2025


BugLink: https://bugs.launchpad.net/bugs/2120209

SRU Justification

[Impact]

On some x86 systems, PCIe device BAR addresses can exceed the range reserved
by KASLR for direct mappings. Attempts to map the affected BAR region using
devm_memremap_pages() then fail. These memmap-backed mappings are required
for multiple use cases, including P2PDMA and CUDA with Heterogeneous Memory
Management (HMM) enabled.

[Fix]

This is resolved upstream by commit 7ffb791423c7 ("x86/kaslr: Reduce KASLR
entropy on most x86 systems"). It changes the behavior of KASLR to not shrink
direct mapping space when CONFIG_PCI_P2PDMA is enabled. The consequence of this
is that there is less room for KASLR to maneuver, and thus the amount of
entropy in the randomized layout is reduced. In discussion on the upstream
patch submission [1], it is noted that on the submitter's system this reduces
entropy from 16 bits down to 15 bits.

Cherry-picking this commit allows CUDA with HMM enabled and P2PDMA to
function on the systems described above. With it applied, the direct
mapping space is not shrunk, so all BAR regions fall within its bounds
and the devm_memremap_pages() operation succeeds.

Additionally, commit 7170130e4c72 ("x86/mm/init: Handle the special
case of device private pages in add_pages(), to not increase max_pfn and
trigger dma_addressing_limited() bounce buffers") addresses a
performance regression revealed by applying commit 7ffb791423c7
("x86/kaslr: Reduce KASLR entropy on most x86 systems").

Jammy 5.15 has CONFIG_PCI_P2PDMA set to n, so a cherry-pick alone will not
resolve the issue. In addition to the cherry-pick, set CONFIG_PCI_P2PDMA=y.

Jammy: 7ffb791423c7 already in-tree. Cherry-pick of 7170130e4c72 and
       CONFIG_PCI_P2PDMA=y needed.
Noble: Cherry-pick of both commits mentioned above needed.
Plucky: Not affected, fix commits already in tree and
        CONFIG_PCI_P2PDMA=y.
Questing: Not affected, fix commits already in tree and
          CONFIG_PCI_P2PDMA=y.
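
To confirm which of the states above a given kernel is in, the config option
can be read from the kernel's config file. The following is a minimal sketch,
not part of the patch set: the helper name config_value() is hypothetical, and
it assumes the usual kernel config text format, as found in
/boot/config-$(uname -r) on Ubuntu.

```python
def config_value(config_text, option):
    """Return the value of a kernel config option from config file text.

    Returns "n" for an explicit "# OPTION is not set" line, the string after
    "=" for a set option, and None if the option does not appear at all.
    """
    for line in config_text.splitlines():
        line = line.strip()
        if line == f"# {option} is not set":
            return "n"
        if line.startswith(option + "="):
            return line.split("=", 1)[1]
    return None
```

For example, passing the contents of /boot/config-$(uname -r) together with
"CONFIG_PCI_P2PDMA" should yield "y" on a kernel carrying the config change.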

[Test Case]

The issue only occurs on systems with PCIe BAR addresses located outside of the
current minimum address range of [0, ceil(max_pfn / 1TiB) +
CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING (10 TiB)].
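
As a rough illustration of the range above, the following sketch computes the
upper bound and reports which memory BARs fall outside it. It is not taken
from the patches. It assumes 4 KiB pages, the 10 TiB padding quoted above, and
that max_pfn counts pages (so it is converted to bytes before rounding up to
whole TiB). The function names are hypothetical. The BAR data is expected in
the format of /sys/bus/pci/devices/<device>/resource, with one
"start end flags" line in hex per region.

```python
TIB = 1 << 40
IORESOURCE_MEM = 0x200  # memory resource flag bit from the kernel's ioport.h


def direct_map_limit(max_pfn, page_shift=12, padding_tib=10):
    """Upper bound (bytes) of the shrunken direct-mapping range.

    Assumes max_pfn is a page count and that physical memory is rounded up
    to whole TiB before the padding is added.
    """
    mem_bytes = max_pfn << page_shift
    mem_tib = -(-mem_bytes // TIB)  # ceiling division
    return (mem_tib + padding_tib) * TIB


def parse_resource(text):
    """Parse sysfs PCI 'resource' text into (start, end) memory regions."""
    regions = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        start, end, flags = (int(f, 16) for f in fields[:3])
        if start == 0 and end == 0:
            continue  # unused resource slot
        if flags & IORESOURCE_MEM:
            regions.append((start, end))
    return regions


def bars_outside(text, limit):
    """Return memory regions whose end address lies at or beyond the limit."""
    return [(s, e) for s, e in parse_resource(text) if e >= limit]
```

For a machine with 512 GiB of RAM (max_pfn of roughly 1 << 27 under these
assumptions), the bound works out to 11 TiB, so a GPU BAR placed at 32 TiB
would be reported as outside the range.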

With the NVIDIA Container Toolkit installed and enabled for Docker, the
following reproduces the issue on affected systems where one or more NVIDIA
GPUs have BAR addresses outside of the current minimum range:

$ sudo docker run --runtime nvidia --rm -it nvcr.io/nvidia/pytorch:25.03-py3
ERROR: The NVIDIA Driver is present, but CUDA failed to initialize.  GPU
functionality will not be available.
   [[ Initialization error (error 3) ]]

[Where things could go wrong]

This reduces the entropy of the memory layouts KASLR generates on most x86
systems. A bug would likely show up as misbehavior of KASLR.

On Jammy, this changeset also enables CONFIG_PCI_P2PDMA, which could have
additional side-effects. There is an LP bug [2] noting the change of
CONFIG_PCI_P2PDMA in newer kernels.

[Other Notes]

[1] https://lore.kernel.org/lkml/202502061145.8AFAF053E4@keescook/
[2] https://bugs.launchpad.net/bugs/1987394

-- 
2.43.0



