NAK: [SRU][J/N][PATCH 0/4] Fix memremap_pages failures on x86 systems with large PCIe BAR addresses

Jacob Martin jacob.martin at canonical.com
Tue Aug 12 22:03:00 UTC 2025


On 8/12/25 4:51 PM, Jacob Martin wrote:
> BugLink: https://bugs.launchpad.net/bugs/2120209
> 
> SRU Justification
> 
> [Impact]
> 
> On some x86 systems, it is possible for PCIe device BAR addresses to exceed the
> range reserved by KASLR for direct mappings. This causes attempts to map the
> impacted BAR region using devm_memremap_pages() to fail. These memmap-backed
> mappings are required for multiple use-cases, including P2PDMA, and CUDA with
> Heterogeneous Memory Management (HMM) enabled.
> 
> [Fix]
> 
> This is resolved upstream by commit 7ffb791423c7 ("x86/kaslr: Reduce KASLR
> entropy on most x86 systems"). It changes the behavior of KASLR to not shrink
> direct mapping space when CONFIG_PCI_P2PDMA is enabled. The consequence of this
> is that there is less room for KASLR to maneuver, and thus the amount of
> entropy in the randomized layout is reduced. In discussion on the upstream
> patch submission [1], it is noted that on the submitter's system this reduces
> entropy from 16 bits down to 15 bits.
> 
> Cherry-picking the mentioned commit allows CUDA with HMM enabled and
> P2PDMA to function on the systems described above, as with it the direct
> mapping space is not shrunk, so all BAR regions fall within its bounds,
> and thus the devm_memremap_pages() operation succeeds.
> 
> Additionally, the commit 7170130e4c72 ("x86/mm/init: Handle the special
> case of device private pages in add_pages(), to not increase max_pfn and
> trigger dma_addressing_limited() bounce buffers") addresses a
> performance regression revealed by applying commit 7ffb791423c7
> ("x86/kaslr: Reduce KASLR entropy on most x86 systems").
> 
> Jammy 5.15 has CONFIG_PCI_P2PDMA set to n, so a cherry-pick alone will not
> resolve the issue. In addition to the cherry-pick, set CONFIG_PCI_P2PDMA=y.
> 
> Jammy: 7ffb791423c7 already in-tree. Cherry-pick of 7170130e4c72 and
>         CONFIG_PCI_P2PDMA=y needed.
> Noble: Cherry-pick of both commits mentioned above needed.
> Plucky: Not affected, fix commits already in tree and
>          CONFIG_PCI_P2PDMA=y.
> Questing: Not affected, fix commits already in tree and
>            CONFIG_PCI_P2PDMA=y.
> 
> [Test Case]
> 
> The issue only occurs on systems with PCIe BAR addresses located outside of the
> current minimum address range of [0, ceil(max_pfn / 1TiB) +
> CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING (10 TiB)].
> 
> With the NVIDIA Container Toolkit installed and enabled for Docker, the
> following reproduces the issue on affected systems where one or more NVIDIA
> GPUs have BAR addresses outside of the current minimum range:
> 
> $ sudo docker run --runtime nvidia --rm -it nvcr.io/nvidia/pytorch:25.03-py3
> ERROR: The NVIDIA Driver is present, but CUDA failed to initialize.  GPU
> functionality will not be available.
>     [[ Initialization error (error 3) ]]
> 
> [Where things could go wrong]
> 
> This reduces the entropy of the memory layouts KASLR generates on most x86
> systems. A bug would likely show up as misbehavior of KASLR.
> 
> On Jammy, this changeset also enables CONFIG_PCI_P2PDMA, which could have
> additional side-effects. There is an LP bug [2] noting the change of
> CONFIG_PCI_P2PDMA in newer kernels.
> 
> [Other Notes]
> 
> [1] https://lore.kernel.org/lkml/202502061145.8AFAF053E4@keescook/
> [2] https://bugs.launchpad.net/bugs/1987394
> 

Re-sending with an updated Jammy config change. The initial submission
would fail to build on some arches where CONFIG_PCI_P2PDMA is unsupported.
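
For reference, the range check described in the [Test Case] section can be
sketched in Python as below. This is only an illustration under stated
assumptions: it assumes 4 KiB pages, reads max_pfn as a page count that is
converted to bytes before rounding up to whole TiB, and uses the 10 TiB
padding value quoted above. The function names are made up for this sketch
and are not kernel interfaces.

```python
PAGE_SHIFT = 12    # assumption: 4 KiB pages
TIB = 1 << 40      # bytes per TiB
PADDING_TIB = 10   # CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING value quoted in the test case


def min_direct_map_tib(max_pfn, padding_tib=PADDING_TIB):
    """Upper bound (in TiB) of the minimum range from the test case.

    Hypothetical helper: ceil(bytes covered by max_pfn / 1 TiB) + padding.
    """
    covered_bytes = max_pfn << PAGE_SHIFT
    covered_tib = -(-covered_bytes // TIB)  # integer ceiling division
    return covered_tib + padding_tib


def bar_within_range(bar_start, bar_size, max_pfn, padding_tib=PADDING_TIB):
    """Return True if the whole BAR lies inside [0, upper bound]."""
    limit_bytes = min_direct_map_tib(max_pfn, padding_tib) * TIB
    return bar_start + bar_size <= limit_bytes


if __name__ == "__main__":
    max_pfn = 1 << 27  # 512 GiB of RAM with 4 KiB pages
    print(min_direct_map_tib(max_pfn))
    print(bar_within_range(32 * TIB, 64 << 30, max_pfn))
```

With max_pfn = 1 << 27 (512 GiB of RAM), the upper bound works out to 11 TiB,
so a 64 GiB BAR placed at 32 TiB falls outside the range and would be expected
to hit the devm_memremap_pages() failure on a kernel without the fix.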



