ACK: [SRU][N:gcp][PATCH 000/109] Enable NVIDIA Grace platform for Google Cloud Project

Jacob Martin jacob.martin at canonical.com
Mon Jun 23 19:05:17 UTC 2025


On 6/16/25 7:43 PM, Tim Whisonant wrote:
> BugLink: https://bugs.launchpad.net/bugs/2111859
> 
> SRU Justification
> 
> [Impact]
> 
> Google requested a Grace-enabled 6.8 kernel for GCP. This patchset targets
> noble:linux-gcp. The derivative kernel jammy:linux-gcp-6.8 will inherit
> these patches and will be the primary deployment kernel.
> 
> [Fix]
> 
> Add select NVIDIA Grace platform patches, similar to what has been done
> for kernels like noble:linux-azure-nvidia. NVIDIA provides a
> document [1] listing all of the required and recommended kernel patches for
> Grace enablement. Version 11 (April 25, 2025) of the document was used
> when preparing this patchset, as was version
> Ubuntu-azure-nvidia-6.8.0-1016.17 of the noble:linux-azure-nvidia kernel.
> 
> [1] https://docs.nvidia.com/grace-patch-config-guide.pdf
> 
> [Test Plan]
> 
> Boot testing on arm64/64k pages was performed in-house. Further testing
> will be done by Google in their Grace-enabled environment once available.
> 
> [Regression potential]
> 
> The regression potential is considered moderate due to the number of
> patches involved and the breadth of the changes. The changes affect
> pte, perf, PCI, cpufreq, ACPI, mmu, coresight, and other kernel
> subsystems.
> 
> [Other]
> 
> SF #00411226
> 
> Alexey Kardashevskiy (1):
>    PCI/DOE: Support discovery version 2
> 
> Arnd Bergmann (1):
>    arm64/io: add constant-argument check
> 
> Aubrey Li (1):
>    ACPI: PRM: Remove unnecessary strict handler address checks
> 
> Barry Song (1):
>    mm: make folio_pte_batch available outside of mm/memory.c
> 
> Beata Michalska (6):
>    arm64: amu: Delay allocating cpumask for AMU FIE support
>    cpufreq: Allow arch_freq_get_on_cpu to return an error
>    cpufreq: Introduce an optional cpuinfo_avg_freq sysfs entry
>    arm64: Provide an AMU-based version of arch_freq_get_on_cpu
>    arm64: Update AMU-based freq scale factor on entering idle
>    arm64: Utilize for_each_cpu_wrap for reference lookup
> 
> Besar Wicaksono (5):
>    perf arm-spe: Add Neoverse-V2 to common data source encoding list
>    perf: arm_cspmu: nvidia: remove unsupported SCF events
>    perf: arm_cspmu: nvidia: fix sysfs path in the kernel doc
>    perf: arm_cspmu: nvidia: enable NVLINK-C2C port filtering
>    perf: arm_cspmu: nvidia: monitor all ports by default
> 
> Dan Williams (1):
>    ACPI/HMAT: Move HMAT messages to pr_debug()
> 
> David Hildenbrand (14):
>    arm/pgtable: define PFN_PTE_SHIFT
>    nios2/pgtable: define PFN_PTE_SHIFT
>    powerpc/pgtable: define PFN_PTE_SHIFT
>    riscv/pgtable: define PFN_PTE_SHIFT
>    s390/pgtable: define PFN_PTE_SHIFT
>    sparc/pgtable: define PFN_PTE_SHIFT
>    mm/pgtable: make pte_next_pfn() independent of set_ptes()
>    arm/mm: use pte_next_pfn() in set_ptes()
>    powerpc/mm: use pte_next_pfn() in set_ptes()
>    mm/memory: factor out copying the actual PTE in copy_present_pte()
>    mm/memory: pass PTE to copy_present_pte()
>    mm/memory: optimize fork() with PTE-mapped THP
>    mm/memory: ignore dirty/accessed/soft-dirty bits in folio_pte_batch()
>    mm/memory: ignore writable bit in folio_pte_batch()
> 
> Gavin Shan (2):
>    arm64: tlb: Improve __TLBI_VADDR_RANGE()
>    arm64: tlb: Allow range operation for MAX_TLBI_RANGE_PAGES
> 
> Ian Rogers (1):
>    perf arm-spe/cs-etm: Directly iterate CPU maps
> 
> Ilkka Koskinen (1):
>    perf cs-etm: Fix the assert() to handle captured and unprocessed cpu
>      trace
> 
> Ionela Voinescu (1):
>    arch_topology: init capacity_freq_ref to 0
> 
> James Clark (30):
>    coresight: Remove unused ETM Perf stubs
>    coresight: Clarify comments around the PID of the sink owner
>    coresight: Move struct coresight_trace_id_map to common header
>    coresight: Expose map arguments in trace ID API
>    coresight: Make CPU id map a property of a trace ID map
>    coresight: Make language around "activated" sinks consistent
>    coresight: Remove ops callback checks
>    coresight: Move mode to struct coresight_device
>    coresight: Remove the 'enable' field.
>    coresight: Move all sysfs code to sysfs file
>    coresight: Remove atomic type from refcnt
>    coresight: Remove unused stubs
>    coresight: Add explicit member initializers to coresight_dev_type
>    coresight: Add helper for atomically taking the device
>    coresight: Add a helper for getting csdev->mode
>    coresight: Use per-sink trace ID maps for Perf sessions
>    coresight: Remove pending trace ID release mechanism
>    coresight: Emit sink ID in the HW_ID packets
>    coresight: Make trace ID map spinlock local to the map
>    perf auxtrace: Allow number of queues to be specified
>    perf cs-etm: Print error for new PERF_RECORD_AUX_OUTPUT_HW_ID versions
>    perf cs-etm: Use struct perf_cpu as much as possible
>    perf cs-etm: Create decoders after both AUX and HW_ID search passes
>    perf: cs-etm: Allocate queues for all CPUs
>    perf: cs-etm: Move traceid_list to each queue
>    perf: cs-etm: Create decoders based on the trace ID mappings
>    perf: cs-etm: Only save valid trace IDs into files
>    perf: cs-etm: Support version 0.1 of HW_ID packets
>    perf: cs-etm: Print queue number in raw trace dump
>    perf arm-spe: Use old behavior when opening old SPE files
> 
> Jason Gunthorpe (6):
>    arm64/io: Provide a WC friendly __iowriteXX_copy()
>    net: hns3: Remove io_stop_wc() calls after __iowrite64_copy()
>    x86: Stop using weak symbols for __iowrite32_copy()
>    s390: Implement __iowrite32_copy()
>    s390: Stop using weak symbols for __iowrite64_copy()
>    PCI: Fix pci_enable_acs() support for the ACS quirks
> 
> Jie Zhan (1):
>    cppc_cpufreq: Remove HiSilicon CPPC workaround
> 
> Kai-Heng Feng (1):
>    PCI: Use downstream bridges for distributing resources
> 
> Leo Yan (8):
>    perf arm-spe: Rename arm_spe__synth_data_source_generic()
>    perf arm-spe: Rename the common data source encoding
>    perf arm-spe: Support metadata version 2
>    perf arm-spe: Introduce arm_spe__is_homogeneous()
>    perf arm-spe: Use metadata to decide the data source feature
>    perf arm-spe: Remove the unused 'midr' field
>    perf arm-spe: Add Cortex CPUs to common data source encoding list
>    perf arm-spe: Define metadata header version 2
> 
> Namhyung Kim (1):
>    tools/include: Sync arm64 headers with the kernel sources
> 
> Petr Vaněk (1):
>    mm: fix folio_pte_batch() on XEN PV
> 
> Piotr Jaroszynski (1):
>    Fix mmu notifiers for range-based invalidates
> 
> Ryan Roberts (20):
>    arm64/mm: make set_ptes() robust when OAs cross 48-bit boundary
>    mm: clarify the spec for set_ptes()
>    mm: thp: batch-collapse PMD with set_ptes()
>    mm: introduce pte_advance_pfn() and use for pte_next_pfn()
>    arm64/mm: convert pte_next_pfn() to pte_advance_pfn()
>    x86/mm: convert pte_next_pfn() to pte_advance_pfn()
>    mm: tidy up pte_next_pfn() definition
>    arm64/mm: convert READ_ONCE(*ptep) to ptep_get(ptep)
>    arm64/mm: convert set_pte_at() to set_ptes(..., 1)
>    arm64/mm: convert ptep_clear() to ptep_get_and_clear()
>    arm64/mm: new ptep layer to manage contig bit
>    arm64/mm: split __flush_tlb_range() to elide trailing DSB
>    arm64/mm: wire up PTE_CONT for user mappings
>    arm64/mm: implement new wrprotect_ptes() batch API
>    arm64/mm: implement new [get_and_]clear_full_ptes() batch APIs
>    mm: add pte_batch_hint() to reduce scanning in folio_pte_batch()
>    arm64/mm: implement pte_batch_hint()
>    arm64/mm: __always_inline to improve fork() perf
>    arm64/mm: automatically fold contpte mappings
>    arm64/mm: export contpte symbols only to GPL users
> 
> Tim Whisonant (2):
>    UBUNTU: [Packaging] gcp: enable CONFIG_CPUFREQ_ARCH_CUR_FREQ
>    UBUNTU: [Packaging] gcp: enable CONFIG_ARM64_CONTPTE
> 
> Tushar Dave (1):
>    PCI/ACS: Fix 'pci=config_acs=' parameter
> 
> Vidya Sagar (2):
>    PCI: Clear Secondary Status errors after enumeration
>    PCI: Extend ACS configurability
> 
>   .../admin-guide/kernel-parameters.txt         |  32 +
>   Documentation/admin-guide/perf/nvidia-pmu.rst |  52 +-
>   Documentation/admin-guide/pm/cpufreq.rst      |  17 +-
>   arch/arm/include/asm/pgtable.h                |   2 +
>   arch/arm/mm/mmu.c                             |   2 +-
>   arch/arm64/Kconfig                            |   9 +
>   arch/arm64/include/asm/io.h                   | 128 ++++
>   arch/arm64/include/asm/pgtable.h              | 431 ++++++++++--
>   arch/arm64/include/asm/tlbflush.h             |  68 +-
>   arch/arm64/kernel/efi.c                       |   4 +-
>   arch/arm64/kernel/io.c                        |  42 ++
>   arch/arm64/kernel/mte.c                       |   2 +-
>   arch/arm64/kernel/topology.c                  | 150 ++++-
>   arch/arm64/kvm/guest.c                        |   2 +-
>   arch/arm64/mm/Makefile                        |   1 +
>   arch/arm64/mm/contpte.c                       | 404 +++++++++++
>   arch/arm64/mm/fault.c                         |  12 +-
>   arch/arm64/mm/fixmap.c                        |   4 +-
>   arch/arm64/mm/hugetlbpage.c                   |  40 +-
>   arch/arm64/mm/kasan_init.c                    |   6 +-
>   arch/arm64/mm/mmu.c                           |  16 +-
>   arch/arm64/mm/pageattr.c                      |   6 +-
>   arch/arm64/mm/trans_pgd.c                     |   6 +-
>   arch/nios2/include/asm/pgtable.h              |   2 +
>   arch/powerpc/include/asm/pgtable.h            |   2 +
>   arch/powerpc/mm/pgtable.c                     |   5 +-
>   arch/riscv/include/asm/pgtable.h              |   2 +
>   arch/s390/include/asm/io.h                    |  15 +
>   arch/s390/include/asm/pgtable.h               |   2 +
>   arch/s390/pci/pci.c                           |   6 -
>   arch/sparc/include/asm/pgtable_64.h           |   2 +
>   arch/x86/include/asm/io.h                     |  17 +
>   arch/x86/include/asm/pgtable.h                |   8 +-
>   arch/x86/kernel/cpu/aperfmperf.c              |   2 +-
>   arch/x86/kernel/cpu/proc.c                    |   7 +-
>   arch/x86/lib/Makefile                         |   1 -
>   arch/x86/lib/iomap_copy_64.S                  |  15 -
>   debian.gcp/config/annotations                 |   6 +
>   drivers/acpi/numa/hmat.c                      |  24 +-
>   drivers/acpi/prmt.c                           |   4 +-
>   drivers/base/arch_topology.c                  |   8 +-
>   drivers/cpufreq/Kconfig.x86                   |  12 +
>   drivers/cpufreq/cppc_cpufreq.c                |  73 +-
>   drivers/cpufreq/cpufreq.c                     |  38 +-
>   drivers/hwtracing/coresight/coresight-core.c  | 515 +-------------
>   drivers/hwtracing/coresight/coresight-dummy.c |   3 +-
>   drivers/hwtracing/coresight/coresight-etb10.c |  29 +-
>   .../hwtracing/coresight/coresight-etm-perf.c  |  43 +-
>   .../hwtracing/coresight/coresight-etm-perf.h  |  18 -
>   drivers/hwtracing/coresight/coresight-etm.h   |   2 -
>   .../coresight/coresight-etm3x-core.c          |  32 +-
>   .../coresight/coresight-etm3x-sysfs.c         |   4 +-
>   .../coresight/coresight-etm4x-core.c          |  35 +-
>   drivers/hwtracing/coresight/coresight-etm4x.h |   1 -
>   drivers/hwtracing/coresight/coresight-priv.h  |   8 +-
>   drivers/hwtracing/coresight/coresight-stm.c   |  33 +-
>   drivers/hwtracing/coresight/coresight-sysfs.c | 392 +++++++++++
>   .../hwtracing/coresight/coresight-tmc-core.c  |   2 +-
>   .../hwtracing/coresight/coresight-tmc-etf.c   |  46 +-
>   .../hwtracing/coresight/coresight-tmc-etr.c   |  38 +-
>   drivers/hwtracing/coresight/coresight-tmc.h   |   7 +-
>   drivers/hwtracing/coresight/coresight-tpda.c  |  13 +-
>   drivers/hwtracing/coresight/coresight-tpdm.c  |   3 +-
>   drivers/hwtracing/coresight/coresight-tpiu.c  |  14 +-
>   .../hwtracing/coresight/coresight-trace-id.c  | 138 ++--
>   .../hwtracing/coresight/coresight-trace-id.h  |  70 +-
>   drivers/hwtracing/coresight/ultrasoc-smb.c    |  22 +-
>   drivers/hwtracing/coresight/ultrasoc-smb.h    |   2 -
>   .../net/ethernet/hisilicon/hns3/hns3_enet.c   |   4 -
>   drivers/pci/doe.c                             |  12 +-
>   drivers/pci/pci.c                             | 160 +++--
>   drivers/pci/probe.c                           |   3 +
>   drivers/pci/setup-bus.c                       |   3 +-
>   drivers/perf/arm_cspmu/nvidia_cspmu.c         |  75 +--
>   include/linux/coresight-pmu.h                 |  17 +-
>   include/linux/coresight.h                     | 151 ++---
>   include/linux/cpufreq.h                       |   2 +-
>   include/linux/efi.h                           |   5 +
>   include/linux/io.h                            |   8 +-
>   include/linux/pgtable.h                       |  65 +-
>   include/uapi/linux/pci_regs.h                 |   1 +
>   lib/iomap_copy.c                              |  13 +-
>   mm/huge_memory.c                              |  58 +-
>   mm/internal.h                                 |  88 +++
>   mm/memory.c                                   | 143 ++--
>   tools/arch/arm64/include/asm/cputype.h        |  10 +
>   tools/include/linux/coresight-pmu.h           |  17 +-
>   tools/perf/arch/arm/util/cs-etm.c             | 307 ++++-----
>   tools/perf/arch/arm64/util/arm-spe.c          |   8 +-
>   .../util/arm-spe-decoder/arm-spe-decoder.h    |  18 +-
>   tools/perf/util/arm-spe.c                     | 234 ++++++-
>   tools/perf/util/arm-spe.h                     |  38 +-
>   tools/perf/util/auxtrace.c                    |   9 +-
>   tools/perf/util/auxtrace.h                    |   1 +
>   .../perf/util/cs-etm-decoder/cs-etm-decoder.c |  36 +-
>   .../perf/util/cs-etm-decoder/cs-etm-decoder.h |   2 +-
>   tools/perf/util/cs-etm.c                      | 631 +++++++++++-------
>   tools/perf/util/cs-etm.h                      |  12 +-
>   98 files changed, 3414 insertions(+), 1874 deletions(-)
>   create mode 100644 arch/arm64/mm/contpte.c
>   delete mode 100644 arch/x86/lib/iomap_copy_64.S
> 

Acked-by: Jacob Martin <jacob.martin at canonical.com>



