[SRU][Q][PATCH 0/2] Dell Machines with amdgpu cannot boot into OS with 6.17 kernel

AceLan Kao acelan.kao at canonical.com
Mon Mar 16 06:05:37 UTC 2026


From: "Chia-Lin Kao (AceLan)" <acelan.kao at canonical.com>

BugLink: https://bugs.launchpad.net/bugs/2144522

[Impact]
Dell systems (CID: 202506-36819, 202506-36820, 202506-36823, 202506-36826) with
AMD GFX 11.0.4 (gfx11) graphics fail to boot after upgrading to
6.17.0-1012-oem: the machine hangs during boot and never reaches the
desktop. Booting with `nomodeset` works around the problem, which points to
a failure in amdgpu driver initialization.
Failure rate: 4/4 (100%) on affected Dell systems.

[Fix]
Two patches fix this boot regression:
1. Raise the minimum MES firmware version for calling set_hw_resources_1 on
   GC 11.0.4 from 0x50 to 0x52, so that the call is skipped on older firmware
   that cannot handle it.
   Upstream in mainline kernel v7.0:
   1478a34470bf drm/amd: Set minimum version for set_hw_resource_1 on gfx11 to 0x52
2. Remove the MES LR compute workaround (enable_lr_compute_wa) from both
   mes_v11_0.c and mes_v12_0.c. The underlying issue was already fixed by
   adjusting the VGPR size, and keeping the workaround causes instability with
   newer GC microcode.
   Upstream in mainline kernel v7.0:
   6b0d812971370 drm/amd: Disable MES LR compute W/A
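
The threshold change in patch 1 can be illustrated from a tester's point of
view. The following is a minimal sketch, not the kernel code: the function
name mes_meets_min is hypothetical, and the 0xfff mask is an assumption about
which bits of the reported MES firmware version hold the revision that is
compared against 0x52.

```shell
# Hypothetical helper: exit 0 if a MES firmware version meets the minimum.
# $1 = version (hex with 0x prefix, or decimal)
# $2 = minimum revision (defaults to 0x52, the new threshold from patch 1)
# ASSUMPTION: the low 12 bits (0xfff) hold the revision being compared.
mes_meets_min() {
    _ver=$(( $1 ))
    _min=$(( ${2:-0x52} ))
    [ $(( $_ver & 0xfff )) -ge "$_min" ]
}
```

For example, `mes_meets_min 0x51 || echo "set_hw_resources_1 would be skipped"`
prints the message, while the same check against the old minimum
(`mes_meets_min 0x51 0x50`) succeeds. Where the running kernel exposes
amdgpu_firmware_info in debugfs, the MES firmware version listed there could
be passed in as the first argument.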

[Test Plan]
Boot the affected machine with kernel 6.17.0-1012-oem (or a later OEM kernel).
Without the patches: the system hangs during boot and cannot reach the OS
(amdgpu fails to initialize).
With the patches: the system boots normally to the desktop without requiring
`nomodeset`.
To verify:
1. Boot without `nomodeset`
2. Check that the desktop loads successfully
3. Confirm that dmesg shows no amdgpu-related errors blocking boot:
   $ sudo dmesg | grep -i "amdgpu\|mes\|drm"
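
The grep in step 3 also matches unrelated lines, because `mes` is a substring
of words such as "timestamp" and "messages". A narrower filter may make the
check easier to read. This is only a sketch: the function name
amdgpu_boot_errors is hypothetical, and the keyword list is an illustrative
assumption rather than a complete set of amdgpu failure messages.

```shell
# Hypothetical helper: read kernel log lines on stdin and print only the
# amdgpu/drm lines that look like failures.
# ASSUMPTION: the keywords below are illustrative, not exhaustive.
# Exit status is non-zero when no line matches (i.e. nothing suspicious found).
amdgpu_boot_errors() {
    grep -iE 'amdgpu|drm' | grep -iE 'error|fail|timed out|timeout|hang'
}
```

Typical use would be `sudo dmesg | amdgpu_boot_errors`; empty output suggests
that no matching amdgpu or drm failure was logged. The full dmesg output
should still be reviewed if the desktop does not come up.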

[Where problems could occur]
These changes affect the amdgpu MES initialization path for gfx11 (mes_v11_0.c)
and gfx12 (mes_v12_0.c) hardware.
For patch 1: Machines with MES firmware version 0x50 or 0x51 no longer call
set_hw_resources_1. If the 0x52 threshold is wrong and that firmware handled
the call correctly, they could see degraded GPU performance or missing
hardware resource configuration. This would manifest as graphical glitches,
GPU compute failures, or silent capability loss after MES initialization.
For patch 2: If the LR compute workaround was actually needed for some
gfx11/gfx12 product beyond gfx1151, removing it could cause GPU hangs or
compute workload failures on those variants. Symptoms would include DRM
timeout errors or compute job failures under load.

[Other Info]
Both patches are upstream in Linux 7.0 (merged via drm-next-2026-02-11).

Mario Limonciello (2):
  drm/amd: Disable MES LR compute W/A
  drm/amd: Set minimum version for set_hw_resource_1 on gfx11 to 0x52

 drivers/gpu/drm/amd/amdgpu/mes_v11_0.c | 7 +------
 drivers/gpu/drm/amd/amdgpu/mes_v12_0.c | 5 -----
 2 files changed, 1 insertion(+), 11 deletions(-)

--
2.53.0



