ACK: [SRU][Q][PATCH 0/2] Dell Machines with amdgpu cannot boot into OS with 6.17 kernel
Yufeng Gao
yufeng.gao at canonical.com
Tue Mar 17 01:26:35 UTC 2026
On 16/3/26 16:05, AceLan Kao wrote:
> From: "Chia-Lin Kao (AceLan)" <acelan.kao at canonical.com>
>
> BugLink: https://bugs.launchpad.net/bugs/2144522
>
> [Impact]
> Dell systems (CID: 202506-36819, 202506-36820, 202506-36823, 202506-36826) with
> AMD GFX 11.0.4 (gfx11) graphics cannot boot into the OS after upgrading to
> 6.17.0-1012-oem. The machine becomes stuck during boot and cannot reach the
> desktop. Booting with `nomodeset` works as a workaround, pointing to an amdgpu
> driver initialization failure.
> Failure rate: 4/4 (100%) on affected Dell systems.
>
> [Fix]
> Two patches fix this boot regression:
> 1. Raise the minimum MES firmware version for calling set_hw_resources_1 on
> GC 11.0.4 from 0x50 to 0x52, so that the call is skipped on firmware that
> cannot handle it properly.
> upstream in mainline kernel v7.0:
> 1478a34470bf drm/amd: Set minimum version for set_hw_resource_1 on gfx11 to 0x52
> 2. Remove the MES LR compute workaround (enable_lr_compute_wa) from both
> mes_v11_0.c and mes_v12_0.c, since the underlying issue was already fixed by
> adjusting the VGPR size, and keeping the workaround causes instability with
> newer GC microcode.
> upstream in mainline kernel v7.0:
> 6b0d812971370 drm/amd: Disable MES LR compute W/A
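
For reference, the version gate described in patch 1 can be sketched as a
standalone C snippet. This is an illustration only, not the driver code: the
helper names and the MES_VERSION_MASK value are assumptions made for this
sketch, and only the 0x50 and 0x52 thresholds come from the cover letter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: the low 12 bits of the scheduler version word carry the
 * MES firmware version. The mask value is chosen for this sketch. */
#define MES_VERSION_MASK 0x00000fffu

/* Thresholds quoted in the cover letter for GC 11.0.4. */
#define MES_MIN_VERSION_OLD 0x50u
#define MES_MIN_VERSION_NEW 0x52u

/* Hypothetical helper: true when set_hw_resources_1 would be issued
 * for the given firmware version and minimum-version threshold. */
static bool should_call_set_hw_resources_1(uint32_t sched_version,
                                           uint32_t min_version)
{
    return (sched_version & MES_VERSION_MASK) >= min_version;
}

/* Hypothetical helper: true when raising the threshold changes behaviour,
 * i.e. the call was issued before the patch and is skipped after it. */
static bool behaviour_changes(uint32_t sched_version)
{
    return should_call_set_hw_resources_1(sched_version, MES_MIN_VERSION_OLD) &&
           !should_call_set_hw_resources_1(sched_version, MES_MIN_VERSION_NEW);
}

/* Small self-check covering the versions around the threshold. */
void check_gate(void)
{
    assert(behaviour_changes(0x50));
    assert(behaviour_changes(0x51));
    assert(!behaviour_changes(0x52));
}
```

Under these assumptions, only firmware reporting 0x50 or 0x51 sees a change:
older firmware never made the call, and 0x52 or newer still makes it.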
>
> [Test Plan]
> Boot the affected machine with kernel 6.17.0-1012-oem (or later oem kernel).
> Without the patches: System gets stuck during boot and cannot reach the OS
> (amdgpu fails to initialize).
> With the patches: System boots normally to the desktop without requiring
> `nomodeset`.
> To verify:
> 1. Boot without `nomodeset`
> 2. Check that the desktop loads successfully
> 3. Confirm that dmesg shows no amdgpu-related errors blocking boot:
> $ sudo dmesg | grep -i "amdgpu\|mes\|drm"
>
> [Where problems could occur]
> These changes affect the amdgpu MES initialization path for gfx11 (mes_v11_0.c)
> and gfx12 (mes_v12_0.c) hardware.
> For patch 1: If the version threshold 0x52 is incorrect, machines with MES
> firmware version 0x50 or 0x51 that previously worked would stop calling
> set_hw_resources_1, potentially causing degraded GPU performance or missing
> hardware resource configuration. This would manifest as graphical glitches,
> GPU compute failures, or silent capability loss after MES initialization.
> For patch 2: If the LR compute workaround was actually needed for
> gfx11/gfx12 products other than gfx1151, removing it could destabilize
> those variants. Symptoms would include GPU hangs, DRM timeout errors, or
> compute job failures under load.
>
> [Other Info]
> Both patches are upstream in Linux 7.0 (merged via drm-next-2026-02-11).
>
> Mario Limonciello (2):
> drm/amd: Disable MES LR compute W/A
> drm/amd: Set minimum version for set_hw_resource_1 on gfx11 to 0x52
>
> drivers/gpu/drm/amd/amdgpu/mes_v11_0.c | 7 +------
> drivers/gpu/drm/amd/amdgpu/mes_v12_0.c | 5 -----
> 2 files changed, 1 insertion(+), 11 deletions(-)
>
> --
> 2.53.0
>
>
Acked-by: Yufeng Gao <yufeng.gao at canonical.com>