[Bug 2125145] Re: [SRU] Makedumpfile: Errors and Page Exclusions When Opening Kernel Crashdump Files Generated on the Latest HWE Kernel
Dave Jones
2125145 at bugs.launchpad.net
Mon Dec 15 14:33:28 UTC 2025
Thanks for the updated diffs; sponsoring for questing:
Uploading crash_8.0.6-1ubuntu2.25.10.1.dsc
Uploading crash_8.0.6-1ubuntu2.25.10.1.debian.tar.xz
Uploading crash_8.0.6-1ubuntu2.25.10.1_source.buildinfo
Uploading crash_8.0.6-1ubuntu2.25.10.1_source.changes
--
You received this bug notification because you are a member of Ubuntu
Sponsors, which is subscribed to the bug report.
https://bugs.launchpad.net/bugs/2125145
Title:
[SRU] Makedumpfile: Errors and Page Exclusions When Opening Kernel
Crashdump Files Generated on the Latest HWE Kernel
Status in crash package in Ubuntu:
Fix Released
Status in makedumpfile package in Ubuntu:
Fix Released
Status in crash source package in Noble:
New
Status in makedumpfile source package in Noble:
New
Status in crash source package in Plucky:
New
Status in makedumpfile source package in Plucky:
Fix Released
Status in crash source package in Questing:
New
Status in makedumpfile source package in Questing:
Fix Released
Status in crash source package in Resolute:
Fix Released
Status in makedumpfile source package in Resolute:
Fix Released
Bug description:
Note: Original description is at the bottom of this report
[Impact]
The current versions of Makedumpfile and Crash in the -updates pocket
on Noble do not support the latest hardware enablement kernel for that
platform, which is 6.14. There are several architecture-dependent and
kernel flavor-dependent behaviours that I will outline below, but the
steps to reproduce are the same.
Reproducer steps:
-----------------
Boot into a hardware enablement kernel. For example, on arm64 use the
6.14.0-1008-nvidia-64k kernel:
KERNEL_VERSION=6.14.0-1008-nvidia-64k
DISTRO=noble
sudo apt update
sudo apt install ubuntu-dbgsym-keyring
echo "deb http://ddebs.ubuntu.com ${DISTRO} main restricted universe multiverse
deb http://ddebs.ubuntu.com ${DISTRO}-updates main restricted universe multiverse" | \
sudo tee /etc/apt/sources.list.d/ddebs.list
sudo apt update
sudo apt install linux-image-${KERNEL_VERSION}
sudo apt install linux-image-unsigned-${KERNEL_VERSION}-dbgsym
Modify grub's cmdline to specify a crashkernel: GRUB_CMDLINE_LINUX_DEFAULT="quiet splash crashkernel=512M" # Or similar
sudo update-grub
sudo apt install kexec-tools kdump-tools crash makedumpfile
sudo systemctl enable kdump-tools
sudo systemctl start kdump-tools
sudo reboot
echo c | sudo tee /proc/sysrq-trigger
After the machine recovers,
crash /usr/lib/debug/boot/<kernel-dbgsym> /var/crash/<dump-dir>/<dump-file>
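Before triggering the panic in the steps above, it can help to confirm that the crashkernel reservation took effect and that a crash kernel is loaded. The following is a minimal sketch, not part of the original reproducer: has_crashkernel and check_kdump_ready are hypothetical helper names, and the check assumes the standard /proc/cmdline and /sys/kernel/kexec_crash_loaded interfaces are available.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of kdump-tools.

# Return 0 if the given kernel command line contains a crashkernel= parameter.
has_crashkernel() {
    case " $1 " in
        *" crashkernel="*) return 0 ;;
        *) return 1 ;;
    esac
}

# Report kdump readiness based on the running system's state.
check_kdump_ready() {
    if has_crashkernel "$(cat /proc/cmdline)"; then
        echo "crashkernel= present on the kernel command line"
    else
        echo "crashkernel= missing: re-check the GRUB settings and update-grub"
        return 1
    fi
    # kexec_crash_loaded reads 1 once a crash kernel has been loaded.
    if [ "$(cat /sys/kernel/kexec_crash_loaded 2>/dev/null)" = "1" ]; then
        echo "crash kernel loaded"
    else
        echo "crash kernel not loaded: check the kdump-tools service"
        return 1
    fi
}
```

Running check_kdump_ready after the reboot and before writing to /proc/sysrq-trigger should show both messages; if either check fails, no dump is likely to be written.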
Results on Arm64
----------------
crash 8.0.4
Copyright (C) 2002-2022 Red Hat, Inc.
...
For help, type "help".
Type "apropos word" to search for commands related to "word"...
please wait... (gathering task table data)
crash: page excluded: kernel virtual address: ffff07ffa042d8e0 type: "xa_node.slots[off]"
Results on amd64
----------------
On an amd64 machine, using a kernel such as
linux-image-6.14.0-29-generic results in crash failing to open. No
error is printed but we don't obtain the prompt:
crash 8.0.4
...
For help, type "help".
Type "apropos word" to search for commands related to "word"...
# Program exits and no prompt is presented
[Test Plan]
* Ensure that the proposed combination of makedumpfile and crash is capable of generating and subsequently opening crashdumps on the HWE and GA kernels available for that platform. Here is the mapping at the time of writing:
Noble GA: 6.8
Noble HWE: 6.14
Plucky (interim release, no HWE): 6.14
Questing (interim release, no HWE): 6.17
Resolute (development): 6.17 (as of Oct. 14th 2025)
* Ensure all of crash's commands produce the expected output (eg. ps,
mount, files, vm, vtop, runq, etc.)
* If bugs are found in generating and reading crashdumps on the HWE
kernel on other architectures (s390x, etc.), this test plan can be
expanded to include those.
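The command check in the test plan above could be scripted roughly as follows. This is a sketch under assumptions: write_crash_cmds and smoke_test are hypothetical helper names, the command list is taken from the test plan with vtop left out because it needs an address argument, and it relies on crash's -s (silent) and -i (input file) options.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical.

# write_crash_cmds FILE: write a crash command file, one command per line.
write_crash_cmds() {
    # vtop is omitted because it requires an address argument.
    printf '%s\n' ps mount files vm runq exit > "$1"
}

# smoke_test VMLINUX DUMP: run the commands non-interactively.
smoke_test() {
    cmds=$(mktemp) || return 1
    write_crash_cmds "$cmds"
    # -s suppresses the banner; -i runs the commands in the file at startup.
    crash -s -i "$cmds" "$1" "$2"
    status=$?
    rm -f "$cmds"
    return "$status"
}
```

A call such as smoke_test /usr/lib/debug/boot/<kernel-dbgsym> /var/crash/<dump-dir>/<dump-file> would print each command's output in turn. The exit status alone may not reveal a failing command, so the output still needs to be read.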
[Where Problems Could Occur]
* Crash and Makedumpfile are designed to be backwards-compatible, so the risk of regression when backporting a commit is low, though not zero. This is why it will be important to ensure that the proposed combination of Makedumpfile and crash does not break existing environments, e.g. the GA kernel.
* The matrix of hardware and kernel versions (including derivative /
cloud kernels) to test against is extensive. It's possible that the
commits identified to solve the known problems will not be
comprehensive. For example, cpu architectures and kernels not in the
test matrix may require additional commits to be backported.
[Other Info]
* Support/SEG are currently having conversations with the kernel team
about the potential to proactively SRU / MRE the latest upstream crash
version, and potentially Makedumpfile as well, alongside -hwe kernel
releases to avoid this sort of regression in the future. Though, we
understand this would require an SRUExceptionPolicy to be approved and
published.
[Investigation and summary of changes]
We have identified that on the Makedumpfile side at least two commits are needed:
[1] https://github.com/makedumpfile/makedumpfile/commit/985e575253f1c2de8d6876cfe685c68a24ee06e1
[2] https://github.com/makedumpfile/makedumpfile/commit/bad2a7c4fa75d37a41578441468584963028bdda
These are patches to compensate for a change in the kernel's mapping
of memory. Using the patched Makedumpfile helps, but it is not
sufficient. Including the patches in Makedumpfile (or using the tip of
upstream master), but opening with the currently distributed crash
results in the following errors:
eg. Patched Makedumpfile with crash 8.0.4 on Arm64:
---------------------------------------------------
...
WARNING: cannot determine starting stack frame for task ffffd574e21b4800
WARNING: cannot determine starting stack frame for task ffff07ff83296300
WARNING: cannot determine starting stack frame for task ffff07ff83293f80
WARNING: cannot determine starting stack frame for task ffff07ff83a04700
WARNING: cannot determine starting stack frame for task ffff08010507c400
KERNEL: /usr/lib/debug/boot/vmlinux-6.14.0-1008-nvidia-64k
DUMPFILE: /var/crash/patched_mdf/dump.202509191531 [PARTIAL DUMP]
CPUS: 128 [OFFLINE: 127]
DATE: Thu Jan 1 00:00:00 UTC 1970
UPTIME: 00:13:38
LOAD AVERAGE: 0.12, 0.16, 0.10
TASKS: 1573
NODENAME: penguru
RELEASE: 6.14.0-1008-nvidia-64k
VERSION: #8-Ubuntu SMP PREEMPT_DYNAMIC Sat Jul 26 02:43:53 UTC 2025
MACHINE: aarch64 (unknown Mhz)
MEMORY: 63.8 GB
PANIC: "Kernel panic - not syncing: sysrq triggered crash"
PID: 7886
COMMAND: "tee"
TASK: ffff08010507c400 [THREAD_INFO: ffff08010507c400]
CPU: 85
STATE: TASK_RUNNING (PANIC)
On Amd64
--------
Crash still fails to open.
Therefore, in addition to the above Makedumpfile commits, crash
requires some patching. With the above two commits to Makedumpfile I
did a bisect on crash on amd64 and arm64.
On the crash side, I have identified that [3] applied in isolation (cherry-picked) is sufficient on amd64:
[3] https://github.com/crash-utility/crash/commit/6752571d8d782d07537a258a1ec8919ebd1308ad
I have also found that cherry-picking [4] and [5] resolves the issue on arm64 hardware in testflinger (using the machine agent penguru)
[4] https://github.com/crash-utility/crash/commit/3879e9104826d5ae14a0824ec47ab60056a249a7
[5] https://github.com/crash-utility/crash/commit/968debd0d5979dd9ddca3af0766bad714dbd51e3
At this point, crash's commands such as mount, files, vm, etc. were
still broken. To resolve this, [6] and [7] are needed
[6] https://github.com/crash-utility/crash/commit/3d60d9d40457239683a5f20b01437db94f964fb8
[7] https://github.com/crash-utility/crash/commit/2795136a515446b798ebbfa257c97f0ca6ecb8ec
To SRU for Noble, crash must also work on Plucky, Questing, and Resolute. The current version of makedumpfile on all of those series was found to be sufficient and so no SRU for makedumpfile is required on those. However for crash:
* Plucky uses the 6.14 kernel, so no additional commits are needed - in fact due to the newer version available on Plucky, only [7] is needed.
* Questing uses the 6.17 kernel. No issues other than [7] were observed on arm64, but on amd64, an infinite loop while gdb loaded module symbols was observed. This is fixed in [8].
* Resolute will ship with a newer kernel than 6.17, but as of October 14th, 2025 is based on 6.17. Currently the package in Debian unstable, which will autosync to Resolute, does not contain the required fixes and so it will also require an upload with [7] and [8] unless superseded by an upstream (Debian) version bump.
[8] https://github.com/crash-utility/crash/commit/e44a9a9d808c83fb846060f65e5aaa9d30b6e2c4
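For anyone wanting to inspect the referenced commits locally, one possible way to fetch them as patch files is sketched below. This is not how the uploaded debdiffs were necessarily prepared; patch_url and fetch_patch are hypothetical helper names, and the sketch relies on GitHub serving a commit as a patch when ".patch" is appended to its URL.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical.

# patch_url COMMIT_URL: print the URL of the commit in patch form.
patch_url() {
    # Strip one trailing slash, if any, then append ".patch".
    printf '%s.patch\n' "${1%/}"
}

# fetch_patch COMMIT_URL DEST_DIR: download the patch, named by short hash.
fetch_patch() {
    url=${1%/}
    sha=${url##*/}
    short=$(printf '%s' "$sha" | cut -c1-12)
    curl -fsSL "$(patch_url "$url")" -o "$2/$short.patch"
}
```

For example, fetch_patch with the URL of [7] and a scratch directory would leave a 2795136a5154.patch file there for review.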
PPA with all of the packages built (except resolute):
https://launchpad.net/~bryanfraschetti/+archive/ubuntu/lp2125145
--------------------------------------------------------------
Original Description:
=====================
24.04 LTS,
Linux 6.14.0-29-generic #29~24.04.1-Ubuntu SMP PREEMPT_DYNAMIC Thu Aug 14 16:52:50 UTC 2 x86_64 x86_64 x86_64 GNU/Linux
Problem Description:
crash utility is crashing (error code 1) when attempting to analyze kernel crash dumps.
Setup kdump & generated kernel panic using “echo 1 >
/proc/sys/kernel/sysrq” but, crash cannot access it:
# crash /usr/lib/debug/boot/vmlinux-6.14.0-29-generic dump.202509161821
crash 8.0.4
Copyright (C) 2002-2022 Red Hat, Inc.
Copyright (C) 2004, 2005, 2006, 2010 IBM Corporation
Copyright (C) 1999-2006 Hewlett-Packard Co
Copyright (C) 2005, 2006, 2011, 2012 Fujitsu Limited
Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
Copyright (C) 2005, 2011, 2020-2022 NEC Corporation
Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
Copyright (C) 2015, 2021 VMware, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions. Enter "help copying" to see the conditions.
This program has absolutely no warranty. Enter "help warranty" for details.
GNU gdb (GDB) 10.2
Copyright (C) 2021 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
Type "show configuration" for configuration details.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
# echo $?
1
running as root user and file is readable fine:
$ :/var/crash/202509161821# ls -l
total 299144
-rw------- 1 root whoopsie 119627 Sep 16 18:21 dmesg.202509161821
-rw-r--r-- 1 root whoopsie 306200163 Sep 16 18:21 dump.202509161821
symbol file is there:
# ls -l /usr/lib/debug/boot/vmlinux-6.14.0-29-generic*
-rw-r--r-- 1 root root 450705920 Aug 14 18:02 /usr/lib/debug/boot/vmlinux-6.14.0-29-generic
tail of strace:
14:06:20.661240 rt_sigaction(SIGPIPE, {sa_handler=SIG_IGN, sa_mask=[], sa_flags=SA_RESTORER|SA_NODEFER, sa_restorer=0x7b0841845330}, NULL, 8) = 0 <0.000008>
14:06:20.661281 rt_sigaction(SIGINT, {sa_handler=0x5ec383cbceb0, sa_mask=[], sa_flags=SA_RESTORER|SA_NODEFER, sa_restorer=0x7b0841845330}, NULL, 8) = 0 <0.000008>
14:06:20.661322 rt_sigaction(SIGSEGV, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER|SA_NODEFER, sa_restorer=0x7b0841845330}, NULL, 8) = 0 <0.000008>
14:06:20.661360 write(1, "\n", 1
) = 1 <0.000119>
14:06:20.661579 lseek(3, 10312, SEEK_SET) = 10312 <0.000010>
14:06:20.661617 read(3, "OSRELEASE=6.14.0-29-generic\nBUIL"..., 3276) = 3276 <0.000011>
14:06:20.661748 unlink("/var/tmp/ramdump_elf_XXXXXX") = -1 ENOENT (No such file or directory) <0.002921>
14:06:20.664817 exit_group(1) = ?
14:06:20.690105 +++ exited with 1 +++
full crash strace https://filebin.net/custom-bin/crash.strace.1
ProblemType: Bug
DistroRelease: Ubuntu 24.04
Package: crash 8.0.4-1ubuntu2
ProcVersionSignature: Ubuntu 6.14.0-29.29~24.04.1-generic 6.14.8
Uname: Linux 6.14.0-29-generic x86_64
ApportVersion: 2.28.1-0ubuntu3.8
Architecture: amd64
CasperMD5CheckResult: pass
Date: Thu Sep 18 20:21:26 2025
InstallationDate: Installed on 2025-09-04 (14 days ago)
InstallationMedia: Ubuntu 24.04.2 LTS "Noble Numbat" - Release amd64 (20250215)
ProcEnviron:
LANG=en_US.UTF-8
PATH=(custom, no user)
SHELL=/bin/bash
TERM=xterm-256color
SourcePackage: crash
UpgradeStatus: No upgrade log present (probably fresh install)
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/crash/+bug/2125145/+subscriptions