PROBLEM: UBSAN enabled in 5.14 and 5.13.14 kernels leads to kernel crash
Andrea Righi
andrea.righi at canonical.com
Sun Sep 5 06:50:37 UTC 2021
On Sat, Sep 04, 2021 at 08:20:31PM +0000, Andrew Moes wrote:
> Hi,
>
> Problematic change: https://lists.ubuntu.com/archives/kernel-team/2021-August/123425.html
> Affected kernels: 5.14, 5.14.1 and 5.13.14
> Last working kernel: 5.13.13
>
> Problem statement:
>
> A server with NVDIMMs won't boot, throwing lots of misleading errors.
>
> I've spent a significant amount of time troubleshooting why our edge server won't boot with the 5.14 kernel. The boot produces various udev failures, ACPI errors and failures to communicate with IPMI before coming to a full stop. After disabling most of the kernel modules that were being loaded when the kernel became tainted, I finally isolated the problem to the "nfit" module, and had to pull the server off the rack and physically remove the NVDIMMs to make it boot again.
>
> While I was going through upstream commits and config option changes between 5.13 and 5.14, I decided to install 5.13.14, which surprisingly led to the same crash and allowed me to narrow the problem down to UBSAN being enabled. I rebuilt all of the problematic kernels with UBSAN off and that solved the issue immediately.
>
> Available workarounds:
>
> 1. Remove all Intel Optane Persistent Memory (PMEM, NVDIMM) modules.
> 2. Disable UBSAN.
>
> Proposed action:
>
> Disable UBSAN. As per: https://github.com/torvalds/linux/blob/master/lib/Kconfig.ubsan#L23 , having it enabled in this configuration may lead to undesired instability on systems that never had issues before.
Thanks for sharing this!
We've already disabled UBSAN in 5.13.0-16.16, because it was causing
other crashes / issues on specific test platforms. We're currently
stress-testing this new kernel and for now it's available in the
bootstrap ppa:
https://launchpad.net/~canonical-kernel-team/+archive/ubuntu/bootstrap
Commit that reverts UBSAN:
https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/impish/commit/?h=Ubuntu-5.13.0-16.16&id=ce335d474b76598cefdbab2cfae53a0c94a5e72c
We will also apply the same change to the 5.14 kernel.
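
To check whether a given kernel build has UBSAN enabled, something like
the following minimal sketch may help. It assumes the kernel config is
installed as /boot/config-<release> (the usual Ubuntu location), and the
function name ubsan_enabled is only illustrative, not part of any
existing tool:

```python
import os
import platform


def ubsan_enabled(config_text):
    """Return True if the kernel config text sets CONFIG_UBSAN=y."""
    for line in config_text.splitlines():
        # Disabled options appear as "# CONFIG_UBSAN is not set" comments,
        # so only an exact "CONFIG_UBSAN=y" line counts as enabled.
        if line.strip() == "CONFIG_UBSAN=y":
            return True
    return False


if __name__ == "__main__":
    # Assumption: the running kernel's config lives at
    # /boot/config-<release>, as on Ubuntu.
    path = "/boot/config-" + platform.release()
    if os.path.exists(path):
        with open(path) as f:
            print(path, "UBSAN enabled:", ubsan_enabled(f.read()))
    else:
        print("no kernel config found at", path)
```

Sub-options such as CONFIG_UBSAN_BOUNDS are deliberately not matched;
only the top-level CONFIG_UBSAN switch is checked.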
Thanks,
-Andrea