PROBLEM: UBSAN enabled in 5.14 and 5.13.14 kernels leads to kernel crash

Andrea Righi andrea.righi at canonical.com
Sun Sep 5 06:50:37 UTC 2021


On Sat, Sep 04, 2021 at 08:20:31PM +0000, Andrew Moes wrote:
> Hi,
> 
> Problematic change: https://lists.ubuntu.com/archives/kernel-team/2021-August/123425.html
> Affected kernels: 5.14, 5.14.1 and 5.13.14
> Last working kernel: 5.13.13
> 
> Problem statement:
> 
> A server with NVDIMMs won't boot and throws lots of misleading errors.
> 
> I've spent a significant amount of time troubleshooting why our edge server won't boot with the 5.14 kernel. The boot produces various udev failures, ACPI errors and failures to communicate with IPMI before coming to a full stop. After disabling most of the kernel modules that were being loaded when the kernel became tainted, I finally isolated the problem to the "nfit" module. I had to pull the server off the rack and physically remove the NVDIMMs to make it boot again.
> 
> While I was going through upstream commits and config option changes between 5.13 and 5.14, I decided to install 5.13.14. Surprisingly, it led to the same crash, which allowed me to narrow the problem down to UBSAN being enabled. I rebuilt all the problematic kernels with UBSAN off and that solved the issue immediately.
> 
> Available workarounds:
> 
> 
>   1.  Remove all Intel Optane persistent memory (PMEM, NVDIMM), or disable UBSAN.
> 
> Proposed action:
> 
> Disable UBSAN. As per lib/Kconfig.ubsan (https://github.com/torvalds/linux/blob/master/lib/Kconfig.ubsan#L23), having it enabled in this configuration may lead to undesired instability on systems that never had issues before.

Thanks for sharing this!

We've already disabled UBSAN in 5.13.0-16.16, because it was causing
other crashes / issues on specific test platforms. We're currently
stress-testing this new kernel and for now it's available in the
bootstrap ppa:
https://launchpad.net/~canonical-kernel-team/+archive/ubuntu/bootstrap

Commit that reverts UBSAN:
https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/impish/commit/?h=Ubuntu-5.13.0-16.16&id=ce335d474b76598cefdbab2cfae53a0c94a5e72c

We will also apply the same change to the 5.14 kernel.
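
To check whether a given kernel build still has UBSAN enabled, one option
is to look at its config file. This is only a minimal sketch: it assumes
the config is installed as /boot/config-<release>, as on Ubuntu, and the
helper name ubsan_enabled() is made up for illustration.

```python
import platform


def ubsan_enabled(config_text: str) -> bool:
    """Return True if CONFIG_UBSAN is set to 'y' in the given kernel config text."""
    for line in config_text.splitlines():
        # Disabled options appear as "# CONFIG_UBSAN is not set" and do not match;
        # sub-options such as CONFIG_UBSAN_BOUNDS=y do not match either.
        if line.strip() == "CONFIG_UBSAN=y":
            return True
    return False


if __name__ == "__main__":
    # Assumption: the running kernel's config lives at /boot/config-<release>.
    path = f"/boot/config-{platform.release()}"
    try:
        with open(path) as f:
            print(path, "UBSAN enabled:", ubsan_enabled(f.read()))
    except OSError as exc:
        print(f"cannot read {path}: {exc}")
```

If this reports UBSAN as enabled on a machine with NVDIMMs, the
workarounds listed above still apply until a fixed kernel is installed.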

Thanks,
-Andrea


