[SRU][N][O][PATCH 0/1] btrfs will WARN_ON() in btrfs_remove_qgroup() unnecessarily

Matthew Ruffell matthew.ruffell at canonical.com
Fri Jan 24 01:39:36 UTC 2025


BugLink: https://bugs.launchpad.net/bugs/2091719

[Impact]

The following commit for noble and oracular introduced two new WARN_ON() calls
in btrfs qgroup removals, and even though the author at the time believed they
would not be reachable, it turns out it can happen quite frequently in the
right conditions.

ubuntu-noble b2ad25ba539452f492805e5f7d94e80894aa860f
commit a776bf5f3c2300cfdf8a195663460b1793ac9847
Author: Qu Wenruo <wqu at suse.com>
Date: Fri Apr 19 14:29:32 2024 +0930
Subject: btrfs: slightly loosen the requirement for qgroup removal
Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a776bf5f3c2300cfdf8a195663460b1793ac9847

$ git describe --contains b2ad25ba539452f492805e5f7d94e80894aa860f
Ubuntu-6.8.0-50.51~143

This primarily affects the systemd CI that runs integration tests on merge:
https://github.com/systemd/systemd/actions/runs/12297539029/job/34318915884?pr=35589

Kernel panic - not syncing: kernel: panic_on_warn set ...
CPU: 0 PID: 1316 Comm: (sd-clean) Not tainted 6.8.0-50-generic #51-Ubuntu
Call Trace:
 <TASK>
 dump_stack_lvl+0x27/0xa0
 dump_stack+0x10/0x20
 panic+0x366/0x3c0
 ? btrfs_remove_qgroup+0x271/0x490 [btrfs]
 check_panic_on_warn+0x4f/0x60
 __warn+0x95/0x160
 ? btrfs_remove_qgroup+0x271/0x490 [btrfs]
 report_bug+0x17e/0x1b0
 handle_bug+0x51/0xa0
 exc_invalid_op+0x18/0x80
 asm_exc_invalid_op+0x1b/0x20
RIP: 0010:btrfs_remove_qgroup+0x271/0x490 [btrfs]
Code: c0 0f 85 27 fe ff ff 48 8b 43 b0 4c 39 f0 75 d5 4d 8d b5 e0 08 00 00 4c 89 f7 e8 8a 45 19 e2 48 83 7b 98 00 0f 84 52 01 00 00 <0f> 0b 49 8b 45 10 a8 10 74 42 41 f6 85 d0 08 00 00 0c 75 38 48 83
 ? btrfs_remove_qgroup+0x266/0x490 [btrfs]
 btrfs_ioctl+0x12b9/0x13a0 [btrfs]
 ? srso_alias_return_thunk+0x5/0xfbef5
 ? __seccomp_filter+0x368/0x570
 ? __fput+0x15e/0x2e0
 __x64_sys_ioctl+0xa3/0xf0
 x64_sys_call+0x12a3/0x25a0
 do_syscall_64+0x7f/0x180
 entry_SYSCALL_64_after_hwframe+0x78/0x80

[Fix]

The fix just landed in mainline as:

commit c0def46dec9c547679a25fe7552c4bcbec0b0dd2
Author: Qu Wenruo <wqu at suse.com>
Date:   Mon Nov 11 07:29:07 2024 +1030
Subject: btrfs: improve the warning and error message for btrfs_remove_qgroup()
Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c0def46dec9c547679a25fe7552c4bcbec0b0dd2

The commit places the WARN_ON behind CONFIG_BTRFS_DEBUG, which silences the
warning for most users. It is safe to do so, as noted by the Author, as
the user space tool managing the qgroups would rescan them, to fix the
inconsistent view.

This is needed for both noble and oracular.

[Testcase]

The upstream systemd CI tests can consistently reproduce the issue, so the test
and proposed kernels will be run against the systemd CI for verification.

There is a test kernel available in the following ppa:

https://launchpad.net/~mruffell/+archive/ubuntu/lp2091719-test

If you install it, the systemd CI will run to completion.

[Where problems could occur]

We are changing the WARN_ON() to occur only when CONFIG_BTRFS_DEBUG is enabled.
There is no other change in logic, so functionality should be the same as what
we have now.

If a regression were to occur, it would affect systems with btrfs filesystems
that are utilising subvolumes. It would not likely cause any data loss or disk
corruption, as userspace tools should be able to automatically fix up any
inconsistent views without user interaction.

[Other info]

Systemd upstream bisected the issue here:
https://github.com/systemd/systemd/pull/35567#issuecomment-2538160543

Qu Wenruo (1):
  btrfs: improve the warning and error message for btrfs_remove_qgroup()

 fs/btrfs/qgroup.c | 21 ++++++++++++++++-----
 1 file changed, 16 insertions(+), 5 deletions(-)

-- 
2.45.2




More information about the kernel-team mailing list