[SRU][B][PATCH 0/2] btrfs: Attempting to balance a nearly full filesystem with relocated root nodes fails

Wed Jun 23 00:44:31 UTC 2021

BugLink: https://bugs.launchpad.net/bugs/1933172

[Impact]

If you attempt to balance a btrfs filesystem that is nearly full, and that
filesystem has had many small, medium and large files created and deleted such
that the b-tree needed to be rotated, the balance fails due to insufficient
free space. When it fails, the kernel oopses and the btrfs filesystem hangs.

It doesn't appear to cause any filesystem corruption, and is reproducible every
time on affected filesystems.

The following oops is generated:

general protection fault: 0000 [#1] SMP PTI
CPU: 0 PID: 18440 Comm: btrfs Not tainted 4.15.0-136-generic #140-Ubuntu
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2 04/01/2014
RIP: 0010:btrfs_set_root_node+0x5/0x60 [btrfs]
RSP: 0018:ffffb3db890a79e0 EFLAGS: 00010282
RAX: ffff8d7f73861ad0 RBX: ffff8d7f78455708 RCX: ffff8d7f6d9a5390
RDX: ffff8d7f73861ad0 RSI: a023775cfc0348a3 RDI: ffff8d7f6d9a5028
RBP: ffffb3db890a7a78 R08: 0000000000000044 R09: 0000000000000228
R10: ffff8d7f6d9a5000 R11: 0000000000000010 R12: ffffb3db890a7a08
R13: ffff8d7f6d9a5000 R14: ffff8d7f6d9a5028 R15: ffff8d7f74560000
FS:  00007f48d84498c0(0000) GS:ffff8d7f7fc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fe4fbc1f000 CR3: 00000001799fc001 CR4: 0000000000160ef0
Call Trace:
 ? commit_fs_roots+0x130/0x1b0 [btrfs]
 ? btrfs_run_delayed_refs.part.70+0x80/0x190 [btrfs]
 btrfs_commit_transaction+0x42c/0x910 [btrfs]
 ? start_transaction+0x191/0x430 [btrfs]
 relocate_block_group+0x1e7/0x640 [btrfs]
 btrfs_relocate_block_group+0x18f/0x280 [btrfs]
 btrfs_relocate_chunk+0x38/0xd0 [btrfs]
 __btrfs_balance+0x972/0xcd0 [btrfs]
 ? insert_balance_item.isra.35+0x391/0x3c0 [btrfs]
 btrfs_balance+0x32c/0x5a0 [btrfs]
 btrfs_ioctl_balance+0x320/0x390 [btrfs]
 btrfs_ioctl+0x5a6/0x2490 [btrfs]
 ? lru_cache_add_active_or_unevictable+0x36/0xb0
 ? __handle_mm_fault+0x9fd/0x1290
 do_vfs_ioctl+0xa8/0x630
 ? btrfs_ioctl_get_supported_features+0x30/0x30 [btrfs]
 ? do_vfs_ioctl+0xa8/0x630
 ? __do_page_fault+0x2a1/0x4b0
 SyS_ioctl+0x79/0x90
 do_syscall_64+0x73/0x130
 entry_SYSCALL_64_after_hwframe+0x41/0xa6
RIP: 0033:0x7f48d7228317
RSP: 002b:00007ffd76d03e38 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f48d7228317
RDX: 00007ffd76d03ec8 RSI: 00000000c4009420 RDI: 0000000000000003
RBP: 00007ffd76d03ec8 R08: 0000000000000078 R09: 0000000000000000
R10: 0000562086e7f010 R11: 0000000000000246 R12: 0000000000000003
R13: 00007ffd76d057cb R14: 0000000000000002 R15: 0000000000000000
Code: 4d 85 e4 0f 84 56 fe ff ff 4d 89 04 24 41 c6 44 24 08 84 4d 89 4c 24 09 e9 42 fe ff ff 0f 0b e8 02 24 5e e0 66 90 0f 1f 44 00 00 <48> 8b 06 48 8b 0d c9 d4 99 e1 48 8b 15 d2 d4 99 e1 55 48 89 87 
RIP: btrfs_set_root_node+0x5/0x60 [btrfs] RSP: ffffb3db890a79e0

I don't see this behaviour on any upstream kernel, and the first kernel to show
this behaviour is 4.15.0-109-generic. The current 4.15.0-145-generic is still
affected.
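
As a rough aid for triage, the version range above can be checked with a small
shell helper. This is only a sketch: the function name is hypothetical, and the
upper bound of 145 is simply the newest kernel tested in this report, not a
known-fixed boundary.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a kernel release string falls in the range
# this report names as affected (4.15.0-109 through 4.15.0-145).
is_known_affected() {
    release=$1
    case "$release" in
        4.15.0-*) ;;
        *) return 1 ;;
    esac
    abi=${release#4.15.0-}   # strip the "4.15.0-" prefix
    abi=${abi%%-*}           # strip the flavour suffix, e.g. "-generic"
    case "$abi" in
        ''|*[!0-9]*) return 1 ;;   # reject anything that is not a number
    esac
    [ "$abi" -ge 109 ] && [ "$abi" -le 145 ]
}

if is_known_affected "$(uname -r)"; then
    echo "running kernel is in the range reported as affected"
else
    echo "running kernel is outside the range reported as affected"
fi
```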

I believe that this is a regression introduced in the fixing of CVE-2019-19036.

[Testcase]

I haven't been able to create a script that reliably places a btrfs filesystem
into the state necessary to reproduce this issue, so I have provided my qcow2
image containing a btrfs filesystem that reproduces the issue 100% of the time.

Download the image from here (warning: size is 8.0 GB):

https://people.canonical.com/~mruffell/sf311164/ubuntu18.04-server-2.qcow2

Create an Ubuntu 18.04 VM and attach the ubuntu18.04-server-2.qcow2 image as a
new virtio disk. Note that ubuntu18.04-server-2.qcow2 does not contain an
operating system; it is a data-only volume.

Mount the volume:

$ sudo mount /dev/vdb /mnt

Attempt to balance:

$ sudo btrfs filesystem balance start --full-balance /mnt
Segmentation fault (core dumped)

Check dmesg for kernel oops:
https://paste.ubuntu.com/p/wjJNqKBCfh/
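
If you would rather check for the oops mechanically than read through dmesg, a
filter along these lines could be used. It is a minimal sketch: the function
name is hypothetical, and it only matches the faulting RIP line from the oops
quoted in [Impact].

```shell
#!/bin/sh
# Hypothetical helper: read kernel log text on stdin and succeed only if it
# contains the faulting instruction pointer shown in the oops in [Impact].
has_set_root_node_oops() {
    grep -q 'RIP:.*btrfs_set_root_node'
}

# Example usage (dmesg may require root on some systems):
#   sudo dmesg | has_set_root_node_oops && echo "oops found"
```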

If you install the test kernel from the following PPA:

https://launchpad.net/~mruffell/+archive/ubuntu/sf311164-test

You should see this instead:

$ sudo btrfs filesystem balance start --full-balance /mnt
ERROR: error during balancing '/mnt': No space left on device
There may be more info in syslog - try dmesg | tail

Checking dmesg shows no kernel oops, and just info about the volume being too
full to balance:

https://paste.ubuntu.com/p/4J8Gq2dtz4/

[Fix]

The problem first appears in 4.15.0-109-generic; 4.15.0-108-generic and earlier
work fine, which means we introduced a regression between those two releases.

I bisected the problem down to the following commit:

ubuntu-bionic 6f536ce7a978531d38a21d092394616cefb54436
Author: Qu Wenruo <wqu at suse.com>
Date:   Tue May 19 10:13:20 2020 +0800
Subject: btrfs: reloc: fix reloc root leak and NULL pointer dereference
Link: https://paste.ubuntu.com/p/4qfWCM8ykh/

Unfortunately, I believe this is a bad backport. If you examine the original
upstream commit:

commit 51415b6c1b117e223bc083e30af675cb5c5498f3
Author: Qu Wenruo <wqu at suse.com>
Date:   Tue May 19 10:13:20 2020 +0800
Subject: btrfs: reloc: fix reloc root leak and NULL pointer dereference
Link: https://github.com/torvalds/linux/commit/51415b6c1b117e223bc083e30af675cb5c5498f3

You will see that the 4.15 backport has calls to free_extent_buffer() and
btrfs_put_fs_root(). btrfs_put_fs_root() was renamed to btrfs_put_root() in the
newer patches, and contains logic to free relocated roots, so I think we might
not need the calls to free_extent_buffer() to free the extents first, since
that might be handled later.

The core issue is that we hit a general protection fault when attempting to
access a root node, which means we have freed a root node we shouldn't have.

If we look at the backport in 5.4.y, i.e. the one in Focal:

ubuntu-focal ecaee3a76ea998bc2fe20f056eb27f9bc837d116
Author: Qu Wenruo <wqu at suse.com>
Date:   Tue May 19 10:13:20 2020 +0800
Subject: btrfs: reloc: fix reloc root leak and NULL pointer dereference
Link: https://paste.ubuntu.com/p/PZrMqVt8Yk/

It seems upstream -stable omitted the calls to btrfs_put_root() entirely, and
we don't need the calls to free_extent_buffer() because of it. 

If I revert 6f536ce7a978531d38a21d092394616cefb54436 from ubuntu-bionic, and
cherry-pick ecaee3a76ea998bc2fe20f056eb27f9bc837d116 from ubuntu-focal, and
build, the problem no longer reproduces.
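
The revert and cherry-pick described above could be scripted roughly as follows.
This is a sketch under assumptions: it must be run inside a clone of the
ubuntu-bionic kernel tree, and FOCAL_REMOTE is a hypothetical remote name that
must already point at an ubuntu-focal kernel tree. It defaults to a dry run that
only prints the commands.

```shell
#!/bin/sh
# Sketch only: run inside a clone of the ubuntu-bionic kernel tree.
# FOCAL_REMOTE is a hypothetical remote name that must already point at an
# ubuntu-focal kernel tree; it is not defined anywhere in this report.
BAD_BACKPORT=6f536ce7a978531d38a21d092394616cefb54436   # ubuntu-bionic
GOOD_BACKPORT=ecaee3a76ea998bc2fe20f056eb27f9bc837d116  # ubuntu-focal

# Print each command; execute it only when DRY_RUN is set to 0.
run() {
    echo "+ $*"
    [ "${DRY_RUN:-1}" = 0 ] || return 0
    "$@"
}

# Fetch the Focal tree, revert the Bionic backport, then pick the Focal one.
swap_backport() {
    run git fetch "${FOCAL_REMOTE:-focal}" &&
    run git revert --no-edit "$BAD_BACKPORT" &&
    run git cherry-pick -x "$GOOD_BACKPORT"
}

swap_backport
```

Setting DRY_RUN=0 in the environment makes the script actually run the git
commands; any conflicts would still need to be resolved by hand.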

[Where problems could occur]

If a regression were to occur, it would affect users of btrfs filesystems, and
would likely show during a routine balance operation. Since the issue is
triggered during the cancellation of a balance operation, problems might occur
for users with nearly full filesystems or filesystems that have existing
corruption.

We are replacing a patch that was backported during the fixing of
CVE-2019-19036 with a backport provided by upstream developers, which
cherry-picks from 5.4.y to Bionic. The patch in 5.4.y is well tested by the
community and is currently in the Focal kernel.

With all modifications to btrfs, there is a risk of data corruption and
filesystem corruption for all btrfs users, since balances happen automatically
and on a regular basis. If a regression does happen, users should remount
their filesystems with the "skip_balance" mount option, back up their data, and
attempt a repair if necessary.

[Other info]

A community member hit this issue before I did and reported it upstream to
linux-btrfs here, although no one knew what was happening at the time:

https://www.spinics.net/lists/linux-btrfs/msg103367.html

Matthew Ruffell (1):
  Revert "btrfs: reloc: fix reloc root leak and NULL pointer
    dereference"

Qu Wenruo (1):
  btrfs: reloc: fix reloc root leak and NULL pointer dereference

 fs/btrfs/relocation.c | 9 +--------
 1 file changed, 1 insertion(+), 8 deletions(-)

-- 
2.30.2