ACK: [B/F][PATCH 0/5] ext4/jbd2: data=journal: write-protect pages on transaction commit
Stefan Bader
stefan.bader at canonical.com
Mon Sep 20 07:22:47 UTC 2021
On 09.09.21 22:22, Mauricio Faria de Oliveira wrote:
> BugLink: https://bugs.launchpad.net/bugs/1847340
>
> [Impact]
>
> With mmap()ed files on ext4's data journaling it's possible to change
> a mapped page's buffers contents during their jbd2 transaction commit
> (as currently nothing prevents/blocks the write access at that time.)
>
> This might happen between the buffers checksum calculation and actual
> write to journal, so the (old) checksum is invalid for the (new) data.
>
> If the system crashes after that, but before such journal entry makes
> it to the filesystem, the journal replay on the next mount just fails,
> and the filesystem now requires fsck. (apparently curtin might set up
> /etc/fstab with passno=0, requiring manual intervention.)
>
> [39751.096455] EXT4-fs: Warning: mounting with data=journal disables delayed allocation and O_DIRECT support!
> [39751.114435] JBD2: Invalid checksum recovering block 87305 in log
> [39751.146133] JBD2: Invalid checksum recovering block 88039 in log
> [39751.195950] JBD2: Invalid checksum recovering block 49633 in log
> [39751.265158] JBD2: recovery failed
> [39751.265163] EXT4-fs (vdc): error loading journal
>
> [Fix]
>
> The fix is to write-protect the pages during journal transaction commit,
> so that writes to mapped pages hit a page fault, then ext4's page_mkwrite
> hook can block until the commit finishes and the buffers can be modified.
>
> In order to do that, add jbd2 journal callbacks that the filesystems can
> customize, called before/after the critical region in transaction commit,
> then have ext4 in data journaling mode to write-protect the pages whose
> buffers are being committed (and handle cases that need pages redirtied.)
>
> The changes are restricted to the data journaling mode and page_mkwrite
> hook, and other modes/paths use the same code/behavior in the callbacks.
>
> [Test Case]
>
> Set up an ext4 filesystem in data journaling mode, and run stress-ng's
> mmap file test on it, then crash the system after a bit; check whether
> the filesystem can mount again or not (i.e., with jbd2 checksum errors.)
>
> # mkfs.ext4 $DEV
> # mount -o data=journal $DEV $DIR
> # cd $DIR
> # stress-ng --mmap $((4*$(nproc))) --mmap-file &
> # sleep 60
> # echo c >/proc/sysrq-trigger
> ...
> # mount -o data=journal $DEV $DIR # PASS/FAIL.
> # dmesg | tail
>
> [Regression Potential]
>
> Regressions would likely manifest in ext4 data journaling mode (which
> is not the default mode, 'ordered') with memory mapped access, as the
> other modes/paths are largely unaffected by the changes/same behavior.
>
> This has been tested with (x)fstests, that showed no regressions on
> data=ordered and data=journal on both Bionic and Focal (with kernel
> versions 4.15.0-156-generic and 5.4.0-84-generic) w/in 10 runs each.
> And the stress-ng test-case as well. (Numbers/details in the LP bug.)
>
> [Other info]
>
> The patchset is applied on 5.10, so Hirsute (5.11) is already fixed;
> only Focal and Bionic need it.
>
> There are little changes in the patches between Focal and Bionic
> (mostly minor backport adjustments, mainly due to no vm_fault_t)
> but unfortunately that needs separate versions for most patches.
>
> Thanks!
>
Ok, modifying ext4 or its components generally is always scary. At least for F
all changes are cherry picks and required changes for B are small. Also there
has been some effort doing beforehand regression testing. That should limit the
risk.
Acked-by: Stefan Bader <stefan.bader at canonical.com>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: OpenPGP_signature
Type: application/pgp-signature
Size: 833 bytes
Desc: OpenPGP digital signature
URL: <https://lists.ubuntu.com/archives/kernel-team/attachments/20210920/f6fcd9c3/attachment.sig>
More information about the kernel-team
mailing list