ACK: [B/F][PATCH 0/5] ext4/jbd2: data=journal: write-protect pages on transaction commit

Stefan Bader stefan.bader at canonical.com
Mon Sep 20 07:22:47 UTC 2021


On 09.09.21 22:22, Mauricio Faria de Oliveira wrote:
> BugLink: https://bugs.launchpad.net/bugs/1847340
> 
> [Impact]
> 
> With mmap()ed files on ext4's data journaling it's possible to change
> a mapped page's buffers contents during their jbd2 transaction commit
> (as currently nothing prevents/blocks the write access at that time.)
> 
> This might happen between the buffers checksum calculation and actual
> write to journal, so the (old) checksum is invalid for the (new) data.
> 
> If the system crashes after that, but before such journal entry makes
> it to the filesystem, the journal replay on the next mount just fails,
> and the filesystem now requires fsck. (apparently curtin might set up
> /etc/fstab with passno=0, requiring manual intervention.)
> 
>      [39751.096455] EXT4-fs: Warning: mounting with data=journal disables delayed allocation and O_DIRECT support!
>      [39751.114435] JBD2: Invalid checksum recovering block 87305 in log
>      [39751.146133] JBD2: Invalid checksum recovering block 88039 in log
>      [39751.195950] JBD2: Invalid checksum recovering block 49633 in log
>      [39751.265158] JBD2: recovery failed
>      [39751.265163] EXT4-fs (vdc): error loading journal
> 
> [Fix]
> 
> The fix is to write-protect the pages during journal transaction commit,
> so that writes to mapped pages hit a page fault, then ext4's page_mkwrite
> hook can block until the commit finishes and the buffers can be modified.
> 
> In order to do that, add jbd2 journal callbacks that the filesystems can
> customize, called before/after the critical region in transaction commit,
> then have ext4 in data journaling mode to write-protect the pages whose
> buffers are being committed (and handle cases that need pages redirtied.)
> 
> The changes are restricted to the data journaling mode and page_mkwrite
> hook, and other modes/paths use the same code/behavior in the callbacks.
> 
> [Test Case]
> 
> Set up an ext4 filesystem in data journaling mode, and run stress-ng's
> mmap file test on it, then crash the system after a bit; check whether
> the filesystem can mount again or not (i.e., with jbd2 checksum errors.)
> 
>      # mkfs.ext4 $DEV
>      # mount -o data=journal $DEV $DIR
>      # cd $DIR
>      # stress-ng --mmap $((4*$(nproc))) --mmap-file &
>      # sleep 60
>      # echo c >/proc/sysrq-trigger
>      ...
>      # mount -o data=journal $DEV $DIR   # PASS/FAIL.
>      # dmesg | tail
> 
> [Regression Potential]
> 
> Regressions would likely manifest in ext4 data journaling mode (which
> is not the default mode, 'ordered') with memory mapped access, as the
> other modes/paths are largely unaffected by the changes/same behavior.
> 
> This has been tested with (x)fstests, that showed no regressions on
> data=ordered and data=journal on both Bionic and Focal (with kernel
> versions 4.15.0-156-generic and 5.4.0-84-generic) w/in 10 runs each.
> And the stress-ng test-case as well. (Numbers/details in the LP bug.)
> 
> [Other info]
> 
> The patchset is applied on 5.10, so Hirsute (5.11) is already fixed;
> only Focal and Bionic need it.
> 
> There are little changes in the patches between Focal and Bionic
> (mostly minor backport adjustments, mainly due to no vm_fault_t)
> but unfortunately that needs separate versions for most patches.
> 
> Thanks!
> 
Ok, modifying ext4 or its components generally is always scary. At least for F 
all changes are cherry picks and required changes for B are small. Also there 
has been some effort doing beforehand regression testing. That should limit the 
risk.

Acked-by: Stefan Bader <stefan.bader at canonical.com>

-------------- next part --------------
A non-text attachment was scrubbed...
Name: OpenPGP_signature
Type: application/pgp-signature
Size: 833 bytes
Desc: OpenPGP digital signature
URL: <https://lists.ubuntu.com/archives/kernel-team/attachments/20210920/f6fcd9c3/attachment.sig>


More information about the kernel-team mailing list