APPLIED[F]: [B/F][PATCH 0/5] ext4/jbd2: data=journal: write-protect pages on transaction commit
Kelsey Skunberg
kelsey.skunberg at canonical.com
Thu Sep 23 22:57:09 UTC 2021
Applied to Focal master-next. Thank you!
-Kelsey
On 2021-09-09 17:22:21, Mauricio Faria de Oliveira wrote:
> BugLink: https://bugs.launchpad.net/bugs/1847340
>
> [Impact]
>
> With mmap()ed files on ext4 in data journaling mode, it's possible to
> change the contents of a mapped page's buffers during their jbd2
> transaction commit (as currently nothing prevents/blocks write access
> at that time.)
>
> This might happen between the buffers' checksum calculation and the
> actual write to the journal, so the (old) checksum is invalid for the
> (new) data.
>
> If the system crashes after that, but before such journal entry makes
> it to the filesystem, the journal replay on the next mount just fails,
> and the filesystem now requires fsck. (apparently curtin might set up
> /etc/fstab with passno=0, requiring manual intervention.)
>
> [39751.096455] EXT4-fs: Warning: mounting with data=journal disables delayed allocation and O_DIRECT support!
> [39751.114435] JBD2: Invalid checksum recovering block 87305 in log
> [39751.146133] JBD2: Invalid checksum recovering block 88039 in log
> [39751.195950] JBD2: Invalid checksum recovering block 49633 in log
> [39751.265158] JBD2: recovery failed
> [39751.265163] EXT4-fs (vdc): error loading journal
>
> [Fix]
>
> The fix is to write-protect the pages during journal transaction commit,
> so that writes to mapped pages hit a page fault, then ext4's page_mkwrite
> hook can block until the commit finishes and the buffers can be modified.
>
> In order to do that, add jbd2 journal callbacks that the filesystems can
> customize, called before/after the critical region in transaction commit,
> then have ext4 in data journaling mode write-protect the pages whose
> buffers are being committed (and handle cases that need pages redirtied.)
>
> The changes are restricted to the data journaling mode and the
> page_mkwrite hook; other modes/paths keep the same code/behavior in
> the callbacks.
>
> [Test Case]
>
> Set up an ext4 filesystem in data journaling mode, and run stress-ng's
> mmap file test on it, then crash the system after a bit; check whether
> the filesystem can mount again or not (i.e., with jbd2 checksum errors.)
>
> # mkfs.ext4 $DEV
> # mount -o data=journal $DEV $DIR
> # cd $DIR
> # stress-ng --mmap $((4*$(nproc))) --mmap-file &
> # sleep 60
> # echo c >/proc/sysrq-trigger
> ...
> # mount -o data=journal $DEV $DIR # PASS/FAIL.
> # dmesg | tail
>
> [Regression Potential]
>
> Regressions would likely manifest in ext4 data journaling mode (which
> is not the default mode, 'ordered') with memory-mapped access, as the
> other modes/paths keep the same behavior and are largely unaffected.
>
> This has been tested with (x)fstests, which showed no regressions on
> data=ordered and data=journal on both Bionic and Focal (with kernel
> versions 4.15.0-156-generic and 5.4.0-84-generic) within 10 runs each,
> and with the stress-ng test case as well. (Numbers/details in the LP bug.)
>
> [Other info]
>
> The patchset is applied on 5.10, so Hirsute (5.11) is already fixed;
> only Focal and Bionic need it.
>
> There are only small differences in the patches between Focal and
> Bionic (mostly minor backport adjustments, mainly due to no vm_fault_t),
> but unfortunately they require separate versions for most patches.
>
> Thanks!
>
> --
> Mauricio
>
>
>
> Jan Kara (1):
>   ext4: fix mmap write protection for data=journal mode
>
> Mauricio Faria de Oliveira (4):
>   jbd2: introduce/export functions
>     jbd2_journal_submit|finish_inode_data_buffers()
>   jbd2, ext4, ocfs2: introduce/use journal callbacks
>     j_submit|finish_inode_data_buffers()
>   ext4: data=journal: fixes for ext4_page_mkwrite()
>   ext4: data=journal: write-protect pages on
>     j_submit_inode_data_buffers()
>
>  fs/ext4/inode.c      | 63 +++++++++++++++++++++++++++-----
>  fs/ext4/super.c      | 87 ++++++++++++++++++++++++++++++++++++++++++++
>  fs/jbd2/commit.c     | 62 ++++++++++++++++---------------
>  fs/jbd2/journal.c    |  2 +
>  fs/ocfs2/journal.c   |  4 ++
>  include/linux/jbd2.h | 29 ++++++++++++++-
>  6 files changed, 207 insertions(+), 40 deletions(-)
>
> --
> 2.30.2
>
>
> --
> kernel-team mailing list
> kernel-team at lists.ubuntu.com
> https://lists.ubuntu.com/mailman/listinfo/kernel-team
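
For reference, the PASS/FAIL check in the quoted test case could be
scripted roughly as below. This is only a sketch: check_jbd2_log is a
hypothetical helper name (not part of the patchset or of any existing
tool), and it simply looks for the jbd2/ext4 messages shown in the
quoted [Impact] log.

```shell
#!/bin/sh
# Hypothetical helper (an assumption, not from the patchset): read a
# kernel log on stdin and print FAIL if it contains any of the journal
# replay errors quoted in the [Impact] section, PASS otherwise.
check_jbd2_log() {
    if grep -q -e 'JBD2: Invalid checksum recovering block' \
               -e 'JBD2: recovery failed' \
               -e 'EXT4-fs (.*): error loading journal'; then
        echo FAIL
    else
        echo PASS
    fi
}

# Intended use, after the post-crash mount attempt from the test case:
#   dmesg | check_jbd2_log
```

The helper only matches message text, so it should be run on the log
from the boot in which the post-crash mount was attempted; older
messages still in the log would also be reported as FAIL.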