[Bug 2036467] Re: superblock checksum mismatch in resize2fs

Krister Johansen 2036467 at bugs.launchpad.net
Sat Oct 7 01:55:42 UTC 2023


Thanks for all the responses.  I'm not sure how quickly I'll be able to
get to this either, so I'm hesitant to commit to fixing it myself.  That
said, if I can find time to send patches before your team gets to it,
I'll do my best.

To answer the question about how frequently we see this: it was about
4-5 times a day until I applied the patches to our forked version of
e2fsprogs.

A few other things to note about what's going on here.  In 1.45.7,
e2fsprogs added some additional retries to the checksum validation path
on open:

https://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git/commit/?id=6338a8467564c3a0a12e9fcb08bdd748d736ac2f

I picked up this patch as well, and found that it helped a bit, but I
was still able to reproduce the problem with the reproducer that I
shared.
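
For illustration only: the e2fsprogs change retries inside the library's
open path, but the same idea can be sketched at the caller level.  The
helper name retry_cmd and its arguments are made up for this example, and
this is not what the upstream commit does.  As noted above, retrying only
narrows the window; it does not close it.

```bash
# Hypothetical helper: run a command up to $1 times, sleeping $2 seconds
# between failed attempts.  Returns 0 on the first success, 1 otherwise.
retry_cmd() {
    local attempts="$1" delay="$2" i=1
    shift 2
    while [ "$i" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        # Only sleep if another attempt is still to come.
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    return 1
}
```

A caller might use it as `retry_cmd 3 1 resize2fs /dev/nvme1n1p1`, with the
attempt count and delay picked arbitrarily here.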

My team is running on the linux-aws-5.15 HWE kernel that's from jammy
but shipped to focal.  There's a kernel fix that may help with this
problem too, and it has been present since 5.10.  That said, I haven't
tested this on systems that are running 5.4.  (We don't have very many
of these anymore.)

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit?id=05c2c00f3769abb9e323fcaca70d2de0b48af7ba
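
A rough way to see which side of that 5.10 line a host falls on is to
compare the release string from `uname -r`.  This is only a sketch: the
helper name kernel_at_least is made up, and since distro kernels carry
backports, the version number alone doesn't prove the fix is present or
absent.

```bash
# Hypothetical helper: succeed if kernel release $1 is at least version $2.
kernel_at_least() {
    local have="${1%%-*}"   # drop the distro suffix, e.g. "-1045-aws"
    local want="$2"
    # sort -V orders version strings; if the wanted version sorts first
    # (or the two are identical), the given kernel is new enough.
    [ "$(printf '%s\n%s\n' "$want" "$have" | sort -V | head -n 1)" = "$want" ]
}
```

For example: `kernel_at_least "$(uname -r)" 5.10 && echo "5.10 or newer"`.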

Commit 05c2c00f3769 ("ext4: protect superblock modifications with a
buffer lock") may help to ensure that the superblock contents on disk
are always consistent prior to the DIO read, since the direct I/O path
writes out any dirty cached sb pages before issuing the read.
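
To make the buffered vs. direct distinction concrete, here is a minimal
userspace sketch that reads the primary superblock with dd and prints its
magic number.  It is not the e2fsprogs patch, it only checks the magic
(not the checksum), and the helper name sb_magic is made up for this
example.

```bash
# Hypothetical helper: print the ext2/3/4 superblock magic (expected: ef53)
# read from a device or image file.  Any extra arguments are passed through
# to dd, e.g. iflag=direct to bypass the page cache.
sb_magic() {
    local dev="$1"
    shift
    # The primary superblock starts at byte 1024; s_magic is a little-endian
    # 16-bit field at offset 56 within it, i.e. bytes 1080-1081.  Read one
    # aligned 4 KiB block so that a direct read meets alignment requirements,
    # then pick the two bytes out of the buffer and swap them.
    dd if="$dev" bs=4096 count=1 "$@" 2>/dev/null |
        od -A n -t x1 -j 1080 -N 2 |
        tr -d ' \n' |
        sed 's/^\(..\)\(..\)$/\2\1/'
}
```

`sb_magic /dev/nvme1n1p1` does a buffered read, while
`sb_magic /dev/nvme1n1p1 iflag=direct` asks dd to open the device for
direct I/O.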

-- 
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to e2fsprogs in Ubuntu.
https://bugs.launchpad.net/bugs/2036467

Title:
  superblock checksum mismatch in resize2fs

Status in cloud-images:
  New
Status in e2fsprogs package in Ubuntu:
  Confirmed
Status in e2fsprogs source package in Focal:
  New
Status in e2fsprogs source package in Jammy:
  New
Status in e2fsprogs source package in Mantic:
  Confirmed

Bug description:
  Hi,
  We run ext4 on EBS volumes on EC2.  During provisioning, cloud-init will occasionally report that resize2fs has failed due to a superblock checksum mismatch.  We debugged this internally, and were able to come up with the following reproducer:

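     #!/usr/bin/bash
     set -euxo pipefail

     while true
     do
             # Create a GPT label and a small first partition
             # (sectors 2048 to 2099200).
             parted /dev/nvme1n1 mklabel gpt mkpart primary 2048s 2099200s
             # Brief pause before using the new partition.
             sleep .5
             mkfs.ext4 /dev/nvme1n1p1
             mount -t ext4 /dev/nvme1n1p1 /mnt
             # Generate filesystem load on the mount in the background.
             stress-ng --temp-path /mnt -D 4 &
             STRESS_PID=$!
             sleep 1
             # Grow partition 1, then resize the mounted filesystem.
             # resize2fs is the step that intermittently fails.
             growpart /dev/nvme1n1 1
             resize2fs /dev/nvme1n1p1
             # Stop the load, unmount, and wipe signatures so the next
             # iteration starts clean.
             kill $STRESS_PID
             wait $STRESS_PID
             umount /mnt
             wipefs -a /dev/nvme1n1p1
             wipefs -a /dev/nvme1n1
     done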

  (This was on a 60 GB gp3 volume attached to a c5.4xlarge.)

  We found a fix that works and got the patch accepted upstream.  The
  short explanation is that by switching the superblock read to direct
  I/O, we no longer see the problem.

  The patch is available here, but hasn't been published in a released
  version of e2fsprogs:

  https://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git/commit/?id=43a498e938887956f393b5e45ea6ac79cc5f4b84

  A longer thread with the maintainer is available here:

  https://lore.kernel.org/linux-ext4/20230609042239.GA1436857@mit.edu/

  This bug report is to request that Ubuntu backport this patch to the
  versions of e2fsprogs that are in releases that are available in
  images on AWS, preferably Focal and Jammy.

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-images/+bug/2036467/+subscriptions
