[Bug 1801555] Re: [FATAL] mdadm --grow adds dirty disk to RAID1 without recovery
xor
1801555 at bugs.launchpad.net
Mon Nov 5 17:50:33 UTC 2018
@caravena wrote:
> https://btrfs.wiki.kernel.org/index.php/Using_Btrfs_with_Multiple_Devices
This bug is NOT about btrfs!
The bug happens *below* btrfs, in the md RAID layer!
I merely used btrfs for its checksum capabilities so I could prove the corruption in the md layer.
Sorry if I didn't make that clear enough.
Besides: btrfs' own RAID is not a suitable replacement for the md RAID
layer because it does not support full disk encryption yet.
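For anyone who wants to reproduce the proof, the btrfs checksums can be checked roughly like this (a sketch, assuming the /boot and / filesystems from the layout in the report below are mounted; adjust mount points to your setup):
# Scrub re-reads everything and verifies it against the btrfs checksums (-B = run in foreground)
$ btrfs scrub start -B /boot
$ btrfs scrub start -B /
# Per-device error counters; non-zero corruption_errs means checksum failures were found
$ btrfs device stats /boot
$ btrfs device stats /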
https://bugs.launchpad.net/bugs/1801555
Title:
[FATAL] mdadm --grow adds dirty disk to RAID1 without recovery
Status in mdadm package in Ubuntu:
New
Bug description:
On Kubuntu 18.04.1, it is possible to cause a (non-bitmap!) RAID1 to consume an out-of-sync disk without recovery, as if it were in sync.
"$ cat /proc/mdstat" shows the dirty disk as "U" (= up) immediately after the addition, WITHOUT a resync.
I was able to reproduce this twice.
This means arbitrary filesystem corruption can happen, as the RAID1 now contains two mixed filesystem states.
RAID1 balances reads across all disks, so the state of either disk will be returned at random.
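For reference, md's own view of the array can be inspected after the bad --grow with something like this (a sketch; md0 as in the steps below):
# --detail should list both members as "active sync" and report no resync,
# even though their contents differ
$ mdadm --detail /dev/md0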
Steps to reproduce:
1. Install via the network installer and create the following partition layout manually (the md layer is also sketched below):
{sda1, sdb1} -> md RAID1 -> btrfs -> /boot
{sda2, sdb2} -> md RAID1 -> dm-crypt -> btrfs -> /
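For reference, the md part of that layout corresponds to roughly the following (a sketch; in my case the arrays were created by the installer's partitioner, not by hand):
# Two plain RAID1 arrays; dm-crypt and btrfs are stacked on top as shown above
$ mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
$ mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sda2 /dev/sdb2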
2. After the system is installed, ensure the RAID arrays have no write-intent bitmap.
I won't provide instructions for this, as my 16 GiB disks were apparently small
enough that no bitmap was created. Check "$ cat /proc/mdstat" to confirm there
is no bitmap; a way to check for and remove one is sketched below.
3. Boot with sdb physically disconnected. Boot will now hang at "Begin: Waiting for encrypted source device ...". That will time out after a few minutes and drop to an initramfs shell, complaining that the disk doesn't exist. This is a separate bug, filed at #1196693
To make it bootable again, do the following workaround in the initramfs shell:
$ mdadm --run /dev/md0
$ mdadm --run /dev/md1
# Reduce the size of the array to stop initramfs-tools from waiting for sdb forever.
$ mdadm --grow -n 1 --force /dev/md0
$ mdadm --grow -n 1 --force /dev/md1
$ reboot
After "$ reboot", boot up the system fully with sdb still disconnected.
Now the state of the two disks should be out of sync - booting surely produces at least one write.
Reboot and apply the same procedure to sdb, with sda disconnected.
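Before step 4 it is worth confirming that the two halves really have diverged; the member superblocks should show different event counts and update times (a sketch, adjust device names as needed):
# Each member records its own event count; out-of-sync disks disagree here
$ mdadm --examine /dev/sda1 /dev/sdb1 | grep -E 'Events|Update Time'
$ mdadm --examine /dev/sda2 /dev/sdb2 | grep -E 'Events|Update Time'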
4. Boot from one of the disks and do this:
$ mdadm /dev/md0 --add /dev/sdb1
$ mdadm /dev/md1 --add /dev/sdb2
# The sdb partitions should now be listed as (S), i.e. spare
$ cat /proc/mdstat
# Grow the array to use up the spares
$ mdadm /dev/md0 --grow -n 2
$ mdadm /dev/md1 --grow -n 2
# Now the bug shows: /proc/mdstat immediately reports the array as in sync:
$ cat /proc/mdstat
# And the kernel log will show that a recovery was started
# - BUT completed in less than a second:
$ dmesg
[144.255918] md: recovery of RAID array md0
[144.256176] md: md0: recovery done
[151.776281] md: recovery of RAID array md1
[151.776667] md: md1: recovery done
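To demonstrate that the two members now hold different data, md's own consistency check can be run at this point (a sketch; the sysfs paths are the standard md ones):
# Ask md to compare the mirror halves block by block
$ echo check > /sys/block/md0/md/sync_action
# When "cat /sys/block/md0/md/sync_action" shows "idle" again, read the result:
$ cat /sys/block/md0/md/mismatch_cnt
# A non-zero count means the two members disagree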
Note: I'm not sure whether this is a bug in mdadm or in the kernel.
I'm filing it as an mdadm bug for now; if you figure out that it is a
kernel bug, please re-assign.