[Bug 1568426] [NEW] md: detects stale members ahead of in-sync members

Peter Cordes peter at cordes.ca
Sun Apr 10 00:24:54 UTC 2016


Public bug reported:

My system boots from XFS on RAID10 on GPT partitions (no LVM).  The
RAID10 uses the "far2" layout, and has three component devices.  I use
grub-pc for non-EFI booting, because this system is old and doesn't
support EFI (Intel DG965WH from 2008).

I added a fourth hard drive and shuffled my data around so I could
re-partition the existing drives.
(http://unix.stackexchange.com/questions/74924/how-to-safely-replace-a-not-yet-failed-disk-in-a-linux-raid5-array)

A few weeks after the final `mdadm /dev/md0 --replace /dev/sda1 --with
/dev/sdd1`, grub failed to boot.  Error messages included "invalid
arch-independent ELF magic", and `insmod linux` gave "not a regular
file".  Booting an Ubuntu live USB showed no problem with the FS, and
none of `dpkg-reconfigure grub-pc`, `grub-install /dev/sda`, or
`update-grub` helped.  Before those attempts to fix it, grub was loading
a messed-up menu but not quite booting Linux.  After re-running
grub-install, it stopped at the `grub rescue>` prompt.

sda is the first BIOS disk, but even having my BIOS boot a different
disk didn't help.  Presumably that doesn't affect the order GRUB detects
them in.

I eventually solved the problem by swapping the SATA cables so that the
drive holding the stale (replaced and removed) member of the boot array
was no longer the first BIOS drive.  Now everything works perfectly.

I think GRUB's md code uses the first N members it finds, whether or not
they're stale.  Linux's md code finds all candidates, and then picks N
in-sync ones (those whose superblock event count is current) if
available.
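
The two selection strategies described above can be contrasted in a
small Python sketch.  It is not GRUB's or Linux's actual code: `Member`,
`first_seen` and `freshest` are hypothetical names, and the event counts
for sdb2 and sdc2 are assumed to equal the array's Events value (2740),
since only sda2 and sdd2 were examined.  The scan order is also an
assumption: it puts the stale member first, as it was when it sat on the
first BIOS disk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Member:
    device: str   # device name as seen after the cable swap
    role: int     # "Device Role : Active device N" from mdadm --examine
    events: int   # "Events : N" from mdadm --examine


def first_seen(members, raid_devices):
    """Suspected GRUB behaviour: fill each role with the first candidate scanned."""
    slots = {}
    for m in members:
        if m.role < raid_devices and m.role not in slots:
            slots[m.role] = m
    return slots


def freshest(members, raid_devices):
    """md-style behaviour (simplified): skip candidates older than the newest superblock."""
    newest = max(m.events for m in members)
    slots = {}
    for m in members:
        if m.events == newest and m.role < raid_devices:
            slots.setdefault(m.role, m)
    return slots


# Hypothetical scan order: the stale member's disk is probed first.
scan = [
    Member("sdd2", role=2, events=2708),  # stale; was sda2 before the swap
    Member("sda2", role=1, events=2740),
    Member("sdb2", role=0, events=2740),  # events assumed
    Member("sdc2", role=2, events=2740),  # events assumed
]

for name, pick in (("first_seen", first_seen), ("freshest", freshest)):
    chosen = pick(scan, raid_devices=3)
    print(name, [chosen[r].device for r in sorted(chosen)])
# first_seen ['sdb2', 'sda2', 'sdd2']
# freshest ['sdb2', 'sda2', 'sdc2']
```

Both strategies see the same four superblocks; only the event-count
check keeps the stale sdd2 out of role 2.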

This was really hard to diagnose, because there hadn't been enough disk
churn since the replace to push the data far enough out of sync to cause
XFS errors.  Directory listings of /boot/grub/i386-pc worked from the
grub rescue shell, but the actual data in some of the files didn't
match.  (Some inode contents differed too, hence the "not a regular
file" error.)
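
Why only some files came back wrong can be illustrated with the far=2
layout.  The sketch below assumes md's far layout as described in md(4):
the first copy of each chunk is striped RAID0-style across the first half
of every device, and the second copy sits in the second half, rotated to
the next device.  The function name `far2_copies` is hypothetical,
offsets are in chunks and ignore the data offset, and whether GRUB reads
the first copy of each chunk is not confirmed here.  Under those
assumptions, one chunk in three has its first copy on RaidDevice 2, the
role the stale member still claimed.

```python
def far2_copies(chunk, raid_disks, stride_chunks):
    """Return (device, chunk offset on that device) for both copies of a chunk.

    Sketch of an md RAID10 "far=2" layout (near copies = 1, far copies = 2).
    """
    dev = chunk % raid_disks   # first copy: RAID0-style striping, first half
    row = chunk // raid_disks  # which chunk row of that first half
    first = (dev, row)
    # Second copy: rotated to the next device, one stride further into the disk.
    second = ((dev + 1) % raid_disks, row + stride_chunks)
    return first, second


# Figures from the mdadm output in this report.
RAID_DISKS = 3
CHUNK_KIB = 1024                          # "Chunk Size : 1024K"
USED_DEV_KIB = 15351808                   # "Used Dev Size" (KiB)
STRIDE = USED_DEV_KIB // CHUNK_KIB // 2   # chunks in each half of a device

# Unique data = one half of every device, which should match "Array Size".
print(RAID_DISKS * STRIDE * CHUNK_KIB)    # 23027712

STALE_ROLE = 2                            # "Device Role : Active device 2"
SAMPLE = 3000
hits = sum(1 for c in range(SAMPLE) if far2_copies(c, RAID_DISKS, STRIDE)[0][0] == STALE_ROLE)
print(f"{hits} of {SAMPLE} chunks have their first copy on role {STALE_ROLE}")
```

The capacity check reproducing the reported Array Size (23027712 KiB) is
a consistency test on the assumed layout, not proof of how GRUB reads it.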

I think wiping the stale member's RAID superblock would have solved the
problem as well (`mdadm --zero-superblock /dev/sda2`, after making sure
from the live-USB environment that sda2 really was the stale device).
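
Making sure which device is stale before running `--zero-superblock`
comes down to comparing superblocks: a member of the same array (same
Array UUID) whose Events counter is lower than the newest one is out of
date.  A minimal Python sketch, assuming the `mdadm --examine` text has
already been captured for each candidate; the helper names
`parse_examine` and `stale_members` are hypothetical, and only the Array
UUID and Events fields are used.  The sample text is trimmed from the
--examine output in this report, so the device names are the post-swap
ones.

```python
import re


def parse_examine(text):
    """Pull 'Key : value' fields out of mdadm --examine output."""
    fields = {}
    for line in text.splitlines():
        # Lines such as "/dev/sdd2:" do not start with a letter and are skipped.
        m = re.match(r"\s*([A-Za-z][A-Za-z ]*?)\s*:\s*(.+?)\s*$", line)
        if m:
            fields[m.group(1)] = m.group(2)
    return fields


def stale_members(examined):
    """Return devices whose Events counter trails the newest member of the same array.

    `examined` maps a device path to its mdadm --examine text.
    """
    parsed = {dev: parse_examine(text) for dev, text in examined.items()}
    newest = {}
    for f in parsed.values():
        uuid, events = f["Array UUID"], int(f["Events"])
        newest[uuid] = max(newest.get(uuid, events), events)
    return sorted(dev for dev, f in parsed.items()
                  if int(f["Events"]) < newest[f["Array UUID"]])


# Trimmed copies of the two --examine outputs in this report.
SDD2 = """\
     Array UUID : e0ad8202:4c270099:9f28ddd6:b597231d
         Events : 2708
    Device Role : Active device 2
"""
SDA2 = """\
     Array UUID : e0ad8202:4c270099:9f28ddd6:b597231d
         Events : 2740
    Device Role : Active device 1
"""

print(stale_members({"/dev/sdd2": SDD2, "/dev/sda2": SDA2}))  # ['/dev/sdd2']
```

On a live system the text for each candidate could be captured with
`subprocess.run(["mdadm", "--examine", dev], capture_output=True,
text=True).stdout`; the event-count comparison is a simplified version of
the freshness check md applies when assembling.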

Here's `mdadm -E` output from the stale component (sda2 before swapping
cables, sdd2 now).  This is what a component looks like after `--replace`
and `--remove` are done with it.  After that comes an in-sync component
for comparison, then `mdadm --detail /dev/md/root`.

peter@tesla:~$ sudo mdadm --examine /dev/sdd2
/dev/sdd2:    ####### The stale component
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : e0ad8202:4c270099:9f28ddd6:b597231d
           Name : tesla:root  (local to host tesla)
  Creation Time : Thu Apr 16 14:26:50 2015          ### note that's 2015, last year.
     Raid Level : raid10
   Raid Devices : 3

 Avail Dev Size : 30703616 (14.64 GiB 15.72 GB)
     Array Size : 23027712 (21.96 GiB 23.58 GB)
    Data Offset : 16384 sectors
   Super Offset : 8 sectors
   Unused Space : before=16296 sectors, after=0 sectors
          State : clean
    Device UUID : 8ae879d7:b5c6b0ad:f2d6c787:49284d4b

    Update Time : Wed Mar 16 02:49:17 2016
  Bad Block Log : 512 entries available at offset 72 sectors
       Checksum : 1c62e134 - correct
         Events : 2708

         Layout : far=2
     Chunk Size : 1024K

   Device Role : Active device 2
   Array State : AAR ('A' == active, '.' == missing, 'R' == replacing)

/dev/sda2:           ##### An in-sync component
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : e0ad8202:4c270099:9f28ddd6:b597231d
           Name : tesla:root  (local to host tesla)
  Creation Time : Thu Apr 16 14:26:50 2015
     Raid Level : raid10
   Raid Devices : 3

 Avail Dev Size : 30703616 (14.64 GiB 15.72 GB)
     Array Size : 23027712 (21.96 GiB 23.58 GB)
    Data Offset : 16384 sectors
   Super Offset : 8 sectors
   Unused Space : before=16296 sectors, after=0 sectors
          State : clean
    Device UUID : 5d6bb778:1700264b:bd7aadba:11336f0b

    Update Time : Sat Apr  9 16:48:18 2016
  Bad Block Log : 512 entries available at offset 72 sectors
       Checksum : 4e39b4c0 - correct
         Events : 2740

         Layout : far=2
     Chunk Size : 1024K

   Device Role : Active device 1
   Array State : AAA ('A' == active, '.' == missing, 'R' == replacing)

peter@tesla:~$ sudo mdadm --detail /dev/md/root
/dev/md/root:
        Version : 1.2
  Creation Time : Thu Apr 16 14:26:50 2015
     Raid Level : raid10
     Array Size : 23027712 (21.96 GiB 23.58 GB)
  Used Dev Size : 15351808 (14.64 GiB 15.72 GB)
   Raid Devices : 3
  Total Devices : 3
    Persistence : Superblock is persistent

    Update Time : Sat Apr  9 21:19:32 2016
          State : clean
 Active Devices : 3
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 0

         Layout : far=2
     Chunk Size : 1024K

           Name : tesla:root  (local to host tesla)
           UUID : e0ad8202:4c270099:9f28ddd6:b597231d
         Events : 2740

    Number   Major   Minor   RaidDevice State
       3       8       18        0      active sync   /dev/sdb2
       4       8        2        1      active sync   /dev/sda2
       6       8       34        2      active sync   /dev/sdc2

ProblemType: Bug
DistroRelease: Ubuntu 15.10
Package: grub-pc 2.02~beta2-29ubuntu0.3
ProcVersionSignature: Ubuntu 4.2.0-35.40-generic 4.2.8-ckt5
Uname: Linux 4.2.0-35-generic x86_64
ApportVersion: 2.19.1-0ubuntu5
Architecture: amd64
CurrentDesktop: KDE
Date: Sat Apr  9 20:53:19 2016
SourcePackage: grub2
UpgradeStatus: Upgraded to wily on 2015-11-12 (149 days ago)

** Affects: grub2 (Ubuntu)
     Importance: Undecided
         Status: New


** Tags: amd64 apport-bug wily

** Description changed:

  I added a fourth hard drive and shuffled my data around so I could re-
  partition the existing drives.
+ (http://unix.stackexchange.com/questions/74924/how-to-safely-replace-a
+ -not-yet-failed-disk-in-a-linux-raid5-array)
  

-- 
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to grub2 in Ubuntu.
https://bugs.launchpad.net/bugs/1568426

Title:
  md: detects stale members ahead of in-sync members

Status in grub2 package in Ubuntu:
  New

