[Bug 945786] [NEW] mdadm refuses to re-add failed member

iMac 945786 at bugs.launchpad.net
Sat Mar 3 19:04:25 UTC 2012


Public bug reported:

I have my /home in a three-member RAID1 array (/dev/md1): one partition on
my laptop's internal disk, a second on an external disk connected via
eSATA, and a third on another external disk.  I booted with two members
missing (the external drives were not plugged in) and, before logging in,
used a console to umount /home, fail and remove the active member (the
internal partition), and stop the array; I then plugged in my external
disk, restarted /dev/md1 with the external partition as the active member,
and remounted /home.  This is a process I have executed many times before,
scripted in a couple of files in /usr/local/bin.
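
For reference, the swap-over procedure described above looks roughly like
the sketch below.  This is my reconstruction, not the actual contents of
the /usr/local/bin scripts; the device names and mount point are assumed
from the captures later in this report, and DRY_RUN (on by default here)
prints each step instead of executing it:

```shell
#!/bin/sh
# Sketch of the member swap-over: fail/remove the internal member, stop
# the degraded array, then restart it with the external member active.
# With DRY_RUN=1 (the default) each step is printed, not executed.
DRY_RUN=${DRY_RUN:-1}
MD=/dev/md1
INTERNAL=/dev/sda6      # internal laptop partition (assumed)
EXTERNAL=/dev/sdc3      # external eSATA partition (assumed)
MNT=/home

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"     # dry run: show the command
    else
        "$@"            # real run: execute it
    fi
}

run umount "$MNT"
run mdadm "$MD" --fail "$INTERNAL"       # mark the internal member faulty
run mdadm "$MD" --remove "$INTERNAL"     # detach it from the array
run mdadm --stop "$MD"                   # stop the degraded array
run mdadm --assemble "$MD" "$EXTERNAL"   # restart with the external member
run mount "$MD" "$MNT"                   # remount /home from the external disk
```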

However, this time, after logging in with the external member active, my
attempt to re-add the internal partition to bring /dev/md1 back in sync
with the external disk failed with an error.  I re-executed the remove,
fail, and re-add steps manually with the same outcome, shown in the
console capture below, and filed this bug.

It seems the failed disk thinks it is still active when I interrogate it
with -Q --examine.

:~# mdadm /dev/md1 -r /dev/sda6
mdadm: hot remove failed for /dev/sda6: No such device or address
:~# mdadm /dev/md1 -f /dev/sda6
mdadm: set device faulty failed for /dev/sda6:  No such device
:~# mdadm /dev/md1 -a /dev/sda6
mdadm: /dev/sda6 reports being an active member for /dev/md1, but a --re-add fails.
mdadm: not performing --add as that would convert /dev/sda6 in to a spare.
mdadm: To make this a spare, use "mdadm --zero-superblock /dev/sda6" first.
:~# mdstat
Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10]
md1 : active raid1 sdc3[0]
      86003840 blocks [3/1] [U__]

unused devices: <none>
:~# apport-bug mdadm

Here is a quick summary of what I did:
a) My disks were synced on an 11.10 system
b) I upgraded from 11.10 to 12.04 with one member failed (external)
c) After upgrade I failed the active disk (internal), stopped the array, and restarted it with the external disk
d) Attempted to re-add the failed internal disk after logging in

:~# blkid | grep raid_member
/dev/sda6: UUID="eeeb6708-d108-0847-57e9-714c01b7dbc8" TYPE="linux_raid_member"
/dev/sdc3: UUID="eeeb6708-d108-0847-57e9-714c01b7dbc8" TYPE="linux_raid_member"

:~# mdadm -D /dev/md1
/dev/md1:
        Version : 0.90
  Creation Time : Sun Jul 27 22:53:23 2008
     Raid Level : raid1
     Array Size : 86003840 (82.02 GiB 88.07 GB)
  Used Dev Size : 86003840 (82.02 GiB 88.07 GB)
   Raid Devices : 3
  Total Devices : 1
Preferred Minor : 1
    Persistence : Superblock is persistent

    Update Time : Sat Mar  3 13:56:05 2012
          State : clean, degraded
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0

           UUID : eeeb6708:d1080847:57e9714c:01b7dbc8
         Events : 0.10186827

    Number   Major   Minor   RaidDevice State
       0       8       35        0      active sync   /dev/sdc3
       1       0        0        1      removed
       2       0        0        2      removed
:~# mdadm -Q /dev/sdc3
/dev/sdc3: is not an md array
/dev/sdc3: device 0 in 3 device active raid1 /dev/md1.  Use mdadm --examine for more detail.

:~# mdadm -Q /dev/sda6
/dev/sda6: is not an md array
/dev/sda6: device 1 in 3 device mismatch raid1 /dev/md1.  Use mdadm --examine for more detail.

:~# mdadm -Q /dev/sda6 --examine
/dev/sda6:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : eeeb6708:d1080847:57e9714c:01b7dbc8
  Creation Time : Sun Jul 27 22:53:23 2008
     Raid Level : raid1
  Used Dev Size : 86003840 (82.02 GiB 88.07 GB)
     Array Size : 86003840 (82.02 GiB 88.07 GB)
   Raid Devices : 3
  Total Devices : 1
Preferred Minor : 1

    Update Time : Sat Mar  3 13:28:57 2012
          State : clean
 Active Devices : 1
Working Devices : 1
 Failed Devices : 1
  Spare Devices : 0
       Checksum : 60f50ddb - correct
         Events : 10128612

      Number   Major   Minor   RaidDevice State
this     1       8        6        1      active sync   /dev/sda6

   0     0       0        0        0      removed
   1     1       8        6        1      active sync   /dev/sda6
   2     2       0        0        2      faulty removed

Clearly /dev/sda6 is not active (per the -D output above, only device
0/8/35/0, i.e. /dev/sdc3, is active), but its own superblock thinks it is.
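
One more thing visible in the captures above: the array (-D) and the
failed member (--examine) report diverging event counters (0.90 metadata
prints the array's counter as "0.NNNN" in -D output).  The snippet below
just extracts and compares the two values as captured; the divergence is
an observation consistent with a refused --re-add, not a confirmed root
cause:

```shell
# Extract the Events value from a captured mdadm output line; 0.90
# metadata's -D output prefixes the counter with "0.", which we strip.
extract_events() {
    printf '%s\n' "$1" | awk -F' : ' '/Events/ {sub(/^0\./, "", $2); print $2}'
}

array_events=$(extract_events "         Events : 0.10186827")   # from mdadm -D /dev/md1
member_events=$(extract_events "         Events : 10128612")    # from mdadm --examine /dev/sda6
echo "array=$array_events member=$member_events behind=$((array_events - member_events))"
```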

I have captured enough; time to reboot and see what happens, hopefully an
auto-rebuild.  I am keeping the third disk of the array separate in case
some corruption happens here.

ProblemType: Bug
DistroRelease: Ubuntu 12.04
Package: mdadm 3.2.3-2ubuntu1
ProcVersionSignature: Ubuntu 3.2.0-17.27-generic 3.2.6
Uname: Linux 3.2.0-17-generic x86_64
NonfreeKernelModules: fglrx
ApportVersion: 1.94-0ubuntu1
Architecture: amd64
Date: Sat Mar  3 13:33:11 2012
MDadmExamine.dev.sda:
 /dev/sda:
    MBR Magic : aa55
 Partition[0] :    121660182 sectors at           63 (type 07)
 Partition[1] :    503477100 sectors at    121660245 (type 05)
MDadmExamine.dev.sda2:
 /dev/sda2:
    MBR Magic : aa55
 Partition[0] :     78124032 sectors at           63 (type 83)
 Partition[1] :    172007893 sectors at     78124095 (type 05)
MDadmExamine.dev.sda5: Error: command ['/sbin/mdadm', '-E', '/dev/sda5'] failed with exit code 1: mdadm: No md superblock detected on /dev/sda5.
MDadmExamine.dev.sda7: Error: command ['/sbin/mdadm', '-E', '/dev/sda7'] failed with exit code 1: mdadm: No md superblock detected on /dev/sda7.
MDadmExamine.dev.sdb: Error: command ['/sbin/mdadm', '-E', '/dev/sdb'] failed with exit code 1: mdadm: cannot open /dev/sdb: No medium found
MDadmExamine.dev.sdc:
 /dev/sdc:
    MBR Magic : aa55
 Partition[0] :    104438502 sectors at           63 (type 83)
 Partition[1] :     20498940 sectors at    104438565 (type 0b)
 Partition[2] :    172007893 sectors at    124937505 (type fd)
MDadmExamine.dev.sdc1: Error: command ['/sbin/mdadm', '-E', '/dev/sdc1'] failed with exit code 1: mdadm: No md superblock detected on /dev/sdc1.
MDadmExamine.dev.sdc2:
 /dev/sdc2:
    MBR Magic : aa55
MachineType: Hewlett-Packard HP Pavilion dv5 Notebook PC
ProcEnviron:
 LANGUAGE=en
 TERM=xterm
 LANG=en_US.utf8
 SHELL=/bin/bash
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-3.2.0-17-generic root=UUID=10f8a2ac-5ab7-43a2-bdf8-92eee349e09d ro quiet splash vt.handoff=7
ProcMDstat:
 Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10]
 md1 : active raid1 sdc3[0]
       86003840 blocks [3/1] [U__]

 unused devices: <none>
SourcePackage: mdadm
UpgradeStatus: Upgraded to precise on 2012-03-03 (0 days ago)
dmi.bios.date: 08/19/2009
dmi.bios.vendor: Hewlett-Packard
dmi.bios.version: F.37
dmi.board.asset.tag: Base Board Asset Tag
dmi.board.name: 30F2
dmi.board.vendor: Quanta
dmi.board.version: 98.36
dmi.chassis.type: 10
dmi.chassis.vendor: Quanta
dmi.chassis.version: N/A
dmi.modalias: dmi:bvnHewlett-Packard:bvrF.37:bd08/19/2009:svnHewlett-Packard:pnHPPaviliondv5NotebookPC:pvrRev1:rvnQuanta:rn30F2:rvr98.36:cvnQuanta:ct10:cvrN/A:
dmi.product.name: HP Pavilion dv5 Notebook PC
dmi.product.version: Rev 1
dmi.sys.vendor: Hewlett-Packard
mtime.conffile..etc.udev.rules.d.85.mdadm.rules: 2009-01-02T11:08:01

** Affects: mdadm (Ubuntu)
     Importance: Undecided
         Status: New


** Tags: amd64 apport-bug precise

-- 
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to mdadm in Ubuntu.
https://bugs.launchpad.net/bugs/945786





More information about the foundations-bugs mailing list