[Bug 945786] Re: mdadm refuses to re-add failed member

iMac 945786 at bugs.launchpad.net
Sat Mar 3 20:30:56 UTC 2012


Thanks, the --zero-superblock on the device I want to re-add worked.  I
think I understand what happened, possibly as a result of some mdadm
improvements.  A verbose explanation follows.

The complexity I did not share is that my three-disk RAID1 array is
actually spread across two laptops, each of which can sync to and from
an eSATA drive.

In my case, both laptop disks had previously been operating as active
disks in an array at the same time (both technically degraded, one with
two members, the other with only one).  The external drive had been in
sync with my second laptop, running 11.10.  My first laptop was last
synced a few days ago and has since been upgraded to 12.04B1 while
operating with one member.
 
As soon as I deemed 12.04 functionally great, I shut them both down and
tried to do my usual pre-login swap of active disks on the first (12.04)
laptop to my external drive, where my current home directory was
residing.  Normally this works by just unmounting, failing, removing and
stopping the devices, then re-assembling and re-mounting with the
external member.

Now, I believe mdadm is smart enough to know that both disks came from
active/clean (albeit possibly degraded) md arrays, and so it chooses not
to let me just re-add one to another as I see fit.  Previously, the most
recent would become the active member if I booted with both attached,
and a re-sync would start immediately, or I could start with one member
and switch to another.
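
One way to see which member is the most recent is to compare the Events
counters printed by mdadm -D and mdadm --examine.  The following is only
a sketch, not anything mdadm ships: md_events is a hypothetical helper
name, and it assumes that the "0.10186827" form printed by -D for 0.90
metadata is a high.low pair of 32-bit words, while --examine prints the
same counter as a single number.

```shell
#!/bin/sh
# md_events (hypothetical helper): read `mdadm -D` or `mdadm --examine`
# text on stdin and print the event counter as one plain integer.
md_events() {
    awk -F' : ' '
        /^ *Events : / {
            # Assumption: "hi.lo" means two 32-bit halves of one counter.
            n = split($2, part, ".")
            if (n == 2)
                value = part[1] * 4294967296 + part[2]
            else
                value = $2 + 0
            printf "%.0f\n", value
            exit
        }'
}

# Intended use (needs root and a real array):
#   array_events=$(mdadm -D /dev/md1 | md_events)
#   member_events=$(mdadm --examine /dev/sda6 | md_events)
#   [ "$member_events" -lt "$array_events" ] && echo "member is behind"
```

Fed the output quoted in the bug description, this gives 10186827 for
/dev/md1 and 10128612 for /dev/sda6, so the internal partition is the
older of the two.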

Now mdadm refuses when I try to add one previously active member to
another while they are out of sync, until I clear the metadata on the
old active member.  This is an improvement; I will just change my
process in this situation based on these assumptions.
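
The changed process could be sketched roughly as below.  The function
name readd_stale_member and the DRY_RUN variable are made up for this
illustration; the two mdadm invocations are the ones from the error
message and my console session.  Note that --zero-superblock discards
the md metadata on that member, and the rebuild then overwrites whatever
data it held, so it should only ever be pointed at the out-of-date disk.

```shell
#!/bin/sh
# readd_stale_member (hypothetical helper): turn an out-of-date former
# member back into a fresh spare of the running array.
# With DRY_RUN=1 (the default here) the commands are only printed.
readd_stale_member() {
    array=$1     # running array, e.g. /dev/md1
    member=$2    # stale member, e.g. /dev/sda6
    run=""
    [ "${DRY_RUN:-1}" = "1" ] && run="echo"

    # Wipe the old superblock so the device stops claiming to be active.
    $run mdadm --zero-superblock "$member" || return 1
    # Add it as a spare; the degraded array should then rebuild onto it.
    $run mdadm "$array" --add "$member"
}
```

Calling "readd_stale_member /dev/md1 /dev/sda6" prints the two commands
for review; setting DRY_RUN=0 as root would actually run them.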

-- 
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to mdadm in Ubuntu.
https://bugs.launchpad.net/bugs/945786

Title:
  mdadm refuses to re-add failed member

Status in “mdadm” package in Ubuntu:
  New

Bug description:
  I have my /home in a three-disk RAID1 configuration (/dev/md1) with
  one member on a partition on my laptop and a second on an external
  disk connected via eSATA; a third sits on a third external disk.  I
  booted up with two members degraded (external drive not plugged in)
  and, prior to login, used a console to umount, remove and fail the
  active drive (internal partition member) and stop the RAID1 device.
  I then plugged in my external drive, restarted the /dev/md1 device
  with the external partition member active, and remounted /home.  The
  process is one I have executed many times before and is scripted in a
  couple of files in /usr/local/bin.

  However, this time, after executing the process above and logging in
  with my external member active, I attempted to re-add the internal
  drive to bring the /dev/md1 device in sync with the external disk and
  received an error saying the add failed.  I re-executed the remove,
  fail and re-add manually with the same outcome, as shown on my
  console below, and filed this bug.

  It seems the failed disk thinks it is still active, when I use -Q
  --examine to interrogate it.

  :~# mdadm /dev/md1 -r /dev/sda6
  mdadm: hot remove failed for /dev/sda6: No such device or address
  :~# mdadm /dev/md1 -f /dev/sda6
  mdadm: set device faulty failed for /dev/sda6:  No such device
  :~# mdadm /dev/md1 -a /dev/sda6
  mdadm: /dev/sda6 reports being an active member for /dev/md1, but a --re-add fails.
  mdadm: not performing --add as that would convert /dev/sda6 in to a spare.
  mdadm: To make this a spare, use "mdadm --zero-superblock /dev/sda6" first.
  :~# mdstat
  Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10]
  md1 : active raid1 sdc3[0]
        86003840 blocks [3/1] [U__]

  unused devices: <none>
  :~# apport-bug mdadm

  Here is a quick summary of what I did:
  a) My disks were synced on an 11.10 system
  b) I upgraded from 11.10 to 12.04 with one member failed (external)
  c) After upgrade I failed the active disk (internal), stopped the array, and restarted it with the external disk
  d) Attempted to re-add the failed internal disk after logging in

  :~# blkid | grep raid_member
  /dev/sda6: UUID="eeeb6708-d108-0847-57e9-714c01b7dbc8" TYPE="linux_raid_member"
  /dev/sdc3: UUID="eeeb6708-d108-0847-57e9-714c01b7dbc8" TYPE="linux_raid_member"

  :~# mdadm -D /dev/md1
  /dev/md1:
          Version : 0.90
    Creation Time : Sun Jul 27 22:53:23 2008
       Raid Level : raid1
       Array Size : 86003840 (82.02 GiB 88.07 GB)
    Used Dev Size : 86003840 (82.02 GiB 88.07 GB)
     Raid Devices : 3
    Total Devices : 1
  Preferred Minor : 1
      Persistence : Superblock is persistent

      Update Time : Sat Mar  3 13:56:05 2012
            State : clean, degraded
   Active Devices : 1
  Working Devices : 1
   Failed Devices : 0
    Spare Devices : 0

             UUID : eeeb6708:d1080847:57e9714c:01b7dbc8
           Events : 0.10186827

      Number   Major   Minor   RaidDevice State
         0       8       35        0      active sync   /dev/sdc3
         1       0        0        1      removed
         2       0        0        2      removed
  :~# mdadm -Q /dev/sdc3
  /dev/sdc3: is not an md array
  /dev/sdc3: device 0 in 3 device active raid1 /dev/md1.  Use mdadm --examine for more detail.

  :~# mdadm -Q /dev/sda6
  /dev/sda6: is not an md array
  /dev/sda6: device 1 in 3 device mismatch raid1 /dev/md1.  Use mdadm --examine for more detail.

  :~# mdadm -Q /dev/sda6 --examine
  /dev/sda6:
            Magic : a92b4efc
          Version : 0.90.00
             UUID : eeeb6708:d1080847:57e9714c:01b7dbc8
    Creation Time : Sun Jul 27 22:53:23 2008
       Raid Level : raid1
    Used Dev Size : 86003840 (82.02 GiB 88.07 GB)
       Array Size : 86003840 (82.02 GiB 88.07 GB)
     Raid Devices : 3
    Total Devices : 1
  Preferred Minor : 1

      Update Time : Sat Mar  3 13:28:57 2012
            State : clean
   Active Devices : 1
  Working Devices : 1
   Failed Devices : 1
    Spare Devices : 0
         Checksum : 60f50ddb - correct
           Events : 10128612

        Number   Major   Minor   RaidDevice State
  this     1       8        6        1      active sync   /dev/sda6

     0     0       0        0        0      removed
     1     1       8        6        1      active sync   /dev/sda6
     2     2       0        0        2      faulty removed

  Clearly it is not active (the only active device is 0,8,35,0 per the
  -D output above), but it thinks it is.

  I have captured enough; time to reboot and see what happens, hopefully
  an auto-rebuild.  I am keeping the third disk in the array separate in
  case some corruption happens here.

  ProblemType: Bug
  DistroRelease: Ubuntu 12.04
  Package: mdadm 3.2.3-2ubuntu1
  ProcVersionSignature: Ubuntu 3.2.0-17.27-generic 3.2.6
  Uname: Linux 3.2.0-17-generic x86_64
  NonfreeKernelModules: fglrx
  ApportVersion: 1.94-0ubuntu1
  Architecture: amd64
  Date: Sat Mar  3 13:33:11 2012
  MDadmExamine.dev.sda:
   /dev/sda:
      MBR Magic : aa55
   Partition[0] :    121660182 sectors at           63 (type 07)
   Partition[1] :    503477100 sectors at    121660245 (type 05)
  MDadmExamine.dev.sda2:
   /dev/sda2:
      MBR Magic : aa55
   Partition[0] :     78124032 sectors at           63 (type 83)
   Partition[1] :    172007893 sectors at     78124095 (type 05)
  MDadmExamine.dev.sda5: Error: command ['/sbin/mdadm', '-E', '/dev/sda5'] failed with exit code 1: mdadm: No md superblock detected on /dev/sda5.
  MDadmExamine.dev.sda7: Error: command ['/sbin/mdadm', '-E', '/dev/sda7'] failed with exit code 1: mdadm: No md superblock detected on /dev/sda7.
  MDadmExamine.dev.sdb: Error: command ['/sbin/mdadm', '-E', '/dev/sdb'] failed with exit code 1: mdadm: cannot open /dev/sdb: No medium found
  MDadmExamine.dev.sdc:
   /dev/sdc:
      MBR Magic : aa55
   Partition[0] :    104438502 sectors at           63 (type 83)
   Partition[1] :     20498940 sectors at    104438565 (type 0b)
   Partition[2] :    172007893 sectors at    124937505 (type fd)
  MDadmExamine.dev.sdc1: Error: command ['/sbin/mdadm', '-E', '/dev/sdc1'] failed with exit code 1: mdadm: No md superblock detected on /dev/sdc1.
  MDadmExamine.dev.sdc2:
   /dev/sdc2:
      MBR Magic : aa55
  MachineType: Hewlett-Packard HP Pavilion dv5 Notebook PC
  ProcEnviron:
   LANGUAGE=en
   TERM=xterm
   LANG=en_US.utf8
   SHELL=/bin/bash
  ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-3.2.0-17-generic root=UUID=10f8a2ac-5ab7-43a2-bdf8-92eee349e09d ro quiet splash vt.handoff=7
  ProcMDstat:
   Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10]
   md1 : active raid1 sdc3[0]
         86003840 blocks [3/1] [U__]

   unused devices: <none>
  SourcePackage: mdadm
  UpgradeStatus: Upgraded to precise on 2012-03-03 (0 days ago)
  dmi.bios.date: 08/19/2009
  dmi.bios.vendor: Hewlett-Packard
  dmi.bios.version: F.37
  dmi.board.asset.tag: Base Board Asset Tag
  dmi.board.name: 30F2
  dmi.board.vendor: Quanta
  dmi.board.version: 98.36
  dmi.chassis.type: 10
  dmi.chassis.vendor: Quanta
  dmi.chassis.version: N/A
  dmi.modalias: dmi:bvnHewlett-Packard:bvrF.37:bd08/19/2009:svnHewlett-Packard:pnHPPaviliondv5NotebookPC:pvrRev1:rvnQuanta:rn30F2:rvr98.36:cvnQuanta:ct10:cvrN/A:
  dmi.product.name: HP Pavilion dv5 Notebook PC
  dmi.product.version: Rev 1
  dmi.sys.vendor: Hewlett-Packard
  mtime.conffile..etc.udev.rules.d.85.mdadm.rules: 2009-01-02T11:08:01
