[Bug 945786] Re: mdadm refuses to re-add failed member
iMac
945786 at bugs.launchpad.net
Sat Mar 3 20:30:56 UTC 2012
Thanks, the --zero-superblock on the device I want to re-add worked. I
think I understand what happened, possibly as a result of some mdadm
improvements. A verbose explanation follows.
The complexity I did not share is that my three-disk RAID1 array is
actually shared between two laptops, each of which can sync to/from an
eSATA drive.
In my case, both laptop disks had previously been operating as active disks in an array at the same time (both technically degraded, one with two members, the other with only one member). The external drive had been in sync with my second laptop, running 11.10. My first laptop was last synced a few days ago and has since been upgraded to 12.04B1 while operating with one member.
As soon as I deemed 12.04 functionally great, I shut them both down and tried to do my usual pre-login swap of active disks on the first (12.04) laptop to my external drive, where my current home directory was residing. Normally this works by just un-mounting/failing/removing/stopping the devices and re-assembling/re-mounting with the external member.
Now, I believe mdadm is smart enough to know that both disks came from
active/clean (albeit possibly degraded) md disks, and so it chooses not
to let me just re-add one to another as I see fit. Previously, the most
recent would become the active one if I booted with both attached, and a
re-sync would start immediately, OR I could start with one member and
switch to another.
Now mdadm blocks if I try to add one previously active member to
another when they are out of sync, waiting for me to clear the metadata
on the old active member. This is an improvement; I will just change my
process in this situation based on these assumptions.
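For the record, the changed process amounts to two commands. Here is a minimal sketch in Python that only builds the command lines rather than running them. The helper name readd_commands is made up for illustration; the --zero-superblock and --add options are the ones named in mdadm's own error message in the bug description. Zeroing the superblock discards the member's md metadata, so the device is added back as a spare and rebuilt from the active member:

```python
import shlex

def readd_commands(array, stale_member):
    """Return the mdadm command lines for re-adding a stale member.

    Nothing is executed here; the caller decides whether to run them.
    """
    return [
        # Wipe the old md metadata so mdadm no longer sees an "active" member.
        ["mdadm", "--zero-superblock", stale_member],
        # Add the now-blank device; md rebuilds it from the active member.
        ["mdadm", array, "--add", stale_member],
    ]

if __name__ == "__main__":
    # Print the commands for review instead of running them.
    for cmd in readd_commands("/dev/md1", "/dev/sda6"):
        print(" ".join(shlex.quote(part) for part in cmd))
```

Printing first and running by hand seems the safer habit here, since zeroing the superblock on the wrong member would throw away the newer copy.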
--
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to mdadm in Ubuntu.
https://bugs.launchpad.net/bugs/945786
Title:
mdadm refuses to re-add failed member
Status in “mdadm” package in Ubuntu:
New
Bug description:
I have my /home in a three-disk RAID1 configuration (/dev/md1) with a
partition on my laptop and a second on an external disk connected via
eSATA; a third sits on a third external disk. I booted up with two
members degraded (external drive not plugged in) and prior to login,
proceeded to use a console to umount, remove and fail the active
drive (internal partition member) and stop the RAID1 disk, and then
plug in my external, re-starting the /dev/md1 device with the external
partition member active and remounting /home. The process is one I
have executed many times before and is scripted from a couple of files
in /usr/local/bin.
However, this time, after executing the process above and logging in
with my external member active, I attempted to re-add the internal
drive to bring the /dev/md1 device in sync with the external disk, and
I received an error suggesting the add failed. I re-executed the
remove, fail, re-add manually with the same outcome, as shown on my
console below, and filed this bug.
It seems the failed disk thinks it is still active, when I use -Q
--examine to interrogate it.
:~# mdadm /dev/md1 -r /dev/sda6
mdadm: hot remove failed for /dev/sda6: No such device or address
:~# mdadm /dev/md1 -f /dev/sda6
mdadm: set device faulty failed for /dev/sda6: No such device
:~# mdadm /dev/md1 -a /dev/sda6
mdadm: /dev/sda6 reports being an active member for /dev/md1, but a --re-add fails.
mdadm: not performing --add as that would convert /dev/sda6 in to a spare.
mdadm: To make this a spare, use "mdadm --zero-superblock /dev/sda6" first.
:~# mdstat
Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10]
md1 : active raid1 sdc3[0]
86003840 blocks [3/1] [U__]
unused devices: <none>
:~# apport-bug mdadm
Here is a quick summary of what I did:
a) My disks were synced on an 11.10 system
b) I upgraded from 11.10 to 12.04 with one member failed (external)
c) After upgrade I failed the active disk (internal), stopped the array, and restarted it with the external disk
d) Attempted to re-add the failed internal disk after logging in
:~# blkid | grep raid_member
/dev/sda6: UUID="eeeb6708-d108-0847-57e9-714c01b7dbc8" TYPE="linux_raid_member"
/dev/sdc3: UUID="eeeb6708-d108-0847-57e9-714c01b7dbc8" TYPE="linux_raid_member"
:~# mdadm -D /dev/md1
/dev/md1:
Version : 0.90
Creation Time : Sun Jul 27 22:53:23 2008
Raid Level : raid1
Array Size : 86003840 (82.02 GiB 88.07 GB)
Used Dev Size : 86003840 (82.02 GiB 88.07 GB)
Raid Devices : 3
Total Devices : 1
Preferred Minor : 1
Persistence : Superblock is persistent
Update Time : Sat Mar 3 13:56:05 2012
State : clean, degraded
Active Devices : 1
Working Devices : 1
Failed Devices : 0
Spare Devices : 0
UUID : eeeb6708:d1080847:57e9714c:01b7dbc8
Events : 0.10186827
Number Major Minor RaidDevice State
0 8 35 0 active sync /dev/sdc3
1 0 0 1 removed
2 0 0 2 removed
:~# mdadm -Q /dev/sdc3
/dev/sdc3: is not an md array
/dev/sdc3: device 0 in 3 device active raid1 /dev/md1. Use mdadm --examine for more detail.
:~# mdadm -Q /dev/sda6
/dev/sda6: is not an md array
/dev/sda6: device 1 in 3 device mismatch raid1 /dev/md1. Use mdadm --examine for more detail.
:~# mdadm -Q /dev/sda6 --examine
/dev/sda6:
Magic : a92b4efc
Version : 0.90.00
UUID : eeeb6708:d1080847:57e9714c:01b7dbc8
Creation Time : Sun Jul 27 22:53:23 2008
Raid Level : raid1
Used Dev Size : 86003840 (82.02 GiB 88.07 GB)
Array Size : 86003840 (82.02 GiB 88.07 GB)
Raid Devices : 3
Total Devices : 1
Preferred Minor : 1
Update Time : Sat Mar 3 13:28:57 2012
State : clean
Active Devices : 1
Working Devices : 1
Failed Devices : 1
Spare Devices : 0
Checksum : 60f50ddb - correct
Events : 10128612
Number Major Minor RaidDevice State
this 1 8 6 1 active sync /dev/sda6
0 0 0 0 0 removed
1 1 8 6 1 active sync /dev/sda6
2 2 0 0 2 faulty removed
Clearly it is not active (0,8,35,0 is the active device per the -D
output above), but it thinks it is.
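The mismatch also shows up when comparing the Events fields of the two outputs above. Here is a rough sketch in Python, assuming the 0.90-format -D output prints the event counter as high.low 32-bit halves while --examine prints it as one number. That reading of the two formats is my assumption, and parse_events and is_stale are hypothetical helper names:

```python
import re

def parse_events(text):
    """Extract the event counter from mdadm -D or --examine output.

    Assumption: a value like "0.10186827" is "<high 32 bits>.<low 32 bits>".
    """
    match = re.search(r"^\s*Events\s*:\s*(\d+)(?:\.(\d+))?\s*$", text, re.MULTILINE)
    if match is None:
        raise ValueError("no Events line found")
    first, second = match.group(1), match.group(2)
    if second is None:
        # Single-number form, as printed by --examine above.
        return int(first)
    # Two-part form: combine the high and low halves.
    return (int(first) << 32) + int(second)

def is_stale(member_text, array_text):
    """True when the member's event counter is behind the running array's."""
    return parse_events(member_text) < parse_events(array_text)

if __name__ == "__main__":
    array_detail = "Events : 0.10186827"    # from mdadm -D /dev/md1
    member_examine = "Events : 10128612"    # from mdadm --examine /dev/sda6
    print(is_stale(member_examine, array_detail))
```

Under that assumption the internal member (10128612) is behind the running array (10186827), which matches the Update Time fields: 13:28:57 on /dev/sda6 versus 13:56:05 on /dev/md1.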
Captured enough; time to reboot and see what happens, hopefully an
auto-rebuild. I have kept the third disk in the array separate in case
some corruption happens here.
ProblemType: Bug
DistroRelease: Ubuntu 12.04
Package: mdadm 3.2.3-2ubuntu1
ProcVersionSignature: Ubuntu 3.2.0-17.27-generic 3.2.6
Uname: Linux 3.2.0-17-generic x86_64
NonfreeKernelModules: fglrx
ApportVersion: 1.94-0ubuntu1
Architecture: amd64
Date: Sat Mar 3 13:33:11 2012
MDadmExamine.dev.sda:
/dev/sda:
MBR Magic : aa55
Partition[0] : 121660182 sectors at 63 (type 07)
Partition[1] : 503477100 sectors at 121660245 (type 05)
MDadmExamine.dev.sda2:
/dev/sda2:
MBR Magic : aa55
Partition[0] : 78124032 sectors at 63 (type 83)
Partition[1] : 172007893 sectors at 78124095 (type 05)
MDadmExamine.dev.sda5: Error: command ['/sbin/mdadm', '-E', '/dev/sda5'] failed with exit code 1: mdadm: No md superblock detected on /dev/sda5.
MDadmExamine.dev.sda7: Error: command ['/sbin/mdadm', '-E', '/dev/sda7'] failed with exit code 1: mdadm: No md superblock detected on /dev/sda7.
MDadmExamine.dev.sdb: Error: command ['/sbin/mdadm', '-E', '/dev/sdb'] failed with exit code 1: mdadm: cannot open /dev/sdb: No medium found
MDadmExamine.dev.sdc:
/dev/sdc:
MBR Magic : aa55
Partition[0] : 104438502 sectors at 63 (type 83)
Partition[1] : 20498940 sectors at 104438565 (type 0b)
Partition[2] : 172007893 sectors at 124937505 (type fd)
MDadmExamine.dev.sdc1: Error: command ['/sbin/mdadm', '-E', '/dev/sdc1'] failed with exit code 1: mdadm: No md superblock detected on /dev/sdc1.
MDadmExamine.dev.sdc2:
/dev/sdc2:
MBR Magic : aa55
MachineType: Hewlett-Packard HP Pavilion dv5 Notebook PC
ProcEnviron:
LANGUAGE=en
TERM=xterm
LANG=en_US.utf8
SHELL=/bin/bash
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-3.2.0-17-generic root=UUID=10f8a2ac-5ab7-43a2-bdf8-92eee349e09d ro quiet splash vt.handoff=7
ProcMDstat:
Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10]
md1 : active raid1 sdc3[0]
86003840 blocks [3/1] [U__]
unused devices: <none>
SourcePackage: mdadm
UpgradeStatus: Upgraded to precise on 2012-03-03 (0 days ago)
dmi.bios.date: 08/19/2009
dmi.bios.vendor: Hewlett-Packard
dmi.bios.version: F.37
dmi.board.asset.tag: Base Board Asset Tag
dmi.board.name: 30F2
dmi.board.vendor: Quanta
dmi.board.version: 98.36
dmi.chassis.type: 10
dmi.chassis.vendor: Quanta
dmi.chassis.version: N/A
dmi.modalias: dmi:bvnHewlett-Packard:bvrF.37:bd08/19/2009:svnHewlett-Packard:pnHPPaviliondv5NotebookPC:pvrRev1:rvnQuanta:rn30F2:rvr98.36:cvnQuanta:ct10:cvrN/A:
dmi.product.name: HP Pavilion dv5 Notebook PC
dmi.product.version: Rev 1
dmi.sys.vendor: Hewlett-Packard
mtime.conffile..etc.udev.rules.d.85.mdadm.rules: 2009-01-02T11:08:01
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/mdadm/+bug/945786/+subscriptions