[Bug 1014848] Re: Systems sometimes come up with RAID devices dropped, re-adding fails.

Dmitrijs Ledkovs 1014848 at bugs.launchpad.net
Mon Jun 18 22:32:36 UTC 2012


(launchpad email interface is failing me)

Thank you for filing a bug report. I was not subscribed to Launchpad
Answers; I will subscribe now.

First things first. There are a couple of issues you are outlining
here, and I am going to mark this bug as invalid on the basis that it is
a support-type question and not a real bug report. In any case, if
some of these are real bugs, they will need to be split into separate
bug reports.

Please take some time to read https://wiki.ubuntu.com/ReliableRaid
I am working on improving the situation.

Here we go.

On 18/06/12 22:28, Iordan Iordanov wrote:
> Public bug reported:
>
> We have 3 systems with RAID1 sets which sometimes come up with one
> device missing. Attempting to re-add the device fails with:
>

This is probably due to:
https://bugs.launchpad.net/ubuntu/+source/mdadm/+bug/942106

Does adding settle help?

> # mdadm /dev/md2 --re-add /dev/sda6
> mdadm: --re-add for /dev/sda6 to /dev/md2 is not possible
>
> # mdadm /dev/md2 --add /dev/sda6
> mdadm: /dev/sda6 reports being an active member for /dev/md2, but a --re-add fails.
> mdadm: not performing --add as that would convert /dev/sda6 in to a spare.
> mdadm: To make this a spare, use "mdadm --zero-superblock /dev/sda6" first.
>

This seems odd. Did md2 get brought up degraded? If so, I would think
a re-add should be possible. I will add this to my testing suite, which
is yet to be written.
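
One way to answer the "degraded?" question is to look at the member
status field in /proc/mdstat, where "[UU]" means all members are up and
an underscore (e.g. "[_U]") marks a missing one. The following is only a
minimal sketch; the function name degraded_arrays is made up for
illustration, and it assumes the usual /proc/mdstat layout shown in the
bug description.

```shell
#!/bin/sh
# Print the name of every md array whose member status shows a missing
# device. Reads mdstat-formatted text from the file given as $1, or from
# /proc/mdstat when no argument is given.
# NOTE: degraded_arrays is a hypothetical helper, not part of mdadm.
degraded_arrays() {
    awk '
        # An array header looks like: "md0 : active raid1 sda9[2] sdb1[0]"
        $1 ~ /^md/ && $2 == ":" { name = $1 }
        # The status field looks like "[UU]" or "[_U]"
        name != "" && match($0, /\[[U_]+\]/) {
            status = substr($0, RSTART, RLENGTH)
            if (index(status, "_") > 0) print name
            name = ""
        }
    ' "${1:-/proc/mdstat}"
}
```

Running degraded_arrays with no arguments on the affected machine right
after boot should list md2 if it really was started with a member
missing.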

> The only thing which "resolves" the problem is zeroing the superblock on
> the drive, and adding it to the RAID array. This causes an unnecessary
> rebuild of the drive, as it shouldn't have been dropped from the array
> in the first place.
>

Hmmm... I think I saw a bug or two about being required to zero out the
superblock. Can you please search the bugs on Launchpad?


> A similar, perhaps related problem occurred on a RAID6 system with 6
> devices. We failed/removed one device, and moved it to another slot on
> the system (probably irrelevant). Trying to add it back into the RAID6
> array resulted in the same error. This has happened with the newest
> kernel (3.2.0-25) as well as with 3.2.0-24 which was used to create this
> bug-report. It has also happened with different RAID levels, which is
> why I filed the bug against mdadm, rather than the kernel.
>

Are the disks regular SATA-attached, or something special like iSCSI?
I bet the superblock metadata was still old across the device and there
may have been a dangling link somewhere.

I will dig through the collected debug info and maybe respond with a
few more details.
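
To check whether the superblock on the dropped device is stale, it may
help to compare the "Events" counter that "mdadm --examine" prints for
each member. As far as I understand, a member whose counter lags behind
the rest of the array is a likely candidate for a refused --re-add,
though that is an assumption to verify rather than a diagnosis. The
helper name md_events below is hypothetical, and the sketch assumes the
output contains a line of the form "Events : N".

```shell
#!/bin/sh
# Extract the value of the "Events" line from `mdadm --examine` output
# read on stdin.
# NOTE: md_events is a hypothetical helper, not part of mdadm.
md_events() {
    awk -F: '
        /^[ \t]*Events[ \t]*:/ {
            gsub(/[ \t]/, "", $2)   # strip padding around the value
            print $2
            exit
        }
    '
}
```

For example, as root, "mdadm --examine /dev/sda6 | md_events" and the
same for a member that is still active in md2 would give two numbers to
compare.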

> A question with more details was posted by a colleague of mine here:
> https://answers.launchpad.net/ubuntu/+source/mdadm/+question/199471
>

I will look into that.

> A question was posted by me in the linux-raid mailing list here:
> http://marc.info/?t=133953601900006&r=1&w=2
>

I will look, but in general I currently do not have the capacity to
follow the linux-raid mailing list.


Regards,
Dmitrijs.

-- 
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to mdadm in Ubuntu.
https://bugs.launchpad.net/bugs/1014848

Title:
  Systems sometimes come up with RAID devices dropped, re-adding fails.

Status in “mdadm” package in Ubuntu:
  Invalid

Bug description:
  We have 3 systems with RAID1 sets which sometimes come up with one
  device missing. Attempting to re-add the device fails with:

  # mdadm /dev/md2 --re-add /dev/sda6
  mdadm: --re-add for /dev/sda6 to /dev/md2 is not possible

  # mdadm /dev/md2 --add /dev/sda6
  mdadm: /dev/sda6 reports being an active member for /dev/md2, but a --re-add fails.
  mdadm: not performing --add as that would convert /dev/sda6 in to a spare.
  mdadm: To make this a spare, use "mdadm --zero-superblock /dev/sda6" first.

  The only thing which "resolves" the problem is zeroing the superblock
  on the drive, and adding it to the RAID array. This causes an
  unnecessary rebuild of the drive, as it shouldn't have been dropped
  from the array in the first place.

  A similar, perhaps related problem occurred on a RAID6 system with 6
  devices. We failed/removed one device, and moved it to another slot on
  the system (probably irrelevant). Trying to add it back into the RAID6
  array resulted in the same error. This has happened with the newest
  kernel (3.2.0-25) as well as with 3.2.0-24 which was used to create
  this bug-report. It has also happened with different RAID levels,
  which is why I filed the bug against mdadm, rather than the kernel.

  A question with more details was posted by a colleague of mine here:
  https://answers.launchpad.net/ubuntu/+source/mdadm/+question/199471

  A question was posted by me in the linux-raid mailing list here:
  http://marc.info/?t=133953601900006&r=1&w=2

  ProblemType: Bug
  DistroRelease: Ubuntu 12.04
  Package: mdadm 3.2.3-2ubuntu1
  ProcVersionSignature: Ubuntu 3.2.0-24.39-generic-pae 3.2.16
  Uname: Linux 3.2.0-24-generic-pae i686
  ApportVersion: 2.0.1-0ubuntu8
  Architecture: i386
  Date: Mon Jun 18 17:12:16 2012
  Lsusb:
   Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
   Bus 002 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
   Bus 003 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
   Bus 004 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
   Bus 001 Device 002: ID 04b4:6560 Cypress Semiconductor Corp. CY7C65640 USB-2.0 "TetraHub"
  MDadmExamine.dev.sda: Error: command ['/sbin/mdadm', '-E', '/dev/sda'] failed with exit code 1: mdadm: cannot open /dev/sda: Permission denied
  MDadmExamine.dev.sda1: Error: command ['/sbin/mdadm', '-E', '/dev/sda1'] failed with exit code 1: mdadm: cannot open /dev/sda1: Permission denied
  MDadmExamine.dev.sda2: Error: command ['/sbin/mdadm', '-E', '/dev/sda2'] failed with exit code 1: mdadm: cannot open /dev/sda2: Permission denied
  MDadmExamine.dev.sda5: Error: command ['/sbin/mdadm', '-E', '/dev/sda5'] failed with exit code 1: mdadm: cannot open /dev/sda5: Permission denied
  MDadmExamine.dev.sda6: Error: command ['/sbin/mdadm', '-E', '/dev/sda6'] failed with exit code 1: mdadm: cannot open /dev/sda6: Permission denied
  MDadmExamine.dev.sda7: Error: command ['/sbin/mdadm', '-E', '/dev/sda7'] failed with exit code 1: mdadm: cannot open /dev/sda7: Permission denied
  MDadmExamine.dev.sda8: Error: command ['/sbin/mdadm', '-E', '/dev/sda8'] failed with exit code 1: mdadm: cannot open /dev/sda8: Permission denied
  MDadmExamine.dev.sda9: Error: command ['/sbin/mdadm', '-E', '/dev/sda9'] failed with exit code 1: mdadm: cannot open /dev/sda9: Permission denied
  MDadmExamine.dev.sdb: Error: command ['/sbin/mdadm', '-E', '/dev/sdb'] failed with exit code 1: mdadm: cannot open /dev/sdb: Permission denied
  MDadmExamine.dev.sdb1: Error: command ['/sbin/mdadm', '-E', '/dev/sdb1'] failed with exit code 1: mdadm: cannot open /dev/sdb1: Permission denied
  MachineType: Dell Computer Corporation PowerEdge 860
  ProcEnviron:
   TERM=xterm
   PATH=(custom, no user)
   SHELL=/bin/bash
  ProcKernelCmdLine: root=/dev/sda1 ro
  ProcMDstat:
   Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] 
   md0 : active raid1 sda9[2] sdb1[0]
         409598840 blocks super 1.2 [2/2] [UU]
         
   unused devices: <none>
  SourcePackage: mdadm
  UpgradeStatus: No upgrade log present (probably fresh install)
  dmi.bios.date: 04/04/2007
  dmi.bios.vendor: Dell Computer Corporation
  dmi.bios.version: A03
  dmi.board.name: 0XM089
  dmi.board.vendor: Dell Computer Corporation
  dmi.board.version: A00
  dmi.chassis.type: 23
  dmi.chassis.vendor: Dell Computer Corporation
  dmi.modalias: dmi:bvnDellComputerCorporation:bvrA03:bd04/04/2007:svnDellComputerCorporation:pnPowerEdge860:pvr:rvnDellComputerCorporation:rn0XM089:rvrA00:cvnDellComputerCorporation:ct23:cvr:
  dmi.product.name: PowerEdge 860
  dmi.sys.vendor: Dell Computer Corporation
  etc.blkid.tab:
   <device DEVNO="0x0801" TIME="1339993953.873769" UUID="153fb914-13c1-4ab2-9368-be196bbe589a" TYPE="ext4">/dev/sda1</device>
   <device DEVNO="0x0806" TIME="1339993967.110129" UUID="af709399-931c-4652-9565-5f53ccf8b5ce" TYPE="ext4">/dev/sda6</device>
   <device DEVNO="0x0900" TIME="1339994011.818058" UUID="e00278de-892f-4477-bf35-385af3725b28" TYPE="ext4">/dev/md0</device>

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/mdadm/+bug/1014848/+subscriptions



