[Bug 16139] New: Software RAID Boot fails if array active, but out of sync

Fri Sep 23 14:06:48 UTC 2005

Please do not reply to this email.  You can add comments at
http://bugzilla.ubuntu.com/show_bug.cgi?id=16139
Ubuntu | linux

           Summary: Software RAID Boot fails if array active, but out of
                    sync
           Product: Ubuntu
           Version: unspecified
          Platform: amd64
        OS/Version: Linux
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: linux
        AssignedTo: ben.collins at ubuntu.com
        ReportedBy: finley at anl.gov
         QAContact: kernel-bugs at lists.ubuntu.com

md: syncing RAID array md1
md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/dis
md: using maximum available idle IO bandwith (but not more than 2000000 KB/sec) for reconstruction
md: using 128k window over a total of 151115520 blocks.
Stopping tasks: === [ there is a short pause ]
   stopping tasks failed (1 tasks remaining)

Then the system hangs.  That is, it still accepts keyboard input (<Enter> moves
the cursor down a line, etc.), but shows no other sign of activity.

Please reference this other bug:
http://bugzilla.ubuntu.com/show_bug.cgi?id=10916

Bug #10916 above is currently listed as "Not a Bug", presumably because of the
"probably due to overheating" guess at the bottom, which I can dispel.  But it
has lots of other info that may prove relevant and useful.

I am experiencing this same issue on a new Sun Fire V40z Server in a
well-cooled machine room.

Hardware:
- Quad CPU amd64 -- AMD Opteron(tm) Processor 852
- 32G memory
- 2x SCSI disks

RAID config:
- /dev/md0 RAID1
  Mount point: /boot
  Partitions:  /dev/sda1, /dev/sdb1
- /dev/md1 RAID1
  Physical device for LVM vg0
  $ mount | grep vg0
  /dev/mapper/vg0-root on / type ext3 (rw,errors=remount-ro)
  /dev/mapper/vg0-tmp on /tmp type ext3 (rw)
  /dev/mapper/vg0-var on /var type ext3 (rw)

I didn't try the "acpi=off" option, but was able to temporarily resolve the
situation in this way:
- multiple boots with hoary 2.6.10-5-amd64-k8-smp failed in the way described
  above:
  - normal boot (just hit <Enter>) failed
  - boot with "init=/bin/bash" appended failed
  - boot with "single" appended failed
- booted from a "live" CD, where "watch cat /proc/mdstat" showed /dev/md1
  re-syncing
- after the re-sync was complete, I was able to reboot with kernel 2.6.12.2-bef
  (smp kernel) without incident
- then tried booting again from 2.6.10-5-amd64-k8-smp, and also had success
Another point of potential interest is the file "/script" on the initrd, as this
is where the RAID arrays are assembled.  It contains the following for my system:

   mdadm -A /devfs/md/1 -R -u 55f3a23c:a0ef4950:a0a11bfe:250e3f63 /dev/sda2 /dev/sdb2
   mkdir /devfs/vg0
   mount_tmpfs /var
   if [ -f /etc/lvm/lvm.conf ]; then
   cat /etc/lvm/lvm.conf > /var/lvm.conf
   fi
   mount_tmpfs /etc/lvm
   if [ -f /var/lvm.conf ]; then
   cat /var/lvm.conf > /etc/lvm/lvm.conf
   fi
   mount -nt devfs devfs /dev
   vgchange -a y vg0
   umount /dev
   umount -n /var
   umount -n /etc/lvm
   ROOT=/dev/mapper/vg0-root
   mdadm -A /devfs/md/1 -R -u 55f3a23c:a0ef4950:a0a11bfe:250e3f63 /dev/sda2 /dev/sdb2
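
Since the script assembles the array by UUID, one diagnostic that may be worth
running from a rescue shell is to confirm that each member partition's
superblock carries the UUID passed to "mdadm -A ... -u".  This is a sketch
under assumptions, not something taken from the initrd: the function names are
hypothetical, and it assumes "mdadm --examine" prints a "UUID :" line holding
four colon-separated hex groups.

```shell
#!/bin/sh
# Print the first array UUID found in "mdadm --examine" output read on stdin.
# Assumes a line containing "UUID :" followed by hex groups joined by colons.
md_examine_uuid() {
    awk '/UUID :/ {
        for (i = 1; i <= NF; i++)
            if ($i ~ /^[0-9a-f]+:[0-9a-f]+:[0-9a-f]+:[0-9a-f]+$/) {
                print $i
                exit
            }
    }'
}

# Return 0 if the member device superblock UUID equals the expected UUID.
# $1: expected UUID (the value given to "mdadm -A ... -u")
# $2: member device (e.g. /dev/sda2)
md_uuid_matches() {
    [ "$(mdadm --examine "$2" | md_examine_uuid)" = "$1" ]
}
```

For the array above, that would be run as
"md_uuid_matches 55f3a23c:a0ef4950:a0a11bfe:250e3f63 /dev/sda2" and again for
/dev/sdb2.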

The machine is in use now, and I am unable to perform further tests on it. 
However, I will be receiving another one soon, and will be able to re-create
this problem and do further testing on it.  Please let me know if there are
tests you would like me to perform.
