[Bug 16139] New: Software RAID Boot fails if array active, but out of sync
bugzilla-daemon at bugzilla.ubuntu.com
Fri Sep 23 14:06:48 UTC 2005
Please do not reply to this email. You can add comments at
http://bugzilla.ubuntu.com/show_bug.cgi?id=16139
Ubuntu | linux
Summary: Software RAID Boot fails if array active, but out of sync
Product: Ubuntu
Version: unspecified
Platform: amd64
OS/Version: Linux
Status: UNCONFIRMED
Severity: normal
Priority: P2
Component: linux
AssignedTo: ben.collins at ubuntu.com
ReportedBy: finley at anl.gov
QAContact: kernel-bugs at lists.ubuntu.com
md: syncing RAID array md1
md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/dis
md: using maximum available idle IO bandwith (but not more than 2000000 KB/sec) for reconstruction
md: using 128k window over a total of 151115520 blocks.
Stopping tasks: === [ there is a short pause ]
stopping tasks failed (1 tasks remaining)
Then the system hangs. That is, it still accepts keyboard input (<Enter> moves
the cursor down a line, etc.), but shows no other sign of activity.
Please reference this other bug:
http://bugzilla.ubuntu.com/show_bug.cgi?id=10916
Bug #10916 above is currently listed as "Not a Bug", presumably because of the
"probably due to overheating" guess at the bottom, which I can dispel. But it
has lots of other info that may prove relevant and useful.
I am experiencing this same issue on a new Sun Fire V40z Server in a well-cooled
machine room.
Hardware:
- Quad CPU amd64 -- AMD Opteron(tm) Processor 852
- 32G memory
- 2x SCSI disks
RAID config:
- /dev/md0 RAID1
Mount point: /boot
Partitions: /dev/sda1, /dev/sdb1
- /dev/md1 RAID1
Physical device for LVM vg0
$ mount | grep vg0
/dev/mapper/vg0-root on / type ext3 (rw,errors=remount-ro)
/dev/mapper/vg0-tmp on /tmp type ext3 (rw)
/dev/mapper/vg0-var on /var type ext3 (rw)
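For comparing against other machines, the array-to-partition mapping listed above can also be read out of /proc/mdstat. The following is only a rough sketch: the function name md_members is made up for illustration, and it assumes the usual "mdN : active raidX member[n] ..." line layout with no extra status word (such as "(auto-read-only)") between "active" and the RAID level.

```shell
#!/bin/sh
# Hypothetical helper: print each active md array with its member partitions.
# $1: path to an mdstat-format file; defaults to /proc/mdstat.
# Assumes lines of the form "md1 : active raid1 sdb2[1] sda2[0]".
md_members() {
    awk '$2 == ":" && $3 == "active" {
        printf "%s:", $1
        for (i = 5; i <= NF; i++) {
            sub(/\[.*/, "", $i)   # strip the "[n]" slot suffix and any flags
            printf " %s", $i
        }
        printf "\n"
    }' "${1:-/proc/mdstat}"
}
```

Called as "md_members" with no argument, it reads the live /proc/mdstat; on a layout like the one above it would print lines such as "md1: sdb2 sda2".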
I didn't try the "acpi=off" option, but was able to temporarily resolve the
situation in this way:
- multiple boots failed with hoary 2.6.10-5-amd64-k8-smp in the way described above:
  - normal boot (just hit <Enter>) failed
  - boot with "init=/bin/bash" appended failed
  - boot with "single" appended failed
- boot from "live" CD, then "watch cat /proc/mdstat" showed /dev/md1 re-syncing
- after the re-sync was complete, I was able to reboot with kernel 2.6.12.2-bef
without incident (smp kernel)
- then tried booting again from 2.6.10-5-amd64-k8-smp, and also had success
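The wait-for-resync step in the workaround above could be scripted roughly as follows. This is only a sketch: the function names md_resync_active and wait_for_md_sync are made up for illustration, and it assumes that an in-progress or queued rebuild shows up in /proc/mdstat as a "resync =" or "recovery =" progress line (or as "resync=DELAYED").

```shell
#!/bin/sh
# Hypothetical helper: exit status 0 while the given mdstat file reports
# a resync or recovery that is in progress or queued.
# $1: path to an mdstat-format file; defaults to /proc/mdstat.
md_resync_active() {
    grep -Eq '(resync|recovery) *=' "${1:-/proc/mdstat}"
}

# Hypothetical helper: poll until no resync/recovery is reported.
# $1: mdstat path (default /proc/mdstat), $2: poll interval in seconds (default 10).
wait_for_md_sync() {
    while md_resync_active "${1:-/proc/mdstat}"; do
        sleep "${2:-10}"
    done
}
```

From the live CD, something like "wait_for_md_sync && reboot" would then hold off the reboot until the rebuild is no longer listed. It does not check that the array ended up healthy, only that no rebuild is reported.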
Another point of potential interest is the file "/script" on the initrd, as this
is where the RAID arrays are assembled. It contains the following for my system:
mdadm -A /devfs/md/1 -R -u 55f3a23c:a0ef4950:a0a11bfe:250e3f63 /dev/sda2 /dev/sdb2
mkdir /devfs/vg0
mount_tmpfs /var
if [ -f /etc/lvm/lvm.conf ]; then
cat /etc/lvm/lvm.conf > /var/lvm.conf
fi
mount_tmpfs /etc/lvm
if [ -f /var/lvm.conf ]; then
cat /var/lvm.conf > /etc/lvm/lvm.conf
fi
mount -nt devfs devfs /dev
vgchange -a y vg0
umount /dev
umount -n /var
umount -n /etc/lvm
ROOT=/dev/mapper/vg0-root
mdadm -A /devfs/md/1 -R -u 55f3a23c:a0ef4950:a0a11bfe:250e3f63 /dev/sda2 /dev/sdb2
The machine is in use now, and I am unable to perform further tests on it.
However, I will be receiving another one soon, and will be able to re-create
this problem and do further testing on it. Please let me know if there are
tests you would like me to perform.