Software RAID and races in the boot process
Eamonn Sullivan
eamonn.sullivan at gmail.com
Thu Dec 23 14:07:07 UTC 2004
I have the exact same problem on a simpler setup: just one SATA drive.
The only solution suggested so far is to create a script in rc.d that
runs after the rest of the boot process and mounts the drive, instead of
listing it in fstab. That seems a bit broken to me, so please post any
solution you find.
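Here is a rough sketch of what that rc.d workaround might look like, written in plain sh like the boot scripts themselves. It is a minimal sketch under several assumptions: the helper names wait_for_node and late_mount are hypothetical, the 10-second timeout is arbitrary, and it polls for the device node instead of sleeping for a fixed time, on the theory (suspected in this thread but not confirmed) that udev creates the node too late.

```shell
#!/bin/sh
# Hypothetical late-mount helpers for an rc.d script. Names, timeout and
# placement in the boot sequence are assumptions, not part of any package.

# wait_for_node PATH [TIMEOUT]
# Poll once a second until PATH exists. Return 0 as soon as it does,
# or 1 after TIMEOUT seconds (default 10) without it appearing.
wait_for_node() {
    node=$1
    timeout=${2:-10}
    elapsed=0
    while [ ! -e "$node" ]; do
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}

# late_mount DEVICE MOUNTPOINT
# Wait for DEVICE to appear, check it with fsck, then mount it.
late_mount() {
    dev=$1
    mnt=$2
    if ! wait_for_node "$dev" 10; then
        echo "late_mount: $dev did not appear" >&2
        return 1
    fi
    # Per fsck(8) the exit status is a bitmask: values below 4 mean the
    # filesystem is clean or was repaired; 4 and above mean errors were
    # left uncorrected or fsck itself failed, so do not mount.
    fsck -a "$dev"
    rc=$?
    if [ "$rc" -ge 4 ]; then
        echo "late_mount: fsck failed on $dev (exit $rc)" >&2
        return 1
    fi
    mount "$dev" "$mnt"
}
```

A script run late in the boot sequence could then call, for example, late_mount /dev/md1 /home; the device, mount point and script placement are placeholders to adapt. Polling with a timeout returns as soon as the node exists, and it still fails loudly if the node never appears.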
On Thu, 23 Dec 2004 15:42:39 +0200, Marius Gedminas <marius at pov.lt> wrote:
> Hi,
>
> I am trying to configure Ubuntu with software RAID-1 on a server that is
> about 1000 km from my physical location. Here's what the setup looks
> like:
>
> * two SATA disks (/dev/sda and /dev/sdb) with identical partition
> tables (I did sfdisk -d /dev/sda | sfdisk /dev/sdb)
>
> * /dev/sda1 is a regular 2 GB partition containing a custom (i.e. no
> desktop) Ubuntu installation. No RAID here; this partition is left
> as a backup for recoveries if the main setup gets fubared.
>
> * /dev/sda2 and /dev/sdb2 comprise /dev/md0 which is the root.
> * /dev/sda5 and /dev/sdb5 comprise /dev/md1 which is mounted on /home
> * /dev/sda6 and /dev/sdb6 comprise /dev/md2 which is mounted on /var
>
> The MBRs of both disks contain the boot record from the 'mbr' package.
> /dev/sda1 contains GRUB, which boots into the recovery partition (NB: LILO
> did not work there for obscure reasons). /dev/md0 (i.e. both /dev/sda2
> and /dev/sdb2) contains LILO, which boots the system from RAID. I used
> LILO because GRUB claims not to support RAID-1. /dev/sda2 and /dev/sdb2 are
> the only partitions marked as bootable.
>
> The system almost works: BIOS starts the MBR, which loads LILO from
> /dev/sda2. LILO loads the kernel and initrd. Thanks to judicious use
> of dpkg-reconfigure linux-image-$(uname -r) the scripts in the initrd
> load raid1.ko and start up /dev/md0 (known as /devfs/md/0 at that point)
> with mdadm. The real root filesystem (/dev/md0) is then mounted,
> checked, remounted read-write etc.
>
> PROBLEM: boot process stops in S30checkfs.sh with fsck.ext3 claiming
> that /dev/md1 and /dev/md2 do not exist. When someone on site comes up
> to the console and presses ^D, the system continues to boot and comes up
> normally. At that point I can ssh into the system and see that /dev/md1
> and /dev/md2 do exist and, what's more, are actually mounted.
>
> I suspect that there is a race condition: /etc/rcS.d/S25mdadm-raid
> starts up the raid devices, but udev creates the corresponding device
> nodes a little bit too late, causing fsck to fail, but subsequent mount
> to succeed.
>
> I have tried to reproduce the setup on a machine that I have right here
> in the office. It is a much older and slower server (dual 233 MHz P2
> rather than 2.8 GHz P4). It does not fail in fsck, but S25mdadm-raid
> prints a couple of interesting error messages:
>
> * Starting RAID devices... [done]
> mdadm: error opening /dev/md1?: No such file or directory
> mdadm: error opening /dev/md1?: No such file or directory
> * Setting up LVM volume groups... [done]
> ...
>
> I suspect that this is a symptom of the same problem, even though /dev/md1
> is apparently created soon enough for fsck to succeed.
>
> Can someone who understands udev and related issues tell me whether my
> suspicion about the race condition is plausible? Should I file a bug in
> bugzilla.ubuntu.com, and if so, for what package?
>
> In the meantime I will either disable fsck by changing the last column
> in /etc/fstab from 2 to 0, or try to add 'sleep 5' after mdadm starts
> but before fsck runs, or both.
>
> Marius Gedminas
> --
> "I may not understand what I'm installing, but that's not my job. I
> just need to click Next, Next, Finish here so I can walk to the next
> system and repeat the process"
> -- Anonymous NT Admin
>
>
> --
> ubuntu-users mailing list
> ubuntu-users at lists.ubuntu.com
> http://lists.ubuntu.com/mailman/listinfo/ubuntu-users
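For reference, the fstab change Marius mentions would look roughly like this for the two arrays fsck trips over. Only the device names, mount points, the ext3 type and the 2-to-0 change in the last (pass) column come from the message above; the mount options and dump field are assumptions.

```
# /etc/fstab (excerpt; options and dump field assumed)
# <device>  <mount point>  <type>  <options>  <dump>  <pass>
# Before: checked by fsck at boot (pass 2)
#/dev/md1   /home          ext3    defaults   0       2
#/dev/md2   /var           ext3    defaults   0       2
# After: pass 0, so the boot-time check skips them
/dev/md1    /home          ext3    defaults   0       0
/dev/md2    /var           ext3    defaults   0       0
```

Note that a pass value of 0 only hides the symptom: those filesystems are then never checked automatically at boot.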