8.04-1 won't boot from degraded raid
Michael Hipp
Michael at Hipp.com
Tue Aug 26 16:10:49 UTC 2008
Soren Hansen wrote:
> On Tue, Aug 26, 2008 at 10:20:45AM -0500, Michael Hipp wrote:
>>>>> But in the meantime ... this is Intrepid. What do I do about the
>>>>> "production" Hardy that is now known to ship with a broken RAID
>>>>> implementation?
>>>> Just because it doesn't boot without intervention from a degraded
>>>> RAID, that doesn't mean it won't carry on when the RAID degrades,
>>>> right? Or am I missing the issue?
>>> No, you are quite right. I also don't particularly approve of such
>>> frivolous usage of the word "broken".
>> What word would *you* choose to describe a server that won't boot when
>> only one of its (supposedly redundant) members is down?
>
> Apparently, I should be calling it a server "that doesn't do what
> Michael Hipp expects it to".
>
>> It might help if you were aware that I've been fighting this issue
>> with Ubuntu releases ever since the days of 4.10:
>>
>> http://ubuntuforums.org/showthread.php?t=15655
>> https://bugs.launchpad.net/ubuntu/+source/kernel-package/+bug/12052
>
> Ok, make that: a server "that *still* doesn't do what Michael Hipp
> expects it to".
>
> I'm quite happy that the server doesn't boot if my raid array is broken,
> actually.
>
> Imagine a scenario where the disk controller is flaky. Disk A goes away
> while the system is running, and is then out of date. You reboot the
> machine (or perhaps it rebooted itself because the flaky controller
> short circuited or whatever), and for whatever reason (flaky controller,
> remember?), the system boots from disk B instead. The changes to your
> filesystem since disk A disappeared are not there, and new changes
> are being written to disk B, and there's no chance of merging the two.
> This is what I refer to as "having a very bad day".
>
> There are lots of other scenarios where you really don't want to boot if
> your RAID array is not in tip-top shape. If the system is already
> running, it knows something about its current state, which disk is the
> more trustworthy one, etc. When booting, this is not the case.
>
> I value data over uptime.
>
>> Every time I think it's fixed I seem to learn that it's uh, er, not
>> functional once again.
>
> "Not acting in the way you want" is not the same as "not functional".
>
>> (I'm pretty sure it works fine in 6.06 LTS tho it's been a long time
>> since I tested it.)
>
> Nope. It's the same.
>
>> I've been installing operating systems on RAID1 for my little LAN
>> servers for as long as I can remember. Before Ubuntu it never occurred
>> to me that getting a system to boot a RAID1 with a defunct member was
>> some rocket science.
>
> True, it's more difficult than it could be. Dustin has been working hard
> on getting that fixed in Intrepid.
>
>> Why, pray tell, can't Ubuntu make this Just Work like most everything
>> else in Ubuntu?
>
> "Just Work" in this context means different things to different people.
> To me, "Just Work" means that it above all doesn't corrupt my data. To
> others, it might mean "start the sucker no matter what, so that I can
> get on with my life". Neither is a malfunction, so both options should
> be available, but spare me the "broken" and "not functional" babble.
>
>> Would my rant be any better received if I pointed out that this stuff
>> has worked just fine in versions of Red Hat and Windows dating back
>> almost a decade?
>
> Not in particular, no.

In every single answer above you are focused on the fact that it does fine for
the use case where you don't want it to boot upon failure. As noted in the page
[1] linked to by Dustin's blog, that's a valid use case. (A bit hard for a
guy like me to imagine. But valid nevertheless.)

What you don't seem to grasp is that it utterly fails at the other use case,
where the system needs to boot regardless. You seem to be declaring that use
case invalid (evidently because *I* prefer it, as you offer no other
reason).

It's broken because the second use case doesn't work. And evidently can't be
made to work under any circumstances. Tell me, once again, what word do you use
to describe a system where a documented valid use case utterly fails? It is not
functional. It is broken. For that (seemingly, to me, more common) use case of
wanting the server to do what servers do and run.

Michael

[1] https://wiki.ubuntu.com/BootDegradedRaid