How to clean up full /boot safely?

Colin Watson cjwatson at ubuntu.com
Mon Feb 12 18:52:25 UTC 2018


On Mon, Feb 12, 2018 at 06:57:09PM +0100, Liam Proven wrote:
> On 12 February 2018 at 18:16, Colin Watson <cjwatson at ubuntu.com> wrote:
> > But if you claim that something flat-out doesn't work when it does -
> > even if there are caveats - then I'll still point out the error.
> 
> My words were:
> 
> "Things GRUB might have problems with"
> 
> As I have emphasised before: *MIGHT*

Your words were also "the kernel must be on something GRUB can read,
i.e. a straightforward Linux filesystem", and that was what I considered
to be the main substance of what I was objecting to.  Do you see how
that reads as a claim that other configurations flat-out don't work?

> Let me clarify with an example.
> 
> / is on /dev/md1, which is a mirrored pair of drives, /dev/sda and /dev/sdb

I'd recommend making the array members /dev/sda1 and /dev/sdb1 instead,
so that there's a bit of room at the front of each disk for the boot
loader; it can be made to work otherwise, of course, but it's safer if
you explicitly partition the physical disks.

> (I have zero clue what the GRUB nomenclature is, which is the sort of
> reason I don't like it. hd(0,0) and hd(1,0) or something, who knows?

GRUB's nomenclature has traditionally been unpopular (though these days
you can use labels/UUIDs instead of having to care), but it's not
without rationale.  There's no reasonable way for a boot loader to
replicate Linux's device naming exactly, so if it has to name devices at
all, it's best for it to use a naming scheme that isn't going to clash
with Linux's.

> Root is on a mirror.  The /etc/grub directory is therefore on both, as
> is /boot and the kernel.
> 
> GRUB, to the best of my knowledge, supports this just fine. So does
> the kernel. It's not LVM, it's not GPT, it's nothing very fancy.
> 
> Must GRUB be installed to /dev/md1? I don't think that would boot
> because that's a Linux kernel device, only visible when the kernel is
> running.
> 
> So where is the boot sector? Is it on /dev/sda or /dev/sdb?
> 
> Is it on both?
> 
> Can it be on both?

It can and should be on both /dev/sda and /dev/sdb.  If you put it
somewhere else then you're at the mercy of your firmware managing to
find it; if you only put it on one of those then you don't have a truly
redundant setup.
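
As a rough illustration of "on both", here is a minimal Python sketch
that runs grub-install once per physical disk.  The helper names
(grub_install_commands, install_on_all) and the dry-run default are
assumptions made for this example, not part of any GRUB tooling; the
disk names are the ones from the example above.  Treat it as a sketch
to adapt, not a tested procedure.

```python
import subprocess


def grub_install_commands(disks):
    """Return one grub-install command per physical disk (hypothetical helper)."""
    return [["grub-install", disk] for disk in disks]


def install_on_all(disks, dry_run=True):
    """Print the commands by default; run them (as root) if dry_run is False."""
    for cmd in grub_install_commands(disks):
        if dry_run:
            # Dry run: only show what would be executed.
            print(" ".join(cmd))
        else:
            # check=True stops at the first failing install.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Both members of the mirror get their own copy of the boot loader.
    install_on_all(["/dev/sda", "/dev/sdb"])
```

Targeting the whole disks rather than the partitions is deliberate: the
boot image goes at the start of each disk, which is where the firmware
looks for it.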

> If it is on both, and /dev/sda fails, if the firmware is configured so
> that the secondary boot device is /dev/sdb, will GRUB automatically
> failover and boot off /dev/sdb?

[This is my understanding of the design, but it's a very long time since
I've personally tried it.  Anyone relying on this should test it on an
unimportant system first.]

In this scenario, the firmware will hand over control to the GRUB boot
image read from the start of /dev/sdb, which will proceed to read the
rest of the core image from near the start of the same disk (it does so
using sector addresses relative to the same disk, so it'll read that
from /dev/sdb).  The GRUB core image will then build its own view of the
RAID array; in the case you outline, in the absence of further hardware
failures, it will have enough members to be able to read from the array.
When asked to read /boot/grub/grub.cfg and any further files that it
references, it will do so using that view of the array, which consists
only of the data on /dev/sdb1, so that amounts to automatic failover.

In short: yes.
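
The reasoning above can be reduced to a toy model in Python.  This is
not GRUB's code, and the function names are invented for illustration;
it only encodes the one property being relied on, namely that each
member of a two-disk mirror holds a full copy of the data, so a single
surviving member is enough to read from the array.

```python
def surviving_members(members, failed):
    """Return the array members that are still present, in original order."""
    return [m for m in members if m not in failed]


def raid1_readable(members, failed):
    """A mirror is readable as long as at least one member survives."""
    return len(surviving_members(members, failed)) >= 1


if __name__ == "__main__":
    members = ["/dev/sda1", "/dev/sdb1"]
    # /dev/sda has failed, so only /dev/sdb1 is left in the array view.
    print(surviving_members(members, {"/dev/sda1"}))
    print(raid1_readable(members, {"/dev/sda1"}))
```

With /dev/sda1 marked as failed, the only member left is /dev/sdb1 and
the array still counts as readable, which is the failover case
described above.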

> If so, will it bring the kernel up normally with a degraded RAID pair?

It will simply hand off to the kernel (and probably initramfs), and it's
their job to work that sort of thing out.  This is true of any boot
loader.

> Use a separate, non-mirrored /boot volume on a single drive. Have
> arrangements to replace this if needed, have spares, have backups, but
> if you need to _know_ it _will_ work, put /boot on a single unmirrored
> hard disk and then _and only then_ is it possible to say "this will
> work and it will boot your machine".

I can see your position, although if the rest of the system is RAID then
this means you're deliberately arranging for the drive that contains
/boot to be a single point of failure.  I don't really see a major
difference between having to recover /boot and having to recover the
boot loader in that kind of setup.

Personally, rather than deliberately eschewing boot loader support, I'd
prefer to set up a situation where the boot loader can automatically
deal with failover all the way from the firmware up, and test that on a
replica of the production environment.  That way, when it fails in the
middle of the night on $holiday eve, it can last until somebody's normal
working hours rather than being an on-call emergency.

But this sort of thing depends on the situation.

> I have been shouted at and told off for saying "this _MIGHT_ not work"
> because GRUB supports it. It doesn't matter if GRUB supports it. The
> question is more complicated.

As mentioned, you saying "this might not work" wasn't the part of your
message that I was saying was factually incorrect.

-- 
Colin Watson                                       [cjwatson at ubuntu.com]



