event based initramfs
Dmitrijs Ledkovs
dmitrij.ledkov at ubuntu.com
Thu May 17 16:54:17 UTC 2012
Hello Phillip,
Thank you for bringing these questions up.
On 17/05/12 16:15, Phillip Susi wrote:
> On 5/17/2012 4:37 AM, James Hunt wrote:
>> Sounds like both systems are performing initialisation to me. And as we
>> know, there is symmetry between what happens in the initramfs and what
>> then has to happen again in the main system. By using Upstart, we get
>> one tool to do that job without duplication of code.
>
> Sure, but upstart is performing a lot more initialization. The only
> initialization the initramfs needs is to bring up the root device, which
> is handled mostly by udev rules, and thus, already using common event
> driven code.
>
Well, it does a lot of crafty stuff:
* brings up the root device
* mounts /dev
* brings up the network & mounts nfsroot
* brings up multipath/iSCSI
* unlocks LUKS partitions
* starts udev
* sets up plymouth framebuffer consoles
At some point (after bringing up the root device) it hands over to the real
system. At this point it does some hackery when passing mountpoints /
state to upstart (as upstart doesn't currently support re-execution
with state preservation).
If bringing up rootfs doesn't require:
* networking
* multipath/iSCSI
* user input/output (LUKS)
* any feedback to the user (framebuffer/plymouth)
* udev
* nested RAIDs
Then yeah, there is no need for anything to handle event loops. In fact,
in that case you probably do not need/want an initramfs at all.
Unfortunately, in the most general case we might (and do) need: networking,
LUKS, udev event handling, passing the resulting state back to the real
upstart, etc.
See:
https://blueprints.launchpad.net/ubuntu/+spec/foundations-q-event-based-initramfs
---8<---
a. The invocation of cryptsetup is procedural and not completely event
based.
b. There is a repetition of code for mounting, checking filesystems,
crypt devices, LVM devices and so on - one code path is found in upstart
jobs and the other in initramfs.
---8<---
See: https://wiki.ubuntu.com/ReliableRaid
---8<---
The initramfs boot process is not (a state machine) capable of
assembling the base system from devices appearing in any order and
starting necessary raids degraded if they are not complete after some time.
---8<---
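To make the "state machine" point concrete, here is a minimal Python sketch (not mdadm's or upstart's actual logic) of an array that is assembled from member devices arriving in any order and is started degraded once a timeout expires. The `RaidArray` class, the `min_members` threshold and measuring the timeout from the first member's arrival are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RaidArray:
    """Hypothetical model of an array assembled from block-device events."""
    name: str
    members: frozenset           # every member device the array expects
    min_members: int             # fewest members that still allow a degraded start
    timeout: float               # seconds to wait for missing members (assumption)
    seen: set = field(default_factory=set)
    first_seen: Optional[float] = None
    state: str = "waiting"       # waiting -> complete | degraded | failed

    def on_device_added(self, dev: str, now: float) -> str:
        # Devices may arrive in any order; ignore ones that are not ours.
        if self.state != "waiting" or dev not in self.members:
            return self.state
        self.seen.add(dev)
        if self.first_seen is None:
            self.first_seen = now
        if self.seen == self.members:
            self.state = "complete"
        return self.state

    def on_tick(self, now: float) -> str:
        # After the timeout, start degraded if enough members showed up.
        if (self.state == "waiting" and self.first_seen is not None
                and now - self.first_seen >= self.timeout):
            enough = len(self.seen) >= self.min_members
            self.state = "degraded" if enough else "failed"
        return self.state
```

For example, a two-disk mirror where only `sdb1` ever appears ends up `"degraded"` after the timeout instead of hanging or dropping straight to a shell.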
I need to check, but I suspect that a rootfs in LUKS on top
of RAID 1+0 may not boot reliably without manual interaction.
Also, debian-installer can produce arbitrary nesting of LUKS/LVM/RAID.
I'm not sure whether there is a limit on the nesting depth.
At this point, for boot performance, you do want to run udev/init as soon
as possible, and to pass the state of udev/init to the real rootfs/real
init as soon as possible.
Do you have suggestions on how to make booting reliable with the rootfs on
arbitrarily nested LUKS/LVM/RAID without making it event driven, or
designing a poor-man's udev-based init system for the initramfs?
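For what it's worth, the event-driven answer to arbitrary nesting fits in a few lines of Python: every newly appearing device, including a freshly assembled virtual one, is treated as another "device added" event, so any layer whose lower devices are all present gets activated regardless of the order in which disks show up. The `STACK` layout and `on_device_added` are hypothetical; real activation would invoke mdadm, vgchange or cryptsetup (and LUKS would still need a passphrase), all of which the sketch leaves out.

```python
from collections import deque

# Hypothetical layout: each virtual device lists the lower devices it needs.
STACK = {
    "md0": {"sda1", "sdb1"},      # RAID 1 over two partitions
    "md1": {"sdc1", "sdd1"},      # another RAID 1
    "md2": {"md0", "md1"},        # RAID 0 over the mirrors (RAID 1+0)
    "cryptroot": {"md2"},         # LUKS on top of the RAID
}


def on_device_added(present: set, dev: str, stack: dict) -> list[str]:
    """Record dev and activate every layer whose inputs are now all present.

    Each activated layer is queued as a new 'device added' event, so
    nesting of any depth is handled without a fixed, procedural order.
    Returns the virtual devices activated, in activation order.
    """
    activated = []
    queue = deque([dev])
    while queue:
        present.add(queue.popleft())
        for upper, lowers in stack.items():
            if upper not in present and upper not in queue and lowers <= present:
                activated.append(upper)
                queue.append(upper)
    return activated
```

Feeding it `sdd1`, `sdc1`, `sda1`, `sdb1` in that order activates `md1` on the second event and `md0`, `md2`, `cryptroot` on the last one.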
On a side note:
(Personally, I want the dropbear SSH server to work reliably for unlocking
LUKS partitions over the network.)
>> Upstart provides a "udev bridge" that allows arbitrary jobs to react to
>> udev events without having to install udev rules in one of the 3 udev
>> locations (although I guess that's really potentially 5 if you include
>> the initramfs locations (but exclude /run)). For the cases where you
>> aren't modifying devices, it makes working with udev much simpler.
>
> Yes, I know upstart can build more layers on top of the udev events, but
> what I don't understand is why you would want to. What I am looking for
> is a specific example of something you want upstart to do above the udev
> rules that both needs done while the system is fully up and running, and
> also needs done to find the root fs.
>
(roughly in descending order of importance)
* pausing the rootwait timeout during the password prompt to unlock a LUKS device
* unlocking nested LUKS partitions
* running the multipath daemon & mounting iSCSI rootfs drives
* incremental nested RAID assembly / booting from degraded RAIDs
* bringing up networking & mounting nfsroot without dropping the network
during the initramfs->real system transition
* booting a RAID array brought over from another system
* if the hardware token for unlocking LUKS was not initially present,
accepting it after we have fallen back to a password prompt, and getting
on with things
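The first item boils down to a timeout that stops counting while a prompt is on screen. A minimal Python sketch, assuming a hypothetical `PausableDeadline` helper to which the current time is passed explicitly (real code would read a monotonic clock such as `time.monotonic()`):

```python
from typing import Optional


class PausableDeadline:
    """Hypothetical rootwait-style timeout that does not run while paused."""

    def __init__(self, budget: float):
        self.remaining = budget                  # seconds left before giving up
        self.started_at: Optional[float] = None  # None while paused

    def resume(self, now: float) -> None:
        # Start (or restart) counting down from the banked budget.
        if self.started_at is None:
            self.started_at = now

    def pause(self, now: float) -> None:
        # Bank the time used so far and stop the clock, e.g. while the
        # LUKS passphrase prompt is waiting for the user.
        if self.started_at is not None:
            self.remaining -= now - self.started_at
            self.started_at = None

    def expired(self, now: float) -> bool:
        left = self.remaining
        if self.started_at is not None:
            left -= now - self.started_at
        return left <= 0
```

With a 30 s budget that is paused at 10 s while the user spends two minutes typing, the deadline still has 20 s left when the prompt closes, instead of having already fired and dropped into busybox. Passing `now` in rather than reading the clock inside keeps the logic easy to test.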
>> The code for mountall is admittedly more complex than simply calling
>> 'mount', but it has the following advantages:
>>
>> - it actually performs error checking (unlike the initramfs mount code).
>
> And what would you do in the event of an error? If the fs is so damaged
> that it can't be mounted, then about all you can do is report the error
> to the user and halt or drop them to the busybox shell. So I guess what
> you are saying is that you want a pretty plymouth error message instead
> of plain console text?
>
In the event of an error:
* you can attempt automatic recovery with fsck / similar tools
* drop into busybox
* (an interesting suggestion) ship emacs & w3m to provide a fully functional
minimal operating system, so the user can get onto IRC / help forums /
wiki pages, or even boot into something like chromium-os
>> - it provides user feedback on the operations via Plymouth.
>> - it runs fsck and provides feedback to the user via Plymouth.
>
> So you also want to put all of the fscks and their dependencies in the
> initramfs? Or it would only do this for non root filesystems? Why
> would we want to do this in the initramfs instead of when the real init
> runs?
>
Speed.
We can start and continue fsck on, e.g., unencrypted partitions while the
user is slowly trying to unlock some other LUKS partition with a 64-character
password, and then magically hand over and continue the fsck.
If at that point fsck is already NN% done, the user may actually be willing
to wait a little longer for it to complete.
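The overlap is easy to picture with a small Python sketch: checks on already-available devices run in a thread pool while the (slow) passphrase prompt blocks the main thread. `check_filesystem` and `ask_passphrase` are hypothetical stand-ins that just sleep; no real fsck or cryptsetup is invoked, and the handover of in-progress checks to the real system is not modelled.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def check_filesystem(dev: str) -> str:
    # Hypothetical stand-in for running fsck on a device that is ready now.
    time.sleep(0.1)
    return f"{dev}: clean"


def ask_passphrase() -> str:
    # Hypothetical stand-in for a slow interactive LUKS passphrase prompt.
    time.sleep(0.2)
    return "secret"


def check_while_prompting(ready: list[str]) -> tuple[str, list[str]]:
    with ThreadPoolExecutor() as pool:
        # Kick off checks on the devices that are available already...
        checks = [pool.submit(check_filesystem, dev) for dev in ready]
        # ...while the user is still typing the passphrase.
        passphrase = ask_passphrase()
        results = [f.result() for f in checks]
    return passphrase, results
```

Here the checks and the prompt overlap, so the total wait is roughly the longer of the two rather than their sum.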
> AFAIK, plymouth is not currently in the initramfs because it was decided
> that adding it and the rather sizable graphics drivers that it depends
> on would bloat the initramfs too much, so why do we now want to not only
> add all of that, but upstart and mountall as well?
>
My initramfs on precise has plymouth already. Am I special? (I have LUKS
and loads of other crap.) (I am personally not concerned with plymouth
and the whole graphical stuff in the initramfs, but a pretty bootsplash is
nice, not seeing a black screen is nice, and having normal fonts & the correct
screen resolution in busybox is also nice. Nice, not critical. But for
desktop users, it might actually be critical to have a pretty bootsplash,
etc.)
drop - cryptic shell scripts
add - upstart / mountall
gain (hopefully) - reliable boot for complex cases
In the long term I think the decrease in maintenance will outweigh the size
increase. Maybe I'm wrong. Anyway, we are not going to drop the current
initramfs anytime soon.
> So far the only advantage I'm seeing is that we can make a rare class of
> error messages and the luks password prompt look pretty. I can see the
> argument for that, but does it really require upstart and mountall too?
>
...and actually bringing up LUKS prompts reliably when needed, rather than
timing out & dropping into busybox while you are typing the password in... </rant> ;-)
I don't know yet what is actually required to fix all the 'just
bring up rootfs, but I have it on ...' cases. But if we come up with some
proofs of concept that are at least safe to test on non-critical
machines, then we will at least have something to evaluate. One idea
that has been floating around for a while is to bring first-class event
handling into the initramfs and start doing useful things early.
Currently we are working on getting a proof of concept together, to see
if it:
- does not regress
- reduces maintenance (i.e. frees up developer hours to add features)
- gives options to do fun stuff more reliably
If you have some code to do it within the current framework, especially with
respect to unlocking LUKS devices, then I am all ears.
>> That's not actually what we're talking about here - the upstreams are
>> moving tooling currently in / to /usr. We might be able to work around
>> this, but it sounds like it will become increasingly difficult to do so.
>
> What upstream doesn't support the normal configure switches to set the
> install directory, and that mountall depends on?
>
We'd rather carry fewer patches / less packaging divergence from
upstream/Debian. There are plenty of upstreams that don't support such things,
and it takes developer time to patch the software & maintain those patchsets.
Please point out/correct any technical inaccuracies. This is a complex
topic.
--
Regards,
Dmitrijs
More information about the ubuntu-devel mailing list