[Oneiric-Topic] Server Boot

Alvin info at alvin.be
Wed Mar 30 15:21:04 UTC 2011


On Wednesday 30 March 2011 14:52:14 Serge E. Hallyn wrote:
> Quoting Scott Kitterman (ubuntu at kitterman.com):
> > There was a lot of discussion around improving the server boot experience
> > before the UDS-M.  A number of people expressed interest in seeing more
> > useful diagnostic information during boot.  Others expressed concerns
> > with boot reliability on the more complex hardware typically found in
> > servers.
> > 
> > How are we doing on this?  Personally, I can't remember the last time I
> > rebooted a server and it wasn't via SSH and the hardware I use is the
> > sort there were problems with.  Are these still issues for the Ubuntu
> > Server community?
> > 
> > Scott K
> 
> I think right now these issues are overshadowed by the fact that a
> great deal of server software is not yet upstartified.  I think that
> needs to be addressed for O.

Yes, they are certainly still issues (and the primary reason the company I 
work for is abandoning Ubuntu.)

I agree that a lot of servers are not often rebooted, but not every server is 
a webserver. Some are used only during certain hours and can be booted 
automatically (BIOS or WOL) when needed in order to keep the electricity bill 
down. Booting should be a reliable and automated process. Accurate logging is 
important in order to know what went wrong in case the unthinkable happens.

The current boot.log looks like:
> mount.nfs: DNS resolution failed for 192.168.xxx.3: Name or service not known
> mount.nfs4: Failed to resolve server exampleserver: Name or service not known
> mountall: mount /srv/example [1134] terminated with status 32
> mount error(101): Network is unreachable
while in reality the filesystems are mounted. When something really does go 
wrong, the log is identical. Conclusion: boot.log is useless. (Actually, the 
log is probably correct: it can't resolve server names at that specific 
time.) Proper boot logging would be popular[1].
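
Until boot.log can be trusted, one way to find out what really happened is to 
compare /etc/fstab with /proc/mounts once the system is up. The following is 
only a rough sketch of that idea, not an existing tool. The function names 
are made up for illustration, and it assumes that NFS entries use the types 
"nfs" or "nfs4" and that entries marked "noauto" are not expected at boot.

```python
NFS_TYPES = ("nfs", "nfs4")


def parse_mount_table(text):
    """Map mount point -> (fs type, option list) for fstab-style text.

    Works for both /etc/fstab and /proc/mounts, which share the
    'device mountpoint type options ...' column layout.
    """
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) < 4:
            continue  # not a complete entry
        table[fields[1]] = (fields[2], fields[3].split(","))
    return table


def missing_nfs_mounts(fstab_text, mounts_text):
    """Return NFS mount points listed in fstab but not currently mounted."""
    wanted = parse_mount_table(fstab_text)
    mounted = parse_mount_table(mounts_text)
    missing = []
    for mount_point, (fs_type, options) in wanted.items():
        if fs_type not in NFS_TYPES or "noauto" in options:
            continue  # only NFS entries that should be mounted at boot
        live_type = mounted.get(mount_point, (None, []))[0]
        if live_type not in NFS_TYPES:
            missing.append(mount_point)
    return sorted(missing)


def report(fstab_path="/etc/fstab", mounts_path="/proc/mounts"):
    """Print one line per NFS share that should be mounted but is not."""
    with open(fstab_path) as f:
        fstab_text = f.read()
    with open(mounts_path) as f:
        mounts_text = f.read()
    for mount_point in missing_nfs_mounts(fstab_text, mounts_text):
        print("NOT MOUNTED:", mount_point)
```

Calling report() from a late boot job or from cron shortly after boot would 
at least distinguish a good boot from a bad one, which boot.log does not.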

Take the following example of a server boot. Let's also assume that nothing 
goes wrong that could lead to a busybox console. (It certainly can![2][3])
So, you're now sitting in front of a nice prompt. Everything looks ok, but is 
it? The server mounts NFS shares from another server, runs KVM/libvirt with 
a netfs storage pool for its virtual machines, and runs a quasselcore for IRC 
that stores its data in a PostgreSQL database on another server. The local 
filesystem uses mdadm for RAID1 with LVM on top of that. Very server-like. (I 
once made this setup to test some things.) In order to keep things under 
control, there are /no/ LVM snapshots. That is another ugly story.

So, what happens now:
- The RAID will be broken! [4][5]
- The NFS shares in /etc/fstab might not be mounted, [6][7]
  even when you told the system to wait with _netdev. [8]
- Your virtual machines on netfs will not be running. [9]
- The quasselcore with external db will not be started. [10]

The array can be assembled by running a command and all of the above daemons 
can be started manually.

I talked about some of those topics on IRC, and the following workarounds came 
up. There are also some workarounds in the bug reports.
- Put NFS shares in /etc/fstab, and don't configure them as netfs storage 
  pools.
- Put the IP addresses of your NFS servers in /etc/hosts.
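
The second workaround can be checked mechanically. Below is a minimal sketch 
that lists the NFS server names in fstab which are neither literal IP 
addresses nor present in /etc/hosts, and which therefore still depend on DNS 
being up at mount time. The function names are hypothetical, and it assumes 
the usual "server:/path" device syntax (bracketed IPv6 literals are not 
handled).

```python
import ipaddress


def hosts_names(hosts_text):
    """Return the set of host names and aliases defined in /etc/hosts text."""
    names = set()
    for line in hosts_text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop trailing comments
        if len(fields) >= 2:
            # fields[0] is the address, the rest are names for it
            names.update(name.lower() for name in fields[1:])
    return names


def nfs_servers(fstab_text):
    """Return the set of server parts of NFS devices in fstab text."""
    servers = set()
    for line in fstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) >= 3 and fields[2] in ("nfs", "nfs4"):
            if ":/" in fields[0]:
                servers.add(fields[0].split(":/", 1)[0])
    return servers


def is_ip(value):
    """True if value is a literal IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(value)
        return True
    except ValueError:
        return False


def unresolved_locally(fstab_text, hosts_text):
    """NFS server names that need DNS because /etc/hosts lacks them."""
    known = hosts_names(hosts_text)
    return sorted(
        server
        for server in nfs_servers(fstab_text)
        if not is_ip(server) and server.lower() not in known
    )
```

Feeding it the contents of /etc/fstab and /etc/hosts gives the names that 
would still need an /etc/hosts entry for the workaround to apply.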

For most servers, speeding up the boot process is less important than 
reliability. Why not take a look at how Debian does it? You can disable 
running the boot scripts in parallel with 'CONCURRENCY=none' in 
/etc/default/rcS.

Also, think about daemons of commercial software without upstart scripts. You 
never know whether they will start at boot or not.

Links:
[1] "init: support logging of job output"
    https://bugs.launchpad.net/bugs/328881

[2] "Gave up waiting for root device after upgrade then busybox console"
    https://bugs.launchpad.net/bugs/360378

[3] "karmic rc: root device sometimes not found"
    https://bugs.launchpad.net/bugs/460914

[4] "mdadm cannot assemble array as cannot open drive with O_EXCL"
    https://bugs.launchpad.net/bugs/27037

[5] "mdadm cannot assemble array"
    https://bugs.launchpad.net/bugs/599135

[6] "nfs mounts specified in fstab is not mounted on boot."
    https://bugs.launchpad.net/bugs/275451

[7] "nfs shares are not automounted anymore in intrepid"
    https://bugs.launchpad.net/bugs/285013

[8] "_netdev not working"
    https://bugs.launchpad.net/bugs/384347

[9] "Libvirt NFS mount on boot."
    https://bugs.launchpad.net/bugs/351307

[10] "quasselcore does not connect to database at boot"
     https://bugs.launchpad.net/bugs/612729

-- 
Alvin



