[Oneiric-Topic] Server Boot
Alvin
info at alvin.be
Wed Mar 30 15:21:04 UTC 2011
On Wednesday 30 March 2011 14:52:14 Serge E. Hallyn wrote:
> Quoting Scott Kitterman (ubuntu at kitterman.com):
> > There was a lot of discussion around improving the server boot experience
> > before the UDS-M. A number of people expressed interest in seeing more
> > useful diagnostic information during boot. Others expressed concerns
> > with boot reliability on the more complex hardware typically found in
> > servers.
> >
> > How are we doing on this? Personally, I can't remember the last time I
> > rebooted a server and it wasn't via SSH and the hardware I use is the
> > sort there were problems with. Are these still issues for the Ubuntu
> > Server community?
> >
> > Scott K
>
> I think right now these issues are overshadowed by the fact that a
> great deal of server software is not yet upstartified. I think that
> needs to be addressed for O.
Yes, they are certainly still issues (and the primary reason the company I
work for is abandoning Ubuntu.)
I agree that a lot of servers are not often rebooted, but not every server is
a webserver. Some are used only during certain hours and can be booted
automatically (BIOS or WOL) when needed in order to keep the electricity bill
down. Booting should be a reliable and automated process. Accurate logging is
important in order to know what went wrong in case the unthinkable happens.
The current boot.log looks like:
> mount.nfs: DNS resolution failed for 192.168.xxx.3: Name or service not known
> mount.nfs4: Failed to resolve server exampleserver: Name or service not known
> mountall: mount /srv/example [1134] terminated with status 32
> mount error(101): Network is unreachable
while in reality the filesystems are mounted. When something does go wrong,
the log is identical. Conclusion: boot.log is useless. (Actually, the log is
probably correct: server names can't be resolved at that specific point in
the boot.)
Proper boot logging would be popular[1].
Take the following example of a server boot. Let's also assume that nothing
goes wrong that could lead to a busybox console. (It certainly can![2][3])
So, you're now sitting in front of a nice prompt. Everything looks ok, but is
it? The server mounts NFS shares from another server. It runs KVM/libvirt with
a netfs storage pool for its virtual machines, and a quasselcore for IRC that
stores its data in a PostgreSQL database on another server. The local
filesystem uses mdadm for RAID1 with LVM on top of that. Very server-like. (I
once made this setup to test some things.) In order to keep things under
control, there are /no/ LVM snapshots. That is another ugly story.
So, what happens now:
- The RAID will be broken! [4][5]
- The NFS shares in /etc/fstab might not be mounted, [6][7]
even when you told the system to wait with _netdev. [8]
- Your virtual machines on netfs will not be running. [9]
- The quasselcore with external db will not be started. [10]
The array can be assembled by running a command and all of the above daemons
can be started manually.
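
Since boot.log won't tell you which of these actually happened, you have to
look at the running system. Below is a minimal sketch of such a check, not
something Ubuntu ships. It assumes the usual Linux text formats of
/etc/fstab, /proc/mounts and /proc/mdstat. The function names are my own, and
it ignores corner cases such as mount points containing spaces. It does not
look at libvirt or quasselcore.

```python
import re


def fstab_nfs_mountpoints(fstab_text):
    """Mount points of nfs/nfs4 entries in fstab, skipping comments and noauto."""
    points = []
    for line in fstab_text.splitlines():
        fields = line.split()
        if len(fields) < 3 or fields[0].startswith("#"):
            continue
        options = fields[3] if len(fields) > 3 else "defaults"
        if fields[2] in ("nfs", "nfs4") and "noauto" not in options.split(","):
            points.append(fields[1])
    return points


def missing_nfs_mounts(fstab_text, proc_mounts_text):
    """NFS mount points that fstab asks for but /proc/mounts does not list."""
    mounted = set()
    for line in proc_mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            mounted.add(fields[1])
    return [p for p in fstab_nfs_mountpoints(fstab_text) if p not in mounted]


def degraded_arrays(mdstat_text):
    """Names of md arrays that are not active or have fewer members than expected."""
    problems = []
    current = None
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\S+) : (\S+)", line)
        if header:
            current = header.group(1)
            if header.group(2) != "active":
                problems.append(current)
            continue
        # The status line carries "[expected/present]", for example "[2/1]".
        counts = re.search(r"\[(\d+)/(\d+)\]", line)
        if current and counts and int(counts.group(2)) < int(counts.group(1)):
            if current not in problems:
                problems.append(current)
    return problems


def read_text(path):
    """Contents of a file, or an empty string if it cannot be read."""
    try:
        with open(path) as handle:
            return handle.read()
    except OSError:
        return ""


def check_boot():
    """Collect human-readable problems from the live system."""
    problems = []
    fstab = read_text("/etc/fstab")
    for point in missing_nfs_mounts(fstab, read_text("/proc/mounts")):
        problems.append("NFS share not mounted: " + point)
    for array in degraded_arrays(read_text("/proc/mdstat")):
        problems.append("md array degraded or inactive: " + array)
    return problems
```

Calling check_boot() some time after boot and mailing the returned list if it
is not empty would at least tell you that something needs attention, which is
more than the current boot.log does.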
I talked about some of those topics on IRC, and the following workarounds came
up. There are also some workarounds in the bug reports.
- Put NFS shares in /etc/fstab, and don't configure them as netfs storage
pools.
- Put the IP addresses of your NFS servers in /etc/hosts.
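
To see where the second workaround still has to be applied, you can compare
the NFS server names in /etc/fstab with the names in /etc/hosts. Again, this
is only a sketch under assumptions: it expects the common "server:/path"
device syntax, the function names are made up for this mail, and it only
reports names. It does not edit any file.

```python
import ipaddress


def nfs_servers(fstab_text):
    """Server part of every nfs/nfs4 device ("server:/path") in fstab."""
    servers = set()
    for line in fstab_text.splitlines():
        fields = line.split()
        if len(fields) < 3 or fields[0].startswith("#"):
            continue
        if fields[2] in ("nfs", "nfs4") and ":/" in fields[0]:
            servers.add(fields[0].split(":/", 1)[0].lower())
    return servers


def is_ip(name):
    """True if the name is a literal IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(name.strip("[]"))
        return True
    except ValueError:
        return False


def hosts_names(hosts_text):
    """All host names and aliases listed in /etc/hosts format."""
    names = set()
    for line in hosts_text.splitlines():
        fields = line.split("#", 1)[0].split()
        names.update(name.lower() for name in fields[1:])
    return names


def servers_needing_dns(fstab_text, hosts_text):
    """NFS servers that can only be resolved through DNS at mount time."""
    known = hosts_names(hosts_text)
    return sorted(
        s for s in nfs_servers(fstab_text) if not is_ip(s) and s not in known
    )
```

Any name returned by servers_needing_dns() is a mount that depends on the
resolver being up at the moment mountall tries it.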
For most servers, speeding up the boot process is less important than
reliability. Why not take a look at how Debian does it? You can disable
running the boot scripts in parallel with 'CONCURRENCY=none' in
/etc/default/rcS.
Also, think about daemons of commercial software without upstart scripts. You
never know whether they will start at boot or not.
Links:
[1] "init: support logging of job output"
https://bugs.launchpad.net/bugs/328881
[2] "Gave up waiting for root device after upgrade then busybox console"
https://bugs.launchpad.net/bugs/360378
[3] "karmic rc: root device sometimes not found"
https://bugs.launchpad.net/bugs/460914
[4] "mdadm cannot assemble array as cannot open drive with O_EXCL"
https://bugs.launchpad.net/bugs/27037
[5] "mdadm cannot assemble array"
https://bugs.launchpad.net/bugs/599135
[6] "nfs mounts specified in fstab is not mounted on boot."
https://bugs.launchpad.net/bugs/275451
[7] "nfs shares are not automounted anymore in intrepid"
https://bugs.launchpad.net/bugs/285013
[8] "_netdev not working"
https://bugs.launchpad.net/bugs/384347
[9] "Libvirt NFS mount on boot."
https://bugs.launchpad.net/bugs/351307
[10] "quasselcore does not connect to database at boot"
https://bugs.launchpad.net/bugs/612729
--
Alvin