[Oneiric-Topic] Server Boot
ubuntu at kitterman.com
Wed Mar 30 15:26:43 UTC 2011
On Wednesday, March 30, 2011 11:21:04 AM Alvin wrote:
> On Wednesday 30 March 2011 14:52:14 Serge E. Hallyn wrote:
> > Quoting Scott Kitterman (ubuntu at kitterman.com):
> > > There was a lot of discussion around improving the server boot
> > > experience before the UDS-M. A number of people expressed interest in
> > > seeing more useful diagnostic information during boot. Others
> > > expressed concerns with boot reliability on the more complex hardware
> > > typically found in servers.
> > >
> > > How are we doing on this? Personally, I can't remember the last time I
> > > rebooted a server and it wasn't via SSH and the hardware I use is the
> > > sort there were problems with. Are these still issues for the Ubuntu
> > > Server community?
> > >
> > > Scott K
> > I think right now these issues are overshadowed by the fact that a
> > great deal of server software is not yet upstartified. I think that
> > needs to be addressed for O.
> Yes, they are certainly still issues (and the primary reason the company I
> work for is abandoning Ubuntu.)
> I agree that a lot of servers are not often rebooted, but not every server
> is a webserver. Some are used only during certain hours and can be booted
> automatically (BIOS or WOL) when needed in order to keep the electricity
> bill down. Booting should be a reliable and automated process. Accurate
> logging is important in order to know what went wrong in case the
> unthinkable happens.
> The current boot.log looks like:
> > mount.nfs: DNS resolution failed for 192.168.xxx.3: Name or service not
> > mount.nfs4: Failed to resolve server exampleserver: Name or service not
> > mountall: mount /srv/example  terminated with status 32
> > mount error(101): Network is unreachable
> while in reality filesystems are mounted. Now, when something goes wrong,
> the log is identical. Conclusion: boot.log is useless. (Actually, the log
> is probably correct. It can't resolve server names at that specific time.)
> Proper boot logging would be popular.
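
Since boot.log reads the same whether or not the mounts succeeded, one
option is to check the real state after boot. Below is a rough sketch, not
a tested tool: it compares the network filesystems listed in /etc/fstab
against the kernel's mount table. The function names and the NETWORK_FS
set are made up for illustration, and it assumes mount points are written
identically in both files (no trailing slashes, no escaped spaces).

```python
# Hypothetical helper: report fstab network mounts that are not mounted.
# Assumption: which filesystem types count as "network" is our own choice.
NETWORK_FS = {"nfs", "nfs4", "cifs"}


def parse_fstab(text):
    """Yield (device, mountpoint, fstype, options) for each fstab entry."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) >= 4:
            yield fields[0], fields[1], fields[2], fields[3].split(",")


def missing_network_mounts(fstab_text, mounts_text):
    """Return fstab network mount points absent from the mount table."""
    mounted = set()
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            mounted.add(fields[1])  # second column is the mount point
    missing = []
    for _device, mountpoint, fstype, options in parse_fstab(fstab_text):
        if fstype not in NETWORK_FS or "noauto" in options:
            continue  # only entries that should be mounted at boot
        if mountpoint not in mounted:
            missing.append(mountpoint)
    return missing


def check_system(fstab_path="/etc/fstab", mounts_path="/proc/mounts"):
    """Read the live files and print every network mount that is missing."""
    with open(fstab_path) as fstab, open(mounts_path) as mounts:
        for mountpoint in missing_network_mounts(fstab.read(), mounts.read()):
            print("NOT MOUNTED:", mountpoint)
```

Calling check_system() from a late boot job or from cron would at least
give a log line that differs between a good boot and a bad one.
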
> Take the following example of a server boot. Let's also assume that nothing
> goes wrong that could lead to a busybox console. (It certainly can!)
> So, you're now sitting in front of a nice prompt. Everything looks ok, but
> is it? The server mounts NFS shares from another server, it runs
> KVM/libvirt with a netfs storage pool for its virtual machines and a
> quasselcore for IRC that stores its data on a postgresql on another
> server. The local filesystem uses mdadm for RAID1 and LVM on top of that.
> Very server-like. (I once made this setup to test some things.) In order
> to keep things under control, there are /no/ LVM snapshots. That is
> another ugly story.
> So, what happens now:
> - The RAID will be broken! 
> - The NFS shares in /etc/fstab might not be mounted, 
> even when you told the system to wait with _netdev. 
> - Your virtual machines on netfs will not be running. 
> - The quasselcore with external db will not be started. 
> The array can be assembled by running a command and all of the above
> daemons can be started manually.
> I talked about some of those topics on IRC, and the following workarounds
> came up. There are also some workarounds in the bug reports.
> - Put NFS shares in /etc/fstab, and don't configure them as netfs storage
> - Put the IP addresses of your NFS servers in /etc/hosts.
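
For the /etc/hosts workaround, it may help to find which NFS servers
in fstab would still need DNS at mount time. The following is only a
sketch under assumptions: the function names are invented here, it handles
the plain "server:/path" form only (no bracketed IPv6), and it compares
host names case-insensitively.

```python
import ipaddress


def nfs_servers(fstab_text):
    """Return the server part of every nfs/nfs4 entry in fstab text."""
    servers = []
    for line in fstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) >= 3 and fields[2] in ("nfs", "nfs4") and ":" in fields[0]:
            servers.append(fields[0].split(":", 1)[0])
    return servers


def hosts_names(hosts_text):
    """Return all names and aliases defined in hosts-file text, lowercased."""
    names = set()
    for line in hosts_text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop trailing comments
        names.update(name.lower() for name in fields[1:])
    return names


def is_ip_literal(value):
    """True if value is a literal IP address and needs no lookup."""
    try:
        ipaddress.ip_address(value)
        return True
    except ValueError:
        return False


def servers_needing_dns(fstab_text, hosts_text):
    """Return NFS server names that are neither IPs nor listed in hosts."""
    known = hosts_names(hosts_text)
    return [server for server in nfs_servers(fstab_text)
            if not is_ip_literal(server) and server.lower() not in known]
```

Feeding it the contents of /etc/fstab and /etc/hosts lists the servers
that still depend on name resolution being up when the mount is attempted.
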
> For most servers, speeding up the boot process is less important than
> reliability. Why not take a look at how Debian does it? You can disable
> running the boot scripts in parallel with 'CONCURRENCY=none' in
> Also, think about daemons of commercial software without upstart scripts.
> You never know whether they will start at boot or not.
>  "init: support logging of job output"
>  "Gave up waiting for root device after upgrade then busybox console"
>  "karmic rc: root device sometimes not found"
>  "mdadm cannot assemble array as cannot open drive with O_EXCL"
>  "mdadm cannot assemble array"
>  "nfs mounts specified in fstab is not mounted on boot."
>  "nfs shares are not automounted anymore in intrepid"
>  "_netdev not working"
>  "Libvirt NFS mount on boot."
>  "quasselcore does not connect to database at boot"
This is exactly the kind of detailed feedback I was hoping to get. Thank you.
I suspect we'll need to have several UDS sessions around server boot in order
to lay out a comprehensive plan of attack. The release before the next LTS is
definitely the cycle to hit this.