hang in failsafe.conf on precise

Clint Byrum clint at ubuntu.com
Wed Apr 25 16:59:19 UTC 2012


Excerpts from Christoph Mathys's message of Wed Apr 25 00:26:59 -0700 2012:
> I've been referred here from ubuntu-devel-discuss; see below for the problem.
> 
> On Tue, Apr 24, 2012 at 3:47 PM, Evan Huus <eapache at gmail.com> wrote:
> > On Tue, Apr 24, 2012 at 6:39 AM, Christoph Mathys <eraserix at gmail.com>
> > wrote:
> >>
> >> I just encountered some problems with very long boot times on precise.
> >> failsafe.conf just hangs until the timeout has elapsed.
> >>
> >> The culprit seems to be that I define interfaces in
> >> /etc/network/interfaces that do not exist when I'm testing in kvm
> >> (ifup -a fails). This then seems to prevent static-network-up from being
> >> emitted. I'm not quite sure why this event is never emitted. Is
> >> static-network-up only emitted if the job networking ("exec ifup -a")
> >> runs successfully? (I've disabled network-interface.conf.)
> >
> >
> > The static-network-up event is emitted by ifup using the
> > /etc/network/if-up.d/upstart script, and I believe it's only emitted when
> > all the 'auto' interfaces in /etc/network/interfaces are successfully
> > brought up (upstart is more my area than networking). The event is necessary
> > for boot to proceed safely, so if it doesn't happen, nothing past that point
> > will run until the failsafe kicks in after 120 seconds (which is what you're
> > seeing).
> >
> >> As a workaround I think I'll just disable failsafe.conf and write my
> >> own job which immediately emits the static-network-up event.
> >
> >
> > You don't have to disable failsafe.conf. As long as something emits
> > static-network-up in a reasonable amount of time it won't cause any
> > problems, and it's useful to have active in other cases.
> 
> True.
> 
> > Writing another job that immediately emits the static-network-up event is
> > problematic in that it may be run before any network interfaces have
> > actually been brought up. This will cause all sorts of trouble for jobs that
> > start expecting to find active interfaces but then can't.
> 
> Just to be clear: "immediately" means "on stopped networking".
> ifup/ifdown seem to run pretty synchronously. If I put a script with
> "sleep 10" into if-up.d (didn't know about this folder btw, thanks!),
> ifup -a will just block for 10s before returning. So I assume the
> interfaces are up after ifup -a has returned (from ifup's point of
> view). If they are not, no amount of waiting will change that, unless
> we have some hotplug events that come in and trigger
> network-interface.conf, but I don't know if this does happen.
> 
> So, what's the reasoning behind implementing this failsafe network configuration?
> 
> > I believe the correct thing to do in this case is to remove the offending
> > entries from /etc/network/interfaces, since they're apparently unnecessary
> > in this particular environment. If there's some reason you'd rather not, try
> > sending an email to the upstart-devel [1] mailing list. Someone there might
> > know a better workaround.
> 
> I can't remove them; they are just not needed for certain tests inside
> kvm (e.g. I don't need the fieldbus interface to check software
> deployment). In my particular case, I see no benefit in failsafe. All
> NICs are there from the start, and if ifup fails, no amount of waiting
> will change that. All this failsafe stuff seems to do is penalise me
> when some unexpected problem needs to be debugged.
> 

First off, thanks for having the patience and determination to work
through this and open up the discussion.

It may not seem that way, but you are benefiting from failsafe. Without
it, your system would *never* boot. As its name implies, it is there so
that no matter what happens with the network, your system will at least
boot and be available for local login.
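Conceptually, the boot gate behaves like the model below. This is a minimal Python sketch, not upstart's implementation: later jobs wait for static-network-up, and if it never arrives, failsafe stops waiting after a fixed timeout (the 120 seconds mentioned earlier in the thread) and lets the boot continue anyway. The function name is hypothetical, 'failsafe-boot' is an assumed name for the fallback signal, and the demo uses a tiny timeout instead of 120 seconds.

```python
import threading


def wait_for_network(static_network_up: threading.Event, timeout: float) -> str:
    """Hypothetical model of the boot gate (not upstart's actual code).

    Returns the name of the signal that lets the rest of the boot proceed:
    'static-network-up' if the network came up within `timeout` seconds,
    otherwise 'failsafe-boot' (the failsafe giving up on waiting).
    """
    # Event.wait() returns True as soon as the event is set, or False
    # once the timeout expires without the event being set.
    if static_network_up.wait(timeout):
        return "static-network-up"
    return "failsafe-boot"


if __name__ == "__main__":
    # All 'auto' interfaces came up: the boot continues immediately.
    up = threading.Event()
    up.set()
    print(wait_for_network(up, timeout=0.1))

    # An 'auto' interface never appears (e.g. missing under kvm): without
    # the timeout this would block forever; with it, the boot still proceeds.
    never = threading.Event()
    print(wait_for_network(never, timeout=0.1))
```

The point of the timeout is that the "never arrives" case degrades to a slow boot instead of a hung one.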

The reason we can't just move on immediately if all interfaces fail is
that we don't know whether some of them simply haven't been detected and
reported by the kernel yet. ifup -a is not the only way interfaces come
up. Also, there are chains of interfaces, some of which will be brought
up by udev events and some by 'ifup -a' later, depending on those events.
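Following Evan's description, static-network-up is only emitted once every 'auto' interface is up. The hypothetical helper below (a Python sketch, not the actual /etc/network/if-up.d/upstart script) compares the interfaces named on auto/allow-auto lines with the interfaces ifupdown has recorded as up. It assumes that state file holds one iface=logical entry per line (believed to be /run/network/ifstate on precise); the interface names and addresses are made up. Note what it cannot tell you: an interface in the pending set might still be reported by the kernel a few seconds later, or it might never exist at all, and from this data alone the two cases look identical.

```python
from __future__ import annotations

# Hypothetical /etc/network/interfaces; eth1 stands in for the fieldbus NIC.
INTERFACES = """\
auto lo
iface lo inet loopback

auto eth0 eth1
iface eth0 inet dhcp
iface eth1 inet static
    address 192.168.10.5
    netmask 255.255.255.0
"""

# Assumed ifstate contents: only lo and eth0 came up (eth1 is absent in kvm).
IFSTATE = "lo=lo\neth0=eth0\n"


def auto_interfaces(interfaces_text: str) -> set[str]:
    """Interfaces on 'auto' or 'allow-auto' lines (synonyms in interfaces(5))."""
    names: set[str] = set()
    for line in interfaces_text.splitlines():
        stripped = line.strip()
        # interfaces(5) only allows comments on a line of their own.
        if not stripped or stripped.startswith("#"):
            continue
        words = stripped.split()
        if words[0] in ("auto", "allow-auto"):
            names.update(words[1:])
    return names


def interfaces_up(ifstate_text: str) -> set[str]:
    """Physical interface names recorded as up, assuming 'iface=logical' lines."""
    return {
        line.split("=", 1)[0].strip()
        for line in ifstate_text.splitlines()
        if line.strip()
    }


def pending_auto_interfaces(interfaces_text: str, ifstate_text: str) -> set[str]:
    """'auto' interfaces not yet up, i.e. what still holds up the boot."""
    return auto_interfaces(interfaces_text) - interfaces_up(ifstate_text)


if __name__ == "__main__":
    print(sorted(pending_auto_interfaces(INTERFACES, IFSTATE)))  # ['eth1']
```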

We absolutely must provide administrators with a way to bring up their
systems reliably, and many services which are started in "runlevel 2"
require that all interfaces be configured when they start.

You want the interfaces brought up at boot time, but not waited on. This
is a bit of an odd setup, since you have basically specified an invalid
configuration. However, you can do this by removing them from the
'auto' group and adding them to a group which you manually bring
up later. This job would do it:

start on stopped networking

task
exec ifup -a --allow=othergroup

That should run right after udev and the kernel have detected all
cold-boot hardware (and after the filesystems are mounted, which is
needed so the state of the interfaces can be written).
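A minimal Python sketch of how the split works (a model of the class selection, not ifupdown's code): 'ifup -a' acts on interfaces listed on auto (or allow-auto) lines, while 'ifup -a --allow=othergroup' acts on those listed on allow-othergroup lines. The interface names, addresses and the file contents below are illustrative assumptions; only the class name othergroup comes from the job above.

```python
from __future__ import annotations

# Hypothetical /etc/network/interfaces: eth1 (stand-in for the fieldbus NIC)
# has been moved from 'auto' to a custom 'othergroup' class.
INTERFACES = """\
auto lo eth0
iface lo inet loopback
iface eth0 inet dhcp

allow-othergroup eth1
iface eth1 inet static
    address 192.168.10.5
    netmask 255.255.255.0
"""


def interfaces_in_class(interfaces_text: str, cls: str) -> list[str]:
    """Interfaces on 'auto'/'allow-auto' lines (cls == 'auto') or on
    'allow-<cls>' lines, in the order they first appear."""
    keywords = {"allow-" + cls}
    if cls == "auto":
        keywords.add("auto")  # interfaces(5): 'auto' and 'allow-auto' are synonyms
    names: list[str] = []
    for line in interfaces_text.splitlines():
        words = line.split()
        # Comments must be on a line of their own in interfaces(5).
        if not words or words[0].startswith("#"):
            continue
        if words[0] in keywords:
            for name in words[1:]:
                if name not in names:
                    names.append(name)
    return names


if __name__ == "__main__":
    # What the networking job's 'ifup -a' acts on (and what boot waits for):
    print(interfaces_in_class(INTERFACES, "auto"))        # ['lo', 'eth0']
    # What the extra job's 'ifup -a --allow=othergroup' acts on:
    print(interfaces_in_class(INTERFACES, "othergroup"))  # ['eth1']
```

With this layout only lo and eth0 gate static-network-up, so a missing eth1 under kvm no longer holds up the boot; its ifup simply fails inside the separate task.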

I do think we may want to consider adding another class like 'auto-nowait'
so that users can do this more easily, but for now, waiting the extra 2
minutes in these corner cases is the simplest solution.

There's a bug for this already:

https://bugs.launchpad.net/ubuntu/+source/upstart/+bug/964775

I've just commented on one possible solution for shrinking the failsafe
timeout when it's obvious there's nothing left to wait for. Not sure how
reliable that method would be, though.
