[Bug 688541] Re: race condition on shutdown (leads to corrupted fs)
Clint Byrum
clint at fewbar.com
Thu Dec 16 08:00:26 GMT 2010
So I've done some more thinking about this, and I had a bit of an aha!
moment.
While we *should* in fact stop using 'stop on runlevel [016]' or 'stop
on runlevel [!2345]', I think we can solve this without touching all of
those jobs.
/etc/init.d/sendsigs has this code:
    # Upstart jobs have their own "stop on" clauses that sends
    # SIGTERM/SIGKILL just like this, so if they're still running,
    # they're supposed to be
    for pid in $(initctl list | sed -n -e "/process [0-9]/s/.*process //p"); do
        OMITPIDS="${OMITPIDS:+$OMITPIDS }-o $pid"
    done
It uses this to determine which pids not to kill because, presumably, upstart should be managing them.
However, this code is flawed. killall5 will still kill the children of
these pids if they are multi-process daemons or scripts that run other
programs. This could only be solved by walking through /proc looking for
processes that have these as parent pids (and then doing the same again
with the new list, and so on).
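
A rough sketch of that /proc walk, for illustration only (this is not what
sendsigs does today). It assumes a Linux /proc in which each
/proc/<pid>/status has a "PPid:" line. The function name descendants and
the PROC_ROOT override are hypothetical; PROC_ROOT exists only so the walk
can be tried against a fake directory tree.

```shell
# Print every descendant of the pids given as arguments, one per line.
# Breadth-first: find the children of the current list, then repeat
# with the children as the new list until a pass finds nothing.
descendants() {
    proc_root=${PROC_ROOT:-/proc}   # hypothetical override, for testing
    frontier="$*"
    while [ -n "$frontier" ]; do
        next=""
        for status in "$proc_root"/[0-9]*/status; do
            # A process may exit while we scan; skip unreadable entries.
            ppid=$(sed -n 's/^PPid:[[:space:]]*//p' "$status" 2>/dev/null) || continue
            for parent in $frontier; do
                if [ "$ppid" = "$parent" ]; then
                    child=${status%/status}
                    child=${child##*/}
                    echo "$child"
                    next="$next $child"
                fi
            done
        done
        frontier=$next
    done
}
```

Each pid printed could then be appended to OMITPIDS the same way as the
pids reported by initctl. The walk is inherently racy, since processes can
fork between passes.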
However, this technique can actually be used to determine if there are
still jobs that are supposed to be stopped, but haven't finished
stopping yet. Since they should be listed as
stop/(pre-stop|post-stop|killed), we can determine exactly which pids we
expect to go away.
Since upstart has its own idea of how long to wait before it kills
these, we should actually wait indefinitely.
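
A minimal sketch of that idea, not the attached debdiff. It assumes
initctl list prints lines of the form "job goal/state, process PID", as the
existing sed expression in sendsigs implies. The function names
stopping_pids and wait_for_stopping_jobs are hypothetical.

```shell
# Read `initctl list` output on stdin and print the pid of every job
# whose goal is "stop" but which still has a process in the pre-stop,
# killed or post-stop state.
stopping_pids() {
    sed -n -E 's/^.* stop\/(pre-stop|killed|post-stop), process ([0-9]+).*$/\2/p'
}

# Poll until upstart reports no job that is still in the middle of
# stopping. There is deliberately no timeout here: upstart applies its
# own kill timeout to these jobs.
wait_for_stopping_jobs() {
    while [ -n "$(initctl list | stopping_pids)" ]; do
        sleep 1
    done
}
```

sendsigs would call wait_for_stopping_jobs before running killall5. If any
job never leaves one of these states, the loop never returns, which is the
hang risk mentioned below.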
I'm attaching a debdiff that solves the race as far as I can tell,
though I think it needs a good long look, since it could mean shutdowns
hang for a long time waiting. (I'm especially curious whether the
pre-stop/post-stop scripts are subject to the kill timeout.)
--
You received this bug notification because you are a member of Ubuntu
Server Team, which is subscribed to mysql-5.1 in ubuntu.
https://bugs.launchpad.net/bugs/688541