Need app-specific shutdown delay to allow saving state

Jim Avera james_avera at yahoo.com
Thu Feb 19 23:27:35 GMT 2009


Hi, does anyone know if work is being done to allow applications to
control how much time they get to exit gracefully during system
shut-down? Currently it is 10 seconds, hard-coded in
/etc/init.d/{sendsigs,killprocs}. This is the maximum time between
kill -15 and kill -9. If so, please point me to the appropriate place.

If the answer is "no", then I'd like to start a discussion. Apologies
if this is not the right forum...
--------------------
Applications should be able to request an arbitrary (within limits)
amount of time to respond to system shut-downs. Currently, shut-down
causes signal 15 (SIGTERM) to be sent to all processes, followed by
signal 9 (SIGKILL) at most 10 seconds later (see /etc/init.d/sendsigs).
This is not enough time for some critical apps (e.g. Virtual Machines).
Some way is needed for apps to declare how much time they need and be
given at least that much to respond to a shutdown event.
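
For reference, the current behaviour described above amounts to roughly
the following, applied to one process. This is only an illustrative
sketch in Python, not the actual sendsigs script (which is a shell
script acting on all processes); the function name terminate_with_grace
and its grace_seconds parameter are made up for the example.

```python
import signal
import subprocess


def terminate_with_grace(proc, grace_seconds=10.0):
    """Send SIGTERM, wait up to grace_seconds, then fall back to SIGKILL.

    proc is a subprocess.Popen object. Returns "exited" if the process
    went away on its own within the grace period, "killed" otherwise.
    """
    proc.send_signal(signal.SIGTERM)
    try:
        proc.wait(timeout=grace_seconds)
        return "exited"
    except subprocess.TimeoutExpired:
        # Grace period used up: the process gets no further chance.
        proc.kill()  # SIGKILL on POSIX systems
        proc.wait()
        return "killed"
```

The point of the proposal is that grace_seconds is the same fixed value
for every process today, however much state the process has to save.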

To protect against malicious programs delaying shutdown forever,
requests for more than the default (e.g. 10 seconds) could be
restricted to EUIDs in a certain group.

One implementation idea is to move the kill15-wait-kill9 logic
from /etc/init.d scripts to a daemon process which runs continuously.
Application processes could register with the daemon, which would keep
track of the time limit for each process individually (the daemon would
keep its state in a file in case it died and had to be restarted).
Init scripts could tell this "grimreaperd" process when to initiate the
killing, and wait until it was done before continuing with the shut-down
(perhaps using signals).
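
To make the idea a bit more concrete, here is a rough sketch of the
bookkeeping such a daemon might do, in Python. Everything in it is
hypothetical: the class and method names, the 600-second ceiling for
privileged requests, and the "privileged" flag (which stands in for the
group-membership check suggested above). It leaves out the IPC, the
state file, and the actual signalling.

```python
from dataclasses import dataclass, field

DEFAULT_GRACE = 10.0          # seconds; today's hard-coded value
MAX_PRIVILEGED_GRACE = 600.0  # assumed ceiling, chosen only for illustration


@dataclass
class ReaperRegistry:
    """Per-process grace periods that a 'grimreaperd' could keep."""
    grace_by_pid: dict = field(default_factory=dict)

    def register(self, pid, requested_seconds, privileged):
        """Record a request and return the grace period actually granted."""
        if requested_seconds <= 0:
            raise ValueError("grace period must be positive")
        # Unprivileged callers cannot get more than the default.
        limit = MAX_PRIVILEGED_GRACE if privileged else DEFAULT_GRACE
        granted = min(requested_seconds, limit)
        self.grace_by_pid[pid] = granted
        return granted

    def unregister(self, pid):
        self.grace_by_pid.pop(pid, None)

    def kill_schedule(self, pids, now=0.0):
        """Return (deadline, pid) pairs sorted by deadline.

        The deadline is when signal 9 becomes due for a process that is
        still alive; unregistered processes get the default grace.
        """
        return sorted(
            (now + self.grace_by_pid.get(pid, DEFAULT_GRACE), pid)
            for pid in pids
        )
```

With this, the init script would only need to wait until the last
deadline in the schedule has passed, or all processes have exited,
whichever comes first.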
---------------------
This issue has recent urgency because of the advent of Virtual Machines.

With VMs, the host OS can respond to a shutdown event, e.g. low-battery
detected, but there is currently no way for guest OSs running inside
Virtual Machines to do anything.  The result is that VMs are simply
killed, and the guests experience a "virtual plug-pulled-out" rather
than a controlled shut-down. VM managers *could* catch signal 15, but
there is not enough time for them to save the VM state or, in general,
for the guest OS to shut down cleanly. This may be catastrophic (worse
than loss of power in a real system) because virtual disk emulators may
hold arbitrary buffered data which is never written to the host's file
system; this has reportedly caused unrecoverable file system corruption
(see this thread in the VirtualBox forum). Even if guests use journalled
file systems they might be hosed, because flush-to-disk in a guest does
not guarantee that the data makes it to disk in the host; writes might
even be re-ordered. So some way is needed for VM managers to have
enough time to respond when the host is shutting down.

IMO it is not just Virtual Machines; any application might need more
time to shut down nowadays because of huge RAM sizes. When 256M was a
lot of memory, 10 seconds was enough. But with 24G of RAM it could
take several minutes to cleanly turn off large databases or multiple
VMs.
------------------
All of the above could also be said about killing processes at log-out.
Currently, as far as I know, nobody cares if user processes never exit
-- but we should care, to prevent run-away loopers, etc. (legitimately
persistent processes should call setsid() to leave the session if they
do not want to be killed). If a "send sig 9 eventually" policy is
implemented for logout, it would also have to honor the delays
requested by apps which asked for more time. The above-described daemon
could handle logout-related kills too if it accepted a SessionID
parameter (to kill only processes with that SID).
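
Selecting the processes of one session could look something like the
following sketch, which uses Python's os.getsid() (a thin wrapper around
the getsid() system call). The helper name pids_in_session is invented
for the example, and a real daemon would need its own way of listing
candidate PIDs.

```python
import os


def pids_in_session(pids, sid):
    """Return the PIDs from pids whose session ID equals sid."""
    matching = []
    for pid in pids:
        try:
            if os.getsid(pid) == sid:
                matching.append(pid)
        except (ProcessLookupError, PermissionError):
            # Process already gone, or not visible to us: skip it.
            continue
    return matching
```

A process that has called setsid() gets a new session ID and so drops
out of the result, which matches the "exit the session to survive
logout" rule above.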

Any thoughts?
-Jim


More information about the ubuntu-devel mailing list