[RFC] [PATCH] notify init daemon when children are reparented to it

Tue Dec 16 16:27:26 UTC 2008

Please review and comment on the attached patch.

Background (UNIX 101):

All processes must have a parent.  When a child dies, the parent is
notified by SIGCHLD and must use the wait() system call to reap the
remaining zombie.

When a process dies, its children are reparented to the init daemon so
that there's always a process to be notified of their eventual death.

(The init daemon cannot die.)

As well as a parent, processes also have a process group and a session.
This is quite complicated, so much so that it takes up an entire chapter
of Stevens which few people claim to have read, let alone understood.

It all comes down to connecting the life of a process to the life of a
terminal.  Daemons don't want to be connected like that, so they perform
a little dance:

 - they fork(), creating a child
 - the original process (child of the shell) exits, the child carries on
   but is now reparented to init
 - the child calls setsid() to move to a new session and process
   group; it's now completely disconnected from the shell and terminal
 - *but* due to a quirk of POSIX, if it were to be made to open() a tty
   device, it would end up owning it!  FAIL.
 - so the process fork()s again, creating a new child
 - the process (child of the child of the shell) exits, the new child
   carries on and is reparented to init

Thus the daemon is a child of init, and in its own process group and
session which is not connected to any shell or terminal.  Win.
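
The dance above can be sketched in C roughly as follows.  This is a
minimal illustration rather than any particular daemon's code; the
daemonise() name is made up for the example, and real daemons usually
also chdir("/"), reset the umask and redirect the standard file
descriptors.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Illustrative double fork; daemonise() is a hypothetical helper name.
 * Returns 0 in the final daemon process and -1 on error.  The two
 * intermediate processes never return: they _exit(0).
 */
int daemonise(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid > 0)
        _exit(0);           /* original process (child of the shell) exits */

    if (setsid() < 0)       /* new session and process group, no terminal */
        return -1;

    pid = fork();
    if (pid < 0)
        return -1;
    if (pid > 0)
        _exit(0);           /* session leader exits */

    /* Not a session leader, so open()ing a tty cannot make it ours. */
    assert(getsid(0) != getpid());
    return 0;
}
```

Note that the pid the caller originally spawned is gone after the first
_exit(), which is exactly what makes supervision awkward.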

Well, almost a win.  The trouble is that this dance also happens to
completely disconnect it from any kind of process supervisor.

It wouldn't be so bad, except that most well-written daemons don't
actually daemonise until after they've finished initialisation - they're
even usually listening on the right socket and everything.  The
daemonisation is more than just an escape from the shell and terminal,
it's notification that they are ready.

We want to be able to supervise daemons.

Init has a head-start; it's the eventual parent of daemon processes
anyway, so it will be notified of their death by SIGCHLD and receives
their exit status information through wait().
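
As a rough sketch of what that looks like today (the function and
callback names here are invented for illustration), an init daemon
might drain exited children like this after SIGCHLD arrives:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical callback type: told the pid and raw wait status. */
typedef void (*exit_handler)(pid_t pid, int status);

/*
 * Reap every child that has already exited, without blocking.
 * Returns the number of children reaped.
 */
int reap_children(exit_handler handler)
{
    int reaped = 0;
    int status;
    pid_t pid;

    while ((pid = waitpid(-1, &status, WNOHANG)) > 0) {
        /* Without WUNTRACED/WCONTINUED only terminations are reported. */
        assert(WIFEXITED(status) || WIFSIGNALED(status));
        if (handler)
            handler(pid, status);
        reaped++;
    }
    return reaped;
}
```

All the handler gets is a pid and a status; nothing ties that pid back
to the service that was originally started.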

So you can't escape from init.  But this isn't ideal: while init can see
the process death, it has no idea which process that was, or what it is
supposed to do about it.

If there are two apache2 daemons running (in different chroots, or for
different IPs or ports?), it doesn't know which of the two died because
the PID that died is unknown to it.

Likewise it can't provide status information as to whether either is
running or not, since the only PIDs it knew exited immediately after it
ran them.

Why do it in the kernel?

Frankly because this cannot be done in userspace without the kernel's
help, or without modifying daemon code to behave differently (and
incompatibly with other systems).

The closest I've come to a race-free way to do this so far is by having
init ptrace() every process it runs so it can follow calls to fork() and
exec().

People look at me strangely when they find out about that (plus it
doesn't work so well).

About the patch:

The patch adds a new PR_{GET,SET}_ADOPTSIG prctl, similar to the
existing PR_{GET,SET}_PDEATHSIG control and with similar semantics.

 - When non-zero, the process will receive the given signal if another
   process is reparented to it.

 - This signal has the pid of the reparented process in the si_pid field
   of the siginfo_t.

 - The signal also has the pid of the *previous* parent process in the
   si_status field.

 - Notification is disabled after exec() or setsid().

The functionality only affects the init daemon, and only if the init
daemon activates the prctl().  [There is already other init-daemon
specific code in the kernel, and there are already other specialist
signals activated by prctl() - so this is consistent].
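
For comparison, this is roughly how the existing PR_{GET,SET}_PDEATHSIG
pair is used from C.  PR_{GET,SET}_ADOPTSIG only exists with the patch
applied, so it is not used here; the assumption is that it would be
called in the same way.  The wrapper names are made up for the example.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <sys/prctl.h>

/* Ask the kernel to send `signo` to this process when its parent dies;
 * 0 disables the notification.  Returns 0 on success, -1 on error. */
int set_parent_death_signal(int signo)
{
    assert(signo >= 0);
    return prctl(PR_SET_PDEATHSIG, (unsigned long)signo, 0, 0, 0);
}

/* Read the current setting back.  Returns the signal number
 * (0 = disabled) or -1 on error. */
int get_parent_death_signal(void)
{
    int signo = 0;

    if (prctl(PR_GET_PDEATHSIG, (unsigned long)&signo, 0, 0, 0) < 0)
        return -1;
    return signo;
}
```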

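Since the siginfo_t contains useful information, the signal should
generally be a realtime signal (>= SIGRTMIN).  Standard signals are not
queued, so if several adoptions happen before the signal is handled,
only the information from the first will be received.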
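
The queueing difference can be seen without the patch.  The sketch
below (count_delivered() is a name invented for the example) blocks a
signal, sends it to itself several times with sigqueue(), and counts
how many notifications can be read back through a signalfd:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <sys/signalfd.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Queue `signo` to ourselves `count` times while it is blocked, then
 * return how many separate notifications are readable, or -1 on error.
 * The signal is left blocked on return.
 */
int count_delivered(int signo, int count)
{
    sigset_t mask;
    struct signalfd_siginfo info;
    int delivered = 0;

    assert(count >= 0);
    sigemptyset(&mask);
    sigaddset(&mask, signo);
    if (sigprocmask(SIG_BLOCK, &mask, NULL) < 0)
        return -1;

    int fd = signalfd(-1, &mask, SFD_NONBLOCK);
    if (fd < 0)
        return -1;

    for (int i = 0; i < count; i++) {
        union sigval value = { .sival_int = i };
        if (sigqueue(getpid(), signo, value) < 0) {
            close(fd);
            return -1;
        }
    }

    /* Each successful read consumes one pending siginfo. */
    while (read(fd, &info, sizeof info) == (ssize_t)sizeof info)
        delivered++;

    close(fd);
    return delivered;
}
```

In a single-threaded process on Linux, count_delivered(SIGRTMIN, 3)
should return 3 while count_delivered(SIGUSR1, 3) returns 1, which is
why a standard signal would lose adoption events.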

From userspace:

The init daemon requests notification of process adoption by a realtime
signal, and then presumably uses sigaction or signalfd to read the
siginfo_t structures.

It tracks the pid of any process it spawns.

Should that process die, it will receive SIGCHLD.  However, also pending
will be the requested signal (SIGRTMIN?) with si_status set to the pid
of the process that just died.

The init daemon reaps the original child, and updates its pid to that of
the new child obtained from the si_pid of the requested signal.
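
The bookkeeping this implies is small.  A hedged sketch, with a made-up
struct job table standing in for whatever the init daemon really keeps,
and with old_parent and new_pid taken from the si_status and si_pid
fields described above:

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical record for one supervised service. */
struct job {
    const char *name;
    pid_t pid;          /* pid currently believed to be the service */
};

/*
 * Apply one adoption notification.  If `old_parent` is a pid we track,
 * follow it to `new_pid` and return the job; otherwise return NULL.
 */
struct job *job_adopt(struct job *jobs, size_t njobs,
                      pid_t old_parent, pid_t new_pid)
{
    assert(jobs != NULL || njobs == 0);

    for (size_t i = 0; i < njobs; i++) {
        if (jobs[i].pid == old_parent) {
            jobs[i].pid = new_pid;
            return &jobs[i];
        }
    }
    return NULL;
}
```

A double fork simply produces two notifications, so calling job_adopt()
once per signal follows the chain from the spawned pid to the final
daemon.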

And thus we have simple, race-free supervision of daemon processes by an
init daemon.

Scott
-- 
Scott James Remnant
scott at canonical.com
-------------- next part --------------
A non-text attachment was scrubbed...
Name: notify-adoption-3.patch
Type: text/x-patch
Size: 5818 bytes
Desc: 
URL: <https://lists.ubuntu.com/archives/kernel-team/attachments/20081216/8f521ad7/attachment.bin>