[Bug 1168526] Re: race condition causing lxc to not detect container init process exit

Serge Hallyn 1168526 at bugs.launchpad.net
Wed Jul 3 21:10:40 UTC 2013

There is at least one fundamental bug in start.c's signal_handler, as should
be fixed by the below.  However, this alone did not fix it for me, so more is wrong.  There is a minimal testcase at http://people.canonical.com/~serge/signalfd.c which originally reproduced this bug, then was fixed by using the waitpid loop as below.

>From 7333b77759794f5420ea6898494073a28cac445f Mon Sep 17 00:00:00 2001
From: Serge Hallyn <serge.hallyn at ubuntu.com>
Date: Wed, 3 Jul 2013 14:13:08 -0500
Subject: [PATCH 1/1] start.c:signal_handler: fix wrong assumption about

The monitor process adds a signalfd to the set of fds it watches
with epoll.  It calls signal_handler() when a signal is found in
the sigfd.  That returns 1 if the container init was found to
be the exiting process.

The flaw in reasoning (pointed out by Andy Whitcroft - thanks!)
here is that if two children have exited, we assume we will get
two sigchilds - in fact we may only get one.  So when we get one,
we need to reap all the children we have and check if any is the
container init.

Signed-off-by: Serge Hallyn <serge.hallyn at ubuntu.com>
 src/lxc/start.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/src/lxc/start.c b/src/lxc/start.c
index 8c8af9c..3fa50ad 100644
--- a/src/lxc/start.c
+++ b/src/lxc/start.c
@@ -162,7 +162,7 @@ static int signal_handler(int fd, void *data,
 			   struct lxc_epoll_descr *descr)
 	struct signalfd_siginfo siginfo;
-	int ret;
+	int ret, status, retval = 0;
 	pid_t *pid = data;
 	ret = read(fd, &siginfo, sizeof(siginfo));
@@ -188,16 +188,16 @@ static int signal_handler(int fd, void *data,
 		return 0;
-	/* more robustness, protect ourself from a SIGCHLD sent
-	 * by a process different from the container init
+	/*
+	 * wait for any and all children which are ours which are reapable,
+	 * since if >1 children have exited, we'll only get one sigchld.
-	if (siginfo.ssi_pid != *pid) {
-		WARN("invalid pid for SIGCHLD");
-		return 0;
-	}
+	while ((ret = waitpid(-1, &status, WNOHANG)) > 0)
+		if (ret == *pid)
+			retval = 1; // we reaped the container init
 	DEBUG("container init process exited");
-	return 1;
+	return retval;
 int lxc_set_state(const char *name, struct lxc_handler *handler, lxc_state_t state)

** Changed in: lxc (Ubuntu)
       Status: Confirmed => Triaged

You received this bug notification because you are a member of Ubuntu
Server Team, which is subscribed to lxc in Ubuntu.

  race condition causing lxc to not detect container init process exit

To manage notifications about this bug go to:

More information about the Ubuntu-server-bugs mailing list