[Bug 1168526] Re: race condition causing lxc to not detect container init process exit

Serge Hallyn 1168526 at bugs.launchpad.net
Wed Jul 3 21:10:40 UTC 2013


There is at least one fundamental bug in start.c's signal_handler, as should
be fixed by the below.  However, this alone did not fix it for me, so more is wrong.  There is a minimal testcase at http://people.canonical.com/~serge/signalfd.c which originally reproduced this bug, then was fixed by using the waitpid loop as below.

>From 7333b77759794f5420ea6898494073a28cac445f Mon Sep 17 00:00:00 2001
From: Serge Hallyn <serge.hallyn at ubuntu.com>
Date: Wed, 3 Jul 2013 14:13:08 -0500
Subject: [PATCH 1/1] start.c:signal_handler: fix wrong assumption about
 sigchld

The monitor process adds a signalfd to the set of fds it watches
with epoll.  It calls signal_handler() when a signal is found in
the sigfd.  That returns 1 if the container init was found to
be the exiting process.

The flaw in reasoning (pointed out by Andy Whitcroft - thanks!)
here is that if two children have exited, we assume we will get
two sigchilds - in fact we may only get one.  So when we get one,
we need to reap all the children we have and check if any is the
container init.

Signed-off-by: Serge Hallyn <serge.hallyn at ubuntu.com>
---
 src/lxc/start.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/src/lxc/start.c b/src/lxc/start.c
index 8c8af9c..3fa50ad 100644
--- a/src/lxc/start.c
+++ b/src/lxc/start.c
@@ -162,7 +162,7 @@ static int signal_handler(int fd, void *data,
 			   struct lxc_epoll_descr *descr)
 {
 	struct signalfd_siginfo siginfo;
-	int ret;
+	int ret, status, retval = 0;
 	pid_t *pid = data;
 
 	ret = read(fd, &siginfo, sizeof(siginfo));
@@ -188,16 +188,16 @@ static int signal_handler(int fd, void *data,
 		return 0;
 	}
 
-	/* more robustness, protect ourself from a SIGCHLD sent
-	 * by a process different from the container init
+	/*
+	 * wait for any and all children which are ours which are reapable,
+	 * since if >1 children have exited, we'll only get one sigchld.
 	 */
-	if (siginfo.ssi_pid != *pid) {
-		WARN("invalid pid for SIGCHLD");
-		return 0;
-	}
+	while ((ret = waitpid(-1, &status, WNOHANG)) > 0)
+		if (ret == *pid)
+			retval = 1; // we reaped the container init
 
 	DEBUG("container init process exited");
-	return 1;
+	return retval;
 }
 
 int lxc_set_state(const char *name, struct lxc_handler *handler, lxc_state_t state)
-- 
1.8.1.2


** Changed in: lxc (Ubuntu)
       Status: Confirmed => Triaged

-- 
You received this bug notification because you are a member of Ubuntu
Server Team, which is subscribed to lxc in Ubuntu.
https://bugs.launchpad.net/bugs/1168526

Title:
  race condition causing lxc to not detect container init process exit

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/lxc/+bug/1168526/+subscriptions



More information about the Ubuntu-server-bugs mailing list