[berrange at redhat.com: [PATCH] Forbid invocation of kexec_load() outside initial PID namespace]

Serge E. Hallyn serge at hallyn.com
Tue Aug 7 15:01:27 UTC 2012


(Hopefully my unsubscribed account can email kernel-team)

Hi,

this patch will probably not hit upstream, because the 'proper' fix is
user namespaces.  User namespaces however won't be ready until after
quantal.  So I'd like this patch to be applied in precise and quantal
if possible.

Problem:

Containers are granted CAP_SYS_BOOT.  The reboot path in the kernel checks
whether you are in the initial pidns, and, if not, sends a signal to your
parent indicating you were 'rebooted' or 'shut down'.  So there is no
danger of a container rebooting the host.

However, CAP_SYS_BOOT also authorized kexec, without the pidns check.
Therefore, containers are able to kexec a new kernel, which is obviously
a bad thing.

This patch prevents that by only allowing kexec from the initial pid
namespace.  It is nacked by Eric Biederman (but acked by me) because
he feels this should be stopped by having the container in a private
user namespace, with the kexec cap_sys_boot check targeted to the initial
user namespace.  As I said, that won't be doable during quantal timeframe.

thanks,
-serge

----- Forwarded message from "Daniel P. Berrange" <berrange at redhat.com> -----

Date: Fri,  3 Aug 2012 11:53:04 +0100
From: "Daniel P. Berrange" <berrange at redhat.com>
To: linux-kernel at vger.kernel.org
Cc: containers at lists.linux-foundation.org,
	"Daniel P. Berrange" <berrange at redhat.com>,
	Serge Hallyn <serge.hallyn at canonical.com>,
	Daniel Lezcano <daniel.lezcano at free.fr>,
	Michael Kerrisk <mtk.manpages at gmail.com>,
	"Eric W. Biederman" <ebiederm at xmission.com>,
	Tejun Heo <tj at kernel.org>, Oleg Nesterov <oleg at redhat.com>
Subject: [PATCH] Forbid invocation of kexec_load() outside initial PID namespace

From: "Daniel P. Berrange" <berrange at redhat.com>

The following commit

    commit cf3f89214ef6a33fad60856bc5ffd7bb2fc4709b
    Author: Daniel Lezcano <daniel.lezcano at free.fr>
    Date:   Wed Mar 28 14:42:51 2012 -0700

    pidns: add reboot_pid_ns() to handle the reboot syscall

introduced custom handling of the reboot() syscall when invoked
from a non-initial PID namespace. The intent was that a process
in a container can be allowed to keep CAP_SYS_BOOT and execute
reboot() to shutdown/reboot just their private container, rather
than the host.

Unfortunately the kexec_load() syscall also relies on the
CAP_SYS_BOOT capability. So by allowing a container to keep
this capability to safely invoke reboot(), they mistakenly
also gain the ability to use kexec_load(). The solution is
to make kexec_load() return -EPERM if invoked from a PID
namespace that is not the initial namespace

Signed-off-by: Daniel P. Berrange <berrange at redhat.com>
Cc: Serge Hallyn <serge.hallyn at canonical.com>
Cc: Daniel Lezcano <daniel.lezcano at free.fr>
Cc: Michael Kerrisk <mtk.manpages at gmail.com>
Cc: "Eric W. Biederman" <ebiederm at xmission.com>
Cc: Tejun Heo <tj at kernel.org>
Cc: Oleg Nesterov <oleg at redhat.com>
---
 kernel/kexec.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/kernel/kexec.c b/kernel/kexec.c
index 0668d58..b152bde 100644
--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -947,6 +947,11 @@ SYSCALL_DEFINE4(kexec_load, unsigned long, entry, unsigned long, nr_segments,
 	if (!capable(CAP_SYS_BOOT))
 		return -EPERM;
 
+	/* Processes in containers must not be allowed to load a new
+	 * kernel, even if they have CAP_SYS_BOOT */
+	if (task_active_pid_ns(current) != &init_pid_ns)
+		return -EPERM;
+
 	/*
 	 * Verify we have a legal set of flags
 	 * This leaves us room for future extensions.
-- 
1.7.11.2

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo at vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

----- End forwarded message -----




More information about the kernel-team mailing list