ACK w/cmnt: [berrange at [PATCH] Forbid invocation of kexec_load() outside initial PID namespace]

Stefan Bader stefan.bader at
Tue Aug 7 15:27:33 UTC 2012

On 07.08.2012 17:01, Serge E. Hallyn wrote:
> (Hopefully my unsubscribed account can email kernel-team)
> Hi,
> this patch will probably not hit upstream, because the 'proper' fix is
> user namespaces.  User namespaces however won't be ready until after
> quantal.  So I'd like this patch to be applied in precise and quantal
> if possible.

On one hand I hate to deviate from upstream, but I guess I'd hate it even more
if someone booted a new kernel from inside a container. And at least for
Precise I would not really want to backport user namespaces, maybe not even
for Quantal. Given that, the patch looks sensible enough to SRU. But there
should be a Launchpad bug for it, or did I miss the link?

> Problem:
> Containers are granted CAP_SYS_BOOT.  The reboot path in the kernel checks
> whether you are in the initial pidns, and, if not, sends a signal to your
> parent indicating you were 'rebooted' or 'shut down'.  So there is no
> danger of a container rebooting the host.
> However, CAP_SYS_BOOT also authorizes kexec, without the pidns check.
> Therefore, containers are able to kexec a new kernel, which is obviously
> a bad thing.
> This patch prevents that by only allowing kexec from the initial pid
> namespace.  It is nacked by Eric Biederman (but acked by me) because
> he feels this should be stopped by having the container in a private
> user namespace, with the kexec cap_sys_boot check targeted to the initial
> user namespace.  As I said, that won't be doable during quantal timeframe.
> thanks,
> -serge
> ----- Forwarded message from "Daniel P. Berrange" <berrange at> -----
> Date: Fri,  3 Aug 2012 11:53:04 +0100
> From: "Daniel P. Berrange" <berrange at>
> To: linux-kernel at
> Cc: containers at,
> 	"Daniel P. Berrange" <berrange at>,
> 	Serge Hallyn <serge.hallyn at>,
> 	Daniel Lezcano <daniel.lezcano at>,
> 	Michael Kerrisk <mtk.manpages at>,
> 	"Eric W. Biederman" <ebiederm at>,
> 	Tejun Heo <tj at>, Oleg Nesterov <oleg at>
> Subject: [PATCH] Forbid invocation of kexec_load() outside initial PID namespace
> From: "Daniel P. Berrange" <berrange at>
> The following commit
>     commit cf3f89214ef6a33fad60856bc5ffd7bb2fc4709b
>     Author: Daniel Lezcano <daniel.lezcano at>
>     Date:   Wed Mar 28 14:42:51 2012 -0700
>     pidns: add reboot_pid_ns() to handle the reboot syscall
> introduced custom handling of the reboot() syscall when invoked
> from a non-initial PID namespace. The intent was that a process
> in a container can be allowed to keep CAP_SYS_BOOT and execute
> reboot() to shutdown/reboot just their private container, rather
> than the host.
> Unfortunately the kexec_load() syscall also relies on the
> CAP_SYS_BOOT capability. So by allowing a container to keep
> this capability to safely invoke reboot(), they mistakenly
> also gain the ability to use kexec_load(). The solution is
> to make kexec_load() return -EPERM if invoked from a PID
> namespace that is not the initial namespace.
> Signed-off-by: Daniel P. Berrange <berrange at>
> Cc: Serge Hallyn <serge.hallyn at>
> Cc: Daniel Lezcano <daniel.lezcano at>
> Cc: Michael Kerrisk <mtk.manpages at>
> Cc: "Eric W. Biederman" <ebiederm at>
> Cc: Tejun Heo <tj at>
> Cc: Oleg Nesterov <oleg at>
> ---
>  kernel/kexec.c | 5 +++++
>  1 file changed, 5 insertions(+)
> diff --git a/kernel/kexec.c b/kernel/kexec.c
> index 0668d58..b152bde 100644
> --- a/kernel/kexec.c
> +++ b/kernel/kexec.c
> @@ -947,6 +947,11 @@ SYSCALL_DEFINE4(kexec_load, unsigned long, entry, unsigned long, nr_segments,
>  	if (!capable(CAP_SYS_BOOT))
>  		return -EPERM;
> +	/* Processes in containers must not be allowed to load a new
> +	 * kernel, even if they have CAP_SYS_BOOT */
> +	if (task_active_pid_ns(current) != &init_pid_ns)
> +		return -EPERM;
> +
>  	/*
>  	 * Verify we have a legal set of flags
>  	 * This leaves us room for future extensions.
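
For illustration only, the effect of the added check can be modelled in
userspace C. This is a sketch under stated assumptions, not kernel code:
struct caller, kexec_load_entry_check(), kexec_model_demo() and the "patched"
flag are hypothetical names invented here. The two booleans stand in for
capable(CAP_SYS_BOOT) and task_active_pid_ns(current) == &init_pid_ns from the
diff above.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for what the kernel knows about the calling task. */
struct caller {
	bool has_cap_sys_boot;	/* models capable(CAP_SYS_BOOT) */
	bool in_init_pid_ns;	/* models task_active_pid_ns(current) == &init_pid_ns */
};

/*
 * Models only the permission checks at the top of kexec_load().
 * 'patched' selects whether the PID namespace check from the patch is applied.
 * Returns 0 if the call would proceed to flag/segment validation, else -EPERM.
 */
int kexec_load_entry_check(const struct caller *c, bool patched)
{
	if (!c->has_cap_sys_boot)
		return -EPERM;

	/* The check added by the patch: refuse callers outside the initial pidns. */
	if (patched && !c->in_init_pid_ns)
		return -EPERM;

	return 0;
}

/* Usage: root inside a container that was allowed to keep CAP_SYS_BOOT. */
void kexec_model_demo(void)
{
	struct caller container_root = {
		.has_cap_sys_boot = true,
		.in_init_pid_ns = false,
	};

	/* Without the patch the capability alone is enough. */
	assert(kexec_load_entry_check(&container_root, false) == 0);
	/* With the patch the same caller is rejected. */
	assert(kexec_load_entry_check(&container_root, true) == -EPERM);
}
```

In this model a host process with CAP_SYS_BOOT passes either way, so the patch
only changes the outcome for callers outside the initial PID namespace.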
