[berrange at redhat.com: [PATCH] Forbid invocation of kexec_load() outside initial PID namespace]

St├ęphane Graber stgraber at ubuntu.com
Tue Aug 7 15:04:37 UTC 2012

On 08/07/2012 11:01 AM, Serge E. Hallyn wrote:
> (Hopefully my unsubscribed account can email kernel-team)
> Hi,
> this patch will probably not hit upstream, because the 'proper' fix is
> user namespaces.  User namespaces however won't be ready until after
> quantal.  So I'd like this patch to be applied in precise and quantal
> if possible.
> Problem:
> Containers are granted CAP_SYS_BOOT.  The reboot path in the kernel checks
> whether you are in the initial pidns, and, if not, sends a signal to your
> parent indicating you were 'rebooted' or 'shut down'.  So there is no
> danger of a container rebooting the host.
> However, CAP_SYS_BOOT also authorized kexec, without the pidns check.
> Therefore, containers are able to kexec a new kernel, which is obviously
> a bad thing.
> This patch prevents that by only allowing kexec from the initial pid
> namespace.  It is nacked by Eric Biederman (but acked by me) because
> he feels this should be stopped by having the container in a private
> user namespace, with the kexec cap_sys_boot check targeted to the initial
> user namespace.  As I said, that won't be doable during quantal timeframe.
> thanks,
> -serge

Sounds reasonable to include that one as we've been including Daniel
Lezcano's reboot patch in 12.04, that's just making things consistent
and fixing a pretty serious security hole in our usually fairly secure
defaults for containers in 12.04 and 12.10.

> ----- Forwarded message from "Daniel P. Berrange" <berrange at redhat.com> -----
> Date: Fri,  3 Aug 2012 11:53:04 +0100
> From: "Daniel P. Berrange" <berrange at redhat.com>
> To: linux-kernel at vger.kernel.org
> Cc: containers at lists.linux-foundation.org,
> 	"Daniel P. Berrange" <berrange at redhat.com>,
> 	Serge Hallyn <serge.hallyn at canonical.com>,
> 	Daniel Lezcano <daniel.lezcano at free.fr>,
> 	Michael Kerrisk <mtk.manpages at gmail.com>,
> 	"Eric W. Biederman" <ebiederm at xmission.com>,
> 	Tejun Heo <tj at kernel.org>, Oleg Nesterov <oleg at redhat.com>
> Subject: [PATCH] Forbid invocation of kexec_load() outside initial PID namespace
> From: "Daniel P. Berrange" <berrange at redhat.com>
> The following commit
>     commit cf3f89214ef6a33fad60856bc5ffd7bb2fc4709b
>     Author: Daniel Lezcano <daniel.lezcano at free.fr>
>     Date:   Wed Mar 28 14:42:51 2012 -0700
>     pidns: add reboot_pid_ns() to handle the reboot syscall
> introduced custom handling of the reboot() syscall when invoked
> from a non-initial PID namespace. The intent was that a process
> in a container can be allowed to keep CAP_SYS_BOOT and execute
> reboot() to shutdown/reboot just their private container, rather
> than the host.
> Unfortunately the kexec_load() syscall also relies on the
> CAP_SYS_BOOT capability. So by allowing a container to keep
> this capability to safely invoke reboot(), they mistakenly
> also gain the ability to use kexec_load(). The solution is
> to make kexec_load() return -EPERM if invoked from a PID
> namespace that is not the initial namespace
> Signed-off-by: Daniel P. Berrange <berrange at redhat.com>
> Cc: Serge Hallyn <serge.hallyn at canonical.com>
> Cc: Daniel Lezcano <daniel.lezcano at free.fr>
> Cc: Michael Kerrisk <mtk.manpages at gmail.com>
> Cc: "Eric W. Biederman" <ebiederm at xmission.com>
> Cc: Tejun Heo <tj at kernel.org>
> Cc: Oleg Nesterov <oleg at redhat.com>
> ---
>  kernel/kexec.c | 5 +++++
>  1 file changed, 5 insertions(+)
> diff --git a/kernel/kexec.c b/kernel/kexec.c
> index 0668d58..b152bde 100644
> --- a/kernel/kexec.c
> +++ b/kernel/kexec.c
> @@ -947,6 +947,11 @@ SYSCALL_DEFINE4(kexec_load, unsigned long, entry, unsigned long, nr_segments,
>  	if (!capable(CAP_SYS_BOOT))
>  		return -EPERM;
> +	/* Processes in containers must not be allowed to load a new
> +	 * kernel, even if they have CAP_SYS_BOOT */
> +	if (task_active_pid_ns(current) != &init_pid_ns)
> +		return -EPERM;
> +
>  	/*
>  	 * Verify we have a legal set of flags
>  	 * This leaves us room for future extensions.

St├ęphane Graber
Ubuntu developer

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 900 bytes
Desc: OpenPGP digital signature
URL: <https://lists.ubuntu.com/archives/kernel-team/attachments/20120807/fe4d4893/attachment.sig>

More information about the kernel-team mailing list