ACK w/cmnt: [berrange at redhat.com: [PATCH] Forbid invocation of kexec_load() outside initial PID namespace]

Stefan Bader stefan.bader at canonical.com
Tue Aug 7 15:27:33 UTC 2012


On 07.08.2012 17:01, Serge E. Hallyn wrote:
> (Hopefully my unsubscribed account can email kernel-team)
> 
> Hi,
> 
> this patch will probably not hit upstream, because the 'proper' fix is
> user namespaces.  User namespaces however won't be ready until after
> quantal.  So I'd like this patch to be applied in precise and quantal
> if possible.

On one hand I hate to deviate, but I guess I'd hate even more if someone started
a new kernel from a container. And at least for Precise I would
not really want to backport user namespaces, maybe not even for quantal.
Given that, the patch looks sensible enough to SRU. But there should be
a launchpad bug for that, or did I miss the link?

> 
> Problem:
> 
> Containers are granted CAP_SYS_BOOT.  The reboot path in the kernel checks
> whether you are in the initial pidns, and, if not, sends a signal to your
> parent indicating you were 'rebooted' or 'shut down'.  So there is no
> danger of a container rebooting the host.
> 
> However, CAP_SYS_BOOT also authorized kexec, without the pidns check.
> Therefore, containers are able to kexec a new kernel, which is obviously
> a bad thing.
> 
> This patch prevents that by only allowing kexec from the initial pid
> namespace.  It is nacked by Eric Biederman (but acked by me) because
> he feels this should be stopped by having the container in a private
> user namespace, with the kexec cap_sys_boot check targeted to the initial
> user namespace.  As I said, that won't be doable during quantal timeframe.
> 
> thanks,
> -serge
> 
> ----- Forwarded message from "Daniel P. Berrange" <berrange at redhat.com> -----
> 
> Date: Fri,  3 Aug 2012 11:53:04 +0100
> From: "Daniel P. Berrange" <berrange at redhat.com>
> To: linux-kernel at vger.kernel.org
> Cc: containers at lists.linux-foundation.org,
> 	"Daniel P. Berrange" <berrange at redhat.com>,
> 	Serge Hallyn <serge.hallyn at canonical.com>,
> 	Daniel Lezcano <daniel.lezcano at free.fr>,
> 	Michael Kerrisk <mtk.manpages at gmail.com>,
> 	"Eric W. Biederman" <ebiederm at xmission.com>,
> 	Tejun Heo <tj at kernel.org>, Oleg Nesterov <oleg at redhat.com>
> Subject: [PATCH] Forbid invocation of kexec_load() outside initial PID namespace
> 
> From: "Daniel P. Berrange" <berrange at redhat.com>
> 
> The following commit
> 
>     commit cf3f89214ef6a33fad60856bc5ffd7bb2fc4709b
>     Author: Daniel Lezcano <daniel.lezcano at free.fr>
>     Date:   Wed Mar 28 14:42:51 2012 -0700
> 
>     pidns: add reboot_pid_ns() to handle the reboot syscall
> 
> introduced custom handling of the reboot() syscall when invoked
> from a non-initial PID namespace. The intent was that a process
> in a container can be allowed to keep CAP_SYS_BOOT and execute
> reboot() to shutdown/reboot just their private container, rather
> than the host.
> 
> Unfortunately the kexec_load() syscall also relies on the
> CAP_SYS_BOOT capability. So by allowing a container to keep
> this capability to safely invoke reboot(), they mistakenly
> also gain the ability to use kexec_load(). The solution is
> to make kexec_load() return -EPERM if invoked from a PID
> namespace that is not the initial namespace
> 
> Signed-off-by: Daniel P. Berrange <berrange at redhat.com>
> Cc: Serge Hallyn <serge.hallyn at canonical.com>
> Cc: Daniel Lezcano <daniel.lezcano at free.fr>
> Cc: Michael Kerrisk <mtk.manpages at gmail.com>
> Cc: "Eric W. Biederman" <ebiederm at xmission.com>
> Cc: Tejun Heo <tj at kernel.org>
> Cc: Oleg Nesterov <oleg at redhat.com>
> ---
>  kernel/kexec.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/kernel/kexec.c b/kernel/kexec.c
> index 0668d58..b152bde 100644
> --- a/kernel/kexec.c
> +++ b/kernel/kexec.c
> @@ -947,6 +947,11 @@ SYSCALL_DEFINE4(kexec_load, unsigned long, entry, unsigned long, nr_segments,
>  	if (!capable(CAP_SYS_BOOT))
>  		return -EPERM;
>  
> +	/* Processes in containers must not be allowed to load a new
> +	 * kernel, even if they have CAP_SYS_BOOT */
> +	if (task_active_pid_ns(current) != &init_pid_ns)
> +		return -EPERM;
> +
>  	/*
>  	 * Verify we have a legal set of flags
>  	 * This leaves us room for future extensions.
> 


-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 900 bytes
Desc: OpenPGP digital signature
URL: <https://lists.ubuntu.com/archives/kernel-team/attachments/20120807/9cc24fd3/attachment.sig>


More information about the kernel-team mailing list