user namespace delta over 3.7
Colin Ian King
colin.king at canonical.com
Mon Nov 19 14:49:48 UTC 2012
On 14/11/12 20:55, Serge Hallyn wrote:
> Quoting Tim Gardner (tim.gardner at canonical.com):
>> On 11/06/2012 09:36 AM, Serge Hallyn wrote:
>>> the core of user namespace code has landed upstream, however some more
>>> is needed to run full ubuntu containers in a user namespace. Some of
>>> this will land in 3.8, but probably not all. Eric's development tree
>>> is at http://git.kernel.org/?p=linux/kernel/git/ebiederm/user-namespace.git;a=summary
>>> I have pushed that tree on top of a recent raring tree at
>>> git://kernel.ubuntu.com/serge/quantal-userns.git in branch
>>> master.oct25.userns-v70. It consists of 84 patches (including 5 just
>>> updating under debian/, one by me fix to account for ubuntu delta, and
>>> one not (yet) in Eric's tree to allow tmpfs mounts in a container),
>>> which I can git-email if desired. The built kernel is in
>>> ppa:serge-hallyn/userns-natty and does allow me to boot a full ubuntu
>>> container in a user namespace - meaning every root owned process and
>>> file is actually owned by userid 100000 on the host and contained.
>>> I'm sending this now in the hopes that whatever bits don't land in
>>> 3.8 can be pushed onto the raring kernel. Our goal this cycle is to
>>> support user namespaces, and next cycle to support completely
>>> unprivileged creation and starting of containers.
>> Serge - how about a pull request for a branch that has been rebased
>> on Raring master-next ? I took a quick stab at it and quickly ran
>> into uapi transition conflicts (I think).
> A successfully built kernel is at
> git://kernel.ubuntu.com/serge/quantal-userns.git (branch
> master-next.nov14.userns which should be the default).
I've got some questions and/or observations about the following commits:
debian changes to build in ppa
..this fiddles around with the skipabi, skipmodules to allow building
in a PPA, but we should not pull that into the raring kernel.
net: Allow opening an af_unix socket
/* file->f_flags??? */
//file->f_flags = O_RDWR | (flags & O_NONBLOCK);
..the comment seems to be alluding to the fact we're not sure if we
should be setting f->f_flags and that the code was put in during
development (for testing?) and then commented out. Anyhow, it's
confusing and I'm now not sure what this is meant to be doing. Should
this be removed?
fuse: Teach fuse how to handle the pid namespace.
fl->fl_pid = pid_vnr(find_pid_ns(ffl->pid, fc->pid_ns));
It seems possible (but unlikely) for find_pid_ns() to return NULL,
which passes NULL into pid_vnr() which in turn passes NULL into
pid_nr_ns() which returns 0. Is a zero pid of fl->fl_pid valid?
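
A minimal userspace sketch of the kind of NULL check in question. The
types and the names mock_find_pid(), mock_pid_vnr() and lookup_lock_pid()
are hypothetical stand-ins, not the kernel or fuse code; the only behaviour
carried over is that a NULL pid maps to 0.

```c
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct pid. */
struct pid { int nr; };

/* Mimics the behaviour described above: a NULL pid yields 0. */
static int mock_pid_vnr(const struct pid *pid)
{
    return pid ? pid->nr : 0;
}

/* Mock lookup: returns NULL when nr is not present in the table. */
static const struct pid *mock_find_pid(const struct pid *table, size_t n, int nr)
{
    for (size_t i = 0; i < n; i++)
        if (table[i].nr == nr)
            return &table[i];
    return NULL;
}

/*
 * Distinguish "lookup failed" from a real pid instead of letting
 * the NULL silently turn into pid 0.
 * Returns 0 and stores the pid in *out on success, -1 on failure.
 */
static int lookup_lock_pid(const struct pid *table, size_t n, int nr, int *out)
{
    const struct pid *p = mock_find_pid(table, n, nr);

    if (!p)
        return -1;
    *out = mock_pid_vnr(p);
    return 0;
}
```

Whether a failed lookup should be reported as an error or a zero pid is
acceptable is exactly the open question; the sketch only shows how the two
cases could be told apart.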
devpts: Remove the devpty cleanup special case.
/* Find the devpts instance we are working with */
mnt = devpts_mntget(filp);
devpts_mntget() can return ERR_PTR(-ENODEV), so mnt probably needs
checking for this kind of unlikely failure case.
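
For illustration, a userspace sketch of the usual error-pointer check. The
ERR_PTR/IS_ERR/PTR_ERR helpers below are simplified reimplementations of
the kernel idiom, and mock_mntget(), use_mount() and struct vfsmount_stub
are hypothetical names, not the devpts code.

```c
#include <errno.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Simplified userspace versions of the kernel's error-pointer helpers. */
static inline void *ERR_PTR(long error)
{
    return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
    return (long)(intptr_t)ptr;
}

static inline int IS_ERR(const void *ptr)
{
    /* Error pointers occupy the top MAX_ERRNO values of the address space. */
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical stand-in for a mount object. */
struct vfsmount_stub { int id; };
static struct vfsmount_stub the_mount = { 1 };

/* Mock lookup that can fail with ERR_PTR(-ENODEV). */
static struct vfsmount_stub *mock_mntget(int have_instance)
{
    if (!have_instance)
        return ERR_PTR(-ENODEV);
    return &the_mount;
}

/* Check the returned pointer before dereferencing it. */
static long use_mount(int have_instance)
{
    struct vfsmount_stub *mnt = mock_mntget(have_instance);

    if (IS_ERR(mnt))
        return PTR_ERR(mnt);
    return mnt->id;
}
```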
devpts: Make the newinstance option historical
case Opt_newinstance: this is now a historical mount option that
silently does nothing. Perhaps we should print a warning or info
message to tell users that it is being ignored. However, this is
documented in the changes to Documentation/filesystems/devpts.txt,
so maybe that is unnecessary.
net: Push capable(CAP_NET_ADMIN) into the rtnl methods
Comment in this patch:
"Later patches will remove the extra capable calls from methods
that are safe for unprivilged users."
..are these later patches in this patch set? If so, which ones are they?
net: Don't export sysctls to unprivileged users
the following change added a '\' which looks like a typo:
userns: Convert nfs and nfsd to use kuid/kgid where appropriate
When a gid is not found then a new one is added to the aces array. I
don't see any bounds checking on this, so can this potentially fall off
the end of the aces array at some point?
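
A rough userspace sketch of the bounds check being asked about. The
structures, the capacity MAX_ACES and the function find_or_add_gid() are
hypothetical and are not taken from the nfsd code; the real array may well
be sized so that an overflow cannot happen.

```c
#include <errno.h>
#include <stddef.h>

#define MAX_ACES 4 /* hypothetical capacity, for illustration only */

/* Hypothetical stand-ins for the ACL state structures. */
struct ace_stub {
    unsigned int gid;
    unsigned int perms;
};

struct acl_state_stub {
    size_t n;                       /* entries currently in use */
    struct ace_stub aces[MAX_ACES];
};

/*
 * Return the index of the entry for gid, appending a new entry if
 * none exists. Refuse to append once the array is full rather than
 * writing past its end.
 */
static int find_or_add_gid(struct acl_state_stub *s, unsigned int gid)
{
    for (size_t i = 0; i < s->n; i++)
        if (s->aces[i].gid == gid)
            return (int)i;

    if (s->n >= MAX_ACES)
        return -ENOSPC;

    s->aces[s->n].gid = gid;
    s->aces[s->n].perms = 0;
    return (int)s->n++;
}
```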