On 14/11/12 20:55, Serge Hallyn wrote:
> Quoting Tim Gardner (tim.gardner at
>> On 11/06/2012 09:36 AM, Serge Hallyn wrote:
>>> Hi,
>>> the core of user namespace code has landed upstream, however some more
>>> is needed to run full ubuntu containers in a user namespace.  Some of
>>> this will land in 3.8, but probably not all.  Eric's development tree
>>> is at;a=summary
>>> I have pushed that tree on top of a recent raring tree at
>>> git:// in branch
>>> master.oct25.userns-v70.  It consists of 84 patches (including 5 just
>>> updating under debian/, one by me fix to account for ubuntu delta, and
>>> one not (yet) in Eric's tree to allow tmpfs mounts in a container),
>>> which I can git-email if desired.  The built kernel is in
>>> ppa:serge-hallyn/userns-natty and does allow me to boot a full ubuntu
>>> container in a user namespace - meaning every root owned process and
>>> file is actually owned by userid 100000 on the host and contained.
>>> I'm sending this now in the hopes that whatever bits don't land in
>>> 3.8 can be pushed onto the raring kernel.  Our goal this cycle is to
>>> support user namespaces, and next cycle to support completely
>>> unprivileged creation and starting of containers.
>>> -serge
>> Serge - how about a pull request for a branch that has been rebased
>> on Raring master-next ? I took a quick stab at it and quickly ran
>> into uapi transition conflicts (I think).
> A successfully built kernel is at
> git:// (branch
> master-next.nov14.userns which should be the default).
> -serge

I've got some questions and/or observations about the following commits:

debian changes to build in ppa

	This fiddles with skipabi and skipmodules to allow building in a
PPA; we should not pull it into the raring kernel.

net: Allow opening an af_unix socket

sock_open() has:
         /* file->f_flags??? */
         //file->f_flags = O_RDWR | (flags & O_NONBLOCK);

The comment seems to allude to uncertainty over whether file->f_flags 
should be set here; it looks as if the assignment was added during 
development (for testing?) and then commented out.  As it stands it is 
confusing, and I'm not sure what it is meant to do.  Should it be 
removed?

fuse: Teach fuse how to handle the pid namespace.

convert_fuse_file_lock() has:

	fl->fl_pid = pid_vnr(find_pid_ns(ffl->pid, fc->pid_ns));

It seems possible (but unlikely) for find_pid_ns() to return NULL, 
which passes NULL into pid_vnr(), which in turn passes NULL into
pid_nr_ns(), which returns 0. Is a zero pid a valid value for fl->fl_pid?
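
To make the concern concrete, here is a minimal user-space sketch, not the
kernel code.  struct model_pid and the model_*/lock_owner_* helpers are
hypothetical stand-ins; the only behaviour assumed from the kernel is the one
described above, namely that a NULL pid is turned into the number 0.

```c
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct pid. */
struct model_pid {
    int nr;
};

/* Stand-in for find_pid_ns(): returns NULL when nr is not in the table. */
static struct model_pid *model_find_pid(struct model_pid *table, size_t n, int nr)
{
    for (size_t i = 0; i < n; i++)
        if (table[i].nr == nr)
            return &table[i];
    return NULL;
}

/* Stand-in for pid_vnr(): a NULL pid silently becomes 0. */
static int model_pid_vnr(const struct model_pid *pid)
{
    return pid ? pid->nr : 0;
}

/* Unchecked conversion, shaped like the line quoted above. */
static int lock_owner_unchecked(struct model_pid *table, size_t n, int nr)
{
    return model_pid_vnr(model_find_pid(table, n, nr));
}

/* Checked variant: report a failed lookup instead of storing 0.
 * Returning -1 is an arbitrary choice for this sketch. */
static int lock_owner_checked(struct model_pid *table, size_t n, int nr, int *out)
{
    struct model_pid *pid = model_find_pid(table, n, nr);

    if (!pid)
        return -1;
    *out = model_pid_vnr(pid);
    return 0;
}
```

With the unchecked variant a failed lookup cannot be told apart from a real
pid of 0; the checked variant leaves that decision to the caller.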

devpts: Remove the devpty cleanup special case.

in ptmx_open():

	/* Find the devpts instance we are working with */
	mnt = devpts_mntget(filp);

devpts_mntget() can return ERR_PTR(-ENODEV), so mnt probably needs 
checking for this kind of unlikely failure case.
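
The kind of check meant here can be sketched in user space as follows.  The
model_err_ptr(), model_is_err() and model_ptr_err() helpers are simplified
imitations of the kernel's ERR_PTR(), IS_ERR() and PTR_ERR(); model_mntget()
and model_open() are hypothetical and only show the shape of the check, not
the actual ptmx_open() code.

```c
#include <errno.h>
#include <stdint.h>

#define MODEL_MAX_ERRNO 4095

/* Encode a negative errno value in a pointer. */
static inline void *model_err_ptr(long error)
{
    return (void *)(intptr_t)error;
}

/* True if ptr lies in the top MODEL_MAX_ERRNO addresses, i.e. encodes an errno. */
static inline int model_is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MODEL_MAX_ERRNO;
}

/* Recover the negative errno value from an error pointer. */
static inline long model_ptr_err(const void *ptr)
{
    return (long)(intptr_t)ptr;
}

struct model_mount {
    int id;
};

/* Hypothetical stand-in for devpts_mntget(): fails with -ENODEV on demand. */
static struct model_mount *model_mntget(struct model_mount *m, int available)
{
    return available ? m : model_err_ptr(-ENODEV);
}

/* Check the result before using it, and propagate the error otherwise. */
static long model_open(struct model_mount *m, int available)
{
    struct model_mount *mnt = model_mntget(m, available);

    if (model_is_err(mnt))
        return model_ptr_err(mnt);
    return mnt->id;
}
```

Without the model_is_err() test, the failure path would dereference an
address near the top of the address space.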

devpts: Make the newinstance option historical

case Opt_newinstance: this is now a historical mount option that 
silently does nothing.  Perhaps we should print a warning or info 
message so users know it is being ignored.  That said, the change is 
documented in Documentation/filesystems/devpts.txt, so maybe this is 
unnecessary.

net: Push capable(CAP_NET_ADMIN) into the rtnl methods

Comment in this patch:
     "Later patches will remove the extra capable calls from methods
     that are safe for unprivilged users."

Are these later patches in this patch set? If so, which ones are they?

net: Don't export sysctls to unprivileged users


The following change added a '\', which looks like a typo:

-                       kfree(ipvs->lblc_ctl_table);
+                       kfree(ipvs->lblc_ctl_table);\
                 return -ENOMEM;
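
For what it is worth, a trailing backslash outside a macro only splices the
next line onto the current one, so the code should still compile and behave
the same; it is noise rather than a functional bug.  A small user-space
sketch, where model_kfree() and model_init() are hypothetical names and not
the ipvs code:

```c
#include <errno.h>

static int model_freed;

/* Hypothetical stand-in for kfree(); just counts calls. */
static void model_kfree(void)
{
    model_freed++;
}

/* The stray backslash joins the next line to this one before parsing,
 * so the two statements are still executed in order. */
static int model_init(int fail)
{
    if (fail) {
        model_kfree();\
        return -ENOMEM;
    }
    return 0;
}
```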

userns: Convert nfs and nfsd to use kuid/kgid where appropriate


When a gid is not found then a new one is added to the aces[] array. I 
don't see any bounds checking on this, so can this potentially fall off 
the end of the aces[] array at some point?
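
The sort of bounds check I would expect looks roughly like the sketch below.
This is a user-space illustration only: the struct layout, the
MODEL_MAX_ACES capacity and model_find_or_add() are assumptions made for the
example and are not taken from the nfs/nfsd code.

```c
#include <stddef.h>

#define MODEL_MAX_ACES 4    /* hypothetical capacity for the sketch */

struct model_ace {
    unsigned int gid;
    unsigned int perms;
};

struct model_acl {
    size_t count;
    struct model_ace aces[MODEL_MAX_ACES];
};

/* Return the entry for gid, appending one if it is missing.
 * Returns NULL instead of writing past the end when the array is full. */
static struct model_ace *model_find_or_add(struct model_acl *acl, unsigned int gid)
{
    for (size_t i = 0; i < acl->count; i++)
        if (acl->aces[i].gid == gid)
            return &acl->aces[i];

    if (acl->count >= MODEL_MAX_ACES)
        return NULL;

    acl->aces[acl->count].gid = gid;
    acl->aces[acl->count].perms = 0;
    return &acl->aces[acl->count++];
}
```

If the real array is sized up front so that it cannot overflow, a check like
this is redundant, but that guarantee is not obvious from the patch.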



