Lucid SRU - UBUNTU: SAUCE: netns: Add quota for number of NET_NS instances.
tim.gardner at canonical.com
Fri Dec 2 01:05:04 UTC 2011
On 12/01/2011 03:39 PM, Brad Figg wrote:
> On 12/01/2011 01:48 PM, Tim Gardner wrote:
>> Please consider this (untested) patch for inclusion in Lucid. See the
>> discussion in http://bugs.launchpad.net/bugs/790863 for arguments
>> proposing to restore CONFIG_NET_NS.
>> I'll post a test kernel to the bug in a while.
>> One of the issues I have with this patch is that it appears that any
>> consumer of network name spaces will have to initially write a
>> non-zero value to netns_max before _any_ name spaces can be
>> successfully allocated. If copy_net_ns() fails in
>> create_new_namespaces(), then it seems the whole allocation is buggered.
> If you follow the thread that starts at:
> you will see that Tetsuo actually proposed a modified
> version of this patch: http://www.spinics.net/lists/netdev/msg180360.html.
I did see the second version, but it's more complicated and I'm not
convinced that it solves the OOM problem any better than the first (simpler) patch.
This is my (perhaps incorrect) model of the problem.
Consider the workload that first brought this issue to light. vsftpd
receives a login request for which it forks a process and indirectly
allocates a network name space. Eventually the login process terminates
and synchronously frees all of its resources except the network
namespace (which is now on an RCU list to be freed later). Now imagine
this happening at a sufficiently high rate that the lower priority RCU
thread never gets to run and free its list elements. Eventually all slab
space is exhausted and the OOM killer cranks up.
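To make the rate argument concrete, here is a toy user-space model in C (not kernel code). It assumes a fixed number of namespace allocations per tick, a deferred-free path that reclaims at most `frees_per_tick` of them per tick, and a fixed slab capacity; the function name and all of its parameters are hypothetical.

```c
#include <assert.h>

/* Toy model (hypothetical): returns the first tick at which namespaces
 * awaiting deferred freeing exceed `capacity`, or -1 if that never
 * happens within `max_ticks`. */
static int ticks_until_exhausted(long capacity, int arrivals_per_tick,
                                 int frees_per_tick, int max_ticks)
{
    long pending = 0;  /* namespaces queued for RCU-deferred freeing */

    assert(capacity >= 0 && arrivals_per_tick >= 0 && frees_per_tick >= 0);
    for (int t = 1; t <= max_ticks; t++) {
        /* Each login exits; its namespace is queued rather than freed. */
        pending += arrivals_per_tick;
        if (pending > capacity)
            return t;  /* slab space exhausted: the OOM killer would run */
        /* Reclaim proceeds only as fast as the deferred-free path runs. */
        pending -= (pending < frees_per_tick) ? pending : frees_per_tick;
    }
    return -1;
}
```

With no reclaim at all, 10 allocations per tick against a capacity of 100 exhaust memory at tick 11; reclaiming 8 per tick only delays that to tick 47. Only a reclaim rate that keeps up with the arrival rate avoids exhaustion.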
So, the first patch simply synchronously returns an error if the number
of network name spaces exceeds the specified maximum. This happens
within the context of the fork, the login process is aborted, and the
remote user is told to buzz off.
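The check in the first patch can be sketched roughly as follows. This is a single-threaded user-space model, not the patch itself: `netns_count`, `netns_max`, `netns_quota_get()` and `netns_quota_put()` are hypothetical names, the kernel would use an atomic counter, and the default of 0 for `netns_max` follows the reading in the quoted message that nothing can be allocated until a non-zero value is written.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the patch's counter and sysctl. */
static int netns_count = 0;  /* namespaces allocated and not yet freed */
static int netns_max = 0;    /* assumed default: nothing written yet */

/* Quota check before a new namespace is set up: refuse synchronously
 * once the limit is reached, so the fork fails instead of waiting. */
static bool netns_quota_get(void)
{
    if (netns_count >= netns_max)
        return false;
    netns_count++;
    return true;
}

/* Called when a namespace is finally freed (after the RCU grace period). */
static void netns_quota_put(void)
{
    assert(netns_count > 0);  /* a put without a matching get is a bug */
    netns_count--;
}
```

With `netns_max` left at 0, the very first `netns_quota_get()` fails, which is the concern quoted above. After writing 2, two requests succeed, the third is refused immediately, and a slot becomes usable again only once a namespace is actually freed.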
With the second patch, once the maximum number of network name spaces
has been reached, the fork _waits_ until a name space is free (having
already consumed some non-zero amount of task structure memory). In the
meantime login requests continue to pour in and vsftpd attempts to fork
still more processes which consume still more memory. If the login
attempt rate is sufficiently high, then I think the forks will
eventually start to fail when they cannot allocate task structure memory.
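The difference between the two patches under sustained load can be illustrated with a toy simulation (hypothetical names, arbitrary units, not either patch's code). Each tick, `arrivals` forks each want a namespace, the deferred-free path reclaims up to `frees` of them, and at most `max_ns` may be live. In fail-fast mode the excess forks are refused at once; in blocking mode they stay parked, each still holding its task structure.

```c
#include <assert.h>
#include <stdbool.h>

struct sim_result {
    long granted;  /* forks that obtained a namespace */
    long failed;   /* forks refused immediately (fail-fast mode) */
    long waiting;  /* forks parked holding task memory (blocking mode) */
};

static struct sim_result simulate(int ticks, int arrivals, int frees,
                                  int max_ns, bool block)
{
    struct sim_result r = {0, 0, 0};
    long live = 0;  /* namespaces counted against the quota, not yet freed */

    for (int t = 0; t < ticks; t++) {
        /* Deferred reclaim frees up to `frees` namespaces this tick. */
        live -= (live < frees) ? live : frees;

        /* Blocking mode retries everyone still waiting plus new arrivals. */
        long want = block ? r.waiting + arrivals : arrivals;
        long room = max_ns - live;
        assert(room >= 0);
        long grant = (want < room) ? want : room;

        live += grant;
        r.granted += grant;
        if (block)
            r.waiting = want - grant;  /* the rest keep waiting */
        else
            r.failed += want - grant;  /* the rest fail right away */
    }
    return r;
}
```

For example, with 10 ticks, 5 arrivals per tick, 1 reclaim per tick and `max_ns` of 3, both modes grant the same 12 namespaces. Fail-fast refuses the other 38 immediately, while blocking leaves 38 tasks parked and holding memory, a backlog that grows by 4 every tick.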
Of course, with either patch, failure recovery is deferred to user space,
but I'm not convinced that the end result is any different.
With both patches, vsftpd fails a login attempt when there are
insufficient resources, so why not use the simpler approach?
Tim Gardner tim.gardner at canonical.com
More information about the kernel-team