[RFC/Review] Prevent network namespace memory exhaustion
Tim Gardner
tim.gardner at canonical.com
Fri Mar 25 12:57:35 UTC 2011
On 03/25/2011 03:46 AM, Daniel Lezcano wrote:
> On 03/25/2011 04:00 AM, Tim Gardner wrote:
>> On 03/24/2011 09:41 AM, Stefan Bader wrote:
>>> BugLink: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/720095
>>>
>>> This series of patches tries to cover a problem that we caught by
>>> enabling network namespaces (CONFIG_NET_NS) in Lucid, which was done
>>> (although the feature was still marked experimental) to support
>>> containerized use cases (and we would get some complaints if we
>>> removed it now).
>>>
>>> I tried to come up with a usable solution. Unfortunately, picking the
>>> minimal set of patches that prevents the memory buildup also causes
>>> the rate of connects (which in this case relies heavily on network
>>> namespace cloning) to drop noticeably.
>>>
>>> The second half would improve the situation slightly, but still not as
>>> much as has been achieved in Maverick. And using the Maverick
>>> backport causes other problems in the specific case for which the bug
>>> was reported.
>>>
>>> To quantify that a bit better:
>>>
>>> Lucid current    10 connections per second
>>> Lucid set 1       1 connection every 2 seconds
>>> Lucid set 2       2 connections every 3 seconds
>>> Maverick          2 connections per second
>>>
>>> There has not been a way to verify how bad the impact of the slowdown
>>> would be in a real production environment. So it might be a viable
>>> approach to limit changes to the first set, assuming that creating
>>> and destroying namespaces is not the common use case we have.
>>>
>>> Should there be performance complaints, we still could think of
>>> having a closer look at the second set (or more).
>>>
>>> So generally, does this sound like an approach we can SRU? And
>>> second, more eyes looking at the set(s) would be appreciated.
>>>
>>> -Stefan
>>>
>>> Those are enough to prevent memory being eaten:
>>> * net: Introduce unregister_netdevice_queue()
>>> * net: Introduce unregister_netdevice_many()
>>> * net: add a list_head parameter to dellink() method
>>> * veth: Fix veth_dellink method
>>> * veth: Fix unregister_netdevice_queue for veth
>>> * net: Implement for_each_netdev_reverse.
>>> * net: Batch network namespace destruction.
>>>
>>> Those seem to increase the rate of connects to vsftp (though not as
>>> much as Maverick):
>>> * net: Automatically allocate per namespace data.
>>> * net: Add support for batching network namespace cleanups
>>> * netns: Add an explicit rcu_barrier to unregister_pernet_{device|subsys}
>>> * net: Use rcu lookups in inet_twsk_purge.
>>> * tcp: fix inet_twsk_deschedule()
>>> * net: Batch inet_twsk_purge
>>>
>>> The following changes since commit
>>> 054b34d3a38dc2a775ab722411b934b52a33707f:
>>> Brad Figg (1):
>>> UBUNTU: Ubuntu-2.6.32-31.60
>>>
>>> are available in the git repository at:
>>>
>>> git://kernel.ubuntu.com/smb/ubuntu-lucid netnsbpv2
>>>
>>> Eric Dumazet (5):
>>> net: Introduce unregister_netdevice_queue()
>>> net: Introduce unregister_netdevice_many()
>>> net: add a list_head parameter to dellink() method
>>> veth: Fix veth_dellink method
>>> tcp: fix inet_twsk_deschedule()
>>>
>>> Eric W. Biederman (8):
>>> veth: Fix unregister_netdevice_queue for veth
>>> net: Implement for_each_netdev_reverse.
>>> net: Batch network namespace destruction.
>>> net: Automatically allocate per namespace data.
>>> net: Add support for batching network namespace cleanups
>>> netns: Add an explicit rcu_barrier to unregister_pernet_{device|subsys}
>>> net: Use rcu lookups in inet_twsk_purge.
>>> net: Batch inet_twsk_purge
>>>
>>> drivers/net/macvlan.c | 6 +-
>>> drivers/net/veth.c | 6 +-
>>> include/linux/netdevice.h | 12 ++-
>>> include/net/inet_timewait_sock.h | 6 +-
>>> include/net/net_namespace.h | 32 ++++-
>>> include/net/rtnetlink.h | 3 +-
>>> net/8021q/vlan.c | 8 +-
>>> net/8021q/vlan.h | 2 +-
>>> net/core/dev.c | 120 ++++++++++-----
>>> net/core/net_namespace.c | 296 +++++++++++++++++++++++---------------
>>> net/core/rtnetlink.c | 14 +-
>>> net/ipv4/inet_timewait_sock.c | 47 ++++---
>>> net/ipv4/tcp_ipv4.c | 11 +-
>>> net/ipv6/tcp_ipv6.c | 11 +-
>>> 14 files changed, 369 insertions(+), 205 deletions(-)
>>>
>>
>> That's a honking big patch set for an SRU. It's not clear to me from the
>> commit logs, but I assume they are all clean cherry-picks?
>>
>> I'm still not convinced that CONFIG_NET_NS=n isn't the best solution,
>> despite the complaints that change might elicit. I'd like to hear from
>> the consumers of network name spaces about how they are using the
>> feature, and possible workarounds if it were to go away.
>
> The users are heavily using all the namespaces and the cgroup through
> the Linux Containers http://lxc.sourceforge.net
> There is no workaround if it is not set. If you remove this feature,
> IMO people will really complain.
>
> The patchset providing the batching was introduced to speed up
> network namespace destruction. Before this patchset, destroying thousands
> of network namespaces took a very long time (AFAIR, about 20 minutes).
> With this patchset it takes 2 minutes.
>
You aren't telling me _how_ network namespaces are being used. For
example, I know of a commercial workload using vsftp. They didn't set
out to use NET_NS, they just got it for free because the option is
turned on. Not only does NET_NS slow down socket teardown, but it leaks
enough memory that eventually the OOM killer cranks up. While vsftp is
using NET_NS, it's not dependent on it and would function perfectly fine
without it, and quite a bit faster.
NET_NS was not ready for prime time in 2.6.32 and should never have been
enabled in an LTS kernel.
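
As a rough aid for reading the connect rates quoted above, here is a minimal Python sketch that puts the four figures on a common per-second scale and expresses each as a fraction of the current Lucid rate. The numbers are the ones from Stefan's mail; the names `RATES`, `per_second`, and `relative_to` are made up for this illustration and come from no tool used in the thread.

```python
from fractions import Fraction

# Figures quoted earlier in this thread, as (connections, seconds).
RATES = {
    "Lucid current": (10, 1),
    "Lucid set 1": (1, 2),
    "Lucid set 2": (2, 3),
    "Maverick": (2, 1),
}


def per_second(connections, seconds):
    """Return the rate in connections per second as an exact fraction."""
    return Fraction(connections, seconds)


def relative_to(baseline, rates=RATES):
    """Express every rate as a fraction of the named baseline's rate."""
    base = per_second(*rates[baseline])
    return {name: per_second(*r) / base for name, r in rates.items()}


if __name__ == "__main__":
    for name, ratio in relative_to("Lucid current").items():
        rate = per_second(*RATES[name])
        print(f"{name:<14} {float(rate):5.2f} conn/s  "
              f"{float(ratio):6.1%} of current")
```

On these figures, set 1 lands at 1/20 of the current rate, set 2 at 1/15, and Maverick at 1/5. This says nothing about real production workloads, which nobody in the thread has measured.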
rtg
--
Tim Gardner tim.gardner at canonical.com