[Bug 1453264] [NEW] iptables_manager can run very slowly when a large number of security group rules are present

Launchpad Bug Tracker 1453264 at bugs.launchpad.net
Mon Aug 29 22:14:39 UTC 2016


You have been subscribed to a public bug by Billy Olsen (billy-olsen):

[Impact]

We have customers that typically add a few hundred security group rules
or more.  We also typically run 30+ VMs per compute node.  When about
10+ VMs with a large SG set all get scheduled to the same node, the L2
agent (OVS) can spend many minutes in the iptables_manager.apply() code,
so much so that by the time all the rules are updated, the VM has
already tried DHCP and failed, leaving it in an unusable state.

While there have been some patches that tried to address this in Juno
and Kilo, they've either not helped as much as necessary, or broken SGs
completely due to re-ordering of the iptables rules.

I've been able to show some pretty bad scaling with just a handful of
VMs running in devstack based on today's code (May 8th, 2015) from
upstream OpenStack.


[Test Case]

Here's what I tested:

1. I created a security group with 1000 TCP port rules (you could
alternately have a smaller number of rules and more VMs, but it's
quicker this way)

2. I booted VMs, specifying both the default and "large" SGs, and timed
from the moment Neutron "learned" about the port until it completed
its work

3. I got a :( pretty quickly
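
Step 1 could be sketched roughly as follows. This is not the attached
script, just a minimal illustration: it builds one legacy neutron CLI
command per single-port TCP rule. The group name "large", the starting
port 10000, and the helper name build_rule_commands are assumptions made
for the example.

```python
def build_rule_commands(group="large", first_port=10000, count=1000):
    """Return one CLI command (as an argv list) per single-port TCP rule.

    The group name and port range are illustrative assumptions, not
    values taken from the bug report.
    """
    commands = []
    for port in range(first_port, first_port + count):
        commands.append([
            "neutron", "security-group-rule-create",
            "--direction", "ingress",
            "--protocol", "tcp",
            "--port-range-min", str(port),
            "--port-range-max", str(port),
            group,
        ])
    return commands


if __name__ == "__main__":
    # Print the commands instead of running them, so the sketch is safe
    # to execute without a Neutron endpoint.
    for argv in build_rule_commands():
        print(" ".join(argv))
```

Printing the commands rather than executing them keeps the sketch
harmless; piping the output to a shell on a devstack node would be one
way to actually create the rules.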

And here's some data:

1st-3rd VM - didn't time, less than 20 seconds
4th VM - 0:36
5th VM - 0:53
6th VM - 1:11
7th VM - 1:25
8th VM - 1:48
9th VM - 2:14
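
To make the trend in these numbers easier to see, here is a small sketch
that converts the timings to seconds and computes how much longer each
additional VM took. The variable and function names are made up for the
example, and it assumes every VM adds roughly the same number of rules.

```python
# Timings from the table above, keyed by VM number (mm:ss strings).
timings = {4: "0:36", 5: "0:53", 6: "1:11", 7: "1:25", 8: "1:48", 9: "2:14"}


def to_seconds(mmss):
    """Convert an 'm:ss' string to a number of seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


seconds = {vm: to_seconds(t) for vm, t in timings.items()}
vms = sorted(seconds)

# Extra time each VM took compared with the one booted before it.
deltas = [seconds[b] - seconds[a] for a, b in zip(vms, vms[1:])]

# Average increase per additional VM over the measured range.
avg_increase = (seconds[vms[-1]] - seconds[vms[0]]) / (vms[-1] - vms[0])

if __name__ == "__main__":
    print(deltas)        # [17, 18, 14, 23, 26]
    print(avg_increase)  # 19.6
```

Each new VM took about 20 seconds longer than the previous one, so the
time per boot grows with the total number of rules already on the node
rather than with the rules added for that VM. That would be consistent
with the whole rule set being reprocessed on every update, though this
is an inference from six data points, not a profile.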

While it's busy adding the rules, the OVS agent is consuming pretty
close to 100% of a CPU for most of this time (from top):

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
25767 stack     20   0  157936  76572   4416 R  89.2  0.5  50:14.28 python

And this is with only ~10K rules at this point!  Once we cross the 20K
mark, VM boot failures start to happen.

I'm filing this bug since we need to take a closer look at this in
Liberty and fix it.  It's been this way since Havana and needs some TLC.

I've attached a simple script I've used to recreate this, and will start
taking a look at options here.


[Regression Potential]

Minimal since this has been running in upstream stable for several
releases now (Kilo, Liberty, Mitaka).

** Affects: neutron
     Importance: Undecided
     Assignee: Kevin Benton (kevinbenton)
         Status: Fix Released

** Affects: neutron/kilo
     Importance: Undecided
         Status: Fix Released

** Affects: neutron (Ubuntu)
     Importance: Undecided
         Status: New


** Tags: in-feature-pecan in-stable-kilo
-- 
iptables_manager can run very slowly when a large number of security group rules are present
https://bugs.launchpad.net/bugs/1453264
You received this bug notification because you are a member of Ubuntu Sponsors Team, which is subscribed to the bug report.


