[Bug 1749425] Re: Neutron integrated with OpenVSwitch drops packets and fails to plug/unplug interfaces from OVS on router interfaces at scale
1749425 at bugs.launchpad.net
Wed Feb 14 23:32:00 UTC 2018
You described a lot of issues in comment #14:
* Missing interfaces inside qrouter namespaces (OVS taps)
* Missing iptables rules
* Missing floating IP aliases on OVS interfaces inside the qrouter namespaces
Some of those might be fixed in master, especially the iptables one. The
fixes should have been cherry-picked to the stable branches, but probably
only to Ocata. The "add floating ip" path should re-queue the message and
retry in a second or two. If it doesn't, please see if there is a
traceback and put the info here or in another bug.
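To illustrate the re-queue-and-retry behaviour described above, here is a
minimal sketch. It is not the Neutron L3 agent's code: the RetryQueue
class, its retry_delay and max_attempts parameters, and the handler
callback are all hypothetical names chosen for this example.

```python
import collections
import time


class RetryQueue:
    """Hypothetical work queue that re-queues failed updates after a delay."""

    def __init__(self, retry_delay=1.0, max_attempts=5):
        self.retry_delay = retry_delay    # seconds to wait before a retry
        self.max_attempts = max_attempts  # total tries before giving up
        self._pending = collections.deque()  # (ready_at, attempts, update)

    def add(self, update, attempts=0, now=None):
        now = time.monotonic() if now is None else now
        # First attempts are ready immediately; retries wait retry_delay.
        ready_at = now + (self.retry_delay if attempts else 0.0)
        self._pending.append((ready_at, attempts, update))

    def process(self, handler, now=None):
        """Run handler on each ready update and re-queue the ones that raise.

        Returns the updates that exhausted max_attempts.
        """
        now = time.monotonic() if now is None else now
        gave_up = []
        for _ in range(len(self._pending)):
            ready_at, attempts, update = self._pending.popleft()
            if ready_at > now:
                # Not due yet: put it back untouched.
                self._pending.append((ready_at, attempts, update))
                continue
            try:
                handler(update)
            except Exception:
                if attempts + 1 >= self.max_attempts:
                    gave_up.append(update)
                else:
                    self.add(update, attempts + 1, now)
        return gave_up
```

In this sketch a failed "add floating ip" update is not dropped; it is
tried again once retry_delay has elapsed, and only reported back to the
caller after max_attempts failures, which is the point at which a
traceback would be worth collecting.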
There could also be something happening with keepalived where it's not
getting things done, since it is managing the VIPs when HA is enabled.
Finally, regarding the traceback, I've never seen that before. My first
thought is to sprinkle "if instance" checks in all those code paths, but
maybe there's something else going on here that we should figure out. For
example, if the initial creation of the instance failed and then a
message came to add a floating IP, returning without doing anything (the
"not instance" case) isn't what we want to do. Figuring out exactly what
happened would require some log examination.
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to openvswitch in Ubuntu.
Neutron integrated with OpenVSwitch drops packets and fails to
plug/unplug interfaces from OVS on router interfaces at scale
Status in neutron:
Status in openvswitch package in Ubuntu:
Description: Ubuntu 16.04.3 LTS
Linux 4.4.0-96-generic on AMD64
Neutron 2:10.0.4-0ubuntu2~cloud0 from Cloud Archive xenial-updates/ocata
OpenVSwitch 2.6.1-0ubuntu5.2~cloud0 from Cloud Archive xenial-updates/ocata
In an environment with three bare-metal Neutron deployments hosting
upward of 300 routers and approximately the same number of instances
(typically one router per instance), instances accessed via floating
IPs experience packet loss, up to and including complete loss of
connectivity. The problem is exacerbated by enabling L3HA, likely due
to the increase in router namespaces to be scheduled and managed, and
the additional scheduling work of bringing up keepalived and
monitoring the keepalived VIP.
Reducing the number of routers and rescheduling routers onto new hosts,
which forces a full recreation of each router's namespace and iptables
rules and a replugging of its interfaces into OVS, corrects packet loss
or connectivity loss on impacted routers.
On Neutron hosts in this environment, we have used systemtap to trace
calls to kfree_skb, which reveals that the majority of dropped packets
occur in the openvswitch module, notably on the br-int bridge. Inspecting
the state of OVS shows many qtap interfaces that are still plugged in to
OVS but are no longer present on the Neutron host.
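One way to look for ports that OVS still holds but can no longer open is
to read the Interface table and flag rows with ofport -1 or a non-empty
error column. The sketch below does that in Python. It is only an
illustration under assumptions: that stale ports show up this way on the
affected hosts, and that the interface-name prefixes in the default
argument match the deployment. The function names are hypothetical.

```python
import json
import subprocess


def find_broken_interfaces(ovs_json, prefixes=("tap", "qr-", "qg-", "ha-")):
    """Return sorted names of matching interfaces that OVS could not open.

    ovs_json is the JSON text printed by query_ovs(). The prefixes are an
    assumption; adjust them to the names used on your hosts.
    """
    table = json.loads(ovs_json)
    cols = table["headings"]
    broken = []
    for row in table["data"]:
        rec = dict(zip(cols, row))
        name = rec["name"]
        if not name.startswith(tuple(prefixes)):
            continue
        # OVSDB encodes an unset optional column as ["set", []].
        ofport = rec["ofport"] if isinstance(rec["ofport"], int) else None
        error = rec["error"] if isinstance(rec["error"], str) else ""
        if ofport == -1 or error:
            broken.append(name)
    return sorted(broken)


def query_ovs():
    """Dump name, ofport and error for every OVS interface as JSON text."""
    cmd = ["ovs-vsctl", "--format=json",
           "--columns=name,ofport,error", "list", "Interface"]
    return subprocess.run(cmd, check=True, capture_output=True,
                          text=True).stdout
```

On a host, find_broken_interfaces(query_ovs()) would list candidate ports
to compare against the router namespaces before removing anything from
br-int.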
Diagnostic outputs in following comments.