[Bug 1749425] Re: Neutron integrated with OpenVSwitch drops packets and fails to plug/unplug interfaces from OVS on router interfaces at scale
James Page
james.page at ubuntu.com
Wed Feb 14 17:33:23 UTC 2018
Some sequencing:
unbind from agent message inbound:
2018-02-13 14:12:57.931 2065380 DEBUG neutron.agent.l3.agent [req-13456a6c-5583-4147-9b47-94026ea7f3b4 b327544aba2a482b9f12f1e6e615c394 9a4311b33381401fbc835c739981ce03 - - -] Got router removed from agent :{u'router_id': u'213b6544-ab4b-4e46-a5c6-5d8d587a0c6d'} router_removed_from_agent /usr/lib/python2.7/dist-packages/neutron/agent/l3/agent.py:419
some sort of update message:
2018-02-13 14:13:00.667 2065380 DEBUG neutron.agent.l3.agent [req-13456a6c-5583-4147-9b47-94026ea7f3b4 b327544aba2a482b9f12f1e6e615c394 9a4311b33381401fbc835c739981ce03 - - -] Got routers updated notification :[u'213b6544-ab4b-4e46-a5c6-5d8d587a0c6d'] routers_updated /usr/lib/python2.7/dist-packages/neutron/agent/l3/agent.py:409
and then we see a state change on the router (it goes to master), implying
that no teardown has occurred since the first message above:
2018-02-13 14:14:33.113 2065380 DEBUG neutron.agent.l3.ha [-] Handling notification for router 213b6544-ab4b-4e46-a5c6-5d8d587a0c6d, state master enqueue /usr/lib/python2.7/dist-packages/neutron/agent/l3/ha.py:50
2018-02-13 14:14:33.113 2065380 INFO neutron.agent.l3.ha [-] Router 213b6544-ab4b-4e46-a5c6-5d8d587a0c6d transitioned to master
and then:
2018-02-13 14:14:40.380 2065380 DEBUG neutron.agent.l3.ha [-] Spawning metadata proxy for router 213b6544-ab4b-4e46-a5c6-5d8d587a0c6d _update_metadata_proxy /usr/lib/python2.7/dist-packages/neutron/agent/l3/ha.py:156
(I think the agent still thinks this is an HA router...)
and then:
2018-02-13 14:14:59.839 2065380 DEBUG neutron.agent.l3.ha [-] Updating server with HA routers states {'28015629-217b-4eec-b557-6f93a2bb0230': 'active', '237d2839-5687-4104-9270-ff974de59800': 'active', '266167d5-39e5-4d97-a3a8-45d4ddad9407': 'active', '213b6544-ab4b-4e46-a5c6-5d8d587a0c6d': 'active', '2367f6bd-02c1-4ef1-a642-fe54d916fe2e': 'active'} notify_server /usr/lib/python2.7/dist-packages/neutron/agent/l3/ha.py:177
and slightly after:
2018-02-13 14:18:08.275 2065380 ERROR neutron.agent.l3.router_info [-] 'NoneType' object has no attribute 'remove_vip_by_ip_address'
This seems to support the theory that the HA router never actually gets
torn down before the new non-HA router is scheduled to the same network
node, resulting in the agent not really knowing whether the router is
Arthur or Martha (that is, HA or non-HA).
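
The ordering argument above can be checked mechanically against a full agent log. What follows is only a rough sketch, not anything taken from Neutron itself: the function name ha_activity_after_removal and the HA_MARKERS tuple are made up for illustration, the marker strings are copied from the excerpts above, and it assumes Python 3 and that each log record sits on a single line, as it does in the agent's log file.

```python
import re
from datetime import datetime

# Timestamp at the start of each log line, e.g. "2018-02-13 14:12:57.931".
TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) ")

# Substrings copied from the log excerpts above. HA_MARKERS is an
# illustrative (assumed, non-exhaustive) choice of what counts as
# "HA activity" for a router.
REMOVED_MARKER = "Got router removed from agent"
HA_MARKERS = ("transitioned to master", "Spawning metadata proxy for router")


def ha_activity_after_removal(lines, router_id):
    """Return (timestamp, line) pairs showing HA activity for router_id
    that was logged after the agent was told the router was removed."""
    removed_at = None
    hits = []
    for line in lines:
        match = TS_RE.match(line)
        if match is None or router_id not in line:
            continue  # not a timestamped record for this router
        ts = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f")
        if REMOVED_MARKER in line:
            removed_at = ts
            hits = []  # only count activity after the latest removal
        elif (removed_at is not None and ts > removed_at
              and any(marker in line for marker in HA_MARKERS)):
            hits.append((ts, line))
    return hits
```

Feeding it the lines of the L3 agent log together with the router ID 213b6544-ab4b-4e46-a5c6-5d8d587a0c6d should list the "transitioned to master" and "Spawning metadata proxy" records if the sequence really is as described; an empty result would argue against the theory.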
--
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to openvswitch in Ubuntu.
https://bugs.launchpad.net/bugs/1749425
Title:
Neutron integrated with OpenVSwitch drops packets and fails to
plug/unplug interfaces from OVS on router interfaces at scale
Status in neutron:
New
Status in openvswitch package in Ubuntu:
New
Bug description:
Description: Ubuntu 16.04.3 LTS
Release: 16.04
Linux 4.4.0-96-generic on AMD64
Neutron 2:10.0.4-0ubuntu2~cloud0 from Cloud Archive xenial-updates/ocata
OpenVSwitch 2.6.1-0ubuntu5.2~cloud0 from Cloud Archive xenial-updates/ocata
In an environment with three bare-metal Neutron deployments, hosting
upward of 300 routers, with approximately the same number of
instances, typically one router per instance, packet loss on instances
accessed via floating IPs, including complete connectivity loss, is
experienced. The problem is exacerbated by enabling L3HA, likely due
to the increase in router namespaces to be scheduled and managed, and
the additional scheduling work of bringing up keepalived and
monitoring the keepalived VIP.
Reducing the number of routers and rescheduling routers on new hosts,
causing the routers to undergo a full recreation of namespace,
iptables rules, and replugging of interfaces into OVS will correct
packet loss or connectivity loss on impacted routers.
On Neutron hosts in this environment, we have used systemtap to trace
calls to kfree_skb which reveals the majority of dropped packets occur
in the openvswitch module, notably on the br-int bridge. Inspecting
the state of OVS shows many qtap interfaces which are no longer
present on the Neutron host which are still plugged in to OVS.
Diagnostic outputs in following comments.
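
One possible way to enumerate such stale ports is sketched below. It rests on an assumption not confirmed in this report: that the leftover interfaces appear in the OVS Interface table with ofport set to -1, which is what OVS records when it cannot open the underlying device. The helper names stale_interfaces and list_stale_interfaces are hypothetical, and the code assumes Python 3.

```python
import csv
import io
import subprocess


def stale_interfaces(csv_text):
    """Return the names of Interface rows whose ofport is -1.

    csv_text is expected to hold "name,ofport" rows without a header.
    """
    stale = []
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) != 2:
            continue  # skip blank or malformed rows
        name, ofport = row
        if ofport.strip() == "-1":
            stale.append(name)
    return stale


def list_stale_interfaces():
    """Query the local OVS database and return the stale interface names.

    Assumes ovs-vsctl is on PATH and the caller may read the OVS database.
    """
    out = subprocess.check_output(
        ["ovs-vsctl", "--format=csv", "--data=bare", "--no-headings",
         "--columns=name,ofport", "list", "Interface"],
        universal_newlines=True,
    )
    return stale_interfaces(out)
```

The parsing is kept separate from the ovs-vsctl call so it can be tried on captured output first. Anything it reports should be cross-checked against the router namespaces before removal, since ports that were moved into a namespace are still valid.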