[Bug 1749425] Re: Neutron integrated with OpenVSwitch drops packets and fails to plug/unplug interfaces from OVS on router interfaces at scale

James Hebden 1749425 at bugs.launchpad.net
Wed Feb 14 21:01:42 UTC 2018


@james-page, @axino -

Just a +1: changing the HA property requires the router to be set down
first and brought back up afterwards, which starts the recreation of the
router as HA.
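
For anyone scripting this, the down / change / up sequence can be
sketched roughly as below. This is a minimal sketch, not something we
have run against this environment: it assumes an openstacksdk-style
network proxy whose update_router() accepts is_admin_state_up and is_ha,
and the helper name set_router_ha is hypothetical.

```python
def set_router_ha(network, router_id, ha):
    """Set the router down, change its HA flag, then bring it back up.

    `network` is assumed to be an openstacksdk-style network proxy
    (for example `openstack.connect().network`); any object exposing a
    compatible `update_router(router, **attrs)` method will do.
    `set_router_ha` itself is a hypothetical helper name.
    """
    # As noted above, the router has to be admin-down before the HA
    # property can be changed.
    network.update_router(router_id, is_admin_state_up=False)
    try:
        network.update_router(router_id, is_ha=ha)
    finally:
        # Bring the router back up even if the HA change was rejected,
        # so it is not left administratively down.
        network.update_router(router_id, is_admin_state_up=True)
```

Bringing the router back up in a finally block is a design choice of the
sketch, so that a rejected HA change does not leave the router down.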

We have seen various other side effects in Neutron/OVS environments, and specifically in the environment in question, such as:
* Missing interfaces inside qrouter namespaces (OVS taps)
* Missing iptables rules
* Missing floating IP aliases on OVS interfaces inside the qrouter namespaces

All of these are set up during bringup of HA routers. We have seen fewer of these issues on non-HA routers. Whether or not the router is HA, rescheduling it, or converting it from HA to non-HA or vice versa, rebuilds the router and as a result repairs it.

I should also point out that we have rarely observed high system load at
the time of these issues. I do agree, though, that the number of routers,
and therefore the work Neutron and OVS must do to orchestrate interface
plugging and unplugging and namespace setup (with the associated network
stack plumbing), is much higher than in a typical environment. Having
three servers do this work rather than scaling horizontally seems like
it might be exposing bottlenecks in either Neutron or OVS in the
orchestration of these tasks.

I'm not sure if you are seeing the following traceback in the logs
provided, but it has also been common when this issue crops up. It shows
a task performed during the bringup of a router falling over: right
after IPTablesManager.apply completes, the _remove_vip call in
ha_router.py fails because the instance it operates on is None.

2018-02-14 05:04:32.101 1352665 DEBUG neutron.agent.linux.utils [-] Exit code: 0 execute /usr/lib/python2.7/dist-packages/neutron/agent/linux/utils.py:158
2018-02-14 05:04:32.103 1352665 DEBUG neutron.agent.linux.iptables_manager [-] IPTablesManager.apply completed with success. 0 iptables commands were issued _apply_synchronized /usr/lib/python2.7/dist-packages/neutron/agent/linux/iptables_manager.py:576
2018-02-14 05:04:32.103 1352665 DEBUG oslo_concurrency.lockutils [-] Releasing semaphore "iptables-qrouter-43801324-72ce-469f-a628-a5c645041e30" lock /usr/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py:228
2018-02-14 05:04:32.103 1352665 ERROR neutron.agent.l3.router_info [-] 'NoneType' object has no attribute 'remove_vip_by_ip_address'
2018-02-14 05:04:32.103 1352665 ERROR neutron.agent.l3.router_info Traceback (most recent call last):
2018-02-14 05:04:32.103 1352665 ERROR neutron.agent.l3.router_info   File "/usr/lib/python2.7/dist-packages/neutron/common/utils.py", line 253, in call
2018-02-14 05:04:32.103 1352665 ERROR neutron.agent.l3.router_info     return func(*args, **kwargs)
2018-02-14 05:04:32.103 1352665 ERROR neutron.agent.l3.router_info   File "/usr/lib/python2.7/dist-packages/neutron/agent/l3/router_info.py", line 1115, in process
2018-02-14 05:04:32.103 1352665 ERROR neutron.agent.l3.router_info     self.process_external()
2018-02-14 05:04:32.103 1352665 ERROR neutron.agent.l3.router_info   File "/usr/lib/python2.7/dist-packages/neutron/agent/l3/router_info.py", line 890, in process_external
2018-02-14 05:04:32.103 1352665 ERROR neutron.agent.l3.router_info     self._process_external_gateway(ex_gw_port)
2018-02-14 05:04:32.103 1352665 ERROR neutron.agent.l3.router_info   File "/usr/lib/python2.7/dist-packages/neutron/agent/l3/router_info.py", line 777, in _process_external_gateway
2018-02-14 05:04:32.103 1352665 ERROR neutron.agent.l3.router_info     self.external_gateway_updated(ex_gw_port, interface_name)
2018-02-14 05:04:32.103 1352665 ERROR neutron.agent.l3.router_info   File "/usr/lib/python2.7/dist-packages/neutron/agent/l3/ha_router.py", line 403, in external_gateway_updated
2018-02-14 05:04:32.103 1352665 ERROR neutron.agent.l3.router_info     self._remove_vip(old_gateway_cidr)
2018-02-14 05:04:32.103 1352665 ERROR neutron.agent.l3.router_info   File "/usr/lib/python2.7/dist-packages/neutron/agent/l3/ha_router.py", line 202, in _remove_vip
2018-02-14 05:04:32.103 1352665 ERROR neutron.agent.l3.router_info     instance.remove_vip_by_ip_address(ip_cidr)
2018-02-14 05:04:32.103 1352665 ERROR neutron.agent.l3.router_info AttributeError: 'NoneType' object has no attribute 'remove_vip_by_ip_address'
2018-02-14 05:04:32.103 1352665 ERROR neutron.agent.l3.router_info 
2018-02-14 05:04:32.104 1352665 ERROR neutron.agent.l3.agent [-] Failed to process compatible router: 43801324-72ce-469f-a628-a5c645041e30
2018-02-14 05:04:32.104 1352665 ERROR neutron.agent.l3.agent Traceback (most recent call last):
2018-02-14 05:04:32.104 1352665 ERROR neutron.agent.l3.agent   File "/usr/lib/python2.7/dist-packages/neutron/agent/l3/agent.py", line 517, in _process_router_update
2018-02-14 05:04:32.104 1352665 ERROR neutron.agent.l3.agent     self._process_router_if_compatible(router)
2018-02-14 05:04:32.104 1352665 ERROR neutron.agent.l3.agent   File "/usr/lib/python2.7/dist-packages/neutron/agent/l3/agent.py", line 454, in _process_router_if_compatible
2018-02-14 05:04:32.104 1352665 ERROR neutron.agent.l3.agent     self._process_updated_router(router)
2018-02-14 05:04:32.104 1352665 ERROR neutron.agent.l3.agent   File "/usr/lib/python2.7/dist-packages/neutron/agent/l3/agent.py", line 469, in _process_updated_router
2018-02-14 05:04:32.104 1352665 ERROR neutron.agent.l3.agent     ri.process()
2018-02-14 05:04:32.104 1352665 ERROR neutron.agent.l3.agent   File "/usr/lib/python2.7/dist-packages/neutron/agent/l3/ha_router.py", line 426, in process
2018-02-14 05:04:32.104 1352665 ERROR neutron.agent.l3.agent     super(HaRouter, self).process()
2018-02-14 05:04:32.104 1352665 ERROR neutron.agent.l3.agent   File "/usr/lib/python2.7/dist-packages/neutron/common/utils.py", line 256, in call
2018-02-14 05:04:32.104 1352665 ERROR neutron.agent.l3.agent     self.logger(e)
2018-02-14 05:04:32.104 1352665 ERROR neutron.agent.l3.agent   File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
2018-02-14 05:04:32.104 1352665 ERROR neutron.agent.l3.agent     self.force_reraise()
2018-02-14 05:04:32.104 1352665 ERROR neutron.agent.l3.agent   File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
2018-02-14 05:04:32.104 1352665 ERROR neutron.agent.l3.agent     six.reraise(self.type_, self.value, self.tb)
2018-02-14 05:04:32.104 1352665 ERROR neutron.agent.l3.agent   File "/usr/lib/python2.7/dist-packages/neutron/common/utils.py", line 253, in call
2018-02-14 05:04:32.104 1352665 ERROR neutron.agent.l3.agent     return func(*args, **kwargs)
2018-02-14 05:04:32.104 1352665 ERROR neutron.agent.l3.agent   File "/usr/lib/python2.7/dist-packages/neutron/agent/l3/router_info.py", line 1115, in process
2018-02-14 05:04:32.104 1352665 ERROR neutron.agent.l3.agent     self.process_external()
2018-02-14 05:04:32.104 1352665 ERROR neutron.agent.l3.agent   File "/usr/lib/python2.7/dist-packages/neutron/agent/l3/router_info.py", line 890, in process_external
2018-02-14 05:04:32.104 1352665 ERROR neutron.agent.l3.agent     self._process_external_gateway(ex_gw_port)
2018-02-14 05:04:32.104 1352665 ERROR neutron.agent.l3.agent   File "/usr/lib/python2.7/dist-packages/neutron/agent/l3/router_info.py", line 777, in _process_external_gateway
2018-02-14 05:04:32.104 1352665 ERROR neutron.agent.l3.agent     self.external_gateway_updated(ex_gw_port, interface_name)
2018-02-14 05:04:32.104 1352665 ERROR neutron.agent.l3.agent   File "/usr/lib/python2.7/dist-packages/neutron/agent/l3/ha_router.py", line 403, in external_gateway_updated
2018-02-14 05:04:32.104 1352665 ERROR neutron.agent.l3.agent     self._remove_vip(old_gateway_cidr)
2018-02-14 05:04:32.104 1352665 ERROR neutron.agent.l3.agent   File "/usr/lib/python2.7/dist-packages/neutron/agent/l3/ha_router.py", line 202, in _remove_vip
2018-02-14 05:04:32.104 1352665 ERROR neutron.agent.l3.agent     instance.remove_vip_by_ip_address(ip_cidr)
2018-02-14 05:04:32.104 1352665 ERROR neutron.agent.l3.agent AttributeError: 'NoneType' object has no attribute 'remove_vip_by_ip_address'
2018-02-14 05:04:32.104 1352665 ERROR neutron.agent.l3.agent
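
To see how many routers hit this, the "Failed to process compatible
router" lines can be tallied per router ID. The following is a minimal
sketch that only assumes the log line format shown above; the function
name count_failed_routers is hypothetical, and the log file to feed it
is left to the reader.

```python
import re
from collections import Counter

# Matches the l3-agent error line format shown in the log excerpt above.
FAILED_ROUTER = re.compile(
    r"ERROR neutron\.agent\.l3\.agent .*"
    r"Failed to process compatible router: (?P<router_id>[0-9a-f-]{36})"
)


def count_failed_routers(lines):
    """Count 'Failed to process compatible router' errors per router ID."""
    counts = Counter()
    for line in lines:
        match = FAILED_ROUTER.search(line)
        if match:
            counts[match.group("router_id")] += 1
    return counts
```

Passing an open l3-agent log file as `lines` works, since the function
only iterates over it line by line.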

-- 
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to openvswitch in Ubuntu.
https://bugs.launchpad.net/bugs/1749425

Title:
  Neutron integrated with OpenVSwitch drops packets and fails to
  plug/unplug interfaces from OVS on router interfaces at scale

Status in neutron:
  New
Status in openvswitch package in Ubuntu:
  New

Bug description:
  Description:    Ubuntu 16.04.3 LTS
  Release:        16.04
  Linux 4.4.0-96-generic on AMD64
  Neutron 2:10.0.4-0ubuntu2~cloud0 from Cloud Archive xenial-updates/ocata
  OpenVSwitch 2.6.1-0ubuntu5.2~cloud0 from Cloud Archive xenial-updates/ocata

  In an environment with three bare-metal Neutron deployments, hosting
  upward of 300 routers, with approximately the same number of
  instances, typically one router per instance, packet loss on instances
  accessed via floating IPs, including complete connectivity loss, is
  experienced. The problem is exacerbated by enabling L3HA, likely due
  to the increase in router namespaces to be scheduled and managed, and
  the additional scheduling work of bringing up keepalived and
  monitoring the keepalived VIP.

  Reducing the number of routers and rescheduling routers on new hosts,
  causing the routers to undergo a full recreation of namespace,
  iptables rules, and replugging of interfaces into OVS will correct
  packet loss or connectivity loss on impacted routers.

  On Neutron hosts in this environment, we have used systemtap to trace
  calls to kfree_skb which reveals the majority of dropped packets occur
  in the openvswitch module, notably on the br-int bridge. Inspecting
  the state of OVS shows many qtap interfaces which are no longer
  present on the Neutron host which are still plugged in to OVS.
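
The stale-port check described above amounts to a set difference between
the ports OVS still has plugged and the devices that actually exist. A
minimal sketch follows, assuming the port names come from
`ovs-vsctl list-ports br-int` (one name per line) and the device names
from `ip -o link show`. Router ports live inside qrouter namespaces, so
the device list would also need the output of
`ip netns exec <namespace> ip -o link show` for each namespace. Both
function names are hypothetical.

```python
def parse_ip_link(output):
    """Extract device names from `ip -o link show` output."""
    names = []
    for line in output.splitlines():
        # Each record starts with "<index>: <name>: <flags> ...".
        parts = line.split(":", 2)
        if len(parts) >= 2 and parts[0].strip().isdigit():
            # Names may be shown as "name@peer"; keep the part before "@".
            names.append(parts[1].strip().split("@", 1)[0])
    return names


def find_stale_ports(ovs_ports, present_devices):
    """Return OVS port names with no matching network device, sorted."""
    return sorted(set(ovs_ports) - set(present_devices))
```

The helpers only compare names; collecting the command output from the
root namespace and every qrouter namespace is left to the caller.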

  Diagnostic outputs in following comments.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1749425/+subscriptions


