[Bug 1749425] Re: Neutron integrated with OpenVSwitch drops packets and fails to plug/unplug interfaces from OVS on router interfaces at scale

James Page james.page at ubuntu.com
Wed Feb 14 17:39:11 UTC 2018


OK and now:

2018-02-13 14:18:08.281 2065380 DEBUG neutron.agent.l3.agent [-] Starting router update for 213b6544-ab4b-4e46-a5c6-5d8d587a0c6d, action 1, priority 0 _process_router_update /usr/lib/python2.7/dist-packages/neutron/agent/l3/agent.py:483

which I think is the actual processing of the message that has been
sitting in the work queue since 14:12:57 (wow, over five minutes).
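
To put a number on that delay, here is a small sketch that subtracts the
two timestamps. It assumes the 14:12:57 event happened on the same day
(2018-02-13) and on a whole second; the helper name is made up for
illustration.

```python
from datetime import datetime

# Timestamp layout used by the neutron log lines quoted in this mail.
LOG_TIME_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def queue_delay_seconds(enqueued, processed):
    """Return the seconds elapsed between two log timestamps (hypothetical helper)."""
    start = datetime.strptime(enqueued, LOG_TIME_FORMAT)
    end = datetime.strptime(processed, LOG_TIME_FORMAT)
    return (end - start).total_seconds()


# Assumption: the update was queued at 14:12:57.000 on the same day.
delay = queue_delay_seconds("2018-02-13 14:12:57.000",
                            "2018-02-13 14:18:08.281")
print("%.3f" % delay)
```

Under those assumptions the update waited a little over 311 seconds,
roughly 5 minutes 11 seconds, before the agent picked it up.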

Action 1 is DELETE_ROUTER.

This fails:

2018-02-13 14:18:09.959 2065380 ERROR neutron.agent.l3.agent [-] Error while deleting router 213b6544-ab4b-4e46-a5c6-5d8d587a0c6d
2018-02-13 14:18:09.959 2065380 ERROR neutron.agent.l3.agent Traceback (most recent call last):
2018-02-13 14:18:09.959 2065380 ERROR neutron.agent.l3.agent   File "/usr/lib/python2.7/dist-packages/neutron/agent/l3/agent.py", line 366, in _safe_router_removed
2018-02-13 14:18:09.959 2065380 ERROR neutron.agent.l3.agent     self._router_removed(router_id)
2018-02-13 14:18:09.959 2065380 ERROR neutron.agent.l3.agent   File "/usr/lib/python2.7/dist-packages/neutron/agent/l3/agent.py", line 385, in _router_removed
2018-02-13 14:18:09.959 2065380 ERROR neutron.agent.l3.agent     ri.delete()
2018-02-13 14:18:09.959 2065380 ERROR neutron.agent.l3.agent   File "/usr/lib/python2.7/dist-packages/neutron/agent/l3/ha_router.py", line 420, in delete
2018-02-13 14:18:09.959 2065380 ERROR neutron.agent.l3.agent     self.destroy_state_change_monitor(self.process_monitor)
2018-02-13 14:18:09.959 2065380 ERROR neutron.agent.l3.agent   File "/usr/lib/python2.7/dist-packages/neutron/agent/l3/ha_router.py", line 355, in destroy_state_change_monitor
2018-02-13 14:18:09.959 2065380 ERROR neutron.agent.l3.agent     pm = self._get_state_change_monitor_process_manager()
2018-02-13 14:18:09.959 2065380 ERROR neutron.agent.l3.agent   File "/usr/lib/python2.7/dist-packages/neutron/agent/l3/ha_router.py", line 326, in _get_state_change_monitor_process_manager
2018-02-13 14:18:09.959 2065380 ERROR neutron.agent.l3.agent     default_cmd_callback=self._get_state_change_monitor_callback())
2018-02-13 14:18:09.959 2065380 ERROR neutron.agent.l3.agent   File "/usr/lib/python2.7/dist-packages/neutron/agent/l3/ha_router.py", line 330, in _get_state_change_monitor_callback
2018-02-13 14:18:09.959 2065380 ERROR neutron.agent.l3.agent     ha_cidr = self._get_primary_vip()
2018-02-13 14:18:09.959 2065380 ERROR neutron.agent.l3.agent   File "/usr/lib/python2.7/dist-packages/neutron/agent/l3/ha_router.py", line 166, in _get_primary_vip
2018-02-13 14:18:09.959 2065380 ERROR neutron.agent.l3.agent     return self._get_keepalived_instance().get_primary_vip()
2018-02-13 14:18:09.959 2065380 ERROR neutron.agent.l3.agent AttributeError: 'NoneType' object has no attribute 'get_primary_vip'
2018-02-13 14:18:09.959 2065380 ERROR neutron.agent.l3.agent
2018-02-13 14:18:09.960 2065380 DEBUG neutron.agent.l3.agent [-] Finished a router update for 213b6544-ab4b-4e46-a5c6-5d8d587a0c6d _process_router_update /usr/lib/python2.7/dist-packages/neutron/agent/l3/agent.py:513
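
The traceback ends with _get_keepalived_instance() returning None while
the router is being deleted. The sketch below reproduces that failure
mode with hypothetical stand-in classes and shows one defensive pattern
(checking for None before dereferencing). It is not Neutron's code and
not necessarily how this should be fixed upstream; only the method names
_get_keepalived_instance and get_primary_vip are taken from the
traceback.

```python
class FakeKeepalivedInstance(object):
    """Hypothetical stand-in for a keepalived instance configuration."""

    def __init__(self, primary_vip):
        self._primary_vip = primary_vip

    def get_primary_vip(self):
        return self._primary_vip


class FakeHaRouter(object):
    """Hypothetical stand-in that mirrors the call chain in the traceback."""

    def __init__(self, instance=None):
        # None models a router whose keepalived instance is not available.
        self._instance = instance

    def _get_keepalived_instance(self):
        return self._instance

    def get_primary_vip_unguarded(self):
        # Same shape as the failing call: raises AttributeError on None.
        return self._get_keepalived_instance().get_primary_vip()

    def get_primary_vip_guarded(self):
        # Defensive variant: report "no VIP" instead of raising.
        instance = self._get_keepalived_instance()
        if instance is None:
            return None
        return instance.get_primary_vip()
```

With FakeHaRouter() the unguarded call raises the same AttributeError as
in the log, while the guarded call returns None so that a caller could
carry on with the rest of the cleanup.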

-- 
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to openvswitch in Ubuntu.
https://bugs.launchpad.net/bugs/1749425

Title:
  Neutron integrated with OpenVSwitch drops packets and fails to
  plug/unplug interfaces from OVS on router interfaces at scale

Status in neutron:
  New
Status in openvswitch package in Ubuntu:
  New

Bug description:
  Description:    Ubuntu 16.04.3 LTS
  Release:        16.04
  Linux 4.4.0-96-generic on AMD64
  Neutron 2:10.0.4-0ubuntu2~cloud0 from Cloud Archive xenial-updates/ocata
  OpenVSwitch 2.6.1-0ubuntu5.2~cloud0 from Cloud Archive xenial-updates/ocata

  In an environment with three bare-metal Neutron deployments hosting
  upward of 300 routers and approximately the same number of instances
  (typically one router per instance), instances accessed via floating
  IPs experience packet loss, up to and including complete loss of
  connectivity. Enabling L3HA exacerbates the problem, likely because of
  the increase in router namespaces to be scheduled and managed and the
  additional work of bringing up keepalived and monitoring the
  keepalived VIP.

  Reducing the number of routers and rescheduling routers onto new
  hosts, which forces a full recreation of each router's namespace and
  iptables rules and a replugging of its interfaces into OVS, corrects
  packet loss or connectivity loss on the impacted routers.

  On Neutron hosts in this environment, we used systemtap to trace
  calls to kfree_skb, which reveals that the majority of dropped packets
  occur in the openvswitch module, notably on the br-int bridge.
  Inspecting the state of OVS shows many qtap interfaces that are still
  plugged in to OVS but are no longer present on the Neutron host.

  Diagnostic outputs are in the following comments.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1749425/+subscriptions


