[Bug 1584647] Re: [SRU] "Interface monitor is not active" can be observed at ovs-agent start

Edward Hope-Morley edward.hope-morley at canonical.com
Wed Mar 28 15:31:06 UTC 2018


** Description changed:

  [Impact]
  
  Requesting a backport to Mitaka: we are seeing this issue in Mitaka
  clouds (the fix landed in Newton), whereby some compute nodes fail to
  have their flows re-added to br-tun following a restart of
  openvswitch-switch.
  
  [Test Case]
  
  * Deploy OpenStack Mitaka with one compute host
  * Create an instance with overlay network (gre)
  * Make a note of flows added to br-tun (ovs-ofctl dump-flows br-tun)
  * systemctl restart openvswitch-switch
  * Check that flows are re-added to br-tun (compare with previous output)
  * Ensure you do not see "Interface monitor is not active" in /var/log/neutron/neutron-openvswitch-agent.log
  
  NOTE: the root cause of this issue is that the ovsdb monitor async
  process started by neutron-openvswitch-agent takes too long to start and
  is not yet active by the time the rpc_loop tries to poll for updates.
  This scenario is hard to simulate, so it is difficult to confirm that it
  occurred and was resolved by this patch. Nevertheless, the patch is
  small and is known to have resolved the issue in newer versions of
  OpenStack.
  
  [Regression Potential]
  
+ I can't think how this patch could cause a regression. The only possible
+ difference could be that the rpc_loop might take longer to update flows
+ on ovs restart but that in itself would indicate a wider system issue
+ beyond the neutron service that would not constitute a regression.
  
  ---------------
  
  I noticed this error message in the neutron-ovs-agent log when starting
  neutron-openvswitch-agent:
  
  ERROR neutron.agent.linux.ovsdb_monitor [req-a7c7a398-a13b-490e-adf8-c5afb24b4b9c None None] Interface monitor is not active.
  
  ovs-agent starts ovsdb_monitor at [1] and first uses it at [2]. There is
  no guarantee that ovsdb_monitor is ready at [2], so the error can appear
  when neutron-openvswitch-agent starts.
  
  The start should block until the process is active, and only then should
  the monitor be used. Otherwise, using ovsdb_monitor is meaningless.
  
  [1]
  https://github.com/openstack/neutron/blob/6da27a78f42db00c91a747861eafde7edc6f1fa7/neutron/agent/linux/polling.py#L35
  
  [2]
  https://github.com/openstack/neutron/blob/6da27a78f42db00c91a747861eafde7edc6f1fa7/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py#L1994

-- 
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to neutron in Ubuntu.
https://bugs.launchpad.net/bugs/1584647

Title:
  [SRU] "Interface monitor is not active" can be observed at ovs-agent
  start

Status in Ubuntu Cloud Archive:
  Invalid
Status in Ubuntu Cloud Archive mitaka series:
  Triaged
Status in neutron:
  Fix Released
Status in neutron package in Ubuntu:
  Invalid
Status in neutron source package in Xenial:
  Triaged

Bug description:
  [Impact]

  Requesting a backport to Mitaka: we are seeing this issue in Mitaka
  clouds (the fix landed in Newton), whereby some compute nodes fail to
  have their flows re-added to br-tun following a restart of
  openvswitch-switch.

  [Test Case]

  * Deploy OpenStack Mitaka with one compute host
  * Create an instance with overlay network (gre)
  * Make a note of flows added to br-tun (ovs-ofctl dump-flows br-tun)
  * systemctl restart openvswitch-switch
  * Check that flows are re-added to br-tun (compare with previous output)
  * Ensure you do not see "Interface monitor is not active" in /var/log/neutron/neutron-openvswitch-agent.log
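
The "compare with previous output" step is awkward to do by eye because each flow line carries counters and ages that change on every dump. A minimal sketch of one way to do the comparison, assuming the two dumps have been saved as text; the helper names (normalize_flows, missing_flows) and the choice of which fields to ignore are illustrative assumptions, not part of the patch or of neutron:

```python
import re

# Per-flow fields that change between dumps even when the flow itself is
# unchanged. Which fields to ignore is an assumption of this sketch.
VOLATILE = re.compile(
    r"\b(?:duration|n_packets|n_bytes|idle_age|hard_age)=[^,\s]+,?\s*"
)


def normalize_flows(dump_text):
    """Return the set of flow lines with volatile fields stripped."""
    flows = set()
    for line in dump_text.splitlines():
        # Skip the reply header and blank lines; real flow lines carry actions.
        if "actions=" not in line:
            continue
        flows.add(VOLATILE.sub("", line).strip())
    return flows


def missing_flows(before, after):
    """Flows present in the 'before' dump but absent from the 'after' dump."""
    return sorted(normalize_flows(before) - normalize_flows(after))
```

An empty list from missing_flows(before, after) means every flow seen before the restart is present again afterwards; any entries returned are flows that were not re-added.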

  NOTE: the root cause of this issue is that the ovsdb monitor async
  process started by neutron-openvswitch-agent takes too long to start
  and is not yet active by the time the rpc_loop tries to poll for
  updates. This scenario is hard to simulate, so it is difficult to
  confirm that it occurred and was resolved by this patch. Nevertheless,
  the patch is small and is known to have resolved the issue in newer
  versions of OpenStack.

  [Regression Potential]

  I can't think how this patch could cause a regression. The only
  possible difference could be that the rpc_loop might take longer to
  update flows on ovs restart but that in itself would indicate a wider
  system issue beyond the neutron service that would not constitute a
  regression.

  ---------------

  I noticed this error message in the neutron-ovs-agent log when
  starting neutron-openvswitch-agent:

  ERROR neutron.agent.linux.ovsdb_monitor [req-a7c7a398-a13b-490e-adf8-c5afb24b4b9c None None] Interface monitor is not active.

  ovs-agent starts ovsdb_monitor at [1] and first uses it at [2]. There
  is no guarantee that ovsdb_monitor is ready at [2], so the error can
  appear when neutron-openvswitch-agent starts.

  The start should block until the process is active, and only then
  should the monitor be used. Otherwise, using ovsdb_monitor is
  meaningless.
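
To illustrate the race and the proposed blocking start, here is a small self-contained sketch. It does not reproduce the neutron code: SlowMonitor, wait_until and WaitTimeout are hypothetical stand-ins, and the startup delay is simulated with a timer rather than a real ovsdb monitor process.

```python
import threading
import time


class WaitTimeout(Exception):
    """Raised when the condition does not become true in time."""


def wait_until(predicate, timeout=5.0, interval=0.01):
    """Poll predicate() until it returns True or the timeout expires."""
    deadline = time.monotonic() + timeout
    while not predicate():
        if time.monotonic() >= deadline:
            raise WaitTimeout("condition not met within %.2fs" % timeout)
        time.sleep(interval)


class SlowMonitor:
    """Hypothetical stand-in for an async monitor that is slow to start."""

    def __init__(self, startup_delay):
        self._startup_delay = startup_delay
        self._active = threading.Event()

    def start(self, block=False, timeout=5.0):
        # Simulate the background process becoming active after a delay.
        timer = threading.Timer(self._startup_delay, self._active.set)
        timer.daemon = True
        timer.start()
        if block:
            # Proposed behaviour: do not return until the monitor is usable.
            wait_until(self.is_active, timeout=timeout)

    def is_active(self):
        return self._active.is_set()
```

With block=False, a caller that polls immediately after start() can find the monitor inactive, which corresponds to the error reported above. With block=True, start() returns only once is_active() is true, or raises after the timeout so the caller can handle a monitor that never comes up.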

  [1]
  https://github.com/openstack/neutron/blob/6da27a78f42db00c91a747861eafde7edc6f1fa7/neutron/agent/linux/polling.py#L35

  [2]
  https://github.com/openstack/neutron/blob/6da27a78f42db00c91a747861eafde7edc6f1fa7/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py#L1994

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1584647/+subscriptions
