[Bug 1462533] Re: ovs agent fails to recover gracefully from nic outage
James Page
james.page at ubuntu.com
Wed Sep 27 16:20:13 UTC 2017
I suspect that this has more to do with the fact that neutron at icehouse was
not using oslo.messaging, and hence some of the reconnect handling for
AMQP connections was not super-fantastic.
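For context, on oslo.messaging-based releases the AMQP reconnect behaviour is tunable from neutron.conf. A sketch of the relevant options follows; the option names come from the oslo.messaging rabbit driver, but the section name and the values shown are illustrative assumptions (older releases read these from [DEFAULT]):

```ini
[oslo_messaging_rabbit]
# Retry the broker connection forever rather than giving up (0 = infinite)
rabbit_max_retries = 0
# Seconds between reconnection attempts, with multiplicative backoff
rabbit_retry_interval = 1
rabbit_retry_backoff = 2
# AMQP heartbeats detect half-dead connections (e.g. after a NIC outage)
heartbeat_timeout_threshold = 60
heartbeat_rate = 2
```

With heartbeats enabled, a connection that silently died during the interface outage should be noticed and re-established rather than left hanging.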
Jill - this bug has not had an update in a while; do you see the same
issue on newer OpenStack versions, or was it specific to the
trusty/icehouse deployment detailed in the original bug report?
I'm going to move the bug task to neutron in Ubuntu, rather than the
charm, as I think this is a neutron issue.
** Also affects: neutron (Ubuntu)
Importance: Undecided
Status: New
** Changed in: charm-neutron-openvswitch
Status: New => Invalid
** Changed in: neutron (Ubuntu)
Status: New => Incomplete
** Changed in: neutron (Ubuntu)
Importance: Undecided => Medium
--
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to neutron in Ubuntu.
https://bugs.launchpad.net/bugs/1462533
Title:
ovs agent fails to recover gracefully from nic outage
Status in OpenStack neutron-openvswitch charm:
Invalid
Status in neutron package in Ubuntu:
Incomplete
Status in neutron-openvswitch package in Juju Charms Collection:
Invalid
Bug description:
First saw this after an extended maas-dhcp outage and have been able
to reliably reproduce it (in the course of testing for lp:1439649).
The neutron gateway is colocated on the same metal as other
openstack-ha services and juju, which is possibly causing a cascade
effect. Trusty, Icehouse.
To reproduce, take down or expire the lease on the private interface
on the openstack controller node (br0). We've let this go for at
least 10 minutes when testing. When bringing the interface back up,
services and existing VMs are accessible, but newly spawned nova VMs
are unreachable by ping/ssh (though they can be reached via console).
tcpdump in the qrouter netns shows the ICMP request going out but no
reply.
The openvswitch agent log contained "error: [Errno 104] Connection
reset by peer" entries from a few minutes earlier. After restarting
the agent and the nova-compute service and rebuilding the VMs, we
could reach the VMs again.
It looks like the openvswitch agent tried, but failed, to reconnect to
rabbitmq. Restarting rabbitmq has had no impact.
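The stuck-agent symptom above can be detected from the log. A minimal sketch, assuming the default Ubuntu log path and upstart service name for the icehouse-era agent (both are assumptions; adjust for your deployment), with the actual restart left commented out since it needs root:

```shell
#!/bin/sh
# Detect the "Connection reset by peer" symptom in the OVS agent log and
# flag the agent for a restart. Log path below is an assumed default.
LOG="${1:-/var/log/neutron/openvswitch-agent.log}"

agent_needs_restart() {
    # Returns success if the log shows the AMQP connection-reset error
    grep -q "Connection reset by peer" "$1" 2>/dev/null
}

if [ -f "$LOG" ] && agent_needs_restart "$LOG"; then
    echo "agent looks stuck; restart it, e.g.:"
    echo "  service neutron-plugin-openvswitch-agent restart"
fi
```

In practice this is only a workaround; the underlying fix is the improved reconnect handling in oslo.messaging on newer releases.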
To manage notifications about this bug go to:
https://bugs.launchpad.net/charm-neutron-openvswitch/+bug/1462533/+subscriptions
More information about the Ubuntu-openstack-bugs
mailing list