[Bug 1462533] Re: ovs agent fails to recover gracefully from nic outage
James Page
james.page at ubuntu.com
Wed Sep 27 16:20:13 UTC 2017
I suspect that this has more to do with the fact that neutron at icehouse was
not using oslo.messaging, and hence some of the reconnect handling for
AMQP connections was not super-fantastic.
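For context, on oslo.messaging-based releases the AMQP reconnect behaviour is tunable from neutron.conf. A sketch of the relevant options follows; the option names come from the oslo.messaging rabbit driver, but the section name and the values shown are illustrative assumptions (older releases read these from [DEFAULT]):

```ini
[oslo_messaging_rabbit]
# Retry the broker connection forever rather than giving up (0 = infinite)
rabbit_max_retries = 0
# Seconds between reconnection attempts, with multiplicative backoff
rabbit_retry_interval = 1
rabbit_retry_backoff = 2
# AMQP heartbeats detect half-dead connections (e.g. after a NIC outage)
heartbeat_timeout_threshold = 60
heartbeat_rate = 2
```

With heartbeats enabled, a connection that silently died during the interface outage should be noticed and re-established rather than left hanging.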
Jill - this bug has not had an update in a while; do you see the same
issue on newer OpenStack versions, or was it specific to the
trusty/icehouse deployment detailed in the original bug report?
I'm going to move the bug task to neutron in Ubuntu, rather than the
charm, as I think this is a neutron issue.
** Also affects: neutron (Ubuntu)
Importance: Undecided
Status: New
** Changed in: charm-neutron-openvswitch
Status: New => Invalid
** Changed in: neutron (Ubuntu)
Status: New => Incomplete
** Changed in: neutron (Ubuntu)
Importance: Undecided => Medium
--
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to neutron in Ubuntu.
https://bugs.launchpad.net/bugs/1462533
Title:
ovs agent fails to recover gracefully from nic outage
Status in OpenStack neutron-openvswitch charm:
Invalid
Status in neutron package in Ubuntu:
Incomplete
Status in neutron-openvswitch package in Juju Charms Collection:
Invalid
Bug description:
First saw this after an extended maas-dhcp outage and have been able
to reliably reproduce it (in the course of testing for lp:1439649).
The neutron gateway is colocated on the same metal as other
openstack-ha services and juju, which is possibly causing a cascade
effect. Trusty, Icehouse.
To reproduce, take down or expire the lease on the private interface
on the openstack controller node (br0). We've let this go for at
least 10 minutes when testing. When bringing the interface back up,
services and existing VMs are accessible, but newly spawned nova VMs
are unreachable by ping/ssh (though they can be reached via console).
tcpdump in the qrouter netns shows the ICMP request going out but no
reply.
The openvswitch agent log contained "error: [Errno 104] Connection
reset by peer" entries from a few minutes earlier. After restarting
the agent and the nova-compute service and rebuilding the VMs, we
could reach the VMs again.
It looks like the openvswitch agent tried, but failed, to reconnect to
rabbitmq. Restarting rabbitmq has had no impact.
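The stuck-agent symptom above can be detected from the log. A minimal sketch, assuming the default Ubuntu log path and upstart service name for the icehouse-era agent (both are assumptions; adjust for your deployment), with the actual restart left commented out since it needs root:

```shell
#!/bin/sh
# Detect the "Connection reset by peer" symptom in the OVS agent log and
# flag the agent for a restart. Log path below is an assumed default.
LOG="${1:-/var/log/neutron/openvswitch-agent.log}"

agent_needs_restart() {
    # Returns success if the log shows the AMQP connection-reset error
    grep -q "Connection reset by peer" "$1" 2>/dev/null
}

if [ -f "$LOG" ] && agent_needs_restart "$LOG"; then
    echo "agent looks stuck; restart it, e.g.:"
    echo "  service neutron-plugin-openvswitch-agent restart"
fi
```

In practice this is only a workaround; the underlying fix is the improved reconnect handling in oslo.messaging on newer releases.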
To manage notifications about this bug go to:
https://bugs.launchpad.net/charm-neutron-openvswitch/+bug/1462533/+subscriptions
More information about the Ubuntu-openstack-bugs
mailing list