[Bug 1460164] Re: restart of openvswitch-switch causes instance network down when l2population enabled
James Page
james.page at ubuntu.com
Fri Dec 18 14:42:51 UTC 2015
On a full agent restart, tunnels are set up:
2015-12-18 14:41:51.505 17767 INFO neutron.plugins.openvswitch.agent.ovs_neutron_agent [-] Port 826251b8-cc26-435e-9488-16ae67bde4f6 updated. Details: {u'admin_state_up': True, u'network_id': u'15b42697-cf68-4b78-9e19-2d167d0b37cc', u'segmentation_id': 5, u'physical_network': None, u'device': u'826251b8-cc26-435e-9488-16ae67bde4f6', u'port_id': u'826251b8-cc26-435e-9488-16ae67bde4f6', u'network_type': u'gre'}
2015-12-18 14:41:51.506 17767 INFO neutron.plugins.openvswitch.agent.ovs_neutron_agent [-] Assigning 1 as local vlan for net-id=15b42697-cf68-4b78-9e19-2d167d0b37cc
2015-12-18 14:41:51.757 17767 INFO neutron.plugins.openvswitch.agent.ovs_neutron_agent [req-c53ce242-0dc1-41d2-9ab9-de88980dc3ab None] setup_tunnel_port: gre-0a052634 10.5.38.52 gre
2015-12-18 14:41:51.769 17767 INFO neutron.plugins.openvswitch.agent.ovs_neutron_agent [-] Configuration for device 826251b8-cc26-435e-9488-16ae67bde4f6 completed.
2015-12-18 14:41:51.868 17767 INFO neutron.plugins.openvswitch.agent.ovs_neutron_agent [-] setup_tunnel_port: gre-0a052630 10.5.38.48 gre
2015-12-18 14:41:51.974 17767 INFO neutron.plugins.openvswitch.agent.ovs_neutron_agent [-] setup_tunnel_port: gre-0a052633 10.5.38.51 gre
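Each setup_tunnel_port line corresponds to a GRE port on the tunnel bridge. A quick way to confirm the tunnels actually exist after an agent restart (a sketch; the bridge name assumes the default tunnel_bridge=br-tun):

```shell
# List the GRE tunnel ports the agent created on the tunnel bridge;
# each gre-<hex-encoded-ip> port matches one setup_tunnel_port log line.
ovs-vsctl list-ports br-tun | grep '^gre-'

# For each tunnel port, show the remote endpoint it points at
# (e.g. gre-0a052634: 0a052634 is hex for 10.5.38.52).
for p in $(ovs-vsctl list-ports br-tun | grep '^gre-'); do
    echo "$p -> $(ovs-vsctl get Interface "$p" options:remote_ip)"
done
```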
but on an openvswitch-switch restart, this does not happen:
2015-12-18 14:42:21.836 17767 ERROR neutron.agent.linux.ovsdb_monitor [-] Error received from ovsdb monitor: ovsdb-client: unix:/var/run/openvswitch/db.sock: receive failed (End of file)
2015-12-18 14:42:23.103 17767 INFO neutron.plugins.openvswitch.agent.ovs_neutron_agent [-] Mapping physical network physnet1 to bridge br-data
2015-12-18 14:42:24.923 17767 INFO neutron.plugins.openvswitch.agent.ovs_neutron_agent [-] Agent tunnel out of sync with plugin!
2015-12-18 14:42:25.188 17767 INFO neutron.agent.securitygroups_rpc [-] Preparing filters for devices set([u'826251b8-cc26-435e-9488-16ae67bde4f6'])
2015-12-18 14:42:25.664 17767 INFO neutron.plugins.openvswitch.agent.ovs_neutron_agent [-] Port 826251b8-cc26-435e-9488-16ae67bde4f6 updated. Details: {u'admin_state_up': True, u'network_id': u'15b42697-cf68-4b78-9e19-2d167d0b37cc', u'segmentation_id': 5, u'physical_network': None, u'device': u'826251b8-cc26-435e-9488-16ae67bde4f6', u'port_id': u'826251b8-cc26-435e-9488-16ae67bde4f6', u'network_type': u'gre'}
2015-12-18 14:42:25.665 17767 INFO neutron.plugins.openvswitch.agent.ovs_neutron_agent [-] Assigning 1 as local vlan for net-id=15b42697-cf68-4b78-9e19-2d167d0b37cc
2015-12-18 14:42:26.054 17767 INFO neutron.plugins.openvswitch.agent.ovs_neutron_agent [-] Configuration for device 826251b8-cc26-435e-9488-16ae67bde4f6 completed.
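In the second trace the agent notices the dropped database connection and even logs "Agent tunnel out of sync with plugin!", but no setup_tunnel_port lines follow, so the GRE ports are never re-created on the tunnel bridge. A reproduction sketch (bridge name assumes the default tunnel_bridge=br-tun):

```shell
# Restart the switch; ovs-vswitchd going away drops the agent's
# ovsdb monitor connection (the ERROR line above).
sudo service openvswitch-switch restart

# On an affected node this comes back empty even though the agent
# logged "Configuration for device ... completed."
ovs-vsctl list-ports br-tun | grep '^gre-' \
    || echo "no GRE ports: tunnels were not re-created"
```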
--
https://bugs.launchpad.net/bugs/1460164
Title:
restart of openvswitch-switch causes instance network down when
l2population enabled
Status in neutron:
New
Status in neutron package in Ubuntu:
Triaged
Bug description:
On 2015-05-28, our Landscape auto-upgraded packages on two of our
OpenStack clouds. On both clouds, but only on some compute nodes, the
upgrade of openvswitch-switch and corresponding downtime of
ovs-vswitchd appears to have triggered some sort of race condition
within neutron-plugin-openvswitch-agent, leaving it in a broken state;
any new instances come up with non-functional network but pre-existing
instances appear unaffected. Restarting n-p-ovs-agent on the affected
compute nodes is sufficient to work around the problem.
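The workaround described above is just a service restart on each affected compute node (upstart service name as shipped on Ubuntu 14.04 / Icehouse):

```shell
# Restart the OVS agent; on startup it performs the full sync shown
# in the first log excerpt, re-creating the GRE tunnel ports.
sudo service neutron-plugin-openvswitch-agent restart

# Then confirm tunnels came back in the agent log.
grep setup_tunnel_port /var/log/neutron/openvswitch-agent.log | tail
```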
The packages Landscape upgraded (from /var/log/apt/history.log):
Start-Date: 2015-05-28 14:23:07
Upgrade: nova-compute-libvirt:amd64 (2014.1.4-0ubuntu2, 2014.1.4-0ubuntu2.1), libsystemd-login0:amd64 (204-5ubuntu20.11, 204-5ubuntu20.12), nova-compute-kvm:amd64 (2014.1.4-0ubuntu2, 2014.1.4-0ubuntu2.1), systemd-services:amd64 (204-5ubuntu20.11, 204-5ubuntu20.12), isc-dhcp-common:amd64 (4.2.4-7ubuntu12.1, 4.2.4-7ubuntu12.2), nova-common:amd64 (2014.1.4-0ubuntu2, 2014.1.4-0ubuntu2.1), python-nova:amd64 (2014.1.4-0ubuntu2, 2014.1.4-0ubuntu2.1), libsystemd-daemon0:amd64 (204-5ubuntu20.11, 204-5ubuntu20.12), grub-common:amd64 (2.02~beta2-9ubuntu1.1, 2.02~beta2-9ubuntu1.2), libpam-systemd:amd64 (204-5ubuntu20.11, 204-5ubuntu20.12), udev:amd64 (204-5ubuntu20.11, 204-5ubuntu20.12), grub2-common:amd64 (2.02~beta2-9ubuntu1.1, 2.02~beta2-9ubuntu1.2), openvswitch-switch:amd64 (2.0.2-0ubuntu0.14.04.1, 2.0.2-0ubuntu0.14.04.2), libudev1:amd64 (204-5ubuntu20.11, 204-5ubuntu20.12), isc-dhcp-client:amd64 (4.2.4-7ubuntu12.1, 4.2.4-7ubuntu12.2), python-eventlet:amd64 (0.13.0-1ubuntu2, 0.13.0-1ubuntu2.1), python-novaclient:amd64 (2.17.0-0ubuntu1.1, 2.17.0-0ubuntu1.2), grub-pc-bin:amd64 (2.02~beta2-9ubuntu1.1, 2.02~beta2-9ubuntu1.2), grub-pc:amd64 (2.02~beta2-9ubuntu1.1, 2.02~beta2-9ubuntu1.2), nova-compute:amd64 (2014.1.4-0ubuntu2, 2014.1.4-0ubuntu2.1), openvswitch-common:amd64 (2.0.2-0ubuntu0.14.04.1, 2.0.2-0ubuntu0.14.04.2)
End-Date: 2015-05-28 14:24:47
From /var/log/neutron/openvswitch-agent.log:
2015-05-28 14:24:18.336 47866 ERROR neutron.agent.linux.ovsdb_monitor [-] Error received from ovsdb monitor: ovsdb-client: unix:/var/run/openvswitch/db.sock: receive failed (End of file)
Looking at a stuck instance, all the right tunnels, bridges, and
whatnot appear to be there:
root@vector:~# ip l l | grep c-3b
460002: qbr7ed8b59c-3b: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default
460003: qvo7ed8b59c-3b: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master ovs-system state UP mode DEFAULT group default qlen 1000
460004: qvb7ed8b59c-3b: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master qbr7ed8b59c-3b state UP mode DEFAULT group default qlen 1000
460005: tap7ed8b59c-3b: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master qbr7ed8b59c-3b state UNKNOWN mode DEFAULT group default qlen 500
root@vector:~# ovs-vsctl list-ports br-int | grep c-3b
qvo7ed8b59c-3b
root@vector:~#
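The integration-bridge side looks healthy, so the missing piece is likely on the tunnel bridge: with l2population enabled, the flood and unicast flows on br-tun are programmed per tunnel port, and they go stale when ovs-vswitchd restarts. A sketch of what to compare between a good and a broken node (assumes the default br-tun name):

```shell
# On a broken node the GRE ports are missing from br-tun ...
ovs-vsctl list-ports br-tun

# ... and the l2population-programmed flows reference stale or absent
# tunnel ofports, so traffic from the instance is dropped.
ovs-ofctl dump-flows br-tun
```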
But I can't ping the unit from within the qrouter-${id} namespace on
the neutron gateway. If I tcpdump the {q,t}*c-3b interfaces, I don't
see any traffic.
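The failing ping check can be reproduced from the gateway; `<router-id>` and `<instance-ip>` below are placeholders for the real values, which are not given in the report:

```shell
# On the neutron gateway: ping the instance from inside its router
# namespace (<router-id> and <instance-ip> are placeholders).
ip netns exec qrouter-<router-id> ping -c 3 <instance-ip>

# On the compute node: watch the instance's tap device; on an affected
# node the echo requests never arrive.
tcpdump -ni tap7ed8b59c-3b icmp
```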
To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1460164/+subscriptions