[Bug 1927868] Re: vRouter not working after update to 16.3.1
Jorge Niedbalski
1927868 at bugs.launchpad.net
Thu Jun 24 20:30:53 UTC 2021
Hello,
I reviewed the code path and ran the upgrade in my reproducer. Upgrading
neutron-gateway first and neutron-api afterwards doesn't work: a mismatch
in the migrations/RPC versions causes the HA port to fail to be created/updated,
so the keepalived process cannot be spawned and, finally, the state-change-monitor
fails to find the PID for that keepalived process.
If I upgrade neutron-api, run the migrations to head and then upgrade
the gateways, all seems correct.
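The ordering above can be expressed as a small pre-flight check. This is only a sketch: the helper names `upstream_major` and `gateway_upgrade_is_safe` are hypothetical, it compares nothing but the upstream major number of the package versions, and it does not verify that the migrations were actually run to head.

```python
import re


def upstream_major(deb_version: str) -> int:
    """Return the upstream major number of a Debian package version string."""
    # Drop an optional epoch such as "2:" and read the leading integer.
    upstream = deb_version.split(":", 1)[-1]
    match = re.match(r"(\d+)", upstream)
    if match is None:
        raise ValueError(f"unrecognised version: {deb_version!r}")
    return int(match.group(1))


def gateway_upgrade_is_safe(api_version: str, gateway_target: str) -> bool:
    """True when neutron-api is already at or past the gateway's target major."""
    return upstream_major(api_version) >= upstream_major(gateway_target)


# neutron-api still on 15.x while the gateway would move to 16.x: not safe.
print(gateway_upgrade_is_safe("2:15.3.3-0ubuntu1~cloud0",
                              "2:16.3.2-0ubuntu3~cloud0"))
```

With the versions from this reproducer the check returns False until neutron-api itself has been upgraded.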
I upgraded from the following versions
root@juju-da864d-1927868-5:/home/ubuntu# dpkg -l |grep keepalived
ii keepalived 1:1.3.9-1ubuntu0.18.04.2 amd64 Failover and monitoring daemon for LVS clusters
root@juju-da864d-1927868-5:/home/ubuntu# dpkg -l |grep neutron-common
ii neutron-common 2:15.3.3-0ubuntu1~cloud0 all Neutron is a virtual network service for Openstack - common
--> To
root@juju-da864d-1927868-5:/home/ubuntu# dpkg -l |grep neutron-common
ii neutron-common 2:16.3.2-0ubuntu3~cloud0 all Neutron is a virtual network service for Openstack - common
I created a router with HA enabled as follows
$ openstack router list
+--------------------------------------+-----------------+--------+-------+----------------------------------+-------------+------+
| ID | Name | Status | State | Project | Distributed | HA |
+--------------------------------------+-----------------+--------+-------+----------------------------------+-------------+------+
| 09fa811f-410c-4360-8cae-687e7e73ff21 | provider-router | ACTIVE | UP | 6f5aaf5130764305a5d37862e3ff18ce | False | True |
+--------------------------------------+-----------------+--------+-------+----------------------------------+-------------+------+
===> Prior to the upgrade I can list the keepalived processes linked to the ha-router
root 22999 0.0 0.0 91816 3052 ? Ss 19:17 0:00
keepalived -P -f /var/lib/neutron/ha_confs/09fa811f-410c-4360-8cae-
687e7e73ff21/keepalived.conf -p /var/lib/neutron/ha_confs/09fa811f-
410c-4360-8cae-687e7e73ff21.pid.keepalived -r /var/lib/neutron/ha_confs
/09fa811f-410c-4360-8cae-687e7e73ff21.pid.keepalived-vrrp -D
root 23001 0.0 0.1 92084 4088 ? S 19:17 0:00
keepalived -P -f /var/lib/neutron/ha_confs/09fa811f-410c-4360-8cae-
687e7e73ff21/keepalived.conf -p /var/lib/neutron/ha_confs/09fa811f-
410c-4360-8cae-687e7e73ff21.pid.keepalived -r /var/lib/neutron/ha_confs
/09fa811f-410c-4360-8cae-687e7e73ff21.pid.keepalived-vrrp -D
===> After upgrading, nothing is returned: the keepalived processes are not re-spawned
after neutron-* is upgraded.
Pre-upgrade:
Jun 24 19:17:07 juju-da864d-1927868-5 Keepalived[22997]: Starting Keepalived v1.3.9 (10/21,2017)
Jun 24 19:17:07 juju-da864d-1927868-5 Keepalived[22999]: Starting VRRP child process, pid=23001
Post-upgrade (not started):
Jun 24 19:30:41 juju-da864d-1927868-5 Keepalived[22999]: Stopping
Jun 24 19:30:42 juju-da864d-1927868-5 Keepalived_vrrp[23001]: Stopped
Jun 24 19:30:42 juju-da864d-1927868-5 Keepalived[22999]: Stopped Keepalived v1.3.9 (10/21,2017)
The keepalived processes are not re-spawned for the following reasons:
1) The ml2 process starts the router devices by making an RPC call for the device details. This
call fails because of differing oslo target versions.
Therefore the neutron-api migrations must be applied
before the gateways are upgraded.
9819:2021-06-24 19:31:09.935 31744 DEBUG
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-
14f31407-6342-4f71-98b8-4437e166dbaa - - - - -] Starting to process
devices in:{'current': {'87cfdd45-fea7-4c06-aa13-174cb71b294f',
'b8e18ba0-c65b-498e-9a8b-34c0fcc42d07',
'926b7377-30f4-4b2c-9064-8aab3918a385'}, 'added':
{'87cfdd45-fea7-4c06-aa13-174cb71b294f'}, 'removed': set(), 'updated':
set(), 're_added': set()} rpc_loop /usr/lib/python3/dist-
packages/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py:2685
9821:2021-06-24 19:31:10.028 31744 ERROR neutron.agent.rpc [req-
14f31407-6342-4f71-98b8-4437e166dbaa - - - - -] Failed to get details
for device 87cfdd45-fea7-4c06-aa13-174cb71b294f:
oslo_messaging.rpc.client.RemoteError: Remote error:
InvalidTargetVersion Invalid target version 1.1
9869:2021-06-24 19:31:10.510 31744 DEBUG
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-
14f31407-6342-4f71-98b8-4437e166dbaa - - - - -] retrying failed devices
{'87cfdd45-fea7-4c06-aa13-174cb71b294f'}
_update_port_info_failed_devices_stats /usr/lib/python3/dist-
packages/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py:1674
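To see which devices are hitting this failure, the agent log can be scanned for the "Failed to get details for device" message. The following is a rough sketch; the function name `failed_devices` is hypothetical and the pattern is derived only from the log excerpt above, so other message layouts would need a different pattern.

```python
import re

# Pattern taken from the ERROR line quoted above: device UUID, then the
# dotted exception class name that follows it.
FAILED_DEVICE = re.compile(
    r"Failed to get details for device (?P<device>[0-9a-f-]{36}): "
    r"(?P<exc>[\w.]+):"
)


def failed_devices(log_text: str) -> dict:
    """Map each failing device ID to the exception class reported for it."""
    # Collapse line wraps and repeated spaces so wrapped log entries match.
    normalized = " ".join(log_text.split())
    return {m.group("device"): m.group("exc")
            for m in FAILED_DEVICE.finditer(normalized)}
```

Feeding it the ERROR entry above yields the HA port ID 87cfdd45-fea7-4c06-aa13-174cb71b294f together with the RemoteError class.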
2) Then the L3 HA router creation mechanism can't process the HA router because the HA port 87cfdd45-fea7-4c06-aa13-174cb71b294f is down,
so keepalived cannot be spawned [0] [1]
[0] https://github.com/openstack/neutron/blob/1ad9ca56b07ffdc9f7e0bc6a62af61961b9128eb/neutron/agent/l3/ha_router.py#L519
[1] https://github.com/openstack/neutron/blob/1ad9ca56b07ffdc9f7e0bc6a62af61961b9128eb/neutron/agent/linux/keepalived.py#L455
1971:2021-06-24 19:31:15.034 32459 DEBUG neutron.agent.l3.ha_router [-]
Processing HA router with HA port: {'id':
'87cfdd45-fea7-4c06-aa13-174cb71b294f', 'name': 'HA port tenant
6f5aaf5130764305a5d37862e3ff18ce', 'network_id':
'1a2e73c3-1587-4417-be96-40fde935474b', 'tenant_id': '', 'mac_address':
'fa:16:3e:e2:e0:56', 'admin_state_up': True, 'status': 'DOWN',
'device_id': '09fa811f-410c-4360-8cae-687e7e73ff21', 'device_owner':
'network:router_ha_interface', 'fixed_ips': [{'subnet_id': '6f8bfdbf-
ca04-4847-ac83-f4bd90c089b6', 'ip_address': '169.254.193.135',
'prefixlen': 18}], 'allowed_address_pairs': [], 'extra_dhcp_opts': [],
'security_groups': [], 'description': '', 'binding:vnic_type': 'normal',
'binding:profile': {}, 'binding:host_id': 'juju-da864d-1927868-5',
'binding:vif_type': 'ovs', 'binding:vif_details': {'connectivity': 'l2',
'port_filter': True, 'ovs_hybrid_plug': True, 'datapath_type': 'system',
'bridge_name': 'br-int'}, 'port_security_enabled': False, 'dns_name':
'', 'dns_assignment': [{'ip_address': '169.254.193.135', 'hostname':
'host-169-254-193-135', 'fqdn':
'host-169-254-193-135.1927868.stsstack.qa.1ss.'}], 'dns_domain': '',
'ip_allocation': 'immediate', 'tags': [], 'created_at':
'2021-06-24T19:16:35Z', 'updated_at': '2021-06-24T19:30:59Z',
'revision_number': 5, 'project_id': '', 'subnets': [{'id': '6f8bfdbf-
ca04-4847-ac83-f4bd90c089b6', 'cidr': '169.254.192.0/18', 'gateway_ip':
None, 'dns_nameservers': [], 'ipv6_ra_mode': None, 'subnetpool_id':
None}], 'extra_subnets': [], 'address_scopes': {'4': None, '6': None},
'mtu': 1500} process /usr/lib/python3/dist-
packages/neutron/agent/l3/ha_router.py:513
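The condition that matters here can be sketched as follows, assuming (from my reading of the code linked at [0]) that keepalived is only enabled when the HA port exists and its status is ACTIVE. The helper name `keepalived_would_start` is hypothetical, and the port dict is trimmed to the one key that is checked.

```python
# Same string value as neutron_lib.constants.PORT_STATUS_ACTIVE.
PORT_STATUS_ACTIVE = "ACTIVE"


def keepalived_would_start(ha_port) -> bool:
    """Sketch of the gate: an HA port must exist and be ACTIVE."""
    return bool(ha_port) and ha_port.get("status") == PORT_STATUS_ACTIVE


# The HA port from the debug log above is reported as DOWN.
ha_port = {"id": "87cfdd45-fea7-4c06-aa13-174cb71b294f", "status": "DOWN"}
print(keepalived_would_start(ha_port))
```

For the port in the log above the status is DOWN, so under this assumption keepalived is never enabled for the router.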
3) Since the port is down and the keepalived process cannot be started, the 'neutron-keepalived-state-change' agent fails with:
11166:2021-06-24 20:12:53.600 8839 DEBUG neutron.agent.linux.utils [-] Running command: ['sudo', '/usr/bin/neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'neutron-keepalived-state-change', '--router_id=09fa811f-410c-4360-8cae-687e7e73ff21', '--namespace=qrouter-09fa811f-410c-4360-8cae-687e7e73ff21', '--conf_dir=/var/lib/neutron/ha_confs/09fa811f-410c-4360-8cae-687e7e73ff21', '--log-file=/var/lib/neutron/ha_confs/09fa811f-410c-4360-8cae-687e7e73ff21/neutron-keepalived-state-change.log', '--monitor_interface=ha-87cfdd45-fe', '--monitor_cidr=169.254.0.203/24', '--pid_file=/var/lib/neutron/external/pids/09fa811f-410c-4360-8cae-687e7e73ff21.monitor.pid.neutron-keepalived-state-change-monitor', '--state_path=/var/lib/neutron', '--user=113', '--group=117'] create_process /usr/lib/python3/dist-packages/neutron/agent/linux/utils.py:88
11167:2021-06-24 20:12:55.379 8839 DEBUG neutron.agent.l3.ha_router [-] Router 09fa811f-410c-4360-8cae-687e7e73ff21 neutron-keepalived-state-change-monitor pid 8961 spawn_state_change_monitor /usr/lib/python3/dist-packages/neutron/agent/l3/ha_router.py:428
11182:2021-06-24 20:12:55.611 8839 DEBUG neutron.agent.linux.utils [-] Unable to access /var/lib/neutron/ha_confs/09fa811f-410c-4360-8cae-687e7e73ff21.pid.keepalived; Error: [Errno 2] No such file or directory: '/var/lib/neutron/ha_confs/09fa811f-410c-4360-8cae-687e7e73ff21.pid.keepalived' get_value_from_file /usr/lib/python3/dist-packages/neutron/agent/linux/utils.py:263
11214:2021-06-24 20:12:56.172 8839 DEBUG neutron.agent.linux.utils [-] Unable to access /var/lib/neutron/ha_confs/09fa811f-410c-4360-8cae-687e7e73ff21.pid.keepalived; Error: [Errno 2] No such file or directory: '/var/lib/neutron/ha_confs/09fa811f-410c-4360-8cae-687e7e73ff21.pid.keepalived' get_value_from_file /usr/lib/python3/dist-packages/neutron/agent/linux/utils.py:263
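The "Unable to access ... pid.keepalived" messages are what a PID-file lookup produces when the process was never started. A minimal sketch of such a lookup, using a hypothetical helper `read_pid` rather than the actual neutron utility:

```python
from pathlib import Path
from typing import Optional


def read_pid(pid_file: str) -> Optional[int]:
    """Return the PID stored in pid_file, or None if it is missing or invalid."""
    try:
        return int(Path(pid_file).read_text().strip())
    except (OSError, ValueError):
        # A missing file (Errno 2) or unparsable content means no usable PID.
        return None


pid = read_pid("/var/lib/neutron/ha_confs/"
               "09fa811f-410c-4360-8cae-687e7e73ff21.pid.keepalived")
print("keepalived pid:", pid)
```

On a gateway in the failed state this would print None, matching the repeated debug messages above.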
--
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to neutron in Ubuntu.
https://bugs.launchpad.net/bugs/1927868
Title:
vRouter not working after update to 16.3.1
Status in neutron:
New
Status in neutron package in Ubuntu:
New
Bug description:
We run a juju-managed OpenStack Ussuri on Bionic. After updating
neutron packages from 16.3.0 to 16.3.1 all virtual routers stopped
working. It seems that most (not all) namespaces are created but have
only the lo interface and sometimes the ha-XYZ interface in DOWN state.
The underlying tap interfaces are also down.
neutron-l3-agent has many logs similar to the following:
2021-05-08 15:01:45.286 39411 ERROR neutron.agent.l3.ha_router [-] Gateway interface for router 02945b59-639b-41be-8237-3b7933b4e32d was not set up; router will not work properly
and journal logs report at around the same time
May 08 15:01:40 lar1615.srv-louros.grnet.gr neutron-keepalived-state-change[18596]: 2021-05-08 15:01:40.765 18596 INFO neutron.agent.linux.ip_lib [-] Failed sending gratuitous ARP to 62.62.62.62 on qg-5a6efe8c-6b in namespace qrouter-02945b59-639b-41be-8237-3b7933b4e32d: Exit code: 2; Stdin: ; Stdout: Interface "qg-5a6efe8c-6b" is down
May 08 15:01:40 lar1615.srv-louros.grnet.gr neutron-keepalived-state-change[18596]: 2021-05-08 15:01:40.767 18596 INFO neutron.agent.linux.ip_lib [-] Interface qg-5a6efe8c-6b or address 62.62.62.62 in namespace qrouter-02945b59-639b-41be-8237-3b7933b4e32d was deleted concurrently
The neutron packages installed are:
ii neutron-common 2:16.3.1-0ubuntu1~cloud0 all Neutron is a virtual network service for Openstack - common
ii neutron-dhcp-agent 2:16.3.1-0ubuntu1~cloud0 all Neutron is a virtual network service for Openstack - DHCP agent
ii neutron-l3-agent 2:16.3.1-0ubuntu1~cloud0 all Neutron is a virtual network service for Openstack - l3 agent
ii neutron-metadata-agent 2:16.3.1-0ubuntu1~cloud0 all Neutron is a virtual network service for Openstack - metadata agent
ii neutron-metering-agent 2:16.3.1-0ubuntu1~cloud0 all Neutron is a virtual network service for Openstack - metering agent
ii neutron-openvswitch-agent 2:16.3.1-0ubuntu1~cloud0 all Neutron is a virtual network service for Openstack - Open vSwitch plugin agent
ii python3-neutron 2:16.3.1-0ubuntu1~cloud0 all Neutron is a virtual network service for Openstack - Python library
ii python3-neutron-lib 2.3.0-0ubuntu1~cloud0 all Neutron shared routines and utilities - Python 3.x
ii python3-neutronclient 1:7.1.1-0ubuntu1~cloud0 all client API library for Neutron - Python 3.x
Downgrading to 16.3.0 resolves the issues.
To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1927868/+subscriptions