[Bug 1927868] Re: vRouter not working after update to 16.3.1

Jorge Niedbalski 1927868 at bugs.launchpad.net
Thu Jun 24 20:30:53 UTC 2021


Hello,

I reviewed the code path and the upgrade in my reproducer. Upgrading
neutron-gateway first and neutron-api afterwards doesn't work because of a
mismatch in the migrations/RPC versions: the HA port fails to be
created/updated, so the keepalived process cannot be spawned, and finally the
state-change monitor fails to find the PID of that keepalived process.

If I upgrade neutron-api, run the migrations to head and then upgrade
the gateways, all seems correct.

I upgraded from the following versions:

root@juju-da864d-1927868-5:/home/ubuntu# dpkg -l |grep keepalived
ii  keepalived                           1:1.3.9-1ubuntu0.18.04.2                                    amd64        Failover and monitoring daemon for LVS clusters

root@juju-da864d-1927868-5:/home/ubuntu# dpkg -l |grep neutron-common
ii  neutron-common                       2:15.3.3-0ubuntu1~cloud0                                    all          Neutron is a virtual network service for Openstack - common

--> To

root@juju-da864d-1927868-5:/home/ubuntu# dpkg -l |grep neutron-common
ii  neutron-common                       2:16.3.2-0ubuntu3~cloud0                                    all          Neutron is a virtual network service for Openstack - common


I created a router with HA enabled as follows


$ openstack router list
+--------------------------------------+-----------------+--------+-------+----------------------------------+-------------+------+
| ID                                   | Name            | Status | State | Project                          | Distributed | HA   |
+--------------------------------------+-----------------+--------+-------+----------------------------------+-------------+------+
| 09fa811f-410c-4360-8cae-687e7e73ff21 | provider-router | ACTIVE | UP    | 6f5aaf5130764305a5d37862e3ff18ce | False       | True |
+--------------------------------------+-----------------+--------+-------+----------------------------------+-------------+------+


===> Prior to the upgrade I can list the keepalived processes linked to the HA router

root     22999  0.0  0.0  91816  3052 ?        Ss   19:17   0:00 keepalived -P -f /var/lib/neutron/ha_confs/09fa811f-410c-4360-8cae-687e7e73ff21/keepalived.conf -p /var/lib/neutron/ha_confs/09fa811f-410c-4360-8cae-687e7e73ff21.pid.keepalived -r /var/lib/neutron/ha_confs/09fa811f-410c-4360-8cae-687e7e73ff21.pid.keepalived-vrrp -D

root     23001  0.0  0.1  92084  4088 ?        S    19:17   0:00 keepalived -P -f /var/lib/neutron/ha_confs/09fa811f-410c-4360-8cae-687e7e73ff21/keepalived.conf -p /var/lib/neutron/ha_confs/09fa811f-410c-4360-8cae-687e7e73ff21.pid.keepalived -r /var/lib/neutron/ha_confs/09fa811f-410c-4360-8cae-687e7e73ff21.pid.keepalived-vrrp -D


===> After upgrading, the same listing returns nothing: the keepalived processes
are not re-spawned once neutron-* is upgraded.

Pre-upgrade:
Jun 24 19:17:07 juju-da864d-1927868-5 Keepalived[22997]: Starting Keepalived v1.3.9 (10/21,2017)
Jun 24 19:17:07 juju-da864d-1927868-5 Keepalived[22999]: Starting VRRP child process, pid=23001

Post-upgrade (stopped and not started again):

Jun 24 19:30:41 juju-da864d-1927868-5 Keepalived[22999]: Stopping
Jun 24 19:30:42 juju-da864d-1927868-5 Keepalived_vrrp[23001]: Stopped
Jun 24 19:30:42 juju-da864d-1927868-5 Keepalived[22999]: Stopped Keepalived v1.3.9 (10/21,2017)

The keepalived processes are not re-spawned for the following reasons:

1) The ml2 (Open vSwitch) agent starts processing the router devices by requesting
the device details through an RPC call. This call fails because of differing
oslo target versions.

Therefore, the neutron-api migrations must be applied before the gateways
are upgraded.

9819:2021-06-24 19:31:09.935 31744 DEBUG neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-14f31407-6342-4f71-98b8-4437e166dbaa - - - - -] Starting to process devices in:{'current': {'87cfdd45-fea7-4c06-aa13-174cb71b294f', 'b8e18ba0-c65b-498e-9a8b-34c0fcc42d07', '926b7377-30f4-4b2c-9064-8aab3918a385'}, 'added': {'87cfdd45-fea7-4c06-aa13-174cb71b294f'}, 'removed': set(), 'updated': set(), 're_added': set()} rpc_loop /usr/lib/python3/dist-packages/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py:2685

9821:2021-06-24 19:31:10.028 31744 ERROR neutron.agent.rpc [req-14f31407-6342-4f71-98b8-4437e166dbaa - - - - -] Failed to get details for device 87cfdd45-fea7-4c06-aa13-174cb71b294f: oslo_messaging.rpc.client.RemoteError: Remote error: InvalidTargetVersion Invalid target version 1.1

9869:2021-06-24 19:31:10.510 31744 DEBUG neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-14f31407-6342-4f71-98b8-4437e166dbaa - - - - -] retrying failed devices {'87cfdd45-fea7-4c06-aa13-174cb71b294f'} _update_port_info_failed_devices_stats /usr/lib/python3/dist-packages/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py:1674
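
To find which devices hit this error on a gateway, the agent log can be
scanned for the "Failed to get details for device" message shown above. The
following is only a rough sketch: the regular expression is an assumption
derived from that single log line, and the helper name failed_devices is
hypothetical, not part of neutron.

```python
import re

# Assumption: pattern derived from the one ERROR line quoted in this report,
# not from a documented log format.
FAILED_DEVICE_RE = re.compile(
    r"Failed to get details for device (?P<device>[0-9a-f-]{36}):"
)


def failed_devices(log_text):
    """Return the set of device (port) IDs whose details could not be fetched."""
    # Collapse all whitespace so messages wrapped across lines still match.
    flat = " ".join(log_text.split())
    return {m.group("device") for m in FAILED_DEVICE_RE.finditer(flat)}


sample = """2021-06-24 19:31:10.028 31744 ERROR neutron.agent.rpc [req-
14f31407-6342-4f71-98b8-4437e166dbaa - - - - -] Failed to get details
for device 87cfdd45-fea7-4c06-aa13-174cb71b294f:
oslo_messaging.rpc.client.RemoteError: Remote error:
InvalidTargetVersion Invalid target version 1.1"""

print(failed_devices(sample))
```

In this reproducer the only device reported is the HA port of the router,
which is the port examined in the next step.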

2) The l3 HA router processing then cannot complete for the HA router, because the
HA port 87cfdd45-fea7-4c06-aa13-174cb71b294f is down, so keepalived cannot be spawned [0] [1]

[0] https://github.com/openstack/neutron/blob/1ad9ca56b07ffdc9f7e0bc6a62af61961b9128eb/neutron/agent/l3/ha_router.py#L519
[1] https://github.com/openstack/neutron/blob/1ad9ca56b07ffdc9f7e0bc6a62af61961b9128eb/neutron/agent/linux/keepalived.py#L455

1971:2021-06-24 19:31:15.034 32459 DEBUG neutron.agent.l3.ha_router [-] Processing HA router with HA port: {'id': '87cfdd45-fea7-4c06-aa13-174cb71b294f', 'name': 'HA port tenant 6f5aaf5130764305a5d37862e3ff18ce', 'network_id': '1a2e73c3-1587-4417-be96-40fde935474b', 'tenant_id': '', 'mac_address': 'fa:16:3e:e2:e0:56', 'admin_state_up': True, 'status': 'DOWN', 'device_id': '09fa811f-410c-4360-8cae-687e7e73ff21', 'device_owner': 'network:router_ha_interface', 'fixed_ips': [{'subnet_id': '6f8bfdbf-ca04-4847-ac83-f4bd90c089b6', 'ip_address': '169.254.193.135', 'prefixlen': 18}], 'allowed_address_pairs': [], 'extra_dhcp_opts': [], 'security_groups': [], 'description': '', 'binding:vnic_type': 'normal', 'binding:profile': {}, 'binding:host_id': 'juju-da864d-1927868-5', 'binding:vif_type': 'ovs', 'binding:vif_details': {'connectivity': 'l2', 'port_filter': True, 'ovs_hybrid_plug': True, 'datapath_type': 'system', 'bridge_name': 'br-int'}, 'port_security_enabled': False, 'dns_name': '', 'dns_assignment': [{'ip_address': '169.254.193.135', 'hostname': 'host-169-254-193-135', 'fqdn': 'host-169-254-193-135.1927868.stsstack.qa.1ss.'}], 'dns_domain': '', 'ip_allocation': 'immediate', 'tags': [], 'created_at': '2021-06-24T19:16:35Z', 'updated_at': '2021-06-24T19:30:59Z', 'revision_number': 5, 'project_id': '', 'subnets': [{'id': '6f8bfdbf-ca04-4847-ac83-f4bd90c089b6', 'cidr': '169.254.192.0/18', 'gateway_ip': None, 'dns_nameservers': [], 'ipv6_ra_mode': None, 'subnetpool_id': None}], 'extra_subnets': [], 'address_scopes': {'4': None, '6': None}, 'mtu': 1500} process /usr/lib/python3/dist-packages/neutron/agent/l3/ha_router.py:513
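
The dependency between the HA port status and keepalived can be modelled
roughly as below. This is a simplified sketch of the behaviour described in
this step, not the code at [0]; the function name can_spawn_keepalived and the
assumption that "ACTIVE" is the status the agent waits for are mine.

```python
# Assumption: "ACTIVE" is the port status required before keepalived is
# enabled; the log above shows the HA port reporting 'DOWN' instead.
PORT_STATUS_ACTIVE = "ACTIVE"


def can_spawn_keepalived(ha_port):
    """Simplified model: keepalived is only enabled once the HA port is ACTIVE."""
    return bool(ha_port) and ha_port.get("status") == PORT_STATUS_ACTIVE


# Subset of the HA port dict taken from the log line above.
ha_port = {
    "id": "87cfdd45-fea7-4c06-aa13-174cb71b294f",
    "status": "DOWN",
    "device_owner": "network:router_ha_interface",
}

print(can_spawn_keepalived(ha_port))  # prints False
```

As long as the port stays DOWN (because the device details RPC in step 1 keeps
failing), this condition is never met and keepalived is never started.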


3) Since the port is down, the keepalived process cannot be started, and the 'neutron-keepalived-state-change' agent fails with:


11166:2021-06-24 20:12:53.600 8839 DEBUG neutron.agent.linux.utils [-] Running command: ['sudo', '/usr/bin/neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'neutron-keepalived-state-change', '--router_id=09fa811f-410c-4360-8cae-687e7e73ff21', '--namespace=qrouter-09fa811f-410c-4360-8cae-687e7e73ff21', '--conf_dir=/var/lib/neutron/ha_confs/09fa811f-410c-4360-8cae-687e7e73ff21', '--log-file=/var/lib/neutron/ha_confs/09fa811f-410c-4360-8cae-687e7e73ff21/neutron-keepalived-state-change.log', '--monitor_interface=ha-87cfdd45-fe', '--monitor_cidr=169.254.0.203/24', '--pid_file=/var/lib/neutron/external/pids/09fa811f-410c-4360-8cae-687e7e73ff21.monitor.pid.neutron-keepalived-state-change-monitor', '--state_path=/var/lib/neutron', '--user=113', '--group=117'] create_process /usr/lib/python3/dist-packages/neutron/agent/linux/utils.py:88
11167:2021-06-24 20:12:55.379 8839 DEBUG neutron.agent.l3.ha_router [-] Router 09fa811f-410c-4360-8cae-687e7e73ff21 neutron-keepalived-state-change-monitor pid 8961 spawn_state_change_monitor /usr/lib/python3/dist-packages/neutron/agent/l3/ha_router.py:428
11182:2021-06-24 20:12:55.611 8839 DEBUG neutron.agent.linux.utils [-] Unable to access /var/lib/neutron/ha_confs/09fa811f-410c-4360-8cae-687e7e73ff21.pid.keepalived; Error: [Errno 2] No such file or directory: '/var/lib/neutron/ha_confs/09fa811f-410c-4360-8cae-687e7e73ff21.pid.keepalived' get_value_from_file /usr/lib/python3/dist-packages/neutron/agent/linux/utils.py:263
11214:2021-06-24 20:12:56.172 8839 DEBUG neutron.agent.linux.utils [-] Unable to access /var/lib/neutron/ha_confs/09fa811f-410c-4360-8cae-687e7e73ff21.pid.keepalived; Error: [Errno 2] No such file or directory: '/var/lib/neutron/ha_confs/09fa811f-410c-4360-8cae-687e7e73ff21.pid.keepalived' get_value_from_file /usr/lib/python3/dist-packages/neutron/agent/linux/utils.py:263
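
The missing PID file can be checked directly on the gateway. The snippet
below is a minimal sketch for that check, assuming a Linux host with /proc
mounted; the helper names keepalived_pid and is_running are hypothetical, and
the path is the one from the log lines above.

```python
import os


def keepalived_pid(pid_file):
    """Return the PID stored in pid_file, or None if it is missing or invalid."""
    try:
        with open(pid_file) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def is_running(pid):
    """Best-effort liveness check on Linux: does /proc/<pid> exist?"""
    return pid is not None and os.path.isdir(f"/proc/{pid}")


router_id = "09fa811f-410c-4360-8cae-687e7e73ff21"
pid_file = f"/var/lib/neutron/ha_confs/{router_id}.pid.keepalived"

pid = keepalived_pid(pid_file)
print(pid, is_running(pid))
```

A missing PID file here corresponds to the "No such file or directory" errors
logged above.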

-- 
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to neutron in Ubuntu.
https://bugs.launchpad.net/bugs/1927868

Title:
  vRouter not working after update to 16.3.1

Status in neutron:
  New
Status in neutron package in Ubuntu:
  New

Bug description:
  We run a juju-managed OpenStack Ussuri on Bionic. After updating
  neutron packages from 16.3.0 to 16.3.1 all virtual routers stopped
  working. It seems that most (not all) namespaces are created but have
  only the lo interface and sometimes the ha-XYZ interface in DOWN state.
  The underlying tap interfaces are also down.

  neutron-l3-agent has many logs similar to the following:
  2021-05-08 15:01:45.286 39411 ERROR neutron.agent.l3.ha_router [-] Gateway interface for router 02945b59-639b-41be-8237-3b7933b4e32d was not set up; router will not work properly

  and journal logs report at around the same time
  May 08 15:01:40 lar1615.srv-louros.grnet.gr neutron-keepalived-state-change[18596]: 2021-05-08 15:01:40.765 18596 INFO neutron.agent.linux.ip_lib [-] Failed sending gratuitous ARP to 62.62.62.62 on qg-5a6efe8c-6b in namespace qrouter-02945b59-639b-41be-8237-3b7933b4e32d: Exit code: 2; Stdin: ; Stdout: Interface "qg-5a6efe8c-6b" is down
  May 08 15:01:40 lar1615.srv-louros.grnet.gr neutron-keepalived-state-change[18596]: 2021-05-08 15:01:40.767 18596 INFO neutron.agent.linux.ip_lib [-] Interface qg-5a6efe8c-6b or address 62.62.62.62 in namespace qrouter-02945b59-639b-41be-8237-3b7933b4e32d was deleted concurrently

  
  The neutron packages installed are:

  ii  neutron-common                         2:16.3.1-0ubuntu1~cloud0                                    all          Neutron is a virtual network service for Openstack - common
  ii  neutron-dhcp-agent                     2:16.3.1-0ubuntu1~cloud0                                    all          Neutron is a virtual network service for Openstack - DHCP agent
  ii  neutron-l3-agent                       2:16.3.1-0ubuntu1~cloud0                                    all          Neutron is a virtual network service for Openstack - l3 agent
  ii  neutron-metadata-agent                 2:16.3.1-0ubuntu1~cloud0                                    all          Neutron is a virtual network service for Openstack - metadata agent
  ii  neutron-metering-agent                 2:16.3.1-0ubuntu1~cloud0                                    all          Neutron is a virtual network service for Openstack - metering agent
  ii  neutron-openvswitch-agent              2:16.3.1-0ubuntu1~cloud0                                    all          Neutron is a virtual network service for Openstack - Open vSwitch plugin agent
  ii  python3-neutron                        2:16.3.1-0ubuntu1~cloud0                                    all          Neutron is a virtual network service for Openstack - Python library
  ii  python3-neutron-lib                    2.3.0-0ubuntu1~cloud0                                       all          Neutron shared routines and utilities - Python 3.x
  ii  python3-neutronclient                  1:7.1.1-0ubuntu1~cloud0                                     all          client API library for Neutron - Python 3.x


  Downgrading to 16.3.0 resolves the issues.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1927868/+subscriptions


