[Bug 1961402] Fix included in openstack/oslo.messaging yoga-eom

OpenStack Infra 1961402 at bugs.launchpad.net
Tue Feb 6 14:28:10 UTC 2024


This issue was fixed in the openstack/oslo.messaging yoga-eom release.

-- 
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to Ubuntu Cloud Archive.
https://bugs.launchpad.net/bugs/1961402

Title:
  Hanging services when connectivity to RabbitMQ lost

Status in Ubuntu Cloud Archive:
  New
Status in oslo.messaging:
  Fix Released

Bug description:
  # Versions
  - oslo.messaging 12.9.1
  - rabbitmq 3.9.8
  - ubuntu 20.04

  Hi,
  We are seeing services fail to recover after they encounter connectivity problems with the RabbitMQ cluster. We have seen this in Nova, Neutron and Cinder services in particular, across all of our deployments. When it occurs, the following greenlet-related traceback always appears in the service logs, after a number of reconnection-related messages (example from Nova compute):

  Feb 18 08:42:33 compute102 nova-compute[1402787]: 2022-02-18 08:42:33.514 1402787 ERROR oslo.messaging._drivers.impl_rabbit [-] [c21a4649-bc17-4648-82b4-e88743d61fc9] AMQP server on 10.99.99.99:5671 is unreachable: . Trying again in 1 seconds.: socket.timeout
  Feb 18 08:42:34 compute102 nova-compute[1402787]: 2022-02-18 08:42:34.517 1402787 ERROR oslo.messaging._drivers.impl_rabbit [-] [c21a4649-bc17-4648-82b4-e88743d61fc9] AMQP server on 10.99.99.99:5671 is unreachable: [Errno 101] ENETUNREACH. Trying again in 1 seconds.: OSError: [Errno 101] ENETUNREACH
  Feb 18 08:42:35 compute102 nova-compute[1402787]: 2022-02-18 08:42:35.050 1402787 ERROR oslo.messaging._drivers.impl_rabbit [req-85fc671a-18e5-4b0d-9c4b-27562efafd2a - - - - -] [229e749d-adb2-4375-87ca-2f7235129935] AMQP server on 10.99.99.99:5671 is unreachable: . Trying again in 1 seconds.: socket.timeout
  Feb 18 08:42:35 compute102 nova-compute[1402787]: 2022-02-18 08:42:35.520 1402787 ERROR oslo.messaging._drivers.impl_rabbit [-] [c21a4649-bc17-4648-82b4-e88743d61fc9] AMQP server on 10.99.99.98:5671 is unreachable: [Errno 101] ENETUNREACH. Trying again in 1 seconds.: OSError: [Errno 101] ENETUNREACH
  Feb 18 08:42:36 compute102 nova-compute[1402787]: 2022-02-18 08:42:36.052 1402787 ERROR oslo.messaging._drivers.impl_rabbit [req-85fc671a-18e5-4b0d-9c4b-27562efafd2a - - - - -] [229e749d-adb2-4375-87ca-2f7235129935] AMQP server on 10.99.99.99:5671 is unreachable: [Errno 101] ENETUNREACH. Trying again in 1 seconds.: OSError: [Errno 101] ENETUNREACH
  Feb 18 08:42:36 compute102 nova-compute[1402787]: 2022-02-18 08:42:36.521 1402787 ERROR oslo.messaging._drivers.impl_rabbit [-] [c21a4649-bc17-4648-82b4-e88743d61fc9] AMQP server on 10.99.99.97:5671 is unreachable: [Errno 101] ENETUNREACH. Trying again in 2 seconds.: OSError: [Errno 101] ENETUNREACH
  Feb 18 08:42:37 compute102 nova-compute[1402787]: 2022-02-18 08:42:37.053 1402787 ERROR oslo.messaging._drivers.impl_rabbit [req-85fc671a-18e5-4b0d-9c4b-27562efafd2a - - - - -] [229e749d-adb2-4375-87ca-2f7235129935] AMQP server on 10.99.99.98:5671 is unreachable: [Errno 101] ENETUNREACH. Trying again in 1 seconds.: OSError: [Errno 101] ENETUNREACH
  Feb 18 08:42:38 compute102 nova-compute[1402787]: 2022-02-18 08:42:38.055 1402787 ERROR oslo.messaging._drivers.impl_rabbit [req-85fc671a-18e5-4b0d-9c4b-27562efafd2a - - - - -] [229e749d-adb2-4375-87ca-2f7235129935] AMQP server on 10.99.99.97:5671 is unreachable: [Errno 101] ENETUNREACH. Trying again in 2 seconds.: OSError: [Errno 101] ENETUNREACH
  Feb 18 08:42:38 compute102 nova-compute[1402787]: 2022-02-18 08:42:38.524 1402787 ERROR oslo.messaging._drivers.impl_rabbit [-] [c21a4649-bc17-4648-82b4-e88743d61fc9] AMQP server on 10.99.99.99:5671 is unreachable: [Errno 101] ENETUNREACH. Trying again in 1 seconds.: OSError: [Errno 101] ENETUNREACH
  Feb 18 08:42:39 compute102 nova-compute[1402787]: 2022-02-18 08:42:39.526 1402787 ERROR oslo.messaging._drivers.impl_rabbit [-] [c21a4649-bc17-4648-82b4-e88743d61fc9] AMQP server on 10.99.99.98:5671 is unreachable: [Errno 101] ENETUNREACH. Trying again in 1 seconds.: OSError: [Errno 101] ENETUNREACH
  Feb 18 08:42:40 compute102 nova-compute[1402787]: 2022-02-18 08:42:40.058 1402787 ERROR oslo.messaging._drivers.impl_rabbit [req-85fc671a-18e5-4b0d-9c4b-27562efafd2a - - - - -] [229e749d-adb2-4375-87ca-2f7235129935] AMQP server on 10.99.99.99:5671 is unreachable: [Errno 101] ENETUNREACH. Trying again in 1 seconds.: OSError: [Errno 101] ENETUNREACH
  Feb 18 08:42:40 compute102 nova-compute[1402787]: 2022-02-18 08:42:40.527 1402787 ERROR oslo.messaging._drivers.impl_rabbit [-] [c21a4649-bc17-4648-82b4-e88743d61fc9] AMQP server on 10.99.99.97:5671 is unreachable: [Errno 101] ENETUNREACH. Trying again in 4 seconds.: OSError: [Errno 101] ENETUNREACH
  Feb 18 08:42:41 compute102 nova-compute[1402787]: 2022-02-18 08:42:41.060 1402787 ERROR oslo.messaging._drivers.impl_rabbit [req-85fc671a-18e5-4b0d-9c4b-27562efafd2a - - - - -] [229e749d-adb2-4375-87ca-2f7235129935] AMQP server on 10.99.99.98:5671 is unreachable: [Errno 101] ENETUNREACH. Trying again in 1 seconds.: OSError: [Errno 101] ENETUNREACH
  Feb 18 08:42:42 compute102 nova-compute[1402787]: 2022-02-18 08:42:42.062 1402787 ERROR oslo.messaging._drivers.impl_rabbit [req-85fc671a-18e5-4b0d-9c4b-27562efafd2a - - - - -] [229e749d-adb2-4375-87ca-2f7235129935] AMQP server on 10.99.99.97:5671 is unreachable: [Errno 101] ENETUNREACH. Trying again in 4 seconds.: OSError: [Errno 101] ENETUNREACH
  Feb 18 08:42:44 compute102 nova-compute[1402787]: 2022-02-18 08:42:44.532 1402787 ERROR oslo.messaging._drivers.impl_rabbit [-] [c21a4649-bc17-4648-82b4-e88743d61fc9] AMQP server on 10.99.99.99:5671 is unreachable: [Errno 101] ENETUNREACH. Trying again in 1 seconds.: OSError: [Errno 101] ENETUNREACH
  Feb 18 08:42:45 compute102 nova-compute[1402787]: 2022-02-18 08:42:45.534 1402787 ERROR oslo.messaging._drivers.impl_rabbit [-] [c21a4649-bc17-4648-82b4-e88743d61fc9] AMQP server on 10.99.99.98:5671 is unreachable: [Errno 101] ENETUNREACH. Trying again in 1 seconds.: OSError: [Errno 101] ENETUNREACH
  Feb 18 08:42:46 compute102 nova-compute[1402787]: 2022-02-18 08:42:46.067 1402787 ERROR oslo.messaging._drivers.impl_rabbit [req-85fc671a-18e5-4b0d-9c4b-27562efafd2a - - - - -] [229e749d-adb2-4375-87ca-2f7235129935] AMQP server on 10.99.99.99:5671 is unreachable: [Errno 101] ENETUNREACH. Trying again in 1 seconds.: OSError: [Errno 101] ENETUNREACH
  Feb 18 08:42:46 compute102 nova-compute[1402787]: 2022-02-18 08:42:46.536 1402787 ERROR oslo.messaging._drivers.impl_rabbit [-] [c21a4649-bc17-4648-82b4-e88743d61fc9] AMQP server on 10.99.99.97:5671 is unreachable: [Errno 101] ENETUNREACH. Trying again in 6 seconds.: OSError: [Errno 101] ENETUNREACH
  Feb 18 08:42:47 compute102 nova-compute[1402787]: 2022-02-18 08:42:47.068 1402787 ERROR oslo.messaging._drivers.impl_rabbit [req-85fc671a-18e5-4b0d-9c4b-27562efafd2a - - - - -] [229e749d-adb2-4375-87ca-2f7235129935] AMQP server on 10.99.99.98:5671 is unreachable: [Errno 101] ENETUNREACH. Trying again in 1 seconds.: OSError: [Errno 101] ENETUNREACH
  Feb 18 08:42:48 compute102 nova-compute[1402787]: 2022-02-18 08:42:48.070 1402787 ERROR oslo.messaging._drivers.impl_rabbit [req-85fc671a-18e5-4b0d-9c4b-27562efafd2a - - - - -] [229e749d-adb2-4375-87ca-2f7235129935] AMQP server on 10.99.99.97:5671 is unreachable: [Errno 101] ENETUNREACH. Trying again in 6 seconds.: OSError: [Errno 101] ENETUNREACH
  Feb 18 08:42:52 compute102 nova-compute[1402787]: 2022-02-18 08:42:52.543 1402787 ERROR oslo.messaging._drivers.impl_rabbit [-] [c21a4649-bc17-4648-82b4-e88743d61fc9] AMQP server on 10.99.99.99:5671 is unreachable: [Errno 101] ENETUNREACH. Trying again in 1 seconds.: OSError: [Errno 101] ENETUNREACH
  Feb 18 08:42:53 compute102 nova-compute[1402787]: 2022-02-18 08:42:53.545 1402787 ERROR oslo.messaging._drivers.impl_rabbit [-] [c21a4649-bc17-4648-82b4-e88743d61fc9] AMQP server on 10.99.99.98:5671 is unreachable: [Errno 101] ENETUNREACH. Trying again in 1 seconds.: OSError: [Errno 101] ENETUNREACH
  Feb 18 08:42:54 compute102 nova-compute[1402787]: 2022-02-18 08:42:54.077 1402787 ERROR oslo.messaging._drivers.impl_rabbit [req-85fc671a-18e5-4b0d-9c4b-27562efafd2a - - - - -] [229e749d-adb2-4375-87ca-2f7235129935] AMQP server on 10.99.99.99:5671 is unreachable: [Errno 101] ENETUNREACH. Trying again in 1 seconds.: OSError: [Errno 101] ENETUNREACH
  Feb 18 08:42:54 compute102 nova-compute[1402787]: 2022-02-18 08:42:54.546 1402787 ERROR oslo.messaging._drivers.impl_rabbit [-] [c21a4649-bc17-4648-82b4-e88743d61fc9] AMQP server on 10.99.99.97:5671 is unreachable: [Errno 101] ENETUNREACH. Trying again in 8 seconds.: OSError: [Errno 101] ENETUNREACH
  Feb 18 08:42:55 compute102 nova-compute[1402787]: 2022-02-18 08:42:55.079 1402787 ERROR oslo.messaging._drivers.impl_rabbit [req-85fc671a-18e5-4b0d-9c4b-27562efafd2a - - - - -] [229e749d-adb2-4375-87ca-2f7235129935] AMQP server on 10.99.99.98:5671 is unreachable: [Errno 101] ENETUNREACH. Trying again in 1 seconds.: OSError: [Errno 101] ENETUNREACH
  Feb 18 08:42:56 compute102 nova-compute[1402787]: 2022-02-18 08:42:56.080 1402787 ERROR oslo.messaging._drivers.impl_rabbit [req-85fc671a-18e5-4b0d-9c4b-27562efafd2a - - - - -] [229e749d-adb2-4375-87ca-2f7235129935] AMQP server on 10.99.99.97:5671 is unreachable: [Errno 101] ENETUNREACH. Trying again in 8 seconds.: OSError: [Errno 101] ENETUNREACH
  Feb 18 08:42:58 compute102 nova-compute[1402787]: 2022-02-18 08:42:58.700 1402787 INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 110] Connection timed out
  Feb 18 08:42:58 compute102 nova-compute[1402787]: 2022-02-18 08:42:58.701 1402787 INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 110] Connection timed out
  Feb 18 08:42:58 compute102 nova-compute[1402787]: 2022-02-18 08:42:58.702 1402787 ERROR oslo.messaging._drivers.impl_rabbit [-] Connection failed: [Errno 101] ENETUNREACH (retrying in 0 seconds): OSError: [Errno 101] ENETUNREACH
  Feb 18 08:42:58 compute102 nova-compute[1402787]: Traceback (most recent call last):
  Feb 18 08:42:58 compute102 nova-compute[1402787]:   File "/openstack/venvs/nova-24.0.0.0rc1/lib/python3.8/site-packages/eventlet/hubs/hub.py", line 476, in fire_timers
  Feb 18 08:42:58 compute102 nova-compute[1402787]:     timer()
  Feb 18 08:42:58 compute102 nova-compute[1402787]:   File "/openstack/venvs/nova-24.0.0.0rc1/lib/python3.8/site-packages/eventlet/hubs/timer.py", line 59, in __call__
  Feb 18 08:42:58 compute102 nova-compute[1402787]:     cb(*args, **kw)
  Feb 18 08:42:58 compute102 nova-compute[1402787]:   File "/openstack/venvs/nova-24.0.0.0rc1/lib/python3.8/site-packages/eventlet/semaphore.py", line 152, in _do_acquire
  Feb 18 08:42:58 compute102 nova-compute[1402787]:     waiter.switch()
  Feb 18 08:42:58 compute102 nova-compute[1402787]: greenlet.error: cannot switch to a different thread
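
  The retry messages above follow a regular pattern. Both connections
  (c21a4649-bc17-4648-82b4-e88743d61fc9 and
  229e749d-adb2-4375-87ca-2f7235129935) first log a socket.timeout against
  10.99.99.99 and retry that host once more; from then on they cycle through
  10.99.99.99, .98 and .97, waiting 1 second after each of the first two
  hosts and, after the last host, a wait that grows by 2 seconds per full
  cycle (2, 4, 6, 8). The sketch below models only that observed schedule.
  It is inferred from the log, not taken from the oslo.messaging or kombu
  retry code; treating the 1 s start, 2 s step and 30 s cap as the
  rabbit_retry_interval, rabbit_retry_backoff and rabbit_interval_max
  options is an assumption, and retry_schedule is a hypothetical helper.

```python
def retry_schedule(hosts, interval_start=1, interval_step=2,
                   interval_max=30, cycles=4):
    """Return (host, wait_seconds) pairs mimicking the retry log above.

    Model inferred from the log (an assumption, not oslo.messaging/kombu code):
    every host except the last one in a cycle is followed by `interval_start`
    seconds; after the last host the wait is `interval_step * cycle`,
    capped at the (assumed) `interval_max`.
    """
    schedule = []
    for cycle in range(1, cycles + 1):
        for position, host in enumerate(hosts):
            if position < len(hosts) - 1:
                wait = interval_start
            else:
                wait = min(interval_step * cycle, interval_max)
            schedule.append((host, wait))
    return schedule


brokers = ["10.99.99.99", "10.99.99.98", "10.99.99.97"]
for host, wait in retry_schedule(brokers):
    print(f"{host}: wait {wait}s")
```

  The twelve waits produced (1, 1, 2, 1, 1, 4, 1, 1, 6, 1, 1, 8) match the
  ENETUNREACH entries logged for each connection before the traceback.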


  Typically, if the RabbitMQ cluster is taken down, this affects ~5% of
  the services in the deployment, all of which must be restarted in
  order to recover. Similar recovery issues have been seen when the
  host's network interface is taken down and brought back up (this is
  how the traceback above was generated).
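
  The last frames of the traceback show eventlet's semaphore
  (_do_acquire, called from a hub timer) calling waiter.switch(), and
  greenlet refusing because a greenlet can only be switched to from the OS
  thread it belongs to. One plausible reading, not confirmed in this
  report, is that the waiter was created in one OS thread and the timer
  fired in another, so the waiter is never resumed and whatever is blocked
  on it hangs. The stdlib-only toy model below illustrates that
  thread-affinity rule; ThreadBoundWaiter and fire_timer_from are
  hypothetical names, and the code neither uses nor reimplements greenlet.

```python
import threading


class ThreadBoundWaiter:
    """Toy model (not greenlet's implementation) of a greenlet-style waiter.

    It records the OS thread that created it and refuses to be resumed
    ("switched to") from any other OS thread.
    """

    def __init__(self, name):
        self.name = name
        self.owner_thread = threading.get_ident()

    def switch(self):
        if threading.get_ident() != self.owner_thread:
            # Same message as the greenlet.error in the traceback above.
            raise RuntimeError("cannot switch to a different thread")
        return f"{self.name} resumed"


def fire_timer_from(waiter, in_other_thread):
    """Call waiter.switch() as a timer callback would, optionally from a
    second OS thread, and return the result or the error text."""
    outcome = {}

    def timer():
        try:
            outcome["result"] = waiter.switch()
        except RuntimeError as exc:
            outcome["result"] = f"error: {exc}"

    if in_other_thread:
        worker = threading.Thread(target=timer)
        worker.start()
        worker.join()
    else:
        timer()
    return outcome["result"]


waiter = ThreadBoundWaiter("semaphore waiter")
print(fire_timer_from(waiter, in_other_thread=False))  # semaphore waiter resumed
print(fire_timer_from(waiter, in_other_thread=True))   # error: cannot switch to a different thread
```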

  As far as we can tell, this started at about the same time as
  https://bugs.launchpad.net/oslo.messaging/+bug/1949964, i.e. around
  the Wallaby OpenStack release. It also coincided with a switch from
  TLSv1.0/v1.1 to TLSv1.2 for our RabbitMQ connections, and a switch to
  a full PKI with certificate validation, rather than ignoring
  certificate errors.

  Any suggestions for diagnosing this further would be appreciated.

  Thanks

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1961402/+subscriptions




More information about the Ubuntu-openstack-bugs mailing list