[Bug 2009138] Fix included in openstack/oslo.messaging 14.0.3

OpenStack Infra 2009138 at bugs.launchpad.net
Thu Feb 8 14:35:08 UTC 2024


This issue was fixed in the openstack/oslo.messaging 14.0.3 release.

-- 
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to Ubuntu Cloud Archive.
https://bugs.launchpad.net/bugs/2009138

Title:
  Heartbeat in pthreads still using greenthreads

Status in Ubuntu Cloud Archive:
  New
Status in oslo.messaging:
  Fix Released

Bug description:
  Context
  =======
  OpenStack Yoga
  Nova API behind apache2 with mod_wsgi
  RabbitMQ 3.9.12

  Explanation
  ===========
  When using nova with apache2/mod_wsgi, we need to set 'heartbeat_in_pthread=True' to avoid using green threads (eventlet monkey-patched threads).

  A native Python thread is required to keep sending heartbeats, so that
  rabbit does not close the connection.

  The other option is to disable heartbeats completely, so that the
  connection relies only on TCP keepalive. Having both mechanisms is safer.
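
  As a minimal sketch of the two options above, the settings can be read
  back from a nova.conf-style file. The section name
  [oslo_messaging_rabbit], the option heartbeat_timeout_threshold, the
  fallback values, and the convention that a threshold of 0 disables
  heartbeats are assumptions not stated in this report; the helper names
  are made up for illustration.

```python
import configparser

# Assumed section name for the oslo.messaging RabbitMQ driver options.
SECTION = "oslo_messaging_rabbit"

# Illustrative configuration fragment, not taken from the reporter's setup.
EXAMPLE_CONF = """
[oslo_messaging_rabbit]
heartbeat_in_pthread = True
heartbeat_timeout_threshold = 60
"""


def heartbeat_settings(conf_text):
    """Return (in_pthread, timeout_threshold) parsed from config text."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    # The fallbacks are assumptions used when an option is absent.
    in_pthread = parser.getboolean(SECTION, "heartbeat_in_pthread", fallback=False)
    threshold = parser.getint(SECTION, "heartbeat_timeout_threshold", fallback=60)
    return in_pthread, threshold


def heartbeats_enabled(conf_text):
    """Assumption: a threshold of 0 turns heartbeats off entirely."""
    return heartbeat_settings(conf_text)[1] > 0
```

  With EXAMPLE_CONF, heartbeat_settings() reports the pthread option as
  enabled with a 60 second threshold, which matches the timeout seen in
  the rabbit logs of this report.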

  The problem with the current heartbeat_in_pthread implementation is that some heartbeat threads are still greenthreads.
  As a result, some connections send heartbeats correctly, while others do not (and are still killed by rabbitmq after the heartbeat timeout).

  We identified that oslo_messaging connects to rabbit for two different purposes:
  - send
  - listen

  The current heartbeat_in_pthread=True parameter switches the heartbeat from a greenthread to a python thread *only for the send* connection (done in impl_rabbit.py).
  For the listen connection, the thread is created by the parent class (in amqpdriver.py), which still uses greenthreads.
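
  For illustration only, here is a toy heartbeat sender running on a
  native thread, which is what the listen connection lacks. This is not
  the oslo.messaging code: the class name, the callback and the
  intervals are hypothetical. It also assumes the threading module is
  not monkey patched; under eventlet the unpatched module would have to
  be obtained first.

```python
import threading
import time


class HeartbeatThread:
    """Toy heartbeat sender on a native thread (hypothetical helper)."""

    def __init__(self, send_heartbeat, interval):
        self._send = send_heartbeat
        self._interval = interval
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()

    def _run(self):
        # Event.wait() returns False on timeout and True once stop() is called,
        # so the loop sends one heartbeat per interval until it is stopped.
        while not self._stop.wait(self._interval):
            self._send()


def demo(interval=0.01, duration=0.2):
    """Run the sender briefly and return how many heartbeats were sent."""
    sent = []
    heartbeat = HeartbeatThread(lambda: sent.append(time.monotonic()), interval)
    heartbeat.start()
    time.sleep(duration)  # the main thread is busy elsewhere meanwhile
    heartbeat.stop()
    return len(sent)
```

  The point of the sketch is that the loop keeps firing regardless of
  what the main thread does, whereas a greenthread only runs when the
  eventlet hub gets a chance to schedule it.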

  As a result, the rabbit connections used for listening are killed.
  We can see in rabbit logs:
  missed heartbeats from client, timeout: 60s

  We can see in nova-api logs:
  Server unexpectedly closed connection.


  How to reproduce
  ================
  Start nova-api with apache mod_wsgi and set heartbeat_in_pthread=True

  Monitor the current rabbitmq connection from nova:
  $ ss -tnep | grep 5672

  (this can be empty if nova did nothing yet)

  Make a nova API call that needs rabbit, e.g. ask for a console url:
  $ openstack console url show 5700ecbc-adff-41d3-88a4-f24e0b885b2e

  
  This will create two connections:
  ESTAB 0      0        10.42.1.165:58206 10.43.216.243:5672 timer:(keepalive,46sec,0) uid:42436 ino:422570487 sk:1a cgroup:/ <->   
  ESTAB 0      0        10.42.1.165:58204 10.43.216.243:5672 timer:(keepalive,46sec,0) uid:42436 ino:422570486 sk:1b cgroup:/ <->   

  One is for the "send" purpose, the other is for the "listen" purpose.
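
  To follow these connections over time, the ss output can be parsed
  with a small script. This is a sketch that assumes the column layout
  shown above (state, two queue counters, local address, peer address);
  the function and constant names are made up for illustration.

```python
import re

# Matches: STATE RECV-Q SEND-Q LOCAL:PORT PEER:PORT ...
SS_LINE = re.compile(
    r"^(?P<state>\S+)\s+\d+\s+\d+\s+"
    r"(?P<local>\S+):(?P<lport>\d+)\s+"
    r"(?P<peer>\S+):(?P<pport>\d+)"
)

# The two lines reported in this bug.
SAMPLE = """\
ESTAB 0      0        10.42.1.165:58206 10.43.216.243:5672 timer:(keepalive,46sec,0) uid:42436 ino:422570487 sk:1a cgroup:/ <->
ESTAB 0      0        10.42.1.165:58204 10.43.216.243:5672 timer:(keepalive,46sec,0) uid:42436 ino:422570486 sk:1b cgroup:/ <->
"""


def amqp_connections(ss_output, broker_port=5672):
    """Return (state, local_port, peer_address) for each broker connection."""
    connections = []
    for line in ss_output.splitlines():
        match = SS_LINE.match(line.strip())
        if match and int(match["pport"]) == broker_port:
            connections.append(
                (match["state"], int(match["lport"]), match["peer"])
            )
    return connections
```

  Running amqp_connections(SAMPLE) yields the local ports 58206 and
  58204, which can then be matched against the ports in the rabbit logs.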

  You can also see them in rabbit logs:
  connection <0.21408.594> (10.42.1.165:58206 -> 10.42.0.21:5672 - mod_wsgi:88239:41e4b74d-c3be-47f5-8b8f-d3bd99871f46): user 'openstack' authenticated and granted access to vhost '/'
  connection <0.21390.594> (10.42.1.165:58204 -> 10.42.0.21:5672 - mod_wsgi:88239:2b8345ca-fc75-442f-9271-1448352bb2d2): user 'openstack' authenticated and granted access to vhost '/'

  You can also monitor the heartbeats going from/to rabbit:
  $ tcpdump -i eth0 -nn port 5672
  ...
  You will see that both connections are receiving heartbeats every 30sec, but *only one* is sending heartbeats (the one in pthread).

  
  After a few minutes, rabbit kills the "listen" connection, as seen in rabbit logs:
  2023-03-03 09:54:27.932885+00:00 [erro] <0.21390.594> closing AMQP connection <0.21390.594> (10.42.1.165:58204 -> 10.42.0.21:5672 - mod_wsgi:88239:2b8345ca-fc75-442f-9271-1448352bb2d2):
  2023-03-03 09:54:27.932885+00:00 [erro] <0.21390.594> missed heartbeats from client, timeout: 60s
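
  To find which client ports were closed for missed heartbeats, the two
  log lines above can be correlated by their Erlang pid. This is a
  sketch that assumes the log format stays exactly as quoted here; the
  function and pattern names are made up for illustration.

```python
import re

CLOSING = re.compile(
    r"\[erro\] (?P<pid><[\d.]+>) closing AMQP connection <[\d.]+> "
    r"\((?P<client>[\d.]+):(?P<port>\d+) -> "
)
MISSED = re.compile(
    r"\[erro\] (?P<pid><[\d.]+>) missed heartbeats from client, "
    r"timeout: (?P<timeout>\d+)s"
)

# The two log lines reported in this bug.
SAMPLE_LOG = """\
2023-03-03 09:54:27.932885+00:00 [erro] <0.21390.594> closing AMQP connection <0.21390.594> (10.42.1.165:58204 -> 10.42.0.21:5672 - mod_wsgi:88239:2b8345ca-fc75-442f-9271-1448352bb2d2):
2023-03-03 09:54:27.932885+00:00 [erro] <0.21390.594> missed heartbeats from client, timeout: 60s
"""


def heartbeat_timeouts(log_text):
    """Return {client_port: timeout_seconds} for heartbeat-related closes."""
    closing = {}  # Erlang pid -> client source port
    result = {}
    for line in log_text.splitlines():
        match = CLOSING.search(line)
        if match:
            closing[match["pid"]] = int(match["port"])
            continue
        match = MISSED.search(line)
        if match and match["pid"] in closing:
            result[closing[match["pid"]]] = int(match["timeout"])
    return result
```

  On SAMPLE_LOG this reports port 58204 with a 60 second timeout, i.e.
  the "listen" connection from the ss output.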

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/2009138/+subscriptions



