[Bug 2009138] Fix merged to oslo.messaging (stable/2023.1)

OpenStack Infra 2009138 at bugs.launchpad.net
Thu Sep 14 04:54:08 UTC 2023


Reviewed:  https://review.opendev.org/c/openstack/oslo.messaging/+/880187
Committed: https://opendev.org/openstack/oslo.messaging/commit/3645839d162a989da46ba1c0fcbe7f32b48b9fb1
Submitter: "Zuul (22348)"
Branch:    stable/2023.1

commit 3645839d162a989da46ba1c0fcbe7f32b48b9fb1
Author: Arnaud Morin <arnaud.morin at ovhcloud.com>
Date:   Fri Mar 3 11:16:56 2023 +0100

    Disable greenthreads for RabbitDriver "listen" connections
    
    When enabling heartbeat_in_pthread, we were restoring the "threading"
    python library from eventlet to original one in RabbitDriver but we
    forgot to do the same in AMQPDriverBase (RabbitDriver is subclass of
    AMQPDriverBase).
    
    We also need to use the original "queue" so that queues are not going to
    use greenthreads as well.
    
    Related-bug: #1961402
    Related-bug: #1934937
    Closes-bug: #2009138
    
    Signed-off-by: Arnaud Morin <arnaud.morin at ovhcloud.com>
    Change-Id: I34ea0d1381e934297df2f793e0d2594ef8254f00
    (cherry picked from commit fd2381c723fe805b17aca1f80bfff4738fbe9628)

-- 
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to Ubuntu Cloud Archive.
https://bugs.launchpad.net/bugs/2009138

Title:
  Heartbeat in pthreads still using greenthreads

Status in Ubuntu Cloud Archive:
  New
Status in oslo.messaging:
  Fix Released

Bug description:
  Context
  =======
  OpenStack Yoga
  Nova API behind apache2 with mod_wsgi
  RabbitMQ 3.9.12

  Explanation
  ===========
  When running nova under apache2/mod_wsgi, we need to set 'heartbeat_in_pthread=True' so that heartbeats are not sent from green threads (eventlet monkey-patched threads).

  A native Python thread is needed to keep sending heartbeats so that
  rabbit does not close the connection.

  Another option is to disable heartbeats completely, so that the
  connection relies only on TCP keepalive. But having both mechanisms
  is better.
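
  As a rough sketch of the two options above, the snippet below reads
  nova.conf-style text with Python's standard configparser and reports
  the heartbeat settings. It assumes the options live in the
  [oslo_messaging_rabbit] section and that a heartbeat_timeout_threshold
  of 0 disables heartbeats; the sample values, the fallback values and
  the helper name heartbeat_settings are illustrative only and are not
  taken from this report.

```python
import configparser

# Illustrative configuration text; a real deployment would read its own nova.conf.
SAMPLE_NOVA_CONF = """
[oslo_messaging_rabbit]
heartbeat_in_pthread = True
heartbeat_timeout_threshold = 60
"""


def heartbeat_settings(conf_text):
    """Summarise the heartbeat-related options found in conf_text."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    section = "oslo_messaging_rabbit"
    # The fallbacks are assumptions made for this sketch only.
    in_pthread = parser.getboolean(section, "heartbeat_in_pthread", fallback=False)
    timeout = parser.getint(section, "heartbeat_timeout_threshold", fallback=60)
    return {
        "in_pthread": in_pthread,
        "timeout": timeout,
        # Assumption: a threshold of 0 means heartbeats are disabled.
        "heartbeats_enabled": timeout > 0,
    }


if __name__ == "__main__":
    print(heartbeat_settings(SAMPLE_NOVA_CONF))
```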

  The problem with the current heartbeat_in_pthread implementation is that some threads are still greenthreads.
  As a result, some connections send heartbeats correctly while others do not (and are still killed by rabbitmq after the heartbeat timeout).

  We identified that oslo_messaging connects to rabbit for two different purposes:
  - send
  - listen

  The current heartbeat_in_pthread=True parameter switches the heartbeat from a greenthread to a Python thread *only for the send purpose* (done in impl_rabbit.py).
  For the listen purpose, the thread is created by the parent class (in amqpdriver.py), which still uses greenthreads.
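
  The mechanism can be pictured with a toy model. This is not the
  oslo.messaging code: the two namespaces merely stand in for the
  module-level "threading" references of impl_rabbit.py and
  amqpdriver.py, and the function name simulate and the fix_applied
  flag are made up for illustration.

```python
from types import SimpleNamespace

GREEN = "green"    # stands for eventlet monkey-patched threading
NATIVE = "native"  # stands for the original stdlib threading


def simulate(heartbeat_in_pthread, fix_applied):
    """Return which threading flavour each connection purpose ends up using."""
    # Each module keeps its own reference to "threading"; in this model
    # both start out monkey-patched.
    impl_rabbit = SimpleNamespace(threading=GREEN)  # subclass module (send)
    amqpdriver = SimpleNamespace(threading=GREEN)   # parent-class module (listen)

    if heartbeat_in_pthread:
        # Restoring the original module in the subclass module only...
        impl_rabbit.threading = NATIVE
        if fix_applied:
            # ...does nothing for the parent module unless it is restored too.
            amqpdriver.threading = NATIVE

    return {"send": impl_rabbit.threading, "listen": amqpdriver.threading}


if __name__ == "__main__":
    print(simulate(heartbeat_in_pthread=True, fix_applied=False))
    print(simulate(heartbeat_in_pthread=True, fix_applied=True))
```

  In this model, restoring the original module in one place leaves the
  "listen" side on greenthreads until the parent module is handled as
  well.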

  As a result, the rabbit connections used for the listen purpose are killed.
  We can see in rabbit logs:
  missed heartbeats from client, timeout: 60s

  We can see in nova-api logs:
  Server unexpectedly closed connection.


  How to reproduce
  ================
  Start nova-api with apache mod_wsgi and set heartbeat_in_pthread=True

  Monitor the current rabbitmq connections from nova:
  $ ss -tnep | grep 5672

  (this can be empty if nova did nothing yet)

  Make a nova API call that needs rabbit, e.g. ask for a console URL:
  $ openstack console url show 5700ecbc-adff-41d3-88a4-f24e0b885b2e

  
  This will create two connections:
  ESTAB 0      0        10.42.1.165:58206 10.43.216.243:5672 timer:(keepalive,46sec,0) uid:42436 ino:422570487 sk:1a cgroup:/ <->   
  ESTAB 0      0        10.42.1.165:58204 10.43.216.243:5672 timer:(keepalive,46sec,0) uid:42436 ino:422570486 sk:1b cgroup:/ <->   

  One is for the "send" purpose, the other is for the "listen" purpose.

  You can also see them in rabbit logs:
  connection <0.21408.594> (10.42.1.165:58206 -> 10.42.0.21:5672 - mod_wsgi:88239:41e4b74d-c3be-47f5-8b8f-d3bd99871f46): user 'openstack' authenticated and granted access to vhost '/'
  connection <0.21390.594> (10.42.1.165:58204 -> 10.42.0.21:5672 - mod_wsgi:88239:2b8345ca-fc75-442f-9271-1448352bb2d2): user 'openstack' authenticated and granted access to vhost '/'

  You can also monitor the heartbeats going from/to rabbit:
  $ tcpdump -i eth0 -nn port 5672
  ...
  You will see that both connections receive heartbeats every 30 seconds, but *only one* sends heartbeats (the one in a pthread).

  
  After a few minutes, rabbit kills the "listen" connection, as seen in the rabbit logs:
  2023-03-03 09:54:27.932885+00:00 [erro] <0.21390.594> closing AMQP connection <0.21390.594> (10.42.1.165:58204 -> 10.42.0.21:5672 - mod_wsgi:88239:2b8345ca-fc75-442f-9271-1448352bb2d2):
  2023-03-03 09:54:27.932885+00:00 [erro] <0.21390.594> missed heartbeats from client, timeout: 60s
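
  To spot affected connections in a larger log, something like the
  following could be used. It is a minimal sketch that assumes the log
  line layout shown above; the regular expressions and the helper name
  missed_heartbeat_clients are illustrative.

```python
import re

# Both patterns assume the line layout quoted in this report.
CLOSING_RE = re.compile(
    r"(?P<pid><[\d.]+>) closing AMQP connection <[\d.]+> "
    r"\((?P<client>[\d.]+:\d+) -> "
)
MISSED_RE = re.compile(
    r"(?P<pid><[\d.]+>) missed heartbeats from client, timeout: (?P<timeout>\d+)s"
)

SAMPLE_LOG = [
    "2023-03-03 09:54:27.932885+00:00 [erro] <0.21390.594> closing AMQP connection "
    "<0.21390.594> (10.42.1.165:58204 -> 10.42.0.21:5672 - "
    "mod_wsgi:88239:2b8345ca-fc75-442f-9271-1448352bb2d2):",
    "2023-03-03 09:54:27.932885+00:00 [erro] <0.21390.594> "
    "missed heartbeats from client, timeout: 60s",
]


def missed_heartbeat_clients(lines):
    """Return (client_address, timeout_seconds) for connections closed after missed heartbeats."""
    closing = {}  # connection id -> client address seen on a "closing" line
    found = []
    for line in lines:
        match = CLOSING_RE.search(line)
        if match:
            closing[match.group("pid")] = match.group("client")
            continue
        match = MISSED_RE.search(line)
        if match and match.group("pid") in closing:
            client = closing.pop(match.group("pid"))
            found.append((client, int(match.group("timeout"))))
    return found


if __name__ == "__main__":
    print(missed_heartbeat_clients(SAMPLE_LOG))
```

  The client address it reports (port 58204 here) can then be matched
  against the ss output to confirm that it is the "listen" connection.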

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/2009138/+subscriptions



