[Bug 2009138] Fix merged to oslo.messaging (stable/2023.1)
OpenStack Infra
2009138 at bugs.launchpad.net
Thu Sep 14 04:54:08 UTC 2023
Reviewed: https://review.opendev.org/c/openstack/oslo.messaging/+/880187
Committed: https://opendev.org/openstack/oslo.messaging/commit/3645839d162a989da46ba1c0fcbe7f32b48b9fb1
Submitter: "Zuul (22348)"
Branch: stable/2023.1
commit 3645839d162a989da46ba1c0fcbe7f32b48b9fb1
Author: Arnaud Morin <arnaud.morin at ovhcloud.com>
Date: Fri Mar 3 11:16:56 2023 +0100
Disable greenthreads for RabbitDriver "listen" connections
When enabling heartbeat_in_pthread, we were restoring the "threading"
python library from eventlet to original one in RabbitDriver but we
forgot to do the same in AMQPDriverBase (RabbitDriver is subclass of
AMQPDriverBase).
We also need to use the original "queue" so that queues are not going to
use greenthreads as well.
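As a rough illustration of the idea in this commit message (not the actual patch), the sketch below returns the unpatched "threading" and "queue" modules through eventlet.patcher.original() when eventlet is installed, and the standard modules otherwise. The helper name get_native_modules and its boolean argument are assumptions made for this example.

```python
import queue
import threading


def get_native_modules(heartbeat_in_pthread):
    """Return the (threading, queue) modules to use for connection threads.

    Hypothetical helper for illustration only; not oslo.messaging code.
    """
    if heartbeat_in_pthread:
        try:
            from eventlet import patcher
        except ImportError:
            # eventlet is not installed, so the stdlib modules are native.
            pass
        else:
            # patcher.original() returns the unpatched module even after
            # eventlet.monkey_patch() has replaced the importable one.
            return patcher.original("threading"), patcher.original("queue")
    # Without the option, keep whatever is importable (possibly patched).
    return threading, queue


if __name__ == "__main__":
    threading_mod, queue_mod = get_native_modules(True)
    results = queue_mod.Queue()
    worker = threading_mod.Thread(target=results.put, args=("beat",))
    worker.start()
    worker.join()
    print(results.get(timeout=1))
```

The point of fetching both modules is the one made above: a native thread that talks through an eventlet-patched queue would still depend on greenthread scheduling.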
Related-bug: #1961402
Related-bug: #1934937
Closes-bug: #2009138
Signed-off-by: Arnaud Morin <arnaud.morin at ovhcloud.com>
Change-Id: I34ea0d1381e934297df2f793e0d2594ef8254f00
(cherry picked from commit fd2381c723fe805b17aca1f80bfff4738fbe9628)
--
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to Ubuntu Cloud Archive.
https://bugs.launchpad.net/bugs/2009138
Title:
Heartbeat in pthreads still using greenthreads
Status in Ubuntu Cloud Archive:
New
Status in oslo.messaging:
Fix Released
Bug description:
Context
=======
OpenStack Yoga
Nova API behind apache2 with mod_wsgi
RabbitMQ 3.9.12
Explanation
===========
When using nova with apache2/mod_wsgi, we need to set 'heartbeat_in_pthread=True' to avoid using green threads (eventlet monkey patched threads).
The python thread is mandatory to keep sending heartbeats so rabbit
will not close the connection.
One other option is to completely disable the heartbeats, so the
connection will only rely on TCP keepalive. But having both mechanisms is more robust.
The problem with the current heartbeat_in_pthread implementation is that some threads are still greenthreads.
The result is that some connections are correctly sending heartbeats, while others are not (and are still killed by rabbitmq after the heartbeat timeout).
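To make the timeout behaviour concrete, here is a deliberately simplified model, assuming the broker closes a connection as soon as "timeout" seconds pass without any frame from the client. The real RabbitMQ monitor is more involved; the function name, the 600s horizon and the 30s sending interval are assumptions for this sketch, while the 60s timeout matches the rabbit log message quoted in this report.

```python
def broker_close_time(client_frame_times, timeout=60.0, horizon=600.0):
    """Return when the broker would give up on a client, or None.

    client_frame_times: seconds since the connection opened at which the
    client sent any frame (heartbeats included).
    Simplified model: the connection is closed once `timeout` seconds
    elapse without a frame from the client.
    """
    last_seen = 0.0
    for t in sorted(client_frame_times):
        if t - last_seen >= timeout:
            return last_seen + timeout
        last_seen = t
    # Check the silence between the last frame and the end of observation.
    if horizon - last_seen >= timeout:
        return last_seen + timeout
    return None


if __name__ == "__main__":
    # A connection whose heartbeat thread really runs every 30 seconds.
    healthy = [30.0 * i for i in range(1, 21)]
    print(broker_close_time(healthy))  # None
    # A connection whose heartbeat thread never gets scheduled.
    print(broker_close_time([]))  # 60.0
```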
We identified that oslo_messaging connects to rabbit for two different purposes:
- send
- listen
The current heartbeat_in_pthread=True parameter switches the heartbeat from a greenthread to a Python thread *only for the send purpose* (done in impl_rabbit.py).
For the listen purpose, the thread is created by the parent class (in amqpdriver.py), which still uses greenthreads.
As a result, for listen purpose, rabbit connections are killed.
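The mismatch can be pictured with a toy model, assuming each source file holds its own module-level "threading" name and that the option only rebinds the one in impl_rabbit.py. Everything below (the dictionaries, the function names, the fix_parent flag) is a stand-in invented for illustration, not oslo.messaging code.

```python
from types import SimpleNamespace

# Stand-ins for the two flavours of the "threading" module.
GREEN = SimpleNamespace(kind="greenthread")  # eventlet monkey patched
NATIVE = SimpleNamespace(kind="pthread")     # original Python threading

# Each source file has its own module-level "threading" name.
amqpdriver_globals = {"threading": GREEN}
impl_rabbit_globals = {"threading": GREEN}


def enable_heartbeat_in_pthread(fix_parent=False):
    """Rebind "threading" the way the option does in this toy model."""
    # Behaviour described in the report: only impl_rabbit is switched.
    impl_rabbit_globals["threading"] = NATIVE
    if fix_parent:
        # What a fix needs to add: switch the parent module as well.
        amqpdriver_globals["threading"] = NATIVE


def thread_kinds():
    """Report which kind of thread each connection purpose ends up with."""
    return {
        "send": impl_rabbit_globals["threading"].kind,
        "listen": amqpdriver_globals["threading"].kind,
    }


if __name__ == "__main__":
    enable_heartbeat_in_pthread()
    print(thread_kinds())  # {'send': 'pthread', 'listen': 'greenthread'}
    enable_heartbeat_in_pthread(fix_parent=True)
    print(thread_kinds())  # {'send': 'pthread', 'listen': 'pthread'}
```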
We can see in rabbit logs:
missed heartbeats from client, timeout: 60s
We can see in nova-api logs:
Server unexpectedly closed connection.
How to reproduce
================
Start nova-api with apache mod_wsgi and set heartbeat_in_pthread=True
Monitor the current rabbitmq connection from nova:
$ ss -tnep |grep 5672
(this can be empty if nova did nothing yet)
Make a nova API call that needs rabbit, e.g. ask for a console URL:
$ openstack console url show 5700ecbc-adff-41d3-88a4-f24e0b885b2e
This will create two connections:
ESTAB 0 0 10.42.1.165:58206 10.43.216.243:5672 timer:(keepalive,46sec,0) uid:42436 ino:422570487 sk:1a cgroup:/ <->
ESTAB 0 0 10.42.1.165:58204 10.43.216.243:5672 timer:(keepalive,46sec,0) uid:42436 ino:422570486 sk:1b cgroup:/ <->
One is for the "send" purpose, the other is for the "listen" purpose.
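If you want to track these connections from a script rather than by eye, a small parser along these lines may help. It assumes the column layout of the "ss -tnep" lines shown above (state, two queue sizes, local address, peer address); the Conn tuple and the function name are made up for this sketch.

```python
from collections import namedtuple

Conn = namedtuple("Conn", "state local peer")


def parse_ss_lines(lines, port=5672):
    """Extract connections to the given peer port from `ss -tnep` output."""
    conns = []
    for line in lines:
        fields = line.split()
        if len(fields) < 5:
            continue  # blank or truncated line
        state, local, peer = fields[0], fields[3], fields[4]
        # Keep only connections whose peer port is the AMQP port.
        if peer.rsplit(":", 1)[-1] == str(port):
            conns.append(Conn(state, local, peer))
    return conns


if __name__ == "__main__":
    sample = [
        "ESTAB 0 0 10.42.1.165:58206 10.43.216.243:5672 "
        "timer:(keepalive,46sec,0) uid:42436 ino:422570487 sk:1a cgroup:/ <->",
        "ESTAB 0 0 10.42.1.165:58204 10.43.216.243:5672 "
        "timer:(keepalive,46sec,0) uid:42436 ino:422570486 sk:1b cgroup:/ <->",
    ]
    for conn in parse_ss_lines(sample):
        print(conn.state, conn.local, "->", conn.peer)
```

Running it periodically and comparing the local ports makes it easy to notice when one of the two connections disappears.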
You can also see them in rabbit logs:
connection <0.21408.594> (10.42.1.165:58206 -> 10.42.0.21:5672 - mod_wsgi:88239:41e4b74d-c3be-47f5-8b8f-d3bd99871f46): user 'openstack' authenticated and granted access to vhost '/'
connection <0.21390.594> (10.42.1.165:58204 -> 10.42.0.21:5672 - mod_wsgi:88239:2b8345ca-fc75-442f-9271-1448352bb2d2): user 'openstack' authenticated and granted access to vhost '/'
You can also monitor the heartbeats going from/to rabbit:
$ tcpdump -i eth0 -nn port 5672
...
You will see that both connections are receiving heartbeats every 30sec, but *only one* is sending heartbeats (the one in pthread).
After a few minutes, rabbit kills the "listen" connection, as seen in rabbit logs:
2023-03-03 09:54:27.932885+00:00 [erro] <0.21390.594> closing AMQP connection <0.21390.594> (10.42.1.165:58204 -> 10.42.0.21:5672 - mod_wsgi:88239:2b8345ca-fc75-442f-9271-1448352bb2d2):
2023-03-03 09:54:27.932885+00:00 [erro] <0.21390.594> missed heartbeats from client, timeout: 60s
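To find the affected connections in a larger log, the two lines above can be matched with a short script. This is a sketch that assumes the log format is exactly the one quoted here; the regular expressions and the function name are illustrative and may need adjusting for other RabbitMQ versions.

```python
import re

# "closing AMQP connection" line: remember which client the Erlang pid maps to.
CLOSING_RE = re.compile(
    r"\[erro\] (?P<pid><[\d.]+>) closing AMQP connection <[\d.]+> "
    r"\((?P<client>\S+) -> (?P<broker>\S+)"
)
# "missed heartbeats" line: the actual reason for the close.
MISSED_RE = re.compile(
    r"\[erro\] (?P<pid><[\d.]+>) missed heartbeats from client, "
    r"timeout: (?P<timeout>\d+)s"
)


def find_missed_heartbeats(lines):
    """Return one dict per connection closed for missed heartbeats."""
    closing = {}
    results = []
    for line in lines:
        match = CLOSING_RE.search(line)
        if match:
            closing[match.group("pid")] = (
                match.group("client"),
                match.group("broker"),
            )
            continue
        match = MISSED_RE.search(line)
        if match:
            client, broker = closing.get(match.group("pid"), (None, None))
            results.append({
                "pid": match.group("pid"),
                "client": client,
                "broker": broker,
                "timeout": int(match.group("timeout")),
            })
    return results


if __name__ == "__main__":
    log = [
        "2023-03-03 09:54:27.932885+00:00 [erro] <0.21390.594> closing AMQP "
        "connection <0.21390.594> (10.42.1.165:58204 -> 10.42.0.21:5672 - "
        "mod_wsgi:88239:2b8345ca-fc75-442f-9271-1448352bb2d2):",
        "2023-03-03 09:54:27.932885+00:00 [erro] <0.21390.594> missed "
        "heartbeats from client, timeout: 60s",
    ]
    for hit in find_missed_heartbeats(log):
        print(hit["client"], "closed after", hit["timeout"], "seconds")
```

The client address it reports can then be compared with the local ports seen in the ss output to confirm that the "listen" connection is the one being dropped.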
To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/2009138/+subscriptions