[Bug 1934937] Re: Heartbeat in pthreads in nova-wallaby crashes with greenlet error
Edward Hope-Morley
1934937 at bugs.launchpad.net
Tue Jul 16 11:21:18 UTC 2024
Small correction to the above. Heartbeats will technically not stop
working as a result of the change of default, but in wsgi services they
will revert to running in greenthreads. At the time the
heartbeat_in_pthread option was added, that was considered suboptimal,
so this could be considered a regression. I have therefore opened a
separate bug [1] so that we can set the option to True for wsgi
services. Note that the default has already been changed for releases
newer than Yoga (i.e. all versions above Jammy).
[1] https://bugs.launchpad.net/charm-nova-cloud-controller/+bug/2073260
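For illustration only, here is a minimal sketch of the per-service override discussed above: True for wsgi services, False for everything else. It only renders an ini fragment with Python's configparser. The helper name render_override is hypothetical, and the assumption is that the option lives in the [oslo_messaging_rabbit] group of the service's config file; how a charm or package would actually deliver the fragment is not covered.

```python
import configparser
import io


def render_override(wsgi: bool) -> str:
    """Render an ini fragment for heartbeat_in_pthread (hypothetical helper).

    Assumption: wsgi services want True, eventlet-based services such as
    nova-compute want False.
    """
    cfg = configparser.ConfigParser()
    cfg["oslo_messaging_rabbit"] = {"heartbeat_in_pthread": str(wsgi)}
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


if __name__ == "__main__":
    # Fragment a wsgi service (e.g. an API under Apache mod_wsgi) would want.
    print(render_override(wsgi=True))
    # Fragment for a non-wsgi service such as nova-compute.
    print(render_override(wsgi=False))
```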
** Description changed:
When performing a heartbeat to rabbit (inside a nova-compute process),
there is a greenlet error which causes a hard crash.
I'm not exactly sure what details are relevant, but can provide more
info if there's something that will be useful!
This is on RHEL7 (essentially... somewhat custom image based on it)
Log snippet:
```
2021-07-07 19:34:52,686 DEBUG [oslo.messaging._drivers.impl_rabbit] /opt/openstack/venv/nova-23.0.20031033070/lib/python3.8/site-packages/oslo_messaging/_drivers/impl_rabbit.py:__init__:608 [279fc413-9d7c-4fad-89e8-8de308658947] Connecting to AMQP server on 127.0.0.1:5671
2021-07-07 19:34:52,699 DEBUG [amqp.connection.Connection.heartbeat_tick] /opt/openstack/venv/nova-23.0.20031033070/lib/python3.8/site-packages/amqp/connection.py:heartbeat_tick:726 heartbeat_tick : for connection 79f7cf4331b34cb0a2e3608281076773
2021-07-07 19:34:52,699 DEBUG [amqp.connection.Connection.heartbeat_tick] /opt/openstack/venv/nova-23.0.20031033070/lib/python3.8/site-packages/amqp/connection.py:heartbeat_tick:740 heartbeat_tick : Prev sent/recv: None/None, now - 6/6, monotonic - 9634.717472491, last_heartbeat_sent - 9634.717470288, heartbeat int. - 60 for connection 79f7cf4331b34cb0a2e3608281076773
2021-07-07 19:34:52,700 DEBUG [amqp.connection.Connection.heartbeat_tick] /opt/openstack/venv/nova-23.0.20031033070/lib/python3.8/site-packages/amqp/connection.py:heartbeat_tick:726 heartbeat_tick : for connection 79f7cf4331b34cb0a2e3608281076773
2021-07-07 19:34:52,701 DEBUG [amqp.connection.Connection.heartbeat_tick] /opt/openstack/venv/nova-23.0.20031033070/lib/python3.8/site-packages/amqp/connection.py:heartbeat_tick:740 heartbeat_tick : Prev sent/recv: 6/6, now - 6/6, monotonic - 9634.719438155, last_heartbeat_sent - 9634.717470288, heartbeat int. - 60 for connection 79f7cf4331b34cb0a2e3608281076773
2021-07-07 19:34:52,718 DEBUG [amqp] /opt/openstack/venv/nova-23.0.20031033070/lib/python3.8/site-packages/amqp/connection.py:_on_start:382 Start from server, version: 0.9, properties: {'capabilities': {'publisher_confirms': True, 'exchange_exchange_bindings': True, 'basic.nack': True, 'consumer_cancel_notify': True, 'connection.blocked': True, 'consumer_priorities': True, 'authentication_failure_close': True, 'per_consumer_qos': True, 'direct_reply_to': True}, 'cluster_name': 'rabbit_5672 at fedb460a.openstack', 'copyright': 'Copyright (c) 2007-2020 VMware, Inc. or its affiliates.', 'information': 'Licensed under the MPL 1.1. Website: https://rabbitmq.com', 'platform': 'Erlang/OTP 23.0.2', 'product': 'RabbitMQ', 'version': '3.8.5'}, mechanisms: [b'PLAIN', b'AMQPLAIN', b'EXTERNAL'], locales: ['en_US']
2021-07-07 19:34:52,719 DEBUG [amqp] /opt/openstack/venv/nova-23.0.20031033070/lib/python3.8/site-packages/amqp/channel.py:__init__:104 using channel_id: 1
2021-07-07 19:34:52,720 DEBUG [amqp] /opt/openstack/venv/nova-23.0.20031033070/lib/python3.8/site-packages/amqp/channel.py:_on_open_ok:444 Channel open
2021-07-07 19:34:52,721 DEBUG [amqp.connection.Connection.heartbeat_tick] /opt/openstack/venv/nova-23.0.20031033070/lib/python3.8/site-packages/amqp/connection.py:heartbeat_tick:726 heartbeat_tick : for connection c0299792d20e42a2b0a17d037d7d3058
Traceback (most recent call last):
  File "/opt/openstack/venv/nova-23.0.20031033070/lib/python3.8/site-packages/eventlet/hubs/hub.py", line 476, in fire_timers
    timer()
  File "/opt/openstack/venv/nova-23.0.20031033070/lib/python3.8/site-packages/eventlet/hubs/timer.py", line 59, in __call__
    cb(*args, **kw)
  File "/opt/openstack/venv/nova-23.0.20031033070/lib/python3.8/site-packages/eventlet/semaphore.py", line 152, in _do_acquire
    waiter.switch()
greenlet.error: cannot switch to a different thread
```
Versions:
```
oslo.messaging==12.7.1
nova==23.0.2 (packaged locally from stable/wallaby as of July 3, 2021)
```
-------------------------------------------------------------------------------
[Impact]
The Nova default value of heartbeat_in_pthread needs to be False for non-wsgi services, otherwise services that run in greenthreads, such as nova-compute, crash when attempting to send a heartbeat message. This backports the patch to Jammy/Yoga in Ubuntu.
[Test Plan]
* Deploy OpenStack Yoga on Jammy and ensure nova-compute has debug=True
* Ensure "oslo_messaging_rabbit.heartbeat_in_pthread = False" by checking the latest entry in /var/log/nova/nova-compute.log
* By default a heartbeat is checked 2 times every 60 seconds
* Check /var/log/nova/nova-compute.log and ensure that you do not see any "greenlet.error: cannot switch to a different thread" errors
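The two log checks in the test plan above could be scripted roughly as follows. This is a sketch, not part of the patch: the function name check_log is hypothetical, and it assumes the option value appears in the log exactly as quoted in the test plan ("oslo_messaging_rabbit.heartbeat_in_pthread = False").

```python
import re

# Error string quoted in the test plan.
ERROR = "greenlet.error: cannot switch to a different thread"
# Assumed format of the logged option value (taken from the test plan text).
OPTION = re.compile(r"oslo_messaging_rabbit\.heartbeat_in_pthread\s*=\s*(\S+)")


def check_log(lines):
    """Return (last logged heartbeat_in_pthread value, number of greenlet errors)."""
    last_value = None
    errors = 0
    for line in lines:
        match = OPTION.search(line)
        if match:
            last_value = match.group(1)  # keep only the latest entry
        if ERROR in line:
            errors += 1
    return last_value, errors


if __name__ == "__main__":
    # On a deployed unit this would be open("/var/log/nova/nova-compute.log").
    sample = [
        "DEBUG oslo_service.service oslo_messaging_rabbit.heartbeat_in_pthread = False",
        "DEBUG amqp heartbeat_tick : for connection abc",
    ]
    value, errors = check_log(sample)
    print("heartbeat_in_pthread:", value, "greenlet errors:", errors)
```

The test passes when the returned value is "False" and the error count is 0 after the service has run for a few heartbeat intervals.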
[Regression Potential]
- No regressions are expected as a result of this patch.
+ Changing the default to False fixes services that do not run under wsgi, but services that do run under wsgi will revert to their native threading method, i.e. greenthreads. This is considered suboptimal, and in heavily loaded environments it could have a perceived impact
+ on API performance. A separate bug https://bugs.launchpad.net/charm-nova-cloud-controller/+bug/2073260 has been opened to address this.
https://bugs.launchpad.net/bugs/1934937
Title:
Heartbeat in pthreads in nova-wallaby crashes with greenlet error
Status in Ubuntu Cloud Archive:
Invalid
Status in Ubuntu Cloud Archive yoga series:
Triaged
Status in oslo.messaging:
Fix Released
Status in nova package in Ubuntu:
Invalid
Status in nova source package in Jammy:
Triaged