[Bug 1472712] Re: Using SSL with rabbitmq prevents communication between nova-compute and conductor after latest nova updates
Edward Hope-Morley
edward.hope-morley at canonical.com
Wed Aug 5 16:35:21 UTC 2015
OK, upon further investigation I have found some trace of a root cause.
Oslo.messaging always uses a timeout of 1 second when polling queues and
connections. This appears to be too short when using SSL and frequently
results in SSLError/timeout exceptions, which cause all threads to fail,
reconnect, and fail again repeatedly. The number of connections then
rises fast and RPC stops working, which is why compute and conductor
are not able to communicate. I've played around with alternative timeout
values and I get much better results even with a value of 2s instead of
1s. I'll propose an initial workaround patch shortly so we can get out
of this bind for now, but I think we'll ultimately need a more
intelligent solution than what oslo.messaging supports in this version.
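
To illustrate the failure mode described above, here is a minimal sketch
using only the Python standard library. It is not the oslo.messaging code
and not the proposed patch. The names POLL_TIMEOUT, is_timeout and
poll_once are hypothetical, and read_fn stands in for whatever call
actually reads from the broker connection. The idea is that a read
timeout on an SSL socket (which older Python versions report as an
SSLError whose message contains "timed out") is treated as "nothing
arrived yet" instead of a fatal error that forces a reconnect, and that
the poll timeout is a tunable value rather than a fixed 1 second.

```python
import socket
import ssl

# Assumption: a tunable poll timeout; 2.0 mirrors the value that gave
# better results in the comment above, it is not a recommended default.
POLL_TIMEOUT = 2.0


def is_timeout(exc):
    """Return True if exc represents a read timeout, plain or over SSL."""
    if isinstance(exc, socket.timeout):
        return True
    # Older Python versions raise ssl.SSLError with a "timed out" message
    # for SSL read timeouts instead of socket.timeout.
    return isinstance(exc, ssl.SSLError) and "timed out" in str(exc)


def poll_once(read_fn, timeout=POLL_TIMEOUT):
    """Call read_fn(timeout); return its result, or None on a timeout.

    read_fn is a hypothetical callable standing in for the real
    connection read. Non-timeout errors are re-raised so that genuine
    SSL failures still reach the caller.
    """
    try:
        return read_fn(timeout)
    except (socket.timeout, ssl.SSLError) as exc:
        if is_timeout(exc):
            return None  # nothing arrived in time; caller may poll again
        raise
```

With a wrapper like this, a slow SSL read yields None and the caller can
simply poll again, while errors such as a failed certificate check still
propagate and can trigger a reconnect.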
** Changed in: python-oslo.messaging (Ubuntu)
Status: Confirmed => In Progress
** Changed in: python-oslo.messaging (Ubuntu)
Assignee: (unassigned) => Edward Hope-Morley (hopem)
** Changed in: python-oslo.messaging (Ubuntu)
Importance: Undecided => High
--
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to python-oslo.messaging in Ubuntu.
https://bugs.launchpad.net/bugs/1472712
Title:
Using SSL with rabbitmq prevents communication between nova-compute
and conductor after latest nova updates
Status in OpenStack Compute (nova):
Invalid
Status in oslo.messaging:
Invalid
Status in python-oslo.messaging package in Ubuntu:
In Progress
Bug description:
On the latest update of the Ubuntu OpenStack packages, it was
discovered that the nova-compute/nova-conductor
(1:2014.1.4-0ubuntu2.1) packages encountered a bug with using SSL to
connect to rabbitmq.
When this problem occurs, the compute node cannot connect to the
controller, and this message is constantly displayed:
WARNING nova.conductor.api [req-4022395c-9501-47cf-bf8e-476e1cc58772
None None] Timed out waiting for nova-conductor. Is it running? Or did
this service start before nova-conductor?
Investigation revealed that having rabbitmq configured with SSL was
the root cause of this problem. This seems to have been introduced
with the current version of the nova packages. Rabbitmq was not
updated as part of this distribution update, but the messaging library
(python-oslo.messaging 1.3.0-0ubuntu1.1) was updated. So the problem
could exist in any of these components.
Versions installed:
OpenStack version: Icehouse
Ubuntu 14.04.2 LTS
nova-conductor 1:2014.1.4-0ubuntu2.1
nova-compute 1:2014.1.4-0ubuntu2.1
rabbitmq-server 3.2.4-1
openssl:amd64/trusty-security 1.0.1f-1ubuntu2.15
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1472712/+subscriptions