[Bug 1448650] Re: rpc.server do not consume messages after message acknowledge failure
Mathew Hodson
mathew.hodson at gmail.com
Wed Aug 31 19:02:28 UTC 2016
** No longer affects: oslo.messaging (Ubuntu Wily)
--
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to oslo.messaging in Ubuntu.
https://bugs.launchpad.net/bugs/1448650
Title:
rpc.server do not consume messages after message acknowledge failure
Status in oslo.messaging:
Fix Released
Status in oslo.messaging package in Ubuntu:
Fix Released
Status in oslo.messaging source package in Trusty:
Fix Released
Status in oslo.messaging source package in Utopic:
Won't Fix
Status in oslo.messaging source package in Vivid:
Fix Released
Bug description:
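    def start(self):
        # Any exception escaping the loop (other than GreenletExit) is
        # caught by the decorator, and _executor_thread is called again.
        @excutils.forever_retry_uncaught_exceptions
        def _executor_thread():
            try:
                while self._running:
                    incoming = self.listener.poll()
                    if incoming is not None:
                        self._dispatch(incoming)
            except greenlet.GreenletExit:
                return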
The Connection class does a lot of work to ensure that operations on a connection can be recovered after a reconnection. However, after we get an incoming message, a connection error can be raised while acknowledging it, and that error is caught by excutils.forever_retry_uncaught_exceptions, which simply restarts the executor loop. At this point do_consume is False, which means the connection will call drain_events while skipping the re-registration of consumers on the queues. kombu.Connection.drain_events does not raise a connection error in this case; it silently establishes a new connection instead, so the server keeps draining events on a connection that has no consumers and never receives another message.
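A simplified, hypothetical version of the retry behaviour the report relies on is sketched below: when the wrapped function raises, it is just called again. It resets no connection or consumer state, which is why do_consume is still False when the loop restarts. To our understanding, the real oslo helper also throttles its logging and sleeps between retries; those details are omitted, and the names AckError and executor_thread are illustrative only.

```python
import functools


def forever_retry_uncaught_exceptions(func):
    """Simplified sketch: call ``func`` again whenever it raises.

    Unlike the real oslo helper, this version neither logs nor sleeps
    between attempts.
    """
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        while True:
            try:
                return func(*args, **kwargs)
            except Exception:
                # No state is reset here: whatever the listener looked like
                # when the exception escaped is what the next attempt sees.
                continue
    return wrapper


class AckError(Exception):
    """Hypothetical stand-in for a connection error raised during ack."""


attempts = []


@forever_retry_uncaught_exceptions
def executor_thread():
    # Each call stands in for one run of the poll/dispatch loop.
    attempts.append(len(attempts) + 1)
    if len(attempts) < 3:
        raise AckError("connection lost while acknowledging a message")
    return "stopped"


print(executor_thread(), attempts)  # stopped [1, 2, 3]
```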
The related kombu code is listed below:
    def drain_events(self, **kwargs):
        return self.transport.drain_events(self.connection, **kwargs)

    @property
    def connection(self):
        if not self._closed:
            if not self.connected:
                # Reconnects silently instead of raising; the cache of
                # declared entities is cleared.
                self.declared_entities.clear()
                self._default_channel = None
                self._connection = self._establish_connection()
                self._closed = False
            return self._connection
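To make the interaction between the ack failure, do_consume and kombu's silent reconnect concrete, here is a toy Python model. Every name in it (FakeKombuConnection, Listener, BrokerConnectionError, run, the guard flag) is a hypothetical simplification; it is not kombu's or oslo.messaging's actual code, and the optional guard is only one way the idea of re-registering consumers after a reconnect could look, not the released patch.

```python
class BrokerConnectionError(Exception):
    """Hypothetical stand-in for an AMQP connection error."""


class FakeKombuConnection:
    """Toy model of the kombu.Connection code quoted above (not kombu itself)."""

    def __init__(self, broker):
        self.broker = broker              # queue name -> list of pending messages
        self.declared_entities = set()    # queues we have consumers on
        self._closed = False
        self.connected = False
        self.establish_count = 0
        self._connection = self._establish_connection()

    def _establish_connection(self):
        self.establish_count += 1
        self.connected = True
        return object()                   # opaque transport handle

    @property
    def connection(self):
        # Same shape as kombu's property: if the link is down, reconnect
        # silently and forget every previously declared entity.
        if not self._closed:
            if not self.connected:
                self.declared_entities.clear()
                self._connection = self._establish_connection()
                self._closed = False
            return self._connection

    def drain_events(self):
        # Touching .connection may trigger the silent reconnect above.
        if self.connection is None:
            raise BrokerConnectionError("connection is closed")
        for queue in sorted(self.declared_entities):
            if self.broker.get(queue):
                return self.broker[queue].pop(0)
        return None                       # nothing delivered to our consumers


class Listener:
    """Toy model of an oslo.messaging listener with a do_consume flag."""

    def __init__(self, conn, queue, guard=False):
        self.conn = conn
        self.queue = queue
        self.do_consume = True
        self.guard = guard                # hypothetical fix, off by default

    def poll(self):
        if self.guard and not self.conn.connected:
            # Hypothetical guard: reconnect explicitly first, then make sure
            # consumers are registered again on the new connection.
            _ = self.conn.connection
            self.do_consume = True
        if self.do_consume:
            self.conn.declared_entities.add(self.queue)
            self.do_consume = False
        return self.conn.drain_events()

    def ack(self, message):
        if not self.conn.connected:
            raise BrokerConnectionError("connection lost during ack")


def run(guard):
    broker = {"compute": ["msg-1", "msg-2"]}
    conn = FakeKombuConnection(broker)
    listener = Listener(conn, "compute", guard=guard)
    received = [listener.poll()]          # msg-1 is delivered normally
    conn.connected = False                # a RabbitMQ node dies before the ack
    try:
        listener.ack(received[0])
    except BrokerConnectionError:
        pass                              # swallowed, like the retry decorator
    received.append(listener.poll())      # the restarted loop polls again
    return received, conn.establish_count


print(run(guard=False))  # (['msg-1', None], 2)
print(run(guard=True))   # (['msg-1', 'msg-2'], 2)
```

Without the guard, the second poll reconnects (establish_count reaches 2) but drains a connection with no consumers, so nothing arrives; with the guard, consumers are re-registered after the reconnect and msg-2 is delivered. The model ignores redelivery of the unacknowledged msg-1. A guard of this kind reconnects whenever connected reads False, which is the false-positive trade-off described under [Regression Potential].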
---------------------------
[Impact]
This patch addresses an issue where the underlying kombu library loses its
connection to the RabbitMQ servers and reconnects on its own, which prevents
oslo.messaging from properly going through its reconnect sequence, including
the recreation of the expected queues. As a result, messages are lost and the
cloud stays generally dysfunctional until the services are restarted.
[Test Case]
Note: these steps are for trusty-icehouse, including the latest oslo.messaging
library (1.3.0-0ubuntu1.1 at the time of this writing).
Deploy an OpenStack cloud with multiple RabbitMQ nodes, then abruptly
kill one of the RabbitMQ nodes (e.g. by forcing a panic). Observe that the
nova services detect that the node went down and report that they
have reconnected, but messages still time out, nova service-list
still reports compute nodes as down, etc.
[Regression Potential]
There is a possibility of more reconnect attempts from the
oslo.messaging library if the underlying kombu connection falsely
reports itself as disconnected. This should be unlikely, since this
change brings the oslo.messaging code into sync with the underlying
library, but it is possible.
[Other Info]
The code that drives the reconnection logic was fixed in a recent SRU
of oslo.messaging (version 1.3.0-0ubuntu1.1). This is an additional
fix that is required to keep the oslo.messaging library from going
into a zombified connection state.
To manage notifications about this bug go to:
https://bugs.launchpad.net/oslo.messaging/+bug/1448650/+subscriptions