[Bug 1448650] Re: rpc.server do not consume messages after message acknowledge failure
James Page
james.page at ubuntu.com
Thu Jun 25 08:50:54 UTC 2015
** Also affects: oslo.messaging (Ubuntu Trusty)
Importance: Undecided
Status: New
** Also affects: oslo.messaging (Ubuntu Wily)
Importance: Undecided
Status: New
** Also affects: oslo.messaging (Ubuntu Vivid)
Importance: Undecided
Status: New
** Changed in: oslo.messaging (Ubuntu Wily)
Status: New => Fix Released
** Changed in: oslo.messaging (Ubuntu Vivid)
Importance: Undecided => High
** Changed in: oslo.messaging (Ubuntu Trusty)
Importance: Undecided => High
** Changed in: oslo.messaging (Ubuntu Wily)
Importance: Undecided => High
--
You received this bug notification because you are a member of Ubuntu
Sponsors Team, which is subscribed to the bug report.
https://bugs.launchpad.net/bugs/1448650
Title:
rpc.server do not consume messages after message acknowledge failure
Status in Messaging API for OpenStack:
Fix Released
Status in oslo.messaging package in Ubuntu:
Fix Released
Status in oslo.messaging source package in Trusty:
New
Status in oslo.messaging source package in Vivid:
New
Status in oslo.messaging source package in Wily:
Fix Released
Bug description:
    def start(self):
        @excutils.forever_retry_uncaught_exceptions
        def _executor_thread():
            try:
                while self._running:
                    incoming = self.listener.poll()
                    if incoming is not None:
                        self._dispatch(incoming)
            except greenlet.GreenletExit:
                return
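The `@excutils.forever_retry_uncaught_exceptions` decorator in the excerpt above is what keeps `_executor_thread` alive after an unexpected error: it calls the function again from the top. The listener object and its internal flags survive that restart, so any stale state they hold is carried into the next attempt. The following is a simplified sketch of that retry behaviour, not the oslo code. `forever_retry_sketch`, `retry_delay` and `flaky_executor_loop` are hypothetical names used only for illustration.

```python
import functools
import logging
import time

LOG = logging.getLogger(__name__)


def forever_retry_sketch(retry_delay=1.0):
    """Hypothetical, simplified stand-in for forever_retry_uncaught_exceptions.

    Any Exception escaping the wrapped function is logged and the function
    is called again from the top. BaseException subclasses that are not
    Exception subclasses still propagate.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            while True:
                try:
                    return func(*args, **kwargs)
                except Exception:
                    LOG.exception("Unexpected exception occurred, retrying")
                    time.sleep(retry_delay)
        return wrapper
    return decorator


attempts = []


@forever_retry_sketch(retry_delay=0)
def flaky_executor_loop():
    # Hypothetical loop: fails twice (like an ack raising a connection
    # error), then returns normally on the third call.
    attempts.append(len(attempts) + 1)
    if len(attempts) < 3:
        raise ConnectionError("ack failed")
    return "stopped"


print(flaky_executor_loop(), attempts)  # stopped [1, 2, 3]
```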
class Connection does a lot of work to ensure that operations on a connection can be recovered after a reconnection. However, after we get the incoming message, a connection error can be raised while acknowledging it, and that error is caught by excutils.forever_retry_uncaught_exceptions. At this point do_consume is False, which means the connection calls drain_events while skipping the "registering" of consumers on the queues. kombu.Connection.drain_events then establishes a new connection instead of raising a connection error, so the listener keeps draining a fresh connection that has no consumers registered on it.
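The failure sequence described above can be reproduced with a small in-process simulation. This is a minimal sketch under stated assumptions: `FakeKombuConnection`, `ListenerConnection`, `link_up`, `consume_once` and `ack` are hypothetical stand-ins, not oslo.messaging or kombu APIs; only the `do_consume` flag name comes from the bug description. It shows that once the ack fails, the silent reconnect leaves no consumers behind and nothing is ever delivered again.

```python
class FakeKombuConnection:
    """Hypothetical stand-in for kombu.Connection's lazy reconnect.

    A dropped link is silently re-established on the next drain, and the
    fresh link carries no consumers (compare declared_entities.clear()).
    """

    def __init__(self):
        self._closed = False
        self.link_up = True
        self.connects = 0
        self.consumers = set()
        self._establish()

    def _establish(self):
        self.connects += 1
        self.link_up = True
        self.consumers.clear()

    @property
    def connected(self):
        return self.link_up

    def drain_events(self):
        # Like kombu: reconnect quietly instead of raising a connection error.
        if not self._closed and not self.connected:
            self._establish()
        return "msg" if self.consumers else None


class ListenerConnection:
    """Hypothetical, heavily reduced model of the oslo.messaging listener side."""

    def __init__(self, kombu_conn, queues):
        self.kombu_conn = kombu_conn
        self.queues = queues
        self.do_consume = True  # consumers still need to be registered

    def consume_once(self):
        if self.do_consume:
            self.kombu_conn.consumers.update(self.queues)
            self.do_consume = False
        return self.kombu_conn.drain_events()

    def ack(self, message):
        if not self.kombu_conn.connected:
            raise ConnectionError("connection lost while acknowledging")


conn = ListenerConnection(FakeKombuConnection(), ["compute"])
first = conn.consume_once()      # consumers registered, one message delivered
conn.kombu_conn.link_up = False  # the RabbitMQ node dies after delivery
try:
    conn.ack(first)
except ConnectionError:
    pass  # in oslo.messaging this is caught by the forever-retry wrapper

# do_consume is still False, so nothing re-registers the consumers.
after = [conn.consume_once() for _ in range(3)]
print(first, after, conn.kombu_conn.connects)  # msg [None, None, None] 2
```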
The related kombu code is listed below.
    def drain_events(self, **kwargs):
        return self.transport.drain_events(self.connection, **kwargs)

    @property
    def connection(self):
        if not self._closed:
            if not self.connected:
                self.declared_entities.clear()
                self._default_channel = None
                self._connection = self._establish_connection()
                self._closed = False
            return self._connection
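To make the effect of the `connection` property above concrete, here is a runnable stand-in that copies its control flow. `LazyConnectionSketch` is hypothetical, and its `connected` check and dict-based "connection" are simplifications standing in for kombu's transport, not kombu internals. Accessing `.connection` after the link dies opens a new connection and empties `declared_entities` rather than raising.

```python
class LazyConnectionSketch:
    """Hypothetical stand-in reproducing the control flow of the kombu
    Connection.connection property quoted above."""

    def __init__(self):
        self._closed = False
        self._connection = None
        self._default_channel = None
        self.declared_entities = set()
        self.established = 0

    @property
    def connected(self):
        # Simplification: treat a dict flag as the transport's liveness check.
        return self._connection is not None and self._connection["alive"]

    def _establish_connection(self):
        self.established += 1
        return {"id": self.established, "alive": True}

    @property
    def connection(self):
        # Same branching as the kombu property: reconnect, never raise.
        if not self._closed:
            if not self.connected:
                self.declared_entities.clear()
                self._default_channel = None
                self._connection = self._establish_connection()
                self._closed = False
            return self._connection


c = LazyConnectionSketch()
first = c.connection
c.declared_entities.add("queue:compute")
first["alive"] = False   # the broker goes away
second = c.connection    # no exception: a new link is opened instead
print(first["id"], second["id"], c.declared_entities)  # 1 2 set()
```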
---------------------------
[Impact]
This patch addresses an issue where the underlying kombu library disconnects from the RabbitMQ servers in a way that prevents oslo.messaging
from properly going through the reconnect sequence, including the recreation of expected queues. As a result, messages are lost and the cloud is generally dysfunctional until services are restarted.
[Test Case]
Note: these steps are for trusty-icehouse, including the latest oslo.messaging
library (1.3.0-0ubuntu1.1 at the time of this writing).
Deploy an OpenStack cloud w/ multiple rabbit nodes and then abruptly
kill one of the rabbit nodes (e.g. force a panic). Observe that the
nova services do detect that the node went down and report that they
have reconnected, but messages are still reported as timed out, nova
service-list still reports compute nodes as down, etc.
[Regression Potential]
There is a possibility of more reconnect attempts from the
oslo.messaging library if the underlying kombu connection is falsely
reported as disconnected. This should be unlikely, since this change
brings the oslo.messaging code into sync with the underlying library,
but it is possible.
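One way to picture the trade-off described above is a listener that checks the kombu connection's state before draining and, when it reports disconnected, runs the full reconnect sequence and re-registers its consumers. This is a hypothetical sketch of that idea, not the actual SRU patch: `FakeKombu`, `GuardedListener`, `link_up`, `reconnect` and `reconnects` are invented names for illustration.

```python
class FakeKombu:
    """Hypothetical kombu stand-in: a fresh link carries no consumers."""

    def __init__(self):
        self.link_up = True
        self.consumers = set()
        self.connects = 1

    @property
    def connected(self):
        return self.link_up

    def reconnect(self):
        self.connects += 1
        self.link_up = True
        self.consumers.clear()

    def drain_events(self):
        if not self.link_up:  # kombu-style silent re-establish
            self.reconnect()
        return "msg" if self.consumers else None


class GuardedListener:
    """Hypothetical guard: notice a drop before draining and redo the full
    reconnect sequence so that consumers are registered again."""

    def __init__(self, kombu, queues):
        self.kombu = kombu
        self.queues = queues
        self.do_consume = True
        self.reconnects = 0

    def consume_once(self):
        if not self.kombu.connected:
            self.kombu.reconnect()
            self.reconnects += 1
            self.do_consume = True  # consumers must be declared again
        if self.do_consume:
            self.kombu.consumers.update(self.queues)
            self.do_consume = False
        return self.kombu.drain_events()


k = FakeKombu()
listener = GuardedListener(k, ["compute"])
before = listener.consume_once()
k.link_up = False  # the link drops (e.g. after a failed ack)
print(before, listener.consume_once(), listener.reconnects)  # msg msg 1
```

Every time `connected` reports False, genuine or not, this guard pays for a full reconnect; that cost is the extra-reconnect risk noted under [Regression Potential].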
[Other Info]
The attempt to drive reconnection logic was fixed in a recent SRU of
oslo.messaging (version 1.3.0-0ubuntu1.1). This is an additional fix
that is required to keep the oslo.messaging library from ending up in a
zombified connection state.