[Bug 1448650] Re: rpc.server do not consume messages after message acknowledge failure
Billy Olsen
billy.olsen at canonical.com
Thu Jun 25 04:44:24 UTC 2015
** Description changed:
  def start(self):
-     @excutils.forever_retry_uncaught_exceptions
-     def _executor_thread():
-         try:
-             while self._running:
-                 incoming = self.listener.poll()
-                 if incoming is not None:
-                     self._dispatch(incoming)
-         except greenlet.GreenletExit:
-             return
+     @excutils.forever_retry_uncaught_exceptions
+     def _executor_thread():
+         try:
+             while self._running:
+                 incoming = self.listener.poll()
+                 if incoming is not None:
+                     self._dispatch(incoming)
+         except greenlet.GreenletExit:
+             return
class Connection does a lot of work to ensure that operations on a connection can be recovered after a reconnection. However, once the incoming message has been received, a connection error raised during message acknowledgement is caught by excutils.forever_retry_uncaught_exceptions instead of by that recovery logic. At this point do_consume is False, which means the connection calls drain_events without re-registering the consumers on the queues. kombu.Connection.drain_events then establishes a new connection instead of raising a connection error.
The related kombu code is listed below.
  def drain_events(self, **kwargs):
-     return self.transport.drain_events(self.connection, **kwargs)
+     return self.transport.drain_events(self.connection, **kwargs)
  @property
  def connection(self):
-     if not self._closed:
-         if not self.connected:
-             self.declared_entities.clear()
-             self._default_channel = None
-             self._connection = self._establish_connection()
-             self._closed = False
-         return self._connection
+     if not self._closed:
+         if not self.connected:
+             self.declared_entities.clear()
+             self._default_channel = None
+             self._connection = self._establish_connection()
+             self._closed = False
+         return self._connection
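
The failure mode described above can be reproduced with a small simulation. This is only a sketch: FakeBroker, Consumer and simulate are hypothetical names invented for illustration and are not part of kombu or oslo.messaging. It assumes that a silently re-established connection has no consumers registered, and that a consumer with do_consume already set to False never registers again.

```python
class FakeBroker:
    """Hypothetical stand-in for a broker connection; not kombu's API."""

    def __init__(self):
        self.connected = True
        self.consumers = set()  # queues that currently have a consumer
        self.reconnects = 0

    def ensure_connected(self):
        # Assumption: like the lazy `connection` property quoted above,
        # re-establish silently instead of raising. The new connection
        # starts with no consumers registered.
        if not self.connected:
            self.consumers.clear()
            self.connected = True
            self.reconnects += 1

    def drain_events(self):
        self.ensure_connected()
        # Without a registered consumer nothing is ever delivered.
        return "message" if self.consumers else None


class Consumer:
    """Hypothetical consumer that registers only while do_consume is True."""

    def __init__(self, broker, queue):
        self.broker = broker
        self.queue = queue
        self.do_consume = True

    def poll(self):
        if self.do_consume:
            self.broker.consumers.add(self.queue)
            self.do_consume = False
        return self.broker.drain_events()


def simulate():
    broker = FakeBroker()
    consumer = Consumer(broker, "compute")
    first = consumer.poll()    # normal delivery
    broker.connected = False   # connection drops, e.g. during the ack
    after_failure = [consumer.poll() for _ in range(3)]
    return first, after_failure, broker.reconnects
```

In this sketch the broker reconnects exactly once, yet every poll after the failure returns None, which matches the symptom of services that report being reconnected but never receive messages again.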
+
+ ---------------------------
+
+ [Impact]
+
+ This patch addresses an issue where the underlying kombu library
+ disconnects from the rabbitmq-servers, which prevents oslo.messaging
+ from properly going through the reconnect sequence, including the
+ recreation of expected queues. As a result, messages are lost and the
+ cloud remains generally dysfunctional until services are restarted.
+
+ [Test Case]
+
+ Note: these steps are for trusty-icehouse, including the latest
+ oslo.messaging library (1.3.0-0ubuntu1.1 at the time of this writing).
+
+ Deploy an OpenStack cloud with multiple rabbit nodes and then abruptly
+ kill one of the rabbit nodes (e.g. force a panic). Observe that the
+ nova services do detect that the node went down and report that they
+ are reconnected, but messages are still reported as timed out, nova
+ service-list still reports compute nodes as down, etc.
+
+ [Regression Potential]
+
+ There is the possibility that there will be more reconnect attempts from
+ the oslo.messaging library if there is a false positive in the
+ underlying kombu connection reported as disconnected. This should be
+ unlikely since this is bringing the oslo.messaging code into sync with
+ the underlying library, but it is a possibility.
+
+ [Other Info]
+
+ The attempt to drive reconnection logic was fixed in a recent SRU of
+ oslo.messaging (version 1.3.0-0ubuntu1.1). This is an additional fix
+ that is required in order to allow the oslo.messaging library to not go
+ into a zombie-fied connection state.
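
As a rough illustration of the idea behind this kind of fix, the sketch below checks the connection state before draining and goes through an explicit reconnect path that re-registers consumers. It is not the actual patch: ConnectionLost, FakeBroker, GuardedConsumer and simulate_guarded are hypothetical names, and the real oslo.messaging change may be structured differently.

```python
class ConnectionLost(Exception):
    """Hypothetical error type used only in this sketch."""


class FakeBroker:
    """Hypothetical stand-in for a broker connection; not kombu's API."""

    def __init__(self):
        self.connected = True
        self.consumers = set()  # queues that currently have a consumer

    def reconnect(self):
        # Assumption: a new connection starts with no consumers registered.
        self.consumers.clear()
        self.connected = True

    def drain_events(self):
        return "message" if self.consumers else None


class GuardedConsumer:
    """Hypothetical consumer that refuses to drain a dropped connection."""

    def __init__(self, broker, queue):
        self.broker = broker
        self.queue = queue
        self.do_consume = True
        self.reconnects = 0

    def _consume_once(self):
        # Guard: surface the disconnect instead of draining blindly.
        if not self.broker.connected:
            raise ConnectionLost(self.queue)
        if self.do_consume:
            self.broker.consumers.add(self.queue)
            self.do_consume = False
        return self.broker.drain_events()

    def poll(self):
        try:
            return self._consume_once()
        except ConnectionLost:
            # Explicit reconnect path: reconnect, then register again.
            self.broker.reconnect()
            self.reconnects += 1
            self.do_consume = True
            return self._consume_once()


def simulate_guarded():
    broker = FakeBroker()
    consumer = GuardedConsumer(broker, "compute")
    first = consumer.poll()
    broker.connected = False   # connection drops, e.g. during the ack
    after_failure = [consumer.poll() for _ in range(3)]
    return first, after_failure, consumer.reconnects
```

The reconnects counter also makes the regression risk visible: if the connected flag were ever wrong (a false positive), each wrong reading would cost one extra reconnect in this sketch.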
** Also affects: oslo.messaging (Ubuntu)
Importance: Undecided
Status: New
** No longer affects: python-oslo.messaging (Ubuntu)
--
You received this bug notification because you are a member of Ubuntu
Server Team, which is subscribed to python-oslo.messaging in Ubuntu.
https://bugs.launchpad.net/bugs/1448650
Title:
rpc.server do not consume messages after message acknowledge failure
To manage notifications about this bug go to:
https://bugs.launchpad.net/oslo.messaging/+bug/1448650/+subscriptions