[Bug 1448650] Re: rpc.server do not consume messages after message acknowledge failure

Mathew Hodson mathew.hodson at gmail.com
Wed Aug 31 19:02:28 UTC 2016


** No longer affects: oslo.messaging (Ubuntu Wily)

-- 
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to oslo.messaging in Ubuntu.
https://bugs.launchpad.net/bugs/1448650

Title:
  rpc.server do not consume messages after message acknowledge failure

Status in oslo.messaging:
  Fix Released
Status in oslo.messaging package in Ubuntu:
  Fix Released
Status in oslo.messaging source package in Trusty:
  Fix Released
Status in oslo.messaging source package in Utopic:
  Won't Fix
Status in oslo.messaging source package in Vivid:
  Fix Released

Bug description:
  def start(self):

      @excutils.forever_retry_uncaught_exceptions
      def _executor_thread():
          try:
              while self._running:
                  incoming = self.listener.poll()
                  if incoming is not None:
                      self._dispatch(incoming)
          except greenlet.GreenletExit:
              return

  The Connection class does a lot of work to ensure that operations on a
  connection can recover after a reconnection. However, after an incoming
  message has been received, a connection error can be raised while
  acknowledging it, and that error is caught by
  excutils.forever_retry_uncaught_exceptions. At that point do_consume is
  False, so the connection calls drain_events without first re-registering
  the consumers on the queues. kombu.Connection.drain_events then
  establishes a new connection instead of raising a connection error, so
  the server keeps polling a connection on which nothing is consumed.
  The related kombu code is listed below.
  def drain_events(self, **kwargs):
      return self.transport.drain_events(self.connection, **kwargs)

  @property
  def connection(self):
      if not self._closed:
          if not self.connected:
              self.declared_entities.clear()
              self._default_channel = None
              self._connection = self._establish_connection()
              self._closed = False
          return self._connection
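
  The effect of that silent reconnect can be illustrated with a toy
  model. This is only a sketch: ToyConnection, register, drop and pending
  are hypothetical names invented for the illustration and are not part
  of kombu or oslo.messaging. It assumes that consumer registrations are
  lost together with the broken link.

```python
class ToyConnection:
    """Hypothetical stand-in for kombu.Connection; not kombu's real code."""

    def __init__(self):
        self.connected = True
        self.consumers = set()   # queues registered on the current link
        self.pending = []        # (queue, message) pairs held by the broker

    def register(self, queue):
        self.consumers.add(queue)

    def drop(self):
        # Simulate a broken link: registrations are lost with it.
        self.connected = False
        self.consumers.clear()

    @property
    def connection(self):
        # Like the property quoted above: accessing it reconnects silently.
        if not self.connected:
            self.connected = True
        return self

    def drain_events(self):
        link = self.connection   # reconnects here, no error is raised
        # Only messages for queues with a registered consumer are delivered.
        return [msg for queue, msg in self.pending if queue in link.consumers]


conn = ToyConnection()
conn.register("compute")
conn.pending.append(("compute", "ping"))
before = conn.drain_events()     # consumer is registered, message arrives

conn.drop()                      # e.g. the acknowledgement hit a dead link
after = conn.drain_events()      # link is back, but nothing is consumed
```

  In this model the second drain_events call succeeds and the connection
  reports itself as connected, yet no message is delivered, which matches
  the symptom described in the title.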

  ---------------------------

  [Impact]

  This patch addresses an issue where the underlying kombu library
  disconnects from the rabbitmq-servers, which prevents oslo.messaging
  from properly going through the reconnect sequence, including the
  recreation of expected queues. As a result, messages are lost and the
  cloud is generally dysfunctional until services are restarted.

  [Test Case]

  Note: these steps are for trusty-icehouse, including the latest
  oslo.messaging library (1.3.0-0ubuntu1.1 at the time of this writing).

  Deploy an OpenStack cloud with multiple rabbit nodes and then abruptly
  kill one of the rabbit nodes (e.g. force panic, etc). Observe that the
  nova services do detect that the node went down and report that they
  are reconnected, but messages are still reporting as timed out, nova
  service-list still reports compute nodes as down, etc.

  [Regression Potential]

  There is the possibility that there will be more reconnect attempts
  from the oslo.messaging library if there is a false positive in the
  underlying kombu connection reported as disconnected. This should be
  unlikely since this is bringing the oslo.messaging code into sync with
  the underlying library, but it is a possibility.
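
  The trade-off described above can be sketched as a guard that checks
  the reported link state before polling. This is an illustration under
  stated assumptions, not the actual patch: ToyLink and ensure_consuming
  are hypothetical names, and the "full reconnect sequence" is reduced
  to re-registering the expected queues.

```python
class ToyLink:
    """Hypothetical broker link; tracks state and registered consumers."""

    def __init__(self):
        self.connected = True
        self.consumers = set()


def ensure_consuming(link, queues):
    """Run the full reconnect sequence if the link reports as disconnected.

    Returns True when a reconnect sequence was run, False otherwise.
    """
    if link.connected:
        return False
    # Reconnect and re-register every expected queue, rather than
    # relying on the library to reconnect silently.
    link.connected = True
    link.consumers = set(queues)
    return True


link = ToyLink()
link.consumers = {"compute"}

# The link drops and takes its consumer registrations with it.
link.connected = False
link.consumers.clear()

ran = ensure_consuming(link, ["compute"])        # reconnect sequence runs
ran_again = ensure_consuming(link, ["compute"])  # healthy link, no-op
```

  If the reported state were ever wrong (a false "disconnected"), the
  guard would run one extra reconnect sequence, which is the regression
  risk noted above.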

  [Other Info]

  The attempt to drive reconnection logic was fixed in a recent SRU of
  oslo.messaging (version 1.3.0-0ubuntu1.1). This is an additional fix
  that is required in order to keep the oslo.messaging library from
  going into a zombie-fied connection state.

To manage notifications about this bug go to:
https://bugs.launchpad.net/oslo.messaging/+bug/1448650/+subscriptions


