[Bug 1338732] [NEW] Timed out waiting for a reply via rabbit

Launchpad Bug Tracker 1338732 at bugs.launchpad.net
Wed Sep 9 21:39:32 UTC 2015


You have been subscribed to a public bug by Jorge Niedbalski (niedbalski):

Icehouse
oslo-messaging 1.3.0 (stable/icehouse branch from github, c21a2c2)
rabbitmq-server 3.1.3

Nova rpc calls fail often after rabbit restarts.  I've tracked it down
to oslo/rabbit/kombu timing out if it's forced to reconnect to rabbit.
The code below times out waiting for a reply if the topic has been used
in a previous run.  The reply always arrives the first time a topic is
used, or if the topic is None.  But the second run with the same topic
hangs and then fails with this error:

MessagingTimeout: Timed out waiting for a reply to message ID ...

This problem seems too basic not to have been caught earlier in oslo,
but the program below reliably reproduces the symptoms we see in nova
when it is run against a live rabbit server.

Here's a log from a test run:
https://gist.github.com/noelbk/12adcfe188d9f54f971d

#! /usr/bin/python

from oslo.config import cfg
import threading
from oslo import messaging
import logging
import time
log = logging.getLogger(__name__)

class OsloTest():
    def test(self):
        # The code below times out waiting for a reply if the topic
        # has been used in a previous run.  The reply always arrives
        # the first time a topic is used, or if the topic is none.
        # But, the second run with the same topic will hang with this
        # error:
        #
        # MessagingTimeout: Timed out waiting for a reply to message ID ...
        #
        topic  = 'will_hang_on_second_usage'
        #topic  = None # never hangs

        url = "%(proto)s://%(user)s:%(password)s@%(host)s/" % dict(
            proto = 'rabbit',
            host = '1.2.3.4',
            password = 'xxxxxxxx',
            user = 'rabbit-mq-user',
            )
        transport = messaging.get_transport(cfg.CONF, url)
        driver = transport._driver

        target = messaging.Target(topic=topic)
        listener = driver.listen(target)
        ctxt={"context": True}
        timeout = 10

        def send_main():
            log.debug("sending msg")
            reply = driver.send(target,
                                ctxt,
                                {'send': 1},
                                wait_for_reply=True,
                                timeout=timeout)

            # times out if topic was not None and used before
            log.debug("received reply=%r" % (reply,))

        send_thread = threading.Thread(target=send_main)
        send_thread.daemon = True
        send_thread.start()

        msg = listener.poll()
        log.debug("received msg=%r" % (msg,))

        msg.reply({'reply': 1})

        log.debug("sent reply")

        send_thread.join()

if __name__ == '__main__':
    FORMAT = '%(asctime)-15s %(process)5d %(thread)5d %(filename)s %(funcName)s %(message)s'
    logging.basicConfig(level=logging.DEBUG, format=FORMAT)
    OsloTest().test()


---- ---- ---- ---- ----

[Impact]

 * This patch, along with those from LP #1400268 and LP #1408370, fixes
   rabbitmq reconnects.

 * We are backporting this to Icehouse since oslo.messaging 1.3.0
   fails to reconnect to Rabbit properly, particularly in nova-compute.

 * This patch, along with its dependencies mentioned above, will ensure that
   multiple reconnect attempts happen by having connections time out and retry.
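
The "time out and retry" behaviour described above can be illustrated with
a minimal, generic sketch. This is not the code from the patch or from
oslo.messaging; connect_with_retry and its parameters are hypothetical
names chosen for illustration, and connect stands for any callable that
raises when a connection attempt fails or times out.

```python
import time


def connect_with_retry(connect, attempts=5, delay=1.0, sleep=time.sleep):
    """Call connect() until it succeeds or the attempts run out.

    connect is a hypothetical callable that raises on failure, for
    example because its own socket timeout expired.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except Exception as exc:  # a real client would catch narrower errors
            last_error = exc
            if attempt < attempts:
                # Wait before the next attempt instead of giving up.
                sleep(delay)
    # Every attempt failed: surface the last error to the caller.
    raise last_error
```

The important property is that a failed or timed-out attempt leads to
another attempt rather than leaving the caller blocked on a dead connection.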

[Test Case]

 * Start a service that uses oslo.messaging with rabbitmq, e.g.
   nova-compute

 * Stop rabbitmq while running tail -F /var/log/nova/nova-compute.log

 * Observe that the nova-compute AMQP connection times out and that it
   is trying to reconnect

 * Restart rabbitmq

 * Observe that the rabbitmq connection has been re-established
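
When scripting the steps above, it can help to confirm that the broker is
accepting TCP connections again before checking the service log. The helper
below is a small sketch using only the Python standard library;
wait_for_port is a hypothetical name, and the host and port in the usage
note are assumptions (5672 is the default AMQP port, but a deployment may
use another). It only checks that the port is reachable, not that
nova-compute has reconnected.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Return True once host:port accepts a TCP connection.

    Returns False if no connection succeeds before timeout seconds pass.
    """
    deadline = time.time() + timeout
    while True:
        try:
            conn = socket.create_connection((host, port), timeout=interval)
        except OSError:
            # Refused or timed out: retry until the deadline passes.
            if time.time() >= deadline:
                return False
            time.sleep(interval)
        else:
            conn.close()
            return True
```

For example, wait_for_port('1.2.3.4', 5672) could be called after restarting
rabbitmq, using the same placeholder host as the reproduction script.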

[Regression Potential]

 * None. I have tested in my local cloud environment and it appears to be
   reliable.

** Affects: cloud-archive
     Importance: Undecided
         Status: New

** Affects: oslo.messaging
     Importance: High
     Assignee: Mehdi Abaakouk (sileht)
         Status: Fix Released

** Affects: oslo.messaging (Ubuntu)
     Importance: Undecided
         Status: Fix Released

** Affects: oslo.messaging (Ubuntu Trusty)
     Importance: High
     Assignee: James Page (james-page)
         Status: Fix Released

** Affects: oslo.messaging (Ubuntu Utopic)
     Importance: High
     Assignee: Jorge Niedbalski (niedbalski)
         Status: In Progress

** Affects: oslo.messaging (Ubuntu Vivid)
     Importance: Undecided
         Status: Fix Released


** Tags: icehouse-backport-potential in-stable-icehouse in-stable-juno verification-needed
-- 
Timed out waiting for a reply via rabbit
https://bugs.launchpad.net/bugs/1338732
You received this bug notification because you are a member of Ubuntu Sponsors Team, which is subscribed to the bug report.


