[Bug 1338732] Re: Timed out waiting for a reply via rabbit

JohnsonYi yichengli at vip.qq.com
Fri Jul 17 01:53:28 UTC 2015


Hello Chris J Arges, I tested your package
https://launchpad.net/ubuntu/+source/oslo.messaging/1.4.1-0ubuntu1.1 on
my Fuel OpenStack environment, and it did not work well.

Please see the information below for details:

[root@mvp-builder ~]# fuel --f
api: '1.0'
astute_sha: 16b252d93be6aaa73030b8100cf8c5ca6a970a91
auth_required: true
build_id: 2014-12-26_14-25-46
build_number: '58'
feature_groups:
- mirantis
- experimental
fuellib_sha: fde8ba5e11a1acaf819d402c645c731af450aff0
fuelmain_sha: 81d38d6f2903b5a8b4bee79ca45a54b76c1361b8
nailgun_sha: 5f91157daa6798ff522ca9f6d34e7e135f150a90
ostf_sha: a9afb68710d809570460c29d6c3293219d3624d4
production: docker
release: '6.0'
release_versions:
  2014.2-6.0:
    VERSION:
      api: '1.0'
      astute_sha: 16b252d93be6aaa73030b8100cf8c5ca6a970a91
      build_id: 2014-12-26_14-25-46
      build_number: '58'
      feature_groups:
      - mirantis
      fuellib_sha: fde8ba5e11a1acaf819d402c645c731af450aff0
      fuelmain_sha: 81d38d6f2903b5a8b4bee79ca45a54b76c1361b8
      nailgun_sha: 5f91157daa6798ff522ca9f6d34e7e135f150a90
      ostf_sha: a9afb68710d809570460c29d6c3293219d3624d4
      production: docker
      release: '6.0'

root@node-3:~# dpkg -l | grep oslo
ii  python-oslo.config                  1:1.4.0-fuel6.0~mira16                              Common code for Openstack Projects (configuration API)
ii  python-oslo.db                      1.0.1-fuel6.0~mira19                                oslo db library
ii  python-oslo.i18n                    1.0.0-fuel6.0~mira16                                Oslo Internationalization Utilities - Python 2.x
ii  python-oslo.messaging               1.4.1-fuel6.0~mira18                                oslo messaging library
ii  python-oslo.rootwrap                1.3.0.0a1-ubuntu2                                   Fine filtering of shell commands as 'root'
ii  python-oslo.serialization           1.0.0-fuel6.0~mira15                                utilities for serialization , especially JSON - Python 2.x
ii  python-oslo.utils                   1.0.0-fuel6.0~mira16                                set of utility functions for OpenStack - Python 2.x
ii  python-oslo.vmware                  0.6.0-fuel6.0~mira16                                VMware library for OpenStack projects - Python 2.x

I copied the whole oslo.messaging-1.4.1/oslo/messaging folder over
/usr/share/pyshared/oslo/messaging and then restarted all 3 controllers
(roughly as sketched below). After that, HA did not work well: nova
service-list failed with an HTTP 503 error, and the Neutron components
did not work properly either.
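
For reference, the replacement described above amounts to roughly the
following on each controller (a sketch only; the exact set of services to
restart is an assumption and depends on the deployment):

# Copy the 1.4.1 driver code over the installed copy:
cp -a oslo.messaging-1.4.1/oslo/messaging/. /usr/share/pyshared/oslo/messaging/
# Restart the services that use oslo.messaging, for example:
service nova-api restart
service neutron-server restart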

Please see the error messages below:
<166>Jul 16 14:22:12 node-3 neutron-server 2015-07-16 14:22:12.436 6138 INFO oslo.messaging._drivers.impl_rabbit [-] Connecting to AMQP server on 127.0.0.1:5673
<166>Jul 16 14:22:12 node-3 neutron-server 2015-07-16 14:22:12.456 6138 INFO oslo.messaging._drivers.impl_rabbit [-] Connected to AMQP server on 127.0.0.1:5673
<163>Jul 16 14:22:12 node-3 neutron-server 2015-07-16 14:22:12.544 6061 ERROR oslo.messaging._drivers.impl_rabbit [-] Failed to publish message to topic 'reply_c2d631eb2efa4fc1b88d2d03f90d6bca': [Errno 32] Broken pipe
2015-07-16 14:22:12.544 6061 TRACE oslo.messaging._drivers.impl_rabbit Traceback (most recent call last):
2015-07-16 14:22:12.544 6061 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/dist-packages/oslo/messaging/_drivers/impl_rabbit.py", line 681, in ensure
2015-07-16 14:22:12.544 6061 TRACE oslo.messaging._drivers.impl_rabbit     return method()
2015-07-16 14:22:12.544 6061 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/dist-packages/oslo/messaging/_drivers/impl_rabbit.py", line 784, in _publish
2015-07-16 14:22:12.544 6061 TRACE oslo.messaging._drivers.impl_rabbit
2015-07-16 14:22:12.544 6061 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/dist-packages/oslo/messaging/_drivers/impl_rabbit.py", line 377, in __init__
2015-07-16 14:22:12.544 6061 TRACE oslo.messaging._drivers.impl_rabbit     type='direct', **options)
2015-07-16 14:22:12.544 6061 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/dist-packages/oslo/messaging/_drivers/impl_rabbit.py", line 340, in __init__
2015-07-16 14:22:12.544 6061 TRACE oslo.messaging._drivers.impl_rabbit     self.reconnect(channel)
2015-07-16 14:22:12.544 6061 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/dist-packages/oslo/messaging/_drivers/impl_rabbit.py", line 348, in reconnect
2015-07-16 14:22:12.544 6061 TRACE oslo.messaging._drivers.impl_rabbit     routing_key=self.routing_key)
2015-07-16 14:22:12.544 6061 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/dist-packages/kombu/messaging.py", line 84, in __init__
2015-07-16 14:22:12.544 6061 TRACE oslo.messaging._drivers.impl_rabbit     self.revive(self._channel)
2015-07-16 14:22:12.544 6061 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/dist-packages/kombu/messaging.py", line 218, in revive
2015-07-16 14:22:12.544 6061 TRACE oslo.messaging._drivers.impl_rabbit     self.declare()
2015-07-16 14:22:12.544 6061 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/dist-packages/kombu/messaging.py", line 104, in declare
2015-07-16 14:22:12.544 6061 TRACE oslo.messaging._drivers.impl_rabbit     self.exchange.declare()
2015-07-16 14:22:12.544 6061 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/dist-packages/kombu/entity.py", line 166, in declare
2015-07-16 14:22:12.544 6061 TRACE oslo.messaging._drivers.impl_rabbit     nowait=nowait, passive=passive,
2015-07-16 14:22:12.544 6061 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/dist-packages/amqp/channel.py", line 613, in exchange_declare
2015-07-16 14:22:12.544 6061 TRACE oslo.messaging._drivers.impl_rabbit     self._send_method((40, 10), args)
2015-07-16 14:22:12.544 6061 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/dist-packages/amqp/abstract_channel.py", line 56, in _send_method
2015-07-16 14:22:12.544 6061 TRACE oslo.messaging._drivers.impl_rabbit     self.channel_id, method_sig, args, content,
2015-07-16 14:22:12.544 6061 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/dist-packages/amqp/method_framing.py", line 221, in write_method
2015-07-16 14:22:12.544 6061 TRACE oslo.messaging._drivers.impl_rabbit     write_frame(1, channel, payload)
2015-07-16 14:22:12.544 6061 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/dist-packages/amqp/transport.py", line 177, in write_frame
2015-07-16 14:22:12.544 6061 TRACE oslo.messaging._drivers.impl_rabbit     frame_type, channel, size, payload, 0xce,
2015-07-16 14:22:12.544 6061 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/dist-packages/eventlet/greenio.py", line 357, in sendall
2015-07-16 14:22:12.544 6061 TRACE oslo.messaging._drivers.impl_rabbit     tail = self.send(data, flags)
2015-07-16 14:22:12.544 6061 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/dist-packages/eventlet/greenio.py", line 340, in send
2015-07-16 14:22:12.544 6061 TRACE oslo.messaging._drivers.impl_rabbit     total_sent += fd.send(data[total_sent:], flags)
2015-07-16 14:22:12.544 6061 TRACE oslo.messaging._drivers.impl_rabbit error: [Errno 32] Broken pipe
2015-07-16 14:22:12.544 6061 TRACE oslo.messaging._drivers.impl_rabbit
<166>Jul 16 14:22:12 node-3 neutron-server 2015-07-16 14:22:12.549 6061 INFO oslo.messaging._drivers.impl_rabbit [-] Delaying reconnect for 5.0 seconds ...
<166>Jul 16 14:22:17 node-3 neutron-server 2015-07-16 14:22:17.549 6061 INFO oslo.messaging._drivers.impl_rabbit [-] Connecting to AMQP server on 10.14.68.7:5673
<166>Jul 16 14:22:17 node-3 neutron-server 2015-07-16 14:22:17.566 6061 INFO oslo.messaging._drivers.impl_rabbit [-] Connected to AMQP server on 10.14.68.7:5673

Thanks.

-- 
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to ubuntu-cloud-archive.
https://bugs.launchpad.net/bugs/1338732

Title:
  Timed out waiting for a reply via rabbit

Status in ubuntu-cloud-archive:
  New
Status in oslo.messaging:
  Fix Released
Status in oslo.messaging package in Ubuntu:
  Fix Released
Status in oslo.messaging source package in Trusty:
  Fix Released
Status in oslo.messaging source package in Utopic:
  Fix Committed
Status in oslo.messaging source package in Vivid:
  Fix Released

Bug description:
  Icehouse
  oslo-messaging 1.3.0 (stable/icehouse branch from github, c21a2c2)
  rabbitmq-server 3.1.3

  Nova RPC calls often fail after rabbit restarts.  I've tracked it down
  to oslo/rabbit/kombu timing out when it is forced to reconnect to rabbit.
  The code below times out waiting for a reply if the topic has been
  used in a previous run.  The reply always arrives the first time a
  topic is used, or if the topic is None.  But the second run with the
  same topic will hang with this error:

  MessagingTimeout: Timed out waiting for a reply to message ID ...

  This problem seems too basic not to have been caught earlier in oslo,
  but the program below really does reproduce the same symptoms we see in
  nova when run against a live rabbit server.

  Here's a log from a test run:
  https://gist.github.com/noelbk/12adcfe188d9f54f971d

  #! /usr/bin/python

  from oslo.config import cfg
  import threading
  from oslo import messaging
  import logging
  import time
  log = logging.getLogger(__name__)

  class OsloTest():
      def test(self):
          # The code below times out waiting for a reply if the topic
          # has been used in a previous run.  The reply always arrives
          # the first time a topic is used, or if the topic is none.
          # But, the second run with the same topic will hang with this
          # error:
          #
          # MessagingTimeout: Timed out waiting for a reply to message ID ...
          #
          topic  = 'will_hang_on_second_usage'
          #topic  = None # never hangs

          url = "%(proto)s://%(user)s:%(password)s@%(host)s/" % dict(
              proto = 'rabbit',
              host = '1.2.3.4',
              password = 'xxxxxxxx',
              user = 'rabbit-mq-user',
              )
          transport = messaging.get_transport(cfg.CONF, url)
          driver = transport._driver

          target = messaging.Target(topic=topic)
          listener = driver.listen(target)
          ctxt={"context": True}
          timeout = 10

          def send_main():
              log.debug("sending msg")
              reply = driver.send(target,
                                  ctxt,
                                  {'send': 1},
                                  wait_for_reply=True,
                                  timeout=timeout)

              # times out if topic was not None and used before
              log.debug("received reply=%r" % (reply,))

          send_thread = threading.Thread(target=send_main)
          send_thread.daemon = True
          send_thread.start()

          msg = listener.poll()
          log.debug("received msg=%r" % (msg,))

          msg.reply({'reply': 1})

          log.debug("sent reply")

          send_thread.join()

  if __name__ == '__main__':
      FORMAT = '%(asctime)-15s %(process)5d %(thread)5d %(filename)s %(funcName)s %(message)s'
      logging.basicConfig(level=logging.DEBUG, format=FORMAT)
      OsloTest().test()

  
  ---- ---- ---- ---- ----

  [Impact]

   * This patch, along with those from LP #1400268 and LP #1408370, fixes
     rabbitmq reconnects.

   * We are backporting this to Icehouse since oslo.messaging 1.3.0
     fails to reconnect to Rabbit properly, particularly in nova-compute.

   * This patch, along with its dependencies mentioned above, ensures that
     multiple reconnect attempts happen by having connections time out and retry.

  [Test Case]

   * Start a service that uses oslo.messaging with rabbitmq, e.g. nova-compute.

   * Stop rabbitmq while running tail -F /var/log/nova/nova-compute.log
     (see the command sketch after this list).

   * Observe that nova-compute's AMQP connection times out and that it keeps
     trying to reconnect.

   * Restart rabbitmq.

   * Observe that the rabbitmq connection is re-established.
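
   A minimal command sketch of the test above (an assumption: rabbitmq-server
   and nova-compute run as Ubuntu system services on the node; names may
   differ per deployment):

     tail -F /var/log/nova/nova-compute.log &   # watch for AMQP timeout/reconnect messages
     service rabbitmq-server stop               # nova-compute should begin retrying its connection
     service rabbitmq-server start              # the connection should be re-established shortly after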

  [Regression Potential]

   * None. I have tested this in my local cloud environment and it appears
     to be reliable.

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1338732/+subscriptions


