[Bug 1393391] Re: neutron-openvswitch-agent stuck on no queue 'q-agent-notifier-port-update_fanout..
Corey Bryant
corey.bryant at canonical.com
Fri Feb 26 20:52:44 UTC 2016
** Description changed:
Under an HA deployment, neutron-openvswitch-agent can get stuck
when receiving a close command on a fanout queue the agent is not subscribed to.
It then stops responding to any other messages, so it effectively
stops working altogether.
2014-11-11 10:27:33.092 3027 INFO neutron.common.config [-] Logging enabled!
2014-11-11 10:27:34.285 3027 INFO neutron.openstack.common.rpc.common [req-66ba318b-0fcc-42c2-959e-9a5233c292ef None] Connected to AMQP server on vip-rabbitmq:5672
2014-11-11 10:27:34.370 3027 INFO neutron.openstack.common.rpc.common [req-66ba318b-0fcc-42c2-959e-9a5233c292ef None] Connected to AMQP server on vip-rabbitmq:5672
2014-11-11 10:27:35.348 3027 INFO neutron.plugins.openvswitch.agent.ovs_neutron_agent [req-66ba318b-0fcc-42c2-959e-9a5233c292ef None] Agent initialized successfully, now running...
2014-11-11 10:27:35.351 3027 INFO neutron.plugins.openvswitch.agent.ovs_neutron_agent [req-66ba318b-0fcc-42c2-959e-9a5233c292ef None] Agent out of sync with plugin!
2014-11-11 10:27:35.401 3027 INFO neutron.plugins.openvswitch.agent.ovs_neutron_agent [req-66ba318b-0fcc-42c2-959e-9a5233c292ef None] Agent tunnel out of sync with plugin!
2014-11-11 10:27:35.414 3027 INFO neutron.openstack.common.rpc.common [req-66ba318b-0fcc-42c2-959e-9a5233c292ef None] Connected to AMQP server on vip-rabbitmq:5672
2014-11-11 10:32:33.143 3027 INFO neutron.agent.securitygroups_rpc [req-22c7fa11-882d-4278-9f83-6dd56ab95ba4 None] Security group member updated [u'4c7b3ad2-4526-48a7-959e-a8b8e4da6413']
2014-11-11 10:58:11.916 3027 INFO neutron.agent.securitygroups_rpc [req-484fd71f-8f61-496c-aa8a-2d3abf8de365 None] Security group member updated [u'4c7b3ad2-4526-48a7-959e-a8b8e4da6413']
2014-11-11 10:59:43.954 3027 INFO neutron.agent.securitygroups_rpc [req-2c0bc777-04ed-470a-aec5-927a59100b89 None] Security group member updated [u'4c7b3ad2-4526-48a7-959e-a8b8e4da6413']
2014-11-11 11:00:22.500 3027 INFO neutron.agent.securitygroups_rpc [req-df447d01-d132-40f2-8528-1c1c4d57c0f5 None] Security group member updated [u'4c7b3ad2-4526-48a7-959e-a8b8e4da6413']
2014-11-12 01:27:35.662 3027 ERROR neutron.openstack.common.rpc.common [-] Failed to consume message from queue: Socket closed
2014-11-12 01:27:35.662 3027 TRACE neutron.openstack.common.rpc.common Traceback (most recent call last):
2014-11-12 01:27:35.662 3027 TRACE neutron.openstack.common.rpc.common File "/usr/lib/python2.7/site-packages/neutron/openstack/common/rpc/impl_kombu.py", line 579, in ensure
2014-11-12 01:27:35.662 3027 TRACE neutron.openstack.common.rpc.common return method(*args, **kwargs)
2014-11-12 01:27:35.662 3027 TRACE neutron.openstack.common.rpc.common File "/usr/lib/python2.7/site-packages/neutron/openstack/common/rpc/impl_kombu.py", line 659, in _consume
2014-11-12 01:27:35.662 3027 TRACE neutron.openstack.common.rpc.common return self.connection.drain_events(timeout=timeout)
2014-11-12 01:27:35.662 3027 TRACE neutron.openstack.common.rpc.common File "/usr/lib/python2.7/site-packages/kombu/connection.py", line 281, in drain_events
2014-11-12 01:27:35.662 3027 TRACE neutron.openstack.common.rpc.common return self.transport.drain_events(self.connection, **kwargs)
2014-11-12 01:27:35.662 3027 TRACE neutron.openstack.common.rpc.common File "/usr/lib/python2.7/site-packages/kombu/transport/pyamqp.py", line 94, in drain_events
2014-11-12 01:27:35.662 3027 TRACE neutron.openstack.common.rpc.common return connection.drain_events(**kwargs)
2014-11-12 01:27:35.662 3027 TRACE neutron.openstack.common.rpc.common File "/usr/lib/python2.7/site-packages/amqp/connection.py", line 266, in drain_events
2014-11-12 01:27:35.662 3027 TRACE neutron.openstack.common.rpc.common chanmap, None, timeout=timeout,
2014-11-12 01:27:35.662 3027 TRACE neutron.openstack.common.rpc.common File "/usr/lib/python2.7/site-packages/amqp/connection.py", line 328, in _wait_multiple
2014-11-12 01:27:35.662 3027 TRACE neutron.openstack.common.rpc.common channel, method_sig, args, content = read_timeout(timeout)
2014-11-12 01:27:35.662 3027 TRACE neutron.openstack.common.rpc.common File "/usr/lib/python2.7/site-packages/amqp/connection.py", line 292, in read_timeout
2014-11-12 01:27:35.662 3027 TRACE neutron.openstack.common.rpc.common return self.method_reader.read_method()
2014-11-12 01:27:35.662 3027 TRACE neutron.openstack.common.rpc.common File "/usr/lib/python2.7/site-packages/amqp/method_framing.py", line 192, in read_method
2014-11-12 01:27:35.662 3027 TRACE neutron.openstack.common.rpc.common raise m
2014-11-12 01:27:35.662 3027 TRACE neutron.openstack.common.rpc.common IOError: Socket closed
2014-11-12 01:27:35.662 3027 TRACE neutron.openstack.common.rpc.common
2014-11-12 01:27:35.695 3027 INFO neutron.openstack.common.rpc.common [-] Reconnecting to AMQP server on vip-rabbitmq:5672
2014-11-12 01:27:35.722 3027 INFO neutron.openstack.common.rpc.common [-] Connected to AMQP server on vip-rabbitmq:5672
2014-11-12 02:00:22.682 3027 ERROR neutron.openstack.common.rpc.common [-] Failed to consume message from queue: Socket closed
2014-11-12 02:00:22.682 3027 TRACE neutron.openstack.common.rpc.common Traceback (most recent call last):
2014-11-12 02:00:22.682 3027 TRACE neutron.openstack.common.rpc.common File "/usr/lib/python2.7/site-packages/neutron/openstack/common/rpc/impl_kombu.py", line 579, in ensure
2014-11-12 02:00:22.682 3027 TRACE neutron.openstack.common.rpc.common return method(*args, **kwargs)
2014-11-12 02:00:22.682 3027 TRACE neutron.openstack.common.rpc.common File "/usr/lib/python2.7/site-packages/neutron/openstack/common/rpc/impl_kombu.py", line 659, in _consume
2014-11-12 02:00:22.682 3027 TRACE neutron.openstack.common.rpc.common return self.connection.drain_events(timeout=timeout)
2014-11-12 02:00:22.682 3027 TRACE neutron.openstack.common.rpc.common File "/usr/lib/python2.7/site-packages/kombu/connection.py", line 281, in drain_events
2014-11-12 02:00:22.682 3027 TRACE neutron.openstack.common.rpc.common return self.transport.drain_events(self.connection, **kwargs)
2014-11-12 02:00:22.682 3027 TRACE neutron.openstack.common.rpc.common File "/usr/lib/python2.7/site-packages/kombu/transport/pyamqp.py", line 94, in drain_events
2014-11-12 02:00:22.682 3027 TRACE neutron.openstack.common.rpc.common return connection.drain_events(**kwargs)
2014-11-12 02:00:22.682 3027 TRACE neutron.openstack.common.rpc.common File "/usr/lib/python2.7/site-packages/amqp/connection.py", line 266, in drain_events
2014-11-12 02:00:22.682 3027 TRACE neutron.openstack.common.rpc.common chanmap, None, timeout=timeout,
2014-11-12 02:00:22.682 3027 TRACE neutron.openstack.common.rpc.common File "/usr/lib/python2.7/site-packages/amqp/connection.py", line 328, in _wait_multiple
2014-11-12 02:00:22.682 3027 TRACE neutron.openstack.common.rpc.common channel, method_sig, args, content = read_timeout(timeout)
2014-11-12 02:00:22.682 3027 TRACE neutron.openstack.common.rpc.common File "/usr/lib/python2.7/site-packages/amqp/connection.py", line 292, in read_timeout
2014-11-12 02:00:22.682 3027 TRACE neutron.openstack.common.rpc.common return self.method_reader.read_method()
2014-11-12 02:00:22.682 3027 TRACE neutron.openstack.common.rpc.common File "/usr/lib/python2.7/site-packages/amqp/method_framing.py", line 192, in read_method
2014-11-12 02:00:22.682 3027 TRACE neutron.openstack.common.rpc.common raise m
2014-11-12 02:00:22.682 3027 TRACE neutron.openstack.common.rpc.common IOError: Socket closed
2014-11-12 02:00:22.682 3027 TRACE neutron.openstack.common.rpc.common
2014-11-12 02:00:22.683 3027 INFO neutron.openstack.common.rpc.common [-] Reconnecting to AMQP server on vip-rabbitmq:5672
2014-11-12 02:00:23.017 3027 INFO neutron.openstack.common.rpc.common [-] Connected to AMQP server on vip-rabbitmq:5672
2014-11-12 02:00:23.021 3027 ERROR root [-] Unexpected exception occurred 1 time(s)... retrying.
2014-11-12 02:00:23.021 3027 TRACE root Traceback (most recent call last):
2014-11-12 02:00:23.021 3027 TRACE root File "/usr/lib/python2.7/site-packages/neutron/openstack/common/excutils.py", line 92, in inner_func
2014-11-12 02:00:23.021 3027 TRACE root return infunc(*args, **kwargs)
2014-11-12 02:00:23.021 3027 TRACE root File "/usr/lib/python2.7/site-packages/neutron/openstack/common/rpc/impl_kombu.py", line 746, in _consumer_thread
2014-11-12 02:00:23.021 3027 TRACE root self.consume()
2014-11-12 02:00:23.021 3027 TRACE root File "/usr/lib/python2.7/site-packages/neutron/openstack/common/rpc/impl_kombu.py", line 737, in consume
2014-11-12 02:00:23.021 3027 TRACE root six.next(it)
2014-11-12 02:00:23.021 3027 TRACE root File "/usr/lib/python2.7/site-packages/neutron/openstack/common/rpc/impl_kombu.py", line 664, in iterconsume
2014-11-12 02:00:23.021 3027 TRACE root yield self.ensure(_error_callback, _consume)
2014-11-12 02:00:23.021 3027 TRACE root File "/usr/lib/python2.7/site-packages/neutron/openstack/common/rpc/impl_kombu.py", line 579, in ensure
2014-11-12 02:00:23.021 3027 TRACE root return method(*args, **kwargs)
2014-11-12 02:00:23.021 3027 TRACE root File "/usr/lib/python2.7/site-packages/neutron/openstack/common/rpc/impl_kombu.py", line 657, in _consume
2014-11-12 02:00:23.021 3027 TRACE root queues_tail.consume(nowait=False)
2014-11-12 02:00:23.021 3027 TRACE root File "/usr/lib/python2.7/site-packages/neutron/openstack/common/rpc/impl_kombu.py", line 190, in consume
2014-11-12 02:00:23.021 3027 TRACE root self.queue.consume(*args, callback=_callback, **options)
2014-11-12 02:00:23.021 3027 TRACE root File "/usr/lib/python2.7/site-packages/kombu/entity.py", line 598, in consume
2014-11-12 02:00:23.021 3027 TRACE root nowait=nowait)
2014-11-12 02:00:23.021 3027 TRACE root File "/usr/lib/python2.7/site-packages/amqp/channel.py", line 1769, in basic_consume
2014-11-12 02:00:23.021 3027 TRACE root (60, 21), # Channel.basic_consume_ok
2014-11-12 02:00:23.021 3027 TRACE root File "/usr/lib/python2.7/site-packages/amqp/abstract_channel.py", line 71, in wait
2014-11-12 02:00:23.021 3027 TRACE root return self.dispatch_method(method_sig, args, content)
2014-11-12 02:00:23.021 3027 TRACE root File "/usr/lib/python2.7/site-packages/amqp/abstract_channel.py", line 88, in dispatch_method
2014-11-12 02:00:23.021 3027 TRACE root return amqp_method(self, args)
2014-11-12 02:00:23.021 3027 TRACE root File "/usr/lib/python2.7/site-packages/amqp/channel.py", line 224, in _close
2014-11-12 02:00:23.021 3027 TRACE root raise ChannelError(reply_code, reply_text, (class_id, method_id))
2014-11-12 02:00:23.021 3027 TRACE root ChannelError: 404: (NOT_FOUND - no queue 'q-agent-notifier-port-update_fanout_cc21f47607704321860757b7e6a1194a' in vhost '/', (60, 20), None)
2014-11-12 02:00:23.021 3027 TRACE root
2014-11-12 02:01:24.268 3027 ERROR root [-] Unexpected exception occurred 61 time(s)... retrying.
2014-11-12 02:01:24.268 3027 TRACE root Traceback (most recent call last):
2014-11-12 02:01:24.268 3027 TRACE root File "/usr/lib/python2.7/site-packages/neutron/openstack/common/excutils.py", line 92, in inner_func
2014-11-12 02:01:24.268 3027 TRACE root return infunc(*args, **kwargs)
2014-11-12 02:01:24.268 3027 TRACE root File "/usr/lib/python2.7/site-packages/neutron/openstack/common/rpc/impl_kombu.py", line 746, in _consumer_thread
-
---------------------------
[Impact]
This patch addresses an issue under a RabbitMQ HA deployment where
neutron-openvswitch-agent gets stuck on a no queue
'q-agent-notifier-port-update_fanout_xx' error when one of the RabbitMQ
cluster nodes goes down. In a deployment with more than 100 nova compute
nodes, all neutron agents go down. Restarting neutron-openvswitch-agent
resolves the problem, but restarting every agent on every compute node
is not practical, and the failure defeats the purpose of HA.
[Test Case]
Note: these steps are for trusty-icehouse, including neutron package
1:2014.1.5-0ubuntu1.
Deploy an OpenStack cloud with multiple rabbit nodes and then abruptly
kill one of the rabbit nodes (e.g. sudo service rabbitmq-server stop,
etc). Observe that the neutron agents stop consuming messages and
keep throwing the no queue 'q-agent-notifier-port-update_fanout..' exception.
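To help confirm the symptom on a compute node, the following is a minimal sketch that scans agent log lines for the retry counter and the missing-queue error. The regular expressions are derived only from the log excerpt in this description; the function name `summarize` and the `sample` list are illustrative assumptions, not part of neutron.

```python
import re

# Patterns taken from the log excerpt in this bug description.
RETRY_RE = re.compile(r"Unexpected exception occurred (\d+) time\(s\)")
NO_QUEUE_RE = re.compile(r"ChannelError: 404: \(NOT_FOUND - no queue '([^']+)'")


def summarize(lines):
    """Return (highest retry count seen, set of missing queue names)."""
    max_retries = 0
    missing = set()
    for line in lines:
        match = RETRY_RE.search(line)
        if match:
            max_retries = max(max_retries, int(match.group(1)))
        match = NO_QUEUE_RE.search(line)
        if match:
            missing.add(match.group(1))
    return max_retries, missing


# Three lines copied from the log in this description.
sample = [
    "2014-11-12 02:00:23.021 3027 ERROR root [-] Unexpected exception "
    "occurred 1 time(s)... retrying.",
    "2014-11-12 02:00:23.021 3027 TRACE root ChannelError: 404: (NOT_FOUND - "
    "no queue 'q-agent-notifier-port-update_fanout_cc21f47607704321860757b7e6a1194a' "
    "in vhost '/', (60, 20), None)",
    "2014-11-12 02:01:24.268 3027 ERROR root [-] Unexpected exception "
    "occurred 61 time(s)... retrying.",
]
retries, queues = summarize(sample)
```

A steadily growing retry count together with a non-empty set of missing queues would be consistent with the stuck state described here.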
[Regression Potential]
- None.
+ The regression potential is low. The fix is fairly minimal and is
+ limited to the code path where a 404 error occurs.
[Other Info]
The Oslo library has this fix, but because Neutron in Icehouse uses
kombu directly rather than the oslo library, it still suffers from this
issue.
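As a rough illustration of what handling only the 404 code path could look like, the sketch below re-declares a missing queue once and retries the consume call. It is not the actual patch: `ChannelError`, `FakeBroker`, and `consume_with_redeclare` are hypothetical stand-ins written for this example, and the `code` attribute on the exception is an assumption rather than a documented kombu or amqp interface.

```python
class ChannelError(Exception):
    """Hypothetical stand-in for an AMQP client's channel error."""

    def __init__(self, code, text):
        super().__init__("%d: %s" % (code, text))
        self.code = code  # assumed attribute carrying the AMQP reply code


class FakeBroker:
    """In-memory stand-in for a RabbitMQ node; tracks declared queues."""

    def __init__(self):
        self.queues = set()

    def declare(self, name):
        self.queues.add(name)

    def consume(self, name):
        if name not in self.queues:
            raise ChannelError(404, "NOT_FOUND - no queue '%s'" % name)
        return "consuming %s" % name


def consume_with_redeclare(broker, name):
    """Consume from a queue, re-declaring it once if the broker reports 404."""
    try:
        return broker.consume(name)
    except ChannelError as exc:
        if exc.code != 404:
            raise  # any other channel error is left to the caller
        broker.declare(name)
        return broker.consume(name)


# The queue does not exist, as after losing the node that hosted it.
broker = FakeBroker()
queue_name = "q-agent-notifier-port-update_fanout_example"
result = consume_with_redeclare(broker, queue_name)
```

Keeping the retry inside the 404 branch leaves every other error on its existing path, which matches the low regression potential stated above.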
--
You received this bug notification because you are a member of Ubuntu
Server Team, which is subscribed to neutron in Ubuntu.
https://bugs.launchpad.net/bugs/1393391
Title:
neutron-openvswitch-agent stuck on no queue 'q-agent-notifier-port-
update_fanout..