[Bug 1789177] Re: RabbitMQ fails to synchronize exchanges under high load
Edward Hope-Morley
1789177 at bugs.launchpad.net
Tue Feb 2 12:55:12 UTC 2021
e.g. 2021-02-02 12:07:53.930 27349 ERROR oslo.messaging._drivers.impl_rabbit [-] Failed to process message ... skipping it.: DuplicateMessageError: Found duplicate message(fc9335298407444ab0e7000d3fe2f4b7). Skipping it.
2021-02-02 12:07:53.930 27349 ERROR oslo.messaging._drivers.impl_rabbit Traceback (most recent call last):
2021-02-02 12:07:53.930 27349 ERROR oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/impl_rabbit.py", line 368, in _callback
2021-02-02 12:07:53.930 27349 ERROR oslo.messaging._drivers.impl_rabbit self.callback(RabbitMessage(message))
2021-02-02 12:07:53.930 27349 ERROR oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 244, in __call__
2021-02-02 12:07:53.930 27349 ERROR oslo.messaging._drivers.impl_rabbit unique_id = self.msg_id_cache.check_duplicate_message(message)
2021-02-02 12:07:53.930 27349 ERROR oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqp.py", line 121, in check_duplicate_message
2021-02-02 12:07:53.930 27349 ERROR oslo.messaging._drivers.impl_rabbit raise rpc_common.DuplicateMessageError(msg_id=msg_id)
2021-02-02 12:07:53.930 27349 ERROR oslo.messaging._drivers.impl_rabbit DuplicateMessageError: Found duplicate message(fc9335298407444ab0e7000d3fe2f4b7). Skipping it.
and
2021-02-02 12:05:54.869 27349 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-e53cf710-52f8-4790-bb7a-9968807f842f - - - - -] Error while processing VIF ports: MessagingTimeout: Timed out waiting for a reply to message ID 06bc2386bc6b42f2ad48ebc6157b3ec6
2021-02-02 12:05:54.869 27349 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent Traceback (most recent call last):
2021-02-02 12:05:54.869 27349 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent File "/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py", line 2163, in rpc_loop
2021-02-02 12:05:54.869 27349 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent port_info, provisioning_needed)
2021-02-02 12:05:54.869 27349 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent File "/usr/lib/python2.7/dist-packages/osprofiler/profiler.py", line 158, in wrapper
2021-02-02 12:05:54.869 27349 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent result = f(*args, **kwargs)
2021-02-02 12:05:54.869 27349 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent File "/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py", line 1740, in process_network_ports
2021-02-02 12:05:54.869 27349 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent failed_devices['added'] |= self._bind_devices(need_binding_devices)
2021-02-02 12:05:54.869 27349 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent File "/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py", line 892, in _bind_devices
2021-02-02 12:05:54.869 27349 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent self.conf.host, agent_restarted=agent_restarted)
2021-02-02 12:05:54.869 27349 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent File "/usr/lib/python2.7/dist-packages/neutron/agent/rpc.py", line 165, in update_device_list
2021-02-02 12:05:54.869 27349 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent agent_restarted=agent_restarted)
2021-02-02 12:05:54.869 27349 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent File "/usr/lib/python2.7/dist-packages/neutron/common/rpc.py", line 185, in call
2021-02-02 12:05:54.869 27349 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent time.sleep(wait)
2021-02-02 12:05:54.869 27349 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
2021-02-02 12:05:54.869 27349 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent self.force_reraise()
2021-02-02 12:05:54.869 27349 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
2021-02-02 12:05:54.869 27349 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent six.reraise(self.type_, self.value, self.tb)
2021-02-02 12:05:54.869 27349 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent File "/usr/lib/python2.7/dist-packages/neutron/common/rpc.py", line 162, in call
2021-02-02 12:05:54.869 27349 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent return self._original_context.call(ctxt, method, **kwargs)
2021-02-02 12:05:54.869 27349 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent File "/usr/lib/python2.7/dist-packages/oslo_messaging/rpc/client.py", line 174, in call
2021-02-02 12:05:54.869 27349 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent retry=self.retry)
2021-02-02 12:05:54.869 27349 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent File "/usr/lib/python2.7/dist-packages/oslo_messaging/transport.py", line 131, in _send
2021-02-02 12:05:54.869 27349 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent timeout=timeout, retry=retry)
2021-02-02 12:05:54.869 27349 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent File "/usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 559, in send
2021-02-02 12:05:54.869 27349 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent retry=retry)
2021-02-02 12:05:54.869 27349 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent File "/usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 548, in _send
2021-02-02 12:05:54.869 27349 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent result = self._waiter.wait(msg_id, timeout)
2021-02-02 12:05:54.869 27349 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent File "/usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 440, in wait
2021-02-02 12:05:54.869 27349 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent message = self.waiters.get(msg_id, timeout=timeout)
2021-02-02 12:05:54.869 27349 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent File "/usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 328, in get
2021-02-02 12:05:54.869 27349 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent 'to message ID %s' % msg_id)
2021-02-02 12:05:54.869 27349 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent MessagingTimeout: Timed out waiting for a reply to message ID 06bc2386bc6b42f2ad48ebc6157b3ec6
--
You received this bug notification because you are a member of Ubuntu
Sponsors Team, which is subscribed to the bug report.
https://bugs.launchpad.net/bugs/1789177
Title:
  RabbitMQ fails to synchronize exchanges under high load
Status in Ubuntu Cloud Archive:
  New
Status in Ubuntu Cloud Archive mitaka series:
  New
Status in Ubuntu Cloud Archive queens series:
  Fix Committed
Status in Ubuntu Cloud Archive rocky series:
  New
Status in Ubuntu Cloud Archive stein series:
  Fix Released
Status in Ubuntu Cloud Archive train series:
  Fix Released
Status in oslo.messaging:
  Fix Released
Status in python-oslo.messaging package in Ubuntu:
  Fix Released
Status in python-oslo.messaging source package in Xenial:
  In Progress
Status in python-oslo.messaging source package in Bionic:
  Fix Released
Bug description:
[Impact]
If there are many exchanges and queues, then after a failover
rabbitmq-server reports errors saying that exchanges cannot be found.
Affected: Bionic (Queens)
Not affected: Focal
[Test Case]
1. deploy simple rabbitmq cluster
- https://pastebin.ubuntu.com/p/MR76VbMwY5/
2. juju ssh neutron-gateway/0
- for i in {1..1000}; do systemctl restart neutron-metering-agent; sleep 2; done
3. for a more reliable reproduction, add more exchanges, queues and bindings
- rabbitmq-plugins enable rabbitmq_management
- rabbitmqctl add_user test password
- rabbitmqctl set_user_tags test administrator
- rabbitmqctl set_permissions -p openstack test ".*" ".*" ".*"
- https://pastebin.ubuntu.com/p/brw7rSXD7q/ (save this as create.sh)
- for i in {1..2000}; do ./create.sh test_$i; done
4. restart the rabbitmq-server service, or shut the machine down and power it on again, several times.
5. the "exchange not found" error should now appear in the logs
[Where problems could occur]
1. every service which uses oslo.messaging needs to be restarted.
2. message transfer could be affected.
[Others]
// original description
Input:
- OpenStack Pike cluster with ~500 nodes
- DVR enabled in neutron
- Lots of messages
Scenario: failover of one rabbit node in a cluster
Issue: after the failed rabbit node comes back online, some RPC communication appears broken
Logs from rabbit:
=ERROR REPORT==== 10-Aug-2018::17:24:37 ===
Channel error on connection <0.14839.1> (10.200.0.24:55834 -> 10.200.0.31:5672, vhost: '/openstack', user: 'openstack'), channel 1:
operation basic.publish caused a channel exception not_found: no exchange 'reply_5675d7991b4a4fb7af5d239f4decb19f' in vhost '/openstack'
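To see how widespread the problem is on a node, errors like the one above can be tallied per exchange. The following is only a minimal sketch: the regular expression is derived from the single log excerpt quoted here, and the function name `missing_exchanges` is made up for illustration. Other RabbitMQ versions may word the message differently.

```python
import re
from collections import Counter

# Matches the "no exchange '<name>' in vhost '<vhost>'" part of a channel
# error. The pattern is an assumption based on the one log line quoted above.
NOT_FOUND_RE = re.compile(
    r"not_found: no exchange '(?P<exchange>[^']+)' in vhost '(?P<vhost>[^']+)'"
)


def missing_exchanges(log_lines):
    """Count not_found errors per (vhost, exchange) pair."""
    counts = Counter()
    for line in log_lines:
        match = NOT_FOUND_RE.search(line)
        if match:
            counts[(match.group("vhost"), match.group("exchange"))] += 1
    return counts


# The log excerpt from this report, used as sample input.
sample = [
    "=ERROR REPORT==== 10-Aug-2018::17:24:37 ===",
    "Channel error on connection <0.14839.1> (10.200.0.24:55834 -> "
    "10.200.0.31:5672, vhost: '/openstack', user: 'openstack'), channel 1:",
    "operation basic.publish caused a channel exception not_found: "
    "no exchange 'reply_5675d7991b4a4fb7af5d239f4decb19f' in vhost '/openstack'",
]

for (vhost, exchange), count in missing_exchanges(sample).items():
    print(vhost, exchange, count)
```

In practice the function would be fed the lines of the node's RabbitMQ log file instead of `sample`.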
Investigation:
After the rabbit node comes back online it immediately receives many new connections and, for some reason, fails to synchronize exchanges: the cluster had ~1600 exchanges, but the number on that node stays low and does not increase.
Workaround: let the recovered node synchronize all exchanges by
blocking new connections with iptables rules for some time (30 sec)
after the failed node comes back online
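The report does not give the exact iptables rules, so the following is only a sketch of one way the workaround could look. It assumes clients connect on port 5672 (the port seen in the rabbit log above) and that rejecting NEW connections in the INPUT chain is acceptable. The helper names `rule` and `hold_off_clients` are made up for illustration. By default it only returns the commands it would run; actually running them needs root.

```python
import subprocess
import time

AMQP_PORT = 5672  # assumption: client port taken from the rabbit log above


def rule(action, port=AMQP_PORT):
    """Build an iptables command that inserts (-I) or deletes (-D) a rule
    rejecting NEW inbound TCP connections to the given port."""
    if action not in ("-I", "-D"):
        raise ValueError("action must be -I or -D")
    return ["iptables", action, "INPUT", "-p", "tcp", "--dport", str(port),
            "-m", "conntrack", "--ctstate", "NEW", "-j", "REJECT"]


def hold_off_clients(seconds=30, dry_run=True,
                     run=subprocess.check_call, sleep=time.sleep):
    """Block new client connections for `seconds`, then unblock.

    With dry_run=True nothing is executed; the commands are only returned.
    """
    commands = [rule("-I"), rule("-D")]
    if dry_run:
        return commands
    run(commands[0])
    try:
        sleep(seconds)
    finally:
        # Always remove the rule again, even if the wait is interrupted.
        run(commands[1])
    return commands


for command in hold_off_clients():
    print(" ".join(command))
```

The rule only matches new connections, so established connections are left alone. The sketch would be run on the recovered node right before rabbitmq-server is started again, with `dry_run=False`.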
Proposal: do not create new exchanges (use default) for all direct
messages - this also fixes the issue.
Is there a good reason for creating new exchanges for direct messages?
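For context on the proposal: in AMQP 0-9-1 every queue is implicitly bound to the default (nameless) exchange with its own name as the routing key, while publishing to a named exchange that does not exist raises a not_found channel exception. The toy model below only illustrates that difference. It is not RabbitMQ or oslo.messaging code; `ToyBroker` and the queue name `reply_abc` are hypothetical, and it assumes the reply queue itself exists on the node.

```python
class NotFound(Exception):
    """Stands in for the AMQP not_found channel exception."""


class ToyBroker:
    """In-memory stand-in for a broker, for illustration only."""

    def __init__(self):
        self.queues = {}     # queue name -> list of delivered bodies
        self.exchanges = {}  # exchange name -> {routing key: queue name}

    def declare_queue(self, name):
        self.queues.setdefault(name, [])

    def declare_exchange(self, name):
        self.exchanges.setdefault(name, {})

    def bind(self, exchange, routing_key, queue):
        self.exchanges[exchange][routing_key] = queue

    def publish(self, exchange, routing_key, body):
        if exchange == "":
            # Default exchange: the routing key is the queue name.
            queue = routing_key if routing_key in self.queues else None
        else:
            if exchange not in self.exchanges:
                raise NotFound("no exchange %r" % exchange)
            queue = self.exchanges[exchange].get(routing_key)
        if queue is None:
            return False  # unroutable message
        self.queues[queue].append(body)
        return True


# A node that knows the reply queue but has not (yet) got its exchange.
broker = ToyBroker()
broker.declare_queue("reply_abc")

try:
    broker.publish("reply_abc", "reply_abc", "result")
except NotFound as exc:
    print("named exchange:", exc)

# The same reply sent through the default exchange is delivered.
print("default exchange:", broker.publish("", "reply_abc", "result"))
```

In this model the per-reply exchange is an extra object that must exist before a reply can be published, whereas the default exchange needs only the queue.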
To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1789177/+subscriptions