[Bug 1789177] Re: RabbitMQ fails to synchronize exchanges under high load
Ubuntu Foundations Team Bug Bot
1789177 at bugs.launchpad.net
Thu Dec 17 04:26:00 UTC 2020
The attachment "lp1789177_bionic.debdiff" seems to be a debdiff. The
ubuntu-sponsors team has been subscribed to the bug report so that they
can review and hopefully sponsor the debdiff. If the attachment isn't a
patch, please remove the "patch" flag from the attachment, remove the
"patch" tag, and if you are a member of ~ubuntu-sponsors, unsubscribe
the team.
[This is an automated message performed by a Launchpad user owned by
~brian-murray, for any issue please contact him.]
** Tags added: patch
--
You received this bug notification because you are a member of Ubuntu
Sponsors Team, which is subscribed to the bug report.
https://bugs.launchpad.net/bugs/1789177
Title:
RabbitMQ fails to synchronize exchanges under high load
Status in oslo.messaging:
Fix Released
Status in python-oslo.messaging package in Ubuntu:
New
Bug description:
[Impact]
Affected
Bionic
Not affected
Focal
[Test Case]
TBD
[Where problems could occur]
TBD
[Others]
// original description
Input:
- OpenStack Pike cluster with ~500 nodes
- DVR enabled in neutron
- Lots of messages
Scenario: failover of one rabbit node in a cluster
Issue: after the failed rabbit node comes back online, some RPC communications appear broken
Logs from rabbit:
=ERROR REPORT==== 10-Aug-2018::17:24:37 ===
Channel error on connection <0.14839.1> (10.200.0.24:55834 -> 10.200.0.31:5672, vhost: '/openstack', user: 'openstack'), channel 1:
operation basic.publish caused a channel exception not_found: no exchange 'reply_5675d7991b4a4fb7af5d239f4decb19f' in vhost '/openstack'
Investigation:
After the rabbit node comes back online it immediately receives many new connections and, for some reason, fails to synchronize exchanges. The cluster had ~1600 exchanges, but on that node the exchange count stays low and does not increase.
Workaround: let the recovered node synchronize all exchanges by
blocking new connections with iptables rules for some time (30 sec)
after the failed node comes back online
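
As a rough illustration of that workaround, the sketch below only builds and prints the commands; it does not run them. Port 5672 and the 30-second delay come from the log and the description above. The INPUT chain, the `--syn` match and the REJECT target are assumptions about how such a rule might be written, since the report only says "iptables rules".

```python
import shlex

AMQP_PORT = 5672    # client port seen in the RabbitMQ log above
HOLD_SECONDS = 30   # delay suggested in the workaround


def block_rule(action, port=AMQP_PORT):
    """Build an iptables command that inserts (-I) or deletes (-D) a rule
    rejecting new inbound TCP connections (SYN packets) to the given port.
    Chain, match and target are illustrative assumptions."""
    if action not in ("-I", "-D"):
        raise ValueError("action must be '-I' or '-D'")
    return ["iptables", action, "INPUT", "-p", "tcp",
            "--dport", str(port), "--syn", "-j", "REJECT"]


def workaround_script(port=AMQP_PORT, hold=HOLD_SECONDS):
    """Return shell lines: block new clients, wait, then unblock."""
    lines = [
        shlex.join(block_rule("-I", port)),
        f"sleep {hold}",
        shlex.join(block_rule("-D", port)),
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(workaround_script())
```

Matching only SYN packets on the client port leaves already-established connections alone, and the rule would have to be in place before the recovered node starts accepting clients.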
Proposal: do not create new exchanges (use default) for all direct
messages - this also fixes the issue.
Is there a good reason for creating new exchanges for direct messages?
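
To make the proposal concrete, here is a minimal sketch of the two addressing schemes. In AMQP 0-9-1 the default exchange has the empty name and routes a message to the queue whose name equals the routing key, so no per-reply exchange has to exist on the recovered node. The function name is hypothetical, and the "dedicated exchange named like the queue" branch is an assumption inferred from the `reply_...` exchange name in the log, not a copy of the oslo.messaging code.

```python
DEFAULT_EXCHANGE = ""  # AMQP 0-9-1 default exchange has the empty name


def direct_publish_target(queue_name, use_default_exchange):
    """Return (exchange, routing_key) for a direct message to queue_name.

    Hypothetical helper for illustration only.
    """
    if use_default_exchange:
        # Proposed scheme: rely on the implicit binding of every queue
        # to the default exchange under its own name.
        return (DEFAULT_EXCHANGE, queue_name)
    # Assumed current scheme: one dedicated exchange per reply queue,
    # which must be present on the node or basic.publish fails with not_found.
    return (queue_name, queue_name)


if __name__ == "__main__":
    print(direct_publish_target("reply_example", use_default_exchange=True))
    print(direct_publish_target("reply_example", use_default_exchange=False))
```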
To manage notifications about this bug go to:
https://bugs.launchpad.net/oslo.messaging/+bug/1789177/+subscriptions