[Bug 1789177] Re: RabbitMQ fails to synchronize exchanges under high load

Seyeong Kim 1789177 at bugs.launchpad.net
Thu Dec 17 03:30:25 UTC 2020


** Tags added: sts

** Description changed:

+ [Impact]
+ 
+ 
+ Affected
+  Bionic
+ Not affected
+  Focal
+ 
+ [Test Case]
+ TBD
+ 
+ 
+ [Where problems could occur]
+ TBD
+ 
+ [Others]
+ 
+ 
+ // original description
+ 
  Input:
-  - OpenStack Pike cluster with ~500 nodes
-  - DVR enabled in neutron
-  - Lots of messages
+  - OpenStack Pike cluster with ~500 nodes
+  - DVR enabled in neutron
+  - Lots of messages
  
  Scenario: failover of one rabbit node in a cluster
  
  Issue: after the failed rabbit node comes back online, some RPC communications appear broken.
  Logs from rabbit:
  
  =ERROR REPORT==== 10-Aug-2018::17:24:37 ===
  Channel error on connection <0.14839.1> (10.200.0.24:55834 -> 10.200.0.31:5672, vhost: '/openstack', user: 'openstack'), channel 1:
  operation basic.publish caused a channel exception not_found: no exchange 'reply_5675d7991b4a4fb7af5d239f4decb19f' in vhost '/openstack'
  
  Investigation:
  After the rabbit node comes back online, it immediately receives many new connections and, for some reason, fails to synchronize exchanges. The cluster had ~1600 exchanges, but the exchange count on the recovered node stays low and does not increase.
  
  Workaround: let the recovered node synchronize all exchanges by blocking
  new connections with iptables rules for a short time (30 sec) after the
  failed node comes back online.
  
  Proposal: do not create new exchanges for direct messages; use the
  default exchange instead. This also fixes the issue.
  
  Is there a good reason for creating new exchanges for direct messages?

-- 
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to python-oslo.messaging in Ubuntu.
https://bugs.launchpad.net/bugs/1789177

Title:
  RabbitMQ fails to synchronize exchanges under high load

Status in oslo.messaging:
  Fix Released
Status in python-oslo.messaging package in Ubuntu:
  New

Bug description:
  [Impact]

  
  Affected
   Bionic
  Not affected
   Focal

  [Test Case]
  TBD

  
  [Where problems could occur]
  TBD

  [Others]

  
  // original description

  Input:
   - OpenStack Pike cluster with ~500 nodes
   - DVR enabled in neutron
   - Lots of messages

  Scenario: failover of one rabbit node in a cluster

  Issue: after the failed rabbit node comes back online, some RPC communications appear broken.
  Logs from rabbit:

  =ERROR REPORT==== 10-Aug-2018::17:24:37 ===
  Channel error on connection <0.14839.1> (10.200.0.24:55834 -> 10.200.0.31:5672, vhost: '/openstack', user: 'openstack'), channel 1:
  operation basic.publish caused a channel exception not_found: no exchange 'reply_5675d7991b4a4fb7af5d239f4decb19f' in vhost '/openstack'
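
  To see which exchanges are affected, the not_found errors can be pulled
  out of the rabbit log. The following is a minimal sketch, assuming the
  log lines have exactly the shape shown above; the function name
  missing_exchanges and the SAMPLE constant are illustrative only.

```python
import re

# Matches the "no exchange" channel error in the format quoted above.
NOT_FOUND_RE = re.compile(
    r"operation (?P<op>[\w.]+) caused a channel exception not_found: "
    r"no exchange '(?P<exchange>[^']+)' in vhost '(?P<vhost>[^']+)'"
)


def missing_exchanges(log_text):
    """Return unique (vhost, exchange) pairs reported as not_found, in log order."""
    seen = []
    for match in NOT_FOUND_RE.finditer(log_text):
        pair = (match.group("vhost"), match.group("exchange"))
        if pair not in seen:
            seen.append(pair)
    return seen


# The log excerpt from this report, used as sample input.
SAMPLE = (
    "=ERROR REPORT==== 10-Aug-2018::17:24:37 ===\n"
    "Channel error on connection <0.14839.1> (10.200.0.24:55834 -> "
    "10.200.0.31:5672, vhost: '/openstack', user: 'openstack'), channel 1:\n"
    "operation basic.publish caused a channel exception not_found: "
    "no exchange 'reply_5675d7991b4a4fb7af5d239f4decb19f' in vhost '/openstack'\n"
)

if __name__ == "__main__":
    for vhost, exchange in missing_exchanges(SAMPLE):
        print(vhost, exchange)
```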

  Investigation:
  After the rabbit node comes back online, it immediately receives many new connections and, for some reason, fails to synchronize exchanges. The cluster had ~1600 exchanges, but the exchange count on the recovered node stays low and does not increase.
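
  One way to confirm that a recovered node is lagging is to compare its
  exchange list with that of a healthy node. The sketch below assumes each
  listing is plain text with one exchange name per line (for example,
  captured from rabbitmqctl list_exchanges on each node; the exact output
  format varies between RabbitMQ versions, so the parsing here is an
  assumption). The function names are hypothetical.

```python
def parse_exchange_names(listing):
    """Parse a one-name-per-line exchange listing into a set of names.

    Blank lines and a leading "Listing exchanges" banner are skipped;
    both are assumptions about how the listing was captured.
    """
    names = set()
    for line in listing.splitlines():
        name = line.strip()
        if not name or name.startswith("Listing exchanges"):
            continue
        names.add(name)
    return names


def missing_on_node(reference_listing, node_listing):
    """Return the sorted exchange names present in the reference but absent on the node."""
    reference = parse_exchange_names(reference_listing)
    node = parse_exchange_names(node_listing)
    return sorted(reference - node)
```

  If the returned list stays non-empty over repeated checks, the node is
  not catching up, which matches the behaviour described above.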

  Workaround: let the recovered node synchronize all exchanges by blocking
  new connections with iptables rules for a short time (30 sec) after the
  failed node comes back online.
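
  The report does not give the actual iptables rules, so the following is
  only a sketch of what such a workaround might look like. It assumes
  client traffic arrives on TCP port 5672 (the port seen in the log
  above), that rejecting new connections (SYN packets) in the INPUT chain
  is sufficient, and that REJECT is the desired target. The helper names
  are hypothetical, and the code only builds the commands; it does not
  run them.

```python
AMQP_PORT = 5672  # client port seen in the log excerpt; an assumption for other setups


def block_rule(action, port=AMQP_PORT):
    """Build an iptables command that inserts ("-I") or deletes ("-D") a rule
    rejecting new inbound TCP connections to the given port."""
    if action not in ("-I", "-D"):
        raise ValueError("action must be '-I' or '-D'")
    return [
        "iptables", action, "INPUT",
        "-p", "tcp", "--syn",          # match only connection-opening packets
        "--dport", str(port),
        "-j", "REJECT",
    ]


def workaround_plan(delay_sec=30, port=AMQP_PORT):
    """Return the command sequence: block, wait, unblock."""
    return [
        block_rule("-I", port),
        ["sleep", str(delay_sec)],
        block_rule("-D", port),
    ]


if __name__ == "__main__":
    for command in workaround_plan():
        print(" ".join(command))
```

  Each command could be executed as root with subprocess.run(command,
  check=True). Only new connections on the client port are blocked, so
  established connections and inter-node cluster traffic on other ports
  are left alone.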

  Proposal: do not create new exchanges for direct messages; use the
  default exchange instead. This also fixes the issue.
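
  The reasoning behind the proposal can be illustrated with a toy
  in-memory model of AMQP 0-9-1 routing. This is not oslo.messaging or
  RabbitMQ code; ToyBroker and NotFound are made up for the illustration.
  It relies only on two AMQP rules: publishing to a named exchange that
  does not exist raises not_found, while the default exchange (empty
  name) always exists and routes by queue name.

```python
class NotFound(Exception):
    """Stands in for the AMQP not_found channel exception."""


class ToyBroker:
    def __init__(self):
        self.queues = {}     # queue name -> list of delivered messages
        self.exchanges = {}  # named direct exchange -> {routing key: queue name}

    def declare_queue(self, name):
        self.queues.setdefault(name, [])

    def declare_exchange(self, name):
        self.exchanges.setdefault(name, {})

    def bind(self, exchange, routing_key, queue):
        self.exchanges[exchange][routing_key] = queue

    def publish(self, exchange, routing_key, message):
        if exchange == "":
            # Default exchange: always present, routing key is the queue name.
            if routing_key in self.queues:
                self.queues[routing_key].append(message)
            return
        if exchange not in self.exchanges:
            # Same failure mode as in the log: the named exchange is missing.
            raise NotFound(f"no exchange '{exchange}'")
        queue = self.exchanges[exchange].get(routing_key)
        if queue is not None:
            self.queues[queue].append(message)


if __name__ == "__main__":
    broker = ToyBroker()
    broker.declare_queue("reply_abc")  # queue exists, per-reply exchange does not
    try:
        broker.publish("reply_abc", "reply_abc", "pong")
    except NotFound as exc:
        print("named exchange:", exc)
    broker.publish("", "reply_abc", "pong")
    print("default exchange delivered:", broker.queues["reply_abc"])
```

  In this model a reply sent through the default exchange depends only on
  the reply queue existing, so there is no per-reply exchange that a
  recovering node could be missing.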

  Is there a good reason for creating new exchanges for direct messages?

To manage notifications about this bug go to:
https://bugs.launchpad.net/oslo.messaging/+bug/1789177/+subscriptions


