[Bug 1883038] Please test proposed package

Tue Jan 18 15:50:45 UTC 2022

Hello norman, or anyone else affected,

Accepted python-oslo.messaging into train-proposed. The package will
build now and be available in the Ubuntu Cloud Archive in a few hours,
and then in the -proposed repository.

Please help us by testing this new package. To enable the -proposed
repository:

  sudo add-apt-repository cloud-archive:train-proposed
  sudo apt-get update

Your feedback will help us get this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug,
mentioning the version of the package you tested, and change the tag
from verification-train-needed to verification-train-done. If it does
not fix the bug for you, please add a comment stating that, and change
the tag to verification-train-failed. In either case, details of your
testing will help us make a better decision.

Further information regarding the verification process can be found at
https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in
advance!

** Changed in: cloud-archive/train
       Status: New => Fix Committed

** Tags added: verification-train-needed

-- 
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to Ubuntu Cloud Archive.
https://bugs.launchpad.net/bugs/1883038

Title:
  Excessive number of ConnectionForced: Too many heartbeats missed in
  logs

Status in Ubuntu Cloud Archive:
  Invalid
Status in Ubuntu Cloud Archive stein series:
  Fix Committed
Status in Ubuntu Cloud Archive train series:
  Fix Committed
Status in oslo.messaging:
  Fix Released

Bug description:
  We are using OpenStack Rocky with RabbitMQ 3.7.4 in production.

  Occasionally I see many lines like the following in the log:

  2020-06-11 02:03:06.753 3877409 WARNING oslo.messaging._drivers.impl_rabbit [-] Unexpected error during heartbeart thread processing, retrying...: ConnectionForced: Too many heartbeats missed
  2020-06-11 02:03:21.754 3877409 WARNING oslo.messaging._drivers.impl_rabbit [-] Unexpected error during heartbeart thread processing, retrying...: ConnectionForced: Too many heartbeats missed
  2020-06-11 02:03:36.755 3877409 WARNING oslo.messaging._drivers.impl_rabbit [-] Unexpected error during heartbeart thread processing, retrying...: ConnectionForced: Too many heartbeats missed
  2020-06-11 02:03:51.756 3877409 WARNING oslo.messaging._drivers.impl_rabbit [-] Unexpected error during heartbeart thread processing, retrying...: ConnectionForced: Too many heartbeats missed
  2020-06-11 02:04:06.757 3877409 WARNING oslo.messaging._drivers.impl_rabbit [-] Unexpected error during heartbeart thread processing, retrying...: ConnectionForced: Too many heartbeats missed
  2020-06-11 02:04:21.757 3877409 WARNING oslo.messaging._drivers.impl_rabbit [-] Unexpected error during heartbeart thread processing, retrying...: ConnectionForced: Too many heartbeats missed
  2020-06-11 02:04:36.758 3877409 WARNING oslo.messaging._drivers.impl_rabbit [-] Unexpected error during heartbeart thread processing, retrying...: ConnectionForced: Too many heartbeats missed
  2020-06-11 02:04:51.759 3877409 WARNING oslo.messaging._drivers.impl_rabbit [-] Unexpected error during heartbeart thread processing, retrying...: ConnectionForced: Too many heartbeats missed

  The heartbeat interval is 60s and the rate is 2. Although the log
  complains about missed heartbeats, the rabbitmq server seems to be
  running fine, and messages are received and processed successfully.
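
The 15-second spacing of the warnings above is consistent with these
settings, assuming the heartbeat thread wakes up every
heartbeat_timeout_threshold / heartbeat_rate / 2 seconds. That formula is
an assumption about the driver's internals; the bug description does not
state it. A minimal Python sketch of the arithmetic, checked against the
first four timestamps from the log (the function names are illustrative):

```python
from datetime import datetime

# Values quoted in the bug description.
HEARTBEAT_TIMEOUT_THRESHOLD = 60  # seconds
HEARTBEAT_RATE = 2


def heartbeat_check_interval(threshold, rate):
    """Assumed wake-up interval of the heartbeat thread, in seconds."""
    return threshold / rate / 2.0


def warning_gaps(timestamps):
    """Return the gaps (in seconds) between consecutive log timestamps."""
    parsed = [datetime.strptime(t, "%Y-%m-%d %H:%M:%S.%f") for t in timestamps]
    return [(b - a).total_seconds() for a, b in zip(parsed, parsed[1:])]


# First four timestamps copied from the log excerpt above.
LOG_TIMESTAMPS = [
    "2020-06-11 02:03:06.753",
    "2020-06-11 02:03:21.754",
    "2020-06-11 02:03:36.755",
    "2020-06-11 02:03:51.756",
]

expected = heartbeat_check_interval(HEARTBEAT_TIMEOUT_THRESHOLD, HEARTBEAT_RATE)
gaps = warning_gaps(LOG_TIMESTAMPS)
print("expected interval:", expected)
print("observed gaps:", gaps)
```

If the assumption holds, one warning per wake-up means the thread keeps
retrying on the same dead connection instead of replacing it.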

  ***************************************************

  SRU Details
  -----------

  [Impact]
  AMQP messages are sometimes dropped, resulting in resource creation errors (this happened twice in a week on one environment).
  Catching the ConnectionForced AMQP exception and re-establishing the connection immediately remediates the issue.
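
The remediation described above can be pictured as follows. This is a
minimal sketch under stated assumptions, not the actual oslo.messaging
patch: ConnectionForced here is a local stand-in for
amqp.exceptions.ConnectionForced, and FlakyConnection, heartbeat_check,
ensure_connection and heartbeat_once are hypothetical names invented for
illustration.

```python
import logging

LOG = logging.getLogger("heartbeat_sketch")


class ConnectionForced(Exception):
    """Local stand-in for amqp.exceptions.ConnectionForced."""


class FlakyConnection:
    """Hypothetical connection whose heartbeat check fails until reconnected."""

    def __init__(self):
        self.reconnects = 0
        self._healthy = False

    def heartbeat_check(self):
        if not self._healthy:
            raise ConnectionForced("Too many heartbeats missed")

    def ensure_connection(self):
        # Re-establish the connection right away.
        self.reconnects += 1
        self._healthy = True


def heartbeat_once(conn):
    """Run one heartbeat check; reconnect immediately on ConnectionForced.

    Returns True if the connection was healthy, False if it had to be
    re-established.
    """
    try:
        conn.heartbeat_check()
        return True
    except ConnectionForced as exc:
        # Treated as recoverable: log at INFO and reconnect now, rather
        # than warning and retrying on the same dead connection.
        LOG.info("A recoverable connection/channel error occurred, "
                 "trying to reconnect: %s", exc)
        conn.ensure_connection()
        return False


conn = FlakyConnection()
results = [heartbeat_once(conn) for _ in range(3)]
```

With the stand-in connection, the first check fails and triggers a single
reconnect; the following checks succeed without further reconnects.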

  [Test Case]
  Reproducing the issue is tricky. The following steps might help reproduce it.

  1. Deploy OpenStack
      (If the stsstack-bundles project is used, run ./generate-bundle.sh -s bionic -r stein -n ddmi:stsstack --run)
  2. Change heartbeat_timeout_threshold to 20s in nova.conf and restart nova-api.
  On nova-cloud-controller:

  [oslo_messaging_rabbit]
  heartbeat_timeout_threshold = 20

  systemctl restart apache2.service

  3. Create and delete instances continuously

  ./tools/instance_launch.sh 10 cirros  # command on stsstack-bundles
  openstack server list -c ID -f value | xargs openstack server delete

  4. On the rabbitmq server, drop packets from nova-api to rabbitmq, then allow them again, at random intervals
  sudo iptables -A INPUT -p tcp --dport 5672 -s 10.5.1.55 -j DROP
  sudo iptables -D INPUT -p tcp --dport 5672 -s 10.5.1.55 -j DROP

  5. Repeat steps 3 and 4 until you see the following message in the nova-api log
  WARNING oslo.messaging._drivers.impl_rabbit [-] Unexpected error during heartbeart thread processing, retrying...: amqp.exceptions.ConnectionForced: Too many heartbeats missed

  6. Install the fixed python-oslo.messaging package on nova-cloud-controller
     and restart the apache service.

  7. Repeat steps 3 and 4 and check the nova-api log for the following INFO message.
  INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: Too many heartbeats missed
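
Steps 5 and 7 both come down to looking for a specific message in the
nova-api log. The following is a minimal Python sketch that counts lines
showing the old behaviour (WARNING) and the fixed behaviour (INFO). The
function name classify_log is an illustrative assumption; the message
fragments and sample lines are copied from the steps above. The log path
is deployment-specific and not given in this bug, so the sketch accepts
any iterable of lines.

```python
WARNING_MARKER = "Unexpected error during heartbeart thread processing"
INFO_MARKER = ("A recoverable connection/channel error occurred, "
               "trying to reconnect")
REASON = "Too many heartbeats missed"


def classify_log(lines):
    """Count pre-fix WARNING and post-fix INFO heartbeat messages."""
    counts = {"warning": 0, "info": 0}
    for line in lines:
        if REASON not in line:
            continue
        if "WARNING" in line and WARNING_MARKER in line:
            counts["warning"] += 1
        elif "INFO" in line and INFO_MARKER in line:
            counts["info"] += 1
    return counts


# Sample lines copied from steps 5 and 7, plus one unrelated line.
sample = [
    "WARNING oslo.messaging._drivers.impl_rabbit [-] Unexpected error "
    "during heartbeart thread processing, retrying...: "
    "amqp.exceptions.ConnectionForced: Too many heartbeats missed",
    "INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable "
    "connection/channel error occurred, trying to reconnect: "
    "Too many heartbeats missed",
    "INFO some.other.module [-] unrelated message",
]

counts = classify_log(sample)
print(counts)
```

To scan a real log, pass an open file object to classify_log. Before the
fix, only the warning count is expected to grow; after the fix, the info
count should grow instead.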

  Because the above test case does not reproduce reliably, as an
  additional measure, continuous integration tests for
  nova-cloud-controller will be run against the packages that are in
  -proposed.

  [Regression Potential]
  I do not foresee any regression potential, as the patch only handles one additional exception and reconnects to the AMQP server immediately.

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1883038/+subscriptions