[Bug 1914437] Re: [SRU] MessageTimeout and DuplicateMessage errors after update

Robie Basak 1914437 at bugs.launchpad.net
Wed Feb 3 17:54:24 UTC 2021


Hello Chris, or anyone else affected,

Accepted python-oslo.messaging into bionic-proposed. The package will
build now and be available at
https://launchpad.net/ubuntu/+source/python-oslo.messaging/5.35.0-0ubuntu3
in a few hours, and then in the -proposed repository.

Please help us by testing this new package.  See
https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how
to enable and use -proposed.  Your feedback will aid us in getting this
update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug,
mentioning the version of the package you tested and what testing has
been performed on the package, and change the tag from
verification-needed-bionic to verification-done-bionic. If it does not
fix the bug for you, please add a comment stating that, and change the
tag to verification-failed-bionic. In either case, without details of
your testing we will not be able to proceed.

Further information regarding the verification process can be found at
https://wiki.ubuntu.com/QATeam/PerformingSRUVerification .  Thank you in
advance for helping!

N.B. The updated package will be released to -updates after the bug(s)
fixed by this package have been verified and the package has been in
-proposed for a minimum of 7 days.

** Changed in: python-oslo.messaging (Ubuntu Bionic)
       Status: Triaged => Fix Committed

** Tags added: verification-needed verification-needed-bionic

-- 
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to Ubuntu Cloud Archive.
https://bugs.launchpad.net/bugs/1914437

Title:
  [SRU] MessageTimeout and DuplicateMessage errors after update

Status in Ubuntu Cloud Archive:
  Invalid
Status in Ubuntu Cloud Archive queens series:
  Triaged
Status in Ubuntu Cloud Archive rocky series:
  Triaged
Status in Ubuntu Cloud Archive stein series:
  Triaged
Status in oslo.messaging:
  New
Status in python-oslo.messaging package in Ubuntu:
  Invalid
Status in python-oslo.messaging source package in Bionic:
  Fix Committed

Bug description:
  [Impact]
  A recent update to oslo.messaging to resolve #1789177 causes failures.

  (The comments below are copied from the original bug):

  After a partial upgrade (only one side, producers or consumers), there
  are a lot of MessageTimeout and DuplicateMessage errors in the logs.
  Downgrading back to 5.35.0-0ubuntu1~cloud0 fixed the problem.

  Right after restarting n-ovs-agent, I can see a lot of errors in the
  rabbitmq log [1], which are the same as the errors seen with the
  rabbitmq failover issue (the original issue of this LP).

  Then, after I upgraded oslo.messaging in the neutron-api unit and
  restarted neutron-server, the errors below were gone and I was able to
  create instances again.

  After upgrading oslo.messaging in n-ovs only, the exchanges the two
  sides use to communicate didn't match, because which exchange is used
  depends on the publisher-consumer relationship.

  So I think there are two options:
  1. Revert this patch for Q (the original failover problem will remain)
  2. Upgrade both sides during a maintenance window

  Thanks a lot

  [1]
  ################################################################################
  =ERROR REPORT==== 3-Feb-2021::03:25:26 ===
  Channel error on connection <0.2379.1> (10.0.0.32:60430 -> 10.0.0.34:5672, vhost: 'openstack', user: 'neutron'), channel 1:
  {amqp_error,not_found,
              "no exchange 'reply_7da3cecc31b34bdeb96c866dc84e3044' in vhost 'openstack'",
              'basic.publish'}

  10.0.0.32 is the neutron-api unit
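
  As a rough aid for spotting this symptom, the following is a minimal
  Python sketch that scans RabbitMQ log text for "no exchange" errors
  like the one above. The function name find_missing_exchanges and the
  regular expression are illustrative assumptions based only on the log
  excerpt in this report, not part of oslo.messaging or RabbitMQ.

```python
import re

# Assumed pattern, derived only from the log excerpt quoted in this bug.
MISSING_EXCHANGE = re.compile(
    r"no exchange '(?P<exchange>[^']+)' in vhost '(?P<vhost>[^']+)'"
)


def find_missing_exchanges(log_text):
    """Return (exchange, vhost) pairs for every 'no exchange' error found."""
    return [
        (match.group("exchange"), match.group("vhost"))
        for match in MISSING_EXCHANGE.finditer(log_text)
    ]


# Sample taken from the error report quoted in this bug.
SAMPLE = """=ERROR REPORT==== 3-Feb-2021::03:25:26 ===
Channel error on connection <0.2379.1> (10.0.0.32:60430 -> 10.0.0.34:5672, vhost: 'openstack', user: 'neutron'), channel 1:
{amqp_error,not_found,
            "no exchange 'reply_7da3cecc31b34bdeb96c866dc84e3044' in vhost 'openstack'",
            'basic.publish'}
"""

if __name__ == "__main__":
    for exchange, vhost in find_missing_exchanges(SAMPLE):
        print(f"missing exchange {exchange} in vhost {vhost}")
```

  Many hits on reply_* exchanges right after a partial upgrade would be
  consistent with the mismatch described above, though other causes are
  possible.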

  [Test Case]
  This SRU needs the following scenarios tested:

  1) partial upgrade of n-ovs at 5.35.0-0ubuntu3 [1] and n-api/n-gateway
  at 5.35.0-0ubuntu1 - instance creation will be successful

  2) partial upgrade of n-api/n-gateway at 5.35.0-0ubuntu3 [1] and n-ovs
  at 5.35.0-0ubuntu1 - instance creation will be successful

  3) partial upgrade of n-ovs at 5.35.0-0ubuntu2 [1] and n-api/n-gateway
  at 5.35.0-0ubuntu3 - instance creation will fail (see regression
  potential)

  4) partial upgrade of n-api/n-gateway at 5.35.0-0ubuntu3 [1] and n-ovs
  at 5.35.0-0ubuntu2 - instance creation will fail (see regression
  potential)

  5) test all neutron nodes at 5.35.0-0ubuntu3 - instance creation will
  be successful

  [1] and neutron* services restarted
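
  The expected outcomes of the scenarios above can be summarized in a
  small Python sketch. The helper instance_creation_expected is
  hypothetical and encodes only what this report states: mixing
  5.35.0-0ubuntu2 with another version is expected to fail. It also
  assumes that a deployment where both sides run the same version works,
  which the report implies but does not list for every version.

```python
# Hypothetical helper; it only restates the expectations listed in this bug.
INCOMPATIBLE_WHEN_MIXED = "5.35.0-0ubuntu2"


def instance_creation_expected(n_ovs_version, n_api_version):
    """Return True if instance creation is expected to succeed."""
    # Assumption: both sides on the same version always interoperate.
    if n_ovs_version == n_api_version:
        return True
    # Mixing 5.35.0-0ubuntu2 with any other version is expected to fail.
    return INCOMPATIBLE_WHEN_MIXED not in (n_ovs_version, n_api_version)


# (n-ovs version, n-api/n-gateway version, expected success), scenarios 1-5.
SCENARIOS = [
    ("5.35.0-0ubuntu3", "5.35.0-0ubuntu1", True),
    ("5.35.0-0ubuntu1", "5.35.0-0ubuntu3", True),
    ("5.35.0-0ubuntu2", "5.35.0-0ubuntu3", False),
    ("5.35.0-0ubuntu2", "5.35.0-0ubuntu3", False),
    ("5.35.0-0ubuntu3", "5.35.0-0ubuntu3", True),
]

if __name__ == "__main__":
    for number, (n_ovs, n_api, expected) in enumerate(SCENARIOS, start=1):
        predicted = instance_creation_expected(n_ovs, n_api)
        status = "matches" if predicted == expected else "differs from"
        print(f"scenario {number}: prediction {status} the test case")
```

  The table is only a reading aid for the verification; the real check
  is still creating an instance after restarting the neutron* services.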

  [Regression Potential]
  There is regression potential for clouds that have already upgraded to
  5.35.0-0ubuntu2. This needs to be tested, but if a cloud has fully
  upgraded to 5.35.0-0ubuntu2, the same disruption that this SRU is
  trying to solve may occur again while some services are running
  5.35.0-0ubuntu2 and others are running 5.35.0-0ubuntu3. Once the cloud
  is entirely at 5.35.0-0ubuntu3, messages will no longer time out.

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1914437/+subscriptions


