[Bug 2019293] Re: mgr: relax "pending_service_map.epoch > service_map.epoch" assert

dongdong tao 2019293 at bugs.launchpad.net
Mon May 15 02:07:32 UTC 2023


** Description changed:

- When we are activating we may receive several service map updates
- initiated by the previous active mgr. Treat them all as initial map.
- An unexpected assert is hit when more than one service map is received by the new active mgr.
+ [Impact]
  
- Fixes: https://tracker.ceph.com/issues/51835
+ This issue has been observed on the Ubuntu Octopus release.
+ An assert is triggered during the mgr fail-over process if the new active mgr unexpectedly receives two consecutive service map updates.
+ The upstream fix relaxes the assert condition so that the new active mgr can receive multiple service map updates in a fail-over scenario.
  
- upstream PR: https://github.com/ceph/ceph/pull/45984
+ [Test Case]
+ 
+ 1. Deploy a 15.2.16 ceph cluster
+ 
+ 2. Upgrade it to 15.2.17 and inject multiple service maps into the monitor
+ 
+ 3. stop the active mgr
+ 
+ 4. Observe that the new active mgr hits the assert
+ 
+ 
+ [Potential Regression]
+ The new active mgr may have to process multiple service maps, which might slow the fail-over process down slightly, but that is still much better than a crash.
+ 
+ 
+ [Other info]
+ 
+ Upstream bug tracker: https://tracker.ceph.com/issues/51835
+ Upstream PR: https://github.com/ceph/ceph/pull/45984
  We need to backport it to Octopus.

** Patch added: "focal debdiff"
   https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/2019293/+attachment/5673057/+files/fix_mgr_crash.diff

** Tags added: sts

** Tags added: sts-sru-needed

-- 
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to ceph in Ubuntu.
https://bugs.launchpad.net/bugs/2019293

Title:
  mgr: relax "pending_service_map.epoch > service_map.epoch" assert

Status in ceph package in Ubuntu:
  New

Bug description:
  [Impact]

  This issue has been observed on the Ubuntu Octopus release.
  An assert is triggered during the mgr fail-over process if the new active mgr unexpectedly receives two consecutive service map updates.
  The upstream fix relaxes the assert condition so that the new active mgr can receive multiple service map updates in a fail-over scenario.
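
  As an illustration only, the epoch check before and after the relaxation can be modelled with the minimal C++ sketch below. This is not the upstream code: ServiceMap and MgrState are simplified stand-ins invented for this example, the functions return false where the real mgr would hit the assert, and the sketch assumes the fix re-seeds the pending map whenever the received map is not older than the pending one. See the upstream PR for the actual change.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for Ceph's ServiceMap; only the epoch matters here.
struct ServiceMap {
  std::uint64_t epoch;
  explicit ServiceMap(std::uint64_t e = 0) : epoch(e) {}
};

// Hypothetical model of the mgr-side state; not the real mgr class.
struct MgrState {
  ServiceMap pending_service_map;  // epoch 0 means "no map seen yet"

  // Strict check: only the very first map is treated as the initial map.
  // Returns false where the strict code would have asserted
  // "pending_service_map.epoch > service_map.epoch".
  bool got_service_map_strict(const ServiceMap& service_map) {
    if (pending_service_map.epoch == 0) {
      pending_service_map = service_map;
      pending_service_map.epoch = service_map.epoch + 1;
      return true;
    }
    return pending_service_map.epoch > service_map.epoch;
  }

  // Relaxed check (assumed behaviour): any map that is not older than the
  // pending one is treated as another initial map, so the pending map is
  // re-seeded from it instead of tripping the assert.
  bool got_service_map_relaxed(const ServiceMap& service_map) {
    if (pending_service_map.epoch <= service_map.epoch) {
      pending_service_map = service_map;
      pending_service_map.epoch = service_map.epoch + 1;
    }
    return pending_service_map.epoch > service_map.epoch;
  }
};
```

  In this model, feeding maps with epochs 10 and then 11 to the strict variant fails on the second call, while the relaxed variant accepts both and ends with a pending epoch of 12.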

  [Test Case]

  1. Deploy a 15.2.16 ceph cluster

  2. Upgrade it to 15.2.17 and inject multiple service maps into the monitor

  3. stop the active mgr

  4. Observe that the new active mgr hits the assert

  
  [Potential Regression]
  The new active mgr may have to process multiple service maps, which might slow the fail-over process down slightly, but that is still much better than a crash.


  [Other info]

  Upstream bug tracker: https://tracker.ceph.com/issues/51835
  Upstream PR: https://github.com/ceph/ceph/pull/45984
  We need to backport it to Octopus.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/2019293/+subscriptions



