[Bug 2013960] Re: Recovery operation takes high priority than client I/O with mclock scheduler

Ponnuvel Palaniyappan 2013960 at bugs.launchpad.net
Tue May 23 13:42:54 UTC 2023


** Description changed:

  Starting with Quincy, mclock_scheduler is the default OSD op queue. However, the default recovery settings are so aggressive that the impact on client I/O can be severe, depending on the amount of recovery work to be done. This is a bug; it has been fixed
  in the 'main' branch and backported to Quincy [0][1].
  
  There is no upstream Quincy release with this fix yet.
  17.2.6, which is undergoing QA at the moment, will include it.
  
  Workaround:
  
  There are a couple of ways to mitigate this in Quincy.
  
- 1. Use the 'wpq' as osd_op_queue. This has been the default in previous
- releases and works just fine. This will require restarting OSDs.
+ 1. Use 'wpq' as the osd_op_queue. This was the default in previous releases and works just fine. It requires restarting the OSDs.
+ Steps:
+ i. Change osd_op_queue to 'wpq': `sudo ceph config set osd osd_op_queue wpq`
+ ii. Rolling restart of all the OSDs (with `noout` & `norebalance` flags set)
+ iii. Check that 'wpq' is now set: `ceph tell osd.* config get osd_op_queue`
  
  2. Stick with the mclock scheduler but use a custom mclock profile. This allows users to modify the recovery parameters:
  ```
  osd_mclock_scheduler_background_recovery_res
  osd_mclock_scheduler_background_recovery_wgt
  osd_mclock_scheduler_background_recovery_lim
  ```
  This option requires 17.2.4 or later because of another
  bug [2]. So it is probably simpler and more straightforward to stick with 'wpq' until the fix for [0] is available or 17.2.6 is out.
  
  NB: This affects the Quincy release only. Older releases (Pacific, Octopus, et al.) use
  'wpq', and as such their recovery parameters can be modified as usual. This has
  changed only starting from Quincy.
  
  [0] https://tracker.ceph.com/issues/57529
  [1] https://github.com/ceph/ceph/pull/48226
  [2] https://tracker.ceph.com/issues/55153

-- 
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to ceph in Ubuntu.
https://bugs.launchpad.net/bugs/2013960

Title:
  Recovery operation takes high priority than client I/O with mclock
  scheduler

Status in ceph package in Ubuntu:
  Confirmed

Bug description:
  Starting with Quincy, mclock_scheduler is the default OSD op queue. However, the default recovery settings are so aggressive that the impact on client I/O can be severe, depending on the amount of recovery work to be done. This is a bug; it has been fixed
  in the 'main' branch and backported to Quincy [0][1].

  There is no upstream Quincy release with this fix yet.
  17.2.6, which is undergoing QA at the moment, will include it.

  Workaround:

  There are a couple of ways to mitigate this in Quincy.

  1. Use 'wpq' as the osd_op_queue. This was the default in previous releases and works just fine. It requires restarting the OSDs.
  Steps:
  i. Change osd_op_queue to 'wpq': `sudo ceph config set osd osd_op_queue wpq`
  ii. Rolling restart of all the OSDs (with `noout` & `norebalance` flags set)
  iii. Check that 'wpq' is now set: `ceph tell osd.* config get osd_op_queue`
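
  The steps above could be scripted roughly as follows. This is only a sketch: the `run` wrapper, the `switch_to_wpq` function and the `DRY_RUN` variable are hypothetical helpers, and it assumes the OSDs on the local node run as systemd units named `ceph-osd@<id>` (cephadm or charm-managed deployments restart OSDs differently). With `DRY_RUN=1`, the default here, commands are printed rather than executed.

  ```shell
  #!/bin/sh
  # Sketch of workaround 1: switch the OSD op queue to 'wpq'.
  # DRY_RUN=1 (default) only prints the commands; set DRY_RUN=0 to execute.
  DRY_RUN="${DRY_RUN:-1}"

  # Hypothetical helper: print the command in dry-run mode, else run it.
  run() {
    if [ "$DRY_RUN" = "1" ]; then
      printf '%s\n' "+ $*"
    else
      "$@"
    fi
  }

  # Arguments: IDs of the OSDs hosted on this node.
  switch_to_wpq() {
    # Keep the cluster from marking OSDs out or rebalancing during restarts.
    run ceph osd set noout
    run ceph osd set norebalance
    run ceph config set osd osd_op_queue wpq
    for id in "$@"; do
      # Assumption: package-based deployment with ceph-osd@<id> units.
      run systemctl restart "ceph-osd@${id}"
      # Before moving on, confirm the OSD is back up (e.g. `ceph osd tree`).
    done
    run ceph osd unset norebalance
    run ceph osd unset noout
    # 'osd.*' is quoted so the shell does not expand it as a glob.
    run ceph tell 'osd.*' config get osd_op_queue
  }

  switch_to_wpq "$@"
  ```

  For example, `sh switch_to_wpq.sh 0 1 2` (script name hypothetical) prints the commands for OSDs 0, 1 and 2; repeat on each node, one node at a time.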

  2. Stick with the mclock scheduler but use a custom mclock profile. This allows users to modify the recovery parameters:
  ```
  osd_mclock_scheduler_background_recovery_res
  osd_mclock_scheduler_background_recovery_wgt
  osd_mclock_scheduler_background_recovery_lim
  ```
  This option requires 17.2.4 or later because of another
  bug [2]. So it is probably simpler and more straightforward to stick with 'wpq' until the fix for [0] is available or 17.2.6 is out.
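
  For those who do want option 2, a minimal sketch might look like the following. The `run` and `set_recovery_limits` helpers and the `DRY_RUN` variable are hypothetical, and the three values are deliberately left to the caller: their units and valid ranges should be checked against the mclock documentation for the exact release in use. It assumes the custom profile is selected via `osd_mclock_profile`.

  ```shell
  #!/bin/sh
  # Sketch of workaround 2: custom mclock profile with tuned recovery settings.
  # DRY_RUN=1 (default) only prints the commands; set DRY_RUN=0 to execute.
  DRY_RUN="${DRY_RUN:-1}"

  # Hypothetical helper: print the command in dry-run mode, else run it.
  run() {
    if [ "$DRY_RUN" = "1" ]; then
      printf '%s\n' "+ $*"
    else
      "$@"
    fi
  }

  # Arguments: reservation, weight, limit for background recovery.
  # No defaults are provided on purpose; pick values for your cluster.
  set_recovery_limits() {
    if [ "$#" -ne 3 ]; then
      echo "usage: set_recovery_limits RES WGT LIM" >&2
      return 2
    fi
    # Check that all daemons run 17.2.4 or later (see bug [2]).
    run ceph versions
    run ceph config set osd osd_mclock_profile custom
    run ceph config set osd osd_mclock_scheduler_background_recovery_res "$1"
    run ceph config set osd osd_mclock_scheduler_background_recovery_wgt "$2"
    run ceph config set osd osd_mclock_scheduler_background_recovery_lim "$3"
  }

  if [ "$#" -gt 0 ]; then
    set_recovery_limits "$@"
  fi
  ```

  Running it in dry-run mode first shows exactly which settings would be changed before anything touches the cluster.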

  NB: This affects the Quincy release only. Older releases (Pacific, Octopus, et al.) use
  'wpq', and as such their recovery parameters can be modified as usual. This has
  changed only starting from Quincy.

  [0] https://tracker.ceph.com/issues/57529
  [1] https://github.com/ceph/ceph/pull/48226
  [2] https://tracker.ceph.com/issues/55153

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/2013960/+subscriptions



