[Bug 2013960] Re: Recovery operation takes high priority than client I/O with mclock scheduler

Tue Apr 4 17:11:13 UTC 2023

** Description changed:

- Starting with Quincy, the mclock_scheduler is used as the default. However,
- the default recovery settings are so high that the impact on client
- I/O can be severe, depending on the amount of recovery work
- needed.
- 
- Affects Quincy only.
+ Starting with Quincy, the mclock_scheduler is used as the default for the OSD op queue. However, the default recovery settings are so high that the impact on client I/O can be severe, depending on the amount of recovery work needed. This is a bug and has been fixed
+ in the 'main' branch and backported to Quincy [0][1].

  There's no upstream Quincy release with this fix yet.
  17.2.6, which is currently undergoing QA, will include this fix.

- Upstream bug: https://tracker.ceph.com/issues/57529
- Upstream fix: https://github.com/ceph/ceph/pull/48226
+ 
+ Workaround:
+ 
+ There are a couple of ways to mitigate this in Quincy.
+ 
+ 1. Use 'wpq' as the osd_op_queue. This was the default in previous
+ releases and works fine. Switching requires restarting the OSDs.
+ 
+ 2. Stick with the mclock scheduler but use a custom mclock profile. This allows users to modify the recovery parameters: 
+ ```
+ osd_mclock_scheduler_background_recovery_res
+ osd_mclock_scheduler_background_recovery_wgt
+ osd_mclock_scheduler_background_recovery_lim
+ ```
+ Using this option requires 17.2.4 or later because of another 
+ bug [2].
+ 
+ NB: This affects the Quincy release only. Older releases (Pacific, Octopus, etc.) use
+ 'wpq', so the recovery parameters can be modified as usual. This
+ changed only with Quincy.
+ 
+ [0] https://tracker.ceph.com/issues/57529
+ [1] https://github.com/ceph/ceph/pull/48226
+ [2] https://tracker.ceph.com/issues/55153

** Description changed:

  Starting with Quincy, the mclock_scheduler is used as the default for the OSD op queue. However, the default recovery settings are so high that the impact on client I/O can be severe, depending on the amount of recovery work needed. This is a bug and has been fixed
  in the 'main' branch and backported to Quincy [0][1].

  There's no upstream Quincy release with this fix yet.
  17.2.6, which is currently undergoing QA, will include this fix.
- 

  Workaround:

  There are a couple of ways to mitigate this in Quincy.

  1. Use 'wpq' as the osd_op_queue. This was the default in previous
  releases and works fine. Switching requires restarting the OSDs.

- 2. Stick with the mclock scheduler but use a custom mclock profile. This allows users to modify the recovery parameters: 
+ 2. Stick with the mclock scheduler but use a custom mclock profile. This allows users to modify the recovery parameters:
  ```
  osd_mclock_scheduler_background_recovery_res
  osd_mclock_scheduler_background_recovery_wgt
  osd_mclock_scheduler_background_recovery_lim
  ```
- Using this option requires 17.2.4 or later because of another 
- bug [2].
+ Using this option requires 17.2.4 or later because of another
+ bug [2]. So it's probably simpler to stick with 'wpq' until the fix for [0] is available or 17.2.6 is out.

  NB: This affects the Quincy release only. Older releases (Pacific, Octopus, etc.) use
  'wpq', so the recovery parameters can be modified as usual. This
  changed only with Quincy.

+ 
  [0] https://tracker.ceph.com/issues/57529
  [1] https://github.com/ceph/ceph/pull/48226
  [2] https://tracker.ceph.com/issues/55153

-- 
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to ceph in Ubuntu.
https://bugs.launchpad.net/bugs/2013960

Title:
  Recovery operation takes high priority than client I/O with mclock
  scheduler

Status in ceph package in Ubuntu:
  New

Bug description:
  Starting with Quincy, the mclock_scheduler is used as the default for the OSD op queue. However, the default recovery settings are so high that the impact on client I/O can be severe, depending on the amount of recovery work needed. This is a bug and has been fixed
  in the 'main' branch and backported to Quincy [0][1].

  There's no upstream Quincy release with this fix yet.
  17.2.6, which is currently undergoing QA, will include this fix.

  Workaround:

  There are a couple of ways to mitigate this in Quincy.

  1. Use 'wpq' as the osd_op_queue. This was the default in
  previous releases and works fine. Switching requires restarting
  the OSDs.

  2. Stick with the mclock scheduler but use a custom mclock profile. This allows users to modify the recovery parameters:
  ```
  osd_mclock_scheduler_background_recovery_res
  osd_mclock_scheduler_background_recovery_wgt
  osd_mclock_scheduler_background_recovery_lim
  ```
  Using this option requires 17.2.4 or later because of another
  bug [2]. So it's probably simpler to stick with 'wpq' until the fix for [0] is available or 17.2.6 is out.
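The two workarounds above map onto centralized `ceph config set` commands. The sketch below only builds the command strings for review and does not run anything. The helper names (`wpq_commands`, `custom_profile_commands`) are hypothetical. The restart step assumes package-based OSDs managed by systemd `ceph-osd@<id>` units, run on the host that owns each OSD. The recovery values are placeholders: their units and sensible ranges should be checked against the mclock documentation for the exact Quincy point release in use.

```python
# Option names as listed in this report (workaround 2).
RECOVERY_OPTIONS = (
    "osd_mclock_scheduler_background_recovery_res",
    "osd_mclock_scheduler_background_recovery_wgt",
    "osd_mclock_scheduler_background_recovery_lim",
)


def wpq_commands(osd_ids: list[int]) -> list[str]:
    """Workaround 1 (hypothetical helper): switch the OSD op queue to 'wpq'."""
    cmds = ["ceph config set osd osd_op_queue wpq"]
    # The queue type only takes effect after an OSD restart.
    # Assumes systemd-managed, package-based OSDs; run each on the OSD's host.
    cmds += [f"systemctl restart ceph-osd@{osd_id}" for osd_id in osd_ids]
    return cmds


def custom_profile_commands(res: float, wgt: float, lim: float) -> list[str]:
    """Workaround 2 (hypothetical helper): keep mclock, tune recovery."""
    # The recovery res/wgt/lim options are only honoured with the custom profile.
    cmds = ["ceph config set osd osd_mclock_profile custom"]
    for option, value in zip(RECOVERY_OPTIONS, (res, wgt, lim)):
        # Values are caller-chosen placeholders, not recommendations.
        cmds.append(f"ceph config set osd {option} {value}")
    return cmds
```

Printing the returned lists first and running them by hand keeps the change reviewable, which matters here because the wpq path restarts OSDs.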

  NB: This affects the Quincy release only. Older releases (Pacific, Octopus, etc.) use
  'wpq', so the recovery parameters can be modified as usual. This
  changed only with Quincy.
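The version rules described above can be folded into a small check that reports which workaround applies to a given OSD version, for example as shown by `ceph versions`. This is a minimal sketch. `quincy_mclock_status` and its return strings are hypothetical. It assumes 17.2.6 ships the fix as stated in this report. It also only strips a distro suffix such as `-0ubuntu1`; it does not attempt to parse every possible package version format.

```python
def parse_version(version: str) -> tuple[int, int, int]:
    """Parse a Ceph version like '17.2.5' into (major, minor, patch)."""
    # Drop a distro/build suffix such as '17.2.5-0ubuntu1' (assumed format).
    core = version.strip().split("-")[0]
    major, minor, patch = (int(part) for part in core.split(".")[:3])
    return major, minor, patch


def quincy_mclock_status(version: str) -> str:
    """Classify a Ceph version against this bug (hypothetical helper)."""
    v = parse_version(version)
    if v[0] != 17:
        # Per this report only Quincy (17.x) is affected; older releases use 'wpq'.
        return "not affected"
    if v >= (17, 2, 6):
        # 17.2.6 is expected to carry the backported fix [0][1].
        return "fixed"
    if v >= (17, 2, 4):
        # Affected, but a custom mclock profile is usable from 17.2.4 on [2].
        return "affected: wpq or custom mclock profile"
    # Before 17.2.4 the custom profile workaround is not usable [2].
    return "affected: use wpq"
```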

  [0] https://tracker.ceph.com/issues/57529
  [1] https://github.com/ceph/ceph/pull/48226
  [2] https://tracker.ceph.com/issues/55153
