[Bug 2013960] Re: Recovery operation takes high priority than client I/O with mclock scheduler
Ponnuvel Palaniyappan
2013960 at bugs.launchpad.net
Thu May 25 13:09:47 UTC 2023
The Quincy point release 17.2.6 (which has the fix for this) is being
tracked via https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/2018929
--
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to ceph in Ubuntu.
https://bugs.launchpad.net/bugs/2013960
Title:
Recovery operation takes high priority than client I/O with mclock
scheduler
Status in ceph package in Ubuntu:
Confirmed
Bug description:
Starting with Quincy, the mclock_scheduler is used as the default OSD op queue. However, the default recovery settings are so high that the impact on client I/O can be severe, depending on the amount of recovery work that needs to be done. This is a bug and has been fixed
in the 'main' branch and backported to Quincy [0][1].
There's no upstream Quincy release with this fix yet.
17.2.6, which is undergoing QA at the moment, will have this fix.
Workaround:
There are a couple of ways this can be mitigated in Quincy.
1. Use 'wpq' as the osd_op_queue. This has been the default in previous releases and works just fine. This will require restarting OSDs.
Steps:
i. Change osd_op_queue to 'wpq': `sudo ceph config set osd osd_op_queue wpq`
ii. Rolling restart of all the OSDs (with `noout` & `norebalance` flags set)
iii. Check that 'wpq' is now set: `ceph tell osd.* config get osd_op_queue`
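The steps above can be sketched as a small shell function. This is only an illustration, not a tested procedure: the `run` helper, the `DRY_RUN` variable and the `switch_to_wpq` name are made up for this example, and the `ceph-osd@<id>` systemd unit name assumes a package-based deployment where the listed OSDs run on the local host. It also does not wait for the cluster to settle between restarts, which a real rolling restart should do.

```shell
#!/bin/sh
# Sketch only. With DRY_RUN=1 (the default here) commands are printed, not executed.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Usage: switch_to_wpq <osd-id>...
# Sets osd_op_queue to wpq, restarts the given local OSDs one at a time
# with noout/norebalance set, then queries the running value on all OSDs.
switch_to_wpq() {
    run ceph config set osd osd_op_queue wpq
    run ceph osd set noout
    run ceph osd set norebalance
    for id in "$@"; do
        # Assumption: OSDs are managed by ceph-osd@<id> systemd units on this host.
        run systemctl restart "ceph-osd@${id}"
    done
    run ceph osd unset norebalance
    run ceph osd unset noout
    run ceph tell 'osd.*' config get osd_op_queue
}
```

For example, `switch_to_wpq 0 1 2` prints the commands for OSDs 0, 1 and 2; setting `DRY_RUN=0` would execute them instead.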
2. Stick with the mclock scheduler but use a custom mclock profile. This will allow users to modify the recovery parameters:
```
osd_mclock_scheduler_background_recovery_res
osd_mclock_scheduler_background_recovery_wgt
osd_mclock_scheduler_background_recovery_lim
```
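As a rough sketch, the parameters listed above could be set like this once the custom profile is selected. The `run` helper, `DRY_RUN` and the `set_custom_recovery_qos` name are hypothetical, and no values are suggested here: the three arguments are placeholders, since suitable reservation, weight and limit values (and their units) depend on the cluster and on the Ceph release.

```shell
#!/bin/sh
# Sketch only. With DRY_RUN=1 (the default here) commands are printed, not executed.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Usage: set_custom_recovery_qos <res> <wgt> <lim>
# Switches OSDs to the custom mclock profile and applies the caller's
# recovery reservation, weight and limit. Values are not validated.
set_custom_recovery_qos() {
    if [ "$#" -ne 3 ]; then
        echo "usage: set_custom_recovery_qos <res> <wgt> <lim>" >&2
        return 1
    fi
    run ceph config set osd osd_mclock_profile custom
    run ceph config set osd osd_mclock_scheduler_background_recovery_res "$1"
    run ceph config set osd osd_mclock_scheduler_background_recovery_wgt "$2"
    run ceph config set osd osd_mclock_scheduler_background_recovery_lim "$3"
}
```

Running it in dry-run mode first makes it easy to review the exact `ceph config set` commands before applying them.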
To be able to use this option, 17.2.4 or later is required due to another
bug [2]. So it's probably simpler and more straightforward to stick with 'wpq' until the fix for [0] is available or 17.2.6 is out.
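The version requirement can be checked before choosing between the two workarounds. The helper below is a minimal sketch; `version_at_least` is a made-up name and it assumes plain three-part numeric versions such as 17.2.4.

```shell
#!/bin/sh
# Usage: version_at_least <installed> <minimum>
# Returns 0 if <installed> >= <minimum>, comparing major.minor.patch numerically.
# Assumption: both arguments are three-part numeric versions (e.g. 17.2.4).
version_at_least() {
    IFS=. read -r h1 h2 h3 <<EOF
$1
EOF
    IFS=. read -r w1 w2 w3 <<EOF
$2
EOF
    [ "$h1" -gt "$w1" ] && return 0
    [ "$h1" -lt "$w1" ] && return 1
    [ "$h2" -gt "$w2" ] && return 0
    [ "$h2" -lt "$w2" ] && return 1
    [ "$h3" -ge "$w3" ]
}
```

For instance, `version_at_least 17.2.3 17.2.4` fails, which would point to the 'wpq' workaround. The installed version would have to be obtained separately, for example from the output of `ceph versions`.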
NB: This affects the Quincy release only. Older releases (Pacific, Octopus, et al.) use
'wpq', and as such the recovery parameters can be modified as usual. This has
changed only starting from Quincy.
[0] https://tracker.ceph.com/issues/57529
[1] https://github.com/ceph/ceph/pull/48226
[2] https://tracker.ceph.com/issues/55153
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/2013960/+subscriptions