[Bug 1909162] Re: cluster log slow request spam
Dan Hill
1909162 at bugs.launchpad.net
Fri Dec 2 00:43:20 UTC 2022
Verified the ceph package in nautilus-proposed
(14.2.22-0ubuntu0.19.10.1~cloud2)
Detailed steps:
1. deployed bionic+nautilus with `--force`
2. added nautilus-proposed
3. upgraded ceph to `14.2.22-0ubuntu0.19.10.1~cloud2` and restarted ceph services
4. ran a write benchmark: `sudo rados bench -p bench_pool 30 write --no-cleanup`
5. lowered slow request complaint threshold: `sudo ceph config set osd osd_op_complaint_time 0.1`
6. increased osd debug: `sudo ceph config set osd debug_osd 20`
7. verified slow request debug detail is present in the osd logs
8. verified slow request debug detail is NOT present in the cluster log (ceph.log)
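The checks in steps 7 and 8 can be sketched as a small shell helper. This is only an illustration, not the exact commands used for the verification: the function name `count_slow_requests` is hypothetical, and the assumption that the detail lines contain the substring "slow request" should be checked against the actual log output.

```shell
#!/bin/sh
# Hypothetical helper: count the lines in a log file that mention slow requests.
# Assumption: slow request detail lines contain the substring "slow request".
count_slow_requests() {
    # grep -c prints the number of matching lines. It exits non-zero when
    # nothing matches, so "|| true" keeps the helper usable under `set -e`.
    grep -c 'slow request' "$1" || true
}
```

Assuming the default log locations, `count_slow_requests /var/log/ceph/ceph-osd.0.log` would be expected to report a non-zero count after step 6, while `count_slow_requests /var/log/ceph/ceph.log` would be expected to report no per-operation detail on the fixed package. The osd id `0` is a placeholder, and a narrower pattern may be needed if other messages in the cluster log also match.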
** Tags removed: verification-train-needed
** Tags added: verification-train-done
--
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to Ubuntu Cloud Archive.
https://bugs.launchpad.net/bugs/1909162
Title:
cluster log slow request spam
Status in Ubuntu Cloud Archive:
Fix Released
Status in Ubuntu Cloud Archive train series:
Fix Committed
Status in Ubuntu Cloud Archive ussuri series:
Fix Released
Status in ceph package in Ubuntu:
Fix Released
Status in ceph source package in Focal:
Fix Released
Status in ceph source package in Groovy:
Fix Released
Status in ceph source package in Hirsute:
Fix Released
Bug description:
[Impact]
A recent change (issue#43975 [0]) was made to slow request logging to
include detail on each operation in the cluster logs. With this
change, detail for every slow request is always sent to the monitors
and added to the cluster logs.
This does not scale. Large, high-throughput clusters can overwhelm
their monitors with spurious logs in the event of a performance issue.
Disrupting the monitors can then cause further instability in the
cluster.
This SRU reverts that change, so the osd no longer sends detail for
every slow request it is processing to the cluster log.
The slow request clog change was added in nautilus (14.2.10) and
octopus (15.2.0).
[Test Case]
Stress the cluster with a benchmarking tool to generate slow requests
and observe the cluster logs.
[Where problems could occur]
The cluster logs contain detailed debug information on slow requests
that is useful for smaller, low-throughput clusters. While these logs
are not used by ceph, they may be used by the cluster administrators
(for monitoring or alerts). Changing this logging behavior may be
unexpected.
[Other Info]
The intent is to re-enable this feature behind a configurable setting,
but the solution must be discussed upstream.
The same slow request detail can be enabled for each osd by raising
the "debug osd" log level to 20.
[0] https://tracker.ceph.com/issues/43975
To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1909162/+subscriptions