[Bug 1909162] [NEW] cluster log slow request spam
Dan Hill
1909162 at bugs.launchpad.net
Thu Dec 24 01:59:19 UTC 2020
Public bug reported:
[Impact]
A recent change (issue#43975 [0]) was made to slow request logging to
include detail on each operation in the cluster logs. With this change,
detail for every slow request is always sent to the monitors and added
to the cluster logs.
This does not scale. Large, high-throughput clusters can overwhelm their
monitors with spurious logs in the event of a performance issue.
Disrupting the monitors can then cause further instability in the
cluster.
This SRU reverts that change, so the OSD no longer sends detail for every
slow request it is processing to the cluster log.
The slow request clog change was added in nautilus (14.2.10) and octopus
(15.2.0).
[Test Case]
Stress the cluster with a benchmarking tool to generate slow requests
and observe the cluster logs.
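To compare the cluster log before and after the fix, one option is to count how many slow request entries each OSD contributed during the benchmark run. The following is only a minimal sketch: the marker string "slow request", the "osd.<id>" naming pattern, and the function name are assumptions for illustration, not taken from the Ceph source, and should be adjusted to the actual log format.

```python
import re
from collections import Counter

# Assumption: relevant entries contain the phrase "slow request" and name the
# reporting daemon as "osd.<id>". Adjust both to match the real cluster log.
SLOW_MARKER = "slow request"
OSD_NAME = re.compile(r"\bosd\.(\d+)\b")


def count_slow_request_lines(log_text, marker=SLOW_MARKER):
    """Return a Counter mapping OSD id to the number of lines containing marker.

    Lines that contain the marker but no "osd.<id>" token are counted
    under the key None.
    """
    counts = Counter()
    for line in log_text.splitlines():
        if marker not in line:
            continue
        match = OSD_NAME.search(line)
        key = int(match.group(1)) if match else None
        counts[key] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python count_slow.py saved-cluster.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        totals = count_slow_request_lines(handle.read())
    for osd_id, n in sorted(totals.items(), key=lambda kv: -kv[1]):
        print(f"osd.{osd_id}: {n}")
```

Because the match is a plain substring test, summary lines that mention "slow requests" are counted as well. With the fix applied, the per-OSD counts for the same benchmark load would be expected to drop sharply.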
[Where problems could occur]
The cluster logs contain detailed debug information on slow requests
that is useful for smaller, low-throughput clusters. Although Ceph itself
does not consume these logs, cluster administrators may rely on them
(for monitoring or alerts), so the change in logging behavior may be
unexpected.
[Other Info]
The intent is to re-enable this feature behind a configurable setting,
but the solution must be discussed upstream.
The same slow request detail can be enabled on each OSD by raising its
"debug osd" log level to 20.
[0] https://tracker.ceph.com/issues/43975
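As a rough illustration of the "debug osd" workaround above, the sketch below builds one command per OSD to raise the level to 20. It assumes the centralized configuration interface (`ceph config set <who> debug_osd <level>`) is available on the release in use; the helper names are hypothetical, and the exact command should be checked against the documentation for the installed Ceph version before running it.

```python
import subprocess


def debug_osd_command(osd_id, level=20):
    """Build the argv that sets debug_osd for a single OSD.

    Assumption: the cluster accepts "ceph config set osd.<id> debug_osd <level>".
    """
    return ["ceph", "config", "set", f"osd.{int(osd_id)}", "debug_osd", str(int(level))]


def raise_debug_osd(osd_ids, level=20, dry_run=True):
    """Return the commands for every OSD; run them only when dry_run is False."""
    commands = [debug_osd_command(osd_id, level) for osd_id in osd_ids]
    if not dry_run:
        for argv in commands:
            # check=True raises CalledProcessError if the ceph CLI reports failure.
            subprocess.run(argv, check=True)
    return commands


if __name__ == "__main__":
    # Dry run: print what would be executed for OSDs 0 to 2.
    for argv in raise_debug_osd(range(3)):
        print(" ".join(argv))
```

Level 20 is very verbose, so it is probably best raised only on the OSDs under investigation and lowered again afterwards.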
** Affects: cloud-archive
Importance: High
Status: In Progress
** Affects: cloud-archive/train
Importance: High
Assignee: gerald.yang (gerald-yang-tw)
Status: In Progress
** Affects: cloud-archive/ussuri
Importance: High
Assignee: gerald.yang (gerald-yang-tw)
Status: In Progress
** Affects: ceph (Ubuntu)
Importance: High
Assignee: gerald.yang (gerald-yang-tw)
Status: In Progress
** Affects: ceph (Ubuntu Focal)
Importance: High
Assignee: gerald.yang (gerald-yang-tw)
Status: In Progress
** Affects: ceph (Ubuntu Groovy)
Importance: High
Assignee: gerald.yang (gerald-yang-tw)
Status: In Progress
** Affects: ceph (Ubuntu Hirsute)
Importance: High
Assignee: gerald.yang (gerald-yang-tw)
Status: In Progress
** Tags: seg sts
** Also affects: ceph (Ubuntu Focal)
Importance: Undecided
Status: New
** Also affects: ceph (Ubuntu Hirsute)
Importance: Undecided
Status: New
** Also affects: ceph (Ubuntu Groovy)
Importance: Undecided
Status: New
** Tags added: seg sts
** Also affects: cloud-archive
Importance: Undecided
Status: New
** Also affects: cloud-archive/train
Importance: Undecided
Status: New
** Also affects: cloud-archive/ussuri
Importance: Undecided
Status: New
** Changed in: ceph (Ubuntu Hirsute)
Status: New => In Progress
** Changed in: ceph (Ubuntu Hirsute)
Importance: Undecided => High
** Changed in: ceph (Ubuntu Groovy)
Importance: Undecided => High
** Changed in: ceph (Ubuntu Focal)
Importance: Undecided => High
** Changed in: cloud-archive/ussuri
Importance: Undecided => High
** Changed in: cloud-archive/train
Importance: Undecided => High
** Changed in: cloud-archive
Importance: Undecided => High
** Changed in: ceph (Ubuntu Groovy)
Status: New => In Progress
** Changed in: ceph (Ubuntu Focal)
Status: New => In Progress
** Changed in: cloud-archive/ussuri
Status: New => In Progress
** Changed in: cloud-archive/train
Status: New => In Progress
** Changed in: cloud-archive
Status: New => In Progress
--
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to ceph in Ubuntu.
https://bugs.launchpad.net/bugs/1909162
Title:
cluster log slow request spam
Status in Ubuntu Cloud Archive:
In Progress
Status in Ubuntu Cloud Archive train series:
In Progress
Status in Ubuntu Cloud Archive ussuri series:
In Progress
Status in ceph package in Ubuntu:
In Progress
Status in ceph source package in Focal:
In Progress
Status in ceph source package in Groovy:
In Progress
Status in ceph source package in Hirsute:
In Progress