[Bug 1909162] [NEW] cluster log slow request spam

Dan Hill 1909162 at bugs.launchpad.net
Thu Dec 24 01:59:19 UTC 2020


Public bug reported:

[Impact]

A recent change to slow request logging (issue#43975 [0]) adds detail
on each operation to the cluster logs. With this change, detail for
every slow request is always sent to the monitors and added to the
cluster logs.

This does not scale. Large, high-throughput clusters can overwhelm their
monitors with spurious logs in the event of a performance issue.
Disrupting the monitors can then cause further instability in the
cluster.

This SRU reverts the change, so that an OSD no longer writes every slow
request it is processing to the cluster log.

The slow request clog change was added in nautilus (14.2.10) and octopus
(15.2.0).

[Test Case]

Stress the cluster with a benchmarking tool to generate slow requests
and observe the cluster logs.
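
One way to make "observe the cluster logs" measurable is to count the
slow request lines per OSD before and after applying the SRU. The
sketch below is only an illustration: the matched phrase, the
"osd.<id>" pattern and the sample lines are assumptions, not real Ceph
output, and should be adjusted to what your release actually logs.

```python
import re
from collections import Counter

# Assumption: per-operation detail lines contain this phrase; adjust it to
# the wording your Ceph release actually emits.
NEEDLE = "slow request"
# Assumption: the reporting daemon appears in the line as "osd.<id>".
OSD_RE = re.compile(r"\bosd\.(\d+)\b")


def count_slow_request_lines(lines):
    """Return a Counter mapping OSD name (or 'unknown') to matching lines."""
    counts = Counter()
    for line in lines:
        if NEEDLE not in line:
            continue
        match = OSD_RE.search(line)
        counts[f"osd.{match.group(1)}" if match else "unknown"] += 1
    return counts


# Made-up sample lines, for illustration only; not real Ceph output.
SAMPLE = [
    "t0 osd.3 [WRN] slow request <op detail elided>",
    "t1 osd.3 [WRN] slow request <op detail elided>",
    "t2 osd.7 [WRN] slow request <op detail elided>",
    "t3 mon.a [INF] overall HEALTH_OK",
]

if __name__ == "__main__":
    for osd, n in sorted(count_slow_request_lines(SAMPLE).items()):
        print(osd, n)
```

Feeding the function the lines of a saved cluster log from the same
benchmark run, once on the unpatched packages and once on the patched
ones, should show the per-OSD counts drop sharply if the revert behaves
as described.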

[Where problems could occur]

The cluster logs contain detailed debug information on slow requests
that is useful for smaller, low-throughput clusters. While these logs
are not used by ceph, they may be used by the cluster administrators
(for monitoring or alerts), who may not expect this change in logging
behavior.

[Other Info]

The intent is to re-enable this feature behind a configurable setting,
but the solution must be discussed upstream.

The same slow request detail can be enabled for each osd by raising the
"debug osd" log level to 20.
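
As a rough sketch of that workaround, the helper below builds (but does
not run) a "ceph config set" call for the debug_osd option. It assumes
the centralized configuration CLI is in use; the function name and its
parameters are hypothetical. Note that debug output at this level goes
to each OSD's own log file, not to the cluster log.

```python
def debug_osd_command(level=20, osd_id=None):
    """Build the argv for setting debug_osd; the caller decides whether to run it.

    osd_id=None targets every OSD ("osd"); an integer targets one daemon
    ("osd.<id>"). Both the name and the signature are illustrative.
    """
    target = "osd" if osd_id is None else f"osd.{osd_id}"
    return ["ceph", "config", "set", target, "debug_osd", str(level)]


if __name__ == "__main__":
    # Print the commands instead of executing them.
    print(" ".join(debug_osd_command()))
    print(" ".join(debug_osd_command(osd_id=3)))
```

Level 20 is very verbose, so it is probably best applied to a single
OSD and for a short time, then lowered again once the slow requests of
interest have been captured.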

[0] https://tracker.ceph.com/issues/43975

** Affects: cloud-archive
     Importance: High
         Status: In Progress

** Affects: cloud-archive/train
     Importance: High
     Assignee: gerald.yang (gerald-yang-tw)
         Status: In Progress

** Affects: cloud-archive/ussuri
     Importance: High
     Assignee: gerald.yang (gerald-yang-tw)
         Status: In Progress

** Affects: ceph (Ubuntu)
     Importance: High
     Assignee: gerald.yang (gerald-yang-tw)
         Status: In Progress

** Affects: ceph (Ubuntu Focal)
     Importance: High
     Assignee: gerald.yang (gerald-yang-tw)
         Status: In Progress

** Affects: ceph (Ubuntu Groovy)
     Importance: High
     Assignee: gerald.yang (gerald-yang-tw)
         Status: In Progress

** Affects: ceph (Ubuntu Hirsute)
     Importance: High
     Assignee: gerald.yang (gerald-yang-tw)
         Status: In Progress


** Tags: seg sts

** Also affects: ceph (Ubuntu Focal)
   Importance: Undecided
       Status: New

** Also affects: ceph (Ubuntu Hirsute)
   Importance: Undecided
       Status: New

** Also affects: ceph (Ubuntu Groovy)
   Importance: Undecided
       Status: New

** Tags added: seg sts

** Also affects: cloud-archive
   Importance: Undecided
       Status: New

** Also affects: cloud-archive/train
   Importance: Undecided
       Status: New

** Also affects: cloud-archive/ussuri
   Importance: Undecided
       Status: New

** Changed in: ceph (Ubuntu Hirsute)
       Status: New => In Progress

** Changed in: ceph (Ubuntu Hirsute)
   Importance: Undecided => High

** Changed in: ceph (Ubuntu Groovy)
   Importance: Undecided => High

** Changed in: ceph (Ubuntu Focal)
   Importance: Undecided => High

** Changed in: cloud-archive/ussuri
   Importance: Undecided => High

** Changed in: cloud-archive/train
   Importance: Undecided => High

** Changed in: cloud-archive
   Importance: Undecided => High

** Changed in: ceph (Ubuntu Groovy)
       Status: New => In Progress

** Changed in: ceph (Ubuntu Focal)
       Status: New => In Progress

** Changed in: cloud-archive/ussuri
       Status: New => In Progress

** Changed in: cloud-archive/train
       Status: New => In Progress

** Changed in: cloud-archive
       Status: New => In Progress

-- 
https://bugs.launchpad.net/bugs/1909162
