[Bug 1906496] [NEW] mgr can be very slow in a large ceph cluster

Launchpad Bug Tracker 1906496 at bugs.launchpad.net
Thu Dec 10 15:32:09 UTC 2020


You have been subscribed to a public bug by Ponnuvel Palaniyappan (pponnuvel):

Upstream implemented a new feature [1] that checks for and reports long
network ping times between OSDs. It also introduced a problem: ceph-mgr
can become very slow because it has to dump all of the new OSD network
ping stats [2] for some tasks. This is especially bad when the cluster
has a large number of OSDs.

These OSD network ping stats do not need to be exposed to the Python
mgr modules, so dumping them only makes the mgr do more work than it
needs to. This can make the mgr slow or even hang, and can keep the CPU
usage of the mgr process constantly high. The fix is to disable the
ping time dump for the Python mgr modules.
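
As a rough illustration of why skipping the ping stats helps, here is a
toy Python model of the idea. It is not Ceph code: OsdStat, with_net,
dump_for_python_modules and the other names are hypothetical, and it
assumes for simplicity that every OSD tracks ping times to every other
OSD, which a real cluster may not do.

```python
# Toy model only: every name here is hypothetical, not a Ceph API.
from dataclasses import dataclass, field


@dataclass
class OsdStat:
    osd_id: int
    kb_used: int
    # Per-peer ping times in milliseconds, keyed by peer OSD id.
    ping_ms: dict = field(default_factory=dict)

    def dump(self, with_net=True):
        out = {"osd": self.osd_id, "kb_used": self.kb_used}
        if with_net:
            # This section grows with the number of peers per OSD.
            out["network_ping_times"] = [
                {"osd": peer, "ms": ms}
                for peer, ms in sorted(self.ping_ms.items())
            ]
        return out


def dump_for_python_modules(stats):
    # Skip the ping data that the Python modules do not consume.
    return [s.dump(with_net=False) for s in stats]


def build_cluster(n):
    # Assumption for illustration: every OSD tracks every other OSD.
    return [
        OsdStat(i, 1024, {p: 1.0 for p in range(n) if p != i})
        for i in range(n)
    ]


def count_entries(dumped):
    # Number of ping entries that would have to be serialized.
    return sum(len(d.get("network_ping_times", [])) for d in dumped)


cluster = build_cluster(100)
full = [s.dump() for s in cluster]
slim = dump_for_python_modules(cluster)
print(count_entries(full), count_entries(slim))
```

Under that all-to-all assumption the full dump carries n * (n - 1) ping
entries, while the dump for the Python modules carries none, so the
saving grows quickly with the number of OSDs.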

The major fix from upstream is here [3]. I also found an improvement
commit [4] that was submitted later in another PR.

We need to backport them to Bionic Luminous and Mimic (Stein). Nautilus
and Octopus already have the fix.

[1] https://github.com/ceph/ceph/pull/28755
[2] https://github.com/ceph/ceph/pull/28755/files#diff-5498d83111f1210998ee186e98d5836d2bce9992be7648addc83f59e798cddd8L430
[3] https://github.com/ceph/ceph/pull/32406
[4] https://github.com/ceph/ceph/pull/32554/commits/1112584621016c4a8cac1bedb1a1b8b17c394f7f

** Affects: ceph (Ubuntu)
     Importance: Undecided
     Assignee: Ponnuvel Palaniyappan (pponnuvel)
         Status: In Progress


** Tags: sts
-- 
mgr can be very slow in a large ceph cluster
https://bugs.launchpad.net/bugs/1906496
You received this bug notification because you are a member of Ubuntu Sponsors Team, which is subscribed to the bug report.


