[Bug 1959649] [NEW] BlueFS spillover detected for particular OSDs
nikhil kshirsagar
1959649 at bugs.launchpad.net
Tue Feb 1 05:43:34 UTC 2022
Public bug reported:
This is the issue described in https://tracker.ceph.com/issues/38745,
where ceph health detail shows messages like:
sudo ceph health detail
HEALTH_WARN 3 OSD(s) experiencing BlueFS spillover; mon juju-6879b7-6-lxd-1 is low on available space
[WRN] BLUEFS_SPILLOVER: 3 OSD(s) experiencing BlueFS spillover <---
osd.41 spilled over 66 MiB metadata from 'db' device (3.0 GiB used of 29 GiB) to slow device
osd.96 spilled over 461 MiB metadata from 'db' device (3.0 GiB used of 29 GiB) to slow device
osd.105 spilled over 198 MiB metadata from 'db' device (3.0 GiB used of 29 GiB) to slow device
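To see the per-OSD BlueFS usage behind these warnings, the bluefs perf
counters can be inspected on the OSD's host. A minimal sketch, assuming
admin-socket access to the running OSD (the bluefs section reports values
such as db_total_bytes, db_used_bytes and slow_used_bytes):

sudo ceph daemon osd.41 perf dump bluefs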
The BlueFS spillover is very likely caused by RocksDB's level-based
sizing.
https://docs.ceph.com/en/latest/rados/configuration/bluestore-config-ref/#sizing
has a statement about this leveled sizing.
Between versions 15.2.6 and 15.2.10, if the value of
bluestore_volume_selection_policy is not set to use_some_extra, this
issue can be hit in spite of free space being available, because RocksDB
only uses "leveled" space on the NVMe partition. The level sizes are
300MB, 3GB, 30GB and 300GB, and any DB space above the largest level that
fits on the DB device automatically ends up on the slow device. For
example, with a 29 GiB DB device the 30GB level does not fit, so only
about 3GB is effectively used (matching the "3.0 GiB used of 29 GiB"
above) and the remaining metadata spills over.
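To check which policy the OSDs are currently running with, something like
the following should work (a sketch, assuming options are managed through
the mon config database; they may also be set in ceph.conf):

ceph config get osd bluestore_volume_selection_policy
# or, on the OSD's host, for one specific running daemon:
ceph daemon osd.41 config get bluestore_volume_selection_policy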
There is also a discussion at
https://www.mail-archive.com/ceph-users@ceph.io/msg05782.html
Running compaction on the database, i.e. ceph tell osd.XX compact
(replacing XX with the OSD number), can work around the issue
temporarily, but it does not address the root cause.
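For example, for the OSDs reported above (this only reclaims space until
the metadata grows again):

sudo ceph tell osd.41 compact
sudo ceph tell osd.96 compact
sudo ceph tell osd.105 compact
sudo ceph health detail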
I am also pasting some notes Dongdong mentions on SF case 00326782,
where the proper fix is to either:
A. Redeploy the OSDs with a larger DB lvm/partition.
OR
B. Migrate to a new, larger DB lvm/partition. This can be done offline
with ceph-volume lvm migrate (see
https://docs.ceph.com/en/octopus/ceph-volume/lvm/migrate/), but it
requires upgrading the cluster to 15.2.14 first; a sketch of the
commands follows the next paragraph.
Option A is much safer but more time-consuming. Option B is much faster,
but it is recommended to do it on one node first and wait/monitor for a
couple of weeks before moving forward.
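A minimal sketch of option B for one OSD, under the assumption that a
new, larger LV has already been created (the VG/LV names and OSD id
below are placeholders; the exact --from arguments should be checked
against the ceph-volume lvm migrate documentation linked above):

systemctl stop ceph-osd@41
# Move the existing DB, plus anything that spilled onto the main device,
# to the new LV. The OSD fsid can be found with 'ceph osd metadata 41'
# or 'ceph-volume lvm list'.
ceph-volume lvm migrate --osd-id 41 --osd-fsid <osd-fsid> --from data db --target new-db-vg/new-db-lv
systemctl start ceph-osd@41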
As mentioned above, to avoid running into the issue even with free space
available, the value of bluestore_volume_selection_policy should be set
to use_some_extra for all OSDs. 15.2.6 already has
bluestore_volume_selection_policy, but the default was only changed to
use_some_extra from 15.2.11 onwards
(https://tracker.ceph.com/issues/47053).
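A sketch of setting this through the mon config database (the option is
read when the OSD starts, so an OSD restart is most likely needed for it
to take effect; adjust to however the cluster's configuration is
managed):

ceph config set osd bluestore_volume_selection_policy use_some_extra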
** Affects: ceph (Ubuntu)
Importance: Undecided
Status: New
--
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to ceph in Ubuntu.
https://bugs.launchpad.net/bugs/1959649
Title:
BlueFS spillover detected for particular OSDs
Status in ceph package in Ubuntu:
New
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/1959649/+subscriptions