[Bug 1936136] Re: ceph on bcache performance regression

Ponnuvel Palaniyappan 1936136 at bugs.launchpad.net
Thu Jul 15 14:20:53 UTC 2021


** Tags added: sts

-- 
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to ceph in Ubuntu.
https://bugs.launchpad.net/bugs/1936136

Title:
  ceph on bcache performance regression

Status in ceph package in Ubuntu:
  New

Bug description:
  Ceph on bcache can suffer serious performance degradation (roughly a 10x drop) when the two conditions below are met:
  1. bluefs_buffered_io is turned on

  2. Any OSD bcache’s cache_available_percent is less than 60

  As many of us may already know, bcache forces all writes to go
  directly to the backing device when cache_available_percent is less
  than CUTOFF_WRITEBACK_SYNC(30).

  But the thing is that bcache starts to bypass *some* writes earlier, once cache_available_percent reaches CUTOFF_WRITEBACK(60): namely the writes that do not carry any synchronization flag. The relevant kernel IO flags are REQ_SYNC, REQ_FUA, and REQ_PREFLUSH.
  The code is here https://github.com/torvalds/linux/blob/master/drivers/md/bcache/writeback.h#L123
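  The cutoff behaviour described above can be sketched as a simplified model (this is not the actual kernel code; the thresholds are expressed here as cache_available_percent, matching the description above, whereas the kernel expresses them as cache in-use percentages):

  ```python
  # Simplified model of bcache's writeback cutoff decision
  # (see should_writeback() in drivers/md/bcache/writeback.h).
  # Below 30% available, every write bypasses the cache; between
  # 30% and 60%, only writes carrying a sync flag (REQ_SYNC,
  # REQ_FUA, REQ_PREFLUSH) are still handled in writeback mode.

  CUTOFF_WRITEBACK_SYNC = 30  # floor below which even sync writes bypass
  CUTOFF_WRITEBACK = 60       # floor below which non-sync writes bypass

  def should_writeback(cache_available_percent: int, is_sync: bool) -> bool:
      """True if the write is cached (writeback mode); False if it
      bypasses the cache and goes straight to the backing HDD."""
      if cache_available_percent < CUTOFF_WRITEBACK_SYNC:
          return False            # cache nearly full: bypass everything
      if is_sync:
          return True             # sync-flagged writes are still cached
      return cache_available_percent >= CUTOFF_WRITEBACK
  ```

  This is exactly why unflagged bluefs writes fall off a cliff at 60 while sync-flagged IO keeps going until 30.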

  
  The problem I found in a recent case (bionic-stein + 4.15 kernel) is that when bluefs submits writes with bluefs_buffered_io turned on, the writes sent to bcache carry none of the sync flags, so once cache_available_percent drops below 60 (which is quite easy to hit), all bluefs IO is forced into non-writeback mode. This is equivalent to putting the bluestore DB on an HDD device, so every IO is bounded by HDD speed.

  I’m not sure how the sync flags are propagated from ceph all the way down to the kernel bcache layer, but I’ve verified the behaviour across different ceph/kernel/Ubuntu combinations with bluefs_buffered_io turned on:

  N: no issue; all writes carry the SYNC flag.
  P: has the issue; disabling bluefs_buffered_io works around it.

  Bionic-ussuri  + kernel 5.4.0  -> N
  Bionic-ussuri  + kernel 4.15.0 -> P
  Bionic-stein   + kernel 5.4.0  -> N
  Bionic-stein   + kernel 4.15.0 -> P
  Bionic-train   + kernel 5.4.0  -> N
  Bionic-train   + kernel 4.15.0 -> P
  Focal(octopus) + kernel 5.4.0  -> N
  Focal(octopus) + kernel 5.8.0  -> N
  Focal-wallaby  + kernel 5.4.0  -> N
  Focal-wallaby  + kernel 5.8.0  -> N

  
  As we can see, the issue appears whenever bluefs_buffered_io = true and the kernel is 4.15.0.
  I’m not sure how/why the SYNC flag gets added on the 5.4 or 5.8 kernels when bluefs_buffered_io is enabled; currently I only know that 5.4 and 5.8 are good with bluefs_buffered_io turned on.

  Note that if all OSDs are deployed with a separate NVMe as the
  bluestore DB device, the cluster won’t hit the issue; only OSDs that
  put the bluestore DB on a bcache device are affected.

  Ceph releases with bluefs_buffered_io enabled by default:
  bluefs_buffered_io was enabled by default in v13.2.0 and v14.2.0.
  bluefs_buffered_io was disabled by default in v14.2.10 and v15.2.0.
  bluefs_buffered_io was re-enabled in the following point releases:
  v14.2.22
  v15.2.13
  v16.2.0

  So in summary, if all three conditions below are met, the cluster will
  very likely hit the issue once any OSD bcache’s
  cache_available_percent drops to 60:

  1. ceph has bluefs_buffered_io enabled

  2. OSDs are putting bluestore DB on top of bcache device

  3. the kernel is the bionic GA kernel (4.15.0)
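  The three conditions can be combined into a small triage helper (a hypothetical sketch, not part of any ceph tooling; the function name and parameters are my own, and the kernel check simply matches the 4.15 prefix from the table above):

  ```python
  # Hypothetical helper applying the three conditions above, plus the
  # bcache threshold, to flag an OSD as likely affected by this bug.

  def osd_at_risk(bluefs_buffered_io: bool,
                  db_on_bcache: bool,
                  kernel: str,
                  cache_available_percent: int) -> bool:
      """True when the OSD matches the known-bad combination:
      bluefs_buffered_io on, bluestore DB on a bcache device,
      a 4.15 kernel, and cache_available_percent at or below 60."""
      return (bluefs_buffered_io
              and db_on_bcache
              and kernel.startswith("4.15")
              and cache_available_percent <= 60)
  ```

  For example, an OSD on a 5.4 kernel never matches, which is consistent with the N results in the table above.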

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/1936136/+subscriptions




More information about the Ubuntu-openstack-bugs mailing list