[Bug 1914911] Re: [SRU] bluefs doesn't compact log file

Timo Aaltonen 1914911 at bugs.launchpad.net
Mon Apr 26 15:04:00 UTC 2021


Hello dongdong, or anyone else affected,

Accepted ceph into bionic-proposed. The package will build now and be
available at
https://launchpad.net/ubuntu/+source/ceph/12.2.13-0ubuntu0.18.04.7 in a
few hours, and then in the -proposed repository.

Please help us by testing this new package.  See
https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how
to enable and use -proposed.  Your feedback will aid us in getting this
update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug,
mentioning the version of the package you tested, what testing has been
performed on the package and change the tag from verification-needed-
bionic to verification-done-bionic. If it does not fix the bug for you,
please add a comment stating that, and change the tag to verification-
failed-bionic. In either case, without details of your testing we will
not be able to proceed.

Further information regarding the verification process can be found at
https://wiki.ubuntu.com/QATeam/PerformingSRUVerification .  Thank you in
advance for helping!

N.B. The updated package will be released to -updates after the bug(s)
fixed by this package have been verified and the package has been in
-proposed for a minimum of 7 days.

** Changed in: ceph (Ubuntu Bionic)
       Status: In Progress => Fix Committed

** Tags added: verification-needed verification-needed-bionic

-- 
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to Ubuntu Cloud Archive.
https://bugs.launchpad.net/bugs/1914911

Title:
  [SRU] bluefs doesn't compact log file

Status in Ubuntu Cloud Archive:
  Invalid
Status in Ubuntu Cloud Archive queens series:
  Triaged
Status in ceph package in Ubuntu:
  Invalid
Status in ceph source package in Bionic:
  Fix Committed

Bug description:
  [Impact]

  For certain workloads, bluefs might never compact its log file, which
  causes the log file to grow slowly to a huge size (in some cases more
  than 1 TB on a 1.5 TB device).

  The bluefs perf counters show more detail when this issue occurs, e.g.:
  "bluefs": {
      "gift_bytes": 811748818944,
      "reclaim_bytes": 0,
      "db_total_bytes": 888564350976,
      "db_used_bytes": 867311747072,
      "wal_total_bytes": 0,
      "wal_used_bytes": 0,
      "slow_total_bytes": 0,
      "slow_used_bytes": 0,
      "num_files": 11,
      "log_bytes": 866545131520,
      "log_compactions": 0,
      "logged_bytes": 866542977024,
      "files_written_wal": 2,
      "files_written_sst": 3,
      "bytes_written_wal": 32424281934,
      "bytes_written_sst": 25382201
  }

  This bug could eventually cause the OSD to crash and then fail to restart, because it cannot get through the bluefs replay phase at boot time.
  The following log message might appear when trying to restart the OSD:
  bluefs mount failed to replay log: (5) Input/output error

  As the counters show, log_compactions is 0, which means the log has
  never been compacted, and the log file size (log_bytes) is already
  800+ GB. After compaction, the log file size is expected to drop to
  around 1 GB.
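
  The check described above can be sketched in Python. This is only an
  illustration, not part of the fix: the function name
  bluefs_log_suspect and the 0.5 threshold are assumptions made for this
  example, and the input is assumed to be the JSON perf counter output
  of an OSD (for instance from "ceph daemon osd.<id> perf dump"),
  reduced here to the counters quoted above.

```python
import json

# Assumed threshold for illustration only: flag the OSD when the bluefs log
# takes up more than half of the DB device. Not a value from the bug report.
LOG_FRACTION_LIMIT = 0.5


def bluefs_log_suspect(perf_dump, limit=LOG_FRACTION_LIMIT):
    """Return (suspect, fraction) for a parsed perf counter dump.

    suspect is True when the log was never compacted and log_bytes
    exceeds `limit` as a fraction of db_total_bytes.
    """
    bluefs = perf_dump["bluefs"]
    total = bluefs["db_total_bytes"]
    fraction = bluefs["log_bytes"] / total if total else 0.0
    never_compacted = bluefs["log_compactions"] == 0
    return never_compacted and fraction > limit, fraction


# Counters copied from the bug description (only the fields used here).
sample = json.loads("""
{
  "bluefs": {
    "db_total_bytes": 888564350976,
    "log_bytes": 866545131520,
    "log_compactions": 0
  }
}
""")

suspect, fraction = bluefs_log_suspect(sample)
print(f"suspect={suspect} log/db fraction={fraction:.3f}")
```

  With the counters from this report, the log occupies roughly 97% of
  the DB device and has never been compacted, so the sketch flags it.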

  [Test Case]

  Deploy a test Ceph cluster (Luminous 12.2.13, which has the bug) and
  drive I/O. Compaction isn't triggered often when most of the I/O is
  reads, so fill up the cluster initially with lots of writes and then
  switch to a read-heavy workload (no writes). The problem should then
  occur. Small OSDs are fine, as we're only interested in filling up
  the OSD and growing the bluefs log.

  [Where problems could occur]

  This fix has been part of all upstream releases since Mimic, so it has had a good amount of "runtime".
  The changes ensure that compaction happens more often, which is not expected to cause any problems.
  I can't see any real problems.

  [Other Info]
   - It's only needed for Luminous (Bionic). All newer releases already include the fix.
   - Upstream master PR: https://github.com/ceph/ceph/pull/17354
   - Upstream Luminous PR: https://github.com/ceph/ceph/pull/34876/files

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1914911/+subscriptions


