[Bug 1843085] Re: Backport of zero-length gc chain fixes to Luminous

Łukasz Zemczak 1843085 at bugs.launchpad.net
Thu Jan 16 11:31:21 UTC 2020


I see there's a very specific test case here in the bug to verify whether
the issue is resolved, but I don't see any mention of it being run as
part of verification. Was it part of the general regression testing?
Could you perform those steps and only then switch the bug to -verified?
Thank you!

** Tags removed: verification-done verification-done-bionic
** Tags added: verification-needed verification-needed-bionic

-- 
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to Ubuntu Cloud Archive.
https://bugs.launchpad.net/bugs/1843085

Title:
  Backport of zero-length gc chain fixes to Luminous

Status in Ubuntu Cloud Archive:
  Invalid
Status in Ubuntu Cloud Archive queens series:
  Fix Committed
Status in Ubuntu Cloud Archive rocky series:
  Fix Released
Status in ceph package in Ubuntu:
  Invalid
Status in ceph source package in Bionic:
  Fix Committed

Bug description:
  [Impact]
  Cancelling large S3/Swift object puts may result in garbage collection entries with zero-length chains. Rados gateway garbage collection does not efficiently process and clean up these zero-length chains.

  A large number of zero-length chains will result in rgw processes
  quickly spinning through the garbage collection lists while doing very
  little work. This can result in abnormally high CPU utilization in the
  rgw processes and an increased op workload against the gc objects.

  [Test Case]
  Modify garbage collection parameters by editing ceph.conf on the target rgw:
  ```
  rgw enable gc threads = false
  rgw gc obj min wait = 60
  rgw gc processor period = 60
  ```

  Restart the ceph-radosgw service to apply the new configuration:
  `sudo systemctl restart ceph-radosgw@rgw.$HOSTNAME`
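
  To confirm the running daemon picked up the new values, its configuration can be queried over the admin socket. This is a supplementary check, not part of the original steps, and the daemon name client.rgw.$HOSTNAME is an assumption that may differ per deployment:
  ```
  # Assumed admin socket name; adjust to match the socket under /var/run/ceph/
  sudo ceph daemon client.rgw.$HOSTNAME config show | grep -E 'rgw_enable_gc_threads|rgw_gc_obj_min_wait|rgw_gc_processor_period'
  ```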

  Repeatedly interrupt 512MB object put requests for randomized object names:
  ```
  for i in {0..1000}; do
    f=$(mktemp); fallocate -l 512M "$f"
    s3cmd put "$f" s3://test_bucket --disable-multipart &
    pid=$!
    # interrupt the upload after a random 3-9 second delay
    sleep $((RANDOM % 7 + 3)); kill $pid
    rm "$f"
  done
  ```
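
  Note: the loop above assumes s3cmd is already configured against this rgw endpoint and that the target bucket exists; if it does not, it can be created first:
  `s3cmd mb s3://test_bucket`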

  Delete all objects in the bucket index:
  ```
  for f in $(s3cmd ls s3://test_bucket | awk '{print $4}'); do
    s3cmd del "$f"
  done
  ```

  By default, rgw_max_gc_objs splits the garbage collection list into 32 shards.
  Capture the omap details for each shard and verify that zero-length chains were left behind:
  ```
  export CEPH_ARGS="--id=rgw.$HOSTNAME"
  for i in {0..31}; do 
    sudo -E rados -p default.rgw.log --namespace gc listomapvals gc.$i
  done
  ```
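
  As a supplementary check (a sketch, not part of the original test case), the number of leftover gc omap entries per shard can also be counted:
  ```
  for i in {0..31}; do
    echo -n "gc.$i: "
    # listomapkeys prints one key per gc entry; non-zero counts indicate leftover chains
    sudo -E rados -p default.rgw.log --namespace gc listomapkeys gc.$i | wc -l
  done
  ```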

  Confirm the garbage collection list contains expired objects by listing expiration timestamps:
  `sudo -E radosgw-admin gc list | grep time; date`

  Raise the debug level and process the garbage collection list:
  `sudo -E radosgw-admin --debug-rgw=20 --err-to-stderr gc process`

  Use the logs to verify the garbage collection process iterates through all remaining omap entry tags. Then confirm all rados objects have been cleaned up:
  `sudo -E rados -p default.rgw.buckets.data ls`
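
  The garbage collection list itself should also be empty once processing completes; --include-all lists entries regardless of expiration (a supplementary check, not in the original steps):
  `sudo -E radosgw-admin gc list --include-all`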

  [Regression Potential]
  The backport has been accepted into the upstream Luminous stable branch.

  [Other Information]
  This issue has been reported upstream [0] and was fixed in Nautilus alongside a number of other garbage collection issues/enhancements in pr#26601 [1]:
  * adds additional logging to make future debugging easier
  * resolves a bug where the truncated flag was not always set correctly in gc_iterate_entries
  * resolves a bug where the marker in RGWGC::process was not advanced
  * resolves a bug in which gc entries with a zero-length chain were not trimmed
  * resolves a bug where the same gc entry tag was added to the deletion list multiple times

  These fixes were slated for backport into Luminous and Mimic, but the
  Luminous work was not completed because of a required dependency: AIO
  GC [2]. This dependency has since been resolved upstream and is pending
  SRU verification in the Ubuntu packages [3].

  [0] https://tracker.ceph.com/issues/38454
  [1] https://github.com/ceph/ceph/pull/26601
  [2] https://tracker.ceph.com/issues/23223
  [3] https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/1838858

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1843085/+subscriptions


