[Bug 1843085] Re: Backport of zero-length gc chain fixes to Luminous
Launchpad Bug Tracker
1843085 at bugs.launchpad.net
Mon Jan 20 16:51:44 UTC 2020
This bug was fixed in the package ceph - 12.2.12-0ubuntu0.18.04.4
---------------
ceph (12.2.12-0ubuntu0.18.04.4) bionic; urgency=medium
[ Billy Olsen ]
* Do not validate fs caps on authorize (LP: #1847822):
- d/p/dont-validate-fs-caps-on-authorize.patch: Do not validate
the filesystem caps with a new client connection to the monitor
when authorizing a client connection.
[ Dan Hill ]
* d/p/issue38454.patch: Cherry pick of fixes for misc RGW bugs
and cleanup of garbage collection code (LP: #1843085).
[ Dariusz Gadomski ]
* d/p/issue37490.patch: Cherry pick fix to optimize LVM queries
in ceph-volume, resolving performance issues in systems under
heavy load or with large numbers of disks (LP: #1850754).
-- James Page <james.page at ubuntu.com> Thu, 28 Nov 2019 10:27:34 +0000
** Changed in: ceph (Ubuntu Bionic)
Status: Fix Committed => Fix Released
--
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to Ubuntu Cloud Archive.
https://bugs.launchpad.net/bugs/1843085
Title:
Backport of zero-length gc chain fixes to Luminous
Status in Ubuntu Cloud Archive:
Invalid
Status in Ubuntu Cloud Archive queens series:
Fix Committed
Status in Ubuntu Cloud Archive rocky series:
Fix Released
Status in ceph package in Ubuntu:
Invalid
Status in ceph source package in Bionic:
Fix Released
Bug description:
[Impact]
Cancelling large S3/Swift object puts may result in garbage collection entries with zero-length chains. RADOS Gateway (rgw) garbage collection does not efficiently process and clean up these zero-length chains.
A large number of zero-length chains causes rgw processes to spin
quickly through the garbage collection lists while doing very little
work. This can result in abnormally high CPU utilization and op
workloads.
[Test Case]
Modify garbage collection parameters by editing ceph.conf on the target rgw:
```
rgw enable gc threads = false
rgw gc obj min wait = 60
rgw gc processor period = 60
```
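Before restarting the service, it can help to confirm that the three settings actually landed in the file. The following is a minimal sketch, not part of the original test case: `check_gc_conf` is a hypothetical helper name, and it assumes the settings use the plain `key = value` form shown above (with either spaces or underscores in the key). It does not check which ceph.conf section the settings sit in.

```shell
# check_gc_conf FILE
# Succeeds only if FILE sets all three gc test parameters to the values above.
# Hypothetical helper; it ignores ceph.conf sections entirely.
check_gc_conf() (
    conf=$1
    missing=0
    for want in \
        "rgw_enable_gc_threads=false" \
        "rgw_gc_obj_min_wait=60" \
        "rgw_gc_processor_period=60"
    do
        # Normalise each line: drop whitespace around '=', trim both ends,
        # then turn spaces inside the key into underscores.
        if ! sed -E 's/[[:space:]]*=[[:space:]]*/=/; s/^[[:space:]]+//; s/[[:space:]]+$//; s/ /_/g' "$conf" \
            | grep -qxF "$want"
        then
            echo "missing: $want" >&2
            missing=1
        fi
    done
    # The function's exit status is the result of this final test.
    [ "$missing" -eq 0 ]
)
```

For example, `check_gc_conf /etc/ceph/ceph.conf && echo "gc test settings present"`, assuming the configuration lives at the conventional default path.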
Restart the ceph-radosgw service to apply the new configuration:
`sudo systemctl restart ceph-radosgw@rgw.$HOSTNAME`
Repeatedly interrupt 512MB object put requests for randomized object names:
```
for i in {0..1000}; do
# mktemp supplies the randomized object name
f=$(mktemp); fallocate -l 512M "$f"
s3cmd put "$f" s3://test_bucket --disable-multipart &
pid=$!
# interrupt the upload after 3 to 9 seconds
sleep $((RANDOM % 7 + 3)); kill "$pid"
rm "$f"
done
```
Delete all objects in the bucket index:
```
for f in $(s3cmd ls s3://test_bucket | awk '{print $4}'); do
s3cmd del "$f"
done
```
By default rgw_gc_max_objs splits the garbage collection list into 32 shards.
Capture omap detail and verify zero-length chains were left over:
```
export CEPH_ARGS="--id=rgw.$HOSTNAME"
for i in {0..31}; do
sudo -E rados -p default.rgw.log --namespace gc listomapvals gc.$i
done
```
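The loop above hard-codes the default of 32 shards. On a cluster where the shard count has been changed, the object names can be generated instead. This is a small sketch under that assumption; `gc_shard_names` is a hypothetical helper, and the `gc.N` naming is taken from the loop above.

```shell
# gc_shard_names [COUNT]
# Print the gc shard object names gc.0 .. gc.(COUNT-1), one per line.
# COUNT defaults to 32, the default shard count mentioned above.
gc_shard_names() (
    n=${1:-32}
    i=0
    while [ "$i" -lt "$n" ]; do
        printf 'gc.%d\n' "$i"
        i=$((i + 1))
    done
)
```

It could then drive the same rados call, for example `for obj in $(gc_shard_names 32); do sudo -E rados -p default.rgw.log --namespace gc listomapvals "$obj"; done`.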
Confirm the garbage collection list contains expired objects by listing expiration timestamps:
`sudo -E radosgw-admin gc list | grep time; date`
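Comparing the timestamps against `date` by eye gets tedious with many entries. The sketch below counts how many listed entries are already past their expiration time. Several things here are assumptions rather than facts from this report: `count_expired` is a hypothetical helper, the `"time"` values are assumed to begin with `YYYY-MM-DD HH:MM:SS`, they are treated as UTC, and GNU `date -d` is required.

```shell
# count_expired [NOW_EPOCH]
# Read `radosgw-admin gc list` output on stdin and print EXPIRED/TOTAL,
# where EXPIRED counts "time" values at or before NOW_EPOCH (default: now).
# Assumes timestamps start with "YYYY-MM-DD HH:MM:SS" and are in UTC.
count_expired() (
    now=${1:-$(date +%s)}
    expired=0
    total=0
    while IFS= read -r line; do
        case "$line" in
            *'"time"'*) ;;      # only lines carrying a time field
            *) continue ;;
        esac
        # Keep the date and time, dropping any fractional-second suffix.
        ts=$(printf '%s\n' "$line" | sed -E 's/.*"time": *"([0-9-]+ [0-9:]+).*/\1/')
        # Skip lines whose timestamp cannot be parsed (GNU date only).
        epoch=$(date -u -d "$ts" +%s 2>/dev/null) || continue
        total=$((total + 1))
        if [ "$epoch" -le "$now" ]; then
            expired=$((expired + 1))
        fi
    done
    echo "$expired/$total"
)
```

Usage would look like `sudo -E radosgw-admin gc list | count_expired`. If the timestamps turn out to be in local time rather than UTC, the `-u` flag should be dropped.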
Raise the debug level and process the garbage collection list:
`sudo -E radosgw-admin --debug-rgw=20 --err-to-stderr gc process`
Use the logs to verify the garbage collection process iterates through all remaining omap entry tags. Then confirm all rados objects have been cleaned up:
`sudo -E rados -p default.rgw.buckets.data ls`
[Regression Potential]
The backport has been accepted into the upstream Luminous stable branch.
[Other Information]
This issue has been reported upstream [0] and was fixed in Nautilus alongside a number of other garbage collection issues/enhancements in pr#26601 [1]:
* adds logging to make future debugging easier
* resolves a bug where the truncated flag was not always set correctly in gc_iterate_entries
* resolves a bug where the marker in RGWGC::process was not advanced
* resolves a bug in which gc entries with a zero-length chain were not trimmed
* resolves a bug where the same gc entry tag was added to the deletion list multiple times
These fixes were slated for backport into Luminous and Mimic, but the
Luminous work was not completed because of a required dependency: AIO
GC [2]. This dependency has since been resolved upstream and is pending
SRU verification in Ubuntu packages [3].
[0] https://tracker.ceph.com/issues/38454
[1] https://github.com/ceph/ceph/pull/26601
[2] https://tracker.ceph.com/issues/23223
[3] https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/1838858
To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1843085/+subscriptions