[Bug 1704106] Re: [SRU] Gathering of thin provisioning stats breaks c-vol
Corey Bryant
corey.bryant at canonical.com
Thu Mar 28 20:29:50 UTC 2019
This is fixed in the cinder 11.2.1 point release for pike, so we might
as well just pick this up in a new round of point releases. We'll do
that via this bug: https://bugs.launchpad.net/cloud-archive/+bug/1822192
--
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to Ubuntu Cloud Archive.
https://bugs.launchpad.net/bugs/1704106
Title:
[SRU] Gathering of thin provisioning stats breaks c-vol
Status in Cinder:
Fix Released
Status in Ubuntu Cloud Archive:
Fix Released
Status in Ubuntu Cloud Archive ocata series:
Triaged
Status in Ubuntu Cloud Archive pike series:
Triaged
Bug description:
[Impact]
Backport of a config option, added in Queens, that allows disabling
the collection of stats from all rbd volumes, since this collection
causes tons of non-fatal race conditions and slows down deletes to
the point where the rpc thread pool fills up, blocking further
requests. Our charms do not configure the pool by default and we
are not aware of anyone doing this in the field, so this patch
enables this option by default.
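For reference, enabling the backported option is a one-line change
in cinder.conf, alongside the other rbd driver options in the
backend section (the [ceph] section name below is just an example;
use whatever backend section the deployment defines):

    [ceph]
    # Treat the pool as used exclusively by cinder so per-volume
    # stats collection can be skipped ([ceph] section name is an
    # example only).
    rbd_exclusive_cinder_pool = true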
[Test Case]
By default no change in behaviour should occur. To test the
new feature we need to enable it, as follows:
* deploy openstack ocata
* set rbd_exclusive_cinder_pool = true in cinder.conf
* create 100 volumes via cinder
* also create 100 volumes in the cinder pool, but using the rbd client directly (see the sketch after this list)
* delete the cinder volumes (via cinder) and delete the non-cinder rbd volumes using the rbd client
* ensure there are no exceptions in cinder-volume.log
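The non-cinder half of the test can be scripted with the rbd python
bindings. A minimal sketch, assuming the charm-default pool name
'cinder-ceph' and the standard ceph.conf path (both are assumptions;
adjust to the deployment):

    import rados
    import rbd

    GiB = 1024 ** 3

    # Connect to the cluster and open the pool cinder uses
    # ('cinder-ceph' is an assumed, charm-default pool name).
    with rados.Rados(conffile='/etc/ceph/ceph.conf') as cluster:
        ioctx = cluster.open_ioctx('cinder-ceph')
        try:
            r = rbd.RBD()
            names = ['non-cinder-%03d' % i for i in range(100)]
            for name in names:
                r.create(ioctx, name, GiB)  # 1 GiB image
            for name in names:
                r.remove(ioctx, name)
        finally:
            ioctx.close()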
[Regression Potential]
The default behaviour is unchanged, so no regression is expected.
==========================================================================
The gathering of the thin provisioning stats is done by looping over
all volumes:
https://github.com/openstack/cinder/blob/master/cinder/volume/drivers/rbd.py#L369
For larger deployments, this loop (done at start-up, upon volume
deletion, and as a periodic task) takes too long and hence breaks
the c-vol service.
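Schematically, the expensive part is that every image in the pool is
opened just to read its provisioned size. A simplified sketch of the
pattern (not the exact driver code):

    import rbd

    def total_provisioned_bytes(ioctx):
        # Open each image in the pool in turn; with thousands of
        # volumes this serial open/stat/close loop dominates.
        total = 0
        for name in rbd.RBD().list(ioctx):
            with rbd.Image(ioctx, name, read_only=True) as image:
                total += image.size()
        return total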
From what I understand, the overall idea of this stats gathering is to
bring the current real fill status of the pool to the admin's
attention in case over-subscription was configured. For this, a fill
status at the pool level (rather than the volume level) should be good
enough.
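For comparison, a pool-level fill figure is available from a single
call via the rados python bindings, independent of the number of
volumes (a sketch; pool name and conffile path are assumptions):

    import rados

    with rados.Rados(conffile='/etc/ceph/ceph.conf') as cluster:
        ioctx = cluster.open_ioctx('cinder-ceph')
        try:
            # One call for the whole pool, regardless of volume count.
            stats = ioctx.get_stats()
            print('pool bytes in use: %d' % stats['num_bytes'])
        finally:
            ioctx.close()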
To manage notifications about this bug go to:
https://bugs.launchpad.net/cinder/+bug/1704106/+subscriptions