[Bug 1900775] Re: Cinder fails to create image-based volume if mirroring is enabled

Corey Bryant 1900775 at bugs.launchpad.net
Thu Jan 28 16:28:17 UTC 2021


** Description changed:

+ [Impact | Test Case]
+ 
  OpenStack Train, Ceph Nautilus, ceph-rbd-mirror deployed in two-way (bidirectional) mode.
  Cinder uses Ceph as the backend for volumes.
  
  Creating a volume from a qcow2 image.
  The current flow is the following:
  1. Cinder creates an empty volume in Ceph
  2. Cinder downloads the image
  3. The image is converted to raw
  4. The volume is deleted: https://github.com/openstack/cinder/blob/master/cinder/volume/drivers/rbd.py#L1611
  5. Cinder performs "rbd import" using the unpacked raw image as the source.
  
  Apparently the rbd-mirror daemon creates a snapshot upon image creation in Ceph (seemingly for mirroring purposes), which for an empty image lasts for about a second.
  Step 4 is very often performed during the period when this snapshot exists, in which case it fails with a "Cannot delete the volume with snapshots" error.
  The only way to avoid this behaviour is to disable mirroring of the backend pool, which is not desired.
+ 
+ [Regression Potential]
+ This is a very minimal change that simply adds a retry when exception.VolumeIsBusy is encountered. The frequency and number of retries are configurable via rados_connection_interval and rados_connection_retries. In the worst case, if these are exceeded, the original error is raised.
+ 
+ [Discussion]
+ This is accompanied by a unit test fix for: https://pad.lv/1913607

-- 
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to cinder in Ubuntu.
https://bugs.launchpad.net/bugs/1900775

Title:
  Cinder fails to create image-based volume if mirroring is enabled

Status in charm-ceph-rbd-mirror:
  Invalid
Status in Cinder:
  In Progress
Status in Ubuntu Cloud Archive:
  Triaged
Status in Ubuntu Cloud Archive mitaka series:
  Triaged
Status in Ubuntu Cloud Archive queens series:
  Triaged
Status in Ubuntu Cloud Archive stein series:
  Triaged
Status in Ubuntu Cloud Archive train series:
  Triaged
Status in Ubuntu Cloud Archive ussuri series:
  Triaged
Status in Ubuntu Cloud Archive victoria series:
  Triaged
Status in cinder package in Ubuntu:
  Fix Released
Status in cinder source package in Xenial:
  Triaged
Status in cinder source package in Bionic:
  Triaged
Status in cinder source package in Focal:
  Triaged
Status in cinder source package in Groovy:
  Triaged

Bug description:
  [Impact | Test Case]

  OpenStack Train, Ceph Nautilus, ceph-rbd-mirror deployed in two-way (bidirectional) mode.
  Cinder uses Ceph as the backend for volumes.

  Creating a volume from a qcow2 image.
  The current flow is the following:
  1. Cinder creates an empty volume in Ceph
  2. Cinder downloads the image
  3. The image is converted to raw
  4. The volume is deleted: https://github.com/openstack/cinder/blob/master/cinder/volume/drivers/rbd.py#L1611
  5. Cinder performs "rbd import" using the unpacked raw image as the source.

  Apparently the rbd-mirror daemon creates a snapshot upon image creation in Ceph (seemingly for mirroring purposes), which for an empty image lasts for about a second.
  Step 4 is very often performed during the period when this snapshot exists, in which case it fails with a "Cannot delete the volume with snapshots" error; a sketch of where this window sits in the flow is shown below.
  The only way to avoid this behaviour is to disable mirroring of the backend pool, which is not desired.
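
  For illustration only, here is a minimal sketch of steps 3-5 as they might
  look when driven from Python. The pool, volume and file names are made up,
  and the real Cinder RBD driver uses the rbd/rados Python bindings rather
  than the CLI; the failure window lies between the deletion in step 4 and
  the disappearance of the short-lived mirror snapshot:

    # Hypothetical illustration; all names below are placeholders.
    import subprocess

    pool = "cinder-ceph"
    volume = "volume-00000000-0000-0000-0000-000000000000"
    qcow2_path = "/tmp/image.qcow2"
    raw_path = "/tmp/image.raw"

    # Step 3: convert the downloaded image to raw.
    subprocess.check_call(
        ["qemu-img", "convert", "-f", "qcow2", "-O", "raw", qcow2_path, raw_path])

    # Step 4: delete the empty volume so it can be replaced by the import.
    # If rbd-mirror still holds its snapshot on the image, this is where the
    # "Cannot delete the volume with snapshots" failure surfaces in Cinder.
    subprocess.check_call(["rbd", "rm", "{}/{}".format(pool, volume)])

    # Step 5: import the raw file under the original volume name.
    subprocess.check_call(["rbd", "import", raw_path, "{}/{}".format(pool, volume)])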

  [Regression Potential]
  This is a very minimal change that simply adds a retry when exception.VolumeIsBusy is encountered. The frequency and number of retries are configurable via rados_connection_interval and rados_connection_retries. In the worst case, if these are exceeded, the original error is raised.
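
  As a rough sketch of the retry behaviour described above (not the upstream
  patch itself; VolumeIsBusy here is a stand-in for cinder.exception.VolumeIsBusy,
  and the interval/retries arguments play the role of rados_connection_interval
  and rados_connection_retries):

    import time

    class VolumeIsBusy(Exception):
        """Stand-in for cinder.exception.VolumeIsBusy."""

    def delete_with_retry(delete_fn, interval=5, retries=3):
        """Retry a volume deletion while a transient mirror snapshot exists."""
        for attempt in range(retries + 1):
            try:
                return delete_fn()
            except VolumeIsBusy:
                if attempt == retries:
                    raise  # retries exhausted: the original error surfaces
                time.sleep(interval)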

  [Discussion]
  This is accompanied by a unit test fix for: https://pad.lv/1913607

To manage notifications about this bug go to:
https://bugs.launchpad.net/charm-ceph-rbd-mirror/+bug/1900775/+subscriptions


