[Bug 1900775] Re: Cinder fails to create image-based volume if mirroring is enabled

Corey Bryant 1900775 at bugs.launchpad.net
Thu Jan 28 18:48:22 UTC 2021


** Description changed:

- [Impact | Test Case]
+ [Impact]
  
  OpenStack Train, Ceph Nautilus, ceph-rbd-mirror is deployed in dual-way.
  Cinder has Ceph as backend for volumes.
  
  Creating a volume from qcow2 image.
  Current flow is the following:
  1. Cinder creates empty volume in Ceph
  2. Cinder downloads the image
  3. Image is converted to raw
  4. Volume is deleted https://github.com/openstack/cinder/blob/master/cinder/volume/drivers/rbd.py#L1611
  5. Cinder performs "rbd import" using unpacked raw image as source.
  
  Apparently rbd-mirror daemon creates a snapshot upon the image creation in Ceph (seems for mirroring purposes) which for empty image lasts for about a second.
  It happens that step4 may be performed (very often) during the period of time when the snapshot exists, and it fails with "Cannot delete the volume with snapshots" error.
  The only way to fix this behaviour - disable mirroring of the backend pool which is not desired.
  
  [Regression Potential]
  This is a very minimal change that simply adds a retry when exception.VolumeIsBusy is encountered. The frequency and number of retries are configurable via rados_connection_interval and rados_connection_retries. Worst case scenario, if these are exceeded, the original error will be encountered.
  
+ [Test Case]
+ This is a light-weight test to ensure the code is working as expected, using the unit test from the patch:
+ 
+ lxc launch ubuntu-daily:hirsute h1
+ lxc exec h1 /bin/bash
+ root@h1:~# sudo apt install python3-cinder
+ root@h1:~# cd /usr/lib/python3/dist-packages/
+ /usr/lib/python3/dist-packages/oslo_db/sqlalchemy/enginefacade.py:359: OsloDBDeprecationWarning: EngineFacade is deprecated; please use oslo_db.sqlalchemy.enginefacade
+   self._legacy_facade = LegacyEngineFacade(None, _factory=self)
+ .
+ ----------------------------------------------------------------------
+ Ran 1 test in 0.701s
+ 
+ OK
+ 
+ The test will fail if the fixed code is not installed.
+ 
  [Discussion]
  This is accompanied by a unit test fix for: https://pad.lv/1913607

** Description changed:

  [Impact]
  
  OpenStack Train, Ceph Nautilus, ceph-rbd-mirror is deployed in dual-way.
  Cinder has Ceph as backend for volumes.
  
  Creating a volume from qcow2 image.
  Current flow is the following:
  1. Cinder creates empty volume in Ceph
  2. Cinder downloads the image
  3. Image is converted to raw
  4. Volume is deleted https://github.com/openstack/cinder/blob/master/cinder/volume/drivers/rbd.py#L1611
  5. Cinder performs "rbd import" using unpacked raw image as source.
  
  Apparently rbd-mirror daemon creates a snapshot upon the image creation in Ceph (seems for mirroring purposes) which for empty image lasts for about a second.
  It happens that step4 may be performed (very often) during the period of time when the snapshot exists, and it fails with "Cannot delete the volume with snapshots" error.
  The only way to fix this behaviour - disable mirroring of the backend pool which is not desired.
- 
- [Regression Potential]
- This is a very minimal change that simply adds a retry when exception.VolumeIsBusy is encountered. The frequency and number of retries are configurable via rados_connection_interval and rados_connection_retries. Worst case scenario, if these are exceeded, the original error will be encountered.
  
  [Test Case]
  This is a light-weight test to ensure the code is working as expected, using the unit test from the patch:
  
  lxc launch ubuntu-daily:hirsute h1
  lxc exec h1 /bin/bash
  root@h1:~# sudo apt install python3-cinder
  root@h1:~# cd /usr/lib/python3/dist-packages/
  /usr/lib/python3/dist-packages/oslo_db/sqlalchemy/enginefacade.py:359: OsloDBDeprecationWarning: EngineFacade is deprecated; please use oslo_db.sqlalchemy.enginefacade
    self._legacy_facade = LegacyEngineFacade(None, _factory=self)
  .
  ----------------------------------------------------------------------
  Ran 1 test in 0.701s
  
  OK
  
  The test will fail if the fixed code is not installed.
  
+ 
+ [Regression Potential]
+ This is a very minimal change that simply adds a retry when exception.VolumeIsBusy is encountered. The frequency and number of retries are configurable via rados_connection_interval and rados_connection_retries. Worst case scenario, if these are exceeded, the original error will be encountered.
+ 
  [Discussion]
  This is accompanied by a unit test fix for: https://pad.lv/1913607

-- 
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to cinder in Ubuntu.
https://bugs.launchpad.net/bugs/1900775

Title:
  Cinder fails to create image-based volume if mirroring is enabled

Status in charm-ceph-rbd-mirror:
  Invalid
Status in Cinder:
  In Progress
Status in Ubuntu Cloud Archive:
  Triaged
Status in Ubuntu Cloud Archive mitaka series:
  Triaged
Status in Ubuntu Cloud Archive queens series:
  Triaged
Status in Ubuntu Cloud Archive stein series:
  Triaged
Status in Ubuntu Cloud Archive train series:
  Triaged
Status in Ubuntu Cloud Archive ussuri series:
  Triaged
Status in Ubuntu Cloud Archive victoria series:
  Triaged
Status in cinder package in Ubuntu:
  Fix Released
Status in cinder source package in Xenial:
  Triaged
Status in cinder source package in Bionic:
  Triaged
Status in cinder source package in Focal:
  Triaged
Status in cinder source package in Groovy:
  Triaged

Bug description:
  [Impact]

  Environment: OpenStack Train and Ceph Nautilus, with ceph-rbd-mirror deployed for two-way mirroring.
  Cinder uses Ceph as the backend for volumes.

  When creating a volume from a qcow2 image, the current flow is as follows:
  1. Cinder creates an empty volume in Ceph.
  2. Cinder downloads the image.
  3. The image is converted to raw.
  4. The volume is deleted: https://github.com/openstack/cinder/blob/master/cinder/volume/drivers/rbd.py#L1611
  5. Cinder performs "rbd import" using the converted raw image as the source.

  The rbd-mirror daemon apparently creates a snapshot when the image is created in Ceph (seemingly for mirroring purposes); for an empty image this snapshot lasts about a second.
  Step 4 is very often performed while the snapshot still exists, in which case it fails with a "Cannot delete the volume with snapshots" error.
  The only workaround is to disable mirroring of the backend pool, which is not desirable.
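
The race described above can be illustrated with a small in-memory model. This is only a sketch: `FakeMirroredImage`, `VolumeHasSnapshots` and the injectable `clock` are hypothetical names invented for illustration, not part of Cinder or Ceph, and the one-second snapshot lifetime is taken from the rough estimate in the report.

```python
import time


class VolumeHasSnapshots(Exception):
    """Hypothetical stand-in for the 'Cannot delete the volume with snapshots' error."""


class FakeMirroredImage:
    """In-memory model of an image that carries a short-lived snapshot after creation.

    Assumption: the snapshot simply disappears once `snapshot_lifetime`
    seconds have passed since the image was created.
    """

    def __init__(self, snapshot_lifetime=1.0, clock=time.monotonic):
        self._clock = clock
        self._snapshot_expires = clock() + snapshot_lifetime
        self.deleted = False

    def has_snapshot(self):
        # The snapshot exists until its expiry time is reached.
        return self._clock() < self._snapshot_expires

    def delete(self):
        # Mirrors step 4: deletion is refused while the snapshot exists.
        if self.has_snapshot():
            raise VolumeHasSnapshots("Cannot delete the volume with snapshots")
        self.deleted = True
```

Deleting the model image immediately after creating it raises the error, while the same call made after the snapshot lifetime has elapsed succeeds, which matches the timing-dependent failure in the report.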

  [Test Case]
  This is a light-weight test to ensure the code is working as expected, using the unit test from the patch:

  lxc launch ubuntu-daily:hirsute h1
  lxc exec h1 /bin/bash
  root@h1:~# sudo apt install python3-cinder
  root@h1:~# cd /usr/lib/python3/dist-packages/
  /usr/lib/python3/dist-packages/oslo_db/sqlalchemy/enginefacade.py:359: OsloDBDeprecationWarning: EngineFacade is deprecated; please use oslo_db.sqlalchemy.enginefacade
    self._legacy_facade = LegacyEngineFacade(None, _factory=self)
  .
  ----------------------------------------------------------------------
  Ran 1 test in 0.701s

  OK

  The test will fail if the fixed code is not installed.

  
  [Regression Potential]
  This is a minimal change that adds a retry when exception.VolumeIsBusy is encountered. The interval between retries and the number of retries are configurable via rados_connection_interval and rados_connection_retries. In the worst case, if the retries are exhausted, the original error is still raised.
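
The retry behaviour described above might look roughly like the following. This is a sketch, not the actual patch: `delete_with_retry` is a hypothetical helper, `VolumeIsBusy` is a local stand-in for Cinder's exception.VolumeIsBusy, and the `retries` and `interval` parameters only play the role of rados_connection_retries and rados_connection_interval. Whether the real options count total attempts or additional attempts is not stated in this report; the sketch assumes additional attempts.

```python
import time


class VolumeIsBusy(Exception):
    """Local stand-in for cinder's exception.VolumeIsBusy."""


def delete_with_retry(delete, retries=3, interval=5, sleep=time.sleep):
    """Call delete(); on VolumeIsBusy, wait `interval` seconds and try again.

    At most `retries` additional attempts are made (an assumption). Once
    they are exhausted, the original VolumeIsBusy propagates to the caller.
    """
    attempt = 0
    while True:
        try:
            return delete()
        except VolumeIsBusy:
            if attempt >= retries:
                raise  # worst case: the original error is still raised
            attempt += 1
            sleep(interval)
```

Passing `sleep` as a parameter keeps the helper easy to exercise in a unit test without real delays.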

  [Discussion]
  This is accompanied by a unit test fix for: https://pad.lv/1913607

To manage notifications about this bug go to:
https://bugs.launchpad.net/charm-ceph-rbd-mirror/+bug/1900775/+subscriptions
