[Bug 1698786] Re: cinder-volume fails on start when rbd pool contains partially deleted images

Jorge Niedbalski 1698786 at bugs.launchpad.net
Wed Feb 12 22:55:37 UTC 2020


** Patch added: "lp1698786-ocata.debdiff"
   https://bugs.launchpad.net/cinder/+bug/1698786/+attachment/5327772/+files/lp1698786-ocata.debdiff

** Summary changed:

- cinder-volume fails on start when rbd pool contains partially deleted images
+ [SRU] cinder-volume fails on start when rbd pool contains partially deleted images

** Description changed:

+ [Impact]
+ 
+  * The cinder-volume service gets marked as down when the rbd pool
+ contains partially deleted images.
+ 
+ [Test Case]
+ 
+ 1) Use this bundle (pastebin: http://paste.ubuntu.com/p/XxJPcs7YX9/)
+ 2) Force a volume deletion
+ 
+ root@juju-30736a-1698786-cinder-0:/home/ubuntu# rbd -p cinder-ceph info volume-ad50fecd-bc7e-47a7-81d8-9a48ff996a04
+ rbd image 'volume-ad50fecd-bc7e-47a7-81d8-9a48ff996a04':
+  size 10240 MB in 2560 objects
+  order 22 (4096 kB objects)
+  block_name_prefix: rbd_data.10f96b8b4567
+  format: 2
+  features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
+  flags:
+ root@juju-30736a-1698786-cinder-0:/home/ubuntu# rados -p cinder-ceph rm rbd_id.volume-ad50fecd-bc7e-47a7-81d8-9a48ff996a04
+ root@juju-30736a-1698786-cinder-0:/home/ubuntu# rbd -p cinder-ceph info volume-ad50fecd-bc7e-47a7-81d8-9a48ff996a04
+ rbd: error opening image volume-ad50fecd-bc7e-47a7-81d8-9a48ff996a04: (2) No such file or directory
+ 
+ 3) Wait for a few seconds. The following exception gets raised and the cinder-volume
+ service gets reported as down.
+ 
+ 2020-02-10 20:42:47.491 14050 ERROR cinder.volume.drivers.rbd [req-21a9da64-4d10-4c75-b5fd-4bb3328d6057 - - - - -] error opening rbd image volume-ad50fecd-bc7e-47a7-81d8-9a48ff996a04
+ 2020-02-10 20:42:47.491 14050 ERROR cinder.volume.drivers.rbd Traceback (most recent call last):
+ 2020-02-10 20:42:47.491 14050 ERROR cinder.volume.drivers.rbd File "/usr/lib/python2.7/dist-packages/cinder/volume/drivers/rbd.py", line 131, in __init__
+ 2020-02-10 20:42:47.491 14050 ERROR cinder.volume.drivers.rbd read_only=read_only)
+ 2020-02-10 20:42:47.491 14050 ERROR cinder.volume.drivers.rbd File "rbd.pyx", line 1061, in rbd.Image.__init__ (/build/ceph-eXkpH5/ceph-10.2.11/src/build/rbd.c:9939)
+ 2020-02-10 20:42:47.491 14050 ERROR cinder.volume.drivers.rbd ImageNotFound: error opening image volume-ad50fecd-bc7e-47a7-81d8-9a48ff996a04 at snapshot None
+ 2020-02-10 20:42:47.491 14050 ERROR cinder.volume.drivers.rbd
+ 
+ [Regression Potential]
+ 
+  * Minor backport, added a new exception to the try block.
+ 
+ [Other Info]
+ 
  If the `rbd_remove` image operation fails for some reason [*], the image
  being deleted may be left in a state where its data and part of its
  metadata (the rbd_header object) are deleted, but it still has an entry
  in the rbd_directory object. As a result, the image appears in
  `rbd_list` output but `open` fails.
  
  To calculate rbd pool capacity, cinder-volume scans the pool images
  using `rbd_list` and then tries to get each image's size by opening
  it. A partially removed image makes cinder-volume fail as shown
  below:
  
  2017-06-15 17:47:58.045 26352 ERROR cinder.volume.drivers.rbd [req-caa6f7fa-23c2-4972-b48a-264bcec6dbb1 - - - - -] error opening rbd image volume-099313f9-2f6f-4e86-9b46-8da16b138090
  2017-06-15 17:47:58.045 26352 ERROR cinder.volume.drivers.rbd Traceback (most recent call last):
  2017-06-15 17:47:58.045 26352 ERROR cinder.volume.drivers.rbd   File "/usr/lib/python2.7/dist-packages/cinder/volume/drivers/rbd.py", line 119, in __init__
  2017-06-15 17:47:58.045 26352 ERROR cinder.volume.drivers.rbd     read_only=read_only)
  2017-06-15 17:47:58.045 26352 ERROR cinder.volume.drivers.rbd   File "rbd.pyx", line 1061, in rbd.Image.__init__ (/build/ceph-25Z60r/ceph-10.2.7/src/build/rbd.c:9939)
  2017-06-15 17:47:58.045 26352 ERROR cinder.volume.drivers.rbd ImageNotFound: error opening image volume-099313f9-2f6f-4e86-9b46-8da16b138090 at snapshot None
  2017-06-15 17:47:58.045 26352 ERROR cinder.volume.drivers.rbd
  2017-06-15 17:47:58.050 26352 ERROR oslo_service.service [req-caa6f7fa-23c2-4972-b48a-264bcec6dbb1 - - - - -] Error starting thread.
  2017-06-15 17:47:58.050 26352 ERROR oslo_service.service Traceback (most recent call last):
  2017-06-15 17:47:58.050 26352 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/oslo_service/service.py", line 722, in run_service
  2017-06-15 17:47:58.050 26352 ERROR oslo_service.service     service.start()
  2017-06-15 17:47:58.050 26352 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/cinder/service.py", line 241, in start
  2017-06-15 17:47:58.050 26352 ERROR oslo_service.service     service_id=Service.service_id)
  2017-06-15 17:47:58.050 26352 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/cinder/volume/manager.py", line 442, in init_host
  2017-06-15 17:47:58.050 26352 ERROR oslo_service.service     self.driver.init_capabilities()
  2017-06-15 17:47:58.050 26352 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/cinder/volume/driver.py", line 719, in init_capabilities
  2017-06-15 17:47:58.050 26352 ERROR oslo_service.service     stats = self.get_volume_stats(True)
  2017-06-15 17:47:58.050 26352 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/cinder/volume/drivers/rbd.py", line 432, in get_volume_stats
  2017-06-15 17:47:58.050 26352 ERROR oslo_service.service     self._update_volume_stats()
  2017-06-15 17:47:58.050 26352 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/cinder/volume/drivers/rbd.py", line 418, in _update_volume_stats
  2017-06-15 17:47:58.050 26352 ERROR oslo_service.service     self._get_usage_info()
  2017-06-15 17:47:58.050 26352 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/cinder/volume/drivers/rbd.py", line 365, in _get_usage_info
  2017-06-15 17:47:58.050 26352 ERROR oslo_service.service     with RBDVolumeProxy(self, t, read_only=True) as v:
  2017-06-15 17:47:58.050 26352 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/cinder/volume/drivers/rbd.py", line 119, in __init__
  2017-06-15 17:47:58.050 26352 ERROR oslo_service.service     read_only=read_only)
  2017-06-15 17:47:58.050 26352 ERROR oslo_service.service   File "rbd.pyx", line 1061, in rbd.Image.__init__ (/build/ceph-25Z60r/ceph-10.2.7/src/build/rbd.c:9939)
  2017-06-15 17:47:58.050 26352 ERROR oslo_service.service ImageNotFound: error opening image volume-099313f9-2f6f-4e86-9b46-8da16b138090 at snapshot None
  2017-06-15 17:47:58.050 26352 ERROR oslo_service.service
  
  [*] Situations in which `rbd_remove` fails and leaves a partially
  removed image linked in rbd_directory cannot be avoided in the general
  case. The operation involves scanning and removing many objects and
  cannot be atomic. It may be interrupted for many different reasons:
  user intervention, a client crash, or a network or Ceph cluster error.
  For this reason, removal from rbd_directory is done as the last
  operation, so users can still see such images and complete the
  removal by rerunning `rbd remove`.
  
  Note that if `rbd_remove` fails for some reason it should return an
  error, so the failure can be detected.

-- 
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to Ubuntu Cloud Archive.
https://bugs.launchpad.net/bugs/1698786

Title:
  [SRU] cinder-volume fails on start when rbd pool contains partially
  deleted images

Status in Cinder:
  Fix Released
Status in Ubuntu Cloud Archive:
  Invalid
Status in Ubuntu Cloud Archive ocata series:
  Triaged

Bug description:
  [Impact]

   * The cinder-volume service gets marked as down when the rbd pool
  contains partially deleted images.

  [Test Case]

  1) Use this bundle (pastebin: http://paste.ubuntu.com/p/XxJPcs7YX9/)
  2) Force a volume deletion

  root@juju-30736a-1698786-cinder-0:/home/ubuntu# rbd -p cinder-ceph info volume-ad50fecd-bc7e-47a7-81d8-9a48ff996a04
  rbd image 'volume-ad50fecd-bc7e-47a7-81d8-9a48ff996a04':
   size 10240 MB in 2560 objects
   order 22 (4096 kB objects)
   block_name_prefix: rbd_data.10f96b8b4567
   format: 2
   features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
   flags:
  root@juju-30736a-1698786-cinder-0:/home/ubuntu# rados -p cinder-ceph rm rbd_id.volume-ad50fecd-bc7e-47a7-81d8-9a48ff996a04
  root@juju-30736a-1698786-cinder-0:/home/ubuntu# rbd -p cinder-ceph info volume-ad50fecd-bc7e-47a7-81d8-9a48ff996a04
  rbd: error opening image volume-ad50fecd-bc7e-47a7-81d8-9a48ff996a04: (2) No such file or directory

  3) Wait for a few seconds. The following exception gets raised and the cinder-volume
  service gets reported as down.

  2020-02-10 20:42:47.491 14050 ERROR cinder.volume.drivers.rbd [req-21a9da64-4d10-4c75-b5fd-4bb3328d6057 - - - - -] error opening rbd image volume-ad50fecd-bc7e-47a7-81d8-9a48ff996a04
  2020-02-10 20:42:47.491 14050 ERROR cinder.volume.drivers.rbd Traceback (most recent call last):
  2020-02-10 20:42:47.491 14050 ERROR cinder.volume.drivers.rbd File "/usr/lib/python2.7/dist-packages/cinder/volume/drivers/rbd.py", line 131, in __init__
  2020-02-10 20:42:47.491 14050 ERROR cinder.volume.drivers.rbd read_only=read_only)
  2020-02-10 20:42:47.491 14050 ERROR cinder.volume.drivers.rbd File "rbd.pyx", line 1061, in rbd.Image.__init__ (/build/ceph-eXkpH5/ceph-10.2.11/src/build/rbd.c:9939)
  2020-02-10 20:42:47.491 14050 ERROR cinder.volume.drivers.rbd ImageNotFound: error opening image volume-ad50fecd-bc7e-47a7-81d8-9a48ff996a04 at snapshot None
  2020-02-10 20:42:47.491 14050 ERROR cinder.volume.drivers.rbd

  [Regression Potential]

   * Minor backport, added a new exception to the try block.
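
  As a rough illustration of what "adding an exception to the try block"
  means for the usage scan, here is a minimal sketch. It is not the
  actual Cinder patch: `FakePool`, `list_images`, `image_size` and
  `total_provisioned_bytes` are hypothetical names, and `ImageNotFound`
  is a local stand-in for the exception of the same name seen in the
  traceback, so that the sketch runs without librbd installed.

```python
class ImageNotFound(Exception):
    """Local stand-in for the rbd ImageNotFound error (assumption)."""


class FakePool(object):
    """Hypothetical pool: `listed` mimics rbd_list output, while
    `sizes` only contains the images that can still be opened."""

    def __init__(self, listed, sizes):
        self.listed = listed
        self.sizes = sizes

    def list_images(self):
        return list(self.listed)

    def image_size(self, name):
        # Mimics opening an image: a listed-but-broken image raises.
        if name not in self.sizes:
            raise ImageNotFound("error opening image %s" % name)
        return self.sizes[name]


def total_provisioned_bytes(pool):
    """Sum image sizes, skipping images that are listed but unopenable."""
    total = 0
    skipped = []
    for name in pool.list_images():
        try:
            total += pool.image_size(name)
        except ImageNotFound:
            # A partially deleted image no longer aborts the whole scan.
            skipped.append(name)
    return total, skipped
```

  With this shape, one broken image only reduces the reported total and
  shows up in `skipped`; the exception does not propagate up to the
  service start-up path.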

  [Other Info]

  If the `rbd_remove` image operation fails for some reason [*], the image
  being deleted may be left in a state where its data and part of its
  metadata (the rbd_header object) are deleted, but it still has an entry
  in the rbd_directory object. As a result, the image appears in
  `rbd_list` output but `open` fails.

  To calculate rbd pool capacity, cinder-volume scans the pool images
  using `rbd_list` and then tries to get each image's size by opening
  it. A partially removed image makes cinder-volume fail as shown
  below:

  2017-06-15 17:47:58.045 26352 ERROR cinder.volume.drivers.rbd [req-caa6f7fa-23c2-4972-b48a-264bcec6dbb1 - - - - -] error opening rbd image volume-099313f9-2f6f-4e86-9b46-8da16b138090
  2017-06-15 17:47:58.045 26352 ERROR cinder.volume.drivers.rbd Traceback (most recent call last):
  2017-06-15 17:47:58.045 26352 ERROR cinder.volume.drivers.rbd   File "/usr/lib/python2.7/dist-packages/cinder/volume/drivers/rbd.py", line 119, in __init__
  2017-06-15 17:47:58.045 26352 ERROR cinder.volume.drivers.rbd     read_only=read_only)
  2017-06-15 17:47:58.045 26352 ERROR cinder.volume.drivers.rbd   File "rbd.pyx", line 1061, in rbd.Image.__init__ (/build/ceph-25Z60r/ceph-10.2.7/src/build/rbd.c:9939)
  2017-06-15 17:47:58.045 26352 ERROR cinder.volume.drivers.rbd ImageNotFound: error opening image volume-099313f9-2f6f-4e86-9b46-8da16b138090 at snapshot None
  2017-06-15 17:47:58.045 26352 ERROR cinder.volume.drivers.rbd
  2017-06-15 17:47:58.050 26352 ERROR oslo_service.service [req-caa6f7fa-23c2-4972-b48a-264bcec6dbb1 - - - - -] Error starting thread.
  2017-06-15 17:47:58.050 26352 ERROR oslo_service.service Traceback (most recent call last):
  2017-06-15 17:47:58.050 26352 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/oslo_service/service.py", line 722, in run_service
  2017-06-15 17:47:58.050 26352 ERROR oslo_service.service     service.start()
  2017-06-15 17:47:58.050 26352 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/cinder/service.py", line 241, in start
  2017-06-15 17:47:58.050 26352 ERROR oslo_service.service     service_id=Service.service_id)
  2017-06-15 17:47:58.050 26352 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/cinder/volume/manager.py", line 442, in init_host
  2017-06-15 17:47:58.050 26352 ERROR oslo_service.service     self.driver.init_capabilities()
  2017-06-15 17:47:58.050 26352 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/cinder/volume/driver.py", line 719, in init_capabilities
  2017-06-15 17:47:58.050 26352 ERROR oslo_service.service     stats = self.get_volume_stats(True)
  2017-06-15 17:47:58.050 26352 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/cinder/volume/drivers/rbd.py", line 432, in get_volume_stats
  2017-06-15 17:47:58.050 26352 ERROR oslo_service.service     self._update_volume_stats()
  2017-06-15 17:47:58.050 26352 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/cinder/volume/drivers/rbd.py", line 418, in _update_volume_stats
  2017-06-15 17:47:58.050 26352 ERROR oslo_service.service     self._get_usage_info()
  2017-06-15 17:47:58.050 26352 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/cinder/volume/drivers/rbd.py", line 365, in _get_usage_info
  2017-06-15 17:47:58.050 26352 ERROR oslo_service.service     with RBDVolumeProxy(self, t, read_only=True) as v:
  2017-06-15 17:47:58.050 26352 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/cinder/volume/drivers/rbd.py", line 119, in __init__
  2017-06-15 17:47:58.050 26352 ERROR oslo_service.service     read_only=read_only)
  2017-06-15 17:47:58.050 26352 ERROR oslo_service.service   File "rbd.pyx", line 1061, in rbd.Image.__init__ (/build/ceph-25Z60r/ceph-10.2.7/src/build/rbd.c:9939)
  2017-06-15 17:47:58.050 26352 ERROR oslo_service.service ImageNotFound: error opening image volume-099313f9-2f6f-4e86-9b46-8da16b138090 at snapshot None
  2017-06-15 17:47:58.050 26352 ERROR oslo_service.service

  [*] Situations in which `rbd_remove` fails and leaves a partially
  removed image linked in rbd_directory cannot be avoided in the general
  case. The operation involves scanning and removing many objects and
  cannot be atomic. It may be interrupted for many different reasons:
  user intervention, a client crash, or a network or Ceph cluster error.
  For this reason, removal from rbd_directory is done as the last
  operation, so users can still see such images and complete the
  removal by rerunning `rbd remove`.

  Note that if `rbd_remove` fails for some reason it should return an
  error, so the failure can be detected.
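
  To make the "listed but cannot be opened" condition concrete, the
  sketch below shows one way an operator-side helper might find such
  images and print the `rbd remove` commands to rerun. All names here
  (`find_partially_removed`, `removal_commands`, the `open_image`
  callback) are hypothetical, `ImageNotFound` is again a local stand-in
  for the exception in the traceback, and the helper only builds command
  strings for review; it does not run them.

```python
class ImageNotFound(Exception):
    """Local stand-in for the rbd ImageNotFound error (assumption)."""


def find_partially_removed(listed_names, open_image):
    """Return the names that are listed in the pool but fail to open.

    `open_image` is a caller-supplied callable (hypothetical) that
    raises ImageNotFound when an image cannot be opened.
    """
    broken = []
    for name in listed_names:
        try:
            open_image(name)
        except ImageNotFound:
            broken.append(name)
    return broken


def removal_commands(pool, names):
    """Build `rbd remove` command lines for an operator to review."""
    return ["rbd -p %s remove %s" % (pool, name) for name in names]
```

  Keeping detection separate from removal is deliberate in this sketch:
  an image that fails to open is only a candidate for cleanup, and the
  decision to rerun the removal stays with the operator.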

To manage notifications about this bug go to:
https://bugs.launchpad.net/cinder/+bug/1698786/+subscriptions


