[Bug 1969643] Re: RBD: Unable to delete a volume which has snapshot/volume children

Mauricio Faria de Oliveira 1969643 at bugs.launchpad.net
Wed Jun 12 19:16:38 UTC 2024


Chengen,

Here are my review notes on the changes:

This patch is concerning for a couple of reasons, if I understand correctly:
1) behavior changes that are potentially impactful to the storage backend;
2) new deployment configuration requirements that, if missed, could seriously
   impact the deployment.

In order to proceed, could you please discuss/confirm these with Ed (hopem),
Dan (hillpd), and if they both agree, get an ACK from James Page (james-page),
and record their feedback here?

I'll mark the bug task as Incomplete for now.

Thanks!
Mauricio

Details:

1) The behavior change (flattening dependent images on volume removal) seems
to introduce O(N) load on the cluster, in both storage consumption _and_
CPU load, as each dependent image is flattened: the data from the volume
being removed is copied into _each one_ of them, so a single delete operation
multiplies the storage consumption.

Without the patch, this would not happen, as the operation would fail.

@ https://docs.ceph.com/en/latest/rbd/rbd-snapshot/#flattening-a-cloned-image

When you remove the reference to the parent snapshot from the clone,
you effectively “flatten” the clone by copying the data stored 
in the snapshot to the clone.
The time it takes to flatten a clone increases with the size of the snapshot.

@ https://docs.ceph.com/en/latest/man/8/rbd/

If the image is a clone, copy all shared blocks from the parent snapshot
[...]

@ patch

++    cfg.IntOpt('rbd_concurrent_flatten_operations', default=3, min=0,
++               help='Number of flatten operations that will run '
++                    'concurrently on this volume service.')
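
To illustrate the concern with roughly equivalent rbd CLI operations (the
driver does this through librbd; the pool and image names below are made up
for the example):

# list the clones that depend on a snapshot of the volume being deleted
rbd children volumes/volume-1111@snapshot-2222
# each listed child must be flattened, i.e. the shared data is fully copied
# into it, before the parent image can be removed
rbd flatten volumes/volume-3333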


2) The patch's release note says that deployments now must enable one of the RBD trash purging options.

This seems potentially problematic if it is missed, especially with automatic
upgrades: if neither option is enabled, trashed images could gradually consume
a lot of storage and not be removed until someone notices (when it might be
too late)?

++    Cinder now uses the RBD trash functionality to handle some volume deletions.
++    Therefore, deployments must either a) enable scheduled RBD trash purging on
++    the RBD backend or b) enable the Cinder RBD driver's enable_deferred_deletion
++    option to have Cinder purge the RBD trash.
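
For reference, the two alternatives would look roughly like this (the pool
name and purge interval are illustrative, not prescriptive):

# a) schedule periodic RBD trash purging on the Ceph side (Octopus or later):
rbd trash purge schedule add --pool volumes 1d

# b) or let Cinder purge the trash itself, by setting, in the RBD backend
#    section of cinder.conf:
#    enable_deferred_deletion = True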


** Changed in: cinder (Ubuntu Jammy)
       Status: In Progress => Incomplete

-- 
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to cinder in Ubuntu.
https://bugs.launchpad.net/bugs/1969643

Title:
  RBD: Unable to delete a volume which has snapshot/volume children

Status in Cinder:
  Fix Released
Status in cinder package in Ubuntu:
  Fix Released
Status in cinder source package in Jammy:
  Incomplete
Status in cinder source package in Mantic:
  Won't Fix
Status in cinder source package in Noble:
  Fix Released
Status in cinder source package in Oracular:
  Fix Released

Bug description:
  [Impact]
  Deleting a volume will fail if it has snapshot or volume children, resulting in an ImageBusy error.

  [Fix]
  Upstream has a patch that uses RBD flatten operations to break dependencies between volumes and snapshots, reducing failures when using RBD volume clones and snapshots.

  commit 1a675c9aa178c6d9c6ed10fd98f086c46d350d3f
  Author:     Eric Harney <eharney at redhat.com>
  CommitDate: Fri Dec 1 10:17:05 2023 -0500

      RBD: Flattening of child volumes during deletion

  [Test Plan]
  1. Prepare an OpenStack environment with cinder-ceph
  2. Create a volume named "vol"
  openstack volume create --image jammy --size 10 vol
  3. Create a snapshot of the volume "vol"
  openstack volume snapshot create --volume vol vol-snap
  4. Create a volume named "vol-copy" from the snapshot
  openstack volume create --snapshot vol-snap vol-copy
  5. Delete the snapshot and then delete the volume "vol"
  openstack volume snapshot delete vol-snap
  # ^ This would fail with ImageBusy previously (see patch's "For example")
  openstack volume delete vol
  # ^ This would possibly fail previously (see patch's step "4.")
  6. Confirm that the volume "vol" is successfully deleted
  openstack volume list
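
  (As an optional check on the Ceph side, the RBD image backing "vol" should be
  gone, or only left in the RBD trash until purged; the pool name is
  deployment-specific, "cinder-ceph" here simply follows the application name
  from step 1:)
  rbd -p cinder-ceph ls
  rbd -p cinder-ceph trash ls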

  [Where problems could occur]
  The patch primarily modifies the volume deletion workflow when using RBD as the backend, and adds a retry mechanism for unprotecting snapshots during snapshot deletion.
  If the patch has any undiscovered issues, they would only affect volume deletion; other functionality and non-RBD backends are not impacted.

To manage notifications about this bug go to:
https://bugs.launchpad.net/cinder/+bug/1969643/+subscriptions



