[Bug 2019190] [NEW] [SRU][RBD] Retyping of in-use boot volumes renders instances unusable (possible data corruption)

Launchpad Bug Tracker <2019190@bugs.launchpad.net>
Wed Jan 24 16:18:43 UTC 2024


You have been subscribed to a public bug by Ubuntu Foundations Team Bug Bot (crichton):

[Impact]

See the bug description for full details, but the short summary is that a
patch that landed in the Wallaby release introduced a regression whereby
retyping an in-use volume leaves the attached volume in an inconsistent
state, with the potential for data corruption. The result is that the VM
does not receive updated connection_info from Cinder and keeps pointing
to the old volume, even after a reboot.

[Test Plan]

* Deploy OpenStack with two Cinder RBD storage backends (different pools)
* Create two volume types, one per backend (see the sketch after this test plan)
* Boot a VM from volume: openstack server create --wait --image jammy --flavor m1.small --key-name testkey --nic net-id=8c74f1ef-9231-46f4-a492-eccdb7943ecd testvm --boot-from-volume 10
* Retype the volume to typeB: openstack volume set --type typeB --retype-policy on-demand <volume>
* Go to the compute host running the VM and check (e.g. with virsh dumpxml) that the VM is now copying data to the new location:

    <disk type='network' device='disk'>
      <driver name='qemu' type='raw' cache='none' discard='unmap'/>
      <auth username='cinder-ceph'>
        <secret type='ceph' uuid='01b65a79-22a3-4672-80e7-5a47b0e5581a'/>
      </auth>
      <source protocol='rbd' name='cinder-ceph/volume-b68be47d-f526-4f98-a77b-a903bf8b6c65' index='1'>
        <host name='10.5.2.236' port='6789'/>
      </source>
      <mirror type='network' job='copy'>
        <format type='raw'/>
        <source protocol='rbd' name='cinder-ceph-alt/volume-c6b55b4c-a540-4c39-ad1f-626c964ae3e1' index='2'>
          <host name='10.5.2.236' port='6789'/>
          <auth username='cinder-ceph-alt'>
            <secret type='ceph' uuid='e089e27e-3a2f-49d6-b6d9-770f52177eb1'/>
          </auth>
        </source>
        <backingStore/>
      </mirror>
      <target dev='vda' bus='virtio'/>
      <serial>b68be47d-f526-4f98-a77b-a903bf8b6c65</serial>
      <alias name='virtio-disk0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </disk>

which will eventually settle and change to:

    <disk type='network' device='disk'>
      <driver name='qemu' type='raw' cache='none' discard='unmap'/>
      <auth username='cinder-ceph-alt'>
        <secret type='ceph' uuid='e089e27e-3a2f-49d6-b6d9-770f52177eb1'/>
      </auth>
      <source protocol='rbd' name='cinder-ceph-alt/volume-c6b55b4c-a540-4c39-ad1f-626c964ae3e1' index='2'>
        <host name='10.5.2.236' port='6789'/>
      </source>
      <backingStore/>
      <target dev='vda' bus='virtio'/>
      <serial>b68be47d-f526-4f98-a77b-a903bf8b6c65</serial>
      <alias name='virtio-disk0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </disk>

* And lastly, a reboot of the VM should be successful.
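
For reference, the two volume types can be created along these lines (the type and
backend names here are illustrative; volume_backend_name must match the backends
configured in cinder.conf):

    openstack volume type create typeA
    openstack volume type set --property volume_backend_name=rbd-a typeA
    openstack volume type create typeB
    openstack volume type set --property volume_backend_name=rbd-b typeB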

[Regression Potential]
Given that the current state is potential data corruption and the patch fixes this by successfully refreshing the connection info, I do not see any significant regression potential. It is in fact fixing a regression.

-------------------------------------------------------------------------

While trying out the volume retype feature in Cinder, we noticed that after an instance is
rebooted it will not come back online and is stuck in an error state, or, if it does come
back online, its filesystem is corrupted.

## Observations

Say there are two volume types, `fast` (stored in the Ceph pool `volumes`) and `slow`
(stored in the Ceph pool `volumes.hdd`). Before the retyping we can see that the volume
is present in the `volumes.hdd` pool, for example, and has a watcher accessing it.

```sh
[ceph: root@mon0 /]# rbd ls volumes.hdd
volume-81cfbafc-4fbb-41b0-abcb-8ec7359d0bf9

[ceph: root@mon0 /]# rbd status volumes.hdd/volume-81cfbafc-4fbb-41b0-abcb-8ec7359d0bf9
Watchers:
        watcher=[2001:XX:XX:XX::10ad]:0/3914407456 client.365192 cookie=140370268803456
```

Starting the retyping process with the migration policy `on-demand` for that volume, either
via the Horizon dashboard or the CLI, causes the volume to be correctly transferred to the
`volumes` pool within the Ceph cluster. However, the watcher does not get transferred, so
nobody is accessing the volume after it has been transferred.
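
For reference, the retype can be kicked off from the CLI with something like the
following (volume ID from the listing above; `fast` is the target type defined
earlier):

```sh
# Retype the in-use volume using the on-demand migration policy
openstack volume set --type fast --retype-policy on-demand \
    81cfbafc-4fbb-41b0-abcb-8ec7359d0bf9
```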

```sh
[ceph: root@mon0 /]# rbd ls volumes
volume-81cfbafc-4fbb-41b0-abcb-8ec7359d0bf9

[ceph: root@mon0 /]# rbd status volumes/volume-81cfbafc-4fbb-41b0-abcb-8ec7359d0bf9
Watchers: none
```

Taking a look at the libvirt XML of the instance in question, one can see that the `rbd`
volume path does not change after the retyping is completed. Therefore, if the instance is
restarted, Nova will not be able to find its volume, preventing the instance from starting.
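
The relevant snippet can be extracted on the compute host with something like the
following (the instance name is illustrative):

```sh
# Dump the live domain XML and show the RBD disk source
virsh dumpxml instance-00000001 | grep -B1 -A4 "protocol='rbd'"
```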

#### Pre retype

```xml
[...]
<source protocol='rbd' name='volumes.hdd/volume-81cfbafc-4fbb-41b0-abcb-8ec7359d0bf9' index='1'>
    <host name='2001:XX:XX:XXX::a088' port='6789'/>
    <host name='2001:XX:XX:XXX::3af1' port='6789'/>
    <host name='2001:XX:XX:XXX::ce6f' port='6789'/>
</source>
[...]
```

#### Post retype (no change)

```xml
[...]
<source protocol='rbd' name='volumes.hdd/volume-81cfbafc-4fbb-41b0-abcb-8ec7359d0bf9' index='1'>
    <host name='2001:XX:XX:XXX::a088' port='6789'/>
    <host name='2001:XX:XX:XXX::3af1' port='6789'/>
    <host name='2001:XX:XX:XXX::ce6f' port='6789'/>
</source>
[...]
```

### Possible cause

While looking through the code that is responsible for the volume retype, we found a function
`_swap_volume` which, by our understanding, should be responsible for fixing the association
described above. As we understand it, Cinder should use an internal API path to let Nova
perform this action. This does not seem to happen.

(`_swap_volume`:
https://github.com/openstack/nova/blob/stable/wallaby/nova/compute/manager.py#L7218)
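
One way to confirm that Nova still holds the stale connection info is to query the
`block_device_mapping` table in its database (read-only; access method and
credentials are deployment-specific):

```sh
# Show Nova's stored connection_info for the volume; after the retype it
# still references the old volumes.hdd image
mysql nova -e "SELECT connection_info FROM block_device_mapping \
    WHERE volume_id='81cfbafc-4fbb-41b0-abcb-8ec7359d0bf9' AND deleted=0\G"
```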

## Further observations

If one tries to regenerate the libvirt XML, e.g. by live-migrating the instance and then
rebooting it afterwards, the filesystem gets corrupted.
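
Concretely, a sequence like the following reproduces the failure (destructive;
`testvm` is an illustrative server name, and the `--live-migration` flag assumes a
reasonably recent python-openstackclient):

```sh
# Live migration regenerates the libvirt XML; the subsequent hard reboot
# then comes back with a corrupted filesystem, as described above
openstack server migrate --live-migration testvm
openstack server reboot --hard testvm
```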

## Environmental Information and possibly related reports

We are running the latest version of TripleO Wallaby using the hardened (whole disk)
overcloud image for the nodes.

Cinder Volume Version: `openstack-cinder-18.2.2-0.20230219112414.f9941d2.el8.noarch`

### Possibly related

- https://bugzilla.redhat.com/show_bug.cgi?id=1293440

(might want to paste the above to a markdown file for better
readability)

** Affects: cinder
     Importance: Critical
     Assignee: Eric Harney (eharney)
         Status: New

** Affects: cinder/wallaby
     Importance: Critical
         Status: New

** Affects: cloud-archive
     Importance: Undecided
         Status: New

** Affects: cloud-archive/antelope
     Importance: Undecided
         Status: New

** Affects: cloud-archive/bobcat
     Importance: Undecided
         Status: New

** Affects: cloud-archive/caracal
     Importance: Undecided
         Status: New

** Affects: cloud-archive/yoga
     Importance: Undecided
         Status: New

** Affects: cloud-archive/zed
     Importance: Undecided
         Status: New

** Affects: nova
     Importance: Undecided
         Status: Invalid

** Affects: cinder (Ubuntu)
     Importance: Undecided
         Status: New

** Affects: cinder (Ubuntu Jammy)
     Importance: Undecided
         Status: New

** Affects: cinder (Ubuntu Lunar)
     Importance: Undecided
         Status: New

** Affects: cinder (Ubuntu Mantic)
     Importance: Undecided
         Status: New

** Affects: cinder (Ubuntu Noble)
     Importance: Undecided
         Status: New


** Tags: drivers live-migration nova patch rbd retype
-- 
[SRU][RBD] Retyping of in-use boot volumes renders instances unusable (possible data corruption)
https://bugs.launchpad.net/bugs/2019190