[Bug 1896617] Re: [SRU] Creation of image (or live snapshot) from the existing VM fails if libvirt-image-backend is configured to qcow2 starting from Ussuri

Corey Bryant 1896617 at bugs.launchpad.net
Thu Sep 8 14:05:25 UTC 2022


I'm opening this bug back up for upstream nova awareness.

To summarize, I'll recap comment #27 above and add some more details
about what the issue is.

In nova/virt/libvirt/driver.py there is a chmod on a tempdir that is
made with the assumption that libvirt's access will be evaluated
against the "other users" mode bits:

# NOTE(xqueralt): libvirt needs o+x in the tempdir
os.chmod(tmpdir, 0o701)

In the case of Ubuntu, we need to ensure the nova package remains
functional on hardened systems. A big part of the hardening consists of
zeroing out the "other users" mode bits in /var/lib/nova. As a result, we
added the libvirt-qemu user to the nova group, since it needs access to
files/dirs in /var/lib/nova (most of which are owned by nova:nova). The
consequence of adding libvirt-qemu to the nova group is that its access
to those files/dirs is often evaluated via its membership in the nova
group. Therefore the 0o701 permissions on the tempdir deny access to
libvirt-qemu.

For example:
$ sudo ls -al /var/lib/nova/instances/snapshots/tmpkajuir8o
total 204
drwx-----x 2 nova nova 4096 Sep 23 19:12 .    # <--- libvirt-qemu denied access as it is in nova group
drwxr-x--- 3 nova nova 4096 Sep 23 19:12 ..
-rw-r--r-- 1 nova nova 197248 Sep 23 19:12 0ece1fb912104f2c849ea4bd6036712c.delta
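
A short Python sketch (not nova code, purely illustrative) shows why:
POSIX evaluates the owner bits first, then the group bits when the
accessing user belongs to the owning group, and only falls back to the
"other users" bits for everyone else, so the o+x bit never applies to a
user that is in the nova group:

import os
import stat
import tempfile

tmpdir = tempfile.mkdtemp()
os.chmod(tmpdir, 0o701)            # what nova does today
mode = os.stat(tmpdir).st_mode

print(stat.filemode(mode))         # drwx-----x
# A member of the owning group is checked against the group bits, which
# are all zero here, so directory traversal is denied...
print(bool(mode & stat.S_IXGRP))   # False
# ...even though "other users" would be allowed to traverse it.
print(bool(mode & stat.S_IXOTH))   # True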

To fix this in Ubuntu, I'm looking to carry the following patch:

-                        # NOTE(xqueralt): libvirt needs o+x in the tempdir
-                        os.chmod(tmpdir, 0o701)
+                        # NOTE(coreycb): libvirt needs g+x in the tempdir
+                        st = os.stat(tmpdir)
+                        os.chmod(tmpdir, st.st_mode | stat.S_IXGRP)
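
Assuming the tempdir comes from tempfile.mkdtemp() (created with mode
0o700 under a typical umask), the patched chmod results in 0o710
(drwx--x---): libvirt-qemu can traverse the directory via its nova group
membership while "other users" remain locked out. A minimal sketch of
that behaviour:

import os
import stat
import tempfile

tmpdir = tempfile.mkdtemp()                  # 0o700 under a typical umask
st = os.stat(tmpdir)
os.chmod(tmpdir, st.st_mode | stat.S_IXGRP)  # only add g+x, keep the rest

print(oct(stat.S_IMODE(os.stat(tmpdir).st_mode)))  # 0o710
print(stat.filemode(os.stat(tmpdir).st_mode))      # drwx--x---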

I don't know what the right answer is upstream. I don't know that a
chmod 0o711 makes sense either. If 0o710 made sense for all
users/distros we could move to that, but that's hard to assess. For now
I'll patch in Ubuntu. I'm planning to do this work in LP:#1967956 to
consolidate with similar work.

-- 
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to Ubuntu Cloud Archive.
https://bugs.launchpad.net/bugs/1896617

Title:
  [SRU] Creation of image (or live snapshot) from the existing VM fails
  if libvirt-image-backend is configured to qcow2 starting from Ussuri

Status in OpenStack Nova Compute Charm:
  Invalid
Status in Ubuntu Cloud Archive:
  Fix Released
Status in Ubuntu Cloud Archive ussuri series:
  Fix Released
Status in Ubuntu Cloud Archive victoria series:
  Fix Released
Status in OpenStack Compute (nova):
  Invalid
Status in nova package in Ubuntu:
  Fix Released
Status in nova source package in Focal:
  Fix Released
Status in nova source package in Groovy:
  Fix Released

Bug description:
  [Impact]

  tl;dr

  1) creating an image from an existing VM fails if the qcow2 image backend is used, but everything is fine when using the rbd image backend in nova-compute.
  2) openstack server image create --name <name of the new image> <instance name or uuid> fails with a seemingly unrelated error:

  $ openstack server image create --wait 842fa12c-19ee-44cb-bb31-36d27ec9d8fc
  HTTP 404 Not Found: No image found with ID f4693860-cd8d-4088-91b9-56b2f173ffc7

  == Details ==

  Two Tempest tests ([1] and [2]) from the 2018.02 Refstack test list
  [0] are failing with the following exception:

  49701867-bedc-4d7d-aa71-7383d877d90c
  Traceback (most recent call last):
    File "/home/ubuntu/snap/fcbtest/14/.rally/verification/verifier-2d9cbf4d-fcbb-491d-848d-5137a9bde99e/repo/tempest/api/compute/base.py", line 369, in create_image_from_server
      waiters.wait_for_image_status(client, image_id, wait_until)
    File "/home/ubuntu/snap/fcbtest/14/.rally/verification/verifier-2d9cbf4d-fcbb-491d-848d-5137a9bde99e/repo/tempest/common/waiters.py", line 161, in wait_for_image_status
      image = show_image(image_id)
    File "/home/ubuntu/snap/fcbtest/14/.rally/verification/verifier-2d9cbf4d-fcbb-491d-848d-5137a9bde99e/repo/tempest/lib/services/compute/images_client.py", line 74, in show_image
      resp, body = self.get("images/%s" % image_id)
    File "/home/ubuntu/snap/fcbtest/14/.rally/verification/verifier-2d9cbf4d-fcbb-491d-848d-5137a9bde99e/repo/tempest/lib/common/rest_client.py", line 298, in get
      return self.request('GET', url, extra_headers, headers)
    File "/home/ubuntu/snap/fcbtest/14/.rally/verification/verifier-2d9cbf4d-fcbb-491d-848d-5137a9bde99e/repo/tempest/lib/services/compute/base_compute_client.py", line 48, in request
      method, url, extra_headers, headers, body, chunked)
    File "/home/ubuntu/snap/fcbtest/14/.rally/verification/verifier-2d9cbf4d-fcbb-491d-848d-5137a9bde99e/repo/tempest/lib/common/rest_client.py", line 687, in request
      self._error_checker(resp, resp_body)
    File "/home/ubuntu/snap/fcbtest/14/.rally/verification/verifier-2d9cbf4d-fcbb-491d-848d-5137a9bde99e/repo/tempest/lib/common/rest_client.py", line 793, in _error_checker
      raise exceptions.NotFound(resp_body, resp=resp)
  tempest.lib.exceptions.NotFound: Object not found
  Details: {'code': 404, 'message': 'Image not found.'}

  During handling of the above exception, another exception occurred:

  Traceback (most recent call last):
    File "/home/ubuntu/snap/fcbtest/14/.rally/verification/verifier-2d9cbf4d-fcbb-491d-848d-5137a9bde99e/repo/tempest/api/compute/images/test_images_oneserver.py", line 69, in test_create_delete_image
      wait_until='ACTIVE')
    File "/home/ubuntu/snap/fcbtest/14/.rally/verification/verifier-2d9cbf4d-fcbb-491d-848d-5137a9bde99e/repo/tempest/api/compute/base.py", line 384, in create_image_from_server
      image_id=image_id)
  tempest.exceptions.SnapshotNotFoundException: Server snapshot image d82e95b0-9c62-492d-a08c-5bb118d3bf56 not found.

  So far I was able to identify the following:

  1) https://github.com/openstack/tempest/blob/master/tempest/api/compute/images/test_images_oneserver.py#L69 invokes a "create image from server"
  2) It fails with the following error message in the nova-compute logs: https://pastebin.canonical.com/p/h6ZXdqjRRm/

  The same occurs if "openstack server image create --wait" is
  executed; however, according to
  https://docs.openstack.org/nova/ussuri/admin/migrate-instance-with-snapshot.html
  the VM has to be shut down before the image creation:

  "Shut down the source VM before you take the snapshot to ensure that
  all data is flushed to disk. If necessary, list the instances to view
  the instance name. Use the openstack server stop command to shut down
  the instance:"

  This step is definitely being skipped by the test (i.e. it's trying to
  perform the snapshot on top of a live VM).

  FWIW, I'm using libvirt-image-backend: qcow2 in my nova-compute
  application params, and I was able to confirm that if the above
  parameter is changed to "libvirt-image-backend: rbd", the tests pass
  successfully.

  Also, there is a similar issue I was able to find:
  https://bugs.launchpad.net/nova/+bug/1885418 but it doesn't contain
  any useful information other than confirmation that OpenStack Ussuri +
  the libvirt backend has a problem with live snapshotting.

  [0] https://refstack.openstack.org/api/v1/guidelines/2018.02/tests?target=platform&type=required&alias=true&flag=false
  [1] tempest.api.compute.images.test_images_oneserver.ImagesOneServerTestJSON.test_create_delete_image[id-3731d080-d4c5-4872-b41a-64d0d0021314]
  [2] tempest.api.compute.images.test_images_oneserver.ImagesOneServerTestJSON.test_create_image_specify_multibyte_character_image_name[id-3b7c6fe4-dfe7-477c-9243-b06359db51e6]

  [Test Case]
  1) Deploy/configure OpenStack (using Juju here).
  2) If upgrading to the fixed package, libvirt-guests will require a restart: sudo systemctl restart libvirt-guests
  3) Create an OpenStack instance.
  4) Run: openstack server image create --wait <instance-uuid>
  5) Successful if fixed; fails with a permissions error if not fixed.

  [Regression Potential]
  This actually reverts the nova group membership to what it was prior to the Focal version of the packages. If there is a regression in this fix, it would likely result in a permissions issue.

To manage notifications about this bug go to:
https://bugs.launchpad.net/charm-nova-compute/+bug/1896617/+subscriptions



