[Bug 1774249] Re: update_available_resource will raise DiskNotFound after resize but before confirm

Alin-Gabriel Serdean 1774249 at bugs.launchpad.net
Tue Oct 19 10:22:59 UTC 2021


Hello Robie,

I have tested the proposed package using the test case.

Keep in mind that this patch prevents the exception from being
raised, but the error is still logged as a WARNING.

I am adding the packages used for testing together with the logs.

ubuntu@test:~/home$ juju run -a nova-compute -- "apt list nova*" | grep installed
nova-api-metadata/bionic-proposed,now 2:17.0.13-0ubuntu4 all [installed]
nova-common/bionic-proposed,now 2:17.0.13-0ubuntu4 all [installed,automatic]
nova-compute/bionic-proposed,now 2:17.0.13-0ubuntu4 all [installed]
nova-compute-kvm/bionic-proposed,now 2:17.0.13-0ubuntu4 all [installed]
nova-compute-libvirt/bionic-proposed,now 2:17.0.13-0ubuntu4 all [installed,automatic]

ubuntu@test:~/home$ juju run -a nova-compute -- grep "DiskNotFound" /var/log/nova/nova-compute.log
2021-10-19 09:22:11.422 19060 WARNING nova.virt.libvirt.driver [req-60b887da-6da1-4463-b754-6d389d7f5df3 - - - - -] Periodic task is updating the host stats, it is trying to get disk info for instance-00000001, but the backing disk storage was removed by a concurrent operation such as resize. Error: No disk at /var/lib/nova/instances/5f5d2c95-afb0-4e8c-9fd1-fc3f29d3a5f8/disk: DiskNotFound: No disk at /var/lib/nova/instances/5f5d2c95-afb0-4e8c-9fd1-fc3f29d3a5f8/disk


** Tags removed: verification-needed verification-needed-bionic

-- 
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to Ubuntu Cloud Archive.
https://bugs.launchpad.net/bugs/1774249

Title:
  update_available_resource will raise DiskNotFound after resize but
  before confirm

Status in Ubuntu Cloud Archive:
  Invalid
Status in Ubuntu Cloud Archive queens series:
  Triaged
Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) ocata series:
  Triaged
Status in OpenStack Compute (nova) pike series:
  Fix Committed
Status in OpenStack Compute (nova) queens series:
  Fix Committed
Status in OpenStack Compute (nova) rocky series:
  Fix Committed
Status in OpenStack Compute (nova) stein series:
  Fix Released
Status in OpenStack Compute (nova) train series:
  Fix Released
Status in nova package in Ubuntu:
  Invalid
Status in nova source package in Bionic:
  Fix Committed

Bug description:
  Originally reported in RH Bugzilla:
  https://bugzilla.redhat.com/show_bug.cgi?id=1584315

  Tested on OSP12 (Pike), but appears to be still present on master.
  Should only occur if nova compute is configured to use local file
  instance storage.

  Create instance A on compute X

  Resize instance A to compute Y
    Domain is powered off
    /var/lib/nova/instances/<uuid A> renamed to <uuid A>_resize on X
    Domain is *not* undefined

  On compute X:
    update_available_resource runs as a periodic task
    First action is to update self
    rt calls driver.get_available_resource()
    ...calls _get_disk_over_committed_size_total
    ...iterates over all defined domains, including the ones whose disks we renamed
    ...fails because a referenced disk no longer exists
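
The sequence above, and the behaviour the fix is reported to produce (a logged WARNING rather than a raised exception), can be sketched as follows. This is a minimal, self-contained illustration and not the actual nova patch: DiskNotFound, allocated_disk_size, total_disk_size and demo are hypothetical stand-ins defined here, and the instance names are invented.

```python
import logging
import os
import tempfile

LOG = logging.getLogger("disk_usage_sketch")


class DiskNotFound(Exception):
    """Hypothetical stand-in for nova's DiskNotFound exception."""


def allocated_disk_size(path):
    """Return the size of the file at path, or raise DiskNotFound."""
    if not os.path.exists(path):
        raise DiskNotFound("No disk at %s" % path)
    return os.path.getsize(path)


def total_disk_size(domains):
    """Sum disk sizes over all defined domains.

    domains maps a domain name to its disk path. A missing disk is
    logged as a warning and skipped instead of aborting the whole pass.
    """
    total = 0
    skipped = []
    for name, path in sorted(domains.items()):
        try:
            total += allocated_disk_size(path)
        except DiskNotFound as exc:
            LOG.warning("Disk for %s is missing, possibly moved by a "
                        "concurrent resize: %s", name, exc)
            skipped.append(name)
    return total, skipped


def demo():
    """Simulate a resize that renames one instance directory."""
    with tempfile.TemporaryDirectory() as root:
        domains = {}
        for name in ("instance-a", "instance-b"):
            instance_dir = os.path.join(root, name)
            os.mkdir(instance_dir)
            disk = os.path.join(instance_dir, "disk")
            with open(disk, "wb") as f:
                f.write(b"data")
            domains[name] = disk
        # The resize renames the directory, but the domain stays defined,
        # so its recorded disk path no longer exists.
        os.rename(os.path.join(root, "instance-a"),
                  os.path.join(root, "instance-a_resize"))
        return total_disk_size(domains)


if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)
    print(demo())
```

Without the try/except, the first missing disk would abort the loop and no total would be produced for the host, which matches the symptom described in this report.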

  Results in errors in nova-compute.log:

      2018-05-30 02:17:08.647 1 ERROR nova.compute.manager [req-bd52371f-c6ec-4a83-9584-c00c5377acd8 - - - - -] Error updating resources for node compute-0.localdomain.: DiskNotFound: No disk at /var/lib/nova/instances/f3ed9015-3984-43f4-b4a5-c2898052b47d/disk
      2018-05-30 02:17:08.647 1 ERROR nova.compute.manager Traceback (most recent call last):
      2018-05-30 02:17:08.647 1 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 6695, in update_available_resource_for_node
      2018-05-30 02:17:08.647 1 ERROR nova.compute.manager     rt.update_available_resource(context, nodename)
      2018-05-30 02:17:08.647 1 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 641, in update_available_resource
      2018-05-30 02:17:08.647 1 ERROR nova.compute.manager     resources = self.driver.get_available_resource(nodename)
      2018-05-30 02:17:08.647 1 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 5892, in get_available_resource
      2018-05-30 02:17:08.647 1 ERROR nova.compute.manager     disk_over_committed = self._get_disk_over_committed_size_total()
      2018-05-30 02:17:08.647 1 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 7393, in _get_disk_over_committed_size_total
      2018-05-30 02:17:08.647 1 ERROR nova.compute.manager     config, block_device_info)
      2018-05-30 02:17:08.647 1 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 7301, in _get_instance_disk_info_from_config
      2018-05-30 02:17:08.647 1 ERROR nova.compute.manager     dk_size = disk_api.get_allocated_disk_size(path)
      2018-05-30 02:17:08.647 1 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/virt/disk/api.py", line 156, in get_allocated_disk_size
      2018-05-30 02:17:08.647 1 ERROR nova.compute.manager     return images.qemu_img_info(path).disk_size
      2018-05-30 02:17:08.647 1 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/virt/images.py", line 57, in qemu_img_info
      2018-05-30 02:17:08.647 1 ERROR nova.compute.manager     raise exception.DiskNotFound(location=path)
      2018-05-30 02:17:08.647 1 ERROR nova.compute.manager DiskNotFound: No disk at /var/lib/nova/instances/f3ed9015-3984-43f4-b4a5-c2898052b47d/disk

  As a result, the resource tracker is no longer updated. We can find
  lots of these errors in the gate.

  Note that change Icec2769bf42455853cbe686fb30fda73df791b25 nearly
  mitigates this, but doesn't because task_state is not set while the
  instance is awaiting confirm.

  ================================================================================= 
  [Impact] 

  See above

  
  [Test Plan]

  Deploy OpenStack Queens with one compute node.

  Create a VM instance. Eg:
  openstack server create --wait --image $image_name --flavor $flavor --key-name testkey --nic net-id=${net_id} test-instance-1234

  Get the details for that instance and copy the instance_name. Eg:
  openstack server show test-instance-1234 -c OS-EXT-SRV-ATTR:instance_name -f value

  Get the disk location used, based on the instance name retrieved before (assumed here to be stored in $var_name). Eg:
  disk_location=`juju run -a nova-compute -- virsh domblklist $var_name | grep nova | awk -v N=2 '{print $N}'`

  Move that file to a different location. Eg:
  juju run -a nova-compute -- mv $disk_location "$disk_location"_backup

  Check the nova compute logs on the compute node for a warning. Eg:
  juju run -a nova-compute -- grep "DiskNotFound" /var/log/nova/nova-compute.log

  The output should look like the following:
  ```
  2021-09-22 11:07:46.009 26176 WARNING nova.virt.libvirt.driver [req-6e8eb87e-4024-4908-9b7f-0648ecd03eaf - - - - -] Periodic task is updating the host stats, it is trying to get disk info for instance-00000001, but the backing disk storage was removed by a concurrent operation such as resize. Error: No disk at /var/lib/nova/instances/3bd9578f-e7d7-48bc-bdef-d2d4cb25ea29/disk: DiskNotFound: No disk at /var/lib/nova/instances/3bd9578f-e7d7-48bc-bdef-d2d4cb25ea29/disk
  ```
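
To tell the patched behaviour (WARNING) from the unpatched one (ERROR with a traceback), the matching log lines can be grouped by level. The helper below is a rough sketch that assumes the "date time pid LEVEL module ..." layout seen in the excerpts in this report; disk_not_found_levels and SAMPLE_LINES are hypothetical names, the WARNING sample is abbreviated, and the INFO line is made up for contrast.

```python
from collections import Counter


def disk_not_found_levels(lines):
    """Count log lines mentioning DiskNotFound, grouped by log level.

    Assumes each line starts with: date, time, pid, level.
    """
    levels = Counter()
    for line in lines:
        if "DiskNotFound" not in line:
            continue
        fields = line.split(None, 4)
        if len(fields) >= 4:
            levels[fields[3]] += 1
    return levels


SAMPLE_LINES = [
    # Abbreviated form of the WARNING emitted by the patched package.
    "2021-09-22 11:07:46.009 26176 WARNING nova.virt.libvirt.driver "
    "[req-6e8eb87e-4024-4908-9b7f-0648ecd03eaf - - - - -] ... "
    "DiskNotFound: No disk at /var/lib/nova/instances/"
    "3bd9578f-e7d7-48bc-bdef-d2d4cb25ea29/disk",
    # Final traceback line from the unpatched behaviour.
    "2018-05-30 02:17:08.647 1 ERROR nova.compute.manager DiskNotFound: "
    "No disk at /var/lib/nova/instances/"
    "f3ed9015-3984-43f4-b4a5-c2898052b47d/disk",
    # Made-up unrelated line, ignored by the helper.
    "2021-09-22 11:07:47.000 26176 INFO example.module unrelated line",
]

if __name__ == "__main__":
    print(dict(disk_not_found_levels(SAMPLE_LINES)))
```

On a patched compute node the count should contain only WARNING entries for new occurrences; any fresh ERROR entry would suggest the exception is still being raised.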

  [Where problems could occur]

  Users who were relying on the error being raised could be affected.

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1774249/+subscriptions



