[Bug 1744079] Re: [SRU] disk over-commit still not correctly calculated during live migration

Brian Murray brian at ubuntu.com
Tue Apr 30 20:28:33 UTC 2019


Hello Matthew, or anyone else affected,

Accepted nova into xenial-proposed. The package will build now and be
available at
https://launchpad.net/ubuntu/+source/nova/2:13.1.4-0ubuntu4.4 in a few
hours, and then in the -proposed repository.

Please help us by testing this new package.  See
https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how
to enable and use -proposed.  Your feedback will aid us getting this
update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug,
mentioning the version of the package you tested and change the tag from
verification-needed-xenial to verification-done-xenial. If it does not
fix the bug for you, please add a comment stating that, and change the
tag to verification-failed-xenial. In either case, without details of
your testing we will not be able to proceed.

Further information regarding the verification process can be found at
https://wiki.ubuntu.com/QATeam/PerformingSRUVerification .  Thank you in
advance for helping!

N.B. The updated package will be released to -updates after the bug(s)
fixed by this package have been verified and the package has been in
-proposed for a minimum of 7 days.

** Changed in: nova (Ubuntu Xenial)
       Status: Triaged => Fix Committed

** Tags removed: verification-done
** Tags added: verification-needed verification-needed-xenial

-- 
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to Ubuntu Cloud Archive.
https://bugs.launchpad.net/bugs/1744079

Title:
  [SRU] disk over-commit still not correctly calculated during live
  migration

Status in Ubuntu Cloud Archive:
  Fix Committed
Status in Ubuntu Cloud Archive mitaka series:
  Triaged
Status in Ubuntu Cloud Archive ocata series:
  Fix Released
Status in Ubuntu Cloud Archive pike series:
  Fix Released
Status in Ubuntu Cloud Archive queens series:
  Fix Released
Status in Ubuntu Cloud Archive rocky series:
  Fix Released
Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) queens series:
  In Progress
Status in OpenStack Compute (nova) rocky series:
  In Progress
Status in nova package in Ubuntu:
  Fix Released
Status in nova source package in Xenial:
  Fix Committed
Status in nova source package in Bionic:
  Fix Released
Status in nova source package in Cosmic:
  Fix Released
Status in nova source package in Disco:
  Fix Released

Bug description:
  [Impact]
  nova compares disk space with disk_available_least field, which is possible to be negative, due to overcommit.

  So the migration may fail because of a "Migration pre-check error:
  Unable to migrate dfcd087a-5dff-439d-8875-2f702f081539: Disk of
  instance is too large(available on destination host:-3221225472 <
  need:22806528)" when trying a migration to another compute that has
  plenty of free space in his disk.

  [Test Case]
  Deploy openstack environment. Make sure there is a negative disk_available_least and a adequate free_disk_gb in one test compute node, then migrate a VM to it with disk-overcommit (openstack server migrate --live <TEST-COMPUTE-NODE> --block-migration --disk-overcommit <VM-NAME>). You will see above migration pre-check error.

  This is the formula to compute disk_available_least and free_disk_gb.

  disk_free_gb = disk_info_dict['free']
  disk_over_committed = self._get_disk_over_committed_size_total()
  available_least = disk_free_gb * units.Gi - disk_over_committed
  data['disk_available_least'] = available_least / units.Gi

  The following command can be used to query the value of
  disk_available_least

  nova hypervisor-show <ID> |grep disk

  Steps to Reproduce:
  1. set disk_allocation_ratio config option > 1.0 
  2. qemu-img resize cirros-0.3.0-x86_64-disk.img +40G
  3. glance image-create --disk-format qcow2 ...
  4. boot VMs based on resized image
  5. we see disk_available_least becomes negative

  [Regression Potential]
  Minimal - we're just changing from the following line:

  disk_available_gb = dst_compute_info['disk_available_least']

  to the following codes:

  if disk_over_commit:
      disk_available_gb = dst_compute_info['free_disk_gb']
  else:
      disk_available_gb = dst_compute_info['disk_available_least']

  When enabling overcommit, disk_available_least is possible to be
  negative, so we should use free_disk_gb instead of it by backporting
  the following two fixes.

  https://git.openstack.org/cgit/openstack/nova/commit/?id=e097c001c8e11110efe8879da57264fcb7bdfdf2
  https://git.openstack.org/cgit/openstack/nova/commit/?id=e2cc275063658b23ed88824100919a6dfccb760d

  This is the code path for check_can_live_migrate_destination:

  _migrate_live(os-migrateLive API, migrate_server.py) -> migrate_server
  -> _live_migrate -> _build_live_migrate_task ->
  _call_livem_checks_on_host -> check_can_live_migrate_destination

  BTW, redhat also has a same bug -
  https://bugzilla.redhat.com/show_bug.cgi?id=1477706

  
  [Original Bug Report]
  Change I8a705114d47384fcd00955d4a4f204072fed57c2 (written by me... sigh) addressed a bug which prevented live migration to a target host with overcommitted disk when made with microversion <2.25. It achieved this, but the fix is still not correct. We now do:

          if disk_over_commit:
              disk_available_gb = dst_compute_info['local_gb']

  Unfortunately local_gb is *total* disk, not available disk. We
  actually want free_disk_gb. Fun fact: due to the way we calculate this
  for filesystems, without taking into account reserved space, this can
  also be negative.

  The test we're currently running is: could we fit this guest's
  allocated disks on the target if the target disk was empty. This is at
  least better than it was before, as we don't spuriously fail early. In
  fact, we're effectively disabling a test which is disabled for
  microversion >=2.25 anyway. IOW we should fix it, but it's probably
  not a high priority.

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1744079/+subscriptions



More information about the Ubuntu-openstack-bugs mailing list