[Bug 1744079] Re: [SRU] disk over-commit still not correctly calculated during live migration
Corey Bryant
corey.bryant at canonical.com
Mon Nov 5 16:57:52 UTC 2018
I've uploaded new versions of nova with the fix for this bug to the
disco, cosmic, and bionic unapproved queues.
--
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to Ubuntu Cloud Archive.
https://bugs.launchpad.net/bugs/1744079
Title:
[SRU] disk over-commit still not correctly calculated during live
migration
Status in Ubuntu Cloud Archive:
Triaged
Status in Ubuntu Cloud Archive queens series:
Triaged
Status in Ubuntu Cloud Archive rocky series:
Triaged
Status in OpenStack Compute (nova):
Fix Released
Status in OpenStack Compute (nova) queens series:
In Progress
Status in OpenStack Compute (nova) rocky series:
In Progress
Status in nova package in Ubuntu:
Triaged
Status in nova source package in Bionic:
Triaged
Status in nova source package in Cosmic:
Triaged
Status in nova source package in Disco:
Triaged
Bug description:
[Impact]
nova compares the required disk space against the disk_available_least field, which can be negative due to overcommit.
So the migration may fail with a "Migration pre-check error:
Unable to migrate dfcd087a-5dff-439d-8875-2f702f081539: Disk of
instance is too large(available on destination host:-3221225472 <
need:22806528)" when migrating to another compute node that has
plenty of free disk space.
[Test Case]
Deploy an OpenStack environment. Make sure one test compute node has a negative disk_available_least and an adequate free_disk_gb, then migrate a VM to it with disk overcommit enabled (openstack server migrate --live <TEST-COMPUTE-NODE> --block-migration --disk-overcommit <VM-NAME>). You will see the migration pre-check error quoted above.
This is the formula to compute disk_available_least and free_disk_gb.
disk_free_gb = disk_info_dict['free']
disk_over_committed = self._get_disk_over_committed_size_total()
available_least = disk_free_gb * units.Gi - disk_over_committed
data['disk_available_least'] = available_least / units.Gi
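As a rough illustration of how this formula goes negative, here is a minimal standalone sketch. It is not nova code: the constant GI stands in for units.Gi, the function name is hypothetical, and the 37 GiB free and 40 GiB over-committed figures are made-up inputs rather than values from this report.

```python
# Standalone sketch of the formula above; not taken from nova.
GI = 1024 ** 3  # stands in for units.Gi (bytes per GiB)


def disk_available_least_gb(disk_free_gb, disk_over_committed_bytes):
    """Return free space minus over-committed space, expressed in GiB."""
    available_least = disk_free_gb * GI - disk_over_committed_bytes
    return available_least / GI


# Hypothetical host: 37 GiB free now, but sparse images may grow by 40 GiB.
example = disk_available_least_gb(37, 40 * GI)
print(example)  # negative, even though 37 GiB is free at the moment
```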
The following command can be used to query the value of
disk_available_least
nova hypervisor-show <ID> |grep disk
Steps to Reproduce:
1. set disk_allocation_ratio config option > 1.0
2. qemu-img resize cirros-0.3.0-x86_64-disk.img +40G
3. glance image-create --disk-format qcow2 ...
4. boot VMs based on resized image
5. we see disk_available_least becomes negative
[Regression Potential]
Minimal - we're just changing from the following line:
disk_available_gb = dst_compute_info['disk_available_least']
to the following code:
if disk_over_commit:
    disk_available_gb = dst_compute_info['free_disk_gb']
else:
    disk_available_gb = dst_compute_info['disk_available_least']
With overcommit enabled, disk_available_least can be negative, so we
should use free_disk_gb instead by backporting the following two
fixes.
https://git.openstack.org/cgit/openstack/nova/commit/?id=e097c001c8e11110efe8879da57264fcb7bdfdf2
https://git.openstack.org/cgit/openstack/nova/commit/?id=e2cc275063658b23ed88824100919a6dfccb760d
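The effect of the change can be sketched outside nova roughly as follows. This is a simplified stand-in for the pre-check, not the backported code: the helper names and the free_disk_gb value of 100 are assumptions for illustration, while the -3 GiB available and 22806528 bytes needed correspond to the error message quoted in [Impact].

```python
# Simplified sketch of the destination disk pre-check; helper names are hypothetical.
GI = 1024 ** 3  # bytes per GiB


def pick_disk_available_gb(dst_compute_info, disk_over_commit):
    """Choose which host field the check compares against."""
    if disk_over_commit:
        return dst_compute_info['free_disk_gb']
    return dst_compute_info['disk_available_least']


def disk_fits(dst_compute_info, necessary_bytes, disk_over_commit):
    """Return True if the instance disks fit on the destination."""
    available_gb = pick_disk_available_gb(dst_compute_info, disk_over_commit)
    return available_gb * GI >= necessary_bytes


# -3 GiB matches "available on destination host:-3221225472" in the error.
dst = {'free_disk_gb': 100, 'disk_available_least': -3}
need = 22806528  # bytes, from the error message

print(disk_fits(dst, need, disk_over_commit=False))
print(disk_fits(dst, need, disk_over_commit=True))
```

Without overcommit the check still uses disk_available_least and rejects the host; with overcommit it uses free_disk_gb and accepts it.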
This is the code path for check_can_live_migrate_destination:
_migrate_live(os-migrateLive API, migrate_server.py) -> migrate_server
-> _live_migrate -> _build_live_migrate_task ->
_call_livem_checks_on_host -> check_can_live_migrate_destination
BTW, Red Hat also has the same bug:
https://bugzilla.redhat.com/show_bug.cgi?id=1477706
[Original Bug Report]
Change I8a705114d47384fcd00955d4a4f204072fed57c2 (written by me... sigh) addressed a bug which prevented live migration to a target host with overcommitted disk when made with microversion <2.25. It achieved this, but the fix is still not correct. We now do:
if disk_over_commit:
    disk_available_gb = dst_compute_info['local_gb']
Unfortunately local_gb is *total* disk, not available disk. We
actually want free_disk_gb. Fun fact: due to the way we calculate this
for filesystems, without taking into account reserved space, this can
also be negative.
The test we're currently running is: could we fit this guest's
allocated disks on the target if the target disk was empty. This is at
least better than it was before, as we don't spuriously fail early. In
fact, we're effectively disabling a test which is disabled for
microversion >=2.25 anyway. IOW we should fix it, but it's probably
not a high priority.
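To make the difference between the two fields concrete, the following sketch compares a check against local_gb with a check against free_disk_gb. The host figures, the guest size, and the function name are invented for illustration.

```python
# Sketch contrasting total disk (local_gb) with free disk (free_disk_gb).
GI = 1024 ** 3  # bytes per GiB


def fits(available_gb, necessary_bytes):
    """Return True if the needed bytes fit in the given number of GiB."""
    return available_gb * GI >= necessary_bytes


# Hypothetical target: 100 GiB of disk in total, only 5 GiB of it still free.
dst = {'local_gb': 100, 'free_disk_gb': 5}
need = 20 * GI  # hypothetical guest with 20 GiB of allocated disk

passes_with_local_gb = fits(dst['local_gb'], need)
passes_with_free_disk_gb = fits(dst['free_disk_gb'], need)
print(passes_with_local_gb, passes_with_free_disk_gb)
```

The local_gb comparison passes because it only asks whether the guest would fit on an empty disk, while the free_disk_gb comparison fails because the space is not actually there.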
To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1744079/+subscriptions