[Bug 1131284] [NEW] Folsom erroneously destroys paused VMs

Andres Lagar-Cavilla andreslc at gridcentric.ca
Thu Feb 21 16:51:44 UTC 2013


Public bug reported:

Requesting that the following upstream stable commit be added:
https://github.com/openstack/nova/commit/7ace55fcf9e1b7fea074f6c0331b6feafbbc4178

reviewed here:
https://review.openstack.org/#/c/20337/

and which addresses this upstream bug:
https://bugs.launchpad.net/nova/+bug/1097806

(updated description of bug follows)

Libvirt-managed qemu/KVM VMs can be paused outside of nova compute's
workflow through a variety of means.

* By issuing virsh suspend <domain>
* By issuing virsh qemu-monitor-command <domain> '{"execute" : "stop"}'
* By causing qemu to emit a STOP event, for example by attaching a GDB debugger and single-stepping
* By connecting through an additional qemu monitor and issuing any command that causes qemu to emit a STOP event

Starting in Folsom (specifically
https://github.com/openstack/nova/commit/129b87e17d3333aeaa9e855a70dea51e6581ea63#L6R2502,
i.e. commit 129b87e, diff line 2502), nova compute will destroy a VM if
libvirt reports it as paused and that state does not match the state
nova compute has recorded for the VM.
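
To make the failure mode concrete, here is a deliberately simplified,
hypothetical model of the check described above. It is not nova's code:
PowerState and folsom_style_sync are illustrative names, and the real
sync logic handles more states. The point it shows is that the decision
looks only at the reported state, never at why the VM is paused.

```python
from enum import Enum


class PowerState(Enum):
    # Hypothetical, simplified power states for illustration only.
    RUNNING = "running"
    PAUSED = "paused"
    SHUTDOWN = "shutdown"


def folsom_style_sync(recorded: PowerState, reported: PowerState) -> str:
    """Model of the rule in this report: an unexpected pause means destroy."""
    if reported is PowerState.PAUSED and recorded is not PowerState.PAUSED:
        # No pause reason is consulted, so a `virsh suspend` or a debugger
        # stop is indistinguishable from a pause caused by an I/O error.
        return "destroy"
    return "leave"
```

For example, folsom_style_sync(PowerState.RUNNING, PowerState.PAUSED)
returns "destroy" regardless of who paused the VM.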

The original rationale is to destroy VMs that have been paused by I/O
errors or KVM emulation errors, which also cause qemu to emit STOP
events.

The problem is that this also destroys VMs that were paused for any of
the valid reasons outlined above.

The problem is exacerbated by a libvirt bug
(https://bugzilla.redhat.com/show_bug.cgi?id=892791) that latches a VM's
state to paused even though the VM is running. The fix has already been
committed upstream
(http://libvirt.org/git/?p=libvirt.git;a=commit;h=aedfcce33e4c2f266668a39fd655574fe34f1265),
integrated into Raring, and triaged for backport to Precise:
https://bugs.launchpad.net/bugs/1097824.

Even with libvirt's bug fixed, there are still points in time at which
nova-compute will check a VM's state, find it paused for a valid reason,
and erroneously decide to destroy it.

The fix is either to remove this behavior or to query libvirt further
for the reason the VM is paused, which will show conclusively whether
the VM has effectively crashed or is merely paused.
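
A minimal sketch of the second option, assuming libvirt-python, whose
virDomain.state() returns a [state, reason] pair. The integers below are
local stand-ins mirroring libvirt's VIR_DOMAIN_PAUSED state and its
virDomainPausedReason enum (real code would use the libvirt.VIR_DOMAIN_*
constants); PausedReason, FATAL_PAUSE_REASONS, should_destroy_paused_vm
and decide_action are hypothetical names. Which reasons count as fatal is
a policy choice: this sketch treats only an I/O-error pause as fatal, and
a KVM emulation-error pause may not map to a distinct reason at all.

```python
from enum import IntEnum

# Stand-in for libvirt's VIR_DOMAIN_PAUSED (virDomainState) value.
VIR_DOMAIN_PAUSED = 3


class PausedReason(IntEnum):
    # Stand-ins mirroring libvirt's virDomainPausedReason values.
    UNKNOWN = 0
    USER = 1
    MIGRATION = 2
    SAVE = 3
    DUMP = 4
    IOERROR = 5
    WATCHDOG = 6
    FROM_SNAPSHOT = 7
    SHUTTING_DOWN = 8
    SNAPSHOT = 9


# Hypothetical policy: only these reasons mean the VM has effectively crashed.
FATAL_PAUSE_REASONS = frozenset({PausedReason.IOERROR})


def should_destroy_paused_vm(reason: int) -> bool:
    """Return True only if the pause reason is in the fatal set."""
    try:
        return PausedReason(reason) in FATAL_PAUSE_REASONS
    except ValueError:
        # Reason codes not in this table: keep the VM rather than destroy it.
        return False


def decide_action(dom) -> str:
    """Return 'destroy' only for a VM paused for a fatal reason, else 'leave'.

    `dom` is anything with a libvirt-style state() method returning
    [state, reason], e.g. a libvirt.virDomain.
    """
    state, reason = dom.state()
    if state != VIR_DOMAIN_PAUSED:
        return "leave"
    return "destroy" if should_destroy_paused_vm(reason) else "leave"
```

With a real connection this would be called as, e.g.,
decide_action(conn.lookupByName(name)). Unrecognized reason codes fall
through to "leave", so an unexpected value never triggers a destroy.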

** Affects: nova (Ubuntu)
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Ubuntu
Server Team, which is subscribed to nova in Ubuntu.
https://bugs.launchpad.net/bugs/1131284

