[Bug 1640676] Re: libvirt 1.2.12 live-migration corrupts some instances
Hua Zhang
joshua.zhang at canonical.com
Thu Nov 17 09:29:15 UTC 2016
Hi Christian, I have managed to reproduce the problem using libvirt from
the UCA Kilo, and it does not occur with the UCA Liberty. I have not tried
earlier versions of libvirt (the Juno UCA and Vivid are both EOL), but
since the problem exists in Kilo I think it needs to be fixed there and,
as you say, the diff for trusty-updates libvirt is far too large to
consider. This is a difficult problem to reproduce; perhaps you can
share what steps you are taking to reproduce it?
** Summary changed:
- libvirt 1.2.12 live-migration corrupts some instances
+ [SRU] libvirt 1.2.12 live-migration corrupts some instances
** Changed in: libvirt (Ubuntu)
Assignee: Hua Zhang (zhhuabj) => (unassigned)
** Changed in: cloud-archive/kilo
Assignee: (unassigned) => Hua Zhang (zhhuabj)
--
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to Ubuntu Cloud Archive.
https://bugs.launchpad.net/bugs/1640676
Title:
[SRU] libvirt 1.2.12 live-migration corrupts some instances
Status in Ubuntu Cloud Archive:
New
Status in Ubuntu Cloud Archive kilo series:
Triaged
Status in libvirt package in Ubuntu:
Fix Released
Status in libvirt source package in Trusty:
Triaged
Bug description:
[Impact]
Under high memory load, live migration with libvirt 1.2.12 (kilo)
corrupts some instances.
[Test Case]
We can replicate the corruption pretty much at will. The sequence of
events to trigger it is:
1. Create an instance using a cloud image.
2. Inside the instance, start a job with the following command: "dd if=/dev/urandom of=/var/tmp/mjb.1 bs=4M count=1000"
3. Live migrate the instance using a command like: "nova live-migration --block-migrate <server-id> <target-hypervisor>"
4. Once the migration has finished, stop the dd job on the instance.
5. Do a "Hard reboot" of the instance (e.g. for OpenStack: nova reboot --hard $INSTANCE).
6. When the instance boots, file system corruption will be observed and it won't boot correctly.
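The sequence above can be collected into a small helper so that the same commands are used on every reproduction attempt. This is only a sketch: it builds and prints the command lines taken from the steps above and does not execute anything. The function names, the "guest"/"client" labels and the default file path argument are illustrative assumptions, not part of nova or libvirt.

```python
import shlex


def build_repro_steps(server_id, target_hypervisor, dd_target="/var/tmp/mjb.1"):
    """Return the reproduction commands as (where, argv) pairs.

    Nothing is executed here. "guest" means the command is run inside the
    instance, "client" means it is run wherever the nova CLI is available.
    Stopping the dd job after the migration is a manual step and is not
    modelled.
    """
    dd_cmd = ["dd", "if=/dev/urandom", f"of={dd_target}", "bs=4M", "count=1000"]
    migrate_cmd = ["nova", "live-migration", "--block-migrate",
                   server_id, target_hypervisor]
    reboot_cmd = ["nova", "reboot", "--hard", server_id]
    return [
        ("guest", dd_cmd),
        ("client", migrate_cmd),
        ("client", reboot_cmd),
    ]


def render(steps):
    """Format each step as a shell-quoted line prefixed with where it runs."""
    return [f"[{where}] {shlex.join(argv)}" for where, argv in steps]


if __name__ == "__main__":
    for line in render(build_repro_steps("<server-id>", "<target-hypervisor>")):
        print(line)
```

Keeping the commands as argument lists rather than strings makes it easy to hand them to a runner later (for example over SSH for the guest command), but how they are executed is left open here.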
[Regression Potential]
[Other Info]
libvirt 1.2.13 and libvirt 1.2.16 (liberty) already contain the fix,
so this problem only happens on kilo.
The fix is backported from the upstream patches listed below. Before
commit 80c5f10e, libvirt only polled for the events it was interested
in, which could leave a drive mirror that could not be cancelled and a
destination that was not in a consistent state; in that case it is not
safe to continue with the migration. Commit 80c5f10e fixes the problem
by listening for events instead of polling.
http://libvirt.org/git/?p=libvirt.git;a=commitdiff;h=80c5f10e865cda0302519492f197cb020bd14a07
http://libvirt.org/git/?p=libvirt.git;a=commitdiff;h=76c61cdca20c106960af033e5d0f5da70177af0f
http://libvirt.org/git/?p=libvirt.git;a=commitdiff;h=c37943a0687a8fdb08e6eda8ae4b9f4f43f4f2ed
http://libvirt.org/git/?p=libvirt.git;a=commitdiff;h=c88b323bf5d5a070c074fda7adc11085f14415ce
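To illustrate why listening for events is more robust than polling, here is a generic toy model. It is not libvirt or QEMU code: the ToyJob class, the state names ("ready", "failed", "gone") and the helper functions are all hypothetical. A poller that samples the current state only occasionally can miss a short-lived state, while a listener that consumes a queue sees every transition.

```python
import queue


class ToyJob:
    """Hypothetical stand-in for a block job; not a libvirt or QEMU API."""

    def __init__(self):
        self.state = "running"
        self.events = queue.Queue()

    def transition(self, new_state):
        # Update the current state and record the change as an event.
        self.state = new_state
        self.events.put(new_state)


def poll_states(job, script, poll_every):
    """Apply scripted transitions, sampling job.state every poll_every steps."""
    seen = []
    for step, new_state in enumerate(script, start=1):
        job.transition(new_state)
        if step % poll_every == 0:
            seen.append(job.state)
    return seen


def drain_events(job):
    """Return every event queued so far, in order."""
    seen = []
    while True:
        try:
            seen.append(job.events.get_nowait())
        except queue.Empty:
            return seen


def demo():
    job = ToyJob()
    script = ["ready", "failed", "gone"]  # assumed state names
    polled = poll_states(job, script, poll_every=3)
    listened = drain_events(job)
    return polled, listened


if __name__ == "__main__":
    polled, listened = demo()
    print("polled:  ", polled)
    print("listened:", listened)
```

In this model the poller never observes the "failed" state because it only looks after the job has already moved on, whereas the event consumer sees it. The real change in the commits above is considerably more involved; the sketch only shows the general failure mode of polling.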
BTW, we have completed 20 to 30 live migrations with I/O running and
have had no problems, and also tested that other functions continue to
work as expected.
To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1640676/+subscriptions