[Bug 1640676] Re: libvirt 1.2.12 live-migration corrupts some instances

Wed Nov 16 14:46:01 UTC 2016

Thanks zhhuabj for already backporting the fix into a debdiff.
I see that your debdiff targets the cloud archive kilo version that you run.

I checked whether the change would apply to trusty (without UCA) as well, but there is a huge delta, so the patch would need manual adaptation.
That, combined with the fact that I can't reproduce the issue on base trusty so far, makes me not consider it for trusty at the moment.

If your test environment has any chance of recreating the same issue on base trusty, could you give it a try to verify whether it needs a similar fix as well? With some luck the older version isn't affected.
Also, if you happen to find time to do the backport to trusty, I'm willing to run a bunch of extra tests on it on my side.

None of that should stop you: please go ahead and get it approved
and sponsored into the cloud archive, given some more testing, as the
changes are rather large.

-- 
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to Ubuntu Cloud Archive.
https://bugs.launchpad.net/bugs/1640676

Title:
  libvirt 1.2.12 live-migration corrupts some instances

Status in Ubuntu Cloud Archive:
  New
Status in Ubuntu Cloud Archive kilo series:
  Triaged
Status in libvirt package in Ubuntu:
  Fix Released
Status in libvirt source package in Trusty:
  Triaged

Bug description:
  [Impact]

  Under high memory load, live migration with libvirt 1.2.12 (kilo)
  corrupts some instances.

  [Test Case]

  We can replicate the corruption pretty much at will. The sequence of
  events to trigger it is:

  1. Create an instance using a cloud image.
  2. Inside the instance, start a job with the following command: "dd if=/dev/urandom of=/var/tmp/mjb.1 bs=4M count=1000"
  3. Live migrate the instance using a command like: "nova live-migration --block-migrate <server-id> <target-hypervisor>"
  4. Once the migration has finished, stop the dd job on the instance.
  5. Do a "Hard reboot" of the instance (e.g. for OpenStack: nova reboot --hard $INSTANCE).
  6. When the instance boots, file system corruption will be observed and it won't boot correctly.
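
  The host-side part of the sequence above could be wrapped in a small
  script along these lines. This is only a sketch: the function names
  (run, guest_load_cmd, live_migrate, hard_reboot) and the DRY_RUN
  variable are made up for illustration, and the only real commands are
  the dd and nova invocations quoted in the test case. The dd job still
  has to be started and stopped inside the guest by hand.

```shell
#!/bin/sh
# Sketch of the reproduction sequence; helper names are hypothetical.
# DRY_RUN=1 (the default here) only prints the host-side commands.
DRY_RUN="${DRY_RUN:-1}"

# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Command to start inside the guest to generate sustained write load.
guest_load_cmd() {
    echo "dd if=/dev/urandom of=/var/tmp/mjb.1 bs=4M count=1000"
}

# Block live-migrate the instance: $1 = server id, $2 = target hypervisor.
live_migrate() {
    run nova live-migration --block-migrate "$1" "$2"
}

# Hard reboot the instance after the dd job has been stopped: $1 = server id.
hard_reboot() {
    run nova reboot --hard "$1"
}
```

  With the default DRY_RUN=1, calling live_migrate <server-id>
  <target-hypervisor> only prints the nova command; setting DRY_RUN=0
  would actually run it against the configured cloud.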

  [Regression Potential]

  [Other Info]

  Both libvirt 1.2.16 (liberty) and libvirt 1.2.13 already contain the
  fix, so this problem only happens on kilo.

  The fix is backported from upstream patches. Before commit 80c5f10e,
  libvirt only polled for the events it was interested in, which could
  leave a drive mirror that cannot be cancelled, so the destination is
  not in a consistent state. In that case it is not safe to continue
  with the migration. Commit 80c5f10e fixes the problem by listening
  for queued events instead of polling.

  http://libvirt.org/git/?p=libvirt.git;a=commitdiff;h=80c5f10e865cda0302519492f197cb020bd14a07
  http://libvirt.org/git/?p=libvirt.git;a=commitdiff;h=76c61cdca20c106960af033e5d0f5da70177af0f
  http://libvirt.org/git/?p=libvirt.git;a=commitdiff;h=c37943a0687a8fdb08e6eda8ae4b9f4f43f4f2ed
  http://libvirt.org/git/?p=libvirt.git;a=commitdiff;h=c88b323bf5d5a070c074fda7adc11085f14415ce

  BTW, we have completed 20 to 30 live migrations with I/O running and
  have had no problems, and also tested that other functions continue to
  work as expected.
