[Bug 1848497] Re: virtio-balloon change breaks migration from qemu prior to 4.0

Victor Tapia 1848497 at bugs.launchpad.net
Wed Oct 7 09:54:40 UTC 2020


Attached backported fix to bug 1847361. It fixes live migrations from
1:2.11+dfsg-1ubuntu7.32 (Queens/Rocky) and 1:3.1+dfsg-2ubuntu3.3 or
earlier (Stein) to the latest Stein. I also tested migration from the
patched Stein to Train and it works as expected.

-- 
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to Ubuntu Cloud Archive.
https://bugs.launchpad.net/bugs/1848497

Title:
  virtio-balloon change breaks migration from qemu prior to 4.0

Status in Ubuntu Cloud Archive:
  Fix Released
Status in Ubuntu Cloud Archive stein series:
  Triaged
Status in Ubuntu Cloud Archive train series:
  Fix Released
Status in Ubuntu Cloud Archive ussuri series:
  Fix Released
Status in qemu package in Ubuntu:
  Fix Released
Status in qemu source package in Eoan:
  Fix Released
Status in qemu source package in Focal:
  Fix Released

Bug description:
  [Impact]

   * Due to a bug in qemu 4.0, the config size for virtio-balloon changed.
   * This breaks migration from pre-4.0 qemu because the PCI BAR size is
     affected.

   * Upstream has realized this and fixed it in 4.1; this backports the fix
     to qemu 4.0 in Ubuntu Eoan.
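
  To illustrate why a larger device config can change the PCI BAR size, here
  is a minimal sketch. It assumes the legacy virtio I/O BAR is sized as a
  fixed header plus the device config, rounded up to a power of two. The
  byte counts (24, 8, 16) and the helper names are assumptions made for
  illustration; they are not taken from this report.

```python
# Hypothetical illustration; all byte counts are assumptions, not values
# quoted from the bug report.
LEGACY_HEADER_BYTES = 24        # assumed: legacy virtio-pci header (MSI-X on)
OLD_BALLOON_CONFIG_BYTES = 8    # assumed: balloon config size before qemu 4.0
NEW_BALLOON_CONFIG_BYTES = 16   # assumed: balloon config size in unfixed 4.0


def pow2ceil(n: int) -> int:
    """Round n up to the next power of two (PCI BAR sizes are powers of two)."""
    if n <= 1:
        return 1
    return 1 << (n - 1).bit_length()


def legacy_io_bar_size(config_bytes: int,
                       header_bytes: int = LEGACY_HEADER_BYTES) -> int:
    """Size of the I/O BAR for a given device config size, under the
    header-plus-config, round-up-to-power-of-two assumption."""
    return pow2ceil(header_bytes + config_bytes)


if __name__ == "__main__":
    for label, cfg in (("pre-4.0", OLD_BALLOON_CONFIG_BYTES),
                       ("unfixed 4.0", NEW_BALLOON_CONFIG_BYTES)):
        print(f"{label}: config={cfg} bytes -> BAR={legacy_io_bar_size(cfg)} bytes")
```

  Under these assumptions a small growth in config size is enough to cross a
  power-of-two boundary, so the source and target disagree about the BAR.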

  [Test Case]

   * Take a pre-eoan (pre qemu 4.0) guest and check that your setup can
     migrate it back and forth with an eoan/qemu-4.0 target.
     Note: (always) use a versioned machine type like pc-i440fx-disco (also
     the default if you use disco as source).
     Then add a virtio-balloon device to the guest on pre-4.0 and migrate it
     again.
     Unfixed, the following error will show up:
     get_pci_config_device: Bad config data: i=0x10 read: a1 device: 1 cmask: ff wmask: c0 w1cmask:0

   * Unfixed -> Fixed qemu 4.0 migrations should work as well. The other
     way around (Fixed -> Unfixed) could work too, since the size didn't
     change, but there are no guarantees (no logic in the target).
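
  The error signature from the test case can be decoded with a short
  script. This is a sketch only: the regular expression and the function
  name decode_bad_config are hypothetical, and the masking formula is an
  assumption about how the target compares the incoming config byte with its
  own (bits that differ, are checked via cmask, and are not guest-writable
  via wmask or w1cmask). Offset 0x10 is where the first BAR starts in a
  standard PCI config header.

```python
import re

# Hypothetical pattern matching the error text quoted in this report.
ERROR_RE = re.compile(
    r"Bad config data: i=0x(?P<i>[0-9a-f]+) read: (?P<read>[0-9a-f]+) "
    r"device: (?P<device>[0-9a-f]+) cmask: (?P<cmask>[0-9a-f]+) "
    r"wmask: (?P<wmask>[0-9a-f]+) w1cmask:\s*(?P<w1cmask>[0-9a-f]+)"
)


def decode_bad_config(line: str):
    """Parse the hex fields of a 'Bad config data' line.

    Returns a dict of integers plus a 'mismatch' entry, or None if the
    line does not contain the error.
    """
    m = ERROR_RE.search(line)
    if m is None:
        return None
    f = {name: int(value, 16) for name, value in m.groupdict().items()}
    # Assumed comparison: differing bits that are checked (cmask) and that
    # the guest cannot change (not in wmask, not in w1cmask).
    f["mismatch"] = ((f["read"] ^ f["device"]) & f["cmask"]
                     & ~f["wmask"] & ~f["w1cmask"] & 0xFF)
    return f


if __name__ == "__main__":
    sample = ("get_pci_config_device: Bad config data: i=0x10 read: a1 "
              "device: 1 cmask: ff wmask: c0 w1cmask:0")
    fields = decode_bad_config(sample)
    print(f"offset=0x{fields['i']:02x} mismatch=0x{fields['mismatch']:02x}")
```

  For the line above, the differing fixed bit sits in the first byte of the
  first BAR, which fits the explanation that source and target disagree on
  the BAR size.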

  [Regression Potential]

   * Messing with machine types is always dangerous, as in case of a mistake
     things get even more complex. But in this case things seem rather
     straightforward. Pre-4.0 code all behaves the same, only 4.0 gets the
     new attribute set, and later code has logic to handle dynamic sizes.
     That way I think we are safe from machine-type regressions.
   * As for the change in behavior, it affects pre-4.0 migrations, which atm
     are broken if a virtio-balloon device is present. There is nothing more
     to break in that use case, and if such a device isn't present it
     shouldn't change anything. Therefore IMHO safe again.

  [Other Info]

   * n/a

  ---

  Related but not the same as bug 1838569 which had two error signatures.
  The first being covered there and the second handled here.

  --- ---
  Quote from https://bugs.launchpad.net/cloud-archive/+bug/1838569/comments/4
  Daniel 'f0o' Preussker (dpreussker) wrote 1 hour ago:	#4
  With recent release of OpenStack Train this issue reappears...

  Upgrading from Stein to Train will require all VMs to be hard-rebooted
  to be migrated as a final step because Live Migration fails with:

  Oct 17 10:28:43 h2.1.openstack.r0cket.net libvirtd[1545]: Unable to read from monitor: Connection reset by peer
  Oct 17 10:28:43 h2.1.openstack.r0cket.net libvirtd[1545]: internal error: qemu unexpectedly closed the monitor: 2019-10-17T10:28:42.981201Z qemu-system-x86_64: get_pci_config_device: Bad config data: i=0x10 read: a1 device: 1 cmask: ff wmask: c0 w1cmask:0
                                                            2019-10-17T10:28:42.981250Z qemu-system-x86_64: Failed to load PCIDevice:config
                                                            2019-10-17T10:28:42.981263Z qemu-system-x86_64: Failed to load virtio-balloon:virtio
                                                            2019-10-17T10:28:42.981272Z qemu-system-x86_64: error while loading state for instance 0x0 of device '0000:00:05.0/virtio-balloon'
                                                            2019-10-17T10:28:42.981391Z qemu-system-x86_64: warning: TSC frequency mismatch between VM (2532609 kHz) and host (2532608 kHz), and TSC scaling unavailable
                                                            2019-10-17T10:28:42.983157Z qemu-system-x86_64: warning: TSC frequency mismatch between VM (2532609 kHz) and host (2532608 kHz), and TSC scaling unavailable
                                                            2019-10-17T10:28:42.983672Z qemu-system-x86_64: load of migration failed: Invalid argument

  --- ---

  Identified as:
  Dr. David Alan Gilbert (dgilbert-h) wrote 1 hour ago:	#5
  Daniel: That's a different problem; 'Bad config data: i=0x10 read: a1 device: 1 cmask: ff wmask: c0 w1cmask:0'; so should probably be a separate bug.

  I'd bet on this being the one fixed by
  2bbadb08ce272d65e1f78621002008b07d1e0f03

  --- ---

  That fix is only in qemu 4.1, so this is an open bug for Ubuntu and
  Cloud Archive.

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1848497/+subscriptions


