[Bug 1848497] Re: virtio-balloon change breaks migration from qemu prior to 4.0
Chris MacNaughton
1848497 at bugs.launchpad.net
Tue Jun 9 12:03:44 UTC 2020
** Changed in: cloud-archive
Status: New => Fix Released
--
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to Ubuntu Cloud Archive.
https://bugs.launchpad.net/bugs/1848497
Title:
virtio-balloon change breaks migration from qemu prior to 4.0
Status in Ubuntu Cloud Archive:
Fix Released
Status in Ubuntu Cloud Archive train series:
Fix Released
Status in Ubuntu Cloud Archive ussuri series:
Fix Released
Status in qemu package in Ubuntu:
Fix Released
Status in qemu source package in Eoan:
Fix Released
Status in qemu source package in Focal:
Fix Released
Bug description:
[Impact]
* Due to a bug in qemu 4.0, the config size for virtio-balloon changed.
* This breaks migration from pre-4.0 qemu because the PCI BAR size is
affected.
* Upstream has realized this and fixed it in 4.1; this backports the fix
to qemu 4.0 in Ubuntu Eoan.
[Test Case]
* Take a pre-eoan (pre qemu 4.0) guest and check that your setup can
migrate it back and forth with an eoan/qemu-4.0 target.
Note: (always) use a versioned machine type like pc-i440fx-disco (also
the default if you use disco as source).
Then add a virtio-balloon device to the guest on pre-4.0 and migrate it
again.
Unfixed, the following error will show up:
get_pci_config_device: Bad config data: i=0x10 read: a1 device: 1 cmask: ff wmask: c0 w1cmask:0
* Migrations from unfixed to fixed qemu 4.0 should work as well. The
other way around might work too (the size didn't change), but there are
no guarantees (no logic in the target).
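
To make the error line in the test case easier to read, here is a minimal Python sketch that splits it into its fields and recomputes which bits the destination rejected. The function name decode_bad_config is hypothetical, and the mask formula is an assumption based on how the check in qemu's PCI config loader appears to work (a bit counts as a mismatch if it differs, is covered by cmask, and is not guest-writable per wmask/w1cmask); it is not copied from the qemu source. Offset 0x10 is the first Base Address Register (BAR0) in a standard PCI header, which fits the BAR size problem described under [Impact].

```python
import re

# Field layout taken from the error message quoted in the test case above.
PATTERN = re.compile(
    r"Bad config data: i=0x(?P<i>[0-9a-f]+) read: (?P<read>[0-9a-f]+) "
    r"device: (?P<device>[0-9a-f]+) cmask: (?P<cmask>[0-9a-f]+) "
    r"wmask: (?P<wmask>[0-9a-f]+) w1cmask:\s*(?P<w1cmask>[0-9a-f]+)"
)


def decode_bad_config(line):
    """Parse one 'Bad config data' line; return its fields as ints, or None."""
    match = PATTERN.search(line)
    if match is None:
        return None
    fields = {name: int(value, 16) for name, value in match.groupdict().items()}
    # Assumed check: bits that differ between the incoming byte ("read") and
    # the destination's byte ("device"), limited to compared bits (cmask) and
    # excluding bits the guest may write (wmask) or write-1-to-clear (w1cmask).
    fields["mismatch"] = (
        (fields["read"] ^ fields["device"])
        & fields["cmask"]
        & ~fields["wmask"]
        & ~fields["w1cmask"]
        & 0xFF
    )
    return fields


if __name__ == "__main__":
    line = ("get_pci_config_device: Bad config data: i=0x10 read: a1 "
            "device: 1 cmask: ff wmask: c0 w1cmask:0")
    result = decode_bad_config(line)
    print(hex(result["i"]), hex(result["mismatch"]))  # prints: 0x10 0x20
```

Under these assumptions the rejected bit is 0x20: the source guest had programmed bit 5 of the BAR0 low byte, but the destination only treats bits 7:6 (wmask c0) as writable there. That would be consistent with the destination exposing a larger BAR than the source did, though the sketch alone does not prove it.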
[Regression Potential]
* Messing with machine types is always dangerous, as in case of a mistake
things get even more complex. But in this case things seemed rather
straightforward. Pre-4.0 code all behaves the same, only 4.0 gets the
new attribute set, and later code has logic to handle dynamic sizes.
That way I think we are safe from machine-type regressions.
* For the change in behavior, it changes pre-4.0 migrations, which atm
are broken if a virtio-balloon device is present. There is nothing to
break more in that use case, and if such a device isn't present it
shouldn't change anything. Therefore IMHO safe again.
[Other Info]
* n/a
---
Related to, but not the same as, bug 1838569, which had two error
signatures: the first is covered there and the second is handled here.
--- ---
Quote from https://bugs.launchpad.net/cloud-archive/+bug/1838569/comments/4
Daniel 'f0o' Preussker (dpreussker) wrote 1 hour ago: #4
With recent release of OpenStack Train this issue reappears...
Upgrading from Stein to Train will require all VMs to be hard-rebooted
to be migrated as a final step because Live Migration fails with:
Oct 17 10:28:43 h2.1.openstack.r0cket.net libvirtd[1545]: Unable to read from monitor: Connection reset by peer
Oct 17 10:28:43 h2.1.openstack.r0cket.net libvirtd[1545]: internal error: qemu unexpectedly closed the monitor: 2019-10-17T10:28:42.981201Z qemu-system-x86_64: get_pci_config_device: Bad config data: i=0x10 read: a1 device: 1 cmask: ff wmask: c0 w1cmask:0
2019-10-17T10:28:42.981250Z qemu-system-x86_64: Failed to load PCIDevice:config
2019-10-17T10:28:42.981263Z qemu-system-x86_64: Failed to load virtio-balloon:virtio
2019-10-17T10:28:42.981272Z qemu-system-x86_64: error while loading state for instance 0x0 of device '0000:00:05.0/virtio-balloon'
2019-10-17T10:28:42.981391Z qemu-system-x86_64: warning: TSC frequency mismatch between VM (2532609 kHz) and host (2532608 kHz), and TSC scaling unavailable
2019-10-17T10:28:42.983157Z qemu-system-x86_64: warning: TSC frequency mismatch between VM (2532609 kHz) and host (2532608 kHz), and TSC scaling unavailable
2019-10-17T10:28:42.983672Z qemu-system-x86_64: load of migration failed: Invalid argument
--- ---
Identified as:
Dr. David Alan Gilbert (dgilbert-h) wrote 1 hour ago: #5
Daniel: That's a different problem; 'Bad config data: i=0x10 read: a1 device: 1 cmask: ff wmask: c0 w1cmask:0'; so should probably be a separate bug.
I'd bet on this being the one fixed by
2bbadb08ce272d65e1f78621002008b07d1e0f03
--- ---
And that is a fix that is only in qemu 4.1 and would be an open bug
for Ubuntu and Cloud Archive.
To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1848497/+subscriptions