[Bug 1398718] [NEW] Live migration locks up Linux 3.2-based guests
Matt Mullins
mokomull at gmail.com
Wed Dec 3 06:50:25 UTC 2014
Public bug reported:
In the thread at
http://thread.gmane.org/gmane.comp.emulators.kvm.devel/127042/focus=129294,
three commits were identified to fix live migration for qemu 2.0 (at
least), which I am using on trusty. I would like to get these pulled in
by the package maintainer.
I have cherry-picked those three commits (with some considerable fix-up
for the first, which may or may not be correct; the others apply
cleanly) and built packages locally. Installing that on the migration-
receiver seems to fix my guest lockups after live-migrating. I can
attach the patches I'm using if someone is able to review my fix-ups to
the first one.
My original problem description was:
Somewhere between kernel 3.2 and 3.11 on my VM hosts (yes, I know that narrows
it down a /whole lot/ ...), live migration started killing my Ubuntu precise
(kernel 3.2.x) guests, causing all of their vcpus to go into a busy loop. Once
(and only once) I've observed the guest eventually becoming responsive again,
with a clock nearly 600 years in the future and a negative uptime.
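As a rough sanity check on the "nearly 600 years" figure: if a clock value were held as an unsigned 64-bit count of nanoseconds (an assumption for illustration; this report does not establish the mechanism), then a small underflow would wrap to a time roughly 584.5 years ahead, and the same bit pattern read as a signed value would be a small negative number, which would be consistent with a negative uptime. The function names below are hypothetical.

```python
# Sketch only. Assumption (not from the report): a time value is kept
# as a 64-bit count of nanoseconds.
NS_PER_SECOND = 10**9
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # Julian year


def wrap_span_years(bits: int = 64) -> float:
    """Years covered by a full wrap of a `bits`-wide nanosecond counter."""
    return (2**bits) / NS_PER_SECOND / SECONDS_PER_YEAR


def as_signed64(value: int) -> int:
    """Reinterpret an unsigned 64-bit pattern as a two's-complement value."""
    value &= (1 << 64) - 1
    return value - (1 << 64) if value >= (1 << 63) else value


# Full span of the counter, in years.
print(round(wrap_span_years(), 1))

# A counter that underflowed by 5 seconds, read back as signed seconds.
underflowed = 2**64 - 5 * NS_PER_SECOND
print(as_signed64(underflowed) / NS_PER_SECOND)
```

The two printed values are the wrap span in years and the underflowed counter interpreted as signed seconds.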
I haven't been able to dig up any previous threads about this problem, so my
gut instinct is that I've configured something wonky. Any pointers toward
/what/ I may have done wrong are appreciated.
It only seems to happen if I've given the guests Nehalem-class CPU features.
My longest-running VMs, from before I started passing the CPU
capabilities through to the guest, seem to migrate without issue.
It also seems to happen reliably when the guest has been running for a while;
it's easily reproducible with guests that have been up ~1 day, and I've
reproduced it in VMs with an uptime of ~20 hours. I haven't yet figured out a
lower bound, which makes the testing cycle a little longer for me.
The guests that I reliably reproduce this on are Ubuntu 12.04 guests running
the current 3.2 kernel that Canonical distributes. Recent Fedora kernels
(3.14+, IIRC) don't seem to busy-spin this way, though I haven't tested this
case exhaustively, and I haven't written down very good notes for the tests I
have done with Fedora.
The hosts are dual-socket Nehalem Xeons (L5520), currently running Ubuntu 14.04
and the associated 3.13 kernel. I had previously reproduced this with 12.04
running a raring-backport 3.11 kernel as well, but I (seemingly erroneously)
assumed it may have been a qemu userspace discrepancy.
** Affects: qemu (Ubuntu)
Importance: Undecided
Status: New
--
You received this bug notification because you are a member of Ubuntu
Server Team, which is subscribed to qemu in Ubuntu.
https://bugs.launchpad.net/bugs/1398718
Title:
Live migration locks up Linux 3.2-based guests
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1398718/+subscriptions