[Bug 1818880] Please test proposed package
Łukasz Zemczak
1818880 at bugs.launchpad.net
Mon Mar 11 18:56:26 UTC 2019
Hello Heitor, or anyone else affected,
Accepted qemu into xenial-proposed. The package will build now and be
available at https://launchpad.net/ubuntu/+source/qemu/1:2.5+dfsg-5ubuntu10.35
in a few hours, and then in the -proposed repository.
Please help us by testing this new package. See
https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how
to enable and use -proposed. Your feedback will help us get this
update out to other Ubuntu users.
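The -proposed steps referenced above can be sketched as a small shell helper. This is a minimal sketch under stated assumptions and does not replace the wiki page. The mirror URL, the /etc/apt/sources.list.d/<release>-proposed.list file name, the binary package name qemu-system-x86, and the helper names proposed_source_line and enable_proposed_and_install are all illustrative. The wiki also describes apt pinning so that only selected packages are pulled from -proposed, and this sketch omits that.

```shell
#!/bin/sh
# Minimal sketch: enable the -proposed pocket and install qemu from it.
# Assumptions (illustrative only): the archive mirror URL, the
# sources.list.d file name, and the binary package name qemu-system-x86.
# Helper names below are hypothetical.

# Print the apt source line for the -proposed pocket of a release.
proposed_source_line() {
    release="$1"
    echo "deb http://archive.ubuntu.com/ubuntu/ ${release}-proposed main restricted universe multiverse"
}

# Add the -proposed pocket and install qemu from it (must run as root).
enable_proposed_and_install() {
    release="$1"
    proposed_source_line "$release" > "/etc/apt/sources.list.d/${release}-proposed.list" || return 1
    apt-get update || return 1
    # -t selects the -proposed pocket as the target release for this install.
    apt-get install -y -t "${release}-proposed" qemu-system-x86
}
```

As root, enable_proposed_and_install xenial would add the pocket and install qemu. Afterwards, dpkg -l qemu-system-x86 shows the installed version to quote in the bug comment.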
If this package fixes the bug for you, please add a comment to this bug,
mentioning the version of the package you tested and change the tag from
verification-needed-xenial to verification-done-xenial. If it does not
fix the bug for you, please add a comment stating that, and change the
tag to verification-failed-xenial. In either case, without details of
your testing we will not be able to proceed.
Further information regarding the verification process can be found at
https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in
advance for helping!
N.B. The updated package will be released to -updates after the bug(s)
fixed by this package have been verified and the package has been in
-proposed for a minimum of 7 days.
--
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to Ubuntu Cloud Archive.
https://bugs.launchpad.net/bugs/1818880
Title:
Deadlock when detaching network interface
Status in Ubuntu Cloud Archive:
Confirmed
Status in qemu package in Ubuntu:
Fix Released
Status in qemu source package in Xenial:
Fix Committed
Status in qemu source package in Bionic:
Fix Released
Status in qemu source package in Cosmic:
Fix Released
Status in qemu source package in Disco:
Fix Released
Bug description:
[Impact]
Qemu guests hang indefinitely
[Description]
When running a Qemu guest with VirtIO network interfaces, detaching an interface that's currently being used can result in a deadlock. The guest instance will hang and become unresponsive to commands, and the only option is to kill -9 the instance.
The cause is a deadlock between a monitor thread and the RCU thread, which contend for the BQL (qemu_global_mutex) and the RCU critical-section locks. The monitor thread acquires the BQL to detach the network interface and starts a helper thread to handle detaching the network adapter. That helper thread must wait for the RCU thread to complete the deletion, but the RCU thread needs the BQL to commit its transactions.
This bug is already fixed upstream (73c6e4013b4c rcu: completely disable pthread_atfork callbacks as soon as possible), and the fix is included in the qemu versions shipped in later series (see below), so we don't need to backport it to Bionic onwards.
Upstream commit:
https://git.qemu.org/?p=qemu.git;a=commit;h=73c6e4013b4c
$ git describe --contains 73c6e4013b4c
v2.10.0-rc2~1^2~8
$ rmadison qemu
===> qemu | 1:2.5+dfsg-5ubuntu10.34 | xenial-updates/universe | amd64, ...
qemu | 1:2.11+dfsg-1ubuntu7 | bionic/universe | amd64, ...
qemu | 1:2.12+dfsg-3ubuntu8 | cosmic/universe | amd64, ...
qemu | 1:3.1+dfsg-2ubuntu2 | disco/universe | amd64, ...
[Test Case]
Since this is a race condition, the bug is tricky to reproduce consistently. We've had reports of users running into this with OpenStack deployments and Windows Server guests, and the scenario is usually like this:
1) Deploy a 16vCPU Windows Server 2012 R2 guest with a virtio network interface
2) Stress the network interface with e.g. Windows HLK test suite or similar
3) Repeatedly attach/detach the network adapter that's in use
It usually takes more than ~4000 attach/detach cycles to trigger the bug.
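Outside OpenStack, the attach/detach loop from step 3 could be driven directly through libvirt. The following is a minimal sketch assuming a libvirt-managed guest. The domain name, the libvirt network "default", the MAC address, and the default cycle count of 5000 (chosen only to exceed the ~4000 cycles mentioned above) are hypothetical placeholders, as are the helper names. It does not generate the network load from step 2, which still has to run inside the guest.

```shell
#!/bin/sh
# Minimal sketch of the repeated attach/detach cycle from the test case.
# Assumptions (hypothetical, adjust to your setup): a libvirt domain named
# $DOMAIN, a libvirt network called "default", and a fixed MAC address so
# that detach-interface can target the interface that was just attached.
DOMAIN="${DOMAIN:-win2012r2-guest}"
MAC="${MAC:-52:54:00:12:34:56}"
CYCLES="${CYCLES:-5000}"

# One attach followed by one detach; returns non-zero if either call fails.
# If the deadlock triggers, virsh blocks here instead of returning an error.
attach_detach_once() {
    virsh attach-interface "$DOMAIN" network default --model virtio --mac "$MAC" --live || return 1
    virsh detach-interface "$DOMAIN" network --mac "$MAC" --live || return 1
}

# Repeat the cycle $CYCLES times, stopping at the first failed call.
stress_attach_detach() {
    i=1
    while [ "$i" -le "$CYCLES" ]; do
        attach_detach_once || { echo "cycle $i failed"; return 1; }
        i=$((i + 1))
    done
    echo "completed $CYCLES cycles"
}
```

Set DOMAIN, MAC and CYCLES in the environment before sourcing the script, then call stress_attach_detach. Because a hit of this bug makes a virsh call hang rather than fail, the loop stalls instead of printing a failure; wrapping each virsh call in a timeout is one way to turn the hang into a detectable error.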
[Regression Potential]
Regressions for this might arise from the fact that the fix changes RCU lock code. Since this patch has been upstream and in other series for a while, it's unlikely that it would cause regressions in RCU code specifically. Other code that makes use of the RCU locks (MMIO and some monitor events) will be thoroughly tested for any regressions with use-case scenarios and scripted runs.
To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1818880/+subscriptions
More information about the Ubuntu-openstack-bugs mailing list