[PATCH][SRU][Xenial] Fix for LP:#1724614
Victor Tapia
victor.tapia at canonical.com
Fri Jan 26 16:41:23 UTC 2018
BugLink: https://bugs.launchpad.net/bugs/1724614
[SRU Justification]
[Impact]
We've identified a constantly high (~90%) system time load at the host level
when a VCPU in a KVM guest remains in, or repeatedly enters and resumes from,
the halt/idle state at a constant frequency, usually idling for slightly less
time than the default polling period.
The halt polling mechanism is intended to reduce latency in cases where the
guest is quickly resumed, saving a call to the scheduler.
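To illustrate why an idle period just under the polling ceiling can be so
costly, here is a toy model of adaptive halt polling. It is a sketch under
stated assumptions, not the kernel's code: the starting window (10000 ns), the
doubling on growth, the reset to zero on a long block, and the 250000 ns idle
period are all hypothetical values chosen for illustration.

```python
def polled_fraction(poll_max_ns, idle_ns, rounds=1000,
                    grow_start_ns=10_000, grow_factor=2):
    """Return the fraction of guest idle time the host spends busy-polling.

    Toy model only. poll_max_ns plays the role of the halt_poll_ns ceiling;
    grow_start_ns and grow_factor are assumed values, not taken from the patch.
    """
    poll_ns = 0      # current per-VCPU polling window
    polled_ns = 0    # total host time burned polling
    for _ in range(rounds):
        # The host polls until the window expires or the wakeup arrives.
        polled_ns += min(poll_ns, idle_ns)
        if idle_ns <= poll_ns:
            continue  # wakeup arrived while polling: keep the window
        if poll_ns and idle_ns > poll_max_ns:
            poll_ns = 0  # long block: give up polling
        elif poll_ns < poll_max_ns and idle_ns < poll_max_ns:
            # Short halt but the window was too small: grow it, capped.
            grown = grow_start_ns if poll_ns == 0 else poll_ns * grow_factor
            poll_ns = min(grown, poll_max_ns)
    return polled_ns / (rounds * idle_ns)


# Hypothetical guest that idles for 250000 ns at a time.
for ceiling_ns in (400_000, 200_000):
    print(ceiling_ns, round(polled_fraction(ceiling_ns, idle_ns=250_000), 3))
```

In this model the higher ceiling ends up polling through almost every idle
period, while the lower ceiling never starts polling because the idle period
exceeds it. Real hosts carry other overhead that the model ignores.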
We've performed some testing by adjusting the /sys/module/kvm/parameters/halt_poll_ns
value, which defines the maximum time that should be spent polling before calling the
scheduler to allow it to run other tasks (it defaults to 400000 ns in Ubuntu).
With the default value, the tests show that the load remains near 90% on a
VCPU that has a single task in the run queue.
We've also tested lowering the halt_poll_ns value to 200000 ns, and the results
seem to show system time usage dropping from 90% to ~5%.
root@porygon:/home/ubuntu# echo 200000 > /sys/module/kvm/parameters/halt_poll_ns
root@porygon:/home/ubuntu# mpstat 1 -P 6 5
Linux 4.4.0-112-generic (porygon) 01/24/2018 _x86_64_ (64 CPU)
02:06:08 PM CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
...
Average: 6 0.00 0.00 4.26 0.00 0.00 0.00 0.00 17.83 0.00 77.91
root@porygon:/home/ubuntu# echo 400000 > /sys/module/kvm/parameters/halt_poll_ns
root@porygon:/home/ubuntu# mpstat 1 -P 6 5
Linux 4.4.0-112-generic (porygon) 01/24/2018 _x86_64_ (64 CPU)
02:06:20 PM CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
...
Average: 6 0.00 0.00 89.59 0.00 0.00 0.00 0.00 8.45 0.00 1.96
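The two "Average:" rows above can be compared side by side with a small
script. This is only a convenience sketch: the column names are copied from
the mpstat header shown above, and the helper name parse_average is made up
for this example.

```python
# Column names as printed in the mpstat header above.
HEADER = "CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle".split()


def parse_average(line):
    """Map an mpstat 'Average:' row to a {column: value} dict."""
    fields = line.split()
    if not fields or fields[0] != "Average:":
        raise ValueError("not an mpstat Average line")
    row = dict(zip(HEADER, fields[1:]))
    # Keep the CPU id as text, convert the percentages to floats.
    return {k: (v if k == "CPU" else float(v)) for k, v in row.items()}


at_200k = parse_average("Average: 6 0.00 0.00 4.26 0.00 0.00 0.00 0.00 17.83 0.00 77.91")
at_400k = parse_average("Average: 6 0.00 0.00 89.59 0.00 0.00 0.00 0.00 8.45 0.00 1.96")

for col in ("%sys", "%guest", "%idle"):
    print(f"{col:>7}: {at_400k[col]:6.2f} -> {at_200k[col]:6.2f}")
```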
[Test]
1) Configure a KVM guest with a single pinned VCPU.
2) Run the following program (http://pastebin.ubuntu.com/25731919/) in the KVM guest:
$ gcc test.c -lpthread -o test && ./test 250 0
3) Run mpstat on the host against the pinned CPU and compare the stats with
halt_poll_ns set to 400000 and to 200000:
$ sudo mpstat 1 -P 6 5
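When switching between the two values during testing, it is easy to leave the
host on the wrong setting. A possible helper, sketched below, writes a value
and restores the previous one afterwards. The function name temporary_param is
hypothetical, and writing to the real sysfs path requires root.

```python
from contextlib import contextmanager
from pathlib import Path

# Parameter path as given in the [Impact] section.
HALT_POLL_NS = Path("/sys/module/kvm/parameters/halt_poll_ns")


@contextmanager
def temporary_param(value, path=HALT_POLL_NS):
    """Write `value` to `path` and restore the previous content on exit."""
    path = Path(path)
    previous = path.read_text().strip()
    path.write_text(f"{value}\n")
    try:
        yield previous
    finally:
        # Restore the original setting even if the measurement fails.
        path.write_text(f"{previous}\n")


# Example use on the host (as root), with the mpstat call from step 3:
#
#   import subprocess
#   with temporary_param(200000):
#       subprocess.run(["mpstat", "1", "-P", "6", "5"], check=True)
```

The path argument is there so the helper can be tried against an ordinary file
before pointing it at the live parameter.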
[Fix]
Change the halt polling max time to half of the current value.
In some fio benchmarks, halt_poll_ns=400000 caused CPU utilization to
increase heavily even in cases where the performance improvement was
small. In particular, bandwidth divided by CPU usage was as much as
60% lower.
To some extent this is the expected effect of the patch, and the
additional CPU utilization is only visible when running the
benchmarks. However, halving the threshold also halves the extra
CPU utilization (from +30-130% to +20-70%) and has no negative
effect on performance.
Signed-off-by: Paolo Bonzini <pbonzini at redhat.com>
* https://github.com/torvalds/linux/commit/b401ee0b85a53e89739ff68a5b1a0667d664afc9
Paolo Bonzini (1):
KVM: x86: lower default for halt_poll_ns
arch/x86/include/asm/kvm_host.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--
2.7.4