ACK: [SRU][J][N][O][PATCH 0/3] KVM: Cache CPUID at KVM.ko module init to reduce latency of VM-Enter and VM-Exit
Jacob Martin
jacob.martin at canonical.com
Thu Jan 9 14:33:16 UTC 2025
On Wed, Jan 08, 2025 at 04:53:30PM +1300, Matthew Ruffell wrote:
> BugLink: https://bugs.launchpad.net/bugs/2093146
>
> [Impact]
>
> The CPUID instruction is particularly slow on newer generation processors,
> with Intel's Emerald Rapids processor taking significantly longer to execute
> CPUID than Skylake or Icelake.
>
> This introduces significant latency into the KVM subsystem, which frequently
> calls CPUID when recomputing XSTATE offsets and, especially, XSAVE values,
> which need two CPUID calls for each XSAVE call.
>
> CPUID.0xD.[1..n] are constant and do not change during runtime, as they don't
> depend on XCR0 or XSS values, and can be saved and cached for future usage.
>
> By caching CPUID.0xD.[1..n] at kvm.ko module load, round-trip latency improves
> by up to about 3.65x (a reduction of roughly 73%).
>
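The caching idea can be illustrated with a small simulation. This is only a sketch, not the kernel patch: the CPUID table, the helper names (cpuid_0xd, cache_xstate_sizes) and the call counter are all hypothetical, and the size calculation is a loose Python rendering of what xstate_required_size() is understood to do, including the 64-byte alignment rule for the compacted format.

```python
# Simulated CPUID.0xD sub-leaves: index -> (EAX=size, EBX=offset, ECX=flags).
# Illustrative values only; nothing here is read from real hardware.
FAKE_CPUID_0XD = {
    2: (256, 576, 0),
    5: (64, 1088, 0),
    6: (512, 1152, 0),
    7: (1024, 1664, 0),
}

XSAVE_HDR_OFFSET = 512  # size of the legacy x87/SSE region
XSAVE_HDR_SIZE = 64     # size of the XSAVE header

cpuid_calls = 0         # counts simulated (slow) CPUID executions
XSTATE_CACHE = {}       # filled once, standing in for "module init"


def cpuid_0xd(index):
    """Stand-in for the slow CPUID instruction (hypothetical helper)."""
    global cpuid_calls
    cpuid_calls += 1
    return FAKE_CPUID_0XD.get(index, (0, 0, 0))


def cache_xstate_sizes():
    """Query every extended sub-leaf once and remember the result."""
    for index in range(2, 64):
        XSTATE_CACHE[index] = cpuid_0xd(index)


def align(value, boundary):
    """Round value up to a multiple of boundary (a power of two)."""
    return (value + boundary - 1) & ~(boundary - 1)


def xstate_required_size(xstate_bv, compacted):
    """Compute the XSAVE area size using only cached sub-leaf data."""
    ret = XSAVE_HDR_OFFSET + XSAVE_HDR_SIZE
    bit = 2
    xstate_bv >>= 2  # bits 0 and 1 live in the legacy region
    while xstate_bv:
        if xstate_bv & 1:
            size, offset, flags = XSTATE_CACHE[bit]
            if compacted:
                # ECX bit 1 requests 64-byte alignment in compacted format
                offset = align(ret, 64) if flags & 0x2 else ret
            ret = max(ret, offset + size)
        xstate_bv >>= 1
        bit += 1
    return ret


cache_xstate_sizes()
calls_after_init = cpuid_calls
print(xstate_required_size(0xE7, compacted=False))
print(xstate_required_size(0xE7, compacted=True))
print("extra CPUID calls after init:", cpuid_calls - calls_after_init)
```

After the one-time fill, every size computation is a table lookup, which is the property the SRU relies on: the slow instruction is paid for once at load time rather than on each recomputation.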
> For a round-trip transition between VM-Enter and VM-Exit, the figures from the
> commit log are:
>
> Skylake 11650
> Icelake 22350
> Emerald 28850
>
> When you add the caching in:
>
> Skylake 6850
> Icelake 9000
> Emerald 7900
>
> That's a speedup of about 1.70x for Skylake, 2.48x for Icelake and 3.65x for
> Emerald Rapids (latency reductions of roughly 41%, 60% and 73%).
>
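For reference, the ratios can be recomputed directly from the commit-log figures quoted above. This is a minimal sketch; the dictionary and function names are mine, and only the six numbers come from the cover letter.

```python
# Round-trip VM-Enter/VM-Exit figures quoted from the commit log.
before = {"Skylake": 11650, "Icelake": 22350, "Emerald Rapids": 28850}
after = {"Skylake": 6850, "Icelake": 9000, "Emerald Rapids": 7900}


def speedup(old, new):
    """How many times faster the new figure is than the old one."""
    return old / new


def reduction_pct(old, new):
    """Percentage by which the figure dropped relative to the old one."""
    return 100.0 * (old - new) / old


for cpu in before:
    print(f"{cpu}: {speedup(before[cpu], after[cpu]):.2f}x faster, "
          f"{reduction_pct(before[cpu], after[cpu]):.1f}% lower")
```

Note that a latency cannot drop by more than 100%, so the 170%/248%/365% figures are best read as before/after ratios of 1.70x, 2.48x and 3.65x.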
> [Fix]
>
> The fix is part of a 5 patch series. We will only SRU patch 1 for the moment, as
> it is the only one in mainline and does the brunt of the work, providing up to
> about a 3.65x latency improvement. Patches 2-5 are refactors and smaller
> performance improvements, not yet mainlined because they need rework, and only
> account for about a 2.5% latency improvement, which is small compared to what
> patch 1 does.
>
> The fix is:
>
> commit 1201f226c863b7da739f7420ddba818cedf372fc
> Author: Sean Christopherson <seanjc at google.com>
> Date: Tue Dec 10 17:32:58 2024 -0800
> Subject: KVM: x86: Cache CPUID.0xD XSTATE offsets+sizes during module init
> Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=1201f226c863b7da739f7420ddba818cedf372fc
>
> This landed in 6.13-rc3, and is currently queued up for upstream -stable 6.12,
> 6.6 and 6.1.
>
> This applies cleanly to noble and oracular. For jammy, it requires the below
> dependency, and a small backport to fix some minor context mismatches.
>
> commit cc04b6a21d431359eceeec0d812b492088b04af5
> Author: Jing Liu <jing2.liu at intel.com>
> Date: Wed Jan 5 04:35:14 2022 -0800
> Subject: kvm: x86: Fix xstate_required_size() to follow XSTATE alignment rule
> Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=cc04b6a21d431359eceeec0d812b492088b04af5
>
> [Testcase]
>
> 1) Install KVM Stack on Baremetal host
> $ sudo apt-get install qemu-kvm libvirt-daemon-system libvirt-clients bridge-utils
> 2) Enable nested virt
> $ sudo vim /etc/modprobe.d/kvm.conf
> options kvm-intel nested=1
> $ sudo reboot
> 3) Start a VM.
> 4) In guest, run:
> $ sudo apt-get install qemu-kvm libvirt-daemon-system libvirt-clients bridge-utils
> 5) In host and the guest, run:
> $ sudo apt install build-essential
> 6) Install kvm-unit-tests and run x86/vmexit/cpuid testcase.
> https://gitlab.com/kvm-unit-tests/kvm-unit-tests
> $ git clone https://gitlab.com/kvm-unit-tests/kvm-unit-tests.git
> $ cd kvm-unit-tests
> $ ./configure
> $ make standalone
> $ cd tests
> $ sudo -s
> # ACCEL=kvm ./vmexit_cpuid
> BUILD_HEAD=0ed2cdf3
> timeout -k 1s --foreground 90s /usr/bin/qemu-system-x86_64 --no-reboot -nodefaults -device pc-testdev -device isa-debug-exit,iobase=0xf4,iosize=0x4 -vnc none -serial stdio -device pci-testdev -machine accel=kvm -kernel /tmp/tmp.GMVjItBglu -smp 1 -append cpuid # -initrd /tmp/tmp.uaD4VVyIqc
> qemu-system-x86_64: warning: host doesn't support requested feature: CPUID.80000001H:ECX.svm [bit 2]
> enabling apic
> smp: waiting for 0 APs
> paging enabled
> cr0 = 80010011
> cr3 = 1007000
> cr4 = 20
> pci-testdev at 0x10 membar febff000 iobar c000
> cpuid 66485
> PASS vmexit_cpuid
>
> The number next to cpuid is the (t2 = rdtsc) - (t1 = rdtsc) count. Smaller is
> better.
>
> A test kernel is available in the following ppa:
>
> https://launchpad.net/~mruffell/+archive/ubuntu/sf403286-test
>
> Test data for the above test kernel is:
>
> Sapphire Rapids Intel(R) Xeon(R) Platinum 8468
> 24.04 LTS Noble
>
> 6.8.0-51-generic
>
> Outside VM
> cpuid 3527
> cpuid 3384
> cpuid 3467
> cpuid 3300
> cpuid 3544
>
> Inside VM
> cpuid 68395
> cpuid 58364
> cpuid 65254
> cpuid 68554
> cpuid 66905
>
> 6.8.0-51-generic+TEST403286v20250107b1
>
> Outside VM
> cpuid 3253
> cpuid 3416
> cpuid 3447
> cpuid 3260
> cpuid 3281
>
> Inside VM
> cpuid 22852
> cpuid 22890
> cpuid 18168
> cpuid 23462
> cpuid 23281
>
> Inside the VM, the rdtsc count drops by roughly the factor we are expecting,
> while the numbers outside the VM are essentially unchanged.
>
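The test data above can be summarised with a few lines of Python. This is only a sketch for checking the arithmetic; the variable names are mine and the samples are copied from the runs listed above.

```python
from statistics import mean

# rdtsc deltas reported by vmexit_cpuid, copied from the test data above.
baseline_host = [3527, 3384, 3467, 3300, 3544]       # 6.8.0-51-generic
patched_host = [3253, 3416, 3447, 3260, 3281]        # test kernel
baseline_vm = [68395, 58364, 65254, 68554, 66905]    # 6.8.0-51-generic
patched_vm = [22852, 22890, 18168, 23462, 23281]     # test kernel


def summarize(label, before, after):
    """Print mean counts and the before/after ratio for one scenario."""
    ratio = mean(before) / mean(after)
    print(f"{label}: {mean(before):.1f} -> {mean(after):.1f} "
          f"({ratio:.2f}x)")
    return ratio


host_ratio = summarize("Outside VM", baseline_host, patched_host)
vm_ratio = summarize("Inside VM", baseline_vm, patched_vm)
```

With these five samples per case, the mean inside the VM falls from about 65494 to about 22131, a ratio of roughly 2.96x, while the host figures stay within a few percent of each other.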
> [Where problems could occur]
>
> This fix is related to nested virtualisation in the KVM subsystem. We are adding
> a new function, called on KVM module load, which caches the CPUID.0xD values
> instead of fetching them every time XSAVE needs to be recomputed, which can be
> multiple times on VM-Enter and VM-Exit on nested guests.
>
> The CPUID.0xD.[1..n] values are static and should never change, so there should
> be no issues in saving a value and reusing it later.
>
> If a regression were to occur, it would affect all KVM users, and there would
> be no workarounds.
>
> [Other info]
>
> Full mailing list series: https://lore.kernel.org/kvm/20241211013302.1347853-1-seanjc@google.com/T/#u
>
> Jing Liu (1):
> kvm: x86: Fix xstate_required_size() to follow XSTATE alignment rule
>
> Sean Christopherson (1):
> KVM: x86: Cache CPUID.0xD XSTATE offsets+sizes during module init
>
> arch/x86/kvm/cpuid.c | 33 +++++++++++++++++++++++++++++----
> arch/x86/kvm/cpuid.h | 2 ++
> arch/x86/kvm/x86.c | 2 ++
> 3 files changed, 33 insertions(+), 4 deletions(-)
>
> --
> 2.45.2
>
Acked-by: Jacob Martin <jacob.martin at canonical.com>