[SRU][N][O][PATCH 3/3] KVM: x86: Cache CPUID.0xD XSTATE offsets+sizes during module init

Matthew Ruffell matthew.ruffell at canonical.com
Wed Jan 8 21:12:08 UTC 2025


Ah, I see. Yes, just the single patch for N and O. I will send it as 1/1
next time for different series.

Thanks,
Matthew

On Wed, 8 Jan 2025 at 22:29, Juerg Haefliger <juerg.haefliger at canonical.com>
wrote:

> Patch 3/3 of but it's only a single patch for N and O?
>
> ...Juerg
>
> On Wed,  8 Jan 2025 16:53:33 +1300
> Matthew Ruffell <matthew.ruffell at canonical.com> wrote:
>
> > From: Sean Christopherson <seanjc at google.com>
> >
> > BugLink: https://bugs.launchpad.net/bugs/2093146
> >
> > Snapshot the output of CPUID.0xD.[1..n] during kvm.ko initiliaization to
> > avoid the overead of CPUID during runtime.  The offset, size, and
> metadata
> > for CPUID.0xD.[1..n] sub-leaves does not depend on XCR0 or XSS values,
> i.e.
> > is constant for a given CPU, and thus can be cached during module load.
> >
> > On Intel's Emerald Rapids, CPUID is *wildly* expensive, to the point
> where
> > recomputing XSAVE offsets and sizes results in a 4x increase in latency
> of
> > nested VM-Enter and VM-Exit (nested transitions can trigger
> > xstate_required_size() multiple times per transition), relative to using
> > cached values.  The issue is easily visible by running `perf top` while
> > triggering nested transitions: kvm_update_cpuid_runtime() shows up at a
> > whopping 50%.
> >
> > As measured via RDTSC from L2 (using KVM-Unit-Test's CPUID VM-Exit test
> > and a slightly modified L1 KVM to handle CPUID in the fastpath), a nested
> > roundtrip to emulate CPUID on Skylake (SKX), Icelake (ICX), and Emerald
> > Rapids (EMR) takes:
> >
> >   SKX 11650
> >   ICX 22350
> >   EMR 28850
> >
> > Using cached values, the latency drops to:
> >
> >   SKX 6850
> >   ICX 9000
> >   EMR 7900
> >
> > The underlying issue is that CPUID itself is slow on ICX, and comically
> > slow on EMR.  The problem is exacerbated on CPUs which support XSAVES
> > and/or XSAVEC, as KVM invokes xstate_required_size() twice on each
> > runtime CPUID update, and because there are more supported XSAVE features
> > (CPUID for supported XSAVE feature sub-leafs is significantly slower).
> >
> >  SKX:
> >   CPUID.0xD.2  = 348 cycles
> >   CPUID.0xD.3  = 400 cycles
> >   CPUID.0xD.4  = 276 cycles
> >   CPUID.0xD.5  = 236 cycles
> >   <other sub-leaves are similar>
> >
> >  EMR:
> >   CPUID.0xD.2  = 1138 cycles
> >   CPUID.0xD.3  = 1362 cycles
> >   CPUID.0xD.4  = 1068 cycles
> >   CPUID.0xD.5  = 910 cycles
> >   CPUID.0xD.6  = 914 cycles
> >   CPUID.0xD.7  = 1350 cycles
> >   CPUID.0xD.8  = 734 cycles
> >   CPUID.0xD.9  = 766 cycles
> >   CPUID.0xD.10 = 732 cycles
> >   CPUID.0xD.11 = 718 cycles
> >   CPUID.0xD.12 = 734 cycles
> >   CPUID.0xD.13 = 1700 cycles
> >   CPUID.0xD.14 = 1126 cycles
> >   CPUID.0xD.15 = 898 cycles
> >   CPUID.0xD.16 = 716 cycles
> >   CPUID.0xD.17 = 748 cycles
> >   CPUID.0xD.18 = 776 cycles
> >
> > Note, updating runtime CPUID information multiple times per nested
> > transition is itself a flaw, especially since CPUID is a mandotory
> > intercept on both Intel and AMD.  E.g. KVM doesn't need to ensure
> emulated
> > CPUID state is up-to-date while running L2.  That flaw will be fixed in a
> > future patch, as deferring runtime CPUID updates is more subtle than it
> > appears at first glance, the benefits aren't super critical to have once
> > the XSAVE issue is resolved, and caching CPUID output is desirable even
> if
> > KVM's updates are deferred.
> >
> > Cc: Jim Mattson <jmattson at google.com>
> > Cc: stable at vger.kernel.org
> > Signed-off-by: Sean Christopherson <seanjc at google.com>
> > Message-ID: <20241211013302.1347853-2-seanjc at google.com>
> > Signed-off-by: Paolo Bonzini <pbonzini at redhat.com>
> > (cherry picked from commit 1201f226c863b7da739f7420ddba818cedf372fc)
> > Signed-off-by: Matthew Ruffell <matthew.ruffell at canonical.com>
> > ---
> >  arch/x86/kvm/cpuid.c | 31 ++++++++++++++++++++++++++-----
> >  arch/x86/kvm/cpuid.h |  1 +
> >  arch/x86/kvm/x86.c   |  2 ++
> >  3 files changed, 29 insertions(+), 5 deletions(-)
> >
> > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> > index ce1499732cb8..c4a369bb1444 100644
> > --- a/arch/x86/kvm/cpuid.c
> > +++ b/arch/x86/kvm/cpuid.c
> > @@ -36,6 +36,26 @@
> >  u32 kvm_cpu_caps[NR_KVM_CPU_CAPS] __read_mostly;
> >  EXPORT_SYMBOL_GPL(kvm_cpu_caps);
> >
> > +struct cpuid_xstate_sizes {
> > +     u32 eax;
> > +     u32 ebx;
> > +     u32 ecx;
> > +};
> > +
> > +static struct cpuid_xstate_sizes xstate_sizes[XFEATURE_MAX]
> __ro_after_init;
> > +
> > +void __init kvm_init_xstate_sizes(void)
> > +{
> > +     u32 ign;
> > +     int i;
> > +
> > +     for (i = XFEATURE_YMM; i < ARRAY_SIZE(xstate_sizes); i++) {
> > +             struct cpuid_xstate_sizes *xs = &xstate_sizes[i];
> > +
> > +             cpuid_count(0xD, i, &xs->eax, &xs->ebx, &xs->ecx, &ign);
> > +     }
> > +}
> > +
> >  u32 xstate_required_size(u64 xstate_bv, bool compacted)
> >  {
> >       int feature_bit = 0;
> > @@ -44,14 +64,15 @@ u32 xstate_required_size(u64 xstate_bv, bool
> compacted)
> >       xstate_bv &= XFEATURE_MASK_EXTEND;
> >       while (xstate_bv) {
> >               if (xstate_bv & 0x1) {
> > -                     u32 eax, ebx, ecx, edx, offset;
> > -                     cpuid_count(0xD, feature_bit, &eax, &ebx, &ecx,
> &edx);
> > +                     struct cpuid_xstate_sizes *xs =
> &xstate_sizes[feature_bit];
> > +                     u32 offset;
> > +
> >                       /* ECX[1]: 64B alignment in compacted form */
> >                       if (compacted)
> > -                             offset = (ecx & 0x2) ? ALIGN(ret, 64) :
> ret;
> > +                             offset = (xs->ecx & 0x2) ? ALIGN(ret, 64)
> : ret;
> >                       else
> > -                             offset = ebx;
> > -                     ret = max(ret, offset + eax);
> > +                             offset = xs->ebx;
> > +                     ret = max(ret, offset + xs->eax);
> >               }
> >
> >               xstate_bv >>= 1;
> > diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
> > index 23dbb9eb277c..da4e23e32cff 100644
> > --- a/arch/x86/kvm/cpuid.h
> > +++ b/arch/x86/kvm/cpuid.h
> > @@ -32,6 +32,7 @@ int kvm_vcpu_ioctl_get_cpuid2(struct kvm_vcpu *vcpu,
> >  bool kvm_cpuid(struct kvm_vcpu *vcpu, u32 *eax, u32 *ebx,
> >              u32 *ecx, u32 *edx, bool exact_only);
> >
> > +void __init kvm_init_xstate_sizes(void);
> >  u32 xstate_required_size(u64 xstate_bv, bool compacted);
> >
> >  int cpuid_query_maxphyaddr(struct kvm_vcpu *vcpu);
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index e36bf6162fc8..898b191954eb 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -13917,6 +13917,8 @@
> EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_vmgexit_msr_protocol_exit);
> >
> >  static int __init kvm_x86_init(void)
> >  {
> > +     kvm_init_xstate_sizes();
> > +
> >       kvm_mmu_x86_module_init();
> >       mitigate_smt_rsb &= boot_cpu_has_bug(X86_BUG_SMT_RSB) &&
> cpu_smt_possible();
> >       return 0;
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.ubuntu.com/archives/kernel-team/attachments/20250109/ecae2063/attachment-0001.html>


More information about the kernel-team mailing list