[SRU][Xenial][PULL] Guests using IBRS incur a large performance penalty (LP: #1764956)

Juerg Haefliger juerg.haefliger at canonical.com
Wed Dec 19 10:03:19 UTC 2018


[Impact]
the IBRS would be mistakenly enabled in the host when the switching
from an IBRS-enabled VM and that causes the performance overhead in
the host. The other condition could also mistakenly disables the IBRS
in VM when context-switching from the host. And this could be
considered a CVE host.

[Fix]
The patch fixes the logic inside the x86_virt_spec_ctrl that it checks
the ibrs_enabled and _or_ the hostval with the SPEC_CTRL_IBRS as the
x86_spec_ctrl_base by default is zero. Because the upstream
implementation is not equal to the Xenial's implementation. Upstream
doesn't use the IBRS as the formal fix. So, by default, it's zero.

On the other hand, after the VM exit, the SPEC_CTRL register also
needs to be saved manually by reading the SPEC_CTRL MSR as the MSR
intercept is disabled by default in the hardware_setup(v4.4) and
vmx_init(v3.13). The access to SPEC_CTRL MSR in VM is direct and
doesn't trigger a trap. So, the vmx_set_msr() function isn't called.

The v3.13 kernel hasn't been tested. However, the patch can be viewed
at:
http://kernel.ubuntu.com/git/gavinguo/ubuntu-trusty-amd64.git/log/?h=sf00191076-sru

The v4.4 patch:
http://kernel.ubuntu.com/git/gavinguo/ubuntu-xenial.git/log/?h=sf00191076-spectre-v2-regres-backport-juerg

[Test]

The patch has been tested on the 4.4.0-140.166 and works fine.

The reproducing environment:
Guest kernel version: 4.4.0-138.164
Host kernel version: 4.4.0-140.166

(host IBRS, guest IBRS)

- 1). (0, 1).
The case can be reproduced by the following instructions:
guest$ echo 1 | sudo tee /proc/sys/kernel/ibrs_enabled
1

<Several minutes later...>

host$ cat /proc/sys/kernel/ibrs_enabled
0
host$ for i in {0..55}; do sudo rdmsr 0x48 -p $i; done
11111111111111000000000000000000010010100000000000000000

Some of the IBRS bit inside the SPEC_CTRL MSR are mistakenly
enabled.

host$ taskset -c 5 stress-ng -c 1 --cpu-ops 2500
stress-ng: info:  [11264] defaulting to a 86400 second run per stressor
stress-ng: info:  [11264] dispatching hogs: 1 cpu
stress-ng: info:  [11264] cache allocate: default cache size: 35840K
stress-ng: info:  [11264] successful run completed in 33.48s

The host kernel didn't notice the IBRS bit is enabled. So, the situation
is the same as "echo 2 > /proc/sys/kernel/ibrs_enabled" in the host.
And running the stress-ng is a pure userspace CPU capability
calculation. So, the performance downgrades to about 1/3. Without the
IBRS enabled, it needs about 10s.

- 2). (1, 1) disables IBRS in host -> (0, 1) actually it becomes (0, 0).
The guest IBRS has been mistakenly disabled.

guest$ echo 2 | sudo tee /proc/sys/kernel/ibrs_enabled
guest$ for i in {0..55}; do sudo rdmsr 0x48 -p $i; done
11111111111111111111111111111111111111111111111111111111

host$ echo 2 | sudo tee /proc/sys/kernel/ibrs_enabled
host$ for i in {0..55}; do sudo rdmsr 0x48 -p $i; done
11111111111111111111111111111111111111111111111111111111
host$ echo 0 | sudo tee /proc/sys/kernel/ibrs_enabled
host$ for i in {0..55}; do sudo rdmsr 0x48 -p $i; done
00000000000000000000000000000000000000000000000000000000

guest$ for i in {0..55}; do sudo rdmsr 0x48 -p $i; done
00000000000000000000000000000000000000000000000000000000


[juergh: MSR-isolation between guests and the host is incomplete in
 Xenial. This PR is supposed to fix this and bring Xenial up to par with
 stable v4.9.]

Signed-off-by: Juerg Haefliger <juergh at canonical.com>
---

The following changes since commit d0b9a387cf1d68745c558d04fd3aa980497d1529:

  UBUNTU: SAUCE: x86/speculation: Move RSB_CTXSW hunk (2018-12-13 13:03:55 +0100)

are available in the Git repository at:

  git://git.launchpad.net/~juergh/+git/xenial-linux lp1764956-v2

for you to fetch changes up to 7ad0e9a99c1466f8fee92cba5ffeaa0af83f6622:

  UBUNTU: SAUCE: Restore the IBRS host state on VMEXIT (2018-12-19 10:58:24 +0100)

----------------------------------------------------------------
Ashok Raj (1):
      KVM/x86: Add IBPB support

David Matlack (1):
      KVM: nVMX: mark vmcs12 pages dirty on L2 exit

Jim Mattson (5):
      kvm: nVMX: VMCLEAR an active shadow VMCS after last use
      kvm: vmx: Scrub hardware GPRs at VM-exit
      KVM: nVMX: Eliminate vmcs02 pool
      kvm: x86: IA32_ARCH_CAPABILITIES is always supported
      kvm: svm: Ensure an IBPB on all affected CPUs when freeing a vmcb

Juerg Haefliger (4):
      UBUNTU: SAUCE: [Fix] KVM: SVM: Implement VIRT_SPEC_CTRL support for SSBD
      UBUNTU: SAUCE: [Fix] x86/KVM/VMX: Add L1D flush logic
      UBUNTU: SAUCE: KVM: Move code fragments, cleanup and re-indent
      UBUNTU: SAUCE: Restore the IBRS host state on VMEXIT

KarimAllah Ahmed (3):
      KVM/VMX: Allow direct access to MSR_IA32_SPEC_CTRL
      KVM/SVM: Allow direct access to MSR_IA32_SPEC_CTRL
      X86/nVMX: Properly set spec_ctrl and pred_cmd before merging MSRs

Paolo Bonzini (5):
      KVM: VMX: introduce alloc_loaded_vmcs
      KVM: VMX: make MSR bitmaps per-VCPU
      KVM/x86: Remove indirect MSR op calls from SPEC_CTRL
      KVM/VMX: Optimize vmx_vcpu_run() and svm_vcpu_run() by marking the RDMSR path as unlikely()
      KVM: VMX: fixes for vmentry_l1d_flush module parameter

Radim Krčmář (1):
      KVM: nVMX: fix msr bitmaps to prevent L2 from accessing L0 x2APIC

Thomas Gleixner (2):
      KVM: SVM: Move spec control call after restore of GS
      KVM: x86: SVM: Call x86_spec_ctrl_set_guest/host() with interrupts disabled

Tom Lendacky (1):
      KVM: SVM: Add MSR-based feature support for serializing LFENCE

Wanpeng Li (1):
      KVM: X86: Allow userspace to define the microcode version

 arch/x86/include/asm/kvm_host.h |   1 +
 arch/x86/kernel/cpu/bugs.c      |   4 +
 arch/x86/kvm/cpuid.c            |  25 +-
 arch/x86/kvm/cpuid.h            |  74 ++--
 arch/x86/kvm/svm.c              | 209 +++++++++--
 arch/x86/kvm/vmx.c              | 777 ++++++++++++++++++++++------------------
 arch/x86/kvm/x86.c              |  12 +-
 7 files changed, 691 insertions(+), 411 deletions(-)



More information about the kernel-team mailing list