[SRU][Xenial][PULL] Guests using IBRS incur a large performance penalty (LP: #1764956)

Stefan Bader stefan.bader at canonical.com
Tue Jan 8 12:07:49 UTC 2019


On 08.01.19 12:33, Juerg Haefliger wrote:
> On Mon, 7 Jan 2019 19:26:55 +0100
> Stefan Bader <stefan.bader at canonical.com> wrote:
> 
>> On 19.12.18 11:03, Juerg Haefliger wrote:
>>> [Impact]
>>> the IBRS would be mistakenly enabled in the host when the switching
>>> from an IBRS-enabled VM and that causes the performance overhead in
>>> the host. The other condition could also mistakenly disables the IBRS
>>> in VM when context-switching from the host. And this could be
>>> considered a CVE host.
>>>
>>> [Fix]
>>> The patch fixes the logic inside the x86_virt_spec_ctrl that it checks
>>> the ibrs_enabled and _or_ the hostval with the SPEC_CTRL_IBRS as the
>>> x86_spec_ctrl_base by default is zero. Because the upstream
>>> implementation is not equal to the Xenial's implementation. Upstream
>>> doesn't use the IBRS as the formal fix. So, by default, it's zero.
>>>
>>> On the other hand, after the VM exit, the SPEC_CTRL register also
>>> needs to be saved manually by reading the SPEC_CTRL MSR as the MSR
>>> intercept is disabled by default in the hardware_setup(v4.4) and
>>> vmx_init(v3.13). The access to SPEC_CTRL MSR in VM is direct and
>>> doesn't trigger a trap. So, the vmx_set_msr() function isn't called.
>>>
>>> The v3.13 kernel hasn't been tested. However, the patch can be viewed
>>> at:
>>> http://kernel.ubuntu.com/git/gavinguo/ubuntu-trusty-amd64.git/log/?h=sf00191076-sru
>>>
>>> The v4.4 patch:
>>> http://kernel.ubuntu.com/git/gavinguo/ubuntu-xenial.git/log/?h=sf00191076-spectre-v2-regres-backport-juerg
>>>
>>> [Test]
>>>
>>> The patch has been tested on the 4.4.0-140.166 and works fine.
>>>
>>> The reproducing environment:
>>> Guest kernel version: 4.4.0-138.164
>>> Host kernel version: 4.4.0-140.166
>>>
>>> (host IBRS, guest IBRS)
>>>
>>> - 1). (0, 1).
>>> The case can be reproduced by the following instructions:
>>> guest$ echo 1 | sudo tee /proc/sys/kernel/ibrs_enabled
>>> 1
>>>
>>> <Several minutes later...>
>>>
>>> host$ cat /proc/sys/kernel/ibrs_enabled
>>> 0
>>> host$ for i in {0..55}; do sudo rdmsr 0x48 -p $i; done
>>> 11111111111111000000000000000000010010100000000000000000
>>>
>>> Some of the IBRS bit inside the SPEC_CTRL MSR are mistakenly
>>> enabled.
>>>
>>> host$ taskset -c 5 stress-ng -c 1 --cpu-ops 2500
>>> stress-ng: info:  [11264] defaulting to a 86400 second run per stressor
>>> stress-ng: info:  [11264] dispatching hogs: 1 cpu
>>> stress-ng: info:  [11264] cache allocate: default cache size: 35840K
>>> stress-ng: info:  [11264] successful run completed in 33.48s
>>>
>>> The host kernel didn't notice the IBRS bit is enabled. So, the situation
>>> is the same as "echo 2 > /proc/sys/kernel/ibrs_enabled" in the host.
>>> And running the stress-ng is a pure userspace CPU capability
>>> calculation. So, the performance downgrades to about 1/3. Without the
>>> IBRS enabled, it needs about 10s.
>>>
>>> - 2). (1, 1) disables IBRS in host -> (0, 1) actually it becomes (0, 0).
>>> The guest IBRS has been mistakenly disabled.
>>>
>>> guest$ echo 2 | sudo tee /proc/sys/kernel/ibrs_enabled
>>> guest$ for i in {0..55}; do sudo rdmsr 0x48 -p $i; done
>>> 11111111111111111111111111111111111111111111111111111111
>>>
>>> host$ echo 2 | sudo tee /proc/sys/kernel/ibrs_enabled
>>> host$ for i in {0..55}; do sudo rdmsr 0x48 -p $i; done
>>> 11111111111111111111111111111111111111111111111111111111
>>> host$ echo 0 | sudo tee /proc/sys/kernel/ibrs_enabled
>>> host$ for i in {0..55}; do sudo rdmsr 0x48 -p $i; done
>>> 00000000000000000000000000000000000000000000000000000000
>>>
>>> guest$ for i in {0..55}; do sudo rdmsr 0x48 -p $i; done
>>> 00000000000000000000000000000000000000000000000000000000
>>>
>>>
>>> [juergh: MSR-isolation between guests and the host is incomplete in
>>>  Xenial. This PR is supposed to fix this and bring Xenial up to par with
>>>  stable v4.9.]
>>>
>>> Signed-off-by: Juerg Haefliger <juergh at canonical.com>  
>>
>> Just for sanity checking: this pull request (hate hate hate)
> 
> ? The PR itself, or the fact that it's a PR, or the submitter of the PR?

The submitter a bit for the size, but mostly Intel for getting us into this mess...

> 
> 
>> replaces the
>> submitted patch, right?
> 
> Yes.
> 
> ...Juerg
> 
> 
>>
>> -Stefan
>>> ---
>>>
>>> The following changes since commit d0b9a387cf1d68745c558d04fd3aa980497d1529:
>>>
>>>   UBUNTU: SAUCE: x86/speculation: Move RSB_CTXSW hunk (2018-12-13 13:03:55 +0100)
>>>
>>> are available in the Git repository at:
>>>
>>>   git://git.launchpad.net/~juergh/+git/xenial-linux lp1764956-v2
>>>
>>> for you to fetch changes up to 7ad0e9a99c1466f8fee92cba5ffeaa0af83f6622:
>>>
>>>   UBUNTU: SAUCE: Restore the IBRS host state on VMEXIT (2018-12-19 10:58:24 +0100)
>>>
>>> ----------------------------------------------------------------
>>> Ashok Raj (1):
>>>       KVM/x86: Add IBPB support
>>>
>>> David Matlack (1):
>>>       KVM: nVMX: mark vmcs12 pages dirty on L2 exit
>>>
>>> Jim Mattson (5):
>>>       kvm: nVMX: VMCLEAR an active shadow VMCS after last use
>>>       kvm: vmx: Scrub hardware GPRs at VM-exit
>>>       KVM: nVMX: Eliminate vmcs02 pool
>>>       kvm: x86: IA32_ARCH_CAPABILITIES is always supported
>>>       kvm: svm: Ensure an IBPB on all affected CPUs when freeing a vmcb
>>>
>>> Juerg Haefliger (4):
>>>       UBUNTU: SAUCE: [Fix] KVM: SVM: Implement VIRT_SPEC_CTRL support for SSBD
>>>       UBUNTU: SAUCE: [Fix] x86/KVM/VMX: Add L1D flush logic
>>>       UBUNTU: SAUCE: KVM: Move code fragments, cleanup and re-indent
>>>       UBUNTU: SAUCE: Restore the IBRS host state on VMEXIT
>>>
>>> KarimAllah Ahmed (3):
>>>       KVM/VMX: Allow direct access to MSR_IA32_SPEC_CTRL
>>>       KVM/SVM: Allow direct access to MSR_IA32_SPEC_CTRL
>>>       X86/nVMX: Properly set spec_ctrl and pred_cmd before merging MSRs
>>>
>>> Paolo Bonzini (5):
>>>       KVM: VMX: introduce alloc_loaded_vmcs
>>>       KVM: VMX: make MSR bitmaps per-VCPU
>>>       KVM/x86: Remove indirect MSR op calls from SPEC_CTRL
>>>       KVM/VMX: Optimize vmx_vcpu_run() and svm_vcpu_run() by marking the RDMSR path as unlikely()
>>>       KVM: VMX: fixes for vmentry_l1d_flush module parameter
>>>
>>> Radim Krčmář (1):
>>>       KVM: nVMX: fix msr bitmaps to prevent L2 from accessing L0 x2APIC
>>>
>>> Thomas Gleixner (2):
>>>       KVM: SVM: Move spec control call after restore of GS
>>>       KVM: x86: SVM: Call x86_spec_ctrl_set_guest/host() with interrupts disabled
>>>
>>> Tom Lendacky (1):
>>>       KVM: SVM: Add MSR-based feature support for serializing LFENCE
>>>
>>> Wanpeng Li (1):
>>>       KVM: X86: Allow userspace to define the microcode version
>>>
>>>  arch/x86/include/asm/kvm_host.h |   1 +
>>>  arch/x86/kernel/cpu/bugs.c      |   4 +
>>>  arch/x86/kvm/cpuid.c            |  25 +-
>>>  arch/x86/kvm/cpuid.h            |  74 ++--
>>>  arch/x86/kvm/svm.c              | 209 +++++++++--
>>>  arch/x86/kvm/vmx.c              | 777 ++++++++++++++++++++++------------------
>>>  arch/x86/kvm/x86.c              |  12 +-
>>>  7 files changed, 691 insertions(+), 411 deletions(-)
>>>   
>>
>>
> 


-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: OpenPGP digital signature
URL: <https://lists.ubuntu.com/archives/kernel-team/attachments/20190108/f256d77f/attachment.sig>


More information about the kernel-team mailing list