[Bug 1882774] Re: issues with secondary VMX execution controls

Christian Ehrhardt  1882774 at bugs.launchpad.net
Wed Jun 10 05:59:33 UTC 2020


I have tested this on a GCE instance with nested virtualization enabled.
Here are the domcapabilities for the CPU type:
    <mode name='host-model' supported='yes'>
      <model fallback='forbid'>Skylake-Client-IBRS</model>
      <vendor>Intel</vendor>
      <feature policy='require' name='ss'/>
      <feature policy='require' name='vmx'/>
      <feature policy='require' name='hypervisor'/>
      <feature policy='require' name='tsc_adjust'/>
      <feature policy='require' name='md-clear'/>
      <feature policy='require' name='ssbd'/>
      <feature policy='require' name='invtsc'/>
      <feature policy='disable' name='mpx'/>
      <feature policy='disable' name='xsavec'/>
      <feature policy='disable' name='xgetbv1'/>
    </mode>

I set the guest (on Bionic) to use host-model:
   <cpu mode="host-model"/>

As a result, on first start the guest CPU was expanded to the type
reported above:

  <cpu mode='custom' match='exact' check='full'>
    <model fallback='forbid'>Skylake-Client-IBRS</model>
    <vendor>Intel</vendor>
    <feature policy='require' name='ss'/>
    <feature policy='require' name='vmx'/>
    <feature policy='require' name='hypervisor'/>
    <feature policy='require' name='tsc_adjust'/>
    <feature policy='require' name='md-clear'/>
    <feature policy='require' name='ssbd'/>
    <feature policy='disable' name='mpx'/>
    <feature policy='disable' name='xsavec'/>
    <feature policy='disable' name='xgetbv1'/>
  </cpu>

The guest starts fine, with no related errors in
/var/log/libvirt/qemu/testguest.log

After an upgrade to Focal, that type is now reported as unusable:
      <model usable='no'>Skylake-Client-IBRS</model>

The host-model is now detected as follows; libvirt considers this
definition the closer match:

    <mode name='host-model' supported='yes'>
      <model fallback='forbid'>Broadwell-IBRS</model>
      <vendor>Intel</vendor>
      <feature policy='require' name='vme'/>
      <feature policy='require' name='ss'/>
      <feature policy='require' name='vmx'/>
      <feature policy='require' name='f16c'/>
      <feature policy='require' name='rdrand'/>
      <feature policy='require' name='hypervisor'/>
      <feature policy='require' name='arat'/>
      <feature policy='require' name='tsc_adjust'/>
      <feature policy='require' name='umip'/>
      <feature policy='require' name='md-clear'/>
      <feature policy='require' name='stibp'/>
      <feature policy='require' name='arch-capabilities'/>
      <feature policy='require' name='ssbd'/>
      <feature policy='require' name='xsaveopt'/>
      <feature policy='require' name='abm'/>
      <feature policy='require' name='invtsc'/>
      <feature policy='require' name='rsba'/>
      <feature policy='require' name='skip-l1dfl-vmentry'/>
    </mode>
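
To make the change between the two host-model blocks easier to see, here is a minimal Python sketch that compares them. The feature names are copied from the two XML excerpts above; the dictionary layout and the helper name `diff_host_model` are only illustrative and not part of libvirt.

```python
# Feature lists copied by hand from the two host-model blocks in this report.
BIONIC = {
    "model": "Skylake-Client-IBRS",
    "require": {"ss", "vmx", "hypervisor", "tsc_adjust", "md-clear",
                "ssbd", "invtsc"},
    "disable": {"mpx", "xsavec", "xgetbv1"},
}
FOCAL = {
    "model": "Broadwell-IBRS",
    "require": {"vme", "ss", "vmx", "f16c", "rdrand", "hypervisor", "arat",
                "tsc_adjust", "umip", "md-clear", "stibp",
                "arch-capabilities", "ssbd", "xsaveopt", "abm", "invtsc",
                "rsba", "skip-l1dfl-vmentry"},
    "disable": set(),
}


def diff_host_model(old, new):
    """Hypothetical helper: summarize how the detected host-model changed."""
    return {
        "model_changed": old["model"] != new["model"],
        "newly_required": sorted(new["require"] - old["require"]),
        "no_longer_disabled": sorted(old["disable"] - new["disable"]),
    }


result = diff_host_model(BIONIC, FOCAL)
for key, value in result.items():
    print(key, value)
```

The output only shows that the base model fell back from Skylake to Broadwell with extra features layered on top; it does not by itself explain why the Skylake type became unusable.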


Whether I use host-model or the former Skylake type, the guest now fails to start with the reported crash:

$ sudo tail -f /var/log/libvirt/qemu/testguest.log
...
error: Failed to start domain testguest
error: internal error: process exited while connecting to monitor: 2020-06-09T13:51:27.110925Z 2020-06-09T13:51:27.111798Z qemu-system-x86_64: error: failed to set MSR 0x48b to 0x159ff00000000
qemu-system-x86_64: /build/qemu-7aKH5L/qemu-4.2/target/i386/kvm.c:2680: kvm_buf_set_msrs: Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed.
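
For reference, MSR 0x48b is IA32_VMX_PROCBASED_CTLS2 in the Intel SDM, the capability MSR for the secondary processor-based VM-execution controls named in the bug title. To my understanding its upper 32 bits report which control bits are allowed to be 1. The following Python sketch splits the value from the log line on that assumption; the helper names are illustrative, and the meaning of each bit position should be looked up in the SDM rather than taken from here.

```python
MSR_INDEX = 0x48B             # from the log line above
MSR_VALUE = 0x159FF00000000   # from the log line above


def split_vmx_ctls(value):
    """Split a 64-bit VMX capability MSR value into (low, high) 32-bit words."""
    low = value & 0xFFFFFFFF
    high = (value >> 32) & 0xFFFFFFFF
    return low, high


def set_bits(word):
    """Return the positions of all bits that are set in a 32-bit word."""
    return [bit for bit in range(32) if word & (1 << bit)]


low, high = split_vmx_ctls(MSR_VALUE)
allowed1_bits = set_bits(high)

print(f"MSR {MSR_INDEX:#x}: low={low:#010x} high={high:#010x}")
print("bits qemu tried to advertise as settable:", allowed1_bits)
```

The failing write means the host kernel refused at least one of these bits for the nested guest; which one cannot be determined from the log alone.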


Updating to the PPA build with the suggested fix resolves the issue as expected.


** Also affects: cloud-archive
   Importance: Undecided
       Status: New

** Changed in: qemu (Ubuntu)
       Status: Triaged => In Progress

-- 
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to Ubuntu Cloud Archive.
https://bugs.launchpad.net/bugs/1882774

Title:
  issues with secondary VMX execution controls

Status in Ubuntu Cloud Archive:
  New
Status in qemu package in Ubuntu:
  In Progress
Status in qemu source package in Focal:
  Triaged

Bug description:
  [Impact]

  qemu 4.2 included a change [1] meant to improve the handling of MSRs vs CPUID.
  It was later identified [2] as causing an issue and was fixed.
  That fix has to be backported to Focal to resolve the issue on several platforms.

  Examples where this occurs are:
  - Azure instances with nested virt
  - GCP instances with nested virt

  We have seen a number of qemu named CPU types that can show similar behavior when used on chips that pretend to be of some type, e.g. Skylake, but do not allow some of that type's features to be set.
  It is not entirely certain, though, that those cases will be fixed by the same change, yet it is worth mentioning.

  The impact is that qemu 4.2 as shipped in Ubuntu 20.04 does not work on
  those platforms; it bails out when starting a guest.

  [1]: https://github.com/qemu/qemu/commit/048c95163b472ed737a2f0dca4f4e23a82ac2f8a
  [2]: https://github.com/qemu/qemu/commit/4a910e1f6ab4155ec8b24c49b2585cc486916985

  [Test Case]

   * Get a GCP or Azure instance with nested virtualization enabled
   * Spawn a KVM guest on it, e.g. with uvtool-libvirt, using a named CPU
     type that matches the host CPU.
     For example, if the host reports as Skylake, use a Skylake type.
     You can use `virsh domcapabilities` to check what the host is
     detected as.
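
   As a rough aid for the check in the last step, the following Python sketch
   reads domcapabilities XML and reports the detected host-model plus any
   named types marked usable='no'. The sample document is assembled from the
   excerpts earlier in this report and wrapped in the usual
   <domainCapabilities>/<cpu> elements; the function name `summarize_cpu` is
   hypothetical.

```python
import xml.etree.ElementTree as ET

# Sample assembled from excerpts in this report, not a full virsh output.
SAMPLE = """<domainCapabilities>
  <cpu>
    <mode name='host-model' supported='yes'>
      <model fallback='forbid'>Broadwell-IBRS</model>
      <vendor>Intel</vendor>
    </mode>
    <mode name='custom' supported='yes'>
      <model usable='no'>Skylake-Client-IBRS</model>
    </mode>
  </cpu>
</domainCapabilities>"""


def summarize_cpu(xml_text):
    """Return (host-model name, list of named models flagged usable='no')."""
    root = ET.fromstring(xml_text)
    host = root.find("./cpu/mode[@name='host-model']/model")
    unusable = [
        model.text
        for model in root.findall("./cpu/mode[@name='custom']/model")
        if model.get("usable") == "no"
    ]
    return (host.text if host is not None else None), unusable


host_model, unusable_models = summarize_cpu(SAMPLE)
print("host-model:", host_model)
print("unusable named types:", unusable_models)
```

   On a real host the XML text would come from the output of
   `virsh domcapabilities` instead of the embedded sample.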

  [Regression Potential]

   * It is a bit hard to guess, but it should not make things worse. If I had to expect a regression,
     it would be that the VMX subfeatures change in cases where that is not intended. Yet we should
     end up in one of two cases:
     a) the common one: the host can set these and has done so; it will continue as before
     b) the host was unable to set these and failed; this should now work with the fix in place
     Both seem ok to me.

  [Other Info]

   * There might be a local (non-cloud) way to reproduce this, but I don't
     know of one yet.



