ACK: [PATCH 0/1][SRU][B][F] KVM emulation failure when booting into VM crash kernel with multiple CPUs
Tim Gardner
tim.gardner at canonical.com
Wed Oct 27 17:37:15 UTC 2021
Acked-by: Tim Gardner <tim.gardner at canonical.com>
On 10/27/21 11:15 AM, Heitor Alves de Siqueira wrote:
> BugLink: https://bugs.launchpad.net/bugs/1948862
>
> [Impact]
> When kexec'ing into a crash kernel with ncpus > 1, VMs can raise a KVM
> emulation failure. This causes the VM to go into the "paused" state
> and prevents it from being restored without a full VM restart.
>
> This happens only when multiple CPUs are enabled on the crash kernel
> command line, regardless of whether `nr_cpus` or `maxcpus` is being
> used. Because the vCPU MMU state is not cleaned up correctly, the
> secondary CPUs try to access virtual addresses with a faulty MMU
> context, which results in the emulation failure. The failure shows up
> in the QEMU log as a spew similar to the one below:
>
> $ sudo tail -n20 /var/log/libvirt/qemu/focal-vm.log
> KVM internal error. Suberror: 1
> emulation failure
> EAX=0000de8f EBX=00000000 ECX=0000008f EDX=00000600
> ESI=00000000 EDI=00000000 EBP=00000000 ESP=0000f90c
> EIP=0000cdb1 EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
> ES =0000 00000000 0000ffff 00009300
> CS =f000 000f0000 0000ffff 00009b00
> SS =de00 000de000 0000ffff 00009300
> DS =de00 000de000 0000ffff 00009300
> FS =0000 00000000 0000ffff 00009300
> GS =0000 00000000 0000ffff 00009300
> LDT=0000 00000000 0000ffff 00008200
> TR =0000 00000000 0000ffff 00008b00
> GDT= 00000000 0000ffff
> IDT= 00000000 0000ffff
> CR0=60000010 CR2=00000000 CR3=290b8001 CR4=00000000
> DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000
> DR3=0000000000000000
> DR6=00000000ffff0ff0 DR7=0000000000000400
> EFER=0000000000000000
> Code=66 83 c4 28 66 5b 66 c3 66 56 66 53 66 52 b1 8f 88 c8 e6 70 <e4> 71 66 0f
> b6 f0 66 89 f2 67 88 54 24 03 88 c8 e6 70 66 31 db 88 d8 e6 71 66 56 66 68 1a
>
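For anyone checking a log for this signature, here is a minimal sketch in Python. It assumes the failure always appears as a "KVM internal error" line immediately followed by an "emulation failure" line, as in the excerpt quoted above. The function name is hypothetical and not part of any libvirt or QEMU tooling.

```python
def find_kvm_emulation_failures(log_text):
    """Return 0-based line indices where a KVM emulation failure starts.

    Assumption: the report is a line starting with "KVM internal error"
    followed directly by a line reading "emulation failure", matching the
    excerpt quoted in this thread.
    """
    lines = log_text.splitlines()
    hits = []
    for i, line in enumerate(lines):
        if not line.startswith("KVM internal error"):
            continue
        # Look at the line right after the header, if there is one.
        following = lines[i + 1].strip() if i + 1 < len(lines) else ""
        if following == "emulation failure":
            hits.append(i)
    return hits
```

To use it, read the guest's log (e.g. /var/log/libvirt/qemu/focal-vm.log from the report) into a string and pass it in; an empty list means the signature was not found.
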
> [Test Plan]
> 1. Boot an Ubuntu guest VM with e.g. multipass:
> $ multipass launch daily:focal -c8 -m16g -n focal-vm
>
> 2. Configure guest crash kernel command-line with `nr_cpus=8`:
> ubuntu@focal-vm:~$ grep CMDLINE_APPEND /etc/default/kdump-tools
> # KDUMP_CMDLINE_APPEND - Additional arguments to append to the command line
> KDUMP_CMDLINE_APPEND="reset_devices systemd.unit=kdump-tools-dump.service nr_cpus=8 irqpoll nousb ata_piix.prefer_ms_hyperv=0"
>
> 3. Crash guest VM and watch for the KVM emulation failure:
> ubuntu@focal-vm:~$ echo c | sudo tee /proc/sysrq-trigger
>
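Before running step 3 it can help to confirm that the guest really is configured for a multi-CPU crash kernel. The following is a rough sketch in Python, assuming the KDUMP_CMDLINE_APPEND format shown in step 2. The function names are hypothetical, and only explicit `nr_cpus` and `maxcpus` values are examined.

```python
def crash_kernel_cpu_limits(config_text):
    """Collect explicit nr_cpus/maxcpus values from KDUMP_CMDLINE_APPEND.

    Assumption: config_text is the content of /etc/default/kdump-tools and
    the variable is assigned on a single, optionally quoted, line.
    """
    limits = {}
    for raw in config_text.splitlines():
        line = raw.strip()
        # Skip comments and unrelated settings.
        if line.startswith("#") or not line.startswith("KDUMP_CMDLINE_APPEND="):
            continue
        value = line.split("=", 1)[1].strip().strip("\"'")
        for token in value.split():
            key, _, val = token.partition("=")
            if key in ("nr_cpus", "maxcpus") and val.isdigit():
                limits[key] = int(val)
    return limits


def enables_multiple_cpus(limits):
    """True if any explicit limit allows more than one CPU."""
    return any(count > 1 for count in limits.values())
```

With the command line from step 2 this reports `nr_cpus` as 8, so the reproducer should apply. An empty result only means that no explicit limit was found on that line; it says nothing about what the crash kernel will actually bring up.
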
> [Where problems could occur]
> As we're resetting MMU context on vCPUs, potential regressions would
> show up in workloads relying on KVM guests. We should properly test
> the scenario mentioned in the bug to make sure secondary CPUs are
> being cleaned up properly, and that no other regressions have been
> introduced when rebooting or kexec'ing into different kernels.
> Since we're adding an MMU reset at kvm_vcpu_reset(), the overall
> regression potential should be fairly low and contained to
> starting/resetting vCPUs (i.e. VM start and reboot).
>
> [Other info]
> This has been fixed by upstream commit:
> 0aa1837533e5 KVM: x86: Properly reset MMU context at vCPU RESET/INIT
>
> The commit above has been picked up by stable trees up until 5.11, so
> it's only needed in Bionic and Focal (4.15 and 5.4 kernels). There are
> also two follow-up commits, which revert the vendor-specific resets:
> 5d2d7e41e3b8 KVM: SVM: Drop explicit MMU reset at RESET/INIT
> 61152cd907d5 KVM: VMX: Remove explicit MMU reset in enter_rmode()
>
> These follow-ups have not been picked up in stable trees due to the
> risk of regressions. According to the original fix, they were
> introduced primarily to aid bisection in case there are workflows
> relying on the vendor resets. As these are not required for the fix
> and don't conflict with the backport, we should leave them out to
> prevent potential regressions in the older kernels.
>
> Sean Christopherson (1):
> KVM: x86: Properly reset MMU context at vCPU RESET/INIT
>
> arch/x86/kvm/x86.c | 13 +++++++++++++
> 1 file changed, 13 insertions(+)
>
--
-----------
Tim Gardner
Canonical, Inc