APPLIED/Cmnt: [SRU][Mantic][PATCH 0/1] kvm: Running perf against qemu processes results in page fault inside guest

Stefan Bader stefan.bader at canonical.com
Thu Feb 29 16:37:07 UTC 2024


On 18.02.24 09:19, Matthew Ruffell wrote:
> BugLink: https://bugs.launchpad.net/bugs/2054218
> 
> [Impact]
> 
> Running perf against a QEMU/kvm process results in the guest suffering a page
> fault while trying to store Precise Event Based Sampling (PEBS) records for the
> host. This affects both running perf against a single process, which crashes
> the targeted guest, and running perf system wide, which crashes all running
> guests on the system.
> 
> The issue was introduced in 6.0 by:
> 
> commit c59a1f106f5cd4843c097069ff1bb2ad72103a67
> Author: Like Xu <like.xu at linux.intel.com>
> Date:   Mon Apr 11 18:19:36 2022 +0800
> Subject: KVM: x86/pmu: Add IA32_PEBS_ENABLE MSR emulation for extended PEBS
> Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c59a1f106f5cd4843c097069ff1bb2ad72103a67
> 
> This affects all 6.2 and 6.5 kernels. There is no known workaround, apart from
> not using perf on affected systems.
> 
> [Fix]
> 
> The issue was fixed in 6.7 by:
> 
> commit 971079464001c6856186ca137778e534d983174a
> Author: Paolo Bonzini <pbonzini at redhat.com>
> Date:   Thu Jan 4 16:15:17 2024 +0100
> Subject: KVM: x86/pmu: fix masking logic for MSR_CORE_PERF_GLOBAL_CTRL
> Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=971079464001c6856186ca137778e534d983174a
> 
> This reinstates the logic for setting MSR_CORE_PERF_GLOBAL_CTRL to what it was
> before "KVM: x86/pmu: Add IA32_PEBS_ENABLE MSR emulation for extended PEBS".
> 
> -               .guest = intel_ctrl & (~cpuc->intel_ctrl_host_mask | ~pebs_mask),
> +               .guest = intel_ctrl & ~cpuc->intel_ctrl_host_mask & ~pebs_mask,
> 
> The faulty logic keeps any bit that isn't both marked as exclude_guest and
> using PEBS, while it should really clear every bit that is either marked as
> exclude_guest or using PEBS, so that PEBS counters are excluded from the guest.
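
A minimal sketch of the difference between the two masks, written as
standalone C rather than kernel code. The function names and the example bit
patterns are hypothetical; only the two expressions come from the diff quoted
above. It assumes intel_ctrl_host_mask holds the counters marked
exclude_guest and pebs_mask holds the counters using PEBS, as the description
suggests.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Pre-fix expression. By De Morgan's law this equals
 * intel_ctrl & ~(host_mask & pebs_mask), so a bit is cleared only when it
 * is set in BOTH masks.
 */
static uint64_t guest_ctrl_faulty(uint64_t intel_ctrl, uint64_t host_mask,
                                  uint64_t pebs_mask)
{
    return intel_ctrl & (~host_mask | ~pebs_mask);
}

/* Post-fix expression: a bit is cleared when it is set in EITHER mask. */
static uint64_t guest_ctrl_fixed(uint64_t intel_ctrl, uint64_t host_mask,
                                 uint64_t pebs_mask)
{
    return intel_ctrl & ~host_mask & ~pebs_mask;
}

/*
 * Hypothetical values: counters 0-3 enabled, counters 0-1 marked
 * exclude_guest, counters 1-2 using PEBS.
 */
static void check_example(void)
{
    const uint64_t intel_ctrl = 0xF, host_mask = 0x3, pebs_mask = 0x6;

    assert(guest_ctrl_faulty(intel_ctrl, host_mask, pebs_mask) == 0xD);
    assert(guest_ctrl_fixed(intel_ctrl, host_mask, pebs_mask) == 0x8);
}
```

With these values the faulty expression leaves counter 0 (exclude_guest only)
and counter 2 (PEBS only) enabled for the guest, while the fixed expression
leaves only counter 3.
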
> 
> [Testcase]
> 
> Start a bare metal server. Enable KVM, start a few VMs. The VMs can be idle,
> they don't require any workload.
> 
> $ sudo apt-get install qemu-kvm libvirt-daemon-system libvirt-clients bridge-utils uvtool
> $ sudo reboot
> $ ssh-keygen
> $ uvt-simplestreams-libvirt sync --source http://cloud-images.ubuntu.com/daily release=jammy arch=amd64
> $ uvt-kvm create --cpu 4 --memory 4096 --disk 10 jammy-a release=jammy arch=amd64
> $ uvt-kvm create --cpu 4 --memory 4096 --disk 10 jammy-b release=jammy arch=amd64
> $ uvt-kvm create --cpu 4 --memory 4096 --disk 10 jammy-c release=jammy arch=amd64
> $ virsh list
>   Id   Name      State
> -------------------------
>   2    jammy-a   running
>   3    jammy-b   running
>   4    jammy-c   running
> $ uvt-kvm ssh jammy-a
> Check it works.
> $ ps aux | grep qemu
> Find the pid of jammy-a
> $ perf top -p $PID
> $ virsh console jammy-a
> Escape character is ^] (Ctrl + ])
> [  357.793039] BUG: unable to handle page fault for address: fffffe49178c6028
> $ uvt-kvm ssh jammy-a
> (no response)
> 
> Test packages are available in the following ppa:
> 
> https://launchpad.net/~mruffell/+archive/ubuntu/sf379502-test
> 
> If you install them, running perf against the PID of a qemu process will no
> longer crash the guest, and the guests will remain accessible by SSH afterward.
> 
> [Where problems could occur]
> 
> We are rearranging the logic of setting the PEBS MSRs, which affects processor
> sampling of events. This will affect any profiling tools running against KVM
> based virtual machines, namely perf against QEMU.
> 
> If a regression were to occur, running perf against a VM could cause it to
> page fault and subsequently crash, resulting in downtime.
> 
> The only workaround would be to disable all profiling tools until a fix is
> available.
> 
> Paolo Bonzini (1):
>    KVM: x86/pmu: fix masking logic for MSR_CORE_PERF_GLOBAL_CTRL
> 
>   arch/x86/events/intel/core.c | 7 ++++++-
>   1 file changed, 6 insertions(+), 1 deletion(-)
> 
This was already applied via "Mantic update: upstream stable patchset 
2024-02-26". I have adjusted the bug references to include the specific 
bug report as well.

Applied to mantic:linux/master-next. Thanks.

-Stefan