[SRU][J/L][PATCH 1/1] x86/xen/time: prefer tsc as clocksource when it is invariant
Krister Johansen
kjlx at templeofstupid.com
Sat Aug 26 01:00:22 UTC 2023
On Fri, Aug 25, 2023 at 05:57:30PM -0700, Krister Johansen wrote:
> BugLink: http://bugs.launchpad.net/bugs/2033122
>
> KVM elects to use tsc instead of kvm-clock when it can detect that the
> TSC is invariant.
>
> (As of commit 7539b174aef4 ("x86: kvmguest: use TSC clocksource if
> invariant TSC is exposed")).
>
> Notable cloud vendors[1] and performance engineers[2] recommend that Xen
> users preferentially select tsc over xen-clocksource due to the
> performance penalty incurred by the latter. These articles are persuasive
> and tailored to specific use cases. In order to understand the tradeoffs
> around this choice more fully, this author had to reference the
> documented[3] complexities around the Xen configuration, as well as the
> kernel's clocksource selection algorithm. Many users may not go to these
> lengths to configure the right clock source in their guest.
>
> The approach taken in the kvm-clock module spares users this confusion,
> where possible.
>
> Both the Intel SDM[4] and the Xen tsc documentation explain that marking
> a tsc as invariant means that it should be considered stable by the OS
> and is eligible to be used as a wall clock source.
>
> In order to obtain better out-of-the-box performance, and reduce the
> need for user tuning, follow kvm's approach and decrease the xen clock
> rating so that tsc is preferable, if it is invariant, stable, and the
> tsc will never be emulated.
>
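
To illustrate the rating change described above, here is a minimal
user-space sketch, not kernel code. It assumes the simplified rule that the
registered clocksource with the highest rating is preferred, and it assumes
the upstream default ratings of 400 for xen_clocksource and 300 for the tsc
clocksource. The names cs_model, pick_best, xen_rating and
preferred_clocksource are hypothetical; the real selection logic also
honours overrides and instability marking, which this model ignores.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for a clocksource: only a name and a rating. */
struct cs_model {
	const char *name;
	int rating;
};

/* Return the entry with the highest rating; the first entry wins a tie. */
static const struct cs_model *pick_best(const struct cs_model *cs, size_t n)
{
	assert(cs != NULL && n > 0);

	const struct cs_model *best = &cs[0];

	for (size_t i = 1; i < n; i++)
		if (cs[i].rating > best->rating)
			best = &cs[i];
	return best;
}

/* Rating given to the xen clocksource, mirroring the branches in the patch. */
static int xen_rating(int is_dom0, int tsc_safe)
{
	if (is_dom0)
		return 275;	/* existing Dom0 behaviour */
	if (tsc_safe)
		return 299;	/* new: one below the assumed tsc rating */
	return 400;		/* assumed default xen_clocksource rating */
}

/* Name of the clocksource this model would prefer. */
static const char *preferred_clocksource(int is_dom0, int tsc_safe)
{
	struct cs_model cs[] = {
		{ "xen", xen_rating(is_dom0, tsc_safe) },
		{ "tsc", 300 },	/* assumed tsc rating while it is stable */
	};

	return pick_best(cs, sizeof(cs) / sizeof(cs[0]))->name;
}
```

In this model, a guest where the tsc is judged safe gets "tsc" from
preferred_clocksource(0, 1), while a guest where it is not gets "xen" from
preferred_clocksource(0, 0).
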
> [1] https://aws.amazon.com/premiumsupport/knowledge-center/manage-ec2-linux-clock-source/
> [2] https://www.brendangregg.com/blog/2021-09-26/the-speed-of-time.html
> [3] https://xenbits.xen.org/docs/unstable/man/xen-tscmode.7.html
> [4] Intel 64 and IA-32 Architectures Software Developer's Manual Volume
> 3b: System Programming Guide, Part 2, Section 17.17.1, Invariant TSC
>
> Signed-off-by: Krister Johansen <kjlx at templeofstupid.com>
> Code-reviewed-by: David Reaver <me at davidreaver.com>
> Reviewed-by: Juergen Gross <jgross at suse.com>
> Link: https://lore.kernel.org/r/20221216162118.GB2633@templeofstupid.com
> Signed-off-by: Juergen Gross <jgross at suse.com>
> (cherry picked from commit 99a7bcafbd0d04555074554573019096a8c10450)
Apologies, this should be:
(cherry picked from commit caea091e48ed9d3951506507abf26e9918d08e35)
> Signed-off-by: Krister Johansen <kjlx at templeofstupid.com>
> ---
> arch/x86/xen/time.c | 38 +++++++++++++++++++++++++++++++++++++-
> 1 file changed, 37 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/xen/time.c b/arch/x86/xen/time.c
> index cc11dd2e2f48..062120dd8b98 100644
> --- a/arch/x86/xen/time.c
> +++ b/arch/x86/xen/time.c
> @@ -474,15 +474,51 @@ static void xen_setup_vsyscall_time_info(void)
>  	xen_clocksource.vdso_clock_mode = VDSO_CLOCKMODE_PVCLOCK;
>  }
>
> +/*
> + * Check if it is possible to safely use the tsc as a clocksource. This is
> + * only true if the hypervisor notifies the guest that its tsc is invariant,
> + * the tsc is stable, and the tsc instruction will never be emulated.
> + */
> +static int __init xen_tsc_safe_clocksource(void)
> +{
> +	u32 eax, ebx, ecx, edx;
> +
> +	if (!(boot_cpu_has(X86_FEATURE_CONSTANT_TSC)))
> +		return 0;
> +
> +	if (!(boot_cpu_has(X86_FEATURE_NONSTOP_TSC)))
> +		return 0;
> +
> +	if (check_tsc_unstable())
> +		return 0;
> +
> +	/* Leaf 4, sub-leaf 0 (0x40000x03) */
> +	cpuid_count(xen_cpuid_base() + 3, 0, &eax, &ebx, &ecx, &edx);
> +
> +	/* tsc_mode = no_emulate (2) */
> +	if (ebx != 2)
> +		return 0;
> +
> +	return 1;
> +}
> +
>  static void __init xen_time_init(void)
>  {
>  	struct pvclock_vcpu_time_info *pvti;
>  	int cpu = smp_processor_id();
>  	struct timespec64 tp;
>
> -	/* As Dom0 is never moved, no penalty on using TSC there */
> +	/*
> +	 * As Dom0 is never moved, no penalty on using TSC there.
> +	 *
> +	 * If it is possible for the guest to determine that the tsc is a safe
> +	 * clocksource, then set xen_clocksource rating below that of the tsc
> +	 * so that the system prefers tsc instead.
> +	 */
>  	if (xen_initial_domain())
>  		xen_clocksource.rating = 275;
> +	else if (xen_tsc_safe_clocksource())
> +		xen_clocksource.rating = 299;
>
>  	clocksource_register_hz(&xen_clocksource, NSEC_PER_SEC);
>
> --
> 2.25.1
>
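
One way to see which clocksource a guest ended up with is to read the
standard sysfs clocksource files. The following is a minimal user-space
sketch and is not part of the patch; read_first_line and report_clocksource
are hypothetical helper names, and the buffer sizes are arbitrary
assumptions.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define CS_DIR "/sys/devices/system/clocksource/clocksource0/"

/*
 * Read the first line of a file into buf and strip the trailing newline.
 * Returns 0 on success, -1 if the file cannot be opened or read.
 */
static int read_first_line(const char *path, char *buf, size_t len)
{
	FILE *f;

	assert(path != NULL && buf != NULL);
	if (len == 0)
		return -1;

	f = fopen(path, "r");
	if (!f)
		return -1;

	if (!fgets(buf, (int)len, f)) {
		fclose(f);
		return -1;
	}
	fclose(f);

	buf[strcspn(buf, "\n")] = '\0';
	return 0;
}

/* Print the active and available clocksources; returns 0 if both were read. */
static int report_clocksource(void)
{
	char cur[64], avail[256];

	if (read_first_line(CS_DIR "current_clocksource", cur, sizeof(cur)) ||
	    read_first_line(CS_DIR "available_clocksource", avail, sizeof(avail)))
		return -1;

	printf("current:   %s\navailable: %s\n", cur, avail);
	return 0;
}
```

Calling report_clocksource() from main() on a Xen guest would show whether
tsc or xen is the active clocksource after boot.
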