VMware patches for Ubuntu

Alok Kataria akataria at vmware.com
Tue Jan 13 20:05:23 UTC 2009


Hi Tim,

I was supposed to backport these TSC patches, which are required when
running under VMware, for the intrepid tree, but amid my vacation and
other things they fell off my radar.
I have now backported them for the intrepid tree and am attaching
all seven patches to this mail.
Could you please have a look at these patches and apply them to the
intrepid git tree?

Thanks,
Alok

On Fri, 2008-11-07 at 14:18 -0800, Alok Kataria wrote:
> On Fri, 2008-11-07 at 13:33 -0800, Tim Gardner wrote:
> > Alok Kataria wrote:
> > > On Fri, 2008-11-07 at 12:15 -0800, Tim Gardner wrote:
> > >> Alok Kataria wrote:
> > >>> That's cool, so that would mean you can easily cherry-pick these patches
> > >>> for the intrepid tree. Can you let me know in which release we can
> > >>> expect to see these patches?
> > >>>
> > >> If I don't run into issues with Intrepid, then these patches could end
> > >> up in -proposed within the next week or two. 
> > > 
> > > Great, let me know how it goes. Also, once the patches get into the
> > > proposed tree, I can have a kernel built off that tree for my internal
> > > testing.
> > > 
> > > Thanks,
> > > Alok
> > > 
> > 
> > I think I'll wait until I see what you come up with for the 2.6.24
> > backported patches. For 2.6.27 the first commit
> > b2bcc7b299f37037b4a78dc1538e5d6508ae8110 wants to patch a non-existent
> > file arch/x86/include/asm/cpufeature.h, so I guess it's already off into
> > the weeds.
> 
> Ah, yes, all of these are failing because the x86 architecture-specific
> header files were moved into arch/x86/include in 2.6.28-rc1.
> 
> That means I will have to backport these for intrepid too; let me do
> that as well.
> 
> Thanks,
> Alok
> > 
> > rtg
-------------- next part --------------
commit b2bcc7b299f37037b4a78dc1538e5d6508ae8110

From: Alok Kataria <akataria at vmware.com>

x86: add a synthetic TSC_RELIABLE feature bit

    Impact: None, bit reservation only

    Add a synthetic TSC_RELIABLE feature bit which will be used to mark
    the TSC as reliable so that we can skip all the runtime checks for
    TSC stability, which have false positives in virtual environments.
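
The feature number reserved by this patch, (3*32+23), encodes a word index and a bit index within a bitmap of 32-bit capability words. The following is a minimal userspace sketch of that encoding, not the kernel's implementation: `struct caps`, `caps_set`, `caps_has`, and `NCAPWORDS` are hypothetical names introduced only for illustration.

```c
#include <stdint.h>

/* Hypothetical model of a capability bitmap; NCAPWORDS is an assumption. */
#define NCAPWORDS 8
#define FEATURE_TSC_RELIABLE (3*32 + 23)	/* word 3, bit 23, as in the patch */

struct caps {
	uint32_t word[NCAPWORDS];
};

/* feature / 32 selects the word, feature % 32 selects the bit in that word. */
static void caps_set(struct caps *c, unsigned int feature)
{
	c->word[feature / 32] |= (uint32_t)1 << (feature % 32);
}

/* Returns 1 if the feature bit is set, 0 otherwise. */
static int caps_has(const struct caps *c, unsigned int feature)
{
	return (c->word[feature / 32] >> (feature % 32)) & 1u;
}
```

Because the bit lives in a Linux-defined word rather than a CPUID-defined one, no hardware sets it; it only becomes visible once some code sets it explicitly.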

    Signed-off-by: Alok N Kataria <akataria at vmware.com>
    Signed-off-by: Dan Hecht <dhecht at vmware.com>
    Signed-off-by: H. Peter Anvin <hpa at zytor.com>
---

 include/asm-x86/cpufeature.h |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)


diff --git a/include/asm-x86/cpufeature.h b/include/asm-x86/cpufeature.h
index cfcfb0a..0184826 100644
--- a/include/asm-x86/cpufeature.h
+++ b/include/asm-x86/cpufeature.h
@@ -82,6 +82,7 @@
 #define X86_FEATURE_11AP	(3*32+19) /* Bad local APIC aka 11AP */
 #define X86_FEATURE_NOPL	(3*32+20) /* The NOPL (0F 1F) instructions */
 #define X86_FEATURE_AMDC1E	(3*32+21) /* AMD C1E detected */
+#define X86_FEATURE_TSC_RELIABLE (3*32+23) /* TSC is known to be reliable */
 
 /* Intel-defined CPU features, CPUID level 0x00000001 (ecx), word 4 */
 #define X86_FEATURE_XMM3	(4*32+ 0) /* Streaming SIMD Extensions-3 */
-------------- next part --------------
commit 49ab56ac6e1b907b7dadb72a4012460359feaf0e

From: Alok Kataria <akataria at vmware.com>

x86: add X86_FEATURE_HYPERVISOR feature bit

    Impact: Number declaration only.

    Add X86_FEATURE_HYPERVISOR bit (CPUID level 1, ECX, bit 31).

    Signed-off-by: H. Peter Anvin <hpa at zytor.com>
    Signed-off-by: Alok N Kataria <akataria at vmware.com>
---

 include/asm-x86/cpufeature.h |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)


diff --git a/include/asm-x86/cpufeature.h b/include/asm-x86/cpufeature.h
index 0184826..49e1b59 100644
--- a/include/asm-x86/cpufeature.h
+++ b/include/asm-x86/cpufeature.h
@@ -95,6 +95,7 @@
 #define X86_FEATURE_XTPR	(4*32+14) /* Send Task Priority Messages */
 #define X86_FEATURE_DCA		(4*32+18) /* Direct Cache Access */
 #define X86_FEATURE_XMM4_2	(4*32+20) /* Streaming SIMD Extensions-4.2 */
+#define X86_FEATURE_HYPERVISOR	(4*32+31) /* Running on a hypervisor */
 
 /* VIA/Cyrix/Centaur-defined CPU features, CPUID level 0xC0000001, word 5 */
 #define X86_FEATURE_XSTORE	(5*32+ 2) /* on-CPU RNG present (xstore insn) */
@@ -194,6 +195,7 @@ extern const char * const x86_power_flags[32];
 #define cpu_has_arch_perfmon	boot_cpu_has(X86_FEATURE_ARCH_PERFMON)
 #define cpu_has_pat		boot_cpu_has(X86_FEATURE_PAT)
 #define cpu_has_xmm4_2		boot_cpu_has(X86_FEATURE_XMM4_2)
+#define cpu_has_hypervisor	boot_cpu_has(X86_FEATURE_HYPERVISOR)
 
 #if defined(CONFIG_X86_INVLPG) || defined(CONFIG_X86_64)
 # define cpu_has_invlpg		1
-------------- next part --------------
commit 88b094fb8d4fe43b7025ea8d487059e8813e02cd

From: Alok Kataria <akataria at vmware.com>

x86: Hypervisor detection and get tsc_freq from hypervisor

    Impact: Changes timebase calibration on VMware.

    v3->v2 : Abstract the hypervisor detection and feature (tsc_freq) request
         behind a hypervisor.c file
    v2->v1 : Add a x86_hyper_vendor field to the cpuinfo_x86 structure.
         This avoids multiple calls to the hypervisor detection function.

    This patch adds a function to detect whether we are running under VMware.
    The current way to check whether we are on VMware is as follows:
    #  Check if the "hypervisor present bit" is set; if so, read the 0x40000000
       cpuid leaf and check for the "VMwareVMware" signature.
    #  If the above fails, check the DMI vendor names for the "VMware" string;
       if we find one, we query the VMware hypervisor port to check whether
       we are under VMware.

    The DMI + "VMware hypervisor port check" is needed for older VMware products,
    which don't implement the hypervisor signature cpuid leaf.
    Also note that since we are checking for the DMI signature the hypervisor
    port should never be accessed on native hardware.
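
The CPUID half of the detection described above can be sketched in plain C, assuming the register values have already been read. The helper names (`hypervisor_bit_set`, `put_reg`, `is_vmware_signature`) are hypothetical and not part of the patch; the sketch takes registers as arguments so it can be exercised without executing CPUID.

```c
#include <stdint.h>
#include <string.h>

/* CPUID leaf 1, ECX bit 31 is the "hypervisor present" bit. */
static int hypervisor_bit_set(uint32_t leaf1_ecx)
{
	return (leaf1_ecx >> 31) & 1u;
}

/* Store a 32-bit register as four bytes, least significant byte first. */
static void put_reg(char *dst, uint32_t reg)
{
	int i;

	for (i = 0; i < 4; i++)
		dst[i] = (char)((reg >> (8 * i)) & 0xff);
}

/* Returns 1 if EBX, ECX, EDX from leaf 0x40000000 spell "VMwareVMware". */
static int is_vmware_signature(uint32_t ebx, uint32_t ecx, uint32_t edx)
{
	char id[13];

	put_reg(id + 0, ebx);
	put_reg(id + 4, ecx);
	put_reg(id + 8, edx);
	id[12] = '\0';
	return strcmp(id, "VMwareVMware") == 0;
}
```

Extracting bytes by shifting rather than with memcpy is a choice of this sketch that keeps it independent of host byte order; on x86 both give the same result.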

    This patch also adds a get_hypervisor_tsc_freq function. Instead of
    calibrating the frequency, which can be error-prone in a virtualized
    environment, we ask the hypervisor for it. When running on VMware, we get
    the frequency from the hypervisor by accessing the hypervisor port.
    Other hypervisors can also add code to the generic routine to report the
    frequency on their platform.
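
The frequency comes back as a 64-bit value in Hz split across two 32-bit registers, and the kernel wants kHz. A minimal sketch of that conversion follows, assuming eax holds the low half and ebx the high half as in the patch. `tsc_khz_from_regs` is a hypothetical name, and returning 0 on overflow is this sketch's choice where the patch uses BUG_ON.

```c
#include <stdint.h>

/*
 * Combine the low (eax) and high (ebx) halves into a frequency in Hz and
 * convert it to kHz. Returns 0 if the kHz value does not fit in 32 bits.
 */
static uint32_t tsc_khz_from_regs(uint32_t eax, uint32_t ebx)
{
	uint64_t hz = (uint64_t)eax | ((uint64_t)ebx << 32);
	uint64_t khz = hz / 1000;

	if (khz >> 32)
		return 0;
	return (uint32_t)khz;
}
```

For example, a 5 GHz value arrives as ebx = 1 and eax = 0x2A05F200, and the function yields 5000000 kHz.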

    Signed-off-by: Alok N Kataria <akataria at vmware.com>
    Signed-off-by: Dan Hecht <dhecht at vmware.com>
    Signed-off-by: H. Peter Anvin <hpa at zytor.com>
---

 arch/x86/kernel/cpu/Makefile     |    1 
 arch/x86/kernel/cpu/common.c     |    2 +
 arch/x86/kernel/cpu/common_64.c  |    2 +
 arch/x86/kernel/cpu/hypervisor.c |   48 +++++++++++++++++++++
 arch/x86/kernel/cpu/vmware.c     |   88 ++++++++++++++++++++++++++++++++++++++
 arch/x86/kernel/setup.c          |    7 +++
 arch/x86/kernel/tsc.c            |    9 +++-
 include/asm-x86/hypervisor.h     |   26 +++++++++++
 include/asm-x86/processor.h      |    4 ++
 include/asm-x86/vmware.h         |   26 +++++++++++
 10 files changed, 212 insertions(+), 1 deletions(-)
 create mode 100644 arch/x86/kernel/cpu/hypervisor.c
 create mode 100644 arch/x86/kernel/cpu/vmware.c
 create mode 100644 include/asm-x86/hypervisor.h
 create mode 100644 include/asm-x86/vmware.h


diff --git a/arch/x86/kernel/cpu/Makefile b/arch/x86/kernel/cpu/Makefile
index ee76eaa..0613c56 100644
--- a/arch/x86/kernel/cpu/Makefile
+++ b/arch/x86/kernel/cpu/Makefile
@@ -4,6 +4,7 @@
 
 obj-y			:= intel_cacheinfo.o addon_cpuid_features.o
 obj-y			+= proc.o feature_names.o
+obj-y			+= vmware.o hypervisor.o
 
 obj-$(CONFIG_X86_32)	+= common.o bugs.o
 obj-$(CONFIG_X86_64)	+= common_64.o bugs_64.o
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 4e456bd..0a10238 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -14,6 +14,7 @@
 #include <asm/mce.h>
 #include <asm/pat.h>
 #include <asm/asm.h>
+#include <asm/hypervisor.h>
 #ifdef CONFIG_X86_LOCAL_APIC
 #include <asm/mpspec.h>
 #include <asm/apic.h>
@@ -505,6 +506,7 @@ static void __cpuinit identify_cpu(struct cpuinfo_x86 *c)
 				c->x86, c->x86_model);
 	}
 
+	init_hypervisor(c);
 	/*
 	 * On SMP, boot_cpu_data holds the common feature set between
 	 * all CPUs; so make sure that we indicate which features are
diff --git a/arch/x86/kernel/cpu/common_64.c b/arch/x86/kernel/cpu/common_64.c
index a11f5d4..3450af8 100644
--- a/arch/x86/kernel/cpu/common_64.c
+++ b/arch/x86/kernel/cpu/common_64.c
@@ -34,6 +34,7 @@
 #include <asm/sections.h>
 #include <asm/setup.h>
 #include <asm/genapic.h>
+#include <asm/hypervisor.h>
 
 #include "cpu.h"
 
@@ -384,6 +385,7 @@ static void __cpuinit identify_cpu(struct cpuinfo_x86 *c)
 
 	detect_ht(c);
 
+	init_hypervisor(c);
 	/*
 	 * On SMP, boot_cpu_data holds the common feature set between
 	 * all CPUs; so make sure that we indicate which features are
diff --git a/arch/x86/kernel/cpu/hypervisor.c b/arch/x86/kernel/cpu/hypervisor.c
new file mode 100644
index 0000000..7bd5506
--- /dev/null
+++ b/arch/x86/kernel/cpu/hypervisor.c
@@ -0,0 +1,48 @@
+/*
+ * Common hypervisor code
+ *
+ * Copyright (C) 2008, VMware, Inc.
+ * Author : Alok N Kataria <akataria at vmware.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
+ * NON INFRINGEMENT.  See the GNU General Public License for more
+ * details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA.
+ *
+ */
+
+#include <asm/processor.h>
+#include <asm/vmware.h>
+
+static inline void __cpuinit
+detect_hypervisor_vendor(struct cpuinfo_x86 *c)
+{
+	if (vmware_platform()) {
+		c->x86_hyper_vendor = X86_HYPER_VENDOR_VMWARE;
+	} else {
+		c->x86_hyper_vendor = X86_HYPER_VENDOR_NONE;
+	}
+}
+
+unsigned long get_hypervisor_tsc_freq(void)
+{
+	if (boot_cpu_data.x86_hyper_vendor == X86_HYPER_VENDOR_VMWARE)
+		return vmware_get_tsc_khz();
+	return 0;
+}
+
+void __cpuinit init_hypervisor(struct cpuinfo_x86 *c)
+{
+	detect_hypervisor_vendor(c);
+}
+
diff --git a/arch/x86/kernel/cpu/vmware.c b/arch/x86/kernel/cpu/vmware.c
new file mode 100644
index 0000000..d5d1b75
--- /dev/null
+++ b/arch/x86/kernel/cpu/vmware.c
@@ -0,0 +1,88 @@
+/*
+ * VMware Detection code.
+ *
+ * Copyright (C) 2008, VMware, Inc.
+ * Author : Alok N Kataria <akataria at vmware.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
+ * NON INFRINGEMENT.  See the GNU General Public License for more
+ * details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA.
+ *
+ */
+
+#include <linux/dmi.h>
+#include <asm/div64.h>
+
+#define CPUID_VMWARE_INFO_LEAF	0x40000000
+#define VMWARE_HYPERVISOR_MAGIC	0x564D5868
+#define VMWARE_HYPERVISOR_PORT	0x5658
+
+#define VMWARE_PORT_CMD_GETVERSION	10
+#define VMWARE_PORT_CMD_GETHZ		45
+
+#define VMWARE_PORT(cmd, eax, ebx, ecx, edx)				\
+	__asm__("inl (%%dx)" :						\
+			"=a"(eax), "=c"(ecx), "=d"(edx), "=b"(ebx) :	\
+			"0"(VMWARE_HYPERVISOR_MAGIC),			\
+			"1"(VMWARE_PORT_CMD_##cmd),			\
+			"2"(VMWARE_HYPERVISOR_PORT), "3"(0) :		\
+			"memory");
+
+static inline int __vmware_platform(void)
+{
+	uint32_t eax, ebx, ecx, edx;
+	VMWARE_PORT(GETVERSION, eax, ebx, ecx, edx);
+	return eax != (uint32_t)-1 && ebx == VMWARE_HYPERVISOR_MAGIC;
+}
+
+static unsigned long __vmware_get_tsc_khz(void)
+{
+        uint64_t tsc_hz;
+        uint32_t eax, ebx, ecx, edx;
+
+        VMWARE_PORT(GETHZ, eax, ebx, ecx, edx);
+
+        if (eax == (uint32_t)-1)
+                return 0;
+        tsc_hz = eax | (((uint64_t)ebx) << 32);
+        do_div(tsc_hz, 1000);
+        BUG_ON(tsc_hz >> 32);
+        return tsc_hz;
+}
+
+int vmware_platform(void)
+{
+	if (cpu_has_hypervisor) {
+		unsigned int eax, ebx, ecx, edx;
+		char hyper_vendor_id[13];
+
+		cpuid(CPUID_VMWARE_INFO_LEAF, &eax, &ebx, &ecx, &edx);
+		memcpy(hyper_vendor_id + 0, &ebx, 4);
+		memcpy(hyper_vendor_id + 4, &ecx, 4);
+		memcpy(hyper_vendor_id + 8, &edx, 4);
+		hyper_vendor_id[12] = '\0';
+		if (!strcmp(hyper_vendor_id, "VMwareVMware"))
+			return 1;
+	} else if (dmi_available && dmi_name_in_vendors("VMware") &&
+		   __vmware_platform())
+		return 1;
+
+	return 0;
+}
+
+unsigned long vmware_get_tsc_khz(void)
+{
+	BUG_ON(!vmware_platform());
+	return __vmware_get_tsc_khz();
+}
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 6d5a3c4..17021f2 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -98,6 +98,7 @@
 
 #include <mach_apic.h>
 #include <asm/paravirt.h>
+#include <asm/hypervisor.h>
 
 #include <asm/percpu.h>
 #include <asm/topology.h>
@@ -902,6 +903,12 @@ void __init setup_arch(char **cmdline_p)
 	e820_reserve_resources();
 	e820_mark_nosave_regions(max_low_pfn);
 
+	/*
+	 * VMware detection requires dmi to be available, so this
+	 * needs to be done after dmi_scan_machine, for the BP.
+	 */
+	init_hypervisor(&boot_cpu_data);
+
 #ifdef CONFIG_X86_32
 	request_resource(&iomem_resource, &video_ram_resource);
 #endif
diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index de850e9..e063537 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -15,6 +15,7 @@
 #include <asm/vgtod.h>
 #include <asm/time.h>
 #include <asm/delay.h>
+#include <asm/hypervisor.h>
 
 unsigned int cpu_khz;           /* TSC clocks / usec, not used here */
 EXPORT_SYMBOL(cpu_khz);
@@ -189,9 +190,15 @@ unsigned long native_calibrate_tsc(void)
 {
 	u64 tsc1, tsc2, delta, pm1, pm2, hpet1, hpet2;
 	unsigned long tsc_pit_min = ULONG_MAX, tsc_ref_min = ULONG_MAX;
-	unsigned long flags;
+	unsigned long flags, tsc_khz;
 	int hpet = is_hpet_enabled(), i;
 
+	tsc_khz = get_hypervisor_tsc_freq();
+	if (tsc_khz) {
+		printk(KERN_INFO "TSC: Frequency read from the hypervisor\n");
+		return tsc_khz;
+	}
+
 	/*
 	 * Run 5 calibration loops to get the lowest frequency value
 	 * (the best estimate). We use two different calibration modes
diff --git a/include/asm-x86/hypervisor.h b/include/asm-x86/hypervisor.h
new file mode 100644
index 0000000..369f5c5
--- /dev/null
+++ b/include/asm-x86/hypervisor.h
@@ -0,0 +1,26 @@
+/*
+ * Copyright (C) 2008, VMware, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
+ * NON INFRINGEMENT.  See the GNU General Public License for more
+ * details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA.
+ *
+ */
+#ifndef ASM_X86__HYPERVISOR_H
+#define ASM_X86__HYPERVISOR_H
+
+extern unsigned long get_hypervisor_tsc_freq(void);
+extern void init_hypervisor(struct cpuinfo_x86 *c);
+
+#endif
diff --git a/include/asm-x86/processor.h b/include/asm-x86/processor.h
index 4df3e2f..a06d9a5 100644
--- a/include/asm-x86/processor.h
+++ b/include/asm-x86/processor.h
@@ -109,6 +109,7 @@ struct cpuinfo_x86 {
 	/* Index into per_cpu list: */
 	u16			cpu_index;
 #endif
+	unsigned int		x86_hyper_vendor;
 } __attribute__((__aligned__(SMP_CACHE_BYTES)));
 
 #define X86_VENDOR_INTEL	0
@@ -122,6 +123,9 @@ struct cpuinfo_x86 {
 
 #define X86_VENDOR_UNKNOWN	0xff
 
+#define X86_HYPER_VENDOR_NONE  0
+#define X86_HYPER_VENDOR_VMWARE 1
+
 /*
  * capabilities of CPUs
  */
diff --git a/include/asm-x86/vmware.h b/include/asm-x86/vmware.h
new file mode 100644
index 0000000..02dfea5
--- /dev/null
+++ b/include/asm-x86/vmware.h
@@ -0,0 +1,26 @@
+/*
+ * Copyright (C) 2008, VMware, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
+ * NON INFRINGEMENT.  See the GNU General Public License for more
+ * details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA.
+ *
+ */
+#ifndef ASM_X86__VMWARE_H
+#define ASM_X86__VMWARE_H
+
+extern unsigned long vmware_get_tsc_khz(void);
+extern int vmware_platform(void);
+
+#endif
-------------- next part --------------
commit eca0cd028bdf0f6aaceb0d023e9c7501079a7dda

From: Alok Kataria <akataria at vmware.com>

x86: Add a synthetic TSC_RELIABLE feature bit.

    Impact: Changes timebase calibration on VMware.

    Use the synthetic TSC_RELIABLE bit to work around virtualization anomalies.

    Virtual TSCs can be kept nearly in sync, but because the virtual TSC
    offset is set by software, it's not perfect.  So, the TSC
    synchronization test can fail. Even then the TSC can be used as a
    clocksource since the VMware platform exports a reliable TSC to the
    guest for timekeeping purposes. Use this bit to check if we need to
    skip the TSC sync checks.

    Along with this, also set the CONSTANT_TSC bit when on VMware, since we
    still want to use the TSC as a clocksource on a VM running on hardware that
    has unsynchronized TSCs (Opterons); the hypervisor will take
    care of providing a consistent TSC to the guest.

    Signed-off-by: Alok N Kataria <akataria at vmware.com>
    Signed-off-by: Dan Hecht <dhecht at vmware.com>
    Signed-off-by: H. Peter Anvin <hpa at zytor.com>
---

 arch/x86/kernel/cpu/hypervisor.c |   11 ++++++++++-
 arch/x86/kernel/cpu/vmware.c     |   18 ++++++++++++++++++
 arch/x86/kernel/tsc_sync.c       |    8 +++++++-
 include/asm-x86/vmware.h         |    1 +
 4 files changed, 36 insertions(+), 2 deletions(-)


diff --git a/arch/x86/kernel/cpu/hypervisor.c b/arch/x86/kernel/cpu/hypervisor.c
index 7bd5506..35ae2b7 100644
--- a/arch/x86/kernel/cpu/hypervisor.c
+++ b/arch/x86/kernel/cpu/hypervisor.c
@@ -41,8 +41,17 @@ unsigned long get_hypervisor_tsc_freq(void)
 	return 0;
 }
 
+static inline void __cpuinit
+hypervisor_set_feature_bits(struct cpuinfo_x86 *c)
+{
+	if (boot_cpu_data.x86_hyper_vendor == X86_HYPER_VENDOR_VMWARE) {
+		vmware_set_feature_bits(c);
+		return;
+	}
+}
+
 void __cpuinit init_hypervisor(struct cpuinfo_x86 *c)
 {
 	detect_hypervisor_vendor(c);
+	hypervisor_set_feature_bits(c);
 }
-
diff --git a/arch/x86/kernel/cpu/vmware.c b/arch/x86/kernel/cpu/vmware.c
index d5d1b75..2ac4394 100644
--- a/arch/x86/kernel/cpu/vmware.c
+++ b/arch/x86/kernel/cpu/vmware.c
@@ -86,3 +86,21 @@ unsigned long vmware_get_tsc_khz(void)
 	BUG_ON(!vmware_platform());
 	return __vmware_get_tsc_khz();
 }
+
+/*
+ * VMware hypervisor takes care of exporting a reliable TSC to the guest.
+ * Still, due to timing difference when running on virtual cpus, the TSC can
+ * be marked as unstable in some cases. For example, the TSC sync check at
+ * bootup can fail due to a marginal offset between vcpus' TSCs (though the
+ * TSCs do not drift from each other).  Also, the ACPI PM timer clocksource
+ * is not suitable as a watchdog when running on a hypervisor because the
+ * kernel may miss a wrap of the counter if the vcpu is descheduled for a
+ * long time. To skip these checks at runtime we set these capability bits,
+ * so that the kernel could just trust the hypervisor with providing a
+ * reliable virtual TSC that is suitable for timekeeping.
+ */
+void __cpuinit vmware_set_feature_bits(struct cpuinfo_x86 *c)
+{
+	set_cpu_cap(c, X86_FEATURE_CONSTANT_TSC);
+	set_cpu_cap(c, X86_FEATURE_TSC_RELIABLE);
+}
diff --git a/arch/x86/kernel/tsc_sync.c b/arch/x86/kernel/tsc_sync.c
index 9ffb01c..5977c40 100644
--- a/arch/x86/kernel/tsc_sync.c
+++ b/arch/x86/kernel/tsc_sync.c
@@ -108,6 +108,12 @@ void __cpuinit check_tsc_sync_source(int cpu)
 	if (unsynchronized_tsc())
 		return;
 
+	if (boot_cpu_has(X86_FEATURE_TSC_RELIABLE)) {
+		printk(KERN_INFO
+		       "Skipping synchronization checks as TSC is reliable.\n");
+		return;
+	}
+
 	printk(KERN_INFO "checking TSC synchronization [CPU#%d -> CPU#%d]:",
 			  smp_processor_id(), cpu);
 
@@ -161,7 +167,7 @@ void __cpuinit check_tsc_sync_target(void)
 {
 	int cpus = 2;
 
-	if (unsynchronized_tsc())
+	if (unsynchronized_tsc() || boot_cpu_has(X86_FEATURE_TSC_RELIABLE))
 		return;
 
 	/*
diff --git a/include/asm-x86/vmware.h b/include/asm-x86/vmware.h
index 02dfea5..c11b7e1 100644
--- a/include/asm-x86/vmware.h
+++ b/include/asm-x86/vmware.h
@@ -22,5 +22,6 @@
 
 extern unsigned long vmware_get_tsc_khz(void);
 extern int vmware_platform(void);
+extern void vmware_set_feature_bits(struct cpuinfo_x86 *c);
 
 #endif
-------------- next part --------------
commit 395628ef4ea12ff0748099f145363b5e33c69acb

From: Alok Kataria <akataria at vmware.com>

x86: Skip verification by the watchdog for TSC clocksource.

    Impact: Changes timekeeping on VMware (or with tsc=reliable).

    This is achieved by clearing the CLOCK_SOURCE_MUST_VERIFY flag.

    We add a tsc=reliable commandline option to enable this.
    This enables legacy hardware without HPET, LAPIC, or ACPI timers
    to enter high-resolution timer mode.

    Along with that, this has been extended for use in virtualized environments
    too. Now we also clear the flag if the X86_FEATURE_TSC_RELIABLE bit is set.

    This is important since there is a wrap-around problem with the acpi_pm timer.
    The acpi_pm counter is just 24 bits wide and can overflow in ~4 seconds. With
    NO_HZ kernels in a virtualized environment, there can be situations where
    the guest is descheduled for a longer duration, and as a result we may miss the
    wrap of the acpi counter. When the TSC is used as a clocksource and the acpi_pm
    timer is used as the watchdog clocksource, this error in acpi_pm results in the
    TSC being marked as unstable, and essentially results in time dropping in chunks
    of 4 seconds whenever this wrap is missed. Since the virtualized TSC is
    reliable on VMware, we should always use the TSC clocksource on VMware, so
    we skip the verification at runtime by checking for the feature bit.
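
The wrap arithmetic can be made concrete with a small sketch. It assumes the standard ACPI PM timer rate of 3.579545 MHz, which is not stated in this commit message; `pm_delta` and `pm_wrap_seconds` are hypothetical helpers, not kernel functions.

```c
#include <stdint.h>

#define PM_TIMER_HZ	3579545u	/* assumed ACPI PM timer frequency */
#define PM_TIMER_MASK	0x00FFFFFFu	/* the counter is 24 bits wide */

/* Ticks elapsed as seen by a reader that only has two counter samples. */
static uint32_t pm_delta(uint32_t last, uint32_t now)
{
	return (now - last) & PM_TIMER_MASK;
}

/* Seconds taken by one full wrap of the 24-bit counter. */
static double pm_wrap_seconds(void)
{
	return (double)(PM_TIMER_MASK + 1u) / PM_TIMER_HZ;
}
```

Under that assumption one wrap takes about 4.69 seconds. If the guest is descheduled for one full wrap plus 1000 ticks, `pm_delta` reports only 1000 ticks, so the watchdog sees the TSC as having run ahead by a whole wrap period and marks it unstable.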

    Since we reset the flag for mgeode systems too, I have combined
    the mgeode case with the feature bit check.

    Signed-off-by: Jeff Hansen <jhansen at cardaccess-inc.com>
    Signed-off-by: Alok N Kataria <akataria at vmware.com>
    Signed-off-by: Dan Hecht <dhecht at vmware.com>
    Signed-off-by: H. Peter Anvin <hpa at zytor.com>
---

 Documentation/kernel-parameters.txt |    7 +++++++
 arch/x86/kernel/tsc.c               |   33 +++++++++++++++++++++------------
 2 files changed, 28 insertions(+), 12 deletions(-)


diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 4e0d37d..a3506d2 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -2197,6 +2197,13 @@ and is between 256 and 4096 characters. It is defined in the file
 			Format:
 			<io>,<irq>,<dma>,<dma2>,<sb_io>,<sb_irq>,<sb_dma>,<mpu_io>,<mpu_irq>
 
+	tsc=		Disable clocksource-must-verify flag for TSC.
+			Format: <string>
+			[x86] reliable: mark tsc clocksource as reliable, this
+			disables clocksource verification at runtime.
+			Used to enable high-resolution timer mode on older
+			hardware, and in virtualized environment.
+
 	turbografx.map[2|3]=	[HW,JOY]
 			TurboGraFX parallel port interface
 			Format:
diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index e063537..93a4494 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -32,6 +32,7 @@ static int tsc_unstable;
    erroneous rdtsc usage on !cpu_has_tsc processors */
 static int tsc_disabled = -1;
 
+static int tsc_clocksource_reliable;
 /*
  * Scheduler clock - returns current time in nanosec units.
  */
@@ -99,6 +100,15 @@ int __init notsc_setup(char *str)
 
 __setup("notsc", notsc_setup);
 
+static int __init tsc_setup(char *str)
+{
+	if (!strcmp(str, "reliable"))
+		tsc_clocksource_reliable = 1;
+	return 1;
+}
+
+__setup("tsc=", tsc_setup);
+
 #define MAX_RETRIES     5
 #define SMI_TRESHOLD    50000
 
@@ -564,24 +574,21 @@ static struct dmi_system_id __initdata bad_tsc_dmi_table[] = {
 	{}
 };
 
-/*
- * Geode_LX - the OLPC CPU has a possibly a very reliable TSC
- */
+static void __init check_system_tsc_reliable(void)
+{
 #ifdef CONFIG_MGEODE_LX
-/* RTSC counts during suspend */
+	/* RTSC counts during suspend */
 #define RTSC_SUSP 0x100
-
-static void __init check_geode_tsc_reliable(void)
-{
 	unsigned long res_low, res_high;
 
 	rdmsr_safe(MSR_GEODE_BUSCONT_CONF0, &res_low, &res_high);
+	/* Geode_LX - the OLPC CPU has a possibly a very reliable TSC */
 	if (res_low & RTSC_SUSP)
-		clocksource_tsc.flags &= ~CLOCK_SOURCE_MUST_VERIFY;
-}
-#else
-static inline void check_geode_tsc_reliable(void) { }
+		tsc_clocksource_reliable = 1;
 #endif
+	if (boot_cpu_has(X86_FEATURE_TSC_RELIABLE))
+		tsc_clocksource_reliable = 1;
+}
 
 /*
  * Make an educated guess if the TSC is trustworthy and synchronized
@@ -616,6 +623,8 @@ static void __init init_tsc_clocksource(void)
 {
 	clocksource_tsc.mult = clocksource_khz2mult(tsc_khz,
 			clocksource_tsc.shift);
+	if (tsc_clocksource_reliable)
+		clocksource_tsc.flags &= ~CLOCK_SOURCE_MUST_VERIFY;
 	/* lower the rating if we already know its unstable: */
 	if (check_tsc_unstable()) {
 		clocksource_tsc.rating = 0;
@@ -676,7 +685,7 @@ void __init tsc_init(void)
 	if (unsynchronized_tsc())
 		mark_tsc_unstable("TSCs unsynchronized");
 
-	check_geode_tsc_reliable();
+	check_system_tsc_reliable();
 	init_tsc_clocksource();
 }
 
-------------- next part --------------
commit 6bdbfe99916398dbb28d83833cc04757110f2738

From: Alok Kataria <akataria at vmware.com>

x86: VMware: Fix vmware_get_tsc code

    Impact: Fix possible failure to calibrate the TSC on VMware near 4 GHz

    The current version of the code to get the TSC frequency from
    the VMware hypervisor is broken on a processor with a frequency of
    (4G-1) Hz, because on such processors eax will hold UINT_MAX,
    which is a legitimate value.
    We instead check whether EBX changed to decide if we were able to
    read the frequency from the hypervisor.
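
The difference between the two failure checks can be sketched as a pair of predicates over the returned registers. `read_ok_old` and `read_ok_new` are hypothetical names for illustration; the sketch assumes, as in this patch, that ebx is preloaded with UINT_MAX before the port access and carries the high 32 bits of the frequency afterwards.

```c
#include <stdint.h>

/* Old check: eax == UINT32_MAX is taken to mean "no answer". */
static int read_ok_old(uint32_t eax, uint32_t ebx)
{
	(void)ebx;
	return eax != UINT32_MAX;
}

/* New check: ebx still holding its preloaded UINT32_MAX means "no answer". */
static int read_ok_new(uint32_t eax, uint32_t ebx)
{
	(void)eax;
	return ebx != UINT32_MAX;
}
```

At (4G-1) Hz the hypervisor returns eax = 0xFFFFFFFF and ebx = 0, which the old check rejects and the new one accepts. An ebx of UINT_MAX as a genuine high half would imply a frequency far beyond any real processor, so it is a safer sentinel.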

    Signed-off-by: Alok N Kataria <akataria at vmware.com>
    Signed-off-by: H. Peter Anvin <hpa at zytor.com>
---

 arch/x86/kernel/cpu/vmware.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)


diff --git a/arch/x86/kernel/cpu/vmware.c b/arch/x86/kernel/cpu/vmware.c
index 2ac4394..a0905ec 100644
--- a/arch/x86/kernel/cpu/vmware.c
+++ b/arch/x86/kernel/cpu/vmware.c
@@ -36,7 +36,7 @@
 			"=a"(eax), "=c"(ecx), "=d"(edx), "=b"(ebx) :	\
 			"0"(VMWARE_HYPERVISOR_MAGIC),			\
 			"1"(VMWARE_PORT_CMD_##cmd),			\
-			"2"(VMWARE_HYPERVISOR_PORT), "3"(0) :		\
+			"2"(VMWARE_HYPERVISOR_PORT), "3"(UINT_MAX) :	\
 			"memory");
 
 static inline int __vmware_platform(void)
@@ -53,7 +53,7 @@ static unsigned long __vmware_get_tsc_khz(void)
 
         VMWARE_PORT(GETHZ, eax, ebx, ecx, edx);
 
-        if (eax == (uint32_t)-1)
+        if (ebx == UINT_MAX)
                 return 0;
         tsc_hz = eax | (((uint64_t)ebx) << 32);
         do_div(tsc_hz, 1000);
-------------- next part --------------
commit fd8cd7e1919fc1c27fe2fdccd2a1cd32f791ef0f

From: Alok Kataria <akataria at vmware.com>

x86: vmware: look for DMI string in the product serial key

    Impact: Should permit VMware detection on older platforms where the
    vendor is changed.  Could theoretically cause a regression if some
    weird serial number scheme contains the string "VMware" by pure
    chance.  Seems unlikely, especially with the mixed case.

    In some user-configured cases, VMware may choose not to put a VMware-specific
    DMI string, but the product serial key is always there and is VMware-specific.
    Add an interface to check the serial key when checking for VMware in the DMI
    information.
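
The serial-key check amounts to a case-sensitive substring search over a DMI field that may be absent. A minimal sketch under that reading follows; `name_in_field` is a hypothetical stand-in for the new helper, and any serial strings passed to it here are made up for illustration.

```c
#include <stddef.h>
#include <string.h>

/*
 * Returns 1 if str occurs anywhere in field, 0 otherwise.
 * A NULL field models a DMI entry that was never populated.
 */
static int name_in_field(const char *field, const char *str)
{
	return field != NULL && strstr(field, str) != NULL;
}
```

Because strstr is case-sensitive, a field containing "vmware" in lower case does not match "VMware", which is why the Impact note treats an accidental match as unlikely.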

    Signed-off-by: Alok N Kataria <akataria at vmware.com>
    Signed-off-by: H. Peter Anvin <hpa at zytor.com>
---

 arch/x86/kernel/cpu/vmware.c |    7 ++++++-
 drivers/firmware/dmi_scan.c  |   11 +++++++++++
 include/linux/dmi.h          |    2 ++
 3 files changed, 19 insertions(+), 1 deletions(-)


diff --git a/arch/x86/kernel/cpu/vmware.c b/arch/x86/kernel/cpu/vmware.c
index a0905ec..c034bda 100644
--- a/arch/x86/kernel/cpu/vmware.c
+++ b/arch/x86/kernel/cpu/vmware.c
@@ -61,6 +61,11 @@ static unsigned long __vmware_get_tsc_khz(void)
         return tsc_hz;
 }
 
+/*
+ * While checking the dmi string information, just checking the product
+ * serial key should be enough, as this will always have a VMware
+ * specific string when running under VMware hypervisor.
+ */
 int vmware_platform(void)
 {
 	if (cpu_has_hypervisor) {
@@ -74,7 +79,7 @@ int vmware_platform(void)
 		hyper_vendor_id[12] = '\0';
 		if (!strcmp(hyper_vendor_id, "VMwareVMware"))
 			return 1;
-	} else if (dmi_available && dmi_name_in_vendors("VMware") &&
+	} else if (dmi_available && dmi_name_in_serial("VMware") &&
 		   __vmware_platform())
 		return 1;
 
diff --git a/drivers/firmware/dmi_scan.c b/drivers/firmware/dmi_scan.c
index 455575b..4dd780c 100644
--- a/drivers/firmware/dmi_scan.c
+++ b/drivers/firmware/dmi_scan.c
@@ -457,6 +457,17 @@ const char *dmi_get_system_info(int field)
 }
 EXPORT_SYMBOL(dmi_get_system_info);
 
+/**
+ *	dmi_name_in_serial - 	Check if string is in the DMI product serial
+ *				information.
+ */
+int dmi_name_in_serial(const char *str)
+{
+	int f = DMI_PRODUCT_SERIAL;
+	if (dmi_ident[f] && strstr(dmi_ident[f], str))
+		return 1;
+	return 0;
+}
 
 /**
  *	dmi_name_in_vendors - Check if string is anywhere in the DMI vendor information.
diff --git a/include/linux/dmi.h b/include/linux/dmi.h
index 2a063b6..098e292 100644
--- a/include/linux/dmi.h
+++ b/include/linux/dmi.h
@@ -81,6 +81,7 @@ extern const struct dmi_device * dmi_find_device(int type, const char *name,
 extern void dmi_scan_machine(void);
 extern int dmi_get_year(int field);
 extern int dmi_name_in_vendors(const char *str);
+extern int dmi_name_in_serial(const char *str);
 extern int dmi_available;
 extern int dmi_walk(void (*decode)(const struct dmi_header *));
 
@@ -93,6 +94,7 @@ static inline const struct dmi_device * dmi_find_device(int type, const char *na
 static inline void dmi_scan_machine(void) { return; }
 static inline int dmi_get_year(int year) { return 0; }
 static inline int dmi_name_in_vendors(const char *s) { return 0; }
+static inline int dmi_name_in_serial(const char *s) { return 0; }
 #define dmi_available 0
 static inline int dmi_walk(void (*decode)(const struct dmi_header *))
 	{ return -1; }

