[PATCH] fix TSC clocksource bug

Tim Gardner tcanonical at tpi.com
Thu Apr 24 13:14:10 UTC 2008


Alok Kataria wrote:
> On Wed, 2008-04-23 at 14:16 -0600, Tim Gardner wrote:
>> Alok Kataria wrote:
>>> Hi Tim,
>>>
>>> I was curious whether Ubuntu will be taking this fix in its Hardy
>>> release.
>>>
>>> This new fix for TSC issue was yesterday merged in the X86 git tree. It
>>> works well to keep TSC's synced and at the same time keeps Software
>>> Suspend happy too.
>>>
>>> Thanks,
>>> Alok
>>>
>>> On Sat, 2008-04-12 at 13:50 -0700, Dan Hecht wrote:
>>>> On 04/11/2008 03:51 PM, William Grant wrote:
>>>>> Tim Gardner wrote:
>>>>>> [snip]
>>>>>> =20
>>>>>> OK - reverted.
>>>>> Hm, can we please have another fix, then? The current workaround (adding
>>>>> `clocksource=3Dhpet' to the kernel line) is simple, but it looks rather
>>>>> bad to by default have pause of more than 30 seconds on boot, all the
>>>>> while making an annoying beeping noise...
>>>>>
>>>>> --=20
>>>>> William Grant
>>>>>
>>>>>
>>>> There is a new patch available that resolves the software suspend issue 
>>>> introduced with the earlier patch.  Please see 
>>>> http://bugzilla.kernel.org/show_bug.cgi?id=10410.
>>>>
>>>> I agree that it would be great to get this patch into Hardy as soon as 
>>>> possible since the TSC clocksource bug can result in hangs and 
>>>> timekeeping/timers not working properly.
>>>>
>>>> Dan
>>>>
>> The cherry-pick from upstream didn't work so well. Do you suppose you 
>> could backport it to Hardy and submit a patch? Thanks.
>>
> Hi Tim,
> 
> Attached is the backport, just one hunk was failing and nothing changed
> functionality wise so haven't done a lot of testing. This is just boot
> tested. 
> 
> Also CC'ed Thomas Gleixner, the original author of this fix.
> 
> Thanks,
> Alok
> 
> --
> This is a backport of the patch submitted by Thomas Gleixner to the
> x86 git tree, commit d8bb6f4c1670c8324e4135c61ef07486f7f17379 
> 
> Comments from the original post :
> 
> We already catch most of the TSC problems by sanity checks, but there
> is a subtle bug which has been in the code for ever. This can cause
> time jumps in the range of hours.
> 
> This was reported in:
>      http://lkml.org/lkml/2007/8/23/96
> and
>      http://lkml.org/lkml/2008/3/31/23
> 
> I was able to reproduce the problem with a gettimeofday loop test on a
> dual core and a quad core machine which both have sychronized
> TSCs. The TSCs seems not to be perfectly in sync though, but the
> kernel is not able to detect the slight delta in the sync check. Still
> there exists an extremly small window where this delta can be observed
> with a real big time jump. So far I was only able to reproduce this
> with the vsyscall gettimeofday implementation, but in theory this
> might be observable with the syscall based version as well.
> 
> CPU 0 updates the clock source variables under xtime/vyscall lock and
> CPU1, where the TSC is slighty behind CPU0, is reading the time right
> after the seqlock was unlocked.
> 
> The clocksource reference data was updated with the TSC from CPU0 and
> the value which is read from TSC on CPU1 is less than the reference
> data. This results in a huge delta value due to the unsigned
> subtraction of the TSC value and the reference value. This algorithm
> can not be changed due to the support of wrapping clock sources like
> pm timer.
> 
> The huge delta is converted to nanoseconds and added to xtime, which
> is then observable by the caller. The next gettimeofday call on CPU1
> will show the correct time again as now the TSC has advanced above the
> reference value.
> 
> To prevent this TSC specific wreckage we need to compare the TSC value
> against the reference value and return the latter when it is larger
> than the actual TSC value.
> 
> I pondered to mark the TSC unstable when the readout is smaller than
> the reference value, but this would render an otherwise good and fast
> clocksource unusable without a real good reason.
> 
> Signed-off-by: Alok Kataria <akataria at vmware.com>
> CC: Thomas Gleixner <tglx at linutronix.de>
> 
> ---
>  arch/x86/kernel/tsc_32.c  |   15 ++++++++++++++-
>  arch/x86/kernel/tsc_64.c  |   23 ++++++++++++++++++++---
>  kernel/time/timekeeping.c |    2 ++
>  3 files changed, 36 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/kernel/tsc_32.c b/arch/x86/kernel/tsc_32.c
> index 9ebc0da..1b50e60 100644
> --- a/arch/x86/kernel/tsc_32.c
> +++ b/arch/x86/kernel/tsc_32.c
> @@ -268,14 +268,27 @@ core_initcall(cpufreq_tsc);
>  /* clock source code */
>  
>  static unsigned long current_tsc_khz = 0;
> +static struct clocksource clocksource_tsc;
>  
> +/*
> + * We compare the TSC to the cycle_last value in the clocksource
> + * structure to avoid a nasty time-warp issue. This can be observed in
> + * a very small window right after one CPU updated cycle_last under
> + * xtime lock and the other CPU reads a TSC value which is smaller
> + * than the cycle_last reference value due to a TSC which is slighty
> + * behind. This delta is nowhere else observable, but in that case it
> + * results in a forward time jump in the range of hours due to the
> + * unsigned delta calculation of the time keeping core code, which is
> + * necessary to support wrapping clocksources like pm timer.
> + */
>  static cycle_t read_tsc(void)
>  {
>  	cycle_t ret;
>  
>  	rdtscll(ret);
>  
> -	return ret;
> +	return ret >= clocksource_tsc.cycle_last ?
> +		ret : clocksource_tsc.cycle_last;
>  }
>  
>  static struct clocksource clocksource_tsc = {
> diff --git a/arch/x86/kernel/tsc_64.c b/arch/x86/kernel/tsc_64.c
> index 9c70af4..6e8bd1a 100644
> --- a/arch/x86/kernel/tsc_64.c
> +++ b/arch/x86/kernel/tsc_64.c
> @@ -10,6 +10,7 @@
>  
>  #include <asm/hpet.h>
>  #include <asm/timex.h>
> +#include <asm/vgtod.h>
>  
>  static int notsc __initdata = 0;
>  
> @@ -246,18 +247,34 @@ int __init notsc_setup(char *s)
>  
>  __setup("notsc", notsc_setup);
>  
> +static struct clocksource clocksource_tsc;
>  
> -/* clock source code: */
> +/*
> + * We compare the TSC to the cycle_last value in the clocksource
> + * structure to avoid a nasty time-warp. This can be observed in a
> + * very small window right after one CPU updated cycle_last under
> + * xtime/vsyscall_gtod lock and the other CPU reads a TSC value which
> + * is smaller than the cycle_last reference value due to a TSC which
> + * is slighty behind. This delta is nowhere else observable, but in
> + * that case it results in a forward time jump in the range of hours
> + * due to the unsigned delta calculation of the time keeping core
> + * code, which is necessary to support wrapping clocksources like pm
> + * timer.
> + */
>  static cycle_t read_tsc(void)
>  {
>  	cycle_t ret = (cycle_t)get_cycles_sync();
> -	return ret;
> +
> +	return ret >= clocksource_tsc.cycle_last ?
> +		ret : clocksource_tsc.cycle_last;
>  }
>  
>  static cycle_t __vsyscall_fn vread_tsc(void)
>  {
>  	cycle_t ret = (cycle_t)get_cycles_sync();
> -	return ret;
> +
> +	return ret >= __vsyscall_gtod_data.clock.cycle_last ?
> +		ret : __vsyscall_gtod_data.clock.cycle_last;
>  }
>  
>  static struct clocksource clocksource_tsc = {
> diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
> index e5e466b..9f97e53 100644
> --- a/kernel/time/timekeeping.c
> +++ b/kernel/time/timekeeping.c
> @@ -189,6 +189,7 @@ static void change_clocksource(void)
>  	if (clock == new)
>  		return;
>  
> +	new->cycle_last = 0;
>  	now = clocksource_read(new);
>  	nsec =  __get_nsec_offset();
>  	timespec_add_ns(&xtime, nsec);
> @@ -301,6 +302,7 @@ static int timekeeping_resume(struct sys_device *dev)
>  	/* Make sure that we have the correct xtime reference */
>  	timespec_add_ns(&xtime, timekeeping_suspend_nsecs);
>  	/* re-base the last cycle value */
> +	clock->cycle_last = 0;
>  	clock->cycle_last = clocksource_read(clock);
>  	clock->error = 0;
>  	timekeeping_suspended = 0;
> 

applied, milestoned for 8.04.1 (July 3)

-- 
Tim Gardner tim.gardner at ubuntu.com




More information about the kernel-team mailing list