[PATCH] fix TSC clocksource bug
Tim Gardner
tcanonical at tpi.com
Thu Apr 24 13:14:10 UTC 2008
Alok Kataria wrote:
> On Wed, 2008-04-23 at 14:16 -0600, Tim Gardner wrote:
>> Alok Kataria wrote:
>>> Hi Tim,
>>>
>>> I was curious whether Ubuntu will be taking this fix in its Hardy
>>> release.
>>>
>>> This new fix for TSC issue was yesterday merged in the X86 git tree. It
>>> works well to keep TSC's synced and at the same time keeps Software
>>> Suspend happy too.
>>>
>>> Thanks,
>>> Alok
>>>
>>> On Sat, 2008-04-12 at 13:50 -0700, Dan Hecht wrote:
>>>> On 04/11/2008 03:51 PM, William Grant wrote:
>>>>> Tim Gardner wrote:
>>>>>> [snip]
>>>>>> =20
>>>>>> OK - reverted.
>>>>> Hm, can we please have another fix, then? The current workaround (adding
>>>>> `clocksource=3Dhpet' to the kernel line) is simple, but it looks rather
>>>>> bad to by default have pause of more than 30 seconds on boot, all the
>>>>> while making an annoying beeping noise...
>>>>>
>>>>> --=20
>>>>> William Grant
>>>>>
>>>>>
>>>> There is a new patch available that resolves the software suspend issue
>>>> introduced with the earlier patch. Please see
>>>> http://bugzilla.kernel.org/show_bug.cgi?id=10410.
>>>>
>>>> I agree that it would be great to get this patch into Hardy as soon as
>>>> possible since the TSC clocksource bug can result in hangs and
>>>> timekeeping/timers not working properly.
>>>>
>>>> Dan
>>>>
>> The cherry-pick from upstream didn't work so well. Do you suppose you
>> could backport it to Hardy and submit a patch? Thanks.
>>
> Hi Tim,
>
> Attached is the backport, just one hunk was failing and nothing changed
> functionality wise so haven't done a lot of testing. This is just boot
> tested.
>
> Also CC'ed Thomas Gleixner, the original author of this fix.
>
> Thanks,
> Alok
>
> --
> This is a backport of the patch submitted by Thomas Gleixner to the
> x86 git tree, commit d8bb6f4c1670c8324e4135c61ef07486f7f17379
>
> Comments from the original post :
>
> We already catch most of the TSC problems by sanity checks, but there
> is a subtle bug which has been in the code for ever. This can cause
> time jumps in the range of hours.
>
> This was reported in:
> http://lkml.org/lkml/2007/8/23/96
> and
> http://lkml.org/lkml/2008/3/31/23
>
> I was able to reproduce the problem with a gettimeofday loop test on a
> dual core and a quad core machine which both have sychronized
> TSCs. The TSCs seems not to be perfectly in sync though, but the
> kernel is not able to detect the slight delta in the sync check. Still
> there exists an extremly small window where this delta can be observed
> with a real big time jump. So far I was only able to reproduce this
> with the vsyscall gettimeofday implementation, but in theory this
> might be observable with the syscall based version as well.
>
> CPU 0 updates the clock source variables under xtime/vyscall lock and
> CPU1, where the TSC is slighty behind CPU0, is reading the time right
> after the seqlock was unlocked.
>
> The clocksource reference data was updated with the TSC from CPU0 and
> the value which is read from TSC on CPU1 is less than the reference
> data. This results in a huge delta value due to the unsigned
> subtraction of the TSC value and the reference value. This algorithm
> can not be changed due to the support of wrapping clock sources like
> pm timer.
>
> The huge delta is converted to nanoseconds and added to xtime, which
> is then observable by the caller. The next gettimeofday call on CPU1
> will show the correct time again as now the TSC has advanced above the
> reference value.
>
> To prevent this TSC specific wreckage we need to compare the TSC value
> against the reference value and return the latter when it is larger
> than the actual TSC value.
>
> I pondered to mark the TSC unstable when the readout is smaller than
> the reference value, but this would render an otherwise good and fast
> clocksource unusable without a real good reason.
>
> Signed-off-by: Alok Kataria <akataria at vmware.com>
> CC: Thomas Gleixner <tglx at linutronix.de>
>
> ---
> arch/x86/kernel/tsc_32.c | 15 ++++++++++++++-
> arch/x86/kernel/tsc_64.c | 23 ++++++++++++++++++++---
> kernel/time/timekeeping.c | 2 ++
> 3 files changed, 36 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/kernel/tsc_32.c b/arch/x86/kernel/tsc_32.c
> index 9ebc0da..1b50e60 100644
> --- a/arch/x86/kernel/tsc_32.c
> +++ b/arch/x86/kernel/tsc_32.c
> @@ -268,14 +268,27 @@ core_initcall(cpufreq_tsc);
> /* clock source code */
>
> static unsigned long current_tsc_khz = 0;
> +static struct clocksource clocksource_tsc;
>
> +/*
> + * We compare the TSC to the cycle_last value in the clocksource
> + * structure to avoid a nasty time-warp issue. This can be observed in
> + * a very small window right after one CPU updated cycle_last under
> + * xtime lock and the other CPU reads a TSC value which is smaller
> + * than the cycle_last reference value due to a TSC which is slighty
> + * behind. This delta is nowhere else observable, but in that case it
> + * results in a forward time jump in the range of hours due to the
> + * unsigned delta calculation of the time keeping core code, which is
> + * necessary to support wrapping clocksources like pm timer.
> + */
> static cycle_t read_tsc(void)
> {
> cycle_t ret;
>
> rdtscll(ret);
>
> - return ret;
> + return ret >= clocksource_tsc.cycle_last ?
> + ret : clocksource_tsc.cycle_last;
> }
>
> static struct clocksource clocksource_tsc = {
> diff --git a/arch/x86/kernel/tsc_64.c b/arch/x86/kernel/tsc_64.c
> index 9c70af4..6e8bd1a 100644
> --- a/arch/x86/kernel/tsc_64.c
> +++ b/arch/x86/kernel/tsc_64.c
> @@ -10,6 +10,7 @@
>
> #include <asm/hpet.h>
> #include <asm/timex.h>
> +#include <asm/vgtod.h>
>
> static int notsc __initdata = 0;
>
> @@ -246,18 +247,34 @@ int __init notsc_setup(char *s)
>
> __setup("notsc", notsc_setup);
>
> +static struct clocksource clocksource_tsc;
>
> -/* clock source code: */
> +/*
> + * We compare the TSC to the cycle_last value in the clocksource
> + * structure to avoid a nasty time-warp. This can be observed in a
> + * very small window right after one CPU updated cycle_last under
> + * xtime/vsyscall_gtod lock and the other CPU reads a TSC value which
> + * is smaller than the cycle_last reference value due to a TSC which
> + * is slighty behind. This delta is nowhere else observable, but in
> + * that case it results in a forward time jump in the range of hours
> + * due to the unsigned delta calculation of the time keeping core
> + * code, which is necessary to support wrapping clocksources like pm
> + * timer.
> + */
> static cycle_t read_tsc(void)
> {
> cycle_t ret = (cycle_t)get_cycles_sync();
> - return ret;
> +
> + return ret >= clocksource_tsc.cycle_last ?
> + ret : clocksource_tsc.cycle_last;
> }
>
> static cycle_t __vsyscall_fn vread_tsc(void)
> {
> cycle_t ret = (cycle_t)get_cycles_sync();
> - return ret;
> +
> + return ret >= __vsyscall_gtod_data.clock.cycle_last ?
> + ret : __vsyscall_gtod_data.clock.cycle_last;
> }
>
> static struct clocksource clocksource_tsc = {
> diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
> index e5e466b..9f97e53 100644
> --- a/kernel/time/timekeeping.c
> +++ b/kernel/time/timekeeping.c
> @@ -189,6 +189,7 @@ static void change_clocksource(void)
> if (clock == new)
> return;
>
> + new->cycle_last = 0;
> now = clocksource_read(new);
> nsec = __get_nsec_offset();
> timespec_add_ns(&xtime, nsec);
> @@ -301,6 +302,7 @@ static int timekeeping_resume(struct sys_device *dev)
> /* Make sure that we have the correct xtime reference */
> timespec_add_ns(&xtime, timekeeping_suspend_nsecs);
> /* re-base the last cycle value */
> + clock->cycle_last = 0;
> clock->cycle_last = clocksource_read(clock);
> clock->error = 0;
> timekeeping_suspended = 0;
>
applied, milestoned for 8.04.1 (July 3)
--
Tim Gardner tim.gardner at ubuntu.com
More information about the kernel-team
mailing list