[PATCH 0/1] Switch to jiffies for native_sched_clock() when TSC warps

Wed Mar 10 22:56:18 UTC 2010

On Wed, Mar 10, 2010 at 5:37 PM, Stefan Bader
<stefan.bader at canonical.com> wrote:
> Chase Douglas wrote:
>> I took a look at the x86 code handling the clock to see what could be done
>> about the TSC warping coming out of resume on some of the newer processors. The
>> code includes a built-in fallback path that uses the jiffies count instead of
>> the TSC register if "notsc" is used on the command line. This patch merely sets
>> this option at runtime if two TSC time stamps differ by more than 6 years.
>>
>> I'm sending this here first because I've not touched clocking code before. I'm
>> not sure whether this is a feasible approach, and I would like feedback. Note
>> that the TSC warping hasn't caused any noticeable issues beyond triggering some
>> oops messages, so even if there's some skew in the switch from TSC to jiffies
>> it should hopefully not cause too much of an issue.
>>
>> The only truly negative outcome I foresee is that the clock won't be stable on
>> a single CPU. Programs needing accurate clock timing can pin themselves to a
>> single CPU in order to get TSC time stamps that are monotonic and accurate (the
>> TSC register is per CPU, and there may be skew between CPUs). However, if the
>> TSC has warped we are beyond that point anyway. If you have a warping
>> processor you should run with notsc if you care about accuracy, even though
>> precision would be reduced.
>>
>>
>
> My feeling is that changing sched_clock to jiffies after resume does not sound
> like a good idea. What was wrong with Colin's approach of just fixing the math?

Colin's patch keeps the soft lockup warnings from firing. That fixes
merely one symptom, not the real problem. Other code paths are also
producing oops messages [1], and TSC warping may cause further bugs
that we just haven't seen yet.

Also, the TSC warping issue seems more prevalent than first thought.
Originally, Colin believed the issue was confined to new Arrandale
processors, but we're seeing the issue hit Core 2 processors as well
[1].
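
To make the proposal easier to discuss, here is a rough userspace
sketch of the check described in the patch summary: compare successive
TSC-derived time stamps and, if they differ by more than 6 years,
switch permanently to the jiffies-based path. This is not the patch
itself. The names (sketch_clock, sketch_sched_clock, cycles_to_ns,
SKETCH_HZ) and the 250 Hz tick rate are made up for illustration, and
the TSC and jiffies values are passed in as arguments rather than read
from hardware.

```c
#include <stdbool.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL
#define SKETCH_HZ    250ULL   /* assumed tick rate, illustration only */
#define SIX_YEARS_NS (6ULL * 365 * 24 * 3600 * NSEC_PER_SEC)

/* Hypothetical state; the real code keeps this in kernel globals. */
struct sketch_clock {
    uint64_t tsc_khz;      /* TSC frequency in kHz */
    uint64_t last_ns;      /* last TSC-derived time stamp */
    bool     have_last;    /* false until the first reading */
    bool     tsc_disabled; /* set once a warp has been detected */
};

/* Convert cycles to nanoseconds, split in two to limit overflow. */
static uint64_t cycles_to_ns(uint64_t cycles, uint64_t tsc_khz)
{
    return (cycles / tsc_khz) * 1000000ULL +
           (cycles % tsc_khz) * 1000000ULL / tsc_khz;
}

/*
 * Return a scheduler time stamp in ns. Uses the TSC until two
 * consecutive readings differ by more than 6 years, then falls
 * back to jiffies for good.
 */
static uint64_t sketch_sched_clock(struct sketch_clock *clk,
                                   uint64_t tsc, uint64_t jiffies)
{
    if (!clk->tsc_disabled) {
        uint64_t now = cycles_to_ns(tsc, clk->tsc_khz);
        uint64_t delta = now > clk->last_ns ? now - clk->last_ns
                                            : clk->last_ns - now;

        if (!clk->have_last || delta <= SIX_YEARS_NS) {
            clk->last_ns = now;
            clk->have_last = true;
            return now;
        }
        /* Implausible jump: treat the TSC as warped from now on. */
        clk->tsc_disabled = true;
    }
    /* Fallback path: tick resolution only, but no warp. */
    return jiffies * (NSEC_PER_SEC / SKETCH_HZ);
}
```

Note that the switch is one-way in this sketch: once a warp is seen,
later sane TSC readings are ignored, which is the loss of precision
mentioned above.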

[1] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/535077