[REGRESSION 2.6.30][PATCH v3] sched: update load count only once per cpu in 10 tick update window
chase.douglas at canonical.com
Thu Apr 22 13:18:32 UTC 2010
On Thu, Apr 22, 2010 at 7:08 AM, Peter Zijlstra <peterz at infradead.org> wrote:
> On Tue, 2010-04-13 at 16:19 -0700, Chase Douglas wrote:
>> There's a period of 10 ticks where calc_load_tasks is updated by all the
>> cpus for the load avg. Usually all the cpus do this during the first
>> tick. If any cpus go idle, calc_load_tasks is decremented accordingly.
>> However, if they wake up, calc_load_tasks is not incremented. Thus, if
>> cpus go idle during the 10 tick period, calc_load_tasks may be
>> decremented to a non-representative value. This issue can lead to
>> systems having a load avg of exactly 0, even though the real load avg
>> could theoretically be up to NR_CPUS.
>> Once a cpu has updated the count, this change defers any further
>> calc_load_tasks accounting from that cpu until after the 10 tick
>> update window.
>> A few points:
>> * A global atomic deferral counter, and not per-cpu vars, is needed
>> because a cpu may go NOHZ idle and not be able to update the global
>> calc_load_tasks variable for subsequent load calculations.
>> * It is not enough to add calls to account for the load when a cpu is
>> awakened:
>> - Load avg calculation must be independent of cpu load.
>> - If a cpu is awakened by one task, but then has more scheduled before
>> the end of the update window, only the first task will be accounted.
> Ok, so delaying the whole ILB angle for now, the below is a similar
> approach to yours but with a more explicit code flow.
> Does that work for you?
This looks good. I'll run my test case to make sure it fixes the
scenario we hit, and then I'll ack it when I've confirmed it works.
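
For anyone following along, the deferral idea from the quoted changelog can be modelled in userspace roughly as below. This is only a sketch under assumptions, not either patch: calc_load_tasks is the name used in the thread, while calc_load_tasks_deferred, struct cpu_model, account_active(), end_update_window() and simulate_window() are hypothetical names made up for illustration, and C11 atomics stand in for the kernel's atomic_long_t.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Userspace model only; every name except calc_load_tasks is hypothetical. */
static atomic_long calc_load_tasks;          /* count sampled for the load avg */
static atomic_long calc_load_tasks_deferred; /* deltas parked during the window */
static bool in_update_window;                /* true during the 10 tick window */

struct cpu_model {
    long nr_active;         /* tasks on this cpu that count toward load */
    long calc_load_active;  /* value this cpu last accounted */
    bool updated_in_window; /* has this cpu already updated in this window? */
};

/*
 * Account this cpu's change in active tasks. The first update inside the
 * window goes straight to calc_load_tasks; later ones in the same window
 * are parked in the deferral counter.
 */
static void account_active(struct cpu_model *cpu)
{
    long delta = cpu->nr_active - cpu->calc_load_active;

    cpu->calc_load_active = cpu->nr_active;
    if (in_update_window && cpu->updated_in_window)
        atomic_fetch_add(&calc_load_tasks_deferred, delta);
    else
        atomic_fetch_add(&calc_load_tasks, delta);
    if (in_update_window)
        cpu->updated_in_window = true;
}

/* Close the window and fold the parked deltas into the global count. */
static void end_update_window(struct cpu_model *cpus, int n)
{
    in_update_window = false;
    for (int i = 0; i < n; i++)
        cpus[i].updated_in_window = false;
    atomic_fetch_add(&calc_load_tasks,
                     atomic_exchange(&calc_load_tasks_deferred, 0));
}

/*
 * Two cpus with one task each. cpu 1 goes idle inside the window and
 * optionally wakes again. Returns the count sampled inside the window and
 * stores the count after the window closes in *after_window.
 */
static long simulate_window(bool wake_again, long *after_window)
{
    struct cpu_model cpus[2] = { { .nr_active = 1 }, { .nr_active = 1 } };
    long sampled;

    atomic_store(&calc_load_tasks, 0);
    atomic_store(&calc_load_tasks_deferred, 0);

    in_update_window = true;
    account_active(&cpus[0]);   /* first-tick update on each cpu */
    account_active(&cpus[1]);

    cpus[1].nr_active = 0;      /* cpu 1 goes idle mid-window */
    account_active(&cpus[1]);
    if (wake_again) {
        cpus[1].nr_active = 1;  /* cpu 1 picks up a task again */
        account_active(&cpus[1]);
    }

    sampled = atomic_load(&calc_load_tasks);
    end_update_window(cpus, 2);
    *after_window = atomic_load(&calc_load_tasks);
    return sampled;
}
```

In this model the value sampled inside the window stays at what the cpus reported on their first update, so a cpu going idle mid-window cannot drag the sampled count down, and the idle/wake deltas only reach calc_load_tasks once the window has closed. A single global deferral counter is used here for the reason given in the changelog: a cpu that goes NOHZ idle may not be around to flush a per-cpu value.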