[PATCH v3?] sched: update load count only once per cpu in 10 tick update window
Stefan Bader
stefan.bader at canonical.com
Mon Apr 12 15:18:32 UTC 2010
Chase Douglas wrote:
> On Thu, Apr 8, 2010 at 12:02 PM, Chase Douglas
> <chase.douglas at canonical.com> wrote:
>> There's a period of 10 ticks where calc_load_tasks is updated by all the
>> cpus for the load avg. Usually all the cpus do this during the first
>> tick. If any cpus go idle, calc_load_tasks is decremented accordingly.
>> However, if they wake up calc_load_tasks is not incremented. Thus, if
>> cpus go idle during the 10 tick period, calc_load_tasks may be
>> decremented to a non-representative value. This issue can lead to
>> systems having a load avg of exactly 0, even though the real load avg
>> could theoretically be up to NR_CPUS.
>>
>> This change defers calc_load_tasks accounting after each cpu updates the
>> count until after the 10 tick update window.
>>
>> A few points:
>>
>> * A global atomic deferral counter, and not per-cpu vars, is needed
>> because a cpu may go NOHZ idle and not be able to update the global
>> calc_load_tasks variable for subsequent load calculations.
>> * It is not enough to add calls to account for the load when a cpu is
>> awakened:
>> - Load avg calculation must be independent of cpu load.
>> - If a cpu is awakend by one tasks, but then has more scheduled before
>> the end of the update window, only the first task will be accounted.
>>
>> BugLink: http://bugs.launchpad.net/bugs/513848
>>
>> Signed-off-by: Chase Douglas <chase.douglas at canonical.com>
>> ---
>> kernel/sched.c | 24 ++++++++++++++++++++++--
>> 1 files changed, 22 insertions(+), 2 deletions(-)
>>
>> diff --git a/kernel/sched.c b/kernel/sched.c
>> index 81ede13..c372249 100644
>> --- a/kernel/sched.c
>> +++ b/kernel/sched.c
>> @@ -2967,6 +2967,7 @@ unsigned long nr_iowait(void)
>>
>> /* Variables and functions for calc_load */
>> static atomic_long_t calc_load_tasks;
>> +static atomic_long_t calc_load_tasks_deferred;
>> static unsigned long calc_load_update;
>> unsigned long avenrun[3];
>> EXPORT_SYMBOL(avenrun);
>> @@ -3021,7 +3022,7 @@ void calc_global_load(void)
>> */
>> static void calc_load_account_active(struct rq *this_rq)
>> {
>> - long nr_active, delta;
>> + long nr_active, delta, deferred;
>>
>> nr_active = this_rq->nr_running;
>> nr_active += (long) this_rq->nr_uninterruptible;
>> @@ -3029,6 +3030,25 @@ static void calc_load_account_active(struct rq *this_rq)
>> if (nr_active != this_rq->calc_load_active) {
>> delta = nr_active - this_rq->calc_load_active;
>> this_rq->calc_load_active = nr_active;
>> +
>> + /*
>> + * Update calc_load_tasks only once per cpu in 10 tick update
>> + * window.
>> + */
>> + if (unlikely(time_before(jiffies, this_rq->calc_load_update) &&
>> + time_after_eq(jiffies, calc_load_update))) {
>> + if (delta)
>> + atomic_long_add(delta,
>> + &calc_load_tasks_deferred);
>> + return;
>> + }
>> +
>> + if (atomic_long_read(&calc_load_tasks_deferred)) {
>> + deferred = atomic_long_xchg(&calc_load_tasks_deferred,
>> + 0);
>> + delta += deferred;
>> + }
>> +
>> atomic_long_add(delta, &calc_load_tasks);
>> }
>> }
>> @@ -3072,8 +3092,8 @@ static void update_cpu_load(struct rq *this_rq)
>> }
>>
>> if (time_after_eq(jiffies, this_rq->calc_load_update)) {
>> - this_rq->calc_load_update += LOAD_FREQ;
>> calc_load_account_active(this_rq);
>> + this_rq->calc_load_update += LOAD_FREQ;
>> }
>> }
>>
>> --
>> 1.7.0
>
> Now that this has been accepted for Lucid, please consider this for
> Karmic as well. SRU justification can be found in bug 513848. Thanks!
>
> -- Chase
>
If you could just resend the patch with a karmic header. Then I can assign it to
my todo queue in patchworks and will less likely forget it. Thanks
Stefan
More information about the kernel-team
mailing list