[apparmor] [PATCH] [parser] Fix jobs not scaling up to meet available resources when cpus are brought online during compilation
john.johansen at canonical.com
Tue Apr 5 22:21:44 UTC 2016
On 04/05/2016 03:02 PM, Christian Boltz wrote:
> Am Dienstag, 5. April 2016, 14:16:01 CEST schrieb John Johansen:
>> On 04/05/2016 01:51 PM, Christian Boltz wrote:
>>> Am Dienstag, 5. April 2016, 13:22:19 CEST schrieb Seth Arnold:
>>>> On Tue, Apr 05, 2016 at 12:37:07PM -0700, John Johansen wrote:
>>>>> Enable dynamically scaling max jobs if new resources are brought online
>>>>>
>>>>> BugLink: http://bugs.launchpad.net/bugs/1566490
>>>>>
>>>>> This patch enables the parser to scale the max jobs if new cpus
>>>>> are being brought online by the scheduler.
>>>>>
>>>>> It only enables the scaling check if there is a difference between
>>>>> the maximum number of cpus (CONF) and the number of online (ONLN)
>>>>> cpus. Instead of checking for more resources regardless of whether
>>>>> the online cpu count is increasing, it limits its checking to a
>>>>> maximum of MAX CPUS + 1 - ONLN times, with each check coming after
>>>>> a fork spawns a new work unit, giving the scheduler a chance to bring
>>>>> new cpus online before the next check. The +1 ensures the checks
>>>>> will be done at least once after the scheduling task sleeps waiting
>>>>> for its children, giving the scheduler an extra chance to bring
>>>>> cpus online.
>>> Will it also reduce the number of processes if some CPUs are sent to
>>> sleep?
>> It does not. This is specifically addressing the case of the hotplug
>> governor (and a few other ones in use on mobile devices), which
>> offlines cpus when load is low, and then brings them back online as
>> load ramps up.
>> I was also trying to minimize the cost of the check by limiting the
>> number of times we call out to check how many cpus are available. It's
>> extra overhead that really isn't needed on the devices where we are
>> seeing this problem, so the simple solution of just checking every
>> time isn't ideal.
>> The reverse case of cpus going offline while load is high seems somewhat
>> degenerate, and is a case where I am willing to live with a few
>> extra processes hanging around. Hopefully it's not a common case
>> and would only result in one or two extra processes.
> Agreed. If you switch off CPUs, you want things to become slower ;-)
>>>>> Signed-off-by: John Johansen <john.johansen at canonical.com>
>>>> This feels more complicated than it could be, but I must admit I can't
>>>> suggest any modifications to the algorithm to simplify it.
>>> It sounds too simple, and it might start too many jobs in some cases,
>>> but - why not use the total number of CPUs from the beginning
>>> instead of the currently online CPUs?
>>> The only possible disadvantage is running "too many" jobs - would
>>> that do any harm?
>> It does; running too many jobs actually slows things down. I am, however,
>> willing to revisit this when we manage to convert to true threading
>> instead of the fork model we are using today. Then we could
>> preallocate all possible threads and just not use them if it would
>> cause contention.
>> Note, also this patch does not deal with cgroups and actual number
>> of cpus available to be used, which could be less than what is
>> online. I need to spend some time evaluating the best solution
>> for doing this.
> I wonder if we really need to implement this ourselves - finding out how
> many threads should be used / how many CPUs are actually available
> sounds like something that has been done before ;-)
>> We could use pthread_getaffinity_np(), which is probably the best
>> solution since we are already linking against pthreads because of the
>> library; but we may want to go directly to sched_getaffinity(), or maybe
>> there is something else I haven't hit yet.
> Sounds like the "has been done before" code I was looking for.
> I'd recommend switching to one of those functions instead of improving
> our re-invented wheel even more ;-) - but that shouldn't stop you from
> committing this patch.
Well, eventually the plan is to move to pthreads, or maybe even Cilk,
but we have some work to do eliminating globals before we can safely
switch to a threading model.