[apparmor] [PATCH] [parser] Fix jobs not scaling up to meet available resources when cpus are brought online during compilation
john.johansen at canonical.com
Tue Apr 5 22:21:44 UTC 2016
On 04/05/2016 03:02 PM, Christian Boltz wrote:
> Am Dienstag, 5. April 2016, 14:16:01 CEST schrieb John Johansen:
>> On 04/05/2016 01:51 PM, Christian Boltz wrote:
>>> Am Dienstag, 5. April 2016, 13:22:19 CEST schrieb Seth Arnold:
>>>> On Tue, Apr 05, 2016 at 12:37:07PM -0700, John Johansen wrote:
>>>>> Enable dynamically scaling max jobs if new resources are brought online
>>>>>
>>>>> BugLink: http://bugs.launchpad.net/bugs/1566490
>>>>>
>>>>> This patch enables the parser to scale the max jobs if new cpus
>>>>> are being brought online by the scheduler.
>>>>>
>>>>> It only enables the scaling check if there is a difference between
>>>>> the maximum number of cpus (CONF) and the number of online (ONLN)
>>>>> cpus. Instead of checking for more resources regardless of whether
>>>>> the online cpu count is increasing, it limits its checking to a
>>>>> maximum of MAX CPUS + 1 - ONLN times, with each check coming after
>>>>> a fork spawns a new work unit, giving the scheduler a chance to bring
>>>>> new cpus online before the next check. The +1 ensures the checks
>>>>> will be done at least once after the scheduling task sleeps waiting
>>>>> for its children, giving the scheduler an extra chance to bring
>>>>> cpus online.
>>> Will it also reduce the number of processes if some CPUs are sent to
>>> sleep?
>> It does not. This is specifically addressing the case of the hotplug
>> governor (and a few other ones in use on mobile devices), which
>> offlines cpus when load is low, and then brings them back online as
>> load ramps up.
>> I was also trying to minimize the cost of the check by limiting the
>> number of times we call out to check how many cpus are available. It's
>> extra overhead that really isn't needed on the devices where we are
>> seeing this problem, so the simple solution of just checking every
>> time isn't ideal.
>> The reverse case of cpus going offline while load is high seems somewhat
>> degenerate, and is a case where I am willing to live with a few
>> extra processes hanging around. Hopefully it's not a common case
>> and would only result in one or two extra processes.
> Agreed. If you switch off CPUs, you want things to become slower ;-)
>>>>> Signed-off-by: John Johansen <john.johansen at canonical.com>
>>>> This feels more complicated than it could be, but I must admit I can't
>>>> suggest any modifications to the algorithm to simplify it.
>>> It sounds too simple, and it might start too many jobs in some cases,
>>> but - why not use the total number of CPUs from the beginning
>>> instead of the currently online CPUs?
>>> The only possible disadvantage is running "too many" jobs - would
>>> that do any harm?
>> It does; running too many jobs actually slows things down. I am, however,
>> willing to revisit this when we manage to convert to true threading
>> instead of the fork model we are using today. Then we could
>> preallocate all possible threads and just not use them if it would
>> cause contention.
>> Note, also this patch does not deal with cgroups and actual number
>> of cpus available to be used, which could be less than what is
>> online. I need to spend some time evaluating the best solution
>> for doing this.
> I wonder if we really need to implement this ourselves - finding out how
> many threads should be used / how many CPUs are actually available
> sounds like something that has been done before ;-)
>> We could use pthread_getaffinity_np(), which is probably the best
>> solution since we are already linking against pthreads because of the
>> library; but we may want to go directly to sched_getaffinity(), or maybe
>> there is something else I haven't hit yet.
> Sounds like the "has been done before" code I was looking for.
> I'd recommend switching to one of those functions instead of improving
> our re-invented wheel even more ;-) - but that shouldn't stop you from
> committing this patch.
Well, eventually the plan is to move to pthreads, or maybe even Cilk,
but we have some work to do eliminating globals before we can safely
switch to a threading model.