[apparmor] [PATCH] [parser] Fix jobs not scaling up to meet available resources when cpus are brought online during compilation

Christian Boltz apparmor at cboltz.de
Tue Apr 5 22:02:13 UTC 2016


Hello,

On Tuesday, 5 April 2016, 14:16:01 CEST, John Johansen wrote:
> On 04/05/2016 01:51 PM, Christian Boltz wrote:
> > On Tuesday, 5 April 2016, 13:22:19 CEST, Seth Arnold wrote:
> >> On Tue, Apr 05, 2016 at 12:37:07PM -0700, John Johansen wrote:
> >>> Enable dynamically scaling max jobs if new resources are brought
> >>> online
> >>> 
> >>> BugLink: http://bugs.launchpad.net/bugs/1566490
> >>> 
> >>> This patch enables the parser to scale the max jobs if new
> >>> resources are being brought online by the scheduler.
> >>> 
> >>> It only enables the scaling check if there is a difference between
> >>> the maximum number of cpus (CONF) and the number of online (ONLN)
> >>> cpus.
> >>> 
> >>> Instead of checking for more resources regardless of whether the
> >>> online cpu count is increasing, it limits its checking to a maximum
> >>> of MAX CPUS + 1 - ONLN cpus times, with each check coming after
> >>> fork spawns a new work unit, giving the scheduler a chance to
> >>> bring new cpus online before the next check.  The +1 ensures the
> >>> checks will be done at least once after the scheduling task sleeps
> >>> waiting for its children, giving the scheduler an extra chance to
> >>> bring cpus online.
> > 
> > Will it also reduce the number of processes if some CPUs are sent to
> > vacation?
> 
> It does not. This is specifically addressing the case of the hotplug
> governor (and a few other ones in use on mobile devices), which
> offlines cpus when load is low, and then brings them back online as
> load ramps up.
> 
> I was also trying to minimize the cost of the check, by limiting the
> number of times we call out to check how many cpus are available. It's
> extra overhead that really isn't needed on the devices where we are
> seeing this problem. So the simple solution of just checking every
> time isn't ideal.
> 
> The reverse case of cpus going offline while load is high seems
> somewhat degenerate, and is a case where I am willing to live with a
> few extra processes hanging around. Hopefully it's not a common case
> and would only result in one or two extra processes.

Agreed. If you switch off CPUs, you want things to become slower ;-)
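
For reference, here is a minimal sketch of the bounded check as I read
it from the description above. It is not the code from the patch: the
struct and function names are hypothetical, and it assumes glibc's
sysconf() with _SC_NPROCESSORS_CONF and _SC_NPROCESSORS_ONLN. The job
cap is only ever raised, matching the "does not scale down" answer.

```cpp
#include <cassert>
#include <unistd.h>

/* Hypothetical state for illustration; names are not from the patch. */
struct job_scaler {
    long jobs_max;      /* current cap on concurrent jobs */
    long scale_checks;  /* rescans still allowed */
};

/* Rescan budget: CONF + 1 - ONLN, or 0 if every cpu is already online. */
static long scale_budget(long conf, long onln)
{
    return conf > onln ? conf + 1 - onln : 0;
}

static void job_scaler_init(struct job_scaler *s)
{
    assert(s != nullptr);
    long conf = sysconf(_SC_NPROCESSORS_CONF);
    long onln = sysconf(_SC_NPROCESSORS_ONLN);
    if (onln < 1)
        onln = 1;       /* sysconf() failed; assume one cpu */
    s->jobs_max = onln;
    s->scale_checks = scale_budget(conf, onln);
}

/* Spend one rescan; raise the cap if more cpus are online, never lower it. */
static void job_scaler_rescan_with(struct job_scaler *s, long onln)
{
    assert(s != nullptr);
    if (s->scale_checks <= 0)
        return;
    s->scale_checks--;
    if (onln > s->jobs_max)
        s->jobs_max = onln;
}

/* Intended to be called after each fork of a work unit. */
static void job_scaler_rescan(struct job_scaler *s)
{
    assert(s != nullptr);
    /* Skip the sysconf() call entirely once the budget is used up. */
    if (s->scale_checks > 0)
        job_scaler_rescan_with(s, sysconf(_SC_NPROCESSORS_ONLN));
}
```

Once the budget is exhausted, job_scaler_rescan() costs one comparison
per fork, which is the point of limiting the number of checks.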

> >>> Signed-off-by: John Johansen <john.johansen at canonical.com>
> >> 
> >> This feels more complicated than it could be but I must admit I
> >> can't suggest any modifications to the algorithm to simplify it.
> > 
> > It sounds too simple, and it might start too many jobs in some
> > cases, but - why not use the total number of CPUs from the
> > beginning instead of the currently online CPUs?
> > 
> > The only possible disadvantage is running "too many" jobs - would
> > that do any harm?
> 
> It does; too many jobs actually slow things down. I am however
> willing to revisit this when we manage to convert to true threading
> instead of the fork model we are using today.  Then we could
> preallocate all possible threads and just not use them if it would
> cause contention.
> 
> Note also that this patch does not deal with cgroups and the actual
> number of cpus available to be used, which could be less than what is
> online. I need to spend some time evaluating the best solution
> for doing this.

I wonder if we really need to implement this ourselves - finding out how
many threads should be used / how many CPUs are actually available
sounds like something that has been done before ;-)

> We could use pthread_getaffinity_np(), which is probably the best
> solution, and we are already linking against pthreads because of the
> library, but we may want to go directly to sched_getaffinity(), or
> maybe there is something else I haven't hit yet.

Sounds like the "has been done before" code I was looking for. 
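
To illustrate what I mean, a rough sketch of counting the usable CPUs
with sched_getaffinity(). The helper name usable_cpus() is made up, the
fixed-size cpu_set_t assumes glibc and a machine that fits in it, and
the fallback to the online count is my own assumption, not something
from the patch. As far as I know the affinity mask reflects cpuset
restrictions but not CPU bandwidth quotas, so it would only be part of
the cgroups answer.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE     /* needed for sched_getaffinity() and CPU_COUNT() */
#endif
#include <sched.h>
#include <unistd.h>

/*
 * Hypothetical helper: number of cpus the calling process may run on.
 * Falls back to the online cpu count if the affinity query fails (for
 * example when the mask does not fit into a fixed-size cpu_set_t).
 */
static long usable_cpus(void)
{
    cpu_set_t set;

    CPU_ZERO(&set);
    /* pid 0 means "the calling thread". */
    if (sched_getaffinity(0, sizeof(set), &set) == 0) {
        int n = CPU_COUNT(&set);
        if (n > 0)
            return n;
    }

    long onln = sysconf(_SC_NPROCESSORS_ONLN);
    return onln > 0 ? onln : 1;
}
```

Running the parser under "taskset -c 0" should then make usable_cpus()
report a single cpu even on a machine with many cores.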

I'd recommend switching to one of those functions instead of improving
our re-invented wheel even more ;-)  - but that shouldn't stop you from
committing this patch.


Regards,

Christian Boltz
-- 
Linux: and where, please, is my blue screen?