[apparmor] [patch] v3 - parser: add basic support for parallel compiles and loads

John Johansen john.johansen at canonical.com
Thu Dec 10 23:07:06 UTC 2015


On 12/10/2015 02:43 PM, Steve Beattie wrote:
> On Thu, Dec 10, 2015 at 12:52:53PM -0800, John Johansen wrote:
>> commit 7bb90276fd110c3da605e8633dc7a7d220f2ed26
>> Author: John Johansen <john.johansen at canonical.com>
>> Date:   Sat Oct 17 00:28:46 2015 -0700
>>
>>     parser: add basic support for parallel compiles and loads
>>     
>>     This adds basic support for parallel compiles. It uses a fork()/wait
>>     model due to the parser's current dependence on global variables and
>>     structures. It has been set up in a manner similar to how Cilk handles
>>     multithreading, to make it easy to port to a managed thread model once
>>     the parser removes its dependence on global compute structures in the
>>     backend.
>>     
>>     This patch adds two new command line flags:
>>       -j or --jobs  which follows the make syntax for specifying parallel jobs;
>>          it currently defaults to -jauto
>>          -j8 or --jobs=8			allows for 8 parallel jobs
>>          -j  or -jauto  or  --jobs=auto	sets the jobs to the # of cpus
>>          -j*4 or --jobs=*4			sets the jobs to # of cpus * 4
>>          -j*1 is equivalent to -jauto
>>     
>>       --max_jobs=n		allows setting a hard cap on the number of jobs
>>         that can be specified by --jobs.
>>         It defaults to the number of processors in the system * 8. It supports
>>         the "auto" and "max" keywords, and using *n for a multiple of the
>>         available cpus.
>>     
>>     Additionally, the -d flag has been modified to take an optional
>>     parameter, and
>>       --debug=jobs
>>     will output debug information for the job control logic.
>>     
>>     In light testing on one machine, the job control logic provides a nice
>>     performance boost.  On an x86 test machine with 60 profiles in the
>>     /etc/apparmor.d/ directory, for the command
>>       time apparmor_parser -QT /etc/apparmor.d/
>>     
>>       old (equiv of -j1):
>>          real  0m10.968s
>>          user  0m10.888s
>>          sys   0m0.088s
>>     
>>       ubuntu parallel load using xargs:
>>          real  0m8.003s
>>          user  0m21.680s
>>          sys   0m0.216s
>>     
>>       -j:
>>          real  0m6.547s
>>          user  0m17.900s
>>          sys   0m0.132s
>>     
>>     Signed-off-by: John Johansen <john.johansen at canonical.com>
> 
> I've just started playing around with this, so not a full review. I did
> notice one thing that was discussed a bit in the previous thread:
> 
>>>> +#define work_spawn(WORK, RESULT)					\
>>>> +do {									\
>>>> +	if (jobs_max == 1) {						\
>>>> +		/* no parallel work so avoid fork() overhead */		\
>>>> +		RESULT(WORK);						\
>>>> +		break;							\
>>>> +	}								\
>>>
>>> I'm not sure I like this optimization; even though it looks correct, I
>>> fear it would complicate future maintenance efforts. (If we want to use
>>> work_spawn to work on a single item, then call work_sync_one(), the whole
>>> thing will exit because of an error return from wait().) If it really
>>> saves a ton of time maybe it's worth the extra maintenance effort but I
>>> think even the tiniest of environments will still benefit from two jobs.
>>>
>> Do you really think this one little condition, tucked away, is that much
>> more maintenance? The way it is set up, it is completely transparent to the
>> rest of the code.  That is, we can delete it and the rest of the code
>> doesn't need to change.
>>
>> Notice this isn't about working on a single item, which is what
>> work_sync_one() would be for. This is about having only one job running
>> (jobs_max == 1): we may still be processing 50 profiles, they are just all
>> done sequentially.
>>
>> So at the moment, if you have jobs_max > 1 and only pass it a single
>> profile, the parser does fork and process that job in a separate child.
> 
> Note that there is one behavioral difference by doing so: if a specific
> policy compilation fails in such a way that exit() is called, in the
> -j1 case it stops there, but in the -jN where N > 1 case it continues
> on attempting to load additional policies. That's a behavioral change
> that I don't think users would expect from switching the number of
> threads used.
> 
Is this different from the current shell-based parallel compile using
xargs -P, or from any other parallel setup?

I can't stop compiles/loads that are already happening in parallel. The best
we can do is, on error, stop compiling/loading any new jobs. Is this what
you want?
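
Roughly what I have in mind is sketched below. This is not the actual
patch code: jobs_max is the variable from the patch, but do_compile(),
spawn_compile(), and the other names are just illustrative placeholders
for how a "stop starting new work after a failure" fork()/wait loop
could look.

#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int jobs_max = 8;	/* set from -j/--jobs in the real parser */
static int njobs = 0;		/* children currently running */
static int saw_error = 0;	/* set once any child reports failure */

/* stand-in for compiling/loading a single profile */
static int do_compile(const char *profile)
{
	return profile ? 0 : 1;
}

/* reap one finished child and record whether it failed */
static void reap_one(void)
{
	int status;

	if (wait(&status) > 0) {
		njobs--;
		if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
			saw_error = 1;
	}
}

void spawn_compile(const char *profile)
{
	pid_t child;

	/* on a previous failure, don't start any new jobs ... */
	if (saw_error)
		return;

	/* ... but keep throttling/reaping the ones already running */
	while (njobs >= jobs_max)
		reap_one();

	child = fork();
	if (child == 0)
		exit(do_compile(profile));	/* child: one compile/load */
	else if (child > 0)
		njobs++;
	else
		saw_error = 1;			/* fork() itself failed */
}

/* wait for everything outstanding before exiting */
void sync_all(void)
{
	while (njobs > 0)
		reap_one();
}

That is, jobs that have already been forked run to completion, but
nothing new gets started once a failure has been seen.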
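
And for reference, the -j/--jobs value handling described in the commit
message amounts to roughly the following (again a sketch only; parse_jobs()
is not the name of the real helper), with the result then clamped against
the --max_jobs cap:

#include <stdlib.h>
#include <string.h>

/*
 * Turn a -j/--jobs argument into a job count, given the number of
 * available cpus:  "auto" or no value -> ncpus, "*N" -> ncpus * N,
 * a plain number "N" -> N jobs.
 */
long parse_jobs(const char *arg, long ncpus)
{
	if (!arg || !*arg || strcmp(arg, "auto") == 0)
		return ncpus;			/* -j, -jauto, --jobs=auto */
	if (arg[0] == '*')
		return ncpus * atol(arg + 1);	/* -j*4 -> cpus * 4 */
	return atol(arg);			/* -j8  -> 8 jobs */
}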



