[apparmor] [PATCH] Add compressed dfa matching routines to library, and a base test program

Mon Jan 11 10:51:15 UTC 2016

On 01/10/2016 07:22 AM, Christian Boltz wrote:
> Hello,
> 
> those tools look quite interesting :-)
> 
> Some comments and questions inline - I hope you don't need to answer 
> everything with "learn C!!!!" ;-)
> 
> Am Freitag, 8. Januar 2016 schrieb John Johansen:
>> diff --git a/devtools/Makefile b/devtools/Makefile
>> new file mode 100644
>> index 0000000..b0cd26e
>> --- /dev/null
>> +++ b/devtools/Makefile
> 
>> +test_re: test_re.o ../parser/libapparmor_re/libapparmor_re.a
>> ../parser/parser.h ../parser/common_optarg.h
>> ../parser/common_optarg.o ../parser/lib.o 
>> +	$(CXX) ${CXXFLAGS} -o $@
>> $^  -L ../libraries/libapparmor/src/.libs/ -static -lapparmor 
> 
> test_re.cc has
> 
>     #include "../parser/common_optarg.h"
>     #include "../parser/libapparmor_re/aare_rules.h"
> 
>     #include "../libraries/libapparmor/include/sys/apparmor.h"
>     #include "../libraries/libapparmor/src/hfa.h"
>     #include "../libraries/libapparmor/src/hfa_private.h"
> 
> Should those files (and their *.c counterparts) be added as make 
> dependency?
> 
yes

> Also, should the -L depend on USE_SYSTEM?
> 
>> +print_hfa: print_hfa.o
>> +	$(CC) ${CFLAGS} -o $@ $^  -L ../libraries/libapparmor/src/.libs/
>> -static -lapparmor
> 
> Should this -L depend on USE_SYSTEM?
> (I'm not sure which files are needed, so maybe I'm wrong ;-)
> 
well, not yet. Eventually we will get there but this being the first pass
I actually wanted to restrict it to the in tree builds atm

>> +print_hfa.o: print_hfa.c
>> +       $(CC) ${CFLAGS} -c -o $@ $<
> 
> print_hfa.c contains
>     #include "../libraries/libapparmor/src/hfa.h"
> which brings up two questions:
> - should hfa.h (and hfa.c) be added as make dependency?

yes

> - what about -L and USE_SYSTEM?
> 
not yet

>> diff --git a/devtools/README b/devtools/README
>> new file mode 100644
>> index 0000000..b16f6f0
>> --- /dev/null
>> +++ b/devtools/README
>> @@ -0,0 +1,25 @@
>> +Basic developer utilities not ready for or needed by regular users
> 
> I'd drop the "not ready for" part - if a tool is intended for regular 
> users, it should not end up in devtools IMHO.
> 
true, but I would say not really ready for most devs either. Really this
is mostly infrastructure work so we can start building tests, and do
so rudimentary debugging.

>> diff --git a/devtools/expr.txt b/devtools/expr.txt
>> new file mode 100644
>> index 0000000..4b7d12b
>> --- /dev/null
>> +++ b/devtools/expr.txt
>> @@ -0,0 +1,772 @@
>> +1 /
>> +1 /([^\0000])*\.[Bb][Mm][Pp]
>> +1 /([^\0000])*\.[Bb][Zz]2
>> +1 /([^\0000])*\.[Cc][Bb][7RZrz]
>> +1 /([^\0000])*\.[Dd][Jj][Vv][Uu]
>> +1 /([^\0000])*\.[Dd][Vv][Ii]
> 
> What's the reason for using that syntax (looks like "real" regexes or 
> PCRE) instead of AARE?
> 
because it is a PCRE syntax which is more powerful than what AARE can
express and what is actually used internally on the backend.

> I'd prefer to feed the devtools with AARE, since this is what we use in 
> the profiles. A requirement "variables must be expanded" would be ok.
> 
Nope, and nope.

This is targeting testing the backend dev work. It allows more and will
also start allowing "kernel" vars etc which won't be expanded.

Eventually we may open up some of the PCRE syntax to the front end
profiles, certainly if not the syntax some of the capabilities.

We do have some unit tests for converting between the two, but more tests
are always good

>> diff --git a/devtools/match.txt b/devtools/match.txt
>> new file mode 100644
>> index 0000000..549f43d
>> --- /dev/null
>> +++ b/devtools/match.txt
>> @@ -0,0 +1,14 @@
>> +1 /abdcd
>> +1 /defg
>> +1 /home/
>> +0 abc
>> +1
>> /asdfjhlnlwknrlwknklwenflwknfsam.,smdfakdflkafa;ldfkwekjlmnsmnvaslkdf
>> jwfnkaef +1 /alkoijnlkn lkasdfasfwedvasvsdvasv
>> +1 /lkasjdflkasjdflaskflj;kjwef;kjav;klansvlkjaoifjaskl;vnda;lsdvna
>> +1 /home/alskdf;jasldka;lsvnkasvlajk
>> hnqlkjwnlkjavn;asldkjvkalsjf;lasjkdf;lsdf +1
>> /home/a;dsjfk;lasjkfslafkj;aslfkjasljfk;alsfkj;slafkj;las +0
>> laksjdf;lasjkdf;aslkdfj;alsjkdf;alsf
>> +0
>> ;lasjkdf;lasjkdf;lajh;oiwfwiovnsadjkvbaskdvblavjbklwjvbaskdjvb;wiefuh
>> sa;jdkvb +0 jjjjjjasldfasldflll
>> +0 asdf
>> +0 a
> 
> Looks like the "I finally cleaned my keyboard" testcase ;-))
> 
hehe, not really so much of a testcase as a sample file of what the input
should look like.

>> diff --git a/devtools/print_hfa.c b/devtools/print_hfa.c
>> new file mode 100644
>> index 0000000..c5a5b65
>> --- /dev/null
>> +++ b/devtools/print_hfa.c
> 
>> +const char *short_options = "";
>> +struct option long_options[] = {
>> +	{NULL,		0, 0, 0},
>> +};
>> +
>> +static int process_args(int argc, char *argv[])
>> +{
>> +	int c, o;
>> +
>> +	while ((c = getopt_long(argc, argv, short_options, long_options,
>> &o)) != -1) { +		//long tmp;
>> +		switch (c) {
>> +		case 0:
>> +			fprintf(stderr, "Error in getopt_long handling\n");
>> +			exit(1);
>> +			break;
>> +		}
>> +
>> +	}
>> +
>> +	return optind;
>> +}
> 
> Silly question - what exactly does this function do? (well, parsing the 
> options, but... ;-)
> 
atm nothing, but I did have an option in there at one point and left the
stub code in because I do want to add some back in.

> Also, how can the error condition (case 0) happen? From reading 
> getopt_long(3), I'd guess it happens if a long option is matched - but 
> it's very unlikely that NULLL will ever match ;-)
> (Also, why do we need that NULL in long_options at all?)
> 
the null is a terminator, without it you can get strange option processing
based on random junk in memory, and yes we had a bug related to this once

> That said - what about a --help?
> 
yes that is one of the options that needs to be added

> Final question - it seems main() handles argv itsself, so is this 
> function called/used at all?
> 
it is not atm, however even when it is the remaining args that don't
match will be handled by main

>> +char *load_file(const char *path, ssize_t *fsize)
>> +{
>> +	struct stat stat;
>> +	char *buffer;
>> +	ssize_t remaining;
>> +	int fd, res;
>> +
>> +	fd = open(path, O_RDONLY);
>> +	if (fd == -1) {
>> +		fprintf(stderr, " Error '%s', failed to open '%s'\n",
>> strerror(errno), path); 
>> +		goto out;
>> +	}
>> +
>> +	res = fstat(fd, &stat);
>> +	if (res != 0) {
>> +		fprintf(stderr, "Error '%s', could not get size of file '%s'\n",
>> strerror(errno), path); 
>> +		close(fd);
>> +		goto out_fd;
> 
> You are closing fd here, and out_fd: does it again. Does double-closing 
> a file hurt?
> 
well maybe, it will depending on if there are any opens in between or if your
code is threaded. Anyways its not a good idea

>> +	}
>> +
>> +	buffer = (char *) malloc(stat.st_size + 1);
>> +	if (!buffer) {
>> +		fprintf(stderr, "Error could not allocate buffer '%s'\n", path);
> 
> "... buffer _for_ '%s'""
> 
>> +		close(fd);
>> +		goto out_fd;
> 
> Double-closing fd again.
> 
thanks

>> +	}
>> +
>> +	remaining = stat.st_size;
>> +	do {
>> +		ssize_t size;
>> +		size = read(fd, buffer, remaining);
>> +		if (size == -1) {
>> +			if (errno != EAGAIN) {
>> +				fprintf(stderr, "Error reading file '%s'\n", path);
>> +				goto out_buffer;
>> +			}
>> +		} else
>> +			remaining -= size;
>> +	} while (remaining);
>> +
>> +	close(fd);
>> +	buffer[stat.st_size] = 0;
>> +
>> +	*fsize = stat.st_size;
>> +	return buffer;
>> +
>> +out_buffer:
>> +	free(buffer);
>> +out_fd:
>> +	close(fd);
>> +out:
>> +	return NULL;
>> +}
> 
> Reading a file into a buffer doesn't look that easy in C ;-)
> 
> Is there really no function available somewhere that shortens all this 
> to something like
>     buffer = read_file(filename)
> + checking errno?
> 
meh, there are alternatives, like using fopen and fread, mmap, ...

I'm not sure why I through in a more low level approach here. It can be
changed

>> diff --git a/devtools/test_re.cc b/devtools/test_re.cc
>> new file mode 100644
>> index 0000000..57ce55e
>> --- /dev/null
>> +++ b/devtools/test_re.cc
> 
>> +static int process_args(int argc, char *argv[])
>> +{
>> +	int c, o;
>> +
>> +	while ((c = getopt_long(argc, argv, short_options, long_options,
>> &o)) != -1) { +		long tmp;
>> +		switch (c) {
>> +		case 0:
>> +			fprintf(stderr, "Error in getopt_long handling\n");
> 
> In which cases can this happen?

Its a feature of getopt_long that is currently unused. If this gets hit
the option table has been extended without extending the code to handle
the flag (3rd arg in table).

So this is just defensive programming at this point

> If there is a clear answer, can we also have a more helpful error 
> message?
> 
nah, its just a dev message.

>> +		case 'O':
>> +			if (!optarg) {
>> +				/* todo: default optimization */
> 
> Is "do nothing" really an ok-ish default behaviour, or is some code (or 
> error message) needed?
> 
yes it is fine.

>> +char *load_file(const char *path)
> 
> That looks like the same load_file() as in print_hfa.c, therefore
> - please avoid copy&paste programming and move it to a separate file
>   that can be used by both binaries

no, this really does not warrant sharing, especially not at this time

> - the same questions and comments as in print_hfa.c apply ;-)
> 
meh, again low level and could be replaced

>> +/* return the number of failures */
>> +int do_matches(aa_hfa *hfa, unsigned int count)
> ...
>> +	t = clock() - t;
>> +	unsigned int secs = t/CLOCKS_PER_SEC;
>> +	if (g_print_time)
>> +		printf("elapsed time: %u.%3.3lds\n", secs, (((t %
>> CLOCKS_PER_SEC)*1000+500)/CLOCKS_PER_SEC)); +
>> +	return failures;
>> +}
> 
> You can save some CPU cycles if you move
>     t = clock() - t;
>     unsigned int secs = t/CLOCKS_PER_SEC;
> inside the "if (g_print_time)" block (for cases where the time doesn't 
> need to be printed).
> 
meh, I'm not worried about those cpu cycles. The timing bits are to help
with dev when twiddling with the match routine to help check for performance
regressions, the rest of the code doesn't need to be optimized

> If someone argues that this means an additional "if" will be measured 
> and falsify the result,
> a) tell him to get a faster computer ;-)
> b) move at least the calculation of secs inside the if block
> 
again, its not code that needs to be optimized

>> diff --git a/libraries/libapparmor/src/hfa.c
>> b/libraries/libapparmor/src/hfa.c new file mode 100644
>> index 0000000..ec31a15
>> --- /dev/null
>> +++ b/libraries/libapparmor/src/hfa.c
>> @@ -0,0 +1,304 @@
>> +/*
>> + * AppArmor security module
>> + *
>> + * This file contains AppArmor eHFA based regular expression matching
>> engine + *
>> + * Copyright 2016 Canonical Ltd.
>> + *
>> + * This program is free software; you can redistribute it and/or
>> + * modify it under the terms of the GNU General Public License as
>> + * published by the Free Software Foundation, version 2 of the
>> + * License.
>> + *
>> + * AppArmor's extended Hybrid Finite Atomata (eHFA) matching engine
>> is + * essentially an NFA with additional DFA constraints, with some
>> limited + * function callouts and scratch memory (it is not turing
>> complete nor + * even a full PFA).
>> + *
>> + * It is not a true NFA as it does not allow for lambda transitions,
>> nor + * does it allow for an arbitrary set of follow states for any
>> given + * character transition. Instead it should be thought of as a
>> DFA with + * limited NFA states that have constraints such that the
>> number of + * potential NFA states to track in parallel is known in
>> advanced and + * of a fixed number.
>> + *
>> + * Atm this code only implements the DFA portion of the match and
>> does + * not allow for NFA states nor the extended work memory.
>> + */
> 
> I don't say that I completely understand this, but a short and simplified 
> summary seems to be that this code does matching like the kernel does 
> when deciding if an action is allowed/denied. 
> Right?
> 
yes

> Does this also mean this code is "stolen" from the kernel code?
> 
Largely it is a fresh rewrite, and the kernel will be picking up this
code once its ready. There are a few bits and pieces from the kernel
code, largely around the defines in the .h files

>> +/**
>> + * aa_hfa_can_fail - evaluates whether @state could potentially fail
>> for @hfa + * @hfa: hfa to consider
>> + * @hfa: state struct to be paired with @hfa
>> + *
>> + * Returns: true if the combination can fail, else false
>> + */
>> +bool aa_hfa_can_fail(struct aa_hfa *hfa, struct aa_state *state)
>> +{
>> +	return false;
>> +}
> 
> That a) looks very optimistic ("nothing will fail") or b) misses at 
> least a TODO note ;-)
> 
atm nothing can fail, and that was noted above.

* Atm this code only implements the DFA portion of the match and does
* not allow for NFA states nor the extended work memory.

so yes it is a to do

>> +/**
>> + * aa_hfa_matchn_cont - traverse @hfa from @state to find state @str
> 
> I can't find aa_hfa_matchn_cont in your patch or the existing code. 
> Wrong/superfluous comment? Or did you just change the function name after 
> writing the comment?
> 
name changed, and this got missed

>> stops at + * @hfa: the hfa to match @str against  (NOT NULL)
>> + * @state: the state of the hfa to start matching in
>> + * @str: the string of bytes to match against the hfa  (NOT NULL)
>> + * @len: length of the string of bytes to match
>> + *
>> + * aa_hfa_matchn will match @str against the hfa and return the state
>> it + * finished matching in. The final state can be used to look up
>> the accepting + * label, or as the start state of a continuing match.
>> + *
>> + * This function will happily match again the 0 byte and only
> 
> match again_st_?
> 
yep

>> finishes + * when @len input is consumed.
>> + *
>> + * Returns: 0 on success else error.
>> + *
>> + * Note: most matches are guarenteed to succeed, see aa_hfa_reset,
>> and + *       aa_hfa_can_fail.
>> + */
>> +int aa_hfa_continuen(struct aa_hfa *hfa, struct aa_state *state,
>> +		     const char *str, size_t len)
> 
> 
>> +/**
>> + * aa_hfa_matchn - reset and traverse @hfa to find state @str stops
>> at + * @hfa: the hfa to match @str against  (NOT NULL)
>> + * @state: the state of the hfa to start matching in
>> + * @str: the string of bytes to match against the hfa  (NOT NULL)
>> + * @len: length of the string of bytes to match
>> + *
>> + * aa_hfa_matchn will match @str against the hfa and return the state
>> it + * finished matching in. The final state can be used to look up
>> the accepting + * label, or as the start state of a continuing match.
>> + *
>> + * This function will happily match again the 0 byte and only
> 
> match again_st_?
> 
yep

>> finishes + * when @len input is consumed.
>> + *
>> + * Returns: 0 on success else error.
>> + *
>> + * Note: most matches are guarenteed to succeed, see aa_hfa_reset,
>> and + *       aa_hfa_can_fail.
>> + */
>> +int aa_hfa_matchn(struct aa_hfa *hfa, struct aa_state *state, const
>> char *str, +		  size_t len)
> 
> 
> 
>> diff --git a/libraries/libapparmor/src/hfa_unpack.c
>> b/libraries/libapparmor/src/hfa_unpack.c new file mode 100644
>> index 0000000..c23f214
>> --- /dev/null
>> +++ b/libraries/libapparmor/src/hfa_unpack.c
> 
>> +static u32 remap_accept_to_idx(u32 nstates, char *accept1, char
>> *accept2, +			       struct accept64 **accept, u16 **idx)
> ...
>> +	UNPACK_TO_STRUCT(tmp_accept, accept2, nstates, u32, be32_to_cpu,
>> +			 accept2);
>> +	/* set up an state table to sort and do merge based off of */
> 
> set up _a_ state table
> 
yes

> 
>> diff --git a/libraries/libapparmor/swig/SWIG/libapparmor.i
>> b/libraries/libapparmor/swig/SWIG/libapparmor.i index
>> 69b4cc2..47213a1 100644
>> --- a/libraries/libapparmor/swig/SWIG/libapparmor.i
>> +++ b/libraries/libapparmor/swig/SWIG/libapparmor.i
> 
>> +extern void aa_dfa_free(struct aa_dfa *dfa);
>> +extern struct aa_dfa *aa_dfa_unpack(void *blob, size_t size, int
>> flags); 
>> +extern unsigned int aa_dfa_match_len(struct aa_dfa *dfa,
>> unsigned int start, +				     const char *str, int len);
>> +extern unsigned int aa_dfa_match(struct aa_dfa *dfa, unsigned int
>> start, +				 const char *str);
>> +extern unsigned int aa_dfa_next(struct aa_dfa *dfa, unsigned int
>> state, +				const char c);
>> +extern unsigned int aa_dfa_accept(struct aa_dfa *dfa, unsigned int
>> state);
> 
> It's nice to see more library functions exported towards the swig 
> bindings.
> 
ah crud I missed ripping those out :)

we aren't ready to be exporting these functions yet.

> Now to the practical question - are some of those functions useful to, 
> let's say, handle the AARE matching in the tools?
> 
Not yet, but that is the goal