[apparmor] [PATCH] Add compressed dfa matching routines to library, and a base test program

Sun Jan 10 15:22:22 UTC 2016

Hello,

those tools look quite interesting :-)

Some comments and questions inline - I hope you don't need to answer 
everything with "learn C!!!!" ;-)

Am Freitag, 8. Januar 2016 schrieb John Johansen:
> diff --git a/devtools/Makefile b/devtools/Makefile
> new file mode 100644
> index 0000000..b0cd26e
> --- /dev/null
> +++ b/devtools/Makefile

> +test_re: test_re.o ../parser/libapparmor_re/libapparmor_re.a
> ../parser/parser.h ../parser/common_optarg.h
> ../parser/common_optarg.o ../parser/lib.o 
> +	$(CXX) ${CXXFLAGS} -o $@
> $^  -L ../libraries/libapparmor/src/.libs/ -static -lapparmor 

test_re.cc has

    #include "../parser/common_optarg.h"
    #include "../parser/libapparmor_re/aare_rules.h"

    #include "../libraries/libapparmor/include/sys/apparmor.h"
    #include "../libraries/libapparmor/src/hfa.h"
    #include "../libraries/libapparmor/src/hfa_private.h"

Should those files (and their *.c counterparts) be added as make 
dependency?

Also, should the -L depend on USE_SYSTEM?

> +print_hfa: print_hfa.o
> +	$(CC) ${CFLAGS} -o $@ $^  -L ../libraries/libapparmor/src/.libs/
> -static -lapparmor

Should this -L depend on USE_SYSTEM?
(I'm not sure which files are needed, so maybe I'm wrong ;-)

> +print_hfa.o: print_hfa.c
> +       $(CC) ${CFLAGS} -c -o $@ $<

print_hfa.c contains
    #include "../libraries/libapparmor/src/hfa.h"
which brings up two questions:
- should hfa.h (and hfa.c) be added as make dependency?
- what about -L and USE_SYSTEM?

> diff --git a/devtools/README b/devtools/README
> new file mode 100644
> index 0000000..b16f6f0
> --- /dev/null
> +++ b/devtools/README
> @@ -0,0 +1,25 @@
> +Basic developer utilities not ready for or needed by regular users

I'd drop the "not ready for" part - if a tool is intended for regular 
users, it should not end up in devtools IMHO.

> diff --git a/devtools/expr.txt b/devtools/expr.txt
> new file mode 100644
> index 0000000..4b7d12b
> --- /dev/null
> +++ b/devtools/expr.txt
> @@ -0,0 +1,772 @@
> +1 /
> +1 /([^\0000])*\.[Bb][Mm][Pp]
> +1 /([^\0000])*\.[Bb][Zz]2
> +1 /([^\0000])*\.[Cc][Bb][7RZrz]
> +1 /([^\0000])*\.[Dd][Jj][Vv][Uu]
> +1 /([^\0000])*\.[Dd][Vv][Ii]

What's the reason for using that syntax (looks like "real" regexes or 
PCRE) instead of AARE?

I'd prefer to feed the devtools with AARE, since this is what we use in 
the profiles. A requirement "variables must be expanded" would be ok.

> diff --git a/devtools/match.txt b/devtools/match.txt
> new file mode 100644
> index 0000000..549f43d
> --- /dev/null
> +++ b/devtools/match.txt
> @@ -0,0 +1,14 @@
> +1 /abdcd
> +1 /defg
> +1 /home/
> +0 abc
> +1
> /asdfjhlnlwknrlwknklwenflwknfsam.,smdfakdflkafa;ldfkwekjlmnsmnvaslkdf
> jwfnkaef +1 /alkoijnlkn lkasdfasfwedvasvsdvasv
> +1 /lkasjdflkasjdflaskflj;kjwef;kjav;klansvlkjaoifjaskl;vnda;lsdvna
> +1 /home/alskdf;jasldka;lsvnkasvlajk
> hnqlkjwnlkjavn;asldkjvkalsjf;lasjkdf;lsdf +1
> /home/a;dsjfk;lasjkfslafkj;aslfkjasljfk;alsfkj;slafkj;las +0
> laksjdf;lasjkdf;aslkdfj;alsjkdf;alsf
> +0
> ;lasjkdf;lasjkdf;lajh;oiwfwiovnsadjkvbaskdvblavjbklwjvbaskdjvb;wiefuh
> sa;jdkvb +0 jjjjjjasldfasldflll
> +0 asdf
> +0 a

Looks like the "I finally cleaned my keyboard" testcase ;-))

> diff --git a/devtools/print_hfa.c b/devtools/print_hfa.c
> new file mode 100644
> index 0000000..c5a5b65
> --- /dev/null
> +++ b/devtools/print_hfa.c

> +const char *short_options = "";
> +struct option long_options[] = {
> +	{NULL,		0, 0, 0},
> +};
> +
> +static int process_args(int argc, char *argv[])
> +{
> +	int c, o;
> +
> +	while ((c = getopt_long(argc, argv, short_options, long_options,
> &o)) != -1) { +		//long tmp;
> +		switch (c) {
> +		case 0:
> +			fprintf(stderr, "Error in getopt_long handling\n");
> +			exit(1);
> +			break;
> +		}
> +
> +	}
> +
> +	return optind;
> +}

Silly question - what exactly does this function do? (well, parsing the 
options, but... ;-)

Also, how can the error condition (case 0) happen? From reading 
getopt_long(3), I'd guess it happens if a long option is matched - but 
it's very unlikely that NULLL will ever match ;-)
(Also, why do we need that NULL in long_options at all?)

That said - what about a --help?

Final question - it seems main() handles argv itsself, so is this 
function called/used at all?

> +char *load_file(const char *path, ssize_t *fsize)
> +{
> +	struct stat stat;
> +	char *buffer;
> +	ssize_t remaining;
> +	int fd, res;
> +
> +	fd = open(path, O_RDONLY);
> +	if (fd == -1) {
> +		fprintf(stderr, " Error '%s', failed to open '%s'\n",
> strerror(errno), path); 
> +		goto out;
> +	}
> +
> +	res = fstat(fd, &stat);
> +	if (res != 0) {
> +		fprintf(stderr, "Error '%s', could not get size of file '%s'\n",
> strerror(errno), path); 
> +		close(fd);
> +		goto out_fd;

You are closing fd here, and out_fd: does it again. Does double-closing 
a file hurt?

> +	}
> +
> +	buffer = (char *) malloc(stat.st_size + 1);
> +	if (!buffer) {
> +		fprintf(stderr, "Error could not allocate buffer '%s'\n", path);

"... buffer _for_ '%s'""

> +		close(fd);
> +		goto out_fd;

Double-closing fd again.

> +	}
> +
> +	remaining = stat.st_size;
> +	do {
> +		ssize_t size;
> +		size = read(fd, buffer, remaining);
> +		if (size == -1) {
> +			if (errno != EAGAIN) {
> +				fprintf(stderr, "Error reading file '%s'\n", path);
> +				goto out_buffer;
> +			}
> +		} else
> +			remaining -= size;
> +	} while (remaining);
> +
> +	close(fd);
> +	buffer[stat.st_size] = 0;
> +
> +	*fsize = stat.st_size;
> +	return buffer;
> +
> +out_buffer:
> +	free(buffer);
> +out_fd:
> +	close(fd);
> +out:
> +	return NULL;
> +}

Reading a file into a buffer doesn't look that easy in C ;-)

Is there really no function available somewhere that shortens all this 
to something like
    buffer = read_file(filename)
+ checking errno?

> diff --git a/devtools/test_re.cc b/devtools/test_re.cc
> new file mode 100644
> index 0000000..57ce55e
> --- /dev/null
> +++ b/devtools/test_re.cc

> +static int process_args(int argc, char *argv[])
> +{
> +	int c, o;
> +
> +	while ((c = getopt_long(argc, argv, short_options, long_options,
> &o)) != -1) { +		long tmp;
> +		switch (c) {
> +		case 0:
> +			fprintf(stderr, "Error in getopt_long handling\n");

In which cases can this happen?
If there is a clear answer, can we also have a more helpful error 
message?

> +		case 'O':
> +			if (!optarg) {
> +				/* todo: default optimization */

Is "do nothing" really an ok-ish default behaviour, or is some code (or 
error message) needed?

> +char *load_file(const char *path)

That looks like the same load_file() as in print_hfa.c, therefore
- please avoid copy&paste programming and move it to a separate file
  that can be used by both binaries
- the same questions and comments as in print_hfa.c apply ;-)

> +/* return the number of failures */
> +int do_matches(aa_hfa *hfa, unsigned int count)
...
> +	t = clock() - t;
> +	unsigned int secs = t/CLOCKS_PER_SEC;
> +	if (g_print_time)
> +		printf("elapsed time: %u.%3.3lds\n", secs, (((t %
> CLOCKS_PER_SEC)*1000+500)/CLOCKS_PER_SEC)); +
> +	return failures;
> +}

You can save some CPU cycles if you move
    t = clock() - t;
    unsigned int secs = t/CLOCKS_PER_SEC;
inside the "if (g_print_time)" block (for cases where the time doesn't 
need to be printed).

If someone argues that this means an additional "if" will be measured 
and falsify the result,
a) tell him to get a faster computer ;-)
b) move at least the calculation of secs inside the if block

> diff --git a/libraries/libapparmor/src/hfa.c
> b/libraries/libapparmor/src/hfa.c new file mode 100644
> index 0000000..ec31a15
> --- /dev/null
> +++ b/libraries/libapparmor/src/hfa.c
> @@ -0,0 +1,304 @@
> +/*
> + * AppArmor security module
> + *
> + * This file contains AppArmor eHFA based regular expression matching
> engine + *
> + * Copyright 2016 Canonical Ltd.
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License as
> + * published by the Free Software Foundation, version 2 of the
> + * License.
> + *
> + * AppArmor's extended Hybrid Finite Atomata (eHFA) matching engine
> is + * essentially an NFA with additional DFA constraints, with some
> limited + * function callouts and scratch memory (it is not turing
> complete nor + * even a full PFA).
> + *
> + * It is not a true NFA as it does not allow for lambda transitions,
> nor + * does it allow for an arbitrary set of follow states for any
> given + * character transition. Instead it should be thought of as a
> DFA with + * limited NFA states that have constraints such that the
> number of + * potential NFA states to track in parallel is known in
> advanced and + * of a fixed number.
> + *
> + * Atm this code only implements the DFA portion of the match and
> does + * not allow for NFA states nor the extended work memory.
> + */

I don't say that I completely understand this, but a short and simplified 
summary seems to be that this code does matching like the kernel does 
when deciding if an action is allowed/denied. 
Right?

Does this also mean this code is "stolen" from the kernel code?

> +/**
> + * aa_hfa_can_fail - evaluates whether @state could potentially fail
> for @hfa + * @hfa: hfa to consider
> + * @hfa: state struct to be paired with @hfa
> + *
> + * Returns: true if the combination can fail, else false
> + */
> +bool aa_hfa_can_fail(struct aa_hfa *hfa, struct aa_state *state)
> +{
> +	return false;
> +}

That a) looks very optimistic ("nothing will fail") or b) misses at 
least a TODO note ;-)

> +/**
> + * aa_hfa_matchn_cont - traverse @hfa from @state to find state @str

I can't find aa_hfa_matchn_cont in your patch or the existing code. 
Wrong/superfluous comment? Or did you just change the function name after 
writing the comment?

> stops at + * @hfa: the hfa to match @str against  (NOT NULL)
> + * @state: the state of the hfa to start matching in
> + * @str: the string of bytes to match against the hfa  (NOT NULL)
> + * @len: length of the string of bytes to match
> + *
> + * aa_hfa_matchn will match @str against the hfa and return the state
> it + * finished matching in. The final state can be used to look up
> the accepting + * label, or as the start state of a continuing match.
> + *
> + * This function will happily match again the 0 byte and only

match again_st_?

> finishes + * when @len input is consumed.
> + *
> + * Returns: 0 on success else error.
> + *
> + * Note: most matches are guarenteed to succeed, see aa_hfa_reset,
> and + *       aa_hfa_can_fail.
> + */
> +int aa_hfa_continuen(struct aa_hfa *hfa, struct aa_state *state,
> +		     const char *str, size_t len)

> +/**
> + * aa_hfa_matchn - reset and traverse @hfa to find state @str stops
> at + * @hfa: the hfa to match @str against  (NOT NULL)
> + * @state: the state of the hfa to start matching in
> + * @str: the string of bytes to match against the hfa  (NOT NULL)
> + * @len: length of the string of bytes to match
> + *
> + * aa_hfa_matchn will match @str against the hfa and return the state
> it + * finished matching in. The final state can be used to look up
> the accepting + * label, or as the start state of a continuing match.
> + *
> + * This function will happily match again the 0 byte and only

match again_st_?

> finishes + * when @len input is consumed.
> + *
> + * Returns: 0 on success else error.
> + *
> + * Note: most matches are guarenteed to succeed, see aa_hfa_reset,
> and + *       aa_hfa_can_fail.
> + */
> +int aa_hfa_matchn(struct aa_hfa *hfa, struct aa_state *state, const
> char *str, +		  size_t len)

> diff --git a/libraries/libapparmor/src/hfa_unpack.c
> b/libraries/libapparmor/src/hfa_unpack.c new file mode 100644
> index 0000000..c23f214
> --- /dev/null
> +++ b/libraries/libapparmor/src/hfa_unpack.c

> +static u32 remap_accept_to_idx(u32 nstates, char *accept1, char
> *accept2, +			       struct accept64 **accept, u16 **idx)
...
> +	UNPACK_TO_STRUCT(tmp_accept, accept2, nstates, u32, be32_to_cpu,
> +			 accept2);
> +	/* set up an state table to sort and do merge based off of */

set up _a_ state table

> diff --git a/libraries/libapparmor/swig/SWIG/libapparmor.i
> b/libraries/libapparmor/swig/SWIG/libapparmor.i index
> 69b4cc2..47213a1 100644
> --- a/libraries/libapparmor/swig/SWIG/libapparmor.i
> +++ b/libraries/libapparmor/swig/SWIG/libapparmor.i

> +extern void aa_dfa_free(struct aa_dfa *dfa);
> +extern struct aa_dfa *aa_dfa_unpack(void *blob, size_t size, int
> flags); 
> +extern unsigned int aa_dfa_match_len(struct aa_dfa *dfa,
> unsigned int start, +				     const char *str, int len);
> +extern unsigned int aa_dfa_match(struct aa_dfa *dfa, unsigned int
> start, +				 const char *str);
> +extern unsigned int aa_dfa_next(struct aa_dfa *dfa, unsigned int
> state, +				const char c);
> +extern unsigned int aa_dfa_accept(struct aa_dfa *dfa, unsigned int
> state);

It's nice to see more library functions exported towards the swig 
bindings.

Now to the practical question - are some of those functions useful to, 
let's say, handle the AARE matching in the tools?

Regards,

Christian Boltz
-- 
Versuchst du mal bitte zu formulieren, was du eigentlich möchtest?
Mit diesem Posting hast du gute Chancen auf den "Marcel Stein Award".
[Christian Paul in suse-linux]
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 819 bytes
Desc: This is a digitally signed message part.
URL: <https://lists.ubuntu.com/archives/apparmor/attachments/20160110/f21e2011/attachment.pgp>