[apparmor] [patch] Add Differential State Compression to the DFA

Sat Dec 14 19:10:32 UTC 2013

On 12/13/2013 09:37 PM, Seth Arnold wrote:
> On Tue, Dec 10, 2013 at 06:38:51AM -0800, John Johansen wrote:
>>     This is a slightly updated (to apply against current dev) version of the
>>     differential compression patch.
>>
>>     Differential state compression encodes a state's transitions as the
>>     difference between the state and its default state (the state it is
>>     relative too).
>> [...]
>>     Signed-off-by: John Johansen <john.johansen at canonical.com>
> 
> 
> Impressive; I think I understood very nearly all of it, thanks for the
> copius documentation, comments, and above-all, clear code, though I'm
> afraid I haven't yet figured out if this requires kernel modifications to
> support the differential compression.
> 
yes it does, in particular its the case where the base transition is used

old pseudo code
  if state.next == state.check
    next_state = state.next
  else
    next.state = state.default

new pseudo code
loop:
  if state.next == state.check
    next_state = state.next
  else if state.isdiffencoded
    state = state.default
    goto loop
  else
    next_state = state.default

of course we need extra in checks in the validation steps too.

While diff encoding does potentially slow down matches, its is guaranteed
to be never more than walking 2x + 1 as many states. So for a string of length
on a none diff encoded dfa, we would walk 10 states, for the diff enoded
we walk at most 20, and likely only a few more. I'll try to explain

Our diff encode has a few properites
- each state keeps track of whether it is diff encoded. A diff encoding
  chain will stop when it hits a state thats diff encoded (this is
  in constrast to some other implementations of diff encoding where each
  chain goes back to a root state because they didn't want to smuggle in
  the extra bit flag.)
- the diff encode is done using a DAG walk of the DFA from the
  start state.
  - the start and none matching states are never diff encoded
  - states in the second level of the DAG are only diff encoded against
    the start state.
  - states in the 3rd level are encoded against only the 2nd and start
    states ...
  - states are only diffed against connected states backward in the DAG
    or a connected neighbour. The connected neighbour is an interesting
    exception but the point is we are only diff encoding against states
    that are connected directly or historically in the DAG. This provides
    some interesting properties.
    - it greatly reduces the number of states diff encoding compression
      needs to examine, making it fast.
    - when matching
      - the states we step back to in a diff encode chain have a good
        chance of being hot in the cache, because they are states that
        previously may have been walked making the current match.
        There is no guarantee that it will be step back to a state
        that was used in a specific match but the odds are good with
        many states having only 2-5 potential immediate ancestors.
      - when we do step back the DAG gives us guarantees about the
        number of states that can be walked in aggregate

        Say we are matching a string of length 10, and we make it to
        position 9 before we hit a diff encoded state, and its encoding
        chain is a worst case walking through 8 previous states before
        it stops (this gets us back to the start state), making us walk
        a total of 9+8 = 17 states total so far. We then match the next
        char, and final char, moving us forward 1 state, and its a diff
        state too. However due to DAG encoding even though we are
        matching the tenth char, we are only at a level 2 in the DAG
        and it can only step back 1 state to the the start state.
        So this case comes to 19 states to match walked to match a
        string of 10 chars.

        If we had hit a diff state say at the 5 character then the
        diff chain is constrained to be at most 4 states long, and
        subsequent chains are also constrained by the DAG because
        the 5 char is like we are at the beginning of the match from
        the pov from the DAG encoding.  So say now the 8 character
        is hits a diff encode state, its chain can walk back at
        most 3 states because that state is at most the 3rd level
        in the DAG.

        Similarly if every character match hits a diff encode state
        we never get past level 2 in the DAG so each diff chain is
        at most 1 long, meaning 2 states per character matched.

In reality we almost never hit a worst case and we walk maybe 30-40%
more states for a match, and each state walk also picks up an extra
condition and masking operation to separate the flags from the base.

So how much does it slow us down? Well that really depends on the
dfa and the match. We gain a much smaller dfa that can better fit
in cache and the extra state walks have a decent chance of being
hot in cache, even the subsequent forward walking has a better chance
of being in cache both because the dfa is smaller but also because
we could have walked it earlier in the match. So generally we shouldn't
loose as much performance as the % of extra states would imply.

Also to help compensate I have another set of patches that reorg how
the dfa works to make it better align with the cache and that should
give us a performance boost.

Me and steve are working on getting some tests for all this to provide
regression testing but also to get some actual timed numbers to give
a better what all changes work out to. However even if its a little
slower the matching is still faster than the path lookup, so the
amount of regression on a given mediation path should be in the noise.

> I have some questions inline.
> 
> With the caveat that I haven't looked at how well the _other_ code
> will handle this, only the code in this patch:
> 
> Acked-by: Seth Arnold <seth.arnold at canonical.com>
> 
>> === modified file 'parser/libapparmor_re/aare_rules.cc'
>> --- parser/libapparmor_re/aare_rules.cc	2013-10-14 21:36:05 +0000
>> +++ parser/libapparmor_re/aare_rules.cc	2013-12-10 13:24:18 +0000
>> @@ -315,6 +315,13 @@
>>  		} else if (flags & DFA_DUMP_EQUIV)
>>  			cerr << "\nDFA did not generate an equivalence class\n";
>>  
>> +		if (flags & DFA_CONTROL_DIFF_ENCODE) {
>> +			dfa.diff_encode(flags);
>> +
>> +			if (flags & DFA_DUMP_DIFF_ENCODE)
>> +				dfa.dump_diff_encode(cerr);
>> +		}
>> +
>>  		CHFA chfa(dfa, eq, flags);
>>  		if (flags & DFA_DUMP_TRANS_TABLE)
>>  			chfa.dump(cerr);
>>
>> === modified file 'parser/libapparmor_re/apparmor_re.h'
>> --- parser/libapparmor_re/apparmor_re.h	2013-09-27 23:13:22 +0000
>> +++ parser/libapparmor_re/apparmor_re.h	2013-12-10 13:27:08 +0000
>> @@ -31,7 +31,11 @@
>>  #define DFA_CONTROL_FILTER_DENY 	(1 << 6)
>>  #define DFA_CONTROL_REMOVE_UNREACHABLE  (1 << 7)
>>  #define DFA_CONTROL_TRANS_HIGH		(1 << 8)
>> +#define DFA_CONTROL_DIFF_ENCODE		(1 << 9)
>>  
>> +#define DFA_DUMP_DIFF_PROGRESS		(1 << 10)
>> +#define DFA_DUMP_DIFF_ENCODE		(1 << 11)
>> +#define DFA_DUMP_DIFF_STATS		(1 << 12)
>>  #define DFA_DUMP_MIN_PARTS 		(1 << 13)
>>  #define DFA_DUMP_UNIQ_PERMS 		(1 << 14)
>>  #define DFA_DUMP_MIN_UNIQ_PERMS 	(1 << 15)
>>
>> === modified file 'parser/libapparmor_re/chfa.cc'
>> --- parser/libapparmor_re/chfa.cc	2012-02-24 12:21:59 +0000
>> +++ parser/libapparmor_re/chfa.cc	2013-12-10 14:26:52 +0000
>> @@ -32,6 +32,7 @@
>>  #include "hfa.h"
>>  #include "chfa.h"
>>  #include "../immunix.h"
>> +#include "flex-tables.h"
>>  
>>  void CHFA::init_free_list(vector<pair<size_t, size_t> > &free_list,
>>  				     size_t prev, size_t start)
>> @@ -53,6 +54,11 @@
>>  	if (flags & DFA_DUMP_TRANS_PROGRESS)
>>  		fprintf(stderr, "Compressing HFA:\r");
>>  
>> +	if (dfa.diffcount)
>> +		flags = YYTH_FLAG_DIFF_ENCODE;
>> +	else
>> +		flags = 0;
>> +
>>  	if (eq.empty())
>>  		max_eq = 255;
>>  	else {
>> @@ -242,6 +248,8 @@
>>  	}
>>  
>>  do_insert:
>> +	if (from->flags & DiffEncodeFlag)
>> +		base |= DiffEncodeBit32;
>>  	default_base.push_back(make_pair(default_state, base));
>>  }
>>  
>> @@ -279,7 +287,7 @@
>>  			   << *next_check[i].second << " -> "
>>  			   << *next_check[i].first << ": ";
>>  
>> -			size_t offs = i - default_base[num[next_check[i].second]].second;
>> +			size_t offs = i - base_mask_size(default_base[num[next_check[i].second]].second);
>>  			if (eq.size())
>>  				os << offs;
>>  			else
>> @@ -296,7 +304,6 @@
>>   * (Only the -Cf and -Ce formats are currently supported.)
>>   */
>>  
>> -#include "flex-tables.h"
>>  #define YYTH_REGEX_MAGIC 0x1B5E783D
>>  
>>  static inline size_t pad64(size_t i)
>> @@ -395,6 +402,7 @@
>>  
>>  	size_t hsize = pad64(sizeof(th) + sizeof(th_version) + strlen(name) + 1);
>>  	th.th_magic = htonl(YYTH_REGEX_MAGIC);
>> +	th.th_flags = htonl(flags);
>>  	th.th_hsize = htonl(hsize);
>>  	th.th_ssize = htonl(hsize +
>>  			    flex_table_size(accept.begin(), accept.end()) +
>>
>> === modified file 'parser/libapparmor_re/chfa.h'
>> --- parser/libapparmor_re/chfa.h	2012-02-24 12:21:59 +0000
>> +++ parser/libapparmor_re/chfa.h	2013-12-10 14:19:05 +0000
>> @@ -26,6 +26,10 @@
>>  
>>  #include "hfa.h"
>>  
>> +#define BASE32_FLAGS 0xff000000
>> +#define DiffEncodeBit32 0x80000000
>> +#define base_mask_size(X) ((X) & ~BASE32_FLAGS)
>> +
> 
> I know with only one use it's not a big deal, but I do prefer macros in
> upper-case.
> 
>>  using namespace std;
>>  
>>  class CHFA {
>> @@ -51,6 +55,7 @@
>>  	map<uchar, uchar> &eq;
>>  	uchar max_eq;
>>  	size_t first_free;
>> +	unsigned int flags;
>>  };
>>  
>>  #endif /* __LIBAA_RE_CHFA_H */
>>
>> === modified file 'parser/libapparmor_re/flex-tables.h'
>> --- parser/libapparmor_re/flex-tables.h	2010-11-09 21:39:18 +0000
>> +++ parser/libapparmor_re/flex-tables.h	2013-12-10 14:17:09 +0000
>> @@ -5,6 +5,7 @@
>>  #include <stdint.h>
>>  
>>  #define YYTH_MAGIC	0xF13C57B1
>> +#define YYTH_FLAG_DIFF_ENCODE 1
>>  
>>  struct table_set_header {
>>  	uint32_t	th_magic;	/* TH_MAGIC */
>>
>> === modified file 'parser/libapparmor_re/hfa.cc'
>> --- parser/libapparmor_re/hfa.cc	2012-05-30 21:31:41 +0000
>> +++ parser/libapparmor_re/hfa.cc	2013-12-10 14:06:00 +0000
>> @@ -73,6 +73,194 @@
>>  	return os;
>>  }
>>  
>> +/**
>> + * diff_weight - Find differential compression distance between @rel and @this
>> + * @rel: State to compare too
>> + * Returns: An integer indicating how good rel is as a base, larger == better
>> + *
>> + * Find the relative weighted difference for differential state compression
>> + * with queried state being compressed against @rel
>> + *
>> + * +1 for each transition that matches (char and dest - saves a transition)
>> + * 0 for each transition that doesn't match and exists in both states
>> + * 0  for transition that self has and @other doesn't (no extra required)
>> + * -1 for each transition that is in @rel and not in @this (have to override)
>> + *
>> + * @rel should not be a state that has already been made differential or it may
>> + * introduce extra transitions as it does not recurse to find all transitions
>> + *
>> + * Should be applied after state minimization 
>> +*/
>> +int State::diff_weight(State *rel)
>> +{
>> +	int weight = 0;
>> +
>> +	if (this == rel)
>> +		return 0;
>> +
>> +	if (rel->diff->rel) {
>> +		/* Can only be diff encoded against states that are relative
>> +		 * to a state of a lower depth. ie, at most one sibling in
>> +		 * the chain
>> +		 */
>> +		if (rel->diff->rel->diff->depth >= this->diff->depth)
>> +			return 0;
>> +	} else if (rel->diff->depth >= this->diff->depth)
>> +		return 0;
>> +
>> +	if (rel->flags & DiffEncodeFlag) {
>> +		for (uchar i = 0; i < 256; i++) {
> 
> I think I'd rather see this as a standard 'int' or 'unsigned int'; '256'
> isn't expressible as an unsigned char, so every i < 256 will involve the
> usual integer promotion rules. Might as well just make 'i' an integer and
> avoid the whole thing. :)
> 
hrmm or we could go to <= 255, either way will update

>> +			State *state = rel->next(i);
>> +			StateTrans::iterator j = trans.find(i);
>> +			if (j != trans.end()) {
>> +				if (state == j->second)
>> +					weight++;
>> +				/* else
>> +				   0 - keep transition to mask
>> +				*/
>> +			} else if (state == otherwise) {
>> +				/* 0 - match of default against @rel
>> +				 * We don't save a transition but don't have
>> +				 * to mask either
>> +				 */
>> +			} else {
>> +				/* @rel has transition not covered by @this.
>> +				 * Need to add a transition to mask it
>> +				 */
>> +				weight--;
>> +			}
>> +		}
>> +		return weight;
>> +	}
>> +
>> +	unsigned int count = 0;
>> +	for (StateTrans::iterator i = rel->trans.begin(); i != rel->trans.end();
>> + i++) {
>> +		StateTrans::iterator j = trans.find(i->first);
>> +		if (j != trans.end()) {
>> +			if (i->second == j->second)
>> +				weight++;
>> +			/* } else {
>> +				0 - keep transition to mask
>> +			*/
>> +			count++;
>> +		} else if (i->second == otherwise) {
>> +			/* 0 - match of default against @rel
>> +			 * We don't save a transition but don't have to
>> +			 * mask either
>> +			 */
>> +		} else {
>> +			/* rel has transition not covered by @this.  Need to
>> +			 * add a transition to mask
>> +			 */
>> +			weight--;
>> +		}
>> +	}
>> +
>> +	/* cover transitions in @this but not in @rel */
>> +	unsigned int this_count = 0;
>> +	if (count < trans.size()) {
>> +		for (StateTrans::iterator i = trans.begin(); i != trans.end(); i++) {
>> +			StateTrans::iterator j = rel->trans.find(i->first);
>> +			if (j == rel->trans.end()) {
>> +				this_count++;
>> +				if (i->second == rel->otherwise)
>> +					/* replaced by rel->cases.otherwise */
>> +					weight++;
>> +			}
>> +		}
>> +	}
> 
> The previous two loops look like they are an O(M*log(N) + N*log(M)) operation
> (assuming trans.find() is O(log(N)) operation) -- while in the long run,
> this would be obviously better than an O(N^2) nested loops, I wonder if
> the smaller amounts of data that I suspect will be involved here if the
> simplicity of nested loops would go more quickly.
> 
Err, its hard to say. It will all be dependent on the template and interator
implementation. Its possible that the find is an O(1) operation, though its
likely an rb-tree using a log(M).

For walking the iterator if the data is in a tree instead of a vector then,
the walking of the iterator is more expensive by a certain amount vs walking
a vector. So its likely not a whole lot better than just doing a find.

I think there are cases where a nested loop will go quicker, but not by much
and on average it will be very close for small chains. Where the current
implementation wins if chains ever start getting a little longer (which can
happen).

I'd rather have the code that is slightly slower in the small cases but prevents
the larger cases from blowing up.

Also remember overall the diff encode is fast, and it generally results in
speeding up the dfa creation because it reduces the amount of work needed in
the chfa step. So overall I'm just not worried here

> 
>> +	if (rel->otherwise != otherwise) {
>> +		/* rel default transitions have to be masked with transitions
>> +		 * This covers all transitions not covered above
>> +		 */
>> +		weight -= 256 - (rel->trans.size() + this_count);
>> +	}
>> +
>> +	return weight;
>> +}
>> +
>> +/**
>> + * make_relative - Make this state relative to @rel
>> + * @rel: state to make this state relative too
>> + *
>> + * @rel can be a relative (differentially compressed state)
>> + */
>> +int State::make_relative(State *rel)
>> +{
>> +	int weight = 0;
>> +
>> +	if (this == rel || !rel)
>> +		return 0;
>> +
>> +	if (flags & DiffEncodeFlag)
>> +		return 0;
>> +
>> +	flags |= DiffEncodeFlag;
>> +
>> +	for (int i = 0; i < 256 ; i++) {
>> +		State *next = rel->next(i);
>> +
>> +		StateTrans::iterator j = trans.find(i);
>> +		if (j != trans.end()) {
>> +			if (j->second == next) {
>> +				trans.erase(j);
>> +				weight++;
>> +			}
>> +			/* else keep transition to mask */
>> +		} else if (otherwise == next) {
>> +			/* do nothing, otherwise transition disappears when
>> +			 * reassigned
>> +			 */
>> +		} else {
>> +			/* need a new transition to mask those in lower state */
>> +			trans[i] = otherwise;
>> +			weight--;
>> +		}
>> +	}
>> +
>> +	otherwise = rel;
>> +
>> +	return weight;
>> +}
>> +
>> +/**
>> + * flatten_differential - remove differential encode from this state
>> + */
>> +void State::flatten_relative(void)
>> +{
>> +	if (!(flags & DiffEncodeFlag))
>> +		return;
>> +
>> +	map<State *, int> count;
>> +
>> +	for (int i = 0; i < 256; i++)
>> +		count[next(i)] += 1;
>> +
>> +	int j = 0;
>> +	State *def = next(0);
>> +	for (int i = 1; i < 256; i++) {
>> +		if (count[next(i)] > count[next(j)]) {
>> +			j = i;
>> +			def = next(i);
>> +		}
>> +	}
>> +
>> +	for (int i = 0; i < 256; i++) {
>> +		if (trans.find(i) != trans.end()) {
>> +			if (trans[i] == def)
>> +				trans.erase(i);
>> +		} else {
>> +			if (trans[i] != def)
>> +				trans[i] = next(i);
>> +		}
>> +	}
>> +
>> +	otherwise = def;
>> +	flags = flags & ~DiffEncodeFlag;
>> +}
>> +
>>  static void split_node_types(NodeSet *nodes, NodeSet **anodes, NodeSet **nnodes
>>  )
>>  {
>> @@ -175,6 +363,7 @@
>>  DFA::DFA(Node *root, dfaflags_t flags): root(root)
>>  {
>>  	int i = 0;
>> +	diffcount = 0;		/* set by diff_encode */
>>  
>>  	if (flags & DFA_DUMP_PROGRESS)
>>  		fprintf(stderr, "Creating dfa:\r");
>> @@ -606,6 +795,239 @@
>>  	}
>>  }
>>  
>> +
>> +/* diff_encode helper functions */
>> +static unsigned int add_to_dag(DiffDag *dag, State *state,
>> +			       State *parent)
>> +{
>> +	unsigned int rc = 0;
>> +	if (!state->diff) {
>> +		dag->rel = NULL;
>> +		if (parent)
>> +			dag->depth = parent->diff->depth + 1;
>> +		else
>> +			dag->depth = 1;
>> +		dag->state = state;
>> +		state->diff = dag;
>> +		rc = 1;
>> +	}
>> +	if (parent && parent->diff->depth < state->diff->depth)
>> +		state->diff->parents.push_back(parent);
>> +	return rc;
>> +}
>> +
>> +static int diff_partition(State *state, Partition &part, State **candidate)
>> +{
>> +	int weight = 0;
>> +	*candidate = NULL;
>> +
>> +	for (Partition::iterator i = part.begin(); i != part.end(); i++) {
>> +		if (*i == state)
>> +			continue;
>> +
>> +		int tmp = state->diff_weight(*i);
>> +		if (tmp > weight) {
>> +			weight = tmp;
>> +			*candidate = *i;
>> +		}
>> +	}
>> +	return weight;
>> +}
>> +
>> +/**
>> + * diff_encode - compress dfa by differentially encoding state transitions
>> + * @dfa_flags: flags controling dfa creation
>> + *
>> + * This function reduces the number of transitions that need to be stored
>> + * by encoding transitions as the difference between the state and a
>> + * another transitions that is set as the states default.
>> + *
>> + * For performance reasons this function does not try to compute the
>> + * absolute best encoding (maximal spanning tree) but instead computes
>> + * a very good encoding within the following limitations.
>> + *   - Not all states have to be differentially encoded.  This allows for
>> + *     multiple states to be used as a terminating basis.
>> + *   - The number of state transitions needed to match an input of length
>> + *     m will be 2m
>> + *
>> + * To guarentee this the ordering and distance calculation is done in the
>> + * following manner.
>> + * - A DAG of the DFA is created starting with the start state(s).
>> + * - A state can only be relative (have a differential encoding) to
>> + *   another state if that state has
>> + *   - a lower depth in the DAG
>> + *   - is a sibling (same depth) that is not relative
>> + *   - is a sibling that is relative to a state with lower depth in the DAG
>> + *
>> + * The run time constraints are maintained by the DAG ordering + relative
>> + * state constraints.  For any input character C when at state S with S being
>> + * at level N in the DAG then at most 2N states must be traversed to find the
>> + * transition for C.  However on the maximal number of transitions is not m*m,
>> + * because  when a character is matched and forward movement is made through
>> + * the DFA any relative transition search will move back through the DAG order.
>> + * So say for character C we start matching on a state S that is at depth 10
>> + * in the DAG.  The transition for C is not found in S and we recurse backwards
>> + * to a depth of 6.  A transition is found and it steps to the next state, but
>> + * the state transition at most will only move 1 deeper into the DAG so for
>> + * the next state the maximum number of states traversed is 2*7.
>> + */
>> +void DFA::diff_encode(dfaflags_t flags)
>> +{
>> +	DiffDag *dag;
>> +	unsigned int xcount = 0, xweight = 0, transitions = 0, depth = 0;
>> +
>> +	/* clear the depth flag */
>> +	for (Partition::iterator i = states.begin(); i != states.end(); i++) {
>> +		(*i)->diff = NULL;
>> +		transitions += (*i)->trans.size();
>> +	}
>> +
>> +	/* Prealloc structures we need.  We know the exact number of elements,
>> +	 * and once setup they don't change so we don't need the flexibility
>> +	 * or overhead of stl, just allocate the needed data as an array
>> +	 */
>> +	dag = new DiffDag [states.size()];
>> +
>> +	/* Generate DAG ordering and parent sets */
>> +	add_to_dag(&dag[0], nonmatching, NULL);
>> +	add_to_dag(&dag[1], start, NULL);
>> +
>> +	unsigned int tail = 2;
>> +	for (unsigned int i = 1; i < tail; i++) {
>> +		State *state = dag[i].state;
>> +		State *child = dag[i].state->otherwise;
>> +		if (child)
>> +			tail += add_to_dag(&dag[tail], child, state);
>> +
>> +		for (StateTrans::iterator j = state->trans.begin(); j != state->trans.end(); j++) {
>> +			child = j->second;
>> +			tail += add_to_dag(&dag[tail], child, state);
>> +		}
>> +	}
>> +	depth = dag[tail - 1].depth;
>> +
>> +	/* calculate which state to make a transitions relative too */
>> +	for (unsigned int i = 2; i < tail; i++) {
>> +		State *state = dag[i].state;
>> +		State *candidate = NULL;
>> +
>> +		int weight = diff_partition(state,
>> +					    state->otherwise->diff->parents,
>> +					    &candidate);
>> +
>> +		for (StateTrans::iterator j = state->trans.begin(); j != state->trans.end(); j++) {
>> +			State *tmp_candidate;
>> +			int tmp = diff_partition(state,
>> +						 j->second->diff->parents,
>> +						 &tmp_candidate);
>> +			if (tmp > weight) {
>> +				weight = tmp;
>> +				candidate = tmp_candidate;
>> +			}
>> +		}
>> +
>> +		if ((flags & DFA_DUMP_DIFF_PROGRESS) && (i % 100 == 0))
>> +			cerr << "\033[2KDiff Encode: " << i << " of "
>> +			     << tail << ".  Diff states " << xcount
>> +			     << " Savings " << xweight << "\r";
>> +
>> +		state->diff->rel = candidate;
>> +		if (candidate) {
>> +			xcount++;
>> +			xweight += weight;
>> +		}
>> +	}
>> +
>> +	/* now make transitions relative, start at the back of the list so
>> +	 * as to start with the last transitions and work backwards to avoid
>> +	 * having to traverse multiple previous states (that have been made
>> +	 * relative already) to reconstruct previous state transition table
>> +	 */
>> +	unsigned int aweight = 0;
>> +	diffcount = 0;
>> +	for (int i = tail - 1; i > 1; i--) {
>> +		if (dag[i].rel) {
>> +			int weight = dag[i].state->make_relative(dag[i].rel);
>> +			aweight += weight;
>> +			diffcount++;
>> +		}
>> +	}
>> +
>> +	if (flags & DFA_DUMP_DIFF_STATS)
>> +		cerr << "Diff encode  states: " << diffcount << " of "
>> +                     << tail << " reached @ depth "  << depth << ". "
>> +		     <<  aweight << " trans removed\n";
>> +
>> +	if (xweight != aweight)
>> +		cerr << "Diff encode error: actual savings " << aweight
>> +		     << " != expected " << xweight << "\n";
>> +
>> +	if (xcount != diffcount)
>> +		cerr << "Diff encode error: actual count " << diffcount
>> +		     << " != expected " << xcount << " \n";
>> +
>> +	/* cleanup */
>> +	for (unsigned int i = 0; i < tail; i++)
>> +		dag[i].parents.clear();
>> +	delete [] dag;
> 
> Something about the dag[i].parents.clear() seems out of place; will it
> leak the 'shells' of the data structures? Can these be fiddled with so
> they'll be properly cleaned up when they go out of scope?
> 
hrmmm, we might be able to clean this up some but, its not a simple as
it seems most of the dag information is stored in a union in the state,
that never goes out of scope.

Having it in the union instead of allocating and deallocating objects
was faster, and to my C oriented mind cleaner.

>> +}
>> +
>> +/**
>> + * flatten_differential - remove differential state encoding
>> + *
>> + * Flatten the dfa back into a flat encoding.
>> + */
>> +void DFA::undiff_encode(void)
>> +{
>> +	for (Partition::iterator i = states.begin(); i != states.end(); i++)
>> +		(*i)->flatten_relative();
>> +	diffcount = 0;
>> +}
>> +
>> +void DFA::dump_diff_chain(ostream &os, map<State *, Partition> &relmap,
>> +			  Partition &chain, State *state, unsigned int &count,
>> +			  unsigned int &total, unsigned int &max)
>> +{
>> +	if (relmap[state].size() == 0) {
>> +		for (Partition::iterator i = chain.begin(); i != chain.end(); i++)
>> +			os << **i << " <- ";
>> +		os << *state << "\n";
>> +
>> +		count++;
>> +		total += chain.size() + 1;
>> +		if (chain.size() + 1 > max)
>> +			max = chain.size() + 1;
>> +	}
>> +
>> +	chain.push_back(state);
>> +	for (Partition::iterator i = relmap[state].begin(); i != relmap[state].end(); i++)
>> +		dump_diff_chain(os, relmap, chain, *i, count, total, max);
>> +	chain.pop_back();
>> +}
>> +
>> +/* Dump the DFA diff_encoding chains */
>> +void DFA::dump_diff_encode(ostream &os)
>> +{
>> +	map<State *, Partition> rel;
>> +	Partition base, chain;
>> +
>> +	for (Partition::iterator i = states.begin(); i != states.end(); i++) {
>> +		if ((*i)->flags & DiffEncodeFlag)
>> +			rel[(*i)->otherwise].push_back(*i);
>> +		else
>> +			base.push_back(*i);
>> +	}
>> +
>> +	unsigned int count = 0, total = 0, max = 0;
>> +	for (Partition::iterator i = base.begin(); i != base.end(); i++)
>> +		dump_diff_chain(os, rel, chain, *i, count, total, max);
>> +
>> +	os << base.size() << " non-differentially encoded states\n";
>> +	os << "chains: " << count - base.size() << "\n";
>> +	os << "average chain size: " << (double) (total - base.size()) / (double) (count - base.size()) << "\n";
>> +	os << "longest chain: " << max << "\n";
>> +}
>> +
>>  /**
>>   * text-dump the DFA (for debugging).
>>   */
>>
>> === modified file 'parser/libapparmor_re/hfa.h'
>> --- parser/libapparmor_re/hfa.h	2012-05-30 21:31:41 +0000
>> +++ parser/libapparmor_re/hfa.h	2013-12-10 14:10:29 +0000
>> @@ -28,10 +28,13 @@
>>  #include <map>
>>  #include <vector>
>>  
>> +#include <assert.h>
>>  #include <stdint.h>
>>  
>>  #include "expr-tree.h"
>>  
>> +#define DiffEncodeFlag 1
>> +
>>  class State;
>>  
>>  typedef map<uchar, State *> StateTrans;
>> @@ -334,6 +337,19 @@
>>  	}
>>  };
>>  
>> +/* Temporary state structure used when building differential encoding
>> + * @parents - set of states that have transitions to this state
>> + * @depth - level in the DAG
>> + * @state - back reference to state this DAG entry belongs
>> + * @rel - state that this state is relative to for differential encoding
>> + */
>> +struct DiffDag {
>> +	Partition parents;
>> +	int depth;
>> +	State *state;
>> +	State *rel;
>> +};
>> +
>>  /*
>>   * State - DFA individual state information
>>   * label: a unique label to identify the state used for pretty printing
>> @@ -352,7 +368,7 @@
>>  class State {
>>  public:
>>  	State(int l, ProtoState &n, State *other) throw(int):
>> -	label(l), perms(), trans()
>> +		label(l), flags(0), perms(), trans()
>>  	{
>>  		int error;
>>  
>> @@ -372,15 +388,30 @@
>>  	};
>>  
>>  	State *next(uchar c) {
>> -		StateTrans::iterator i = trans.find(c);
>> -		if (i != trans.end())
>> -			return i->second;
>> -		return otherwise;
>> -	};
>> +		State *state = this;
>> +		do {
>> +			StateTrans::iterator i = state->trans.find(c);
>> +			if (i != state->trans.end())
>> +				return i->second;
>> +
>> +			if (!(state->flags & DiffEncodeFlag))
>> +				return state->otherwise;
>> +			state = state->otherwise;
>> +		} while (state);
>> +
>> +		/* never reached */
>> +		assert(0);
>> +		return NULL;
>> +	}
>> +
>> +	int diff_weight(State *rel);
>> +	int make_relative(State *rel);
>> +	void flatten_relative(void);
>>  
>>  	int apply_and_clear_deny(void) { return perms.apply_and_clear_deny(); }
>>  
>>  	int label;
>> +	int flags;
>>  	perms_t perms;
>>  	StateTrans trans;
>>  	State *otherwise;
>> @@ -389,6 +420,7 @@
>>  	union {
>>  		Partition *partition;	/* used during minimization */
>>  		ProtoState proto;	/* used during creation */
>> +		DiffDag *diff;		/* used during diff encoding */
>>  	};
>>  };
>>  
>> @@ -429,12 +461,16 @@
>>  	}
>>  };
>>  
>> +
>>  /* Transitions in the DFA. */
>> -
>>  class DFA {
>>  	void dump_node_to_dfa(void);
>>  	State *add_new_state(NodeSet *nodes, State *other);
>>  	void update_state_transitions(State *state);
>> +	void dump_diff_chain(ostream &os, map<State *, Partition> &relmap,
>> +			     Partition &chain, State *state,
>> +			     unsigned int &count, unsigned int &total,
>> +			     unsigned int &max);
>>  
>>  	/* temporary values used during computations */
>>  	NodeCache anodes_cache;
>> @@ -455,11 +491,19 @@
>>  	size_t hash_trans(State *s);
>>  	void minimize(dfaflags_t flags);
>>  	int apply_and_clear_deny(void);
>> +
>> +	void diff_encode(dfaflags_t flags);
>> +	void undiff_encode(void);
>> +	void dump_diff_encode(ostream &os);
>> +
>>  	void dump(ostream &os);
>>  	void dump_dot_graph(ostream &os);
>>  	void dump_uniq_perms(const char *s);
>> +
>>  	map<uchar, uchar> equivalence_classes(dfaflags_t flags);
>>  	void apply_equivalence_classes(map<uchar, uchar> &eq);
>> +
>> +	unsigned int diffcount;
>>  	Node *root;
>>  	State *nonmatching, *start;
>>  	Partition states;
>>
>> === modified file 'parser/parser_main.c'
>> --- parser/parser_main.c	2013-10-30 00:03:23 +0000
>> +++ parser/parser_main.c	2013-12-10 14:34:13 +0000
>> @@ -194,10 +194,10 @@
>>  	  DFA_DUMP_SIMPLE_TREE },
>>  	{ 1, "stats", "Dump all compile stats",
>>  	  DFA_DUMP_TREE_STATS | DFA_DUMP_STATS | DFA_DUMP_TRANS_STATS |
>> -	  DFA_DUMP_EQUIV_STATS },
>> +	  DFA_DUMP_EQUIV_STATS | DFA_DUMP_DIFF_STATS },
>>  	{ 1, "progress", "Dump progress for all compile phases",
>>  	  DFA_DUMP_PROGRESS | DFA_DUMP_STATS | DFA_DUMP_TRANS_PROGRESS |
>> -	  DFA_DUMP_TRANS_STATS },
>> +	  DFA_DUMP_TRANS_STATS | DFA_DUMP_DIFF_PROGRESS | DFA_DUMP_DIFF_STATS },
>>  	{ 1, "dfa-progress", "Dump dfa creation as in progress",
>>  	  DFA_DUMP_PROGRESS | DFA_DUMP_STATS },
>>  	{ 1, "dfa-stats", "Dump dfa creation stats", DFA_DUMP_STATS },
>> @@ -222,6 +222,12 @@
>>  	{ 1, "equiv-stats", "Dump equivance class stats",
>>  	  DFA_DUMP_EQUIV_STATS },
>>  	{ 1, "equiv", "Dump equivance class", DFA_DUMP_EQUIV },
>> +	{ 1, "diff-encode", "Dump differential encoding",
>> +	  DFA_DUMP_DIFF_ENCODE },
>> +	{ 1, "diff-stats", "Dump differential encoding stats",
>> +	  DFA_DUMP_DIFF_STATS },
>> +	{ 1, "diff-progress", "Dump progress of differential encoding",
>> +	  DFA_DUMP_DIFF_PROGRESS | DFA_DUMP_DIFF_STATS },
>>  	{ 0, NULL, NULL, 0 },
>>  };
>>  
>> @@ -251,6 +257,8 @@
>>  	  DFA_CONTROL_TRANS_HIGH },
>>  	{ 2, "compress-fast", "do faster dfa transition table compression",
>>  	  DFA_CONTROL_TRANS_HIGH },
>> +	{ 1, "diff-encode", "Differentially encode transitions",
>> +	  DFA_CONTROL_DIFF_ENCODE },
>>  	{ 0, NULL, NULL, 0 },
>>  };
>>  
> 
> 
> Thanks!
> 
> 
>