[apparmor] [PATCH 09/10] Split the nodeset used in computing the dfa into two sets, accepting and non-accepting, and have the proto-state use them.

John Johansen john.johansen at canonical.com
Tue Nov 1 04:24:55 UTC 2011


On 10/31/2011 06:50 PM, Seth Arnold wrote:
>> Forgive the lengthy and meandering reply.  Hopefully it will fully
>> explain what is going on, and why certain choices were made :)
> 
> Thanks for the effort :) It all makes some amount of sense until I
> start re-reading the code again and trying to line up what you wrote
> with what I can read. It _sounds_ a little like the effort required to
> perform state minimization but that might just be me grasping at
> straws. But it sure sounds like _you_ know what's going on, which is
> reassuring. :)
> 
Hrmmm, not really.  Minimization looks at a state's transitions, and which
states it transitions to, to find functionally equivalent states.  All we are
doing here is reducing memory usage.  The previous implementation already
used sets and maps to remove the duplicates.
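
To make that concrete, here is a minimal sketch, with invented names rather
than the parser's actual types, of how a set-based cache removes duplicate
node sets: each newly built set is looked up, and if an equal set is already
cached the canonical copy is reused.

#include <cassert>
#include <set>

typedef int Node;                 /* stand-in for the parser's Node type */
typedef std::set<Node> NodeSet;

/* Return the canonical instance of @s, inserting it on first use. */
static const NodeSet *dedup(std::set<NodeSet> &cache, const NodeSet &s)
{
    return &*cache.insert(s).first;
}

int main()
{
    std::set<NodeSet> cache;
    NodeSet a, b;
    a.insert(1); a.insert(2);
    b.insert(1); b.insert(2);     /* equal content, separate object */

    const NodeSet *pa = dedup(cache, a);
    const NodeSet *pb = dedup(cache, b);
    assert(pa == pb);             /* duplicate eliminated: one copy kept */
    assert(cache.size() == 1);
    return 0;
}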

If you look at the states created pre- and post-split, they generate the
same number of states, but where the duplicate elimination happens has
been moved around some.  In fact, with the patches we have introduced
some redundancy in the elimination.  Most of the nnode eliminations match
the eliminations done on proto-states, and that is responsible for the
slowdown.  I am looking into a couple of ways to improve or fix this.

The case where we win is the accept states, because they were already
being shared in the expr tree (see below).  So instead of generating
long unique sets of nnodes with anodes that are duplicated in another
proto-state, we can remove the anode dups early.  This is what is
responsible for the slight memory reduction in the split patch, with
the rest coming from the nodevec patch.
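
As a hedged illustration of that arrangement (the names here are invented,
not the parser's actual classes): nnodes and anodes are deduplicated in
separate caches, and a proto-state is just a pair of pointers to the
canonical sets, so a widely shared accept set is stored once no matter how
many distinct nnode sets it is paired with.

#include <cassert>
#include <set>
#include <utility>

typedef std::set<int> NodeSet;

static const NodeSet *dedup(std::set<NodeSet> &cache, const NodeSet &s)
{
    return &*cache.insert(s).first;
}

/* Post-split: a proto-state is a pair of canonical set pointers. */
typedef std::pair<const NodeSet *, const NodeSet *> ProtoState;

int main()
{
    std::set<NodeSet> nnodes_cache;   /* non-accepting node sets */
    std::set<NodeSet> anodes_cache;   /* accepting node sets */

    NodeSet n1, n2, acc;
    n1.insert(10); n1.insert(11);     /* two different nnode sets ... */
    n2.insert(20);
    acc.insert(1);                    /* ... sharing one accept set */

    ProtoState p1(dedup(nnodes_cache, n1), dedup(anodes_cache, acc));
    ProtoState p2(dedup(nnodes_cache, n2), dedup(anodes_cache, acc));

    assert(p1.second == p2.second);   /* anode dup removed early */
    assert(anodes_cache.size() == 1);
    assert(nnodes_cache.size() == 2);
    return 0;
}

The win depends on there being few distinct accept-permission sets, which
the anodes cache stats below (size=55) bear out.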

* There are a couple of things we are doing in the front end of the
  DFA engine to reduce the states being generated that are similar
  to minimization, in that they reduce the number of states generated
  so minimization has less work to do.

1. From the beginning (2.10) we made the observation that accept nodes
   could be shared.  This makes the generated expr tree actually a DAG;
   all other important nodes can't be shared, as position is important (if
   you're interested, look in the Dragon book; you can borrow my copy if you
   want).

   This reduces the number of unique Nodes in the set, which makes more
   generated sets match, which means we generate fewer nodes.

   Interestingly, we are partly undoing this to track permissions better,
   but we are being very selective and making sure to put the appropriate
   optimizations (node splitting, etc.) in place before this happens, so
   the impact of the added functionality will be minimal.

2. With 2.3 we introduced simple tree factoring to eliminate as much
   redundancy as we could, e.g. (a | a) -> a (see the sketch after this
   list).

   When you dump stats this is reflected by
   expr tree: c 12263, [] 60, [^] 1074, | 2538, + 0, * 668, . 79, cat 13433
   simplified expr tree: c 6127, [] 47, [^] 351, | 774, + 0, * 206, . 4, cat 5978

   You can see that we almost halved the number of character nodes (c),
   dropped the number of character sets ([]) by some (60 -> 47), and
   significantly reduced the inverted character classes ([^]) (1074 -> 351).

   And you can do the same comparisons for
   | - alternations
   + - pcre + == (w)(w)*
   * - pcre * aka Kleene closure
   . - match any char aka []

   Doing this early elimination is a big win, and is similar to minimization;
   it just goes about it at an earlier stage, and can't catch everything.
   Unfortunately it's really inefficient in some cases and needs to be rewritten.

   However, the rewrite will improve it not just in factoring speed
   but in the number of nodes eliminated, which means smaller node sets in
   DFA construction, which will reduce memory usage and make things faster yet
   again.
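
Here is the sketch referenced in item 2: a toy example of the kind of
factoring rule the simplifier applies.  The Expr type and functions are
invented for illustration; the real pass handles many more patterns than
(a | a) -> a.

#include <cassert>

struct Expr {
    enum Type { CHAR, ALT } type;
    char c;                 /* valid for CHAR */
    Expr *left, *right;     /* valid for ALT  */
    Expr(char ch) : type(CHAR), c(ch), left(0), right(0) {}
    Expr(Expr *l, Expr *r) : type(ALT), c(0), left(l), right(r) {}
};

static bool equal(const Expr *a, const Expr *b)
{
    if (a->type != b->type)
        return false;
    if (a->type == Expr::CHAR)
        return a->c == b->c;
    return equal(a->left, b->left) && equal(a->right, b->right);
}

/* Simplify bottom-up; (x | x) collapses to x.
 * (The dropped nodes are leaked; fine for a toy.) */
static Expr *simplify(Expr *e)
{
    if (e->type == Expr::ALT) {
        e->left = simplify(e->left);
        e->right = simplify(e->right);
        if (equal(e->left, e->right))
            return e->left;         /* drop the redundant branch */
    }
    return e;
}

int main()
{
    Expr *t = new Expr(new Expr('a'), new Expr('a'));   /* (a | a) */
    Expr *s = simplify(t);
    assert(s->type == Expr::CHAR && s->c == 'a');       /* -> a */
    return 0;
}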


>> Created dfa: states 12160 proto { cache: size=12160 dups=71797 longest=324 avg=28 }, nnodes { cache: size=11841 dups=72116 longest=322 avg=22 }, anodes { cache: size=55 dups=64655 longest=20 avg=7 }
> 
> So this portion here sure sounds compelling -- is this really reducing
> a DFA of 12k states down to 55 total accepting nodes -- when the more
> "naive" construction would have resulted in 64k total accepting nodes?
> 
nope, or maybe sort of :)

We are actually generating about 9,000 accept states.  Sadly we seem not to be
providing those stats for the creation step, but minimization provides something
to look at:
  Minimized dfa: final partitions 9039 (accept 6918)	init 3234 (accept 2643)
So for the minimized version, 6918 of the 9039 states contain some accept
information.

What it does say is that we have only 55 unique sets of accept permissions for
this DFA construction.  The final unique count may be a different number due to
state merging in minimization, and you will get to see that soon.

A completely naive construction would do something worse.  Our previous
implementation wasn't completely naive, and would eliminate dups where the
nnodes and anodes together matched.
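
For contrast, a sketch of that pre-split scheme, using the same invented
types as above: with the cache keyed on the combined (nnodes, anodes) pair,
two proto-states that share an accept set but differ in their nnodes share
nothing, and each stores its own copy of the accept set.

#include <cassert>
#include <set>
#include <utility>

typedef std::set<int> NodeSet;
/* Pre-split: dedup was keyed on the (nnodes, anodes) pair as a whole. */
typedef std::pair<NodeSet, NodeSet> CombinedState;

int main()
{
    std::set<CombinedState> cache;
    NodeSet n1, n2, acc;
    n1.insert(10);
    n2.insert(20);
    acc.insert(1);

    cache.insert(CombinedState(n1, acc));
    cache.insert(CombinedState(n2, acc));

    /* Both entries are kept and each carries its own copy of the
     * accept set -- exactly the duplication the split removes. */
    assert(cache.size() == 2);
    return 0;
}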

Previously you would have just seen
  Created dfa: states 12161,	eliminated duplicates 71795,	protostate sets: longest 324, avg 28

which didn't contain information about accept node duplicates, just about the
pairing of the nnodes and anodes together.  Note: the numbers are slightly
different because of the trap state handling.

The naive implementation wouldn't even have the shared accept states and factoring,
and I can't tell you what that would result in, as I can't turn off shared accept
states (yet).  But turning off factoring (-O no-expr-simplify) results in

  Created dfa: states 39776,	eliminated duplicates 229603,	protostate sets: longest 677, avg 61

or with the split patches

  Created dfa: states 39775 proto { cache: size=39775 dups=229605 longest=677 avg=61 }, nnodes { cache: size=39246 dups=230134 longest=675 avg=54 }, anodes { cache: size=55 dups=221530 longest=20 avg=7 }
