[apparmor] towards an hfa

John Johansen john.johansen at canonical.com
Sun Nov 21 13:38:36 GMT 2010


AppArmor currently uses a dfa for its pattern matching.  This however
can lead to a rather large memory footprint for policy, can result in
policy that is not computable, and can't support some of the features
we want.

I have recommended moving towards a hybrid finite automaton (hfa) in
the past, and this is a rough outline of what I believe needs to be
done and the steps to get there.  The steps are outlined roughly in
order, but there can be some overlap and reordering.

= Recap =
Using a dfa has some advantages:
- O(kN) matching runtime for any string of length N
- small matching algorithm, easy to verify
- no runtime bound checks, tables can be computed and verified in
  userspace, and kernel can do table consistency checks at load time.
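
The O(kN) bound comes from the match loop doing a constant amount of
work per input byte.  A minimal sketch in Python (table layout
hypothetical, not the actual kernel format):

```python
def dfa_match(dfa, data):
    """Table-driven dfa walk: one constant-time transition per input
    symbol, giving O(kN) time for a string of length N."""
    state = dfa["start"]
    for ch in data:
        # missing entries transition to the (non-accepting) dead state
        state = dfa["next"][state].get(ch, dfa["dead"])
    return dfa["accept"].get(state, 0)   # permission bits, 0 = no match

# toy dfa accepting exactly "ab" with permission bits 0x4
TOY = {
    "start": 0,
    "dead": 3,
    "next": {0: {"a": 1}, 1: {"b": 2}, 2: {}, 3: {}},
    "accept": {2: 0x4},
}
```

Because the loop body is branch-free table indexing, runtime memory
use is constant and bounded regardless of the pattern's complexity.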

The dfa has certain constraints we would like to work around
- policy size, which is affected by
  - state count (which results in larger policy) due to state explosion
    caused by
    - counting constraints
    - repetitive subpatterns (pcre .* [^x]* etc)
  - dfa storage format
  - redundancy elimination (pre format compression)
    - we reduce to minimum states per profile but profiles could share
      dfas
  - effectiveness of the compression routine (the current routine is
    an O(N²) approximation of NP-complete packing)
- no internal variables
- no back reference support (super set of internal variables)

Also the current dfa format is not as efficient as it could be
- compression can't take advantage of state differences
- apparmor accept states are directly encoded into the accept
  table, making them larger than they need to be and not as
  flexible as we need them to be.


= Goals =
- compilation
  - faster
  - smaller peak memory foot print

- policy
  - smaller size
  - extensible permissions (allow for new permission features)
  - support proposed patterns {^* } and {^** }
  - in-pattern variables, e.g. /proc/@{PID}/**
  - counting constraints 
  - back references

- run time
  - faster average match, maintaining an O(kN) match time
  - smaller memory foot print
  - maintain load time consistency checks
  - maintain known, bounded runtime memory resources


= Recommended plan of action =

= Phase 1 - DFA improvements =
0. General dfa code cleanups, should happen throughout as is
   applicable and opportune, i.e. while working on a particular area
   of code, clean it up.

1. Improve current dfa generation code
   1.1 expr tree improvements
   1.1.1 character set merging. Reduces protoset size, and also
         reduces the number of important nodes to compare against
         when factoring
   1.1.2 Improve factoring.  Code does a lot more comparisons than is
         necessary.  Factoring is crucial to reducing extraneous
         state generation during dfa creation.
   1.2 dfa memory use reduction
       - convert to using array of nodes instead of set of nodes
   1.3 fast bit vector search for table comb compression
       - speed up current compression, currently >50% of dfa generation
         time
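
For reference, the comb compression being sped up here overlays each
state's sparse transition row into shared next/check arrays.  A
first-fit sketch (simplified; the real packing is the O(N²)
approximation of NP-complete packing mentioned above):

```python
def comb_pack(rows):
    """First-fit comb compression: place each state's sparse
    transition row into shared next/check arrays at the first offset
    where its occupied columns don't collide with earlier rows,
    recording a per-state base offset."""
    base, next_, check = [], [], []
    for state, row in enumerate(rows):
        offset = 0
        while not all(offset + c >= len(next_) or check[offset + c] is None
                      for c in row):
            offset += 1
        base.append(offset)
        for c, target in row.items():
            while len(next_) <= offset + c:
                next_.append(0)
                check.append(None)
            next_[offset + c] = target
            check[offset + c] = state

    def lookup(state, c, default=0):
        """check tells us whether the overlaid cell belongs to state."""
        pos = base[state] + c
        if pos < len(next_) and check[pos] == state:
            return next_[pos]
        return default

    return base, next_, check, lookup
```

The naive collision scan above is what a fast bit vector search would
replace: testing a whole row of candidate columns per machine word
instead of one column at a time.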

2. Split the accept state encoding out into its own table, using the
   accept state as an index to the permission table.  This will allow
   us to share common accept permissions and expand the accept
   permission table without its memory foot print being directly
   affected by state count.  Eg. for the test evince profile we have
   about 60 unique accept permissions replicated across 9000 states.
   Also note that to expand the permission set offered by AppArmor,
   the old mapping must be extended or augmented in some way, as it
   is currently maxed out.
   - this will allow us to drop the accept table down to a single
     16 bit entry per state, from the current two 32 bit entries.
     Backwards compatibility can be maintained by load time mapping.
   - this allows compressing permission entries into a more
     manageable and expandable form
     a more manageable and expandable form
   - this allows exposing internal functionality and permissions that
     can't be done without reworking the current mappings.
   - this allows us to get rid of the run time mapping we currently
     do and replace it with load time mapping.
   2.1 patch kernel to convert old mapping to new internal mappings
     at load time.
   2.2 patch parser to generate dfas using new format
   2.3 patch kernel to load new format directly when presented.
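
A sketch of the indirection, using the evince numbers above: roughly
9000 per-state permission entries collapse to about 60 shared table
entries (helper name hypothetical):

```python
def build_perm_table(state_perms):
    """Deduplicate per-state accept permissions: each state keeps only
    a small index into a shared permission table, instead of carrying
    its own full permission words."""
    perm_table, index, accept = [], {}, []
    for perms in state_perms:
        if perms not in index:
            index[perms] = len(perm_table)
            perm_table.append(perms)
        accept.append(index[perms])   # fits in 16 bits per state
    return perm_table, accept
```

New permission bits then only grow the shared table entries, not
every state, which is what makes the permission set expandable.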

3. Logically group tables in the kernel
   Currently the dfa is split into distinct tables
      accept1, accept2, default, base, next and check
   which means doing 6 different table lookups per state (step 2
   above gets rid of accept2, reducing this to only 5)

   The tables can be merged at load time into tables of structs.
   table1: accept1, default, base
   table2: next, check

   This will result in a performance increase for 2 reasons: there
   will be only 2 variable based indexes (the other 3 will be
   constant based), and the table entries will be moved together
   onto the same cache line, where currently they exist on 6
   different cache lines.
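
A sketch of the load-time merge, assuming the flat tables described
above (with accept2 already folded away by step 2):

```python
def merge_tables(accept, default, base, next_, check):
    """Group the per-state and per-transition tables into arrays of
    records, so a state lookup touches adjacent memory rather than
    several separate tables (and cache lines)."""
    table1 = list(zip(accept, default, base))  # one record per state
    table2 = list(zip(next_, check))           # one record per transition cell
    return table1, table2
```

In the kernel this would be arrays of structs rather than tuples, but
the locality argument is the same: one index into table1 fetches a
state's accept, default and base together.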

4. Dfa dominance to resolve permission collisions
   This is needed to properly handle overlapping x permissions

5. support dfa logical set operations & | - merge
   This will allow combining computed dfas into a single dfa
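
The set operations can be done with the classic product construction:
run two dfas in lockstep and combine their accept information per
product state.  A sketch (intersection, union and subtraction differ
only in the combining function; table layout hypothetical):

```python
from collections import deque

def dfa_product(a, b, op):
    """Product construction: each product state is a (state_a,
    state_b) pair, explored breadth first; `op` combines the two
    accept flags (and, or, and-not)."""
    start = (a["start"], b["start"])
    next_, accept = {}, {}
    queue, seen = deque([start]), {start}
    while queue:
        sa, sb = queue.popleft()
        accept[(sa, sb)] = op(a["accept"].get(sa, False),
                              b["accept"].get(sb, False))
        next_[(sa, sb)] = {}
        for ch in set(a["next"][sa]) | set(b["next"][sb]):
            t = (a["next"][sa].get(ch, a["dead"]),
                 b["next"][sb].get(ch, b["dead"]))
            next_[(sa, sb)][ch] = t
            if t not in seen:
                seen.add(t)
                queue.append(t)
    return {"start": start, "dead": (a["dead"], b["dead"]),
            "next": next_, "accept": accept}

def matches(dfa, data):
    state = dfa["start"]
    for ch in data:
        state = dfa["next"].get(state, {}).get(ch, dfa["dead"])
    return dfa["accept"].get(state, False)

# toy dfas: ALLOW accepts "a" and "ab", DENY accepts "ab";
# subtraction (and-not) yields a dfa accepting only "a"
ALLOW = {"start": 0, "dead": 9,
         "next": {0: {"a": 1}, 1: {"b": 2}, 2: {}, 9: {}},
         "accept": {1: True, 2: True}}
DENY = {"start": 0, "dead": 9,
        "next": {0: {"a": 1}, 1: {"b": 2}, 2: {}, 9: {}},
        "accept": {2: True}}
SUB = dfa_product(ALLOW, DENY, lambda x, y: x and not y)
```

The and-not combiner is the clean logical subtraction of deny rules
from allow rules mentioned in step 6.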

6. split rule set into multiple rule sets, and compile to multiple
   dfas.  Use dfa set logic to recombine.

   The proposed {^* } and {^** } operators aren't possible without
   this.  It also allows separating deny and allow rules, and doing
   a clean logical subtraction.
   6.1 split deny and allow into separate rule sets
   6.2 add support for {^* }, and {^**}
 
7. Split dfa rules into logical chunks that can be reused and
   compiled individually, e.g. compile an include file independently
   of the profile, and then combine it with the profile rules dfa to
   make the final dfa.

   Make all profiles being processed at the same time share the
   chunks (currently only profiles within the same file do).

8. Support combining multiple dfas into a single table.
   Currently each profile gets its own dfa, and many profiles' dfas
   are actually very similar and could be combined into a single
   shared dfa, with each profile using its own accept table (yet
   another reason to split the table off).

   8.1 Add dfa combine support to the compiler.  The combination
       can be done at any of: pre regex, post regex, or post dfa
       pre table compression (experiment to determine what is best).
   8.2 Add support to combine dfas for profiles that are being
       processed together.  Currently this would be everything in
       the same profile file.
   8.3 Write out policies with a shared dfa header.
   8.4 extend kernel to load policy (it already supports the concept
       of shared dfa, as the dfa is ref counted).

9. Combine multiple dfas into a single table, with a single start
   state where possible.  While the base dfas can share a common
   start state, extended match rules will either need their own
   dfas, or the dfa table will need to allow multiple start states.

  multiple start states will reduce duplication and accounting/tracking
  overhead and keep the code simpler.

  9.1 update parser dfa combine code to allow specifying whether to
      combine start states or keep them separate.
      update the start state to hold a set of states
      update unreachable state elimination, dfa minimization and
      table compression to push all start states onto the queue.
  9.2 emit optional start state parameter to binary policy (kernel
      already supports multiple start states so nothing to do here)

10. Update init scripts to pass profiles to the parser in a single
   invocation.  This will allow best combination of policy into a
   single dfa.

11. Improve state compression by using differential state encoding,
    with control for a non-failure default, computed with a maximal
    spanning tree, giving a runtime bound of O(2kN) and a compute
    bound of O(n²).

   This will provide a nice short term to mid term improvement on
   compiled policy size.
    11.1 add a new phase after dfa minimization and before table
         compression: the policy compiler computes a maximal
         spanning tree and differentially encodes the dfa.
   11.2 kernel matching routine extended to support state
        differential encoding
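
A sketch of the lookup side, assuming each non-root state has a
default (parent) state chosen by the spanning tree and stores only
the transitions that differ from it (encoding hypothetical):

```python
def diff_lookup(dfa, state, ch):
    """Differentially encoded transition lookup: if a state doesn't
    store a transition for `ch`, defer to its default (parent) state.
    The spanning tree chooses parents to maximize shared transitions,
    so most states store only a few explicit entries."""
    while ch not in dfa["trans"][state]:
        state = dfa["parent"][state]   # the root stores a full row
    return dfa["trans"][state][ch]

# toy encoding: state 1 differs from root state 0 only on "a"
DIFF = {
    "parent": {1: 0},
    "trans": {0: {"a": 0, "b": 0}, 1: {"a": 2}},
}
```

The kernel-side matching change in 11.2 is essentially this fallback
loop added to the existing transition lookup.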


If improved compression is deemed immediately important, step 11 can
be moved up, but I am currently of the mind that it should be the
last of the dfa improvements.

= Phase 2 - move to hybrid fa =

0. Research hybrid fa (research to run parallel with other AA development)
   0.1 - Finish up with research papers
   0.2 - update dfa generation code to generate hybrid fa
   0.3 - Add phase to handle proper splitting of hfa into sections
   0.4 - Add in recombination of tails
   0.5 - settle on compressed table format (with eye towards future features)
   0.6 - update policy (and its version) to use new hybrid fa format
1. Merge in base hybrid fa - this will only provide the same
   functionality as the dfa, but should reduce the number of states
   and memory use.
2. Add support for counting constraints
3. Add support for variables
4. Add support for back references
5. support alphabet reduction - a further compression technique,
   necessary to support stride doubling.
6. support stride doubling - a performance technique that allows
   consuming two input symbols per table transition.
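
To illustrate why items 5 and 6 go together: stride doubling
precomputes two-symbol transitions, which squares the effective
alphabet, hence the need for alphabet reduction first.  A sketch
over sparse toy tables (a real implementation must also handle
odd-length input and dense rows):

```python
def stride_double(next_):
    """Precompute two-symbol transitions: doubled[s][c1 + c2] is the
    state reached from s after consuming c1 then c2, halving the
    number of table lookups in the match loop."""
    doubled = {}
    for s, row in next_.items():
        doubled[s] = {}
        for c1, mid in row.items():
            for c2, target in next_.get(mid, {}).items():
                doubled[s][c1 + c2] = target
    return doubled
```

Each doubled row indexes by a pair of symbols, so the table width
grows from |alphabet| to |alphabet|² unless equivalence classes from
alphabet reduction shrink it first.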



