[apparmor] towards an hfa
John Johansen
john.johansen at canonical.com
Sun Nov 21 13:38:36 GMT 2010
AppArmor currently uses a dfa for its pattern matching. This, however,
can lead to a rather large memory footprint for policy, can result in
policy that is not computable, and can't support some of the features
we want.
I have recommended moving towards a hybrid finite automaton (hfa) in
the past, and this is a rough outline of what I believe needs to be
done and the steps to get there. The steps are outlined roughly in
order, but there can be some overlap and reordering.
= Recap =
Using a dfa has some advantages:
- O(kN) matching runtime for any string of length N
- small matching algorithm, easy to verify (a sketch follows this list)
- no runtime bounds checks; tables can be computed and verified in
  userspace, and the kernel can do table consistency checks at load time
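To make the "small matching algorithm" point concrete, here is a
minimal sketch of a comb compressed table walk. Field names, types and
the start state convention are illustrative, not the actual binary
table format:

  #include <stdint.h>
  #include <stddef.h>

  struct dfa_tables {
      uint32_t *base;     /* per state offset into next/check */
      uint16_t *def;      /* per state default transition */
      uint16_t *next;     /* comb compressed transitions */
      uint16_t *check;    /* owning state for each comb slot */
      uint32_t *accept;   /* per state accept permissions */
  };

  uint32_t dfa_match(const struct dfa_tables *t, const uint8_t *str,
                     size_t len)
  {
      uint16_t state = 1;   /* illustrative start state */

      /* O(kN): constant work per input byte, no backtracking */
      for (size_t i = 0; i < len; i++) {
          uint32_t pos = t->base[state] + str[i];
          state = (t->check[pos] == state) ? t->next[pos]
                                           : t->def[state];
      }
      return t->accept[state];
  }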
The dfa has certain constraints we would like to work around:
- policy size, which is affected by
  - state count (which results in larger policy) due to state explosion
    caused by
    - counting constraints
    - repetitive subpatterns (pcre .*, [^x]*, etc.)
  - dfa storage format
    - redundancy elimination (pre format compression)
      - we reduce to the minimum states per profile, but profiles could
        share dfas
    - effectiveness of the compression routine (the current routine is
      an O(N^2) approximation of NP-complete packing)
- no internal variables
- no back reference support (a superset of internal variables)
Also, the current dfa format is not as efficient as it could be:
- compression can't take advantage of state differences
- apparmor accept states are directly encoded into the accept
  table, making them larger than they need to be and not as
  flexible as we need them to be
= Goals =
- compilation
  - faster
  - smaller peak memory footprint
- policy
  - smaller size
  - extensible permissions (allow for new permission features)
  - support for the proposed {^* } and {^** } patterns
  - in-pattern variables, eg. /proc/@{PID}/**
  - counting constraints
  - back references
- run time
  - faster average match, maintaining an O(kN) match time
  - smaller memory footprint
  - maintain load time consistency checks
  - maintain known, boundable runtime memory resources
Recommended plan of action
= Phase 1 - DFA improvements =
0. General dfa code cleanups. These should happen throughout, as
   applicable and opportune, ie. while working on a particular area of
   code, clean it up.
1. Improve current dfa generation code
1.1 expr tree improvements
1.1.1 character set merging. Reduces protoset size, and also reduces
      the number of important nodes to compare against when factoring
1.1.2 Improve factoring. The code does many more comparisons than are
      necessary. Factoring is crucial to reducing extraneous
      state generation during dfa creation.
1.2 dfa memory use reduction
    - convert to using an array of nodes instead of a set of nodes
1.3 fast bit vector search for table comb compression (a sketch
    follows this step)
    - speed up the current compression, which currently accounts for
      >50% of dfa generation time
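As an illustration of 1.3, here is a sketch of one way the bit vector
search could work; this is my own sketch, not the current parser code.
It tests a state's 256-bit transition pattern against a bitmap of
occupied comb slots a word at a time:

  #include <stdint.h>
  #include <stdbool.h>

  #define ALPHA 256

  /* Test whether a state's 256-bit transition pattern fits into the
   * free slots of the global comb starting at offset 'base'.
   * 'used' is a bitmap of occupied comb slots, used_bits long
   * (assumed to be a multiple of 64). */
  static bool fits(const uint64_t *used, size_t used_bits,
                   const uint64_t pattern[ALPHA / 64], size_t base)
  {
      for (size_t i = 0; i < ALPHA / 64; i++) {
          size_t bit = base + i * 64;
          size_t word = bit / 64, off = bit % 64;
          uint64_t window = used[word] >> off;

          if (off && word + 1 < used_bits / 64)
              window |= used[word + 1] << (64 - off);
          if (window & pattern[i])
              return false;          /* collides with an occupied slot */
      }
      return true;
  }

  /* Scan for the first base offset where the pattern fits; word-wide
   * tests replace entry-by-entry comparisons. */
  size_t find_base(const uint64_t *used, size_t used_bits,
                   const uint64_t pattern[ALPHA / 64])
  {
      for (size_t base = 0; base + ALPHA <= used_bits; base++)
          if (fits(used, used_bits, pattern, base))
              return base;
      return used_bits;              /* no fit: caller extends the comb */
  }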
2. Split the accept state encoding out into its own table, using the
   accept state as an index into the permission table. This will allow
   us to share common accept permissions and to expand the accept
   permission table without its memory footprint being directly
   affected by state count. Eg. for the test evince profile we have
   about 60 unique accept permissions replicated across 9000 states.
   Also note that to expand the permission set offered by AppArmor,
   the old mapping must be extended or augmented in some way, as it is
   currently maxed out.
   - this will allow us to drop the accept table down to 1 16-bit entry
     per state, from the current 2 32-bit entries.
     Backwards compatibility can be maintained by load time mapping.
   - this allows expanding the compressed permission entries into
     a more manageable and extensible form
   - this allows exposing internal functionality and permissions that
     can't be exposed without reworking the current mappings.
   - this allows us to get rid of the run time mapping we currently
     do and replace it with load time mapping.
   A sketch of the layout change follows step 2.3.
2.1 patch the kernel to convert the old mapping to the new internal
    mapping at load time.
2.2 patch the parser to generate dfas using the new format
2.3 patch the kernel to load the new format directly when presented.
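A rough sketch of the data layout change in step 2, using the evince
numbers from above. The field names and the expanded permission layout
are hypothetical; the real layout is still to be worked out:

  #include <stdint.h>

  /* Old layout: two 32-bit accept entries stored for every state. */
  struct old_accept {
      uint32_t accept1;
      uint32_t accept2;
  };

  /* New layout: each state stores only a 16-bit index ... */
  #define NSTATES 9000          /* evince example from above */
  #define NPERMS  60

  struct perms {                /* hypothetical expanded layout */
      uint32_t allow;
      uint32_t audit;
      uint32_t quiet;
  };

  static uint16_t accept_idx[NSTATES];     /* state -> permission index */
  static struct perms perm_table[NPERMS];  /* shared unique permissions */

  static inline const struct perms *state_perms(uint16_t state)
  {
      return &perm_table[accept_idx[state]];
  }

  /* 9000 * 8 = 72000 bytes of accept data shrinks to
   * 9000 * 2 + 60 * 12 = 18720 bytes, and perm_table can grow
   * without touching the per state cost. */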
3. Logically group tables in the kernel.
   Currently the dfa is split into distinct tables:
   accept1, accept2, default, base, next and check,
   which means doing 6 different table lookups per state (step 2 above
   gets rid of accept2, reducing this to only 5).
   The tables can be merged at load time into tables of structs:
   table1: accept1, default, base
   table2: next, check
   This will result in a performance increase for 2 reasons: there
   will be only 2 variable based indexes (the other 3 will be constant
   based), and the table entries for a state will sit together on the
   same cache line, where currently they exist on 6 different cache
   lines. A sketch of the grouping follows.
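A rough sketch of what the merged tables and the per byte step could
look like (illustrative types and names, and assuming step 2's 16-bit
accept index is already in place):

  #include <stdint.h>

  struct state_entry {      /* table1: one entry per state */
      uint16_t accept1;     /* index into the shared permission table */
      uint16_t def;         /* default transition */
      uint32_t base;        /* offset into the transition comb */
  };

  struct trans_entry {      /* table2: one entry per comb slot */
      uint16_t next;
      uint16_t check;
  };

  /* one lookup in each array per input byte, touching at most two
   * cache lines instead of five or six separate tables */
  static inline uint16_t step(const struct state_entry *t1,
                              const struct trans_entry *t2,
                              uint16_t state, uint8_t c)
  {
      uint32_t pos = t1[state].base + c;
      return (t2[pos].check == state) ? t2[pos].next : t1[state].def;
  }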
4. Dfa dominance to resolve permission collisions.
   This is needed to properly handle overlapping x permissions.
5. Support dfa logical set operations (&, |, -) and merge.
   This will allow combining computed dfas into a single dfa. A sketch
   of the construction follows.
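The set operations can be built on the classic product construction.
A bounded sketch, assuming state 0 is the start state of each dfa,
both dfas are complete (a transition for every byte), and accept
permissions are plain bit masks:

  #include <stdint.h>
  #include <string.h>

  #define ALPHA 256
  #define MAX_STATES 1024   /* small fixed bound to keep the sketch simple */

  struct dfa {
      int nstates;
      int next[MAX_STATES][ALPHA];
      uint32_t accept[MAX_STATES];       /* permission bits per state */
  };

  static int pair_id[MAX_STATES][MAX_STATES]; /* product ids, -1 = unseen */

  /* Walk reachable (a,b) state pairs; 'op' merges accept info:
   * union (a|b), intersection (a&b), or subtraction (a & ~b). */
  int dfa_combine(const struct dfa *a, const struct dfa *b,
                  struct dfa *out, uint32_t (*op)(uint32_t, uint32_t))
  {
      static int queue[MAX_STATES][2];
      int head = 0, tail = 0;

      memset(pair_id, -1, sizeof(pair_id));
      pair_id[0][0] = 0;                 /* pair of start states */
      out->nstates = 1;
      queue[tail][0] = 0; queue[tail][1] = 0; tail++;

      while (head < tail) {
          int sa = queue[head][0], sb = queue[head][1];
          int s = pair_id[sa][sb];
          head++;

          out->accept[s] = op(a->accept[sa], b->accept[sb]);
          for (int c = 0; c < ALPHA; c++) {
              int ta = a->next[sa][c], tb = b->next[sb][c];
              if (pair_id[ta][tb] < 0) {
                  if (out->nstates >= MAX_STATES)
                      return -1;         /* out of room in this sketch */
                  pair_id[ta][tb] = out->nstates++;
                  queue[tail][0] = ta; queue[tail][1] = tb; tail++;
              }
              out->next[s][c] = pair_id[ta][tb];
          }
      }
      return 0;
  }

For the deny/allow subtraction in step 6 below, op would clear the
deny side's bits, eg. op(a, b) = a & ~b.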
6. Split the rule set into multiple rule sets, and compile to multiple
   dfas. Use dfa set logic to recombine.
   The proposed {^* } and {^** } operators aren't possible without
   this. It also allows separating deny and allow rules, and doing a
   clean logical subtraction.
6.1 split deny and allow into separate rule sets
6.2 add support for {^* } and {^** }
7. Split dfa rules into logical chunks that can be reused and
   compiled individually, eg. compile an include file independently
   of any profile, and then combine it with the profile rules dfa to
   make the final dfa.
   Make all profiles being processed at the same time share the
   chunks (currently only profiles within the same file do).
8. Support combining multiple dfas into a single table.
   Currently each profile gets its own dfa, and many profiles' dfas
   are actually very similar and could be combined into a single
   shared dfa, with each profile using its own accept table (yet
   another reason to split the table off). A sketch of the sharing
   follows step 8.4.
8.1 Add dfa combine support to the compiler. The combination
    can be done at any of: pre regex, post regex, or post dfa
    pre table compression (experiment to determine what is best).
8.2 Add support to combine dfas for profiles that are being
    processed together. Currently this would be everything in
    the same profile file.
8.3 Write out policies with a shared dfa header.
8.4 extend the kernel to load such policy (it already supports the
    concept of a shared dfa, as the dfa is ref counted).
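Conceptually the sharing looks like the following. The field names
are illustrative, not the kernel's actual structs; only the ref
counting of the dfa is existing behavior:

  #include <stdint.h>

  struct shared_dfa {
      int refcount;             /* kernel already refcounts dfas */
      /* transition tables ... */
  };

  struct profile {
      struct shared_dfa *dfa;   /* shared with similar profiles */
      uint16_t *accept_idx;     /* per profile: state -> perm index */
      uint32_t *perm_table;     /* per profile permission table */
  };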
9. Combine multiple dfas into a single table with a single start state.
   While the base dfas can share a common start state, extended match
   rules will either need their own dfas, or the dfa table will need
   to support multiple start states.
   Multiple start states will reduce duplication and accounting/tracking
   overhead and keep the code simpler. A sketch follows step 9.2.
9.1 update the parser dfa combine code to allow specifying whether
    start states are combined or kept separate.
    Update the start state to hold a set of states.
    Update unreachable state elimination, dfa minimization and table
    compression to push all start states onto the work queue.
9.2 emit an optional start state parameter to binary policy (the
    kernel already supports multiple start states, so there is nothing
    to do there)
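A small sketch of what selecting among multiple start states might
look like; the names and the rule set split are hypothetical:

  #include <stdint.h>

  enum start_idx {          /* hypothetical rule set names */
      START_BASE = 0,       /* shared base rules */
      START_EXTENDED = 1,   /* an extended match rule set */
  };

  struct dfa_hdr {
      uint16_t nstarts;     /* optional parameter from 9.2 */
      uint16_t start[2];    /* one start state per rule set */
  };

  static inline uint16_t start_state(const struct dfa_hdr *h,
                                     enum start_idx which)
  {
      /* fall back to the base start for out of range requests */
      return ((uint16_t)which < h->nstarts) ? h->start[which]
                                            : h->start[0];
  }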
10. Update init scripts to pass profiles to the parser in a single
    invocation. This will allow the best combination of policy into a
    single dfa.
11. Improve state compression by using differential state encoding,
    with control for a non-failure default, computed with a maximal
    spanning tree, giving a runtime bound of O(2kN) and a compute
    bound of O(n^2).
    This will provide a nice short to mid term improvement in
    compiled policy size.
11.1 new phase after dfa minimization and before table compression:
     the policy compiler computes the maximal spanning tree and
     differentially encodes the dfa.
11.2 kernel matching routine extended to support state differential
     encoding (a sketch follows below)
If improved compression is deemed immediately important, step 11 can
be moved up, but I am currently of the mind that it should be the
last of the dfa improvements.
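To illustrate 11.2, a sketch of what the differentially encoded match
step could look like, where a state's default points at the state it
was encoded against (its spanning tree parent) rather than a failure
state. Names and layout are illustrative:

  #include <stdint.h>

  struct dstate {
      uint32_t base;        /* offset into the next/check comb */
      uint16_t def;         /* spanning tree parent, not a failure state */
      uint8_t  is_root;     /* tree root: owns all remaining transitions */
  };

  uint16_t step_diff(const struct dstate *st, const uint16_t *next,
                     const uint16_t *check, uint16_t state, uint8_t c)
  {
      uint16_t s = state;

      /* walk the default chain until some ancestor owns this input;
       * the maximal spanning tree computed in 11.1 is what keeps
       * these chains short enough for the O(2kN) bound */
      for (;;) {
          uint32_t pos = st[s].base + c;

          if (check[pos] == s)
              return next[pos];
          if (st[s].is_root)
              return 0;     /* illustrative: nonmatching/dead state */
          s = st[s].def;
      }
  }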
= Phase 2 - move to hybrid fa =
0. Research the hybrid fa (research to run parallel with other AA development)
0.1 - Finish up with the research papers
0.2 - update the dfa generation code to generate a hybrid fa
0.3 - Add a phase to handle proper splitting of the hfa into sections
0.4 - Add in recombination of tails
0.5 - settle on a compressed table format (with an eye towards future features)
0.6 - update policy (and its version) to use the new hybrid fa format
1. Merge in the base hybrid fa - this will only provide the same
   functionality as the dfa, but should reduce the number of states
   and memory use.
2. Add support for counting constraints
3. Add support for variables
4. Add support for back references
5. Support alphabet reduction - a further compression technique,
   necessary to support stride doubling.
6. Support stride doubling - a performance technique that steps the
   automaton over two input bytes at a time (a sketch of 5 and 6
   follows).
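For items 5 and 6, a sketch of alphabet reduction and the stride
doubled step it enables. The layout is illustrative; the compressed
format is still to be settled in 0.5:

  #include <stdint.h>

  struct equiv {
      uint8_t class_of[256];   /* byte -> equivalence class */
      uint16_t nclasses;       /* typically far fewer than 256 */
  };

  /* single stride step over the reduced alphabet: each state's row
   * shrinks from 256 entries to nclasses */
  static inline uint16_t step1(const uint16_t *next,
                               const struct equiv *eq,
                               uint16_t state, uint8_t c)
  {
      return next[(uint32_t)state * eq->nclasses + eq->class_of[c]];
  }

  /* double stride step: one lookup consumes two input bytes, using a
   * table over class pairs built offline by composing single steps */
  static inline uint16_t step2(const uint16_t *next2,
                               const struct equiv *eq,
                               uint16_t state, uint8_t c1, uint8_t c2)
  {
      uint32_t pair = (uint32_t)eq->class_of[c1] * eq->nclasses
                      + eq->class_of[c2];
      return next2[(uint32_t)state * eq->nclasses * eq->nclasses + pair];
  }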