[apparmor] [PATCH] parser - more regex unittests and fixes (was Re: [PATCH] [parsers] allow for nested alternations expressions)
John Johansen
john.johansen at canonical.com
Thu Nov 7 19:35:53 UTC 2013
On 11/07/2013 10:02 AM, Steve Beattie wrote:
> On Wed, Nov 06, 2013 at 04:38:09PM -0800, John Johansen wrote:
>> On 11/06/2013 04:19 PM, Steve Beattie wrote:
>>> On Mon, Nov 04, 2013 at 05:42:29PM -0800, Seth Arnold wrote:
>>>> I suspect there is more work to be done in this block of code; '[' may
>>>> need a corresponding if (incharclass == 1) test, unless this is supposed
>>>> to work: [[] /* character class with a [ inside the class */
>>>
>> Err, that is perfectly valid pcre: [ in a character class is only special if
>> it's part of a POSIX character class.
>>
>> http://perldoc.perl.org/perlrecharclass.html#Bracketed-Character-Classes
>> (see the "Special Characters Inside a Bracketed Character Class" subsection
>> of "Bracketed Character Classes")
>>
>> and ] has the exception that it isn't special if it's the first character
>> to match within the character class (a leading ^ doesn't count as the first
>> character to match, because it indicates inverting the class),
>> i.e.
>> []]
>> [^]]
>> are valid
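The two rules above can be captured in a small scanner that finds the ] closing a bracketed class. This is a hypothetical sketch, not the AppArmor parser's code: `find_class_end` is an invented name, and it simplifies pcre's POSIX check by accepting any alphabetic name between [: and :].

```python
def find_class_end(pattern: str, start: int) -> int:
    """Return the index of the ']' that closes the class opened at pattern[start].

    Hypothetical helper following the pcre rules quoted above; raises
    ValueError if the class is never closed.
    """
    if pattern[start] != "[":
        raise ValueError("start must point at '['")
    i = start + 1
    # A leading ^ inverts the class; it is not the first character to match.
    if i < len(pattern) and pattern[i] == "^":
        i += 1
    # A ] immediately after [ or [^ is a literal member of the class.
    if i < len(pattern) and pattern[i] == "]":
        i += 1
    while i < len(pattern):
        if pattern[i] == "\\":
            i += 2  # an escaped character, e.g. \], is always literal
            continue
        if pattern.startswith("[:", i):
            # [ is only special when it opens a POSIX class such as [:digit:]
            # or [:^digit:]; this sketch accepts any alphabetic name.
            end = pattern.find(":]", i + 2)
            name = pattern[i + 2:end] if end != -1 else ""
            bare = name[1:] if name.startswith("^") else name
            if bare.isalpha():
                i = end + 2
                continue
        if pattern[i] == "]":
            return i
        i += 1  # anything else, including a lone [, is an ordinary member
    raise ValueError("unterminated character class")


# Usage: each pattern below is a single class ending at its last character.
for p in ["[[]", "[]]", "[^]]", "[[:digit:]x]"]:
    assert find_class_end(p, 0) == len(p) - 1
```

With this shape, the `incharclass` question from the original patch reduces to "is this [ the start of [:name:]?"; a bare [ needs no special case at all.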
>
> Okay, patch withdrawn from consideration. How much of pcre's character
> classes are you hoping to support? Because there's a bunch of it
> that we don't support (POSIX character classes, etc.)...
>
>
Good question. I'm not really sure. I know I don't want to support all of it,
but I would like to have more than we have now. I think the set available in
aare globbing should be smaller than the set we make available in the
pcre syntax.
e.g.
@{var} is the variable expansion in aare, but for the pcre syntax I was
considering using \@{var}
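To make the difference concrete, here is a hypothetical sketch of an expander in which aare text expands @{var} while pcre-style text only expands \@{var}, so a bare @ stays an ordinary pattern character. The function name and the one-value-per-variable dict are assumptions for illustration; real AppArmor variables can hold several values.

```python
import re

# Hypothetical reference forms: aare uses @{name}, the proposed pcre syntax
# uses \@{name} so that a bare @ stays an ordinary pattern character.
AARE_VAR = re.compile(r"@\{(\w+)\}")
PCRE_VAR = re.compile(r"\\@\{(\w+)\}")


def expand_vars(text: str, variables: dict, pcre_mode: bool) -> str:
    """Substitute each variable reference with its value (one value per variable)."""
    pattern = PCRE_VAR if pcre_mode else AARE_VAR
    # A function replacement is inserted verbatim, so values may contain backslashes.
    return pattern.sub(lambda m: variables[m.group(1)], text)


home = {"HOME": "/home/*"}
print(expand_vars("@{HOME}/.ssh/", home, pcre_mode=False))   # -> /home/*/.ssh/
print(expand_vars(r"\@{HOME}/.ssh/", home, pcre_mode=True))  # -> /home/*/.ssh/
print(expand_vars("user@{HOME}", home, pcre_mode=True))      # -> user@{HOME}
```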
On the pcre side I know I want positive and negative lookaheads and
back references (though not with pcre's confusing \# syntax; it might
use \g# instead). I'm not sure it makes sense to expose these to
aare.
I think it would be nice to support some of the POSIX character classes,
and maybe \d and \D.
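One way to support a subset of these without teaching the backend anything new is to rewrite them into plain bracket expressions before the DFA is built. A minimal sketch, assuming ASCII meanings, a hypothetical table with only a few POSIX names, and that \d and \D never appear inside another bracket expression:

```python
# Hypothetical rewrite tables; ASCII meanings only and just a few POSIX names.
POSIX_CLASSES = {
    "digit": "0-9",
    "alpha": "a-zA-Z",
    "alnum": "a-zA-Z0-9",
    "lower": "a-z",
    "upper": "A-Z",
    "xdigit": "0-9a-fA-F",
}
ESCAPES = {"\\d": "[0-9]", "\\D": "[^0-9]"}


def rewrite_classes(pattern: str) -> str:
    """Rewrite \\d, \\D and whole [[:name:]] / [[:^name:]] classes as plain ranges.

    Assumes \\d and \\D do not appear inside another bracket expression.
    """
    out = []
    i = 0
    while i < len(pattern):
        if pattern[i] == "\\":
            pair = pattern[i:i + 2]
            out.append(ESCAPES.get(pair, pair))  # other escapes pass through
            i += 2
            continue
        if pattern.startswith("[[:", i):
            end = pattern.find(":]]", i + 3)
            name = pattern[i + 3:end] if end != -1 else ""
            negate = name.startswith("^")
            ranges = POSIX_CLASSES.get(name[1:] if negate else name)
            if ranges is not None:
                out.append(("[^" if negate else "[") + ranges + "]")
                i = end + 3
                continue
        out.append(pattern[i])  # unknown names and ordinary text are kept
        i += 1
    return "".join(out)


print(rewrite_classes(r"/dev/sd[[:lower:]]\d"))  # -> /dev/sd[a-z][0-9]
```

Handling escapes as pairs means an escaped backslash followed by d (`\\d`) is left alone rather than misread as `\d`.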
The big ones I want are a way to escape into pcre syntax and back to aare,
and support for permission embedding, which would save a fair bit of
duplication and extra state creation (and then removal) on the backend.
E.g.
for mount, instead of having to provide 5 rules
part1 <perm>
part1\0part2 <perm>
part1\0part2\0part3 <perm>
part1\0part2\0part3\0part4 <perm>
part1\0part2\0part3\0part4\0part5 <perm>
we could get away with encoding a single rule
part1\<perm>\0part2\<perm>\0part3\<perm>\0part4\<perm>\0part5\<perm>
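The duplication is easy to see if both forms are generated. A sketch using the placeholder part names from the rules above, with the \0 separator and the proposed \<perm> marker written as literal text exactly as they appear there; the function names are hypothetical:

```python
def prefix_rules(parts, perm):
    """Current approach: one rule per prefix, each ending in the permission."""
    return ["\\0".join(parts[:n]) + " " + perm for n in range(1, len(parts) + 1)]


def embedded_rule(parts):
    """Proposed approach: one rule with \\<perm> embedded after every part."""
    return "\\0".join(part + "\\<perm>" for part in parts)


parts = ["part1", "part2", "part3", "part4", "part5"]
for rule in prefix_rules(parts, "<perm>"):
    print(rule)
print(embedded_rule(parts))
```

The five prefix rules spell out 15 parts between them (5 + 4 + 3 + 2 + 1), while the embedded rule spells out 5; every repeated prefix is state the backend builds and then has to remove again.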
I think there are 2 questions to answer: what set should we provide
for the pcre-style syntax, and what subset of it for aare?
Below are some notes I have from the last time I was looking at it
(not that they will really clear things up any).
---
\@{variable} variable reference
\^ ?start regex
\$ ?end regex (return to globbing)
\#{perm} ?embedded perm
\- ?logical set operation minus?
\& ?logical set operation and?
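If \^ and \$ worked as noted, a rule would become a sequence of aare and pcre segments. A hypothetical sketch of that split: it only recognizes the two switch markers, copies every other backslash pair through untouched, and does not reflect any existing parser behaviour.

```python
def split_modes(rule: str):
    """Split a rule into ("aare", text) and ("pcre", text) segments.

    Hypothetical: \\^ switches to pcre syntax and \\$ returns to aare
    globbing. Every other backslash pair is copied through untouched.
    """
    segments = []
    mode, buf = "aare", []
    i = 0
    while i < len(rule):
        pair = rule[i:i + 2]
        if (mode, pair) in (("aare", "\\^"), ("pcre", "\\$")):
            if buf:
                segments.append((mode, "".join(buf)))
            mode, buf = ("pcre" if mode == "aare" else "aare"), []
            i += 2
        elif rule[i] == "\\":
            buf.append(pair)  # keep escapes such as \d together in the segment
            i += 2
        else:
            buf.append(rule[i])
            i += 1
    if buf:
        segments.append((mode, "".join(buf)))
    return segments


print(split_modes(r"/dev/\^sd[a-z]\d+\$/**"))
```

Each segment could then go to its own front end (glob conversion or pcre parsing) before the pieces are concatenated into one expression.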
see man pcrepattern
\ general escape character
^ assert start of string
$ assert end of string
. match any char including newline (pcre only does this in DOTALL mode)
[] character class
[^] negative character class
[x-y] range
[[:xxx:]] POSIX named set
[[:^xxx:]] negative POSIX named set
() subpattern
(?) extended meaning for subpattern
| alternation
? 0 or 1 match, greedy, equiv to {0,1}
+ 1 or more, greedy, equiv to {1,}
* 0 or more, greedy, equiv to {0,}
{n} min/max quantifier exactly n
{,n} min/max quantifier up to n (pcre itself treats {,n} as literal text)
{n,m} min/max quantifier at least n, no more than m, greedy
{n,} min/max quantifier n or more, greedy
\a alarm - hex 07
\e escape - hex 1B
\f formfeed - hex 0C
\n newline - hex 0A
\r carriage return - hex 0D
\t tab - hex 09
\ddd octal code
\xhh hex code
\cx control-x where x is any ascii character
. any character including newline (pcre only does this in DOTALL mode)
\b backspace (only inside a character class; elsewhere it is a word-boundary assertion)
\d decimal digit [0-9]
\D not decimal digit [^0-9]
\h horizontal whitespace character
\H not horizontal whitespace character
\N not a newline
\s white space character
\S not a white space character
\v vertical whitespace character
\V not a vertical whitespace character
\w a "word" character
\W not a "word" character
\l lower case next character
\L lower case until \E
\u upper case next character
\U upper case until \E
\p property
\P not property
\R Unicode newline sequence
(?= ) look ahead assertion
(?! ) negative look ahead assertion
(?<= ) look behind assertion
(?<! ) negative look behind assertion
(?(conditional)yes-pattern)
(?(conditional)yes-pattern|no-pattern)
({ } ) callout to fn
\p and \P reserved
NOTE: \n (backslash followed by a number) can NOT be used as a back reference
\gn back reference by number
\g{n} back reference by number
\g{-n} relative back reference by number
\k<name> back reference by name
\k'name' back reference by name
\g{name} back reference by name
\k{name} back reference by name (.Net)
(?P=name) back reference by name (Python)
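The counted quantifiers in the notes ({n}, {n,m}, {n,}) would not need new backend support if the parser expanded them first, since each matches the same strings as a combination of plain repetition, ? and *. A minimal sketch, assuming the quantified atom is already a self-contained string (a single character or a parenthesized group); `expand_counted` is a hypothetical name:

```python
from typing import Optional


def expand_counted(atom: str, n: int, m: Optional[int]) -> str:
    """Rewrite a counted quantifier using only concatenation, ? and *.

    atom{n}   -> expand_counted(atom, n, n)
    atom{n,m} -> expand_counted(atom, n, m)
    atom{n,}  -> expand_counted(atom, n, None)
    """
    if n < 0 or (m is not None and m < n):
        raise ValueError("invalid repetition bounds")
    required = atom * n  # the n mandatory copies
    if m is None:
        return required + atom + "*"  # then any number of extra copies
    return required + (atom + "?") * (m - n)  # then up to m - n optional copies


print(expand_counted("a", 3, 3))        # -> aaa
print(expand_counted("a", 1, 3))        # -> aa?a?
print(expand_counted("(ab)", 2, None))  # -> (ab)(ab)(ab)*
```

Forms like a?a? are ambiguous for a backtracking matcher, but a DFA only cares about the set of strings matched, so the flat expansion is enough there.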