[apparmor] [PATCH] parser - more regex unittests and fixes (was Re: [PATCH] [parsers] allow for nested alternations expressions)

Thu Nov 7 19:35:53 UTC 2013

On 11/07/2013 10:02 AM, Steve Beattie wrote:
> On Wed, Nov 06, 2013 at 04:38:09PM -0800, John Johansen wrote:
>> On 11/06/2013 04:19 PM, Steve Beattie wrote:
>>> On Mon, Nov 04, 2013 at 05:42:29PM -0800, Seth Arnold wrote:
>>>> I suspect there is more work to be done in this block of code; '[' may
>>>> need a corresponding if (incharclass == 1) test, unless this is supposed
>>>> to work: [[]  /* character class with a [ inside the class */
>>>
>> err that is perfectly valid pcre, [ in a character class is only special if
>> it's part of a posix character class.
>>
>> http://perldoc.perl.org/perlrecharclass.html#Bracketed-Character-Classes
>> Special Characters Inside a Bracketed Character Class subsection of
>> Bracketed Character Classes
>>
>> and ] has the exception that it isn't special if it's the first character
>> within the character set to match (where a leading ^ isn't the first character
>> to match because it indicates inverting the class)
>>   ie.
>>    []]
>>    [^]]
>> are valid
> 
> Okay, patch withdrawn for consideration. How much of pcre's character
> classes are you hoping to support?  Because there's a bunch of it
> that we don't (POSIX character classes, etc.)...
> 
> 
Good question. I'm not really sure. I know I don't want to support all of it
but I would like to have more than we have. I think the set available in
aare globbing should be smaller than the set we make available in the
pcre syntax.

eg.
@{var} is the variable expansion in aare, but for the pcre syntax I was
considering using \@{var}
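
A rough sketch of how one expansion pass could tell the two spellings
apart. Everything here is made up for illustration: the function name,
the pcre_mode flag, and the choice of {a,b} versus (a|b) for variables
with several values are assumptions, not what the parser does today.

```python
import re

def expand_vars(text, variables, pcre_mode=False):
    # aare mode:  @{var}   -> value, or {v1,v2} for several values
    # pcre mode:  \@{var}  -> value, or (v1|v2) for several values
    if pcre_mode:
        pattern = r'\\@\{(\w+)\}'       # reference must carry the backslash
    else:
        pattern = r'(?<!\\)@\{(\w+)\}'  # a backslash-escaped @ is left alone

    def substitute(match):
        values = variables[match.group(1)]  # KeyError if undefined
        if len(values) == 1:
            return values[0]
        if pcre_mode:
            return '(' + '|'.join(values) + ')'
        return '{' + ','.join(values) + '}'

    return re.sub(pattern, substitute, text)

variables = {'BIN': ['/bin/', '/usr/bin/']}
print(expand_vars('@{BIN}ls', variables))
print(expand_vars(r'\@{BIN}ls', variables, pcre_mode=True))
```

The only point is that the same variable table can feed both syntaxes,
with the reference spelling and the alternation spelling switching
together.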

I know on the pcre side I want positive and negative lookaheads and
back references (though it would not use the confusing \# syntax of
pcre, but might use \g#). I'm not sure it makes sense to expose these to
aare.

I think it would be nice to support some of the posix character classes
and maybe \d \D.

The big ones I want are a way to escape into pcre syntax and back to aare,
and accept permission embedding, which saves a fair bit of duplication and
extra state creation (and then removal) on the backend.
Eg.
for mount instead of having to provide 5 rules
part1 <perm>
part1\0part2 <perm>
part1\0part2\0part3 <perm>
part1\0part2\0part3\0part4 <perm>
part1\0part2\0part3\0part4\0part5 <perm>

we could get away with encoding a single rule
part1\<perm>\0part2\<perm>\0part3\<perm>\0part4\<perm>\0part5\<perm>
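
To spell out the equivalence, here is a small Python sketch that builds
both forms from the same list of fields. The helper names are
hypothetical and the strings are just the placeholders used above; this
only illustrates the encoding, not how the backend would consume it.

```python
def prefix_rules(parts, perm):
    """Current style: one rule per leading prefix of the fields."""
    return ['\\0'.join(parts[:i]) + ' ' + perm
            for i in range(1, len(parts) + 1)]

def embedded_rule(parts, perm):
    """Proposed style: one rule with the perm embedded after each field."""
    return '\\0'.join(part + '\\' + perm for part in parts)

parts = ['part1', 'part2', 'part3', 'part4', 'part5']
for rule in prefix_rules(parts, '<perm>'):
    print(rule)
print(embedded_rule(parts, '<perm>'))
```

The first loop prints the five rules listed above, and the last line
prints the single embedded rule.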

I think there are 2 questions to answer: what set should we provide
for the pcre style syntax, and what subset for aare?

Below are some notes I have from the last time I was looking at it
(not that they will really clear things up any)

---

\@{variable}  variable reference
\^	?start regex
\$	?end regex (return to globbing)
\#{perm}    ?embedded perm
\-	?logical set operation minus?
\&	?logical set operation and?

see man pcrepattern

\	general escape character
^	assert start of string
$	assert end of string
.	match any char including newline
[]	character class
[^]	negative character class
[x-y]  range
[[:xxx:]]	POSIX named set
[[:^xxx:]]	negative POSIX named set
()	subpattern
(?)	extended meaning for subpattern
|	alternation
?	0 or 1 match, greedy, equiv to {0,1}
+	1 or more, greedy, equiv to {1,}
*	0 or more, greedy, equiv to {0,}
{n}	min/max quantifier, exactly n
{,n}	min/max quantifier, up to n
{n,m}	min/max quantifier, at least n, no more than m, greedy
{n,}	min/max quantifier, n or more, greedy
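
For the [] and [^] entries, the bracket rules from the top of this
thread ([ is literal inside a class unless it starts a POSIX named set,
and ] is literal when it is the first member, after an optional ^) can
be sketched like this in Python. The function name and the error
handling are assumptions for illustration; this is not the parser's
code.

```python
def char_class_end(pattern, start):
    # Return the index of the ']' that closes the class opened at
    # pattern[start], which must be '['.
    assert pattern[start] == '['
    i = start + 1
    if i < len(pattern) and pattern[i] == '^':
        i += 1      # a leading ^ inverts the class, it is not a member
    if i < len(pattern) and pattern[i] == ']':
        i += 1      # ] as the first member is a literal
    while i < len(pattern):
        c = pattern[i]
        if c == '\\':
            i += 2  # skip the escaped character
            continue
        if c == '[' and pattern[i + 1:i + 2] == ':':
            close = pattern.find(':]', i + 2)
            if close != -1:
                i = close + 2   # skip a POSIX named set like [:alpha:]
                continue
        if c == ']':
            return i
        i += 1      # anything else, including a bare [, is a literal
    raise ValueError('unterminated character class')

for pat in ['[[]', '[]]', '[^]]', '[[:alpha:]]']:
    print(pat, char_class_end(pat, 0))
```

All four patterns are accepted, and in each case the reported index is
the last character of the pattern.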

\a	alarm - hex 07
\e	escape - hex 1B
\f	formfeed - hex 0C
\n	newline - hex 0A
\r	carriage return - hex 0D
\t	tab - hex 09
\ddd	octal code
\xhh	hex code

\cx	control-x where x is any ascii character

.	any character including newline
\b	backspace
\d	decimal digit  [0-9]
\D	not decimal digit [^0-9]
\h	horizontal whitespace character
\H	not horizontal whitespace character
\N	not a newline
\s	white space character
\S	not a white space character
\v	vertical whitespace character
\V	not a vertical whitespace character
\w	a "word" character
\W	not a "word" character
\l	lower case
\L	
\u
\U	upper case
\p	property
\P	not Property
\R	Unicode newline sequence

(?= )	look ahead assertion
(?! )	negative look ahead assertion
(?<= )	look behind assertion
(?<! )	negative look behind assertion
(?(conditional)yes-pattern)
(?(conditional)yes-pattern|no-pattern)

({ } )	callout to fn

\p and \P   reserved

NOTE: \n can NOT be used as a back reference

\gn	back reference by number
\g{n}	back reference by number
\g{-n}	relative back reference by number
\k<name>	 back reference by name
\k'name'	 back reference by name
\g{name}	 back reference by name
\k{name}	 back reference by name (.Net)
(?P=name)	 back reference by name (Python)
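
For the numeric \g forms, a minimal Python sketch of turning a reference
into an absolute group number, where a relative \g{-n} counts back from
the most recently opened group. The function name and the groups_opened
argument are hypothetical, and rejecting forward references is a
simplification I am assuming here.

```python
import re

def resolve_backref(token, groups_opened):
    # token is one of \gn, \g{n}, \g{-n}; groups_opened is the number of
    # capture groups opened before the reference appears.
    m = re.fullmatch(r'\\g(?:(\d+)|\{(-?\d+)\})', token)
    if m is None:
        raise ValueError('not a numeric \\g reference: ' + token)
    n = int(m.group(1) or m.group(2))
    if n < 0:
        n = groups_opened + n + 1   # \g{-1} is the latest opened group
    if not 1 <= n <= groups_opened:
        raise ValueError('no such group: ' + token)
    return n

print(resolve_backref(r'\g{-1}', 2))
print(resolve_backref(r'\g{-2}', 2))
print(resolve_backref(r'\g1', 3))
```

With two groups open, \g{-1} resolves to group 2 and \g{-2} to group 1,
which matches the description in man pcrepattern.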