[apparmor] [PATCH] parser - fix trailing \\ in regex (was Re: [PATCH] [parsers] allow for nested alternations expressions)

Fri Nov 8 01:00:25 UTC 2013

On 11/07/2013 04:48 PM, Steve Beattie wrote:
> On Wed, Nov 06, 2013 at 04:17:02PM -0800, John Johansen wrote:
>> On 11/06/2013 04:10 PM, Steve Beattie wrote:
>>> On Tue, Nov 05, 2013 at 12:29:56PM -0800, John Johansen wrote:
>>>> On 11/05/2013 11:33 AM, Steve Beattie wrote:
>>>>> On Mon, Nov 04, 2013 at 05:28:22PM -0800, John Johansen wrote:
>>>>>> On 11/04/2013 04:34 PM, Steve Beattie wrote:
>>>>>>> Well, part of the slowdown was me writing some unit tests for that
>>>>>>> function. Here's the patch that does that:
>>>>> [SNIP]
>>>>>>> +	//MY_REGEX_TEST("\\", "\\", ePatternBasic);
>>>>>>> +	MY_REGEX_TEST("\\\\", "\\\\", ePatternBasic);
>>>>>>> +	//MY_REGEX_TEST("\\blort", "\\blort", ePatternBasic);
>>>>>>> +	MY_REGEX_TEST("\\\\blort", "\\\\blort", ePatternBasic);
>>>>>>> +	//MY_REGEX_TEST("blort\\", "blort\\", ePatternBasic);
>>>>>> why are these 3 commented out?
>>>>>
>>>>> Ah, right, I'd forgotten about these. They're commented out
>>>>> because as-is, they fail; however I wasn't sure if that was the
>>>>> correct expected output. Basically, what happens is that if whatever
>>>>> follows isn't expecting an escape character, then the escaping '\' is
>>>>> dropped. Thus, the current behavior is that '\\' becomes '' and both
>>>>> '\\blort' and 'blort\\' become 'blort'.
>>>>>
>>>>> The question is, is this a bug? I think so... but I'm willing to hear
>>>>> countering arguments.
>>>>>
>>>> O_o  It's a bug
>>>
>>> Alright, here's a patch that fixes the issue:
>>>
>>> Signed-off-by: Steve Beattie <steve at nxnw.org>
>>> ---
>>>  parser/parser_regex.c |  140 +++++++++++++++++++++++++++++++++++++++-----------
>>>  1 file changed, 110 insertions(+), 30 deletions(-)
>>>
>>> Index: b/parser/parser_regex.c
>>> ===================================================================
>>> --- a/parser/parser_regex.c
>>> +++ b/parser/parser_regex.c
>>> @@ -326,9 +347,14 @@ static pattern_t convert_aaregex_to_pcre
>>>  		case '(':
>>>  		case ')':
>>>  			STORE("\\", dptr, 1);
>>> -			// fall through to default
>>> +			STORE(sptr, dptr, 1);
>>> +			break;
>>>  
>>>  		default:
>>> +			if (bEscape) {
>>> +				/* just a regular backslash */
>>> +				STORE("\\", dptr, 1);
>>> +			}
>>>  			STORE(sptr, dptr, 1);
>>>  			break;
>>>  		}	/* switch (*sptr) */
>>> @@ -344,6 +370,10 @@ static pattern_t convert_aaregex_to_pcre
>>>  		       progname);
>>>  	}
>>>  
>>> +	if ((error == e_no_error) && bEscape) {
>>> +		/* just a regular backslash */
>>> +		STORE("\\", dptr, 1);
>>> +	}
>>
>> hrmmm, I am thinking this should be a bug
> 
> You mean a parse error?
> 
yeah

>> otherwise if we ever expand the set of supported escape characters we
>> will have issues.
> 
> So, the way it works with perl/pcre[1] is that, if a \ is followed by:
> 
>   - a punctuation character, it eliminates the specialness of that
>     character. '\' followed by punctuation in pcre is supposed to
>     never have a special meaning.
> 
right, though I think we may want to use some special characters in escaped
form. e.g. in aare (the globbing syntax)
  @{ } is a variable;
do we make @{ } also be the variable in the extended syntax, or try to
stick closer to pcre/other regex languages and hide it behind an
escape?

>   - an alphanumeric character: can have a special meaning, but if it
>     doesn't, then it just matches the character. Perl is supposed to
>     give warnings for such usage if warnings are turned on.
> 
I think this should just be a parse error, but I could live with a warning

>   - a naked trailing '\' results in an error ("Trailing \ in regex ...")
>     with both perl and pcregrep
> 
yeah
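
The three cases above can be sketched in C. This is a minimal illustration
that assumes only the perl/pcre rules as summarized here; it is not the
parser's code, and the `escape_kind` enum and `classify_escape()` function
are hypothetical names. It inspects the character that follows an escaping
backslash:

```c
#include <ctype.h>

/* Hypothetical classification of the character following an escaping
 * backslash, per the perl/pcre rules summarized above. */
enum escape_kind {
	ESC_TRAILING,	/* '\' at end of pattern: error in perl and pcregrep */
	ESC_PUNCT,	/* '\' + punctuation: never special, always literal */
	ESC_ALNUM,	/* '\' + alphanumeric: may be special, else warn */
	ESC_OTHER	/* anything else, e.g. whitespace or control bytes */
};

/* next points at the character immediately after the backslash. */
enum escape_kind classify_escape(const char *next)
{
	/* cast avoids undefined behavior in ctype calls for negative chars */
	unsigned char c = (unsigned char) *next;

	if (c == '\0')
		return ESC_TRAILING;
	if (ispunct(c))
		return ESC_PUNCT;
	if (isalnum(c))
		return ESC_ALNUM;
	return ESC_OTHER;
}
```

Whether ESC_ALNUM should produce a warning or a hard parse error is the open
question in this thread; the sketch only classifies the escape.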

> Now, we accept (or intend to accept) pcre style for character classes,
> but not generally for our regexs, so it's not a strict requirement that
> we match behavior here.
> 
true, but we have plans for an extended, true regex syntax (as opposed to
the globbing syntax), and it would be nice to be consistent between them.

> Anyway, the following patch throws an error if an escaping backslash is the
> trailing character and otherwise just drops it for non-special
> characters, and adjusts the relevant test cases.
> 
> [1] info gleaned from http://perldoc.perl.org/perlrebackslash.html
>     except where noted.
> 
> Signed-off-by: Steve Beattie <steve at nxnw.org>

It's good as far as it goes, but I really want at least a warning for the
embedded \a, and would like to settle our plans for the extended syntax a
little so we can make sure the syntax for the aare is what we want.

> ---
>  parser/parser_regex.c |   12 +++++++++---
>  1 file changed, 9 insertions(+), 3 deletions(-)
> 
> Index: b/parser/parser_regex.c
> ===================================================================
> --- a/parser/parser_regex.c
> +++ b/parser/parser_regex.c
> @@ -344,6 +344,12 @@ static pattern_t convert_aaregex_to_pcre
>  		       progname);
>  	}
>  
> +	if ((error == e_no_error) && bEscape) {
> +		/* trailing backslash quote */
> +		error = e_parse_error;
> +		PERROR(_("%s: Regex error: trailing '\\' escape character\n"),
> +		       progname);
> +	}
>  	/* anchor end and terminate pattern string */
>  	if ((error == e_no_error) && anchor) {
>  		STORE("$" , dptr, 1);
> @@ -1304,11 +1310,11 @@ static int test_aaregex_to_pcre(void)
>  
>  	MY_REGEX_TEST("/most/basic/test", "/most/basic/test", ePatternBasic);
>  
> -	MY_REGEX_TEST("\\", "\\", ePatternBasic);
> +	MY_REGEX_FAIL_TEST("\\");
>  	MY_REGEX_TEST("\\\\", "\\\\", ePatternBasic);
> -	MY_REGEX_TEST("\\blort", "\\blort", ePatternBasic);
> +	MY_REGEX_TEST("\\blort", "blort", ePatternBasic);
>  	MY_REGEX_TEST("\\\\blort", "\\\\blort", ePatternBasic);
> -	MY_REGEX_TEST("blort\\", "blort\\", ePatternBasic);
> +	MY_REGEX_FAIL_TEST("blort\\");
>  	MY_REGEX_TEST("blort\\\\", "blort\\\\", ePatternBasic);
>  	MY_REGEX_TEST("*", "[^/\\x00]*", ePatternRegex);
>  	MY_REGEX_TEST("blort*", "blort[^/\\x00]*", ePatternRegex);
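
To make the updated test expectations concrete, here is a heavily
simplified, hypothetical converter in C. It is not convert_aaregex_to_pcre();
`convert_sketch()`, `store()` and the `conv_error` codes are illustrative
names. It handles only backslash escapes and '*', assuming that '*' maps to
`[^/\x00]*` as in the test list and that an escaped '\' or '*' stays escaped
in the pcre output. Like the patch, it reports a parse error for a trailing
escaping backslash and drops the backslash before other characters:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical result codes for this sketch only. */
enum conv_error { CONV_OK = 0, CONV_PARSE_ERROR = 1 };

/* Append s to out; assumes out has room (a real parser must bound-check). */
static char *store(char *out, const char *s)
{
	size_t n = strlen(s);

	memcpy(out, s, n);
	return out + n;
}

enum conv_error convert_sketch(const char *in, char *out)
{
	int escape = 0;	/* previous char was an escaping '\' */

	for (; *in; in++) {
		if (escape) {
			/* assumption: '\' and '*' remain escaped for pcre;
			 * for any other char the escaping '\' is dropped */
			if (*in == '\\' || *in == '*')
				out = store(out, "\\");
			*out++ = *in;
			escape = 0;
		} else if (*in == '\\') {
			escape = 1;
		} else if (*in == '*') {
			/* glob '*' matches within a single path component */
			out = store(out, "[^/\\x00]*");
		} else {
			*out++ = *in;
		}
	}
	*out = '\0';
	/* a trailing escaping backslash has nothing to escape */
	return escape ? CONV_PARSE_ERROR : CONV_OK;
}
```

With this sketch, the C literal "\\blort" (input `\blort`) converts to
`blort`, while "blort\\" (input `blort\`) returns CONV_PARSE_ERROR, matching
the MY_REGEX_TEST and MY_REGEX_FAIL_TEST changes in the patch.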