[apparmor] [PATCH] parser - fix trailing \\ in regex (was Re: [PATCH] [parsers] allow for nested alternations expressions)
John Johansen
john.johansen at canonical.com
Fri Nov 8 01:00:25 UTC 2013
On 11/07/2013 04:48 PM, Steve Beattie wrote:
> On Wed, Nov 06, 2013 at 04:17:02PM -0800, John Johansen wrote:
>> On 11/06/2013 04:10 PM, Steve Beattie wrote:
>>> On Tue, Nov 05, 2013 at 12:29:56PM -0800, John Johansen wrote:
>>>> On 11/05/2013 11:33 AM, Steve Beattie wrote:
>>>>> On Mon, Nov 04, 2013 at 05:28:22PM -0800, John Johansen wrote:
>>>>>> On 11/04/2013 04:34 PM, Steve Beattie wrote:
>>>>>>> Well, part of the slowdown was me writing some unit tests for that
>>>>>>> function. Here's the patch that does that:
>>>>> [SNIP]
>>>>>>> + //MY_REGEX_TEST("\\", "\\", ePatternBasic);
>>>>>>> + MY_REGEX_TEST("\\\\", "\\\\", ePatternBasic);
>>>>>>> + //MY_REGEX_TEST("\\blort", "\\blort", ePatternBasic);
>>>>>>> + MY_REGEX_TEST("\\\\blort", "\\\\blort", ePatternBasic);
>>>>>>> + //MY_REGEX_TEST("blort\\", "blort\\", ePatternBasic);
>>>>>> why are these 3 commented out?
>>>>>
>>>>> Ah, right, I'd forgotten about these. They're commented out
>>>>> because as-is, they fail; however I wasn't sure if that was the
>>>>> correct expected output. Basically, what happens is that if whatever
>>>>> follows isn't expecting an escape character, then the escaping '\' is
>>>>> dropped. Thus, the current behavior is that '\\' becomes '' and both
>>>>> '\\blort' and 'blort\\' become 'blort'.
>>>>>
>>>>> The question is, is this a bug? I think so... but I'm willing to hear
>>>>> countering arguments.
>>>>>
>>>> O_o Its a bug
>>>
>>> Alright, here's a patch that fixes the issue:
>>>
>>> Signed-off-by: Steve Beattie <steve at nxnw.org>
>>> ---
>>> parser/parser_regex.c | 140 +++++++++++++++++++++++++++++++++++++++-----------
>>> 1 file changed, 110 insertions(+), 30 deletions(-)
>>>
>>> Index: b/parser/parser_regex.c
>>> ===================================================================
>>> --- a/parser/parser_regex.c
>>> +++ b/parser/parser_regex.c
>>> @@ -326,9 +347,14 @@ static pattern_t convert_aaregex_to_pcre
>>> case '(':
>>> case ')':
>>> STORE("\\", dptr, 1);
>>> - // fall through to default
>>> + STORE(sptr, dptr, 1);
>>> + break;
>>>
>>> default:
>>> + if (bEscape) {
>>> + /* just a regular backslash */
>>> + STORE("\\", dptr, 1);
>>> + }
>>> STORE(sptr, dptr, 1);
>>> break;
>>> } /* switch (*sptr) */
>>> @@ -344,6 +370,10 @@ static pattern_t convert_aaregex_to_pcre
>>> progname);
>>> }
>>>
>>> + if ((error == e_no_error) && bEscape) {
>>> + /* just a regular backslash */
>>> + STORE("\\", dptr, 1);
>>> + }
>>
>> hrmmm, I am thinking this should be a bug
>
> You mean a parse error?
>
yeah
>> otherwise if we ever expand the set of supported escape characters we
>> will have issue.
>
> So, the way it works with perl/pcre[1] is that, if a \ is followed by:
>
> - a punctuation character, it eliminates the specialness of that
> character. '\' followed by punctuation in pcre is supposed to
> never have a special meaning.
>
right, though I think we may want to use some special characters escaped
eg. In aare (the globbing syntax)
@{ } is a variable,
do we make @{ } also be the variable in the extended syntax or try to
stick closer to pcre/other regex languages, and hide it behind an
escape.
> - an alphanumeric character: can have a special meaning, but if it
> doesn't, then it just matches the character. Perl is supposed to
> give warnings for such usage if warnings are turned on.
>
I think this should just be a parse error, but I could live with warning
> - a naked trailing '\' results in an error ("Trailing \ in regex ...")
> with both perl and pcregrep
>
yeah
> Now, we accept (or intend to accept) pcre style for character classes,
> but not generally for our regexs, so it's not a strict requirement that
> we match behavior here.
>
true, but we have plans for an extended true regex vs. globbing syntax and
it would be nice to try and be consistent between them
> Anyway, the following patch throws an error if an escaping slash is the
> trailing character and otherwise just drops it for non-special
> characters, and adjusts the relevant test cases.
>
> [1] info gleaned from http://perldoc.perl.org/perlrebackslash.html
> except where noted.
>
> Signed-off-by: Steve Beattie <steve at nxnw.org>
Its good as far as it goes, but I really want at least a warning for the
embedded \a, and would like to settle our plans a little for the extended
syntax so we can make sure the syntax for the aare is what we want.
> ---
> parser/parser_regex.c | 12 +++++++++---
> 1 file changed, 9 insertions(+), 3 deletions(-)
>
> Index: b/parser/parser_regex.c
> ===================================================================
> --- a/parser/parser_regex.c
> +++ b/parser/parser_regex.c
> @@ -344,6 +344,12 @@ static pattern_t convert_aaregex_to_pcre
> progname);
> }
>
> + if ((error == e_no_error) && bEscape) {
> + /* trailing backslash quote */
> + error = e_parse_error;
> + PERROR(_("%s: Regex error: trailing '\\' escape character\n"),
> + progname);
> + }
> /* anchor end and terminate pattern string */
> if ((error == e_no_error) && anchor) {
> STORE("$" , dptr, 1);
> @@ -1304,11 +1310,11 @@ static int test_aaregex_to_pcre(void)
>
> MY_REGEX_TEST("/most/basic/test", "/most/basic/test", ePatternBasic);
>
> - MY_REGEX_TEST("\\", "\\", ePatternBasic);
> + MY_REGEX_FAIL_TEST("\\");
> MY_REGEX_TEST("\\\\", "\\\\", ePatternBasic);
> - MY_REGEX_TEST("\\blort", "\\blort", ePatternBasic);
> + MY_REGEX_TEST("\\blort", "blort", ePatternBasic);
> MY_REGEX_TEST("\\\\blort", "\\\\blort", ePatternBasic);
> - MY_REGEX_TEST("blort\\", "blort\\", ePatternBasic);
> + MY_REGEX_FAIL_TEST("blort\\");
> MY_REGEX_TEST("blort\\\\", "blort\\\\", ePatternBasic);
> MY_REGEX_TEST("*", "[^/\\x00]*", ePatternRegex);
> MY_REGEX_TEST("blort*", "blort[^/\\x00]*", ePatternRegex);
>
>
>
More information about the AppArmor
mailing list