[MERGE] is_ignored improvements...

Fri May 19 06:51:40 BST 2006

On 18 May 2006, Robert Collins <robertc at robertcollins.net> wrote:
> On Thu, 2006-05-18 at 14:34 +0200, Jan Hudec wrote:
> 
> > > The code is very complex compared to the current code, and we end up
> > > with two apis rather than one. I'm working on a variation derived from
> > > your work, which will combine the regexes but only need the single API.
> > > I'll post that up shortly, and I'd appreciate your review on this, as I
> > > know you put a lot of thought into the is_ignored work in your branch.
> > 
> > The two interfaces are there because python has arbitrary limit of 100
> > match groups in a pattern. So if I wanted to tell which pattern matched
> > (the user interface displays that), I had to do the match bunch at a
> > time rather than all at once. But in many places just a boolean is
> > wanted, so all patterns can be matched at once, without captures.

And even aside from the limit, matching them without capture is somewhat
faster, so in the long term having separate interfaces may be better.
But for now Robert's patch looks reasonably simple and I'm +1 on merging
it, if it's OK with Jan.

> +    def _translate_ignore_rule(self, rule):
> +        """Translate a single ignore rule to a regex.
> +
> +        There are three sorts of ignore rules:
> +        root only - regex is the rule itself without the leading './'. These
> +        are identified by a leading './'.
> +        full path - regex is the rule itself and is identified by the 
> +        presenve of a '/' in the path.
> +        basename only rule - regex is a rule that ignores everything up
> +        to the last / in the string before applying the supplied rule.
> +        These are the default case.
> +
> +        :return: The translated regex.
> +        """

You can summarize them more simply:

 There are two types of ignore rules.  Those that do not contain a / are
 matched against the tail of the filename (that is, they do not care
 what directory the file is in.)  Rules which do contain a slash must
 match the entire path.  As a special case, './' at the start of the
 string counts as a slash in the string but is removed before matching 
 (e.g. ./foo.c, ./src/foo.c)

> +        if rule[:2] in ('./', '.\\'):
> +            # rootdir rule
> +            result = fnmatch.translate(rule[2:])
> +        elif '/' in rule or '\\' in rule:
> +            # path prefix 
> +            result = fnmatch.translate(rule)
> +        else:
> +            # default rule style.
> +            result = "(?:.*/)?(?!.*/)" + fnmatch.translate(rule)
> +        assert result[-1] == '$', "fnmatch.translate did not add the expected $"
> +        return "(" + result + ")"

So the ?! says "don't match anything with .*/ after this point", which
is OK but I wonder if it'd be simpler or faster to just translate '*' to [^/]*.
Anyhow, could just add a comment about that until someone has time to
measure it.

-- 
Martin