[MERGE] is_ignored improvements...

Jan Hudec bulb at ucw.cz
Sat May 20 14:51:25 BST 2006


On Fri, May 19, 2006 at 15:51:40 +1000, Martin Pool wrote:
> On 18 May 2006, Robert Collins <robertc at robertcollins.net> wrote:
> > On Thu, 2006-05-18 at 14:34 +0200, Jan Hudec wrote:
> > 
> > > > The code is very complex compared to the current code, and we end up
> > > > with two apis rather than one. I'm working on a variation derived from
> > > > your work, which will combine the regexes but only need the single API.
> > > > I'll post that up shortly, and I'd appreciate your review on this, as I
> > > > know you put a lot of thought into the is_ignored work in your branch.
> > > 
> > > The two interfaces are there because python has arbitrary limit of 100
> > > match groups in a pattern. So if I wanted to tell which pattern matched
> > > (the user interface displays that), I had to do the match bunch at a
> > > time rather than all at once. But in many places just a boolean is
> > > wanted, so all patterns can be matched at once, without captures.
> 
> And even aside from the limit, matching them without capture is somewhat
> faster, so in the long term having separate interfaces may be better.
> But for now Robert's patch looks reasonably simple and I'm +1 on merging
> it, if it's OK with Jan.

It's OK with me. I'll keep my version around and combine them when I finish
the properties stuff. Then I'll combine with Roberts patch and cut down the
call nesting in my version a bit.

Note, that the whole pattern is not that faster than matching in reasonably
large chunks, because the regexp engine is backtracking. What would actually
help is merging common prefixes (so the matcher can skip a lot of patterns at
once if the prefix fails).

> > +    def _translate_ignore_rule(self, rule):
> > +        """Translate a single ignore rule to a regex.
> > +
> > +        There are three sorts of ignore rules:
> > +        root only - regex is the rule itself without the leading './'. These
> > +        are identified by a leading './'.
> > +        full path - regex is the rule itself and is identified by the 
> > +        presenve of a '/' in the path.
> > +        basename only rule - regex is a rule that ignores everything up
> > +        to the last / in the string before applying the supplied rule.
> > +        These are the default case.
> > +
> > +        :return: The translated regex.
> > +        """
> 
> You can summarize them more simply:
> 
>  There are two types of ignore rules.  Those that do not contain a / are
>  matched against the tail of the filename (that is, they do not care
>  what directory the file is in.)  Rules which do contain a slash must
>  match the entire path.  As a special case, './' at the start of the
>  string counts as a slash in the string but is removed before matching 
>  (e.g. ./foo.c, ./src/foo.c)
> 
> > +        if rule[:2] in ('./', '.\\'):
> > +            # rootdir rule
> > +            result = fnmatch.translate(rule[2:])
> > +        elif '/' in rule or '\\' in rule:
> > +            # path prefix 
> > +            result = fnmatch.translate(rule)
> > +        else:
> > +            # default rule style.
> > +            result = "(?:.*/)?(?!.*/)" + fnmatch.translate(rule)
> > +        assert result[-1] == '$', "fnmatch.translate did not add the expected $"
> > +        return "(" + result + ")"
> 
> So the ?! says "don't match anything with .*/ after this point", which
> is OK but I wonder if it'd be simpler or faster to just translate '*' to [^/]*.
> Anyhow, could just add a comment about that until someone has time to
> measure it.

Yes, it would be. And it indeed is what my new converter does. But we can't
tell fnmatch to do it. So I would not do it for this version. Of course we
could bring in the Replacer and reimplement fnmatch.translate in it, but
I don't think it's worth it now.

-- 
						 Jan 'Bulb' Hudec <bulb at ucw.cz>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: Digital signature
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20060520/cac7eb99/attachment.pgp 


More information about the bazaar mailing list