[patch] improved ignore pattern matching (#57637)

John Arbash Meinel john at arbash-meinel.com
Mon Nov 27 23:45:49 GMT 2006


Kent Gibson wrote:
> 
> 
> John Arbash Meinel wrote:
>>> Well, it will also match 'foobar/CVS'. The way it is written, it seems
>>> like '**/' should match either nothing, or something that ends in a
>>> directory separator, which sounds like a good match for (.+/)?
>>>
>>> I also prefer (.+/)? to (|.+/). As an aside, these should probably all
>>> use the (?:) style, a group that can't be referenced later, which
>>> can save processing overhead. (Also, the Python regex engine has a
>>> hard limit on the number of groups it can handle per regex.)
>>>
> In the implementation I do use the (?:) form - I omitted it here so as
> to not confuse the discussion.
> Btw, unnumbered groups are no faster than named/numbered, according to
> the Python re documentation, and from experience that seems to be the
> case.
> The reason we use (?:) is so that the internals of each pattern won't
> conflict with the aggregation into a single regex and the mapping back
> to globs; it isn't for performance.
> 
> Cheers,
> Kent.
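As a rough illustration of the `**/` translation discussed in the quote, here is a minimal Python sketch. `glob_to_regex` is a hypothetical helper, not bzr's actual code, and it only understands `**/`, `*` and `?`. It emits `(?:.+/)?` for `**/`, using the non-capturing form Kent mentions.

```python
import re


# Hypothetical, minimal glob-to-regex translator for illustration only.
# It understands just '**/', '*' and '?'; real ignore syntax is richer.
def glob_to_regex(glob):
    out = []
    i = 0
    while i < len(glob):
        if glob.startswith("**/", i):
            # '**/' matches nothing, or any run of text ending in '/'.
            # (?:...) keeps it out of the numbered groups.
            out.append("(?:.+/)?")
            i += 3
        elif glob[i] == "*":
            # '*' stays within a single path component.
            out.append("[^/]*")
            i += 1
        elif glob[i] == "?":
            out.append("[^/]")
            i += 1
        else:
            out.append(re.escape(glob[i]))
            i += 1
    # Anchor at the end; callers use re.match, which anchors the start.
    return "".join(out) + "$"


cvs = re.compile(glob_to_regex("**/CVS"))
for path in ["CVS", "foobar/CVS", "a/b/CVS", "xCVS", "CVSROOT"]:
    print(path, bool(cvs.match(path)))
```

Because the prefix is `(?:.+/)?` rather than `.*`, `**/CVS` matches `CVS` and `foobar/CVS` but not `xCVS`.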

My memory may be faulty, but I seem to recall it making a difference
when you have ~50 groups. It has been a long time since I benchmarked
it, though.
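One way to re-check the capturing versus non-capturing question is a small `timeit` comparison over about 50 alternatives. The regex fragments and paths below are invented for illustration, and timings depend on the Python version and machine, so no particular result is implied.

```python
import re
import timeit

# Invented regex fragments standing in for ~50 translated ignore patterns.
fragments = [r"(?:.*/)?[^/]*\.ext%d$" % i for i in range(50)]

# The same alternation, once with capturing groups and once without.
capturing = re.compile("|".join("(%s)" % f for f in fragments))
non_capturing = re.compile("|".join("(?:%s)" % f for f in fragments))

# Paths hitting the last alternative, the first one, and none at all.
paths = ["src/foo.ext49", "foo.ext0", "src/bar/readme.txt"] * 500


def count_matches(regex):
    return sum(1 for p in paths if regex.match(p))


if __name__ == "__main__":
    for name, regex in [("capturing", capturing), ("non-capturing", non_capturing)]:
        seconds = timeit.timeit(lambda: count_matches(regex), number=10)
        print("%-14s %.3fs" % (name, seconds))
```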

Actually, right now bzr uses capturing groups, because it uses the match
to determine which original pattern matched. That is used by 'bzr
ignored' and a few other places. We talked about doing it differently,
because I think it did make a difference whether you match 100 patterns
all at once or match 50 patterns and then another 50.
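The scheme described above (one capturing group per ignore pattern, so the match reveals which pattern fired) might look roughly like this in Python. `compile_batches`, `matching_glob` and the batch size are hypothetical, and the regex fragments are hand-written stand-ins for translated globs. Each fragment must use only `(?:...)` internally, which is why the non-capturing form matters here; splitting into batches is one way to stay under a per-regex group limit.

```python
import re

# Hypothetical batch size; the limit bzr actually uses is not given here.
BATCH_SIZE = 50


def compile_batches(pairs, batch_size=BATCH_SIZE):
    """Compile (glob, regex_fragment) pairs into combined regexes.

    Each fragment gets exactly one capturing group, so the number of the
    group that matched identifies the original glob. Fragments must only
    use (?:...) internally, or the group numbering would shift.
    """
    batches = []
    for start in range(0, len(pairs), batch_size):
        chunk = pairs[start:start + batch_size]
        combined = re.compile("|".join("(%s)" % rx for _, rx in chunk))
        batches.append((combined, [glob for glob, _ in chunk]))
    return batches


def matching_glob(batches, path):
    """Return the first glob (in list order) whose pattern matches path."""
    for combined, globs in batches:
        m = combined.match(path)
        if m:
            # Only one top-level alternative can match, so lastindex is
            # the 1-based group number of that alternative.
            return globs[m.lastindex - 1]
    return None


# Hand-written fragments standing in for translated globs.
patterns = [
    ("*.o", r"(?:.*/)?[^/]*\.o$"),
    ("**/CVS", r"(?:.+/)?CVS$"),
    ("./build", r"build$"),
]
batches = compile_batches(patterns, batch_size=2)
for path in ["src/foo.o", "lib/CVS", "build", "README"]:
    print(path, matching_glob(batches, path))
```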

There are probably other, more important optimizations, though. For
example, we could split out the patterns that have to match the whole
path from the ones that only match part of it, so that common prefixes
can be combined:

 .*/(pat1|pat2|pat3)

versus
 (.*/pat1|.*/pat2|.*/pat3)
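The two forms above accept the same paths; only the structure differs. A quick Python check, with hypothetical basename patterns standing in for pat1..pat3, makes the equivalence concrete:

```python
import re

# Hypothetical basename patterns standing in for pat1..pat3.
names = ["CVS", r"\.svn", r"[^/]*\.pyc"]

# Prefix repeated inside every alternative.
per_alternative = re.compile("(?:%s)$" % "|".join(".*/" + n for n in names))
# Prefix factored out once, in front of the alternation.
factored = re.compile(".*/(?:%s)$" % "|".join(names))

paths = ["a/CVS", "a/b/.svn", "pkg/mod.pyc", "CVS", "a/CVSROOT", "a/b/c.py"]
for p in paths:
    # Both forms accept exactly the same paths; only speed may differ.
    assert bool(per_alternative.match(p)) == bool(factored.match(p))
```

Like the original example, both forms require at least one `/` before the name, so a top-level `CVS` is not matched by either.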

Then again, it isn't entirely clear how the regex engine actually
translates these, and therefore which form it will process faster. For
instance, I found:
 (pat1$|pat2$|pat3$)
faster than:
 (pat1|pat2|pat3)$
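Since the engine's behaviour is hard to predict, measuring is the practical answer. This sketch times both anchoring styles with `timeit`; the suffix patterns and paths are assumptions chosen for illustration, and which form wins may vary between Python versions.

```python
import re
import timeit

# Hypothetical stand-ins for pat1..pat3 (filename suffixes).
pats = [r"\.o", r"\.pyc", r"~"]

anchored_each = re.compile("(%s)" % "|".join(p + "$" for p in pats))
anchored_once = re.compile("(%s)$" % "|".join(pats))

# A mix of matching and non-matching paths; search() scans each string.
paths = ["src/foo.o", "lib/mod.pyc", "notes.txt~", "README", "a.o.txt"] * 200


def count(regex):
    return sum(1 for p in paths if regex.search(p))


if __name__ == "__main__":
    for name, regex in [("pat$|pat$", anchored_each), ("(pat|pat)$", anchored_once)]:
        print("%-12s %.3fs" % (name, timeit.timeit(lambda: count(regex), number=20)))
```

The two regexes are equivalent, since `$` distributes over the alternation, so any difference in the printed times comes from how the engine handles them.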

John
=:->



More information about the bazaar mailing list