[patch] improved ignore pattern matching (#57637)

Kent Gibson warthog618 at gmail.com
Tue Nov 28 00:09:59 GMT 2006


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1



John Arbash Meinel wrote:
>
> My memory may be faulty, but I seem to recall it making a
> difference when you have ~50 groups. It has been a long time since
> I benchmarked it, though.
>
And I've not tested that particular sour spot, so your memory may be
correct.
> Actually, right now bzr uses matching groups, because it uses the
> match to determine what original pattern matched. Used by 'bzr
> ignored', and a few other places. We talked about doing it
> differently. Because I think it did make a difference if you match
> 100 patterns all at once, rather than matching 50 patterns, and
> then another 50 patterns.
I found little difference.
OTOH merging similar patterns, as you suggest below, made a huge
difference.
>
> There are probably other more important optimizations, though. Like
>  maybe being able to split out all of the patterns that match all
> of the path, versus the ones that only match a little bit. So that
> you can combine the common prefixes.
>
> .*/(pat1|pat2|pat3)
>
> versus (.*/pat1|.*/pat2|.*/pat3)
>
The patch does that.  It splits patterns into 3 categories -
extension, basename and full path.
It also matches in that order, on the assumption that more general
patterns are more likely to match.
> Though again, it isn't 100% clear how the actual regex engine
> translates things, as to which one it will actually find faster to
> process. Because I know I found: (pat1$|pat2$|pat3$) faster than:
> (pat1|pat2|pat3)$
>
Interesting.  I'll have to try that.
The patch uses the second form, well sort of - for extensions it would
actually be something like (?:.*\.(pat1)|(pat2)|(pat3))$.

Cheers,
Kent.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.3 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFFa35WgoxTFTi1P8QRAsL6AKCpqez3TqEU6Bm/dkVSF9IDXKfs7gCfY5HV
drb4glAhimmxE++BqRdr4yc=
=MHau
-----END PGP SIGNATURE-----




More information about the bazaar mailing list