[ping] [patch] improved ignore pattern matching (#57637)
John Arbash Meinel
john at arbash-meinel.com
Fri Dec 8 13:05:05 GMT 2006
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
This should have been sent to the list. But it is very encouraging to hear.
John
=:->
Kent Gibson wrote:
>
>> Out of curiosity, have you looked at any of the benchmarks to see
>> if it makes a big difference in any of them? I would expect it to
>> have some impact on the kernel-sized "add" tests.
>
> I get a huge amount of variability when running those benchmarks, even
> on repeated runs of the same branch. While there may be an
> improvement, it's difficult to quantify with such large error bars.
>
> How many ignores do those bench tests use, and which category
> (extension/basename/full path)? You should see a bigger improvement
> for larger sets of ignores and for basename and extension patterns.
> For example, my bench_workingtree.py benchmarks showed a marked
> improvement in the speed of is_ignored:
>
> These are the results for my lp57637 branch point, patched with my
> bench_workingtree.py:
> bzr: /home/kent/work/bzr.standalones/lp57637bm/bzr
> bzrlib: /home/kent/work/bzr.standalones/lp57637bm/bzrlib
>
> running 6 tests...
> ...ngtree.WorkingTreeBenchmark.test_is_ignored_1000_patterns OK
> 1139ms/ 1190ms
> ...ingtree.WorkingTreeBenchmark.test_is_ignored_100_patterns OK
> 118ms/ 142ms
> ...kingtree.WorkingTreeBenchmark.test_is_ignored_10824_calls OK
> 126ms/ 150ms
> ...kingtree.WorkingTreeBenchmark.test_is_ignored_10_patterns OK
> 14ms/ 37ms
> ...kingtree.WorkingTreeBenchmark.test_is_ignored_50_patterns OK
> 43ms/ 66ms
> ...kingtree.WorkingTreeBenchmark.test_is_ignored_single_call OK
> 1ms/ 24ms
>
> This is the same benchmark on the lp57637 branch:
>
> bzr: /home/kent/work/bzr.standalones/lp57637/bzr
> bzrlib: /home/kent/work/bzr.standalones/lp57637/bzrlib
>
> running 6 tests...
> ...ngtree.WorkingTreeBenchmark.test_is_ignored_1000_patterns OK
> 130ms/ 185ms
> ...ingtree.WorkingTreeBenchmark.test_is_ignored_100_patterns OK
> 14ms/ 39ms
> ...kingtree.WorkingTreeBenchmark.test_is_ignored_10824_calls OK
> 113ms/ 136ms
> ...kingtree.WorkingTreeBenchmark.test_is_ignored_10_patterns OK
> 4ms/ 27ms
> ...kingtree.WorkingTreeBenchmark.test_is_ignored_50_patterns OK
> 7ms/ 31ms
> ...kingtree.WorkingTreeBenchmark.test_is_ignored_single_call OK
> 1ms/ 24ms
>
> In both cases I've picked the best of 5 runs.
>
> At the upper end the speedup is ~8.5 times for 1000 and 100 patterns,
> dropping to 3.5 times for 10.
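As a quick check of those ratios, here is a small sketch that divides the old best-of-5 times by the new ones. It assumes the first figure on each result line (e.g. 1139ms) is the measured time; the `speedup` helper is illustrative only.

```python
# Best-of-5 timings in ms quoted above, as (before, after) pairs.
# Assumption: the first figure on each result line is the measured time.
timings = {
    "1000_patterns": (1139, 130),
    "100_patterns": (118, 14),
    "50_patterns": (43, 7),
    "10_patterns": (14, 4),
    "10824_calls": (126, 113),
    "single_call": (1, 1),
}


def speedup(before_ms, after_ms):
    """Ratio of old to new run time; a value above 1 means the branch is faster."""
    return before_ms / after_ms


for name, (before, after) in timings.items():
    print(f"{name:>14}: {speedup(before, after):4.1f}x")
```

The 1000- and 100-pattern cases come out at roughly 8.8x and 8.4x, and the 10-pattern case at 3.5x.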
>
> Of course those tests primarily use extension patterns.
> Basename patterns should show a similar speedup.
> But I would not expect to see much improvement for full path
> patterns. They are more or less unchanged from the fnmatch version.
> They still use the unoptimised (pat1)|(pat2)|(pat3) form and the regex
> patterns are of similar complexity.
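The unoptimised (pat1)|(pat2)|(pat3) form for full path patterns can be sketched with the standard library's `fnmatch.translate`, which reflects the older fnmatch-based approach. This is an illustration of the idea, not bzrlib's code; `combine_fullpath` and the sample patterns are hypothetical. Note that `fnmatch`'s `*` also matches `/`.

```python
import fnmatch
import re


def combine_fullpath(patterns):
    """Join fnmatch-translated patterns into one (pat1)|(pat2)|... regex.

    Each alternative is matched against the full path, directories
    included, and the regex engine tries the alternatives in turn.
    """
    alternatives = "|".join("(%s)" % fnmatch.translate(p) for p in patterns)
    return re.compile(alternatives)


fullpath_re = combine_fullpath(["doc/*.html", "build/*"])
print(bool(fullpath_re.match("doc/index.html")))  # True
print(bool(fullpath_re.match("src/index.html")))  # False
```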
>
> For basename and extension patterns the speedup has two causes:
> 1. The translated regex patterns are simpler because they don't have
> to deal with '/'.
> 2. The common parts of the patterns are merged in the resulting regex,
> i.e. trimming the path down to the basename or extension.
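A minimal sketch of those two points follows. It assumes that patterns starting with `*.` are extension patterns, patterns containing `/` are full path patterns, and everything else is a basename pattern. The directory-skipping prefix is written once per category rather than once per pattern, and the translated fragments can use `.*` freely because the prefix already guarantees that no `/` remains. All names (`build_matchers`, `is_ignored`, the prefix regexes) are hypothetical, not bzrlib's API.

```python
import re

# Hypothetical sketch of category-wise matching; not bzrlib's actual code.

# Shared prefixes (assumed): skip any leading directories so the rest
# contains no '/', then, for extensions, skip everything up to a '.'.
BASENAME_PREFIX = r"(?:.*/)?(?!.*/)"
EXTENSION_PREFIX = BASENAME_PREFIX + r"(?:.*\.)"


def glob_to_regex(glob):
    """Translate a '/'-free glob into a regex fragment.

    The shared prefix has already consumed the directories, so '*' and
    '?' become plain '.*' and '.' with no special '/' handling.
    """
    out = []
    i = 0
    while i < len(glob):
        c = glob[i]
        end = glob.find("]", i + 2) if c == "[" else -1
        if c == "*":
            out.append(".*")
        elif c == "?":
            out.append(".")
        elif end != -1:
            body = glob[i + 1:end]
            if body.startswith("!"):  # glob negation -> regex negation
                body = "^" + body[1:]
            out.append("[" + body + "]")
            i = end
        else:
            out.append(re.escape(c))
        i += 1
    return "".join(out)


def compile_group(prefix, fragments):
    """Merge one category into a single regex with the prefix written once."""
    return re.compile(prefix + "(?:" + "|".join(fragments) + r")\Z", re.DOTALL)


def build_matchers(patterns):
    """Split patterns by category (assumed rules) and compile one regex each."""
    ext, base = [], []
    for pat in patterns:
        if "/" in pat:
            continue  # full path patterns: handled separately, not shown here
        if pat.startswith("*."):
            ext.append(glob_to_regex(pat[2:]))
        else:
            base.append(glob_to_regex(pat))
    groups = [(EXTENSION_PREFIX, ext), (BASENAME_PREFIX, base)]
    return [compile_group(prefix, frags) for prefix, frags in groups if frags]


def is_ignored(path, matchers):
    return any(m.match(path) for m in matchers)


matchers = build_matchers(["*.py[oc]", "*.o", "*~", "tags"])
print(is_ignored("bzrlib/osutils.pyc", matchers))  # True
print(is_ignored("bzrlib/osutils.py", matchers))   # False
print(is_ignored("doc/TODO~", matchers))           # True
```

With many extension patterns, the extension regex still scans the path prefix only once, and each alternative only has to match a short suffix.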
>
> So any speedup will depend on your mix of extension/basename/fullpath
> ignores.
>
> Using a real-world example, I've found that 'bzr status' over the
> bzr.dev tree with its .bzrignore is consistently 10% faster (old is
> 0.85 sec and new is 0.76 sec).
> About half of the bzr.dev ignores are fullpath, and half are basename.
> The only extension pattern is *.py[oc].
> Manually removing all the *.pyc files from the tree and then running 'bzr
> status' gives 0.76 sec with the old and 0.73 sec with the new. So about
> 1/3 of the speedup comes from the improved basename matching, with the
> majority coming from the fast matching of *.pyc.
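The attribution of the 'bzr status' speedup is simple arithmetic on the quoted timings. This sketch assumes the gain from basename matching is the same whether or not the *.pyc files are present, so the remainder can be credited to the *.pyc extension match.

```python
# 'bzr status' timings in seconds, as quoted above.
old_full, new_full = 0.85, 0.76    # tree including *.pyc files
old_nopyc, new_nopyc = 0.76, 0.73  # same tree with *.pyc files removed

total_gain = old_full - new_full        # about 0.09 s overall
basename_gain = old_nopyc - new_nopyc   # about 0.03 s with no *.pyc to match
pyc_gain = total_gain - basename_gain   # assumed remainder from *.py[oc]

print(f"time saved overall: {total_gain / old_full:.1%}")
print(f"share from basename matching: {basename_gain / total_gain:.2f}")
print(f"share from *.pyc matching: {pyc_gain / total_gain:.2f}")
```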
>
> Cheers,
> Kent.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (Darwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iD8DBQFFeWMBJdeBCYSNAAMRArYhAJ9W40yNy5/Y8Rm2l1swvDTJcXNwXgCfTG/s
JrImeE9AlKILzWafXyn1ZQs=
=xwlz
-----END PGP SIGNATURE-----