[ping] [patch] improved ignore pattern matching (#57637)

John Arbash Meinel john at arbash-meinel.com
Fri Dec 8 13:05:05 GMT 2006


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

This should have been sent to the list. But it is very encouraging to hear.

John
=:->

Kent Gibson wrote:
> 
>> Out of curiosity, have you looked at any of the benchmarks to see
>> if it makes a big difference in any of them?  I would expect it to
>> have some impact on the kernel-sized "add" tests.
> 
> I get a huge amount of variability when running those benchmarks (and
> I'm talking with the same branch here).  While there may be an
> improvement it's difficult to quantify with such large error bars.
> 
> How many ignores do those bench tests use, and which category
> (extension/basename/full path)?  You should see a bigger improvement
> for larger sets of ignores and for basename and extension patterns.
> e.g. my bench_workingtree.py benchmarks showed a marked improvement in
> the speed of is_ignored:
> 
> This is the benchmark for my lp57637 branch-point patched with my
> bench_workingtree.py:
>        bzr: /home/kent/work/bzr.standalones/lp57637bm/bzr
>     bzrlib: /home/kent/work/bzr.standalones/lp57637bm/bzrlib
> 
> running 6 tests...
> ...ngtree.WorkingTreeBenchmark.test_is_ignored_1000_patterns   OK  1139ms/ 1190ms
> ...ingtree.WorkingTreeBenchmark.test_is_ignored_100_patterns   OK   118ms/  142ms
> ...kingtree.WorkingTreeBenchmark.test_is_ignored_10824_calls   OK   126ms/  150ms
> ...kingtree.WorkingTreeBenchmark.test_is_ignored_10_patterns   OK    14ms/   37ms
> ...kingtree.WorkingTreeBenchmark.test_is_ignored_50_patterns   OK    43ms/   66ms
> ...kingtree.WorkingTreeBenchmark.test_is_ignored_single_call   OK     1ms/   24ms
> 
> This is the same benchmark on the lp57637 branch:
> 
>        bzr: /home/kent/work/bzr.standalones/lp57637/bzr
>     bzrlib: /home/kent/work/bzr.standalones/lp57637/bzrlib
> 
> running 6 tests...
> ...ngtree.WorkingTreeBenchmark.test_is_ignored_1000_patterns   OK   130ms/  185ms
> ...ingtree.WorkingTreeBenchmark.test_is_ignored_100_patterns   OK    14ms/   39ms
> ...kingtree.WorkingTreeBenchmark.test_is_ignored_10824_calls   OK   113ms/  136ms
> ...kingtree.WorkingTreeBenchmark.test_is_ignored_10_patterns   OK     4ms/   27ms
> ...kingtree.WorkingTreeBenchmark.test_is_ignored_50_patterns   OK     7ms/   31ms
> ...kingtree.WorkingTreeBenchmark.test_is_ignored_single_call   OK     1ms/   24ms
> 
> In both cases I've picked the best of 5 runs.
> 
> At the upper end the speedup is ~8.5 times for 1000 and 100 patterns,
> dropping to 3.5 times for 10 patterns.
> 
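
For reference, the ratios quoted above can be reproduced from the first
of the two times on each benchmark line. This is only a sketch of the
arithmetic; the dictionary names are made up for illustration and the
numbers are copied from the two runs above.

```python
# First-column timings (ms), best of 5 runs, copied from the output above.
baseline = {"1000_patterns": 1139, "100_patterns": 118,
            "50_patterns": 43, "10_patterns": 14}
patched = {"1000_patterns": 130, "100_patterns": 14,
           "50_patterns": 7, "10_patterns": 4}

# Speedup factor = time on the branch point / time on the lp57637 branch.
speedup = {name: baseline[name] / patched[name] for name in baseline}

for name, ratio in speedup.items():
    print("%-14s %.1fx" % (name, ratio))
```
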
> Of course those tests primarily use extension patterns.
> Basename patterns should show a similar speed up.
> But I would not expect to see much improvement for full path
> patterns.  They are more or less unchanged from the fnmatch version.
> They still use the unoptimised (pat1)|(pat2)|(pat3) form and the regex
> patterns are of similar complexity.
> 
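
A minimal sketch of what the (pat1)|(pat2)|(pat3) form looks like,
assuming the globs are translated one by one with the standard library's
fnmatch.translate. This is not the bzrlib code; the pattern list and the
is_ignored helper here are hypothetical.

```python
import fnmatch
import re

# Hypothetical full-path ignore patterns, for illustration only.
patterns = ["doc/*.html", "build/temp", "lib/*.so"]

# Each glob is translated on its own and the results are joined with '|',
# so every alternative carries its own complete path regex.
combined = re.compile(
    "|".join("(%s)" % fnmatch.translate(p) for p in patterns))

def is_ignored(path):
    """Return True if any of the full-path patterns matches path."""
    return combined.match(path) is not None

print(is_ignored("doc/index.html"))
print(is_ignored("src/main.py"))
```

With this shape the regex engine has to try each alternative from the
start of the path, which is why the cost grows with the number of
patterns.
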
> For basename and extension patterns the cause of the speedup is twofold:
> 1. The translated regex patterns are simpler because they don't have
> to deal with '/'.
> 2. The common parts of the patterns are merged in the resulting regex,
> i.e. trimming the path down to the basename or extension.
> 
> So any speed up will depend on your mix of extension/basename/fullpath
> ignores.
> 
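
The two points above can be sketched roughly as follows. This is an
illustration under my own assumptions, not the patch itself: the ignore
lists, the regex layout and the is_ignored helper are hypothetical, and
only literal extensions and basenames are handled (no character classes
such as *.py[oc]).

```python
import re

# Hypothetical ignore lists, for illustration only.
extensions = ["pyc", "pyo", "o", "so"]      # from patterns like *.pyc
basenames = ["tags", "TAGS", ".DS_Store"]   # from patterns without a '/'

# The common part is written once: "anything up to the last dot" for
# extensions, "an optional directory prefix" for basenames. The
# alternatives themselves never have to deal with '/'.
ext_re = re.compile(
    r".*\.(?:%s)\Z" % "|".join(re.escape(e) for e in extensions), re.S)
base_re = re.compile(
    r"(?:.*/)?(?:%s)\Z" % "|".join(re.escape(b) for b in basenames), re.S)

def is_ignored(path):
    """Return True if path matches an extension or basename pattern."""
    return bool(ext_re.match(path) or base_re.match(path))

print(is_ignored("bzrlib/workingtree.pyc"))
print(is_ignored("bzrlib/workingtree.py"))
```

Adding another extension or basename only lengthens the short inner
alternation, not the path-handling part of the regex.
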
> Using a real world example, I've found that 'bzr status' over the
> bzr.dev tree with its .bzrignore is consistently 10% faster (old is
> 0.85sec and new is 0.76sec).
> About half of the bzr.dev ignores are fullpath, and half are basename.
> The only extension pattern is *.py[oc].
> Manually removing all the *.pyc files from the tree and then running
> 'bzr status' gives 0.76sec with the old and 0.73sec with the new.  So
> about 1/3 of the speedup is from the improved basename matches, with
> the majority coming from the fast matching of *.pyc.
> 
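
The 1/3 figure follows from the four timings above. A small sketch of
that arithmetic, with variable names made up for illustration, and
assuming (as the text does) that whatever gain remains once the *.pyc
files are gone comes from basename matching:

```python
# 'bzr status' times in seconds, as reported above.
old_full, new_full = 0.85, 0.76        # tree still containing *.pyc files
old_no_pyc, new_no_pyc = 0.76, 0.73    # after removing all *.pyc files

total_gain = old_full - new_full           # gain on the full tree
basename_gain = old_no_pyc - new_no_pyc    # gain with no *.pyc to match

# Fraction of the overall gain that remains without any *.pyc files.
basename_share = basename_gain / total_gain

print("overall speedup: %.1f%%" % (100 * total_gain / old_full))
print("share from basename matching: %.2f" % basename_share)
```
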
> Cheers,
> Kent.

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (Darwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFFeWMBJdeBCYSNAAMRArYhAJ9W40yNy5/Y8Rm2l1swvDTJcXNwXgCfTG/s
JrImeE9AlKILzWafXyn1ZQs=
=xwlz
-----END PGP SIGNATURE-----



