dirstate reduces status by 1/6th on bzr.dev trees:

John Arbash Meinel john at arbash-meinel.com
Thu Feb 22 13:58:51 GMT 2007


Robert Collins wrote:
> So with my recent paths2ids implementation, I finally have some concrete
> wins from dirstate:
> 
> On bzr.dev:
> PYTHONPATH=~/source/baz/dirstate python -m timeit -s "import bzrlib.workingtree" -s "tree=bzrlib.workingtree.WorkingTree.open('integration')" -s "basis_tree=tree.basis_tree()" "tree.lock_read(); tree.paths2ids([''], [basis_tree]); tree.unlock()"
> 10 loops, best of 3: 38.9 msec per loop
> PYTHONPATH=~/source/baz/dirstate python -m timeit -s "import bzrlib.workingtree" -s "tree=bzrlib.workingtree.WorkingTree.open('integration.dirstate')" -s "basis_tree=tree.basis_tree()" "tree.lock_read(); tree.paths2ids([''], [basis_tree]); tree.unlock()"
> 100 loops, best of 3: 11.7 msec per loop
> 
> That is, a reduction of about 70% in the time to go from an unlocked tree to
> knowing all the ids in the tree - which status does.
> 
> On a mozilla tree:
> PYTHONPATH=~/bzr.dirstate python -m timeit -s "import bzrlib.workingtree" -s "tree=bzrlib.workingtree.WorkingTree.open('HEAD')" -s "basis_tree=tree.basis_tree()" "tree.lock_read(); tree.paths2ids([''], [basis_tree]); tree.unlock()"
> 10 loops, best of 3: 1.98 sec per loop
> PYTHONPATH=~/bzr.dirstate python -m timeit -s "import bzrlib.workingtree" -s "tree=bzrlib.workingtree.WorkingTree.open('HEAD.dirstate')" -s "basis_tree=tree.basis_tree()" "tree.lock_read(); tree.paths2ids([''], [basis_tree]); tree.unlock()"
> 10 loops, best of 3: 579 msec per loop
> 
> Which shows the performance difference scales reasonably: 71%
> reduction.
> 
> What about the actual performance difference? Well there are still
> problems with full status on mozilla that I'm working on asap, but for
> bzr.dev it's a measurable difference:
> 
> bzr status on bzr.dev:
> robertc at lifelesslap:~/source/baz$ time dirstate/bzr status integration
> real    0m0.643s
> user    0m0.540s
> sys     0m0.080s
> robertc at lifelesslap:~/source/baz$ time dirstate/bzr status integration.dirstate
> real    0m0.563s
> user    0m0.480s
> sys     0m0.060s
> 
> These figures are pretty reproducible - we're saving nearly 100ms of
> wall clock on a 640 ms operation. Woot!
> 
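
As a quick check on the figures quoted above, the relative savings can be
recomputed from the raw timings. This is only a sketch: the helper name
`reduction` is made up for illustration, and the numbers are copied by hand
from the timeit and `time` output in the quoted message.

```python
def reduction(before, after):
    """Return the fraction of time saved going from `before` to `after`."""
    return (before - after) / before


# Timings copied from the quoted message, all converted to milliseconds.
# The labels are informal descriptions, not names used by bzr itself.
timings = [
    ("paths2ids on bzr.dev", 38.9, 11.7),
    ("paths2ids on mozilla", 1980.0, 579.0),
    ("bzr status on bzr.dev (real)", 643.0, 563.0),
]

for label, before, after in timings:
    saved_ms = before - after
    percent = 100 * reduction(before, after)
    print("%-30s saved %7.1f ms (%.0f%%)" % (label, saved_ms, percent))
```

With these inputs the two paths2ids runs come out at roughly 70% and 71%,
and the full status run saves 80 ms, about 12% of the wall-clock time.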

...

> 
> Lastly, there is performance tuning - which is in some respects the
> heavy lifting. I plan to spend tomorrow working half on performance, and
> half on making WorkingTree4 safe as a default tree format, so we can
> merge it to mainline.
> 
> Rob
> 

I first read this as "dirstate reduces status *to* 1/6th" and I was
getting ready to open some champagne. :)

I'll merge my file_ids patch, and get dirstate to where it doesn't
decode file_ids, which should shave even more time off of status.

And then I'll finish up my work on adding bisect functionality, so that
we can do partial operations without having to read the whole dirstate.

My earlier work is pretty close to what we need now, so it shouldn't
take long. The only thing I would like to add is a bisect that can find
all children of a given directory (similar to how
find_ids_across_trees() works). Bisecting for specific files and dirs is
nice, but being able to do the recursive form should be quite useful.
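
To make the recursive idea concrete, here is a rough sketch of bisecting a
sorted list for everything below one directory. It is not the dirstate code
and does not reflect the real on-disk layout: it assumes entries are kept in
memory as sorted tuples of path components, and the names `split_path` and
`children_of` are invented for this example.

```python
from bisect import bisect_left, bisect_right


def split_path(path):
    """Split 'a/b/c' into ('a', 'b', 'c'); the root '' becomes ()."""
    return tuple(path.split("/")) if path else ()


def children_of(sorted_keys, directory):
    """Return every key strictly below `directory`, at any depth.

    `sorted_keys` must be a sorted list of component tuples. Two bisects
    find the slice, so entries outside the directory are never examined.
    """
    prefix = split_path(directory)
    if not prefix:
        # Everything lives below the root.
        return list(sorted_keys)
    # Skip past the directory's own entry, if it is present.
    start = bisect_right(sorted_keys, prefix)
    # Appending NUL to the last component gives the smallest key that
    # sorts after every descendant (path names cannot contain NUL).
    upper = prefix[:-1] + (prefix[-1] + "\0",)
    end = bisect_left(sorted_keys, upper)
    return sorted_keys[start:end]


# Hypothetical tree contents, purely for illustration.
paths = [
    "README", "bzrlib", "bzrlib/__init__.py", "bzrlib/tests",
    "bzrlib/tests/test_dirstate.py", "bzrlib-extra", "doc", "doc/index.txt",
]
keys = sorted(split_path(p) for p in paths)

for key in children_of(keys, "bzrlib"):
    print("/".join(key))
```

The same two-bisect trick would work against any sorted structure where a
directory's descendants are contiguous; whether that holds for the real
dirstate ordering is something I have not checked here.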

John
=:->


