dirstate reduces status by 1/6th on bzr.dev trees:

Robert Collins robertc at robertcollins.net
Thu Feb 22 06:29:58 GMT 2007


So with my recent paths2ids implementation, I finally have some concrete
wins from dirstate:

On bzr.dev:
PYTHONPATH=~/source/baz/dirstate python -m timeit -s "import bzrlib.workingtree" -s "tree=bzrlib.workingtree.WorkingTree.open('integration')" -s "basis_tree=tree.basis_tree()" "tree.lock_read(); tree.paths2ids([''], [basis_tree]); tree.unlock()"
10 loops, best of 3: 38.9 msec per loop
PYTHONPATH=~/source/baz/dirstate python -m timeit -s "import bzrlib.workingtree" -s "tree=bzrlib.workingtree.WorkingTree.open('integration.dirstate')" -s "basis_tree=tree.basis_tree()" "tree.lock_read(); tree.paths2ids([''], [basis_tree]); tree.unlock()"
100 loops, best of 3: 11.7 msec per loop

That is, a reduction of about 70% (38.9 msec down to 11.7 msec) in the
time to go from unlocked tree to knowing all the ids in the tree - which
status does.
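
For anyone who would rather run this from a script than from the timeit
command line, here is a rough sketch of the same measurement. The helper
name time_paths2ids and its default loop counts are made up for
illustration; the tree calls are just the ones used in the commands above.

```python
import timeit


def time_paths2ids(tree, number=10, repeat=3):
    """Best per-loop time in seconds for lock_read -> paths2ids -> unlock.

    `tree` is assumed to be an already-opened working tree, e.g. the
    result of bzrlib.workingtree.WorkingTree.open('integration').
    """
    # Fetched once, outside the timed loop, like the -s setup statements.
    basis_tree = tree.basis_tree()

    def one_pass():
        tree.lock_read()
        try:
            tree.paths2ids([''], [basis_tree])
        finally:
            # Unlike the one-liner, release the lock even if paths2ids fails.
            tree.unlock()

    # timeit.repeat returns one total per repeat; report the best per loop.
    totals = timeit.repeat(one_pass, number=number, repeat=repeat)
    return min(totals) / number
```

Calling it once on the 'integration' tree and once on
'integration.dirstate' should give numbers comparable to the "best of 3"
lines above, modulo the small overhead of the extra function call.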

On a mozilla tree:
PYTHONPATH=~/bzr.dirstate python -m timeit -s "import bzrlib.workingtree" -s "tree=bzrlib.workingtree.WorkingTree.open('HEAD')" -s "basis_tree=tree.basis_tree()" "tree.lock_read(); tree.paths2ids([''], [basis_tree]); tree.unlock()"
10 loops, best of 3: 1.98 sec per loop
PYTHONPATH=~/bzr.dirstate python -m timeit -s "import bzrlib.workingtree" -s "tree=bzrlib.workingtree.WorkingTree.open('HEAD.dirstate')" -s "basis_tree=tree.basis_tree()" "tree.lock_read(); tree.paths2ids([''], [basis_tree]); tree.unlock()"
10 loops, best of 3: 579 msec per loop

Which shows the performance difference scales reasonably: 71%
reduction.

What about the actual performance difference? Well, there are still
problems with full status on mozilla that I'm working on asap, but for
bzr.dev it's a measurable difference:

bzr status on bzr.dev:
robertc@lifelesslap:~/source/baz$ time dirstate/bzr status integration
real    0m0.643s
user    0m0.540s
sys     0m0.080s
robertc@lifelesslap:~/source/baz$ time dirstate/bzr status integration.dirstate
real    0m0.563s
user    0m0.480s
sys     0m0.060s

These figures are pretty reproducible - we're saving about 80ms of
wall clock on a 640ms operation. Woot!
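
As a sanity check, the relative savings can be recomputed from the raw
timings quoted in this mail. A minimal sketch; percent_reduction is a
throwaway helper name, and all figures are converted to seconds first.

```python
def percent_reduction(before, after):
    """Percentage of the original time that was saved."""
    return (before - after) / before * 100.0


# (before, after) in seconds, copied from the measurements in this mail.
measurements = {
    'paths2ids on bzr.dev': (0.0389, 0.0117),
    'paths2ids on mozilla': (1.98, 0.579),
    'bzr status on bzr.dev (real)': (0.643, 0.563),
}

for name, (before, after) in sorted(measurements.items()):
    print('%s: %.1f%% reduction' % (name, percent_reduction(before, after)))
```

This makes it easy to see that the paths2ids win is roughly the same
fraction on both trees, while the end-to-end status win is much smaller,
since status spends most of its time elsewhere.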

The next step in this saga is a custom InterTree to use dirstate to
generate the actual delta. Right now something is really whacky and
status for larger trees is massively slower with dirstate.

Folk wanting to help at the moment: The most important thing is running
the test suite with dirstate as the default tree format and, where it
breaks, writing workingtree_implementation tests which reproduce the
breakage, so we can ensure it works correctly. [and ideally also fix
format4 trees to work with that test, but just identifying the holes is
important too].

The next most important thing is figuring out this whackiness in delta
generation, so that un-customised code paths using dirstate trees will
still operate at a reasonable speed.

Lastly, there is performance tuning - which is in some respects the
heavy lifting. I plan to spend tomorrow working half on performance, and
half on making WorkingTree4 safe as a default tree format, so we can
merge it to mainline.

Rob

-- 
GPG key available at: <http://www.robertcollins.net/keys.txt>.