some initial timing data for dirstate

Robert Collins robertc at robertcollins.net
Thu Feb 15 23:23:27 GMT 2007


For a nice large working tree - a copy of mozilla with ??? files in it,
we have the following results with the dirstate branch:

With a regular format3 tree:
mozilla$ PYTHONPATH=~/bzr.dirstate python -m timeit \
 -s "import bzrlib.workingtree; tree = bzrlib.workingtree.WorkingTree.open('HEAD')" \
 'tree.lock_read(); tree.read_working_inventory();tree.unlock()'
10 loops, best of 3: 1.02 sec per loop

With a format4 tree:
mozilla$ PYTHONPATH=~/bzr.dirstate python -m timeit \
 -s "import bzrlib.workingtree; tree = bzrlib.workingtree.WorkingTree.open('HEAD.dirstate')" \
 'tree.lock_read(); tree.inventory;tree.unlock()'
10 loops, best of 3: 633 msec per loop
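The same best-of-3 measurement pattern can be reproduced from Python directly. This is a sketch with a stand-in workload instead of bzrlib (so it runs anywhere); the real commands above open a WorkingTree in setup and time the lock/read/unlock cycle:

```python
import timeit

# Stand-in for the benchmarks above: `setup` plays the role of
# WorkingTree.open() (run once, untimed), `stmt` plays the role of
# lock_read()/read/unlock() (the timed statement).
timer = timeit.Timer(
    stmt="sorted(data)",
    setup="data = list(range(10000))[::-1]",
)

# repeat=3, number=10 mirrors timeit's CLI "10 loops, best of 3":
# each repetition runs the statement 10 times; we keep the minimum.
results = timer.repeat(repeat=3, number=10)
best = min(results) / 10  # per-loop time in seconds
```

Taking the minimum rather than the mean is deliberate: the fastest run is the one least disturbed by other system activity.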

That's roughly 40% faster for dirstate right now, and we're at the
beginning of its lifecycle, not the end.

We can dig a little deeper - where is the cost within dirstate? 
mozilla$ PYTHONPATH=~/bzr.dirstate python -m timeit \
 -s "import bzrlib.workingtree; tree = bzrlib.workingtree.WorkingTree.open('HEAD.dirstate')" \
 'tree.lock_read(); tree.current_dirstate()._read_dirblocks_if_needed();tree.unlock()'
10 loops, best of 3: 594 msec per loop

So 594ms of the 633ms - 93.8% of the time - is spent pulling data from
disk and parsing it into in-memory tuples and lists. The cost of
supporting the inventory operation is negligible compared to the main
job of parsing from disk.

For operations like 'status in a subdir' doing partial parsing is almost
certainly going to be a win. Or at least, I really really hope so ;).
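To sketch why partial parsing should win: dirstate keeps its entries sorted by directory, so a subdirectory's entries form a contiguous run that can be found by bisection instead of parsing the whole file. The entry list and helper below are made up for illustration, not bzrlib's actual code:

```python
import bisect

# Hypothetical flattened dirstate: (directory, name) keys, sorted.
entries = [
    ("", "README"),
    ("doc", "intro.txt"),
    ("src", "main.py"),
    ("src", "util.py"),
    ("src/plugins", "foo.py"),
]

def entries_in_dir(entries, directory):
    """Return only the entries directly inside `directory`,
    bisecting the sorted list rather than scanning all of it."""
    lo = bisect.bisect_left(entries, (directory, ""))
    # (directory + "\x00", "") sorts just after every name in
    # `directory` but before any deeper path like "src/plugins".
    hi = bisect.bisect_left(entries, (directory + "\x00", ""))
    return entries[lo:hi]
```

With on-disk records of a known, sorted layout, the same bisection can be done over raw bytes, parsing only the block that 'status in a subdir' actually needs.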

Anyhow, we have shaved 400ms off the cost of getting to our existing
inventory-based APIs, and that's nothing to sneeze at. The dirstate file
for this tree is 13MB; the basis and current inventories are 15MB and
8MB respectively in the format3 tree.

What about writing operations?
mozilla$ PYTHONPATH=~/bzr.dirstate python -m timeit \
 -s "import bzrlib.workingtree" \
 -s "tree = bzrlib.workingtree.WorkingTree.open('HEAD')" \
 -s "tree.lock_read()" \
 -s "new_inv = tree.inventory" \
 -s "tree.unlock()" \
 'tree.lock_write(); tree._write_inventory(new_inv);tree.unlock()'
10 loops, best of 3: 658 msec per loop

And dirstate? Sadly, 
mozilla$ PYTHONPATH=~/bzr.dirstate python -m timeit \
 -s "import bzrlib.workingtree" \
 -s "tree = bzrlib.workingtree.WorkingTree.open('HEAD.dirstate')" \
 -s "tree.lock_read()" \
 -s "new_inv = tree.inventory" \
 -s "tree.unlock()" \
 'tree.lock_write(); tree._write_inventory(new_inv);tree.unlock()'
10 loops, best of 3: 1.78 sec per loop

We're about 2.7 times slower at _write_inventory calls at this point.

Well, back to the coal face; if anyone wants to optimise writes...
please do! I'm working on merging the unique roots tests from Aaron at
the moment.

-Rob

-- 
GPG key available at: <http://www.robertcollins.net/keys.txt>.