some initial timing data for dirstate, take 2 with added data
Robert Collins
robertc at robertcollins.net
Thu Feb 15 23:59:14 GMT 2007
My prior email was glitchy because a bug [in dirstate] during checkout
resulted in all the cache data being there, but none of the 'current
tree' data. I've re-run all the dirstate timings, and it's not so
rosy :(.
For a nice large working tree - a copy of mozilla with ??? files in it,
we have the following results with the dirstate branch:
With a regular format3 tree:
mozilla$ PYTHONPATH=~/bzr.dirstate python -m timeit \
-s "import bzrlib.workingtree; tree = bzrlib.workingtree.WorkingTree.open('HEAD')" \
'tree.lock_read(); tree.read_working_inventory();tree.unlock()'
10 loops, best of 3: 1.02 sec per loop
With a format4 tree:
mozilla$ PYTHONPATH=~/bzr.dirstate python -m timeit \
-s "import bzrlib.workingtree; tree = bzrlib.workingtree.WorkingTree.open('HEAD.dirstate')" \
'tree.lock_read(); tree.inventory;tree.unlock()'
10 loops, best of 3: 2.86 sec per loop
That's 180% slower for dirstate right now.
We can dig a little deeper - where is the cost within dirstate?
mozilla$ PYTHONPATH=~/bzr.dirstate python -m timeit \
-s "import bzrlib.workingtree; tree = bzrlib.workingtree.WorkingTree.open('HEAD.dirstate')" \
'tree.lock_read(); tree.current_dirstate()._read_dirblocks_if_needed();tree.unlock()'
10 loops, best of 3: 626 msec per loop
This is quite encouraging: most of the increased overhead between my bogus
test and this test is in the inventory creation logic, not in the
parsing. The parsing has gone up by 30ms now that the files are no longer
all listed as deleted, and the rest of the extra overhead is in inventory
creation. I haven't profiled, but I bet it's the find-a-parent-path logic.
We can trivially fast path that with a dict from dir to entry that we build
as we build the inventory, so I'll try that soon and get back with results.
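A minimal sketch of that dict-from-dir-to-entry idea, to make the intent concrete. This is not bzrlib code: build_tree, the dict-based entries and the (path, kind) rows are all hypothetical stand-ins, and it assumes every directory is seen before anything inside it.

```python
import posixpath


def build_tree(rows):
    """Build a nested tree from (path, kind) rows.

    Hypothetical illustration only: entries are plain dicts, not bzrlib
    inventory entries. Assumes each directory appears before its children.
    """
    root = {"name": "", "kind": "directory", "children": {}}
    # Maps a directory path to its entry, filled in as the tree is built,
    # so finding a parent is one dict lookup rather than a walk from root.
    by_dir = {"": root}
    for path, kind in rows:
        parent_path, name = posixpath.split(path)
        parent = by_dir[parent_path]
        entry = {"name": name, "kind": kind, "children": {}}
        parent["children"][name] = entry
        if kind == "directory":
            by_dir[path] = entry
    return root, by_dir


rows = [
    ("a", "directory"),
    ("a/b", "directory"),
    ("a/b/c.txt", "file"),
    ("d.txt", "file"),
]
root, by_dir = build_tree(rows)
```

Whether this actually removes the overhead is something only a profile of the real code can show.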
Right now though, we spend 78% of the time to get an inventory in
inventory object creation with dirstate, not in parsing the state file
itself. There are some costs in there that are bad - for instance we use
make_entry, and make_entry does a unicode normalisation check. So that
needs to change too.
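For reference, the 180% and 78% figures fall straight out of the three read timings above. The variable names below are just labels for those numbers:

```python
# Best-of-3 timings from the runs above, in seconds per loop.
format3_read = 1.02     # format3: read_working_inventory()
dirstate_read = 2.86    # format4 (dirstate): tree.inventory
dirstate_parse = 0.626  # format4: _read_dirblocks_if_needed() only

# How much slower the dirstate inventory read is, relative to format3.
slowdown_pct = (dirstate_read - format3_read) / format3_read * 100

# Share of the dirstate read that is not spent parsing the state file.
inventory_share_pct = (dirstate_read - dirstate_parse) / dirstate_read * 100

print("slowdown: %.0f%%" % slowdown_pct)
print("time outside parsing: %.0f%%" % inventory_share_pct)
```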
For operations like 'status in a subdir' doing partial parsing is almost
certainly going to be a win as we can hopefully avoid an inventory altogether.
Or at least, I really really hope so ;).
The size of the dirstate file for this is 15.5MB. The size of the basis
inventory and the current inventories are 15MB and 8MB respectively in
the format3 tree, so for operations pulling both current and parent
trees from disk, we should be moving less data, which is a win.
What about writing operations?
mozilla$ PYTHONPATH=~/bzr.dirstate python -m timeit \
-s "import bzrlib.workingtree" \
-s "tree = bzrlib.workingtree.WorkingTree.open('HEAD')" \
-s "tree.lock_read()" \
-s "new_inv = tree.inventory" \
-s "tree.unlock()" \
'tree.lock_write(); tree._write_inventory(new_inv);tree.unlock()'
10 loops, best of 3: 658 msec per loop
And dirstate? Sadly,
mozilla$ PYTHONPATH=~/bzr.dirstate python -m timeit \
-s "import bzrlib.workingtree" \
-s "tree = bzrlib.workingtree.WorkingTree.open('HEAD.dirstate')" \
-s "tree.lock_read()" \
-s "new_inv = tree.inventory" \
-s "tree.unlock()" \
'tree.lock_write(); tree._write_inventory(new_inv);tree.unlock()'
10 loops, best of 3: 2.01 sec per loop
We're ~3 times slower at doing _write_inventory calls at this
point - remembering that we are writing all the parent data at the same
time. It may be that we want to revisit the decision to glob them
together, or it may just be that the code needs fast-pathing for the
common case of nothing-has-changed.
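One possible shape for a nothing-has-changed fast path is a dirty flag that lets the save step skip serialisation entirely. The sketch below is purely illustrative: CachedState and its methods are hypothetical names, not the dirstate API, and it keeps everything in memory rather than touching disk.

```python
class CachedState(object):
    """Hypothetical in-memory stand-in for an on-disk state file."""

    def __init__(self, rows):
        self._rows = list(rows)
        self._dirty = False
        self._serialised = None
        self.write_count = 0  # number of times we really serialised

    def set_rows(self, rows):
        rows = list(rows)
        # Only mark the state dirty if the content actually differs.
        if rows != self._rows:
            self._rows = rows
            self._dirty = True

    def save(self):
        # Fast path: nothing changed since the last save, so do no work.
        if not self._dirty:
            return False
        self._serialised = "\n".join(self._rows)
        self.write_count += 1
        self._dirty = False
        return True


state = CachedState(["a", "a/b"])
unchanged = state.save()            # nothing modified yet
state.set_rows(["a", "a/b"])        # same content, stays clean
still_unchanged = state.save()
state.set_rows(["a", "a/b", "c"])   # real change
changed = state.save()
```

The comparison in set_rows is itself linear in the number of rows, so this only pays off if comparing is much cheaper than serialising and writing; that would need measuring against the real code.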
Well, back to the coal face; if anyone wants to optimise these specific
cases... please do! I'm working on merging the unique roots tests from
Aaron at the moment.
-Rob
--
GPG key available at: <http://www.robertcollins.net/keys.txt>.