Performance improvements for bzr-2.4 on large trees
John Arbash Meinel
john at arbash-meinel.com
Tue May 17 15:04:53 UTC 2011
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
The last of my patches is queued up to land, so I figured I'd post an
update about the performance improvements I've been working on. I'm also
just excited about how well it has all come together.
There were essentially 3 changes that mattered for performance on large
trees.
1) Fixing iter_entries_by_dir() to preload the data in Repository-
optimal ordering rather than by-request ordering. In large trees
this was causing us to thrash and become pathologically slow.
In the 70k-entry test tree, thrashing took about 3 minutes, the
preloading version takes about 15s. This affected a lot of our
commands, though I guess the next two fixes would actually reduce
the number of commands affected by this.
2) Fixing several code paths to use optimized iter_changes() rather
than the generic iter_changes(). The generic path walks both
inventories iter_entries_by_dir() and compares them. Our 2a format
Repository can do iter_changes without loading the whole tree. (it
internally uses a hash_trie to store the inventory, and so nodes
with matching sub-trees can be skipped for comparison.)
This generally shows up as something that was taking 15s (to load
the whole inventory) dropping to <2s for the improved comparison.
(bzr revert and bzr pull were both directly impacted here)
3) Changing WT.set_parent_trees([one_tree]) to update itself using
current_basis.iter_changes(one_tree), rather than setting the state
from scratch.
This basically adds another case where we can avoid reading the
whole inventory state again, which is another 15s to <2s sort of
change.
This only showed up after fixing (2), because once the tree is
loaded, the other actions are generally pretty quick.
(bzr up, bzr pull)
This is the chart I put together for "whats-new-in-2.4.txt". bzr-2.3.2
will have fix (1), but not (2) or (3), to give a feel for how much of an
impact different fixes have had.
bzr-2.3.1 bzr-2.3.2 bzr-2.4 action
3m39s 1m08s 1m03s bzr co --lightweight
38s 8s 2s bzr revert (in a clean tree)
4m47s 3m56s 15s bzr merge
4m45s 20s 3s bzr pull
4m58s 3m00s 2s bzr up
9m33s 21s 19s bzr uncommit (including a merge)
4m44s 17s 2s bzr uncommit (simple commit)
So yes, some operations that were taking almost 5 minutes have now
dropped down to taking <3s.
You won't see that dramatic of an improvement for smaller trees, though
most cases will have a pleasant improvement. Here is a short list for
the 'Launchpad' tree (with ~8k items).
bzr-2.3.1 bzr-2.4 action
5.3s 5.2s bzr co --lightweight
0.9s 0.3s bzr revert
1.4s 0.4s bzr pull
3.9s 3.7s bzr uncommit (with merge)
0.9s 0.3s bzr uncommit (without merge)
Anyway, I'm quite happy about how much better bzr-2.4 will be in large
trees. The remaining cases for merge/update are something that probably
need a new format to address properly. (Basically, we want to stop
caching extra-parent trees in the dirstate file.)
John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
iEYEARECAAYFAk3SjpQACgkQJdeBCYSNAAN5aQCgl27yfn75mXgTt4zmT4JJQQ9q
GdIAoLbsF6GrmTppddobcgbXNT2cVbnN
=/CFJ
-----END PGP SIGNATURE-----
More information about the bazaar
mailing list