Performance improvements for bzr-2.4 on large trees
Martitza Mendez
martitzam at gmail.com
Wed May 18 00:18:25 UTC 2011
On Tue, May 17, 2011 at 8:04 AM, John Arbash Meinel
<john at arbash-meinel.com>wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> The last of my patches is queued up to land, so I figured I'd post an
> update about the performance improvements I've been working on. I'm also
> just excited about how well it has all come together.
>
> There were essentially 3 changes that mattered for performance on large
> trees.
>
> 1) Fixing iter_entries_by_dir() to preload the data in Repository-
> optimal ordering rather than by-request ordering. In large trees
> this was causing us to thrash and become pathologically slow.
> In the 70k-entry test tree, thrashing took about 3 minutes, the
> preloading version takes about 15s. This affected a lot of our
> commands, though I guess the next two fixes would actually reduce
> the number of commands affected by this.
>
> 2) Fixing several code paths to use optimized iter_changes() rather
> than the generic iter_changes(). The generic path walks both
> inventories iter_entries_by_dir() and compares them. Our 2a format
> Repository can do iter_changes without loading the whole tree. (it
> internally uses a hash_trie to store the inventory, and so nodes
> with matching sub-trees can be skipped for comparison.)
> This generally shows up as something that was taking 15s (to load
> the whole inventory) dropping to <2s for the improved comparison.
> (bzr revert and bzr pull were both directly impacted here)
>
> 3) Changing WT.set_parent_trees([one_tree]) to update itself using
> current_basis.iter_changes(one_tree), rather than setting the state
> from scratch.
> This basically adds another case where we can avoid reading the
> whole inventory state again, which is another 15s to <2s sort of
> change.
> This only showed up after fixing (2), because once the tree is
> loaded, the other actions are generally pretty quick.
> (bzr up, bzr pull)
>
> This is the chart I put together for "whats-new-in-2.4.txt". bzr-2.3.2
> will have fix (1), but not (2) or (3), to give a feel for how much of an
> impact different fixes have had.
>
> bzr-2.3.1 bzr-2.3.2 bzr-2.4 action
> 3m39s 1m08s 1m03s bzr co --lightweight
> 38s 8s 2s bzr revert (in a clean tree)
> 4m47s 3m56s 15s bzr merge
> 4m45s 20s 3s bzr pull
> 4m58s 3m00s 2s bzr up
> 9m33s 21s 19s bzr uncommit (including a merge)
> 4m44s 17s 2s bzr uncommit (simple commit)
>
> So yes, some operations that were taking almost 5 minutes have now
> dropped down to taking <3s.
>
> You won't see that dramatic of an improvement for smaller trees, though
> most cases will have a pleasant improvement. Here is a short list for
> the 'Launchpad' tree (with ~8k items).
>
> bzr-2.3.1 bzr-2.4 action
> 5.3s 5.2s bzr co --lightweight
> 0.9s 0.3s bzr revert
> 1.4s 0.4s bzr pull
> 3.9s 3.7s bzr uncommit (with merge)
> 0.9s 0.3s bzr uncommit (without merge)
>
> Anyway, I'm quite happy about how much better bzr-2.4 will be in large
> trees. The remaining cases for merge/update are something that probably
> need a new format to address properly. (Basically, we want to stop
> caching extra-parent trees in the dirstate file.)
>
> John
> =:->
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.11 (GNU/Linux)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
>
> iEYEARECAAYFAk3SjpQACgkQJdeBCYSNAAN5aQCgl27yfn75mXgTt4zmT4JJQQ9q
> GdIAoLbsF6GrmTppddobcgbXNT2cVbnN
> =/CFJ
> -----END PGP SIGNATURE-----
>
>
This is beyond awesome! Ok, more exclamations: !!!
I will be trying this out on the command line as soon as it lands in lp:bzr
and again when it gets packaged up as part of the standalone installer.
Thank you thank you thank you!!!
~M
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.ubuntu.com/archives/bazaar/attachments/20110517/19a9647a/attachment-0001.html>
More information about the bazaar
mailing list