[MERGE] more common-ancestor performance improvements.

Robert Collins robertc at robertcollins.net
Tue May 29 23:43:14 BST 2007


On Sun, 2007-05-27 at 12:58 -0400, Aaron Bentley wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> Aaron Bentley wrote:
> > This patch both improves the speed of merge-base selection, and improves
> > its behavior with criss-cross merges.
> 
> This patch builds on my previous patch.  It changes the
> Graph.minimal_common method to avoid traversing the whole graph.

This is an interim review; I'm still looking at this and the previous
patch in detail to get a solid understanding. I just had a few immediate
thoughts I figured I would share immediately.

> Instead, it scales with:
> 
> 1. the number of uncommon ancestors
> 2. the lengths of the shortest paths between border common ancestors
> 
> This improves performance somewhat on a local machine, and should have
> more dramatic benefits for remote operations, when retrieving a complete
> graph can be slow, or when the ancestry is very large.

This sounds great. I think we should start a index mapping terms we use
- e.g. border common ancestor - to their definition (which might be on
wikipedia, or inside the code base). We don't always use the formal
definition of a term, or there can be multiple definitions and we should
make it clear which one we have chosen.

In the patch itself..

+    def _find_candidate_mca(self, revisions):
+        walkers = [_AncestryWalker(r, self) for r in revisions]

I think that functions like this really benefit from a docstring which
provides a simple overview of the function.

e.g. - and I dont know if this is right - 
"""Find all distinct common ancestors between revisions.

This implementation performs a breadth first search from each revision
in parallel. It aggressively cuts search paths when no revision on that
path can be distinct - when all searches are able to reach it. In
general this will search history to the depth of the number of linear
unmerged revisions.
"""

Something like that would make it a lot easier for folk looking at the
code - both for review and for eventual usage.


+        self._lines = None

I'm guessing this variable name is cribbed from somewhere else - its
very opaque. What is a line in this context? Perhaps '_revisions' ?

I'm not voting on this just yet, as I'm still getting my head around it.

Cheers,
Rob

-- 
GPG key available at: <http://www.robertcollins.net/keys.txt>.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: This is a digitally signed message part
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20070530/9853398d/attachment.pgp 


More information about the bazaar mailing list