[MERGE] Shortcut 'common_ancestor' when a branch tip is in the ancestry

Tue Apr 10 20:50:07 BST 2007

The attached patch is to address this bug:
  https://bugs.launchpad.net/bzr/+bug/103757

It changes 'common_ancestor' so that if the tip revision is in the
ancestry, it is returned right away, rather than doing the graph search.

This solves this specific bug, though it probably isn't a complete fix
for 'common_ancestor' speed.

It seems like searching 2 graphs of 16k revisions for a common ancestor
should complete in less than 2 minutes.

Aaron has a good point about caching the distance from null for each
revision, since it doesn't change often. But I'm also wondering if that
is the algorithm we want to be using.

Unless we can find a way to handle partial graphs, I don't see any
algorithm scaling well when we have 160k revisions. (Like the Mozilla
tree, or a kernel tree).

I think accepting a slightly less optimal base in exchange for much
faster performance could be reasonable.

I also wonder if we could just shortcut the graph itself if we know
common ancestors faster. I think using dominators is a pretty clear
case. But also you could just look through the left-hand ancestry for
revisions in common on both branches.

At least in the case of bzr and the Launchpad codebase, the length of
the mainline is significantly shorter than the size of all ancestry.
Both are around 5:1.

	revno	ancestry | wc -l
bzr.dev	2401	10392
lp	3872	15889

Now both of these projects use a PQM, and follow the same workflow
(versus people developing on tip directly). So I imagine other workflows
wouldn't be quite as obvious.

However, you also can shortcut when you are looking at it this way. We
have Repository.iter_reverse_revision_history(), so you could do:

def find_common_revision_history(repo, revision_id1, revision_id2):
  iter1 = repo.iter_reverse_revision_history(revision_id1)
  iter2 = repo.iter_reverse_revision_history(revision_id2)

  def next_or_none(iter):
    try:
      return iter.next()
    except StopIteration:
      return None

  cur_revision_id1 = next_or_none(iter1)
  cur_revision_id2 = next_or_none(iter2)

  history1 = [cur_revision_id1]
  history2 = [cur_revision_id2]
  set_history1 = set(history1)
  set_history2 = set(history2)

  while cur_revision_id1 or cur_revision_id2:
      if cur_revision_id1 in set_history2:
        return cull_histories(cur_revision_id1, history1, history2)
      if cur_revision_id2 in set_history1:
        return cull_histories(cur_revision_id2, history1, history2)

      if cur_revision_id1 is not None:
          cur_revision_id1 = next_or_none(iter1)
      if cur_revision_id1 is not None:
          history1.add(cur_revision_id1)

      ... # Same for cur_revision_id2

And then 'cull_histories' is defined as trimming the history. Something
like:

def cull_histories(revision_id, history1, history2):
  history1 = reversed(history1[:history1.index(revision_id)])
  history2 = reversed(history2[:history2.index(revision_id)])
  return history1, history2

That gives you just the tips which are not the same on both sides (since
once you converge, everything earlier must be the same).

And then we would have a "get_limited_ancestry()" which could produce
the ancestry graphs for just that limited set.

Probably this isn't 100% correct, but I think the basic idea is to build
up a partial graph, rather than trying to generate the complete graph
all the time.

Anyway, I've digressed :)

John
=:->

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: shortcut-common-ancestor.patch
Url: https://lists.ubuntu.com/archives/bazaar/attachments/20070410/741e9bb9/attachment.diff