[MERGE] Graph.find_revno()

Wed May 21 21:01:15 BST 2008

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

This adds a new API to Graph. The idea is to make it easier to determine a real
'revno' when all we have is limited information.

Right now, we have some special cases where we check something like:
~  if rev_id == branch.last_revision():
~    revno = branch.revno()
~  else:
~    revno = len(branch.repository.iter_history(rev_id))

Which is good for a lot of default cases. But as soon as you use a revision id,
etc, it starts to fall down and have to iterate all of ancestry.

This function lets us change that search a bit. Specifically, it allows us to
seed the search graph with known points. It then supports 4 different ways of
finding the revno

1) reaching NULL_REVISION. If all else fails, we will have just as many
get_parent_map() calls while we iterate towards NULL. (It will be searching more
nodes at each step, but the total round trips will be the same.)

2) reaching one of the known revisions. If the target is a descendant, once it
reaches the known revision, it can stop. (revno = known_revno + steps)

3) having a known revision reach the target. We follow the lineage of all the
known revisions, so that if one of them encounters the target, we can stop.
(revno = known_revno - steps)

4) having a known revision search converge to the unknown search. This one is a
bit interesting, but if you look at this graph:

~   a
~   |
~   b
~  / \
~ c   d
~ |
~ e

If we know the revno of 'd', we can number 'b', and once we have done that, we
know the revno of 'c'. Because we know how many steps it took us to get there. I
also handle the 'e' <=> 'd' cases, which is a bit of a race because one search
finds 'b' before the other does.

This should actually help some of the 'push --overwrite' cases, because we don't
have to iterate the whole ancestry to find the revno. (I'm currently looking at
that, and the 'push --no-overwrite' when the branches are diverged. Both have
really *bad* performance.)

It also will help the 'bzr branch -r -100 XX XX-100' case, and a few others. (I
believe I opened a bug on this case.) And it sort of "finishes" our earlier work
with changing to use 'RevisionSpec.as_revision_id()' while still allowing quick
revno lookups.

There is one piece that I haven't decided upon. Specifically, Graph could hold
on to the 'known_revno' dictionary. As long as it doesn't get polluted by bad
data, revnos are set at commit time. They are basically just the measure of the
left-hand-distance to NULL, and thus never change.

I don't know (off-hand) of anywhere that is going to call this function multiple
times on the same graph object. If we do, then tracking the 'known_revno'
dictionary seems like a big win.

Thoughts?
John
=:->

PS> This almost seems like it could help the 'find_unique_ancestors()' case, but
AFAICT it can't. revnos give you a left-hand ordering, but can't order between
branches. You *could* do something like prioritize searching based on larger
revno. Like "request the highest 100 revisions in each pass". However, when you
get a merge revision like:

~  A
~  |\
~  B C
~  |/
~  D

You might know the revno of D (and thus B), but you can't know the revno of C
yet. All you could do is mark it as deferred and when you reached A you could
then go back and number C.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkg0f4sACgkQJdeBCYSNAAMCCwCfRlu3svJU4w5diebsR2mMWk+u
R1QAoI+6RxWIZtXhXIaD1UxddVXyb7wN
=3q2d
-----END PGP SIGNATURE-----
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: graph_find_revno.patch
Url: https://lists.ubuntu.com/archives/bazaar/attachments/20080521/1ba8e462/attachment-0001.diff