Annotate policies

Tue May 27 19:44:31 BST 2008

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

We've been getting some off-list feedback about how annotate is working, and I'd
like to discuss some of it here.

Specifically, the hard part of annotation is how to handle lines that appear on
both sides of a merge, with different annotations. I'll try to describe the
possible resolutions, and what impact I think they will have. First, the all
important ASCII revision graph:

Graph 1
~  A    # Common text
~  |\
~  B C  # B and C both introduce the line 'foo'
~  |\|
~  D E  # Simple merge of B => C
~  |/
~  F    # Merge E back into D

Graph 2
~  V    # Common text, has line 'foo'
~  |\
~  W X  # No change in W, X changes 'foo' to 'bar'
~  | |
~  | Y  # Y reverts 'bar' back to 'foo'
~  |/
~  Z

In graph 1, the question is what to do at merge points 'E' and 'F' given the
disagreement as to the 'author' of the line. Graph 2 is what to do about 'revert'.

Here are the options I can think of:

1) First parent wins. In this case, E would keep the annotation C, and F would
keep the annotation B. Z would keep V.

2) Last parent wins. E would get B, and F would keep B. Z would get the
annotation Y.

3) Merge node gets annotation (current). In this case, E would reannotate 'foo'
with E, since it resolved the 'conflict' of who last modified that revision. (We
~ do this with the file modification graph.) F would then keep E, as it
supersedes (descends from) B.

4a) like 1, only introduce a heads() check. This doesn't effect Graph 1, but in
Graph 2, it would mean that Z would be annotated with Y rather than V.

4b) Like 2, only with heads(). Doesn't actually change these graphs. (If there
were a child of Y which merged W, then it would be effected.)

5) Multiple. In this case, E would get (B, C) as the annotation, rather than
just having 1 marker. This has a couple variants:

~  a) Ordered by first seen. So E would actually get (C, B), but F would want
~     (B, C)
~  b) Unordered, just a set
~  c) Lexicographically ordered, so all would go for (B, C), pretty much the same
~     as b, just a slight logical difference

~  There is also the question of heads(), aka would Z get (V, Y) or would it just
~  get Y

6) Ignore merges when computing annotations. In this case, F would keep B, E
would keep C, Z would stay V.

So on to some debate about the different algorithms.

(3) has the benefit of at least noticing a discrepancy. It has the major
downside of not actually marking a revision which changed the line. (So, for
example, double clicking on a line in gannotate can pop up a diff which doesn't
include that line) We could try to teach gannotate how to handle those cases, it
already does slightly if you use the 'back' button. (Though 'back' probably only
follows the left-hand, and doesn't let you look at any other source.)

(5) seems like the most accurate, but also ends up being the most difficult.
Because it changes the annotate API from returning just a single value, to
returning 1-N values. Which would complicate things like 'gannotate' which
should now be showing multiple revision messages at the bottom.

4ab is nice to have heads() for accuracy. It has the downside of adding a fairly
significant amount of overhead. (Back when I was implementing (3), approx 25-50%
of the time was spent in heads())

(1) tends to 'flip-flop' because values don't propagate through a merge. This is
also a performance implication, because whenever two sides disagree, you have to
do some work to figure out who to believe.

(6) is a bit of an odd fish, but it was actually requested at Pycon.
Specifically, the idea was that the merge integrator should be the one getting
annotations. So that if Barry merged Joe's patch, people can see that Barry did
so, and can go beat up on him for trusting Joe. Nobody knows who Joe is, so they
can't do much about him.

I suppose you could even do a

7) Like 5, but every time the change is merged, add the merge point to the
annotation list. I think this is a "For all lines which don't match left-parent,
use the right annotation, and add current."
This feels like it would scale poorly, but it might give you some interesting
information about when people are pulling patches, etc for a given line.

The biggest downside of heads() stuff is that it happens quite a bit.
Specifically, if you have:

A
|\
B C
|/|
D E
|/|
F G
|/|
H I

... If B and C disagree on the annotation, and B wins, then C, E, G, I, ... will
all be checked when they are merged. B will 'win' every time, but you don't know
that until you look. (certainly this will likely be cached, blah blah blah).

I'm actually leaning towards #2. It does 'fail' on:

Graph 2b
A   # 'foo'
|\
B C # B changes to 'bar', 'C' unchanged
| |
D | # D reverts to 'foo'
|/
E   # E??

In that case, D reverted the value, and would usually get the new annotation,
but C has it at the old value, and the merge reverts it because of propagation.

So my second favorite is 4b (2 + heads()).

2 is what we used with knits, and overall 2/4b IMO gives the best fidelity
versus implementation overhead. (5 is the highest fidelity, but the most
invasive, etc).

Anyone else have some ideas? We need to do something, because right now
gannotate doesn't show useful things. (And actually often fails with a
NoSuchFile error when the line is annotated to a merge, and the diff doesn't
even include that file.)

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkg8Vo8ACgkQJdeBCYSNAAP0IQCgg/TliCRI4tb6Ozmk0w/lGF4q
bdAAn3foQHWSlP9NHqC8dqstvj3RLFpQ
=dHx6
-----END PGP SIGNATURE-----