Annotate policies

Thu May 29 03:42:56 BST 2008

On Wed, May 28, 2008 at 4:44 AM, John Arbash Meinel
<john at arbash-meinel.com> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> We've been getting some off-list feedback about how annotate is working, and
> I'd
> like to discuss some of it here.
>
> Specifically, the hard part of annotation is how to handle lines that appear
> on
> both sides of a merge, with different annotations. I'll try to describe the
> possible resolutions, and what impact I think they will have. First, the all
> important ASCII revision graph:
>
>
> Graph 1
> ~  A    # Common text
> ~  |\
> ~  B C  # B and C both introduce the line 'foo'
> ~  |\|
> ~  D E  # Simple merge of B => C
> ~  |/
> ~  F    # Merge E back into D
>
>
> Graph 2
> ~  V    # Common text, has line 'foo'
> ~  |\
> ~  W X  # No change in W, X changes 'foo' to 'bar'
> ~  | |
> ~  | Y  # Y reverts 'bar' back to 'foo'
> ~  |/
> ~  Z
>
>
> In graph 1, the question is what to do at merge points 'E' and 'F' given the
> disagreement as to the 'author' of the line. Graph 2 is what to do about
> 'revert'.
>
> Here are the options I can think of:
>
> 1) First parent wins. In this case, E would keep the annotation C, and F
> would
> keep the annotation B. Z would keep V.
>
> 2) Last parent wins. E would get B, and F would keep B. Z would get the
> annotation Y.
>
> 3) Merge node gets annotation (current). In this case, E would reannotate
> 'foo'
> with E, since it resolved the 'conflict' of who last modified that revision.
> (We
> ~ do this with the file modification graph.) F would then keep E, as it
> supersedes (descends from) B.
>
> 4a) like 1, only introduce a heads() check. This doesn't effect Graph 1, but
> in
> Graph 2, it would mean that Z would be annotated with Y rather than V.
>
> 4b) Like 2, only with heads(). Doesn't actually change these graphs. (If
> there
> were a child of Y which merged W, then it would be effected.)
>
> 5) Multiple. In this case, E would get (B, C) as the annotation, rather than
> just having 1 marker. This has a couple variants:
>
> ~  a) Ordered by first seen. So E would actually get (C, B), but F would
> want
> ~     (B, C)
> ~  b) Unordered, just a set
> ~  c) Lexicographically ordered, so all would go for (B, C), pretty much the
> same
> ~     as b, just a slight logical difference
>
> ~  There is also the question of heads(), aka would Z get (V, Y) or would it
> just
> ~  get Y
>
> 6) Ignore merges when computing annotations. In this case, F would keep B, E
> would keep C, Z would stay V.
>
>
> So on to some debate about the different algorithms.
>
> (3) has the benefit of at least noticing a discrepancy. It has the major
> downside of not actually marking a revision which changed the line. (So, for
> example, double clicking on a line in gannotate can pop up a diff which
> doesn't
> include that line) We could try to teach gannotate how to handle those
> cases, it
> already does slightly if you use the 'back' button. (Though 'back' probably
> only
> follows the left-hand, and doesn't let you look at any other source.)
>
> (5) seems like the most accurate, but also ends up being the most difficult.
> Because it changes the annotate API from returning just a single value, to
> returning 1-N values. Which would complicate things like 'gannotate' which
> should now be showing multiple revision messages at the bottom.
>
> 4ab is nice to have heads() for accuracy. It has the downside of adding a
> fairly
> significant amount of overhead. (Back when I was implementing (3), approx
> 25-50%
> of the time was spent in heads())
>
>
> (1) tends to 'flip-flop' because values don't propagate through a merge.
> This is
> also a performance implication, because whenever two sides disagree, you
> have to
> do some work to figure out who to believe.
>
>
> (6) is a bit of an odd fish, but it was actually requested at Pycon.
> Specifically, the idea was that the merge integrator should be the one
> getting
> annotations. So that if Barry merged Joe's patch, people can see that Barry
> did
> so, and can go beat up on him for trusting Joe. Nobody knows who Joe is, so
> they
> can't do much about him.
>
> I suppose you could even do a
>
> 7) Like 5, but every time the change is merged, add the merge point to the
> annotation list. I think this is a "For all lines which don't match
> left-parent,
> use the right annotation, and add current."
> This feels like it would scale poorly, but it might give you some
> interesting
> information about when people are pulling patches, etc for a given line.
>
>
> The biggest downside of heads() stuff is that it happens quite a bit.
> Specifically, if you have:
>
> A
> |\
> B C
> |/|
> D E
> |/|
> F G
> |/|
> H I
>
> ... If B and C disagree on the annotation, and B wins, then C, E, G, I, ...
> will
> all be checked when they are merged. B will 'win' every time, but you don't
> know
> that until you look. (certainly this will likely be cached, blah blah blah).
>
>
> I'm actually leaning towards #2. It does 'fail' on:
>
> Graph 2b
> A   # 'foo'
> |\
> B C # B changes to 'bar', 'C' unchanged
> | |
> D | # D reverts to 'foo'
> |/
> E   # E??
>
> In that case, D reverted the value, and would usually get the new
> annotation,
> but C has it at the old value, and the merge reverts it because of
> propagation.
>
> So my second favorite is 4b (2 + heads()).
>
> 2 is what we used with knits, and overall 2/4b IMO gives the best fidelity
> versus implementation overhead. (5 is the highest fidelity, but the most
> invasive, etc).
>
> Anyone else have some ideas? We need to do something, because right now
> gannotate doesn't show useful things. (And actually often fails with a
> NoSuchFile error when the line is annotated to a merge, and the diff doesn't
> even include that file.)

One option would be to allow for several of them and pass down an option
saying what method to use; then we can get some user feedback or at least
let them wiggle around to get a good result in a particular situation.  I
have to admit though that doing this for merge has had only limited
success: only a subset of users will actually try this if they hit a bad
situation and we don't get a huge amount of clear feedback about how we
could improve the default.

4b does seem like it could work, I wouldn't object to trying that as the
default.

I think ultimately the question "why is this line like this?" needs more
than a one word (one revid) answer.  I think I'd like to conceptually get
the highlighted parts of the revision graph showing how that line got to
where it finally ended up, whether that is by its original addition,
merges, conflict resolution, etc, and maybe even being to jump over
sections where it was added and then replaced.  I don't

-- 
Martin <http://launchpad.net/~mbp/>