More merge base discussion [was: monotone's LCA+DOM algo for selecting a merge base]

John A Meinel john at arbash-meinel.com
Sun Feb 19 15:58:58 GMT 2006


Aaron Bentley wrote:
> Denys Duchier wrote:
> | Aaron asked about monotone's LCA+DOM algo for selecting a merge base.
>  I know
> | essentially squat about monotone, but his question made think about
> why their
> | selection algorithm might be correct.  In this email, I'd like to
> write up what
> | I have thought of so far and offer it up for criticism and corrections.
> 
> Thank you, Denys.  I'd originally hoped we could start with
> least-common-dominator (LCD) as our merge base selection algorithm, so I
> was putting off thinking about LCA+DOM.
> 
> I now believe that that least-common-dominator is not suitable, so I'll
> be taking a stab at LCA+DOM.
> 
> I believe that LCD is unsuitable because of this scenario:
> 
> ~ A
> ~ |\
> ~ B C
> ~ |\|
> ~ D E
> 
> If we attempt to select a merge base for D and E, LCD will use A.  But
> it's obvious that B is the best choice.  (Selecting a worse base will
> produce more spurious conflicts.)
> 
> Because E descends from C, B doesn't dominate it, even though it
> dominates D.
> 
> Since B is a dominator (though not a common one), LCA+DOM would pick it.

This looks correct to me. Also, this works for the opposite direction,
right? Because merging E into D should probably also use B since it
dominates D.

Certainly we could have an algorithm that required the dominator to be
in OTHER or in THIS, or in EITHER. From your statements, it sounds like
monotone does EITHER.

> 
> Say we have a criss-cross:
> 
> A
> |\
> B C
> |X|
> D E
> 
> Now LCA+DOM will select A, which is proper.  There will be more merge
> conflicts, but they won't be spurious, because they reflect
> disagreements between the A-B-D line and the A-C-E line.
> 
> So here are the things I'm looking at:
> 1. Does "is a dominator" reliably test for criss-cross?  Does it depend
> on LCA?
> 
> I do not have much graph theory knowledge, so help here would be
> appreciated.
> 
> 2. Is LCA appropriate?
> Certainly the last time we looked at it, we believed that it was not.
> The problem is that the least common ancestor may be very old:
> 
> ~  A
> ~  |\
> ~  B C
> ~  | |
> ~  E |
> ~  | |
> ~  F |
> ~  |\ \
> ~  G H \
> ~  |  X |
> ~  | / \|
> ~  I    J
> 
> Err, you my pefer this version:
> http://aaronbentley.com/pastegraph.cgi?n=130
> 
> Of course, in this scenario, C is not a dominator, so maybe we're okay.
> ~ If we're certain that shortcuts like this one can never cause old bases
> to be selected, LCA+DOM may still be a good idea.

I think one of our most interesting potentials was to have a weighted
graph, where each edge was weighted by the size of the changes. It had
the advantage of making C->E very expensive so unlikely to be chosen.

We just needed a faster way of computing the weights, rather than
running 'compare_trees' all the time.

> 
> | The gist of the problem is that, whenever you merge, you may need to
> resolve a
> | conflict by choosing one of a pair of conflicting hunks.  However, it
> is clear
> | that choosing is NOT making a change.
> 
> Actually, I tend to think of merging as a machine-assisted edit, so I do
> think of these as changes.  But let's ignore that.

I would agree with the concept that picking one of them (and not
modifying it, and certainly not merging the 2 conflicts together in some
logical way), could be considered a resolution, not an actual edit. Just
from the concept of the lines should still be annotated with the person
who actually modified them, not the person who resolved the conflict on
bringing them in. (Then gannotate would tell you why those lines were
created, not at what step they were merged. It might be nice to do both,
though).

> 
> | When I pick a merge base, I should not
> | pick a revision such that in one history it appears unchanged because
> I always
> | choose the same hunk when resolving while in the other it appears new
> because I
> | chose another hunk.
> 
> Right.  Both the THIS hunk and the OTHER hunk should differ from BASE,
> to give the person doing the merge the "opportunity" to resolve the
> conflict.

I think the point is that it should be attempting to select the same
BASE, if I swap THIS for OTHER.

(In the extreme case you could always pick OTHER as BASE, and not merge
anything, or pick THIS as BASE, and you would always override THIS with
other. Obviously not what you want)

> 
> | Instead, it might be better to pick a base that dominates the origins
> of both
> | hunks.
> 
> I am not sure I follow this part.  The first part of my email shows why
> it is fine to use a base that does not dominate both versions of the text.
> 
> Aaron

I agree with Aaron. I think it needs to dominate at least 1, but it
should be EITHER, so that we always select the same BASE for merging
either direction.

John
=:->

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 249 bytes
Desc: OpenPGP digital signature
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20060219/d2766cb4/attachment.pgp 


More information about the bazaar mailing list