More merge base discussion [was: monotone's LCA+DOM algo for selecting a merge base]

Thu Feb 23 16:11:34 GMT 2006

Denys Duchier wrote:
> Today, while cleaning my apartment, I had some time to think about "merge base
> selection", and I may have had a very moderate epiphany.  I apologize in advance
> if this was already clear to others (also, I haven't quite caught up with the
> entire thread yet).
> 
> First, because it will be important for the argument that I develop below, I
> must reiterate my view that the job of a merge point is solely to choose which
> halfs of the hunk conflict pairs to keep.  This is (I think) in contrast to the
> view advanced by Aaron that merges are machine assisted edits of changes.  If
> you prefer, think of bzr merges as actually consisting of possibly two
> successive revisions: the first one only chooses a hunk from every conflict
> pair, the optional second one makes additional edits.

"the job of a merge point is solely to choose which halfs of the hunk
conflict pairs to keep"

I disagree with this specific point. Because if there are conflicts, it
is the job of the human to figure out 'which half to keep'. Because
often, the half to keep is to merge them both together.

For example, we had a line which was:

get_stuff(branch, 'some text')

One person did the repository changes and this was changed to:

get_stuff(branch.repository, 'some text')

Another person fixed some text comments, and changed it to:

get_stuff(branch, 'some other text')

After merging, both changes should be present, so the final should
actually be:

get_stuff(branch.repository, 'some other text')

So if merging just picked one to be 'correct', it would be wrong.

The importance of picking an appropriate merge base is to avoid spurious
conflicts. Specifically the problem of: You change a group of lines.
Then I merge you. Then I fix one of those lines. You fix something else,
which is completely unrelated. And I merge you again.

If the base chosen is too old, it will see your original change, and it
looks like I did a similar change, but with one line being different.
Which suddenly is a conflict. But if the correct base is chosen, it will
see that I've already seen your changes, and I have updated them.

Maybe an example will help:

A
| \
B   C
| \ |
D   E
    |
    F

In 'A' the text is
X
Y
Z

In 'B' the text is
X
Y
Z
L
M
N

In 'C' the text is the same as in A, C's changes are in a different
file. 'E' has the same text as 'B' since I merged them. D's changes are
also in a different file.

The text in 'F' is:

W
Y
Z
L
Monty Python
N

Now we have to pick a merge base between D and F. If we picked A, then
the difference from A=>D would include adding 'LMN' to the file. And the
changes from A=>F would also show adding "LMontyPythonN' to the same
file. However, if we pick B as the merge base, B=>D does not add 'LMN'
to the file, so it won't conflict.

> 
> Second, I firmly adopt the point of view that merges are non-symmetric: you
> merge OTHER into THIS.  I believe this may be in disagrement with the idea
> advanced by JAM that base selection should be symmetric.
> 

I'm not stuck on symmetry. But I have the feeling it will have specific
advantages. So that if I merge you, and before I commit you merge me,
our merged trees should look very similar.

Which means that I can make your job easier by merging your tree, and
resolving the conflicts for you. Thus you can just merge the resolved
revision (in the past you could pull this revision because it was
guaranteed to be conflict free).

If we make merge non-symmetric, the conflicts generated would be
different, which leads me to feel that there will be a preference to
which way you merge. Which means one person is required to do the work
since the other person has different access.

> As a consequence of this distinction, the revision graph contains two kinds of
> arcs: "derivation" arcs (i.e. from left parent) and "merge" arcs (i.e. from
> non-left parent: meaning right parent if there are two of them).
> 
> DEFINITIONS:
> 
> A "derivation" ancestor of R is an ancestor of R that is on R's main history
> line.
> 
> A "merge" ancestor of R is an ancestor of R that is not on R's main history
> line.
> 
> R0 is a merge ancestor of R if it is an ancestor of R such that all paths from
> R0 to R traverse at least one merge arc.  R0 is a derivation ancestor of R if
> there exists at least one path from R0 to R that traverses only derivation arcs.
> 
> NO MISTAKEN UPDATES:
> 
> An essential condition, when picking a merge base (BASE), is that it should not
> mistakenly overwrite a deliberate choice in THIS's history over OTHER's.
> 
> Consider the following example where derivation arcs are drawn solid and merge
> arcs are drawn with asterisks:
> 
> 
>              A[123]
>             / \
>            /   \
>        B[143] C[153]
>           |    *|
>           |   * |
>           |  *  |
>           | *   |
>        D[143]   |
>           | *   |
>           |  *  |
>           |   * |
>           |    *|
>           |   F[153]
>           |     |
>           |     |
>           |     |
>       E[1430] G[1530]
>       THIS    OTHER
> 
> Picking D as BASE would cause "4" to appear unchanged in THIS and (incorrectly)
> changed in OTHER.  Picking C would make it appear changed in THIS and unchanged
> in OTHER (correct, no conflict to resolve).  Picking A would make it appear
> changed in both (correct, but conflict to resolve).

I think there is a small issue here. And that is something that
Codeville is trying to do, but I don't believe bzr's weave merge will
do. I think it is http://revctrl.org/ImplicitUndo
Though it may be http://revctrl.org/Convergence or
http://revctrl.org/StaircaseMerge

I have difficulty with their graphs, because I'm not always sure what is
being merged.

But in your above example, you kind of have a double implicit undo. Wher
you say "I saw that change, but I want to throw it away", and the other
branch actually does the same. "I saw that change, and want to ignore it".

I just want to point out, that some merge algorithms will do the right
thing, regardless of what base is chosen, and some will require a
specific base to be chosen. 3-way diff (merge3, our current default)
requires the right base. PreciseCodevilleMerge may not.

> 
> In general picking a BASE that is a merge ancestor of OTHER can lead to a
> harmful false positive (here, I need to develop a full proof): i.e. it may
> classify the hunk as unchanged in THIS (with respect to BASE) and changed in
> OTHER, while actually, with respect to OTHER's history, the hunk is unchanged in
> OTHER and changed in THIS.  A false positive in the other direction (or both
> directions) is harmless.
> 
> This means that BASE must be a derivation ancestor of OTHER that is also an
> ancestor of THIS (note: this is true of a dominator).
> 
> CHOOSING THE BASE:
> 
> This second part of my argument needs even more work, but it goes like this.  We
> need to consider the origin of the hunk data: the BASE must be chosen so as to
> be comparable with the origin of any potential hunk (conflict) data.  The set of
> possible origins are the common ancestor revisions of THIS and OTHER.  Thus, we
> want to choose a BASE that is a derivation ancestor of OTHER and on all paths
> from the root to any common ancestor revision (a dominator satisfies that
> condition).  We also want to pick BASE as close to THIS and OTHER as possible
> (to minimize spurious conflicts).  Thus picking the closest dominator of OTHER
> that is also an ancestor of THIS will do the trick.
> 
> What to you think?  does this sound like I am on the right track?
> 
> Cheers,
> 
> --Denys

I think it is worth exploring. You do have an interesting idea.

John
=:->

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 254 bytes
Desc: OpenPGP digital signature
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20060223/027d5878/attachment.pgp