[MERGE][1.6] Real --weave merge

Fri Jul 11 23:11:00 BST 2008

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Attached is a patch which brings back a more accurate '--weave' merge.
The one we have is a bit closer to an 'annotation' based merge, but it
doesn't properly handle deleted lines. Also, it is *really* slow.

The attached patch basically builds up a weave "on the fly". The basic
idea is pretty simple, which I will outline. I'll also try to touch on
some of the specific details that clutter up the code a bit.

1) When we get a file to merge, build up a weave with the texts in the
ancestry. Sort of "pretend you've been using weaves all along and see
what you get".

2) We can do a bit better, though. With the realization that if there is
a simple ancestor, we only need the 3 texts.

eg:
~    A
~   / \
~  B   C
~  |   |\
~  D   E F
~  |   |/
~  G   H

It doesn't matter how complicated the ancestry from A=>G is, or A=>H is.
If it isn't a common ancestor, then we can make do with just those 3 texts.
This is because the merge plan can only really say 'from base',
'killed-a', 'killed-b', 'new-a', 'new-b'. Which is all quantified by
collapsing A=>G and A=>H.

This is actually the same logic used for regular 3-way merging.

3) Of course, weave merge isn't very interesting unless you have
criss-cross merges, because otherwise 3-way works fine. However, the
same logic can be applied when there are multiple common ancestors. For
example:

~    A
~   / \
~  B   C
~  |\ /|
~  | X |
~  |/ \|
~  D   E

We sort of take the above example and apply it recursively.

If A is the LCA of B & C, then all revisions before A will be in the
ancestry of both (and thus uninteresting), and all the revisions between
A=>B will be unique to that side, and thus also uninteresting.
This applies again between B => D&E, and C => D&E.

So we can still do a proper weave merge, with only 5 nodes.

4) Things get complicated, though, when there are more than 2 LCAs. It
is also really hard to draw, but try imagining it:

~    A
~    |\
~    B \
~   / \ \
~  C   D E
~  |\ /| |
~  | X | |
~  |/ \|/
~  F   G

E also is merged into F.

The specific problem is that B is an LCA of C & D, but *not* an LCA of
E. However, if we don't include B, then the lines introduced there will
show up as 'new' in both C and D. Which means that F & G have to 'pick
one copy' and mark the other one as 'killed'. Which causes problems when
someone then changes one of those 'killed' lines.

I believe it would be possible to keep taking LCAs between the pair-wise
nodes until you got some sort of convergence. For now, when I encounter
this case, I just punt, and grab all ancestors that are ancestors of
C,D,E but are not ancestors of A.

4b) There is another problem with the 'grab all the ancestors' case.

~    A
~   / \
~  B   C
~  |\ /|
~  D E F
~  |/ \|
~  G   H
~  |\ /|
~  | X |
~  |/ \|
~  I   J

In this case, grabbing all revisions which are ancestors of G & H, but
are not ancestors of E, would include nodes 'D & F'. However, a lot of
the lines in D & F are actually common to E, because they come from the
ancestors A, B & C.
At the moment, I solve this by grabbing the lca *again* between D,F and
E. I cheat a bit, because I'm not willing to track down the rabbit whole
forever.

Though I've also come up with a realization. We actually could prune D &
F from the graph. It is the same argument about "we don't care *how*
things changed E => G, just *what* changed". It just happens that adding
common lines to a weave gets things confused, as we don't allow lines to
converge. (Note that the codeville weave merge always uses all lines
that have ever existed, so it *might* do better with them.)

If I was to implement this, I would walk the ancestors of G & H,
stopping when I reach an ancestor of E. At the same time, I would keep
track of the parent => child relationships. I would then walk the
limited subset of the children of E, and prune anything that is an
ancestor of G & H but not a child of E.

This would probably be a good solution, I'm just tired of walking graphs
and merging weaves.

Anyway, this restores a nice, fast weave merge algorithm that is able to
handle criss-cross merging properly. By fast, I mean the old --weave
would take >12 hours (nobody knows because they never waited long
enough), and now takes 3m23s for the same merge. By 'properly' I mean
that 3-way merge would introduce 300+ text conflicts, and weave merge
gives only 10.

I still have updates to bring to merging, wrt per-file graphs and
handling when the ancestry graph gets confused by criss-crossing.

I would really like to get this into 1.6, as ATM mysql is using a custom
build to get this sort of result, and I'd really like to get them back
onto official releases.

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkh32nQACgkQJdeBCYSNAAMkoQCcClsjMQZNhYVL33Dea6DXQp2e
H/YAoKW1U7Pf3w+b130K7Y01AduNEgaL
=U8P2
-----END PGP SIGNATURE-----
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: weave_merge.patch
Url: https://lists.ubuntu.com/archives/bazaar/attachments/20080711/f7a62b18/attachment-0001.diff