[MERGE][1.6] Real --weave merge
John Arbash Meinel
john at arbash-meinel.com
Fri Jul 11 23:11:00 BST 2008
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Attached is a patch which brings back a more accurate '--weave' merge.
The one we have is a bit closer to an 'annotation' based merge, but it
doesn't properly handle deleted lines. Also, it is *really* slow.
The attached patch basically builds up a weave "on the fly". The basic
idea is pretty simple, which I will outline. I'll also try to touch on
some of the specific details that clutter up the code a bit.
1) When we get a file to merge, build up a weave with the texts in the
ancestry. Sort of "pretend you've been using weaves all along and see
what you get".
2) We can do a bit better, though. With the realization that if there is
a simple ancestor, we only need the 3 texts.
eg:
~ A
~ / \
~ B C
~ | |\
~ D E F
~ | |/
~ G H
It doesn't matter how complicated the ancestry from A=>G is, or A=>H is.
If it isn't a common ancestor, then we can make do with just those 3 texts.
This is because the merge plan can only really say 'from base',
'killed-a', 'killed-b', 'new-a', 'new-b'. Which is all quantified by
collapsing A=>G and A=>H.
This is actually the same logic used for regular 3-way merging.
3) Of course, weave merge isn't very interesting unless you have
criss-cross merges, because otherwise 3-way works fine. However, the
same logic can be applied when there are multiple common ancestors. For
example:
~ A
~ / \
~ B C
~ |\ /|
~ | X |
~ |/ \|
~ D E
We sort of take the above example and apply it recursively.
If A is the LCA of B & C, then all revisions before A will be in the
ancestry of both (and thus uninteresting), and all the revisions between
A=>B will be unique to that side, and thus also uninteresting.
This applies again between B => D&E, and C => D&E.
So we can still do a proper weave merge, with only 5 nodes.
4) Things get complicated, though, when there are more than 2 LCAs. It
is also really hard to draw, but try imagining it:
~ A
~ |\
~ B \
~ / \ \
~ C D E
~ |\ /| |
~ | X | |
~ |/ \|/
~ F G
E also is merged into F.
The specific problem is that B is an LCA of C & D, but *not* an LCA of
E. However, if we don't include B, then the lines introduced there will
show up as 'new' in both C and D. Which means that F & G have to 'pick
one copy' and mark the other one as 'killed'. Which causes problems when
someone then changes one of those 'killed' lines.
I believe it would be possible to keep taking LCAs between the pair-wise
nodes until you got some sort of convergence. For now, when I encounter
this case, I just punt, and grab all ancestors that are ancestors of
C,D,E but are not ancestors of A.
4b) There is another problem with the 'grab all the ancestors' case.
~ A
~ / \
~ B C
~ |\ /|
~ D E F
~ |/ \|
~ G H
~ |\ /|
~ | X |
~ |/ \|
~ I J
In this case, grabbing all revisions which are ancestors of G & H, but
are not ancestors of E, would include nodes 'D & F'. However, a lot of
the lines in D & F are actually common to E, because they come from the
ancestors A, B & C.
At the moment, I solve this by grabbing the lca *again* between D,F and
E. I cheat a bit, because I'm not willing to track down the rabbit whole
forever.
Though I've also come up with a realization. We actually could prune D &
F from the graph. It is the same argument about "we don't care *how*
things changed E => G, just *what* changed". It just happens that adding
common lines to a weave gets things confused, as we don't allow lines to
converge. (Note that the codeville weave merge always uses all lines
that have ever existed, so it *might* do better with them.)
If I was to implement this, I would walk the ancestors of G & H,
stopping when I reach an ancestor of E. At the same time, I would keep
track of the parent => child relationships. I would then walk the
limited subset of the children of E, and prune anything that is an
ancestor of G & H but not a child of E.
This would probably be a good solution, I'm just tired of walking graphs
and merging weaves.
Anyway, this restores a nice, fast weave merge algorithm that is able to
handle criss-cross merging properly. By fast, I mean the old --weave
would take >12 hours (nobody knows because they never waited long
enough), and now takes 3m23s for the same merge. By 'properly' I mean
that 3-way merge would introduce 300+ text conflicts, and weave merge
gives only 10.
I still have updates to bring to merging, wrt per-file graphs and
handling when the ancestry graph gets confused by criss-crossing.
I would really like to get this into 1.6, as ATM mysql is using a custom
build to get this sort of result, and I'd really like to get them back
onto official releases.
John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iEYEARECAAYFAkh32nQACgkQJdeBCYSNAAMkoQCcClsjMQZNhYVL33Dea6DXQp2e
H/YAoKW1U7Pf3w+b130K7Y01AduNEgaL
=U8P2
-----END PGP SIGNATURE-----
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: weave_merge.patch
Url: https://lists.ubuntu.com/archives/bazaar/attachments/20080711/f7a62b18/attachment-0001.diff
More information about the bazaar
mailing list