Merge algorithms

Fri Sep 10 01:58:41 BST 2010

On 30 August 2010 17:13, Eli Zaretskii <eliz at gnu.org> wrote:
> The documentation of "bzr merge" and "bzr remerge" mentions several
> different algorithms accepted by the --merge-type option, but tell
> almost nothing about them, nor says which one is the default.  All I
> could find about this is (1) a bit of background in the "Criss-Cross"
> section of the User Reference, and (2) a programmer-oriented
> description of LCA merge under Developer Docs/Specifications.

Hi there,

That's a really good question, so I kind of stalled for a while
meaning to write something in to the docs about it.  Sorry.  Here's at
least a brief answer:

There is some existing documentation here, that's actually reasonably
readable but not necessarily very findable:

http://doc.bazaar.canonical.com/bzr.dev/developers/lca-merge.html
http://doc.bazaar.canonical.com/bzr.dev/developers/lca_tree_merging.html

The highest phylogenic difference is between annotation-based merge
and 3-way merge.

Weave merge does essentially an 'annotate' on the two files it's
trying to merge, then meshes together the (origin, content) vectors.
Lines that are common to both are of course common and go straight
through to the output.  Lines that exist on only one side, and whose
origin is a revision new on that side are new.  Conversely, lines that
exist on only one side and whose origin is in the common ancestry must
have been deleted by the other side.  If there are regions between
common lines where there are a mix of changes on both sides, they're
marked as conflicts.

lca stands for "latest common ancestor".  If there have been
criss-cross merges (A merges B who later merges back from A, etc)
there may be any number of lcas.  lca merge has a similar external
behaviour to weave merge, but is faster, because it doesn't fully
annotate the files but only determines whether lines were present in
the lca.  The doc there warns of a bug in lca determination which I
think is now fixed, so this should be safe to use everywhere and
perhaps should become the default.

The other two options are a three-way merge, done using internal
logical or an external diff3.  Both of these have to choose a single
ancestor to use as the base and so tend to do less well when there's
no single clear ancestor.  They have the advantage that it's fairly
easy to understand ("this is what's common and these are the two
sides"), that the basis actually existed at some point in time, and
that there are external tools and guis that do 3-way merges.
(Actually many guis can do N-way merges, and perhaps they could hook
in to lca merge....)

> Is there a user-level documentation of the available merge methods?
> If not, could someone please post that here at least?  What I'm
> looking for is (a) a short description of what each method does,
> (b) situations for which each method is suited best, (c) performance.
> Also, which of the methods is the default.
>
> Apologies if this is in the docs, but I overlooked it.

merge3 merger is the default, and is fast.  lca is probably the best
choice if you get a warning about criss-cross merges, or if you want a
bit more power at resolving conflicts, and it should be only
incrementally slower.

Hope that helps; please ask if anything more should be added and I'll
put it into the docs.

-- 
Martin