Optimal Merge Base selection

Mon Jul 11 03:53:06 BST 2005

On 10 Jul 2005, John A Meinel <john at arbash-meinel.com> wrote:

> > This case works best when there is no overlap in the file names.  (It
> > might be nice if there was a way to merge the root of OTHER into a
> > subdirectory of THIS.)
> 
> Couldn't this be handled with my TREE_ROOT gets a real id suggestion?

Yes, that's a good way to handle it.  I should merge that.

> Well, I believe everything in arch is done by id. So that if the ids are
> not the same, it is not the same file, regardless of the path. (The one
> exception that I know of is the path that the file will be put in, but
> that was declared a bug).
> 
> It actually puts the patches into a directory named something like
> "+patches-missing-files.23123"
> So it sort-of ignores them. I think it should print a warning, and put
> the patch somewhere, but not actually stop the merge/replay.

OK.

> > In arch, baz and bzr this will mean we record that replayed version as
> > merged, but in fact not all of the changes have been taken in.  That
> > might cause trouble later.
> 
> How so? How is that different from doing a merge, and then editing out
> the parts that you don't like?

Yes, it's similar to that.  It comes back to this question of "not
merged" vs "merged and rejected", and whether there is a distinction or
not.  Wanting to merge only the changes that affect a particular
subdirectory or file is one case of it.  Perhaps it is not worth
worrying too much about.

> I realize that bzr makes a different definition of what merging a
> revision means, but it seems like at some point merges need to be edited
> to get them to fit into the local tree. So having 'missing' is not
> really different from having 'edited'.

> Well, Aaron defined that the way to do the merge is "bzr merge M L",
> basically, take the difference between the M snapshot and the L
> snapshot, and merge that into my local tree.
> 
> For starters, I don't have any problem with bzr not handling
> cherry-picking. As long as I *can* cherry-pick, it doesn't have to mess
> up merging. Bzr works in snapshots, not differences anyway, so probably
> a lot of cherry-picks will show up as a similar diff.
> Now if bzr incorporates weave/codeville merging, then you need to be a
> little bit more careful about cherry-picking. Because you start
> detecting that a future diff over-rules the previous diff, so you need
> to make sure that you know where the original came from, to see if any
> new ones over-rule it.
> 
> So a different track... How does Codeville handle branching ancestry? If
> you annotate each line with a number indicating which revision it was
> modified, you can easily see that 10 > 9 thus 10 should take priority.
> But with a truly distributed setup, don't you have the problem that it
> isn't obvious whether john at arbash-meinel.com-200513123123-aontehuntaho
> comes before or after mdp at sourcefrog.net-20051423423-aoehunnth?

bzrlib/weave.py now has a reasonable command-line interface that will
let you experiment with an algorithm that I think is equivalent to
Codeville merge.

The basic idea is that we remember the parents of each text revision.
We use that to compose a parent revision that contains the changes from
all the common parents, even if that composed parent (or 'mash') is not
a file that ever existed, or is not even syntactically valid.

weave.py uses version numbers as a shortcut that's only valid within a
single local weave file (and that should probably never be shown to the
user).

In my code, at least, I don't rely on numeric ordering; I just use set
logic on the recorded parents to decide whether revision
john at arbash-meinel.com-200513123123-aontehuntaho includes
mdp at sourcefrog.net-20051423423-aoehunnth or vice versa.
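
A minimal sketch of that set logic, in Python.  The parent map, the
short revision ids and the helper names (ancestors, includes) are made
up for illustration; they are not weave.py's actual data structures.

```python
# Hypothetical parent map: revision id -> ids of its direct parents.
# The ids and the shape of the graph are invented for this example.
PARENTS = {
    "base": [],
    "mdp-1": ["base"],
    "john-1": ["base"],
    "mdp-2": ["mdp-1", "john-1"],  # mdp merged john-1
}


def ancestors(rev, parents=PARENTS):
    """Return the set of revisions included in rev (rev itself counts)."""
    seen = set()
    todo = [rev]
    while todo:
        r = todo.pop()
        if r not in seen:
            seen.add(r)
            todo.extend(parents[r])
    return seen


def includes(a, b, parents=PARENTS):
    """True if revision a already contains everything in revision b."""
    return b in ancestors(a, parents)
```

Here includes("mdp-2", "john-1") is true, while neither of "mdp-1" and
"john-1" includes the other.  Two such revisions are simply unordered,
which is why comparing numbers or timestamps is not needed.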

> Or is it that when you merge my changes, they get re-labeled with the
> revision number where they were merged, and you just don't worry that
> they came from me. 

No, we do try to remember the true origin.  Of course, since this relies
on identifying common lines between files, it's really only a heuristic,
and it will be confused by e.g. whitespace changes or reorderings.
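
To show how fragile exact line matching can be, here is a small sketch
that uses Python's standard difflib rather than weave.py's own matcher,
which may behave differently.  The helper name unmatched_lines is
hypothetical.

```python
import difflib


def unmatched_lines(old, new):
    """Return the lines of new that have no exact match in old."""
    matcher = difflib.SequenceMatcher(None, old, new)
    matched = set()
    # Each block says new[b:b+size] is identical to old[a:a+size].
    for _a, b, size in matcher.get_matching_blocks():
        matched.update(range(b, b + size))
    return [line for i, line in enumerate(new) if i not in matched]


old = ["aaa", "bbb", "ccc"]
reindented = ["aaa", "    bbb", "ccc"]
print(unmatched_lines(old, reindented))
```

The re-indented line comes back as unmatched, so a scheme like this
would attribute it to the new revision even though only whitespace
changed.  Moving "ccc" to the top of the file has the same effect on
that line.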

> But then if you merge from me again, how do you
> detect that the change you made on line 10 supersedes my change on
> line 10?

Maybe this example will help:

% weave init test.weave
% weave add test.weave
aaa
bbb
ccc
added version 0
% weave add test.weave 0
aaa
bbb
stuff from martin
ccc
ddd
added version 1
% weave add test.weave 0
aaa
bbb
stuff from john
more john stuff
ccc
added version 2
% weave merge test.weave 1 2
aaa
bbb
<<<<<<<< version 1
stuff from martin
========
stuff from john
more john stuff
>>>>>>>> version 2
ccc
ddd
% weave add test.weave 1 2
aaa
bbb
stuff from martin
fix up merge
more john stuff
ccc
ddd
added version 3
% weave annotate test.weave 3
    0 | aaa
      | bbb
    1 | stuff from martin
    3 | fix up merge
    2 | more john stuff
    0 | ccc
    1 | ddd
% weave add test.weave 3
aaa
bbb
stuff from martin
fix up merge
modify john's code
ccc
ddd
add stuff here
added version 4
% weave annotate test.weave 4
    0 | aaa
      | bbb
    1 | stuff from martin
    3 | fix up merge
    4 | modify john's code
    0 | ccc
    1 | ddd
    4 | add stuff here
% weave merge test.weave 4 2
aaa
bbb
stuff from martin
fix up merge
modify john's code
ccc
ddd
add stuff here
% diff -u <(weave merge test.weave 4 2 ) <(weave get test.weave 4)
%

John's revision 2 is known to be completely merged into 4, so merging
it again doesn't generate any conflicts.  (Obviously in a real system we
would use universal ids, not simple integers.)  John now makes a new
revision 5 based on his last one, 2:

% weave add test.weave 2
aaa
bbb
stuff from john
more john stuff
john replaced ccc line
added version 5
% weave annotate test.weave 5
    0 | aaa
      | bbb
    2 | stuff from john
      | more john stuff
    5 | john replaced ccc line
% weave merge test.weave 4 5
aaa
bbb
<<<<<<<< version 4
stuff from martin
fix up merge
modify john's code
ccc
ddd
add stuff here
========
stuff from john
more john stuff
john replaced ccc line
>>>>>>>> version 5

The main problem with this is that the weave file for the inventory will
get rather large if it's updated for every revision; thus the idea of
either storing the weave in a smarter format or building it in memory
when we merge.
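
As a rough illustration of the build-it-in-memory option, line origins
can be re-derived from full texts plus parent pointers instead of being
stored.  The sketch below does that for versions 0 to 3 of the session
above, using Python's difflib as a stand-in matcher.  It is not how
weave.py works internally, and applying it to inventories rather than
file texts is an assumption on my part.

```python
import difflib

# Full texts and parent pointers for versions 0-3 of the session above.
TEXTS = {
    0: ["aaa", "bbb", "ccc"],
    1: ["aaa", "bbb", "stuff from martin", "ccc", "ddd"],
    2: ["aaa", "bbb", "stuff from john", "more john stuff", "ccc"],
    3: ["aaa", "bbb", "stuff from martin", "fix up merge",
        "more john stuff", "ccc", "ddd"],
}
PARENTS = {0: [], 1: [0], 2: [0], 3: [1, 2]}


def annotate(version):
    """Return [(origin, line)] for version by diffing against its parents."""
    lines = TEXTS[version]
    origins = [version] * len(lines)  # default: the line is new here
    claimed = [False] * len(lines)
    for parent in PARENTS[version]:
        parent_annotated = annotate(parent)
        parent_lines = [line for _origin, line in parent_annotated]
        matcher = difflib.SequenceMatcher(None, parent_lines, lines)
        for a, b, size in matcher.get_matching_blocks():
            for k in range(size):
                # The first parent that matches a line supplies its origin.
                if not claimed[b + k]:
                    origins[b + k] = parent_annotated[a + k][0]
                    claimed[b + k] = True
    return list(zip(origins, lines))


for origin, line in annotate(3):
    print(f"{origin:5d} | {line}")
```

For version 3 this gives the same origins as the annotate listing in
the session (0, 0, 1, 3, 2, 0, 1), except that it prints the origin on
every line.  The cost is re-diffing the whole ancestry each time, which
is the trade-off against keeping a large weave on disk.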

-- 
Martin