Optimal Merge Base selection

Mon Jul 11 14:58:15 BST 2005

John A Meinel wrote:
> Martin Pool wrote:

>>This case works best when there is no overlap in the file names.  (It
>>might be nice if there was a way to merge the root of OTHER into a
>>subdirectory of THIS.)
> 
> 
> Couldn't this be handled with my TREE_ROOT gets a real id suggestion?

Yes, but that breaks the case where you actually want them in the same
directory.  Once we have file-id aliases, that would make sense.

> It actually puts the patches into a directory named something like
> "+patches-missing-files.23123"
> So it sort-of ignores them. I think it should print a warning, and put
> the patch somewhere, but not actually stop the merge/replay.

I think tla is somewhat broken here: patches against missing files
should be treated as conflicts, but they don't produce that status code
and they don't halt replay.

>>In arch, baz and bzr this will mean we record that replayed version as
>>merged, but in fact not all of the changes have been taken in.  That
>>might cause trouble later.
> 
> 
> How so? How is that different from doing a merge, and then editing out
> the parts that you don't like?

One problem is you might select the just-committed revision as a base,
and therefore not be able to do proper three-way-merging on the missing
files.

> I realize that bzr makes a different definition of what merging a
> revision means, but it seems like at some point merges need to be edited
> to get them to fit into the local tree. So having 'missing' is not
> really different from having 'edited'.

Just in scale.  No one ever recommends making massive changes to the
tree before committing a merge, just whatever's necessary to make the
merge right.

> Now if bzr incorporates weave/codeville merging, then you need to be a
> little bit more careful about cherry-picking. Because you start
> detecting that a future diff over-rules the previous diff, so you need
> to make sure that you know where the original came from, to see if any
> new ones over-rule it.
> 
> So a different track... How does Codeville handle branching ancestry? If
> you annotate each line with a number indicating which revision it was
> modified, you can easily see that 10 > 9 thus 10 should take priority.
> But with a truly distributed setup, don't you have the problem that it
> isn't obvious whether john at arbash-meinel.com-200513123123-aontehuntaho
> comes before or after mdp at sourcefrog.net-20051423423-aoehunnth?

Yeah, in distributed RCSes, you can't trust time.  All you can trust is
sequence.  So if A is descended from B, you know their sequence.  But
for two parallel branches, you can't know the sequence.
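
To make the "sequence, not time" point concrete, here is a minimal
sketch. It assumes the ancestry is available as a plain dict mapping
each revision id to a list of its parent ids; that layout and the
function names are made up for illustration, not how bzr stores things.

```python
def ancestors(graph, rev):
    """Return the set of all ancestors of rev, not including rev itself."""
    seen = set()
    stack = list(graph.get(rev, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(graph.get(current, ()))
    return seen


def sequence(graph, x, y):
    """Say how x relates to y using ancestry only, never timestamps."""
    if x == y:
        return "same"
    if x in ancestors(graph, y):
        return "before"
    if y in ancestors(graph, x):
        return "after"
    # Neither is descended from the other: parallel branches.
    return "unordered"


# Hypothetical history: two branches off "base", later merged.
graph = {
    "base": [],
    "john-1": ["base"],
    "mdp-1": ["base"],
    "merge": ["john-1", "mdp-1"],
}
```

With this graph, "base" comes before "merge", but "john-1" and "mdp-1"
have no defined order, whatever their timestamps say.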

> Or is it that when you merge my changes, they get re-labeled with the
> revision number where they were merged, and you just don't worry that
> they came from me.

A good annotation algorithm would handle it like this: for revision C
with parents A and B, compare C with each parent to see whether the
change was introduced from A to C, or from B to C.

Compare with A | Compare with B | Really introduced in
-------------------------------------------------------
C              | C              | C
A              | C              | A
C              | B              | B
A              | B              | A & B

As you can see above, it's possible for A and B to both introduce the
same change.  For instance, they may both have applied the same patch.
I'm not sure what the right way to handle that for merging purposes is.
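
The table can be sketched in a few lines of Python. This is only an
illustration: it uses difflib from the standard library as a stand-in
for whatever diff the real annotate would use, it treats each text as a
list of lines, and the function names are invented. "A" here means "the
line was already present in A", which may in turn trace back further.

```python
import difflib


def unchanged_lines(parent, child):
    """Indices of lines in child that the diff matches to a line in parent."""
    matcher = difflib.SequenceMatcher(None, parent, child, autojunk=False)
    kept = set()
    for block in matcher.get_matching_blocks():
        kept.update(range(block.b, block.b + block.size))
    return kept


def annotate_merge(a, b, c):
    """Label each line of merge result c with where it really came from."""
    from_a = unchanged_lines(a, c)
    from_b = unchanged_lines(b, c)
    result = []
    for index, line in enumerate(c):
        if index in from_a and index in from_b:
            origin = "A & B"   # both parents already had this line
        elif index in from_a:
            origin = "A"       # unchanged from A, new relative to B
        elif index in from_b:
            origin = "B"       # unchanged from B, new relative to A
        else:
            origin = "C"       # new relative to both parents
        result.append((origin, line))
    return result


# Made-up example texts.
a = ["shared", "only a"]
b = ["shared", "only b"]
c = ["shared", "only a", "only b", "new in c"]
annotated = annotate_merge(a, b, c)
```

The "A & B" row falls out naturally: it is just the case where both
comparisons report the line as unchanged.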

> I might be way too tangential and lost in my own thoughts. I can't say
> that I have spent a long time thinking about Codeville merging, other
> than the cursory, "looks kind of interesting".

Well, Codeville does a couple of neat things:
1. establishes the identity of each line in the file, so that you don't
need context to get merging right.
2. uses ancestry instead of a 'base' revision in order to determine
which changes supersede one another.

But of course, you can still get conflicts, and it's just a text-based
merge.  Also, I doubt it handles any unit finer than 'per-line'.  Though
I suppose you could use any separator you liked, e.g. whitespace, to do
annotate on finer levels.
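
As a rough sketch of annotating on units finer than a line, the text
could be split at whitespace before it is handed to the annotator. The
function name is hypothetical; the only requirement assumed here is
that joining the units gives back the original text.

```python
import re


def split_units(text):
    """Split text into runs of whitespace and runs of non-whitespace.

    Whitespace runs are kept as units of their own, so that
    "".join(split_units(text)) == text.
    """
    return re.findall(r"\s+|\S+", text)


# Made-up sample input.
sample = "foo(bar)  baz\n"
units = split_units(sample)
```

Each unit would then be annotated the same way a line is annotated
today.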

One thing I think no one's mentioned about Codeville merge is, I don't
think you need the entire ancestry.  I think you only need the
annotation up until the last common ancestor.  (The rest can just be set
to 'UNKNOWNLASTCHANGER'.)
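
Roughly, and only as a sketch of the idea rather than of how Codeville
does it: assume the ancestry is a dict from revision id to a list of
parent ids, and treat every revision that is common to both sides as
not worth annotating. With criss-cross merges there may be no single
"last" common ancestor, so this sketch sidesteps that by using the
whole common ancestry. All names are made up.

```python
UNKNOWN = "UNKNOWNLASTCHANGER"


def ancestry(graph, rev):
    """Return rev plus all of its ancestors."""
    seen = set()
    stack = [rev]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(graph.get(current, ()))
    return seen


def revisions_to_annotate(graph, this, other):
    """Revisions on either side that are not shared by both sides."""
    this_side = ancestry(graph, this)
    other_side = ancestry(graph, other)
    return (this_side | other_side) - (this_side & other_side)


def label(rev, interesting):
    """Keep the real revision id only where it can affect the merge."""
    return rev if rev in interesting else UNKNOWN


# Hypothetical history: r1 <- r2, then two branches off r2.
graph = {
    "r1": [],
    "r2": ["r1"],
    "this-1": ["r2"],
    "this-2": ["this-1"],
    "other-1": ["r2"],
}
interesting = revisions_to_annotate(graph, "this-2", "other-1")
```

Lines last changed in r1 or r2 would all get the same placeholder,
since both sides agree on them anyway.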

If we only have to annotate a small number of revisions, we may not need
a weave-based format to do it speedily.  It also means we can annotate
with different parameters (e.g. 'ignore line-ending differences', or
'break at all whitespace and at the characters "()."'), and that we can
accept relatively wasteful annotation representations, since they're
only temporary.
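
For example, an 'ignore line-ending differences' parameter could be
sketched as a comparison key applied before matching. This again uses
difflib from the standard library as a stand-in for the real diff, and
the function names and the key parameter are assumptions of mine.

```python
import difflib


def normalize_eol(line):
    """Comparison key that ignores line-ending differences."""
    return line.rstrip("\r\n")


def unchanged_indices(parent, child, key=lambda line: line):
    """Indices of child lines matched to parent lines, compared via key."""
    matcher = difflib.SequenceMatcher(
        None,
        [key(line) for line in parent],
        [key(line) for line in child],
        autojunk=False,
    )
    kept = set()
    for block in matcher.get_matching_blocks():
        kept.update(range(block.b, block.b + block.size))
    return kept


# Made-up example: the same text with CRLF and LF endings.
parent = ["one\r\n", "two\r\n"]
child = ["one\n", "two\n", "three\n"]
strict = unchanged_indices(parent, child)
relaxed = unchanged_indices(parent, child, key=normalize_eol)
```

Since the annotation is thrown away after the merge, building the
normalized copies of each text is an acceptable cost.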

Aaron
-- 
Aaron Bentley
www.aaronbentley.com