Optimal Merge Base selection
Aaron Bentley
abentley at panoramicfeedback.com
Mon Jul 11 14:58:15 BST 2005
John A Meinel wrote:
> Martin Pool wrote:
>>This case works best when there is no overlap in the file names. (It
>>might be nice if there was a way to merge the root of OTHER into a
>>subdirectory of THIS.)
>
>
> Couldn't this be handled with my "TREE_ROOT gets a real id" suggestion?
Yes, but that breaks the case where you actually want them in the same
directory. Once we have file-id aliases, that would make sense.
> It actually puts the patches into a directory named something like
> "+patches-missing-files.23123"
> So it sort-of ignores them. I think it should print a warning, and put
> the patch somewhere, but not actually stop the merge/replay.
I think tla is somewhat broken here-- they should be treated as
conflicts, but they don't produce that status code and they don't halt
replay.
>>In arch, baz and bzr this will mean we record that replayed version as
>>merged, but in fact not all of the changes have been taken in. That
>>might cause trouble later.
>
>
> How so? How is that different from doing a merge, and then editing out
> the parts that you don't like?
One problem is you might select the just-committed revision as a base,
and therefore not be able to do proper three-way-merging on the missing
files.
> I realize that bzr makes a different definition of what merging a
> revision means, but it seems like at some point merges need to be edited
> to get them to fit into the local tree. So having 'missing' is not
> really different from having 'edited'.
Just in scale. No one ever recommends making massive changes to the
tree before committing a merge, just whatever's necessary to make the
merge right.
> Now if bzr incorporates weave/codeville merging, then you need to be a
> little bit more careful about cherry-picking. Because you start
> detecting that a future diff over-rules the previous diff, so you need
> to make sure that you know where the original came from, to see if any
> new ones over-rule it.
>
> So a different track... How does Codeville handle branching ancestry? If
> you annotate each line with a number indicating which revision it was
> modified, you can easily see that 10 > 9 thus 10 should take priority.
> But with a truly distributed setup, don't you have the problem that it
> isn't obvious whether john at arbash-meinel.com-200513123123-aontehuntaho
> comes before or after mdp at sourcefrog.net-20051423423-aoehunnth?
Yeah, in distributed RCSes, you can't trust time. All you can trust is
sequence. So if A is descended from B, you know their sequence. But
for two parallel branches, you can't know the sequence.
> Or is it that when you merge my changes, they get re-labeled with the
> revision number where they were merged, and you just don't worry that
> they came from me.
A good annotation algorithm would handle it thusly:
For revision C with parents A and B, see if the change was introduced
from A to C, or from B to C.
Compare with A | Compare with B | Really introduced in
---------------|----------------|---------------------
C              | C              | C
A              | C              | A
C              | B              | B
A              | B              | A & B
As you can see above, it's possible for A and B to both introduce the
same change. For instance, they may both have applied the same patch.
I'm not sure what the right way to handle that for merging purposes is.
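A rough sketch of the table's rule, assuming each parent already carries a per-line annotation (a set of revision ids) and using Python's difflib to decide which lines of C are unchanged from each parent. The function names and the sample data are invented for illustration; a real implementation would likely use a different diff and storage format.

```python
from difflib import SequenceMatcher


def inherited(parent_lines, parent_annotation, child_lines):
    """For each child line, return the parent's annotation if the line is
    unchanged from that parent, or None if it differs from that parent."""
    result = [None] * len(child_lines)
    matcher = SequenceMatcher(None, parent_lines, child_lines, autojunk=False)
    for i, j, size in matcher.get_matching_blocks():
        for k in range(size):
            result[j + k] = parent_annotation[i + k]
    return result


def annotate_merge(rev_id, lines, parents):
    """parents is a list of (parent_lines, parent_annotation) pairs.
    A line absent from every parent is credited to rev_id; otherwise it is
    credited to the union of what the matching parents say."""
    columns = [inherited(pl, pa, lines) for pl, pa in parents]
    annotation = []
    for index in range(len(lines)):
        origins = set()
        for column in columns:
            if column[index] is not None:
                origins |= column[index]
        annotation.append(frozenset(origins) if origins else frozenset([rev_id]))
    return annotation


# Hypothetical data: A and B both add "shared", as if applying the same patch.
a_lines = ["one", "two", "from-a", "shared"]
a_ann = [{"base"}, {"base"}, {"A"}, {"A"}]
b_lines = ["one", "two", "shared", "from-b"]
b_ann = [{"base"}, {"base"}, {"B"}, {"B"}]
c_lines = ["one", "two", "from-a", "shared", "from-b", "new-in-c"]

c_ann = annotate_merge("C", c_lines, [(a_lines, a_ann), (b_lines, b_ann)])
```

In this example the "shared" line ends up credited to both A and B, which is the last row of the table; how a merge should treat that case is still the open question.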
> I might be way too tangential and lost in my own thoughts. I can't say
> that I have spent a long time thinking about Codeville merging, other
> than the cursory, "looks kind of interesting".
Well, Codeville does a couple of neat things:
1. establishes the identity of each line in the file, so that you don't
need context to get merging right.
2. uses ancestry instead of a 'base' revision in order to determine
which changes supersede one another.
But of course, you can still get conflicts, and it's just a text-based
merge. Also, I doubt it handles any unit finer than 'per-line'. Though
I suppose you could use any separator you liked, e.g. whitespace, to do
annotate on finer levels.
One thing I think no one's mentioned about Codeville merge is, I don't
think you need the entire ancestry. I think you only need the
annotation up until the last common ancestor. (The rest can just be set
to 'UNKNOWNLASTCHANGER'.)
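A minimal sketch of that idea, assuming for simplicity a linear run of revisions descending from the last common ancestor (a real branch may contain merges of its own). The function name, the revision ids, and the use of difflib are assumptions for illustration only.

```python
from difflib import SequenceMatcher

UNKNOWN = "UNKNOWNLASTCHANGER"


def annotate_since(base_lines, history):
    """Annotate the newest revision in history, looking no further back
    than base_lines (the last common ancestor).

    history is a list of (rev_id, lines) pairs, oldest first.
    Lines that survive unchanged from the base are labelled UNKNOWN."""
    prev_lines = list(base_lines)
    prev_ann = [UNKNOWN] * len(prev_lines)
    for rev_id, lines in history:
        # Start by crediting every line to this revision, then copy the
        # older annotation onto the lines that did not change.
        ann = [rev_id] * len(lines)
        matcher = SequenceMatcher(None, prev_lines, lines, autojunk=False)
        for i, j, size in matcher.get_matching_blocks():
            ann[j:j + size] = prev_ann[i:i + size]
        prev_lines, prev_ann = list(lines), ann
    return prev_ann


# Hypothetical branch: r1 edits the second line, r2 appends a line.
base = ["a", "b", "c"]
history = [
    ("r1", ["a", "b2", "c"]),
    ("r2", ["a", "b2", "c", "d"]),
]
result = annotate_since(base, history)
```

Only the revisions after the common ancestor are ever diffed, so the cost depends on the length of the branch rather than on the whole history.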
If we only have to annotate a small number of revisions, we may not need
a weave-based format to do it speedily. It also means we can annotate
with different parameters (e.g. 'ignore line-ending differences', 'break
at all whitespace and at the characters ( ) and .'), and that we can
accept relatively wasteful annotation representations, since they're
only temporary.
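As an illustration of annotating with different parameters, here is a hypothetical tokenizer that could feed an annotator either whole lines or finer units. The `tokenize` function and its `mode` and `ignore_eol` parameters are invented for this sketch and do not correspond to anything in bzr or Codeville.

```python
import re


def tokenize(text, mode="line", ignore_eol=False):
    """Split text into the units an annotator would track.

    mode="line": one token per line, line endings kept.
    mode="word": break at whitespace runs and at the characters ( ) and . ,
                 keeping the separators as tokens of their own.
    ignore_eol:  normalise CRLF and CR to LF before splitting."""
    if ignore_eol:
        text = text.replace("\r\n", "\n").replace("\r", "\n")
    if mode == "line":
        return text.splitlines(keepends=True)
    return [token for token in re.split(r"(\s+|[().])", text) if token]


line_tokens = tokenize("a\r\nb\n", ignore_eol=True)
word_tokens = tokenize("foo.bar(x) y", mode="word")
```

Because the separators are kept as tokens, joining the tokens gives back the input text, so a finer-grained annotation can still be mapped back onto the file.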
Aaron
--
Aaron Bentley
www.aaronbentley.com