[PATCH] more work on merge

Mon Jul 11 16:26:10 BST 2005

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

John A Meinel wrote:
> Robert Collins wrote:
>>A better one (but not one cheaply accessible to baz today - tho it may
>>be cheap for bzr today) might be lines modified + new files + deleted
>>files.

Ultimately, the criterion for a good merge base is "whatever produces the
fewest conflicts".  A good merge base will
- not have any lines that differ from both THIS and OTHER where THIS and
  OTHER also differ from each other
- contain every file that differs between THIS and OTHER
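The two criteria above can be checked mechanically. A minimal sketch, assuming a hypothetical tree model (a dict of path to list of lines) and, to avoid needing a real diff, that the three versions of each file are line-aligned. Files present in only one of THIS/OTHER are ignored for brevity:

```python
from typing import Dict, Optional, Sequence

# Hypothetical tree model: path -> file contents as a sequence of lines.
# Simplifying assumption: the versions of a file are line-aligned (line i
# corresponds to line i), so no diff alignment is needed.
Tree = Dict[str, Sequence[str]]


def line_conflicts(base: Optional[Sequence[str]],
                   this: Sequence[str],
                   other: Sequence[str]) -> int:
    """Count lines where THIS and OTHER differ and BASE matches neither."""
    if base is None:
        # The base lacks the file, so every line on which THIS and OTHER
        # disagree cannot be resolved against it.
        return sum(1 for t, o in zip(this, other) if t != o)
    return sum(1 for b, t, o in zip(base, this, other)
               if t != o and b != t and b != o)


def conflicts(base: Tree, this: Tree, other: Tree) -> int:
    """Total conflicting lines for a candidate BASE across shared files."""
    total = 0
    for path in this.keys() & other.keys():
        if this[path] != other[path]:
            total += line_conflicts(base.get(path), this[path], other[path])
    return total
```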

There are really only a few good candidates for merge bases anyhow, as
any merge candidate that is an ancestor of another merge candidate is
unlikely to be useful.  (This is a heuristic.  It assumes people don't
often revert to a previous state while their branches also revert to
that same state without merging.)
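Pruning candidates that are ancestors of other candidates is a simple graph walk. A sketch, assuming a hypothetical `graph` dict mapping each revision id to its parent ids (not bzr's actual API):

```python
from typing import Dict, Iterable, List, Set

# Hypothetical revision graph: revision id -> list of parent revision ids.
Graph = Dict[str, List[str]]


def ancestors(graph: Graph, rev: str) -> Set[str]:
    """Return all proper ancestors of rev (rev itself is not included)."""
    seen: Set[str] = set()
    stack = list(graph.get(rev, []))
    while stack:
        r = stack.pop()
        if r not in seen:
            seen.add(r)
            stack.extend(graph.get(r, []))
    return seen


def prune_candidates(graph: Graph, candidates: Iterable[str]) -> List[str]:
    """Drop any candidate that is an ancestor of another candidate."""
    cands = list(candidates)
    dominated: Set[str] = set()
    for c in cands:
        dominated |= ancestors(graph, c)
    return [c for c in cands if c not in dominated]
```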

So it might be worthwhile to try 3-way merging all good candidates.  We
aren't usually talking about huge numbers of files anyway.
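Trial-merging every surviving candidate and keeping the one with the fewest conflicts could look like the following. This is a sketch: `best_base` and `file_conflicts` are hypothetical names, and `file_conflicts` is a deliberately coarse whole-file 3-way merge; a line-level counter could be plugged in through the `count` parameter instead.

```python
from typing import Callable, Dict, Iterable, Optional

# Hypothetical tree model: path -> full file text.
Tree = Dict[str, str]
ConflictCounter = Callable[[Tree, Tree, Tree], int]


def file_conflicts(base: Tree, this: Tree, other: Tree) -> int:
    """Coarse 3-way merge: a file conflicts when THIS and OTHER disagree
    and BASE matches neither (a missing file is represented as None)."""
    count = 0
    for path in this.keys() | other.keys():
        b, t, o = base.get(path), this.get(path), other.get(path)
        if t != o and b != t and b != o:
            count += 1
    return count


def best_base(candidates: Iterable[str], trees: Dict[str, Tree],
              this: Tree, other: Tree,
              count: ConflictCounter = file_conflicts) -> Optional[str]:
    """Trial-merge against every candidate; return the one with the fewest
    conflicts (the first one wins on ties), or None if there are none."""
    cands = list(candidates)
    if not cands:
        return None
    return min(cands, key=lambda c: count(trees[c], this, other))
```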

>>This neatly fixes the history shortcut problem. (The late merge produces
>>a very large LOC change from the merge source to the branch, making that
>>node an expensive one to take - though it may well be taken once the
>>mainlines' overlapping changes to single lines/files start to dominate).

Wouldn't another way of solving the history shortcut problem be to
always record the longest path from CANDIDATE to THIS/OTHER, and then
take the candidate with the shortest longest path?
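One way to read the longest-path proposal: compute, for each candidate, the longest ancestry chain back from THIS and from OTHER, score the candidate by the larger of the two (an assumption; summing them is another option), and take the lowest score. A sketch over a hypothetical parent map; the recursion is fine for small examples but would need to be made iterative for long histories:

```python
from typing import Dict, Iterable, List, Optional

# Hypothetical revision graph: revision id -> list of parent revision ids.
Graph = Dict[str, List[str]]


def longest_path(graph: Graph, ancestor: str, descendant: str) -> Optional[int]:
    """Length of the longest parent chain from descendant back to ancestor,
    or None if ancestor cannot be reached from descendant."""
    memo: Dict[str, Optional[int]] = {}

    def walk(rev: str) -> Optional[int]:
        if rev == ancestor:
            return 0
        if rev in memo:
            return memo[rev]
        best: Optional[int] = None
        for parent in graph.get(rev, []):
            d = walk(parent)
            if d is not None and (best is None or d + 1 > best):
                best = d + 1
        memo[rev] = best
        return best

    return walk(descendant)


def shortest_longest_path(graph: Graph, candidates: Iterable[str],
                          this: str, other: str) -> Optional[str]:
    """Pick the candidate whose longer distance to THIS/OTHER is smallest."""
    scores: Dict[str, int] = {}
    for cand in candidates:
        to_this = longest_path(graph, cand, this)
        to_other = longest_path(graph, cand, other)
        if to_this is not None and to_other is not None:
            scores[cand] = max(to_this, to_other)  # assumed combination rule
    return min(scores, key=scores.__getitem__) if scores else None
```

Adding an edge to a DAG can never shorten a longest path, so a late merge that creates a short route from an old revision leaves that revision's score unchanged; that is what makes this measure resistant to the shortcut.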

> There are a couple of issues, though. Specifically, now you need the
> inventory as well as the revision XML, because otherwise you can't tell
> what has changed. This isn't a big issue, but the currently proposed
> cset format only supplies the revision XML.

Oh, but you can generate the inventory from the changeset and its base,
right?  Or just install the changeset's revision first?
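Deriving the inventory from a changeset and its base amounts to replaying the changeset's adds, deletes and modifications onto the base inventory. A minimal sketch; the `Inventory` and `Changeset` shapes here are hypothetical simplifications (paths, renames and directories are ignored), not bzr's real classes:

```python
from dataclasses import dataclass, field
from typing import Dict, Set

# Hypothetical inventory model: file id -> text id (paths omitted).
Inventory = Dict[str, str]


@dataclass
class Changeset:
    added: Dict[str, str] = field(default_factory=dict)     # file id -> text id
    removed: Set[str] = field(default_factory=set)          # file ids
    modified: Dict[str, str] = field(default_factory=dict)  # file id -> new text id


def apply_changeset(base: Inventory, cset: Changeset) -> Inventory:
    """Build the post-changeset inventory without mutating the base."""
    inv = {fid: tid for fid, tid in base.items() if fid not in cset.removed}
    inv.update(cset.modified)
    inv.update(cset.added)
    return inv
```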

> I think it is still cheaper for bzr to use "num_modified + .5*num_new +
> .5*num_deleted", because bzr would have to extract the actual texts and
> compare them. (Of course, as weave or revlib starts to become more
> common, you might only have to extract the delta, rather than get 2
> texts and compute a diff).
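John's inventory-only heuristic needs nothing beyond two inventories. A sketch, assuming each inventory is a hypothetical dict of file id to text id, so a changed text id marks a modified file:

```python
from typing import Dict

# Hypothetical inventory model: file id -> text id.
Inventory = Dict[str, str]


def change_cost(old: Inventory, new: Inventory) -> float:
    """num_modified + .5*num_new + .5*num_deleted between two inventories."""
    common = old.keys() & new.keys()
    num_modified = sum(1 for fid in common if old[fid] != new[fid])
    num_new = len(new.keys() - old.keys())
    num_deleted = len(old.keys() - new.keys())
    return num_modified + 0.5 * num_new + 0.5 * num_deleted
```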

Extracting texts from the current storage format doesn't seem very
expensive to me.  We're only talking about modified files, and heck, we
can optimize by only grabbing one copy of each text.
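Robert's metric (lines modified + new files + deleted files) can be computed with text extraction cached by text id, so each text is fetched at most once. A sketch: `fetch` stands in for whatever reads a text from storage, the inventory model is the same hypothetical file id to text id dict, and counting a replaced hunk as the larger of its two sides is an assumed definition of "lines modified":

```python
import difflib
from typing import Callable, Dict, List

# Hypothetical inventory model: file id -> text id.
Inventory = Dict[str, str]
Fetch = Callable[[str], List[str]]


def make_cached_fetch(fetch: Fetch) -> Fetch:
    """Wrap a storage reader so each text id is fetched at most once."""
    cache: Dict[str, List[str]] = {}

    def cached(text_id: str) -> List[str]:
        if text_id not in cache:
            cache[text_id] = fetch(text_id)
        return cache[text_id]

    return cached


def lines_modified(old: List[str], new: List[str]) -> int:
    """Count changed lines; a replaced hunk counts as its larger side."""
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    total = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            total += max(i2 - i1, j2 - j1)
    return total


def loc_cost(old: Inventory, new: Inventory, fetch: Fetch) -> int:
    """Lines modified + new files + deleted files between two inventories."""
    modified = sum(lines_modified(fetch(old[fid]), fetch(new[fid]))
                   for fid in old.keys() & new.keys()
                   if old[fid] != new[fid])  # unchanged texts are never fetched
    num_new = len(new.keys() - old.keys())
    num_deleted = len(old.keys() - new.keys())
    return modified + num_new + num_deleted
```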

Aaron
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFC0o+S0F+nu1YWqI0RAgXJAKCA0ccEAM7YpRv3TOnI09dXn61P3wCfesxi
kFYM2WNb8v4AIEHNvjhn1hM=
=7WQr
-----END PGP SIGNATURE-----