[PATCH] more work on merge

Mon Jul 11 15:24:29 BST 2005

Robert Collins wrote:
> On Sun, 2005-07-10 at 12:13 -0500, John A Meinel wrote:

...

> In a similar theme to my other mail: what makes minimum-path-to-a-node
> a better cost than longest-path-to-a-node?
>
> Funny thing about writing emails, it gets one thinking.
>
> Here's a proposal to determine the cost of a path traversal: number of
> changes. This allows full SPF to be used rather than breadth-first, and
> will give much better results IMO.
>
> Number of changes could be estimated by Modified-Files + 1/2 New files +
> 1/2 deleted files. This is absolutely crude, but would favour commits
> that only add or delete files over ones that modified files.
>
> A better one (but not one cheaply accessible to baz today, though it may
> be cheap for bzr today) might be lines modified + new files + deleted
> files.
>
> This neatly fixes the history shortcut problem. (The late merge produces
> a very large LOC change from the merge source to the branch, making that
> node an expensive one to take, though it may well be taken once the
> mainline's overlapping changes to single lines/files start to dominate.)

I agree. Costing the paths by some factor, such as the count of
modifications you mention, would be worth trying.

There are a couple of issues, though. Specifically, now you need the
inventory as well as the revision XML, because otherwise you can't tell
what has changed. This isn't a big issue, but the currently proposed
cset format only supplies the revision XML. I suppose we could decide on
some sort of cost for entries without an inventory (perhaps just make it
really, really high, since you can't use them as a merge base anyway,
though they might make their parent more attractive).

I think it is still cheaper for bzr to use "num_modified + .5*num_new +
.5*num_deleted" than a line count, because to count modified lines bzr
would have to extract the actual texts and compare them. (Of course, as
weave or revlib starts to become more common, you might only have to
extract the delta, rather than get 2 texts and compute a diff.)
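
As a rough sketch of that file-count estimate, assuming an inventory can
be reduced to a plain dict mapping file id to some text identifier (this
is not bzr's actual inventory API; the function name and the penalty
value are made up for illustration):

```python
# Hypothetical sketch: an "inventory" here is just a dict of
# file_id -> text identifier, or None when no inventory is available.
# MISSING_INVENTORY_COST is an arbitrary "really, really high" value.
MISSING_INVENTORY_COST = 1000000.0


def estimate_change_cost(parent_inv, child_inv):
    """Estimate the cost of the edge between two revisions.

    cost = num_modified + 0.5 * num_new + 0.5 * num_deleted
    """
    if parent_inv is None or child_inv is None:
        # Without an inventory we cannot tell what changed.
        return MISSING_INVENTORY_COST
    parent_ids = set(parent_inv)
    child_ids = set(child_inv)
    num_new = len(child_ids - parent_ids)
    num_deleted = len(parent_ids - child_ids)
    # A file counts as modified if it exists on both sides but its
    # text identifier differs.
    num_modified = sum(
        1 for fid in parent_ids & child_ids
        if parent_inv[fid] != child_inv[fid]
    )
    return num_modified + 0.5 * num_new + 0.5 * num_deleted


# Example: one modified file ("b"), one new ("d"), one deleted ("c").
parent = {"a": "t1", "b": "t2", "c": "t3"}
child = {"a": "t1", "b": "t2-changed", "d": "t4"}
example_cost = estimate_change_cost(parent, child)
```

Nothing here touches file texts, which is the point: only the two
inventories are compared.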

I might actually play with this one. I do like SPF, though it seems like
you still need to extract one side's complete history. Or are you
thinking that you could have 2 growing SPF lines, and then as soon as
they meet up, you are done?
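
To make the "2 growing SPF lines" idea concrete, here is a minimal
sketch, assuming the goal is the common ancestor with the lowest
combined path cost from both tips. The graph representation (a dict of
revision to parent list), the function name, and the cost callback are
all hypothetical. One wrinkle: the first revision where the two searches
meet is not guaranteed to be the cheapest, so this version keeps going
until neither frontier can beat the best meeting point found so far.

```python
import heapq


def cheapest_common_ancestor(parents, cost, tip_a, tip_b):
    """Grow a Dijkstra (SPF) search from each tip along parent edges.

    parents: dict of revision -> list of parent revisions (assumed shape)
    cost:    function (child, parent) -> non-negative edge cost
    Returns (revision, combined_cost), or (None, inf) if no common ancestor.
    """
    inf = float("inf")
    dist = ({tip_a: 0.0}, {tip_b: 0.0})    # best known cost per side
    done = (set(), set())                  # revisions settled per side
    heaps = ([(0.0, tip_a)], [(0.0, tip_b)])
    best_total, best_rev = inf, None
    while heaps[0] or heaps[1]:
        # Always advance the side whose frontier is currently cheaper.
        if heaps[0] and (not heaps[1] or heaps[0][0][0] <= heaps[1][0][0]):
            side = 0
        else:
            side = 1
        d, rev = heapq.heappop(heaps[side])
        if d >= best_total:
            break  # nothing still unsettled can improve on the best meeting
        if rev in done[side]:
            continue  # stale heap entry
        done[side].add(rev)
        other = 1 - side
        if rev in done[other]:
            # The two searches have met at this revision.
            total = d + dist[other][rev]
            if total < best_total:
                best_total, best_rev = total, rev
        for p in parents.get(rev, ()):
            nd = d + cost(rev, p)
            if nd < dist[side].get(p, inf):
                dist[side][p] = nd
                heapq.heappush(heaps[side], (nd, p))
    return best_rev, best_total


# Toy history: mainline root <- m1 <- m2 <- m3, branch root <- b1 <- b2,
# where b2 also merged m2 late (an expensive edge).
parents = {"m3": ["m2"], "m2": ["m1"], "m1": ["root"],
           "b2": ["b1", "m2"], "b1": ["root"], "root": []}
edge_cost = {("b2", "m2"): 50}
weighted = cheapest_common_ancestor(
    parents, lambda c, p: edge_cost.get((c, p), 1), "m3", "b2")
unit = cheapest_common_ancestor(parents, lambda c, p: 1, "m3", "b2")
```

With unit costs the search picks m2 through the late merge (the history
shortcut); with the merge edge costed at 50 it picks root instead. Only
as much of each side's history is visited as the costs require.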

Interesting ideas, though.
John
=:->

>
> Rob
>
