Discussion about merging

Fri Jun 3 18:52:53 BST 2005

I've been thinking about the bzr model, and how merging could/should
work. One thing that keeps bzr from being really useful right now is
that you can't keep your own tree, and just "bzr merge" the changes back
and forth. As I understand it, bzr merge pretty much only works when the
ancestry is identical. (Which is a good start, make no mistake.)

I'm thinking about the case where we are working on our own integration
branch. So while we wait for mainline integration, we want to keep a
tree with all of our patches, since they provide functionality that we
want. My primary motivation for getting native plugin support added was
that I can add functionality in a plugin, and then not have to worry
about a separate branch as much.

My concern is that bzr records snapshots of the tree. It doesn't
really make sense to say "this snapshot was merged"; it would make sense
to say "this patch/changeset was merged".

So really, when recording history, you need to record "I merged the
differences between these 2 revision ids into the current tree".
Maybe you could also list the in-between revisions. But since these are
actually tree-snapshots, they are truly merged in the same way.

Is it okay in bzr to say that a given revision-id is both the
tree-snapshot and the changeset between that revision and its
precursor? Since the revision-store includes the precursor, it seems
like this might be valid.

So assuming that a merge just needs to include the list of revision ids
that it has merged, how do we prevent the next merge from having to do
an O(n^2) search to find the last merged revision? Revision-ids are not
sequential, so you can't really do an in-order search. Trees also don't
have a unique identifier, right?

So to my mind, to merge tree-a into tree-b the appropriate merge has to:
    get a list of all revision ids in tree-a
    get a list of all merges in tree-b
    go sequentially through tree-a ids until you find one that is not
       present in tree-b
    alternatively go reverse sequentially until you find one that does
       exist in tree-b
    Use the found revision as BASE
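
As a rough sketch of the reverse-sequential variant of the steps above
(Python, with hypothetical names; this is not actual bzr code, and it
assumes each tree can hand back a plain ordered list of revision ids):

```python
def find_base(tree_a_revisions, tree_b_merged):
    """Walk tree-a newest-first and return the first id tree-b already has.

    tree_a_revisions: revision ids of tree-a, oldest first (assumed shape).
    tree_b_merged:    revision ids tree-b contains or has merged (assumed).
    """
    for revision_id in reversed(tree_a_revisions):
        if revision_id in tree_b_merged:
            return revision_id
    return None  # no shared revision, so nothing can serve as BASE


# Hypothetical histories: both trees share rev-1 and rev-2, then diverge.
tree_a = ["rev-1", "rev-2", "a-3", "a-4"]
tree_b_merged = ["rev-1", "rev-2", "b-3"]
base = find_base(tree_a, tree_b_merged)
```

With tree_b_merged kept as a list, every "in" test is itself a linear
scan, which is where the quadratic cost comes from.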

You could probably filter out common ancestry to reduce the search space
(think of a root tree with 10k patches, and a branch with only 1 new
patch in it).

Also, you probably could keep the revision ids in a dictionary to allow
for hashed lookups rather than a sequential search.
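
A minimal sketch of both ideas together, again in Python with made-up
names and under the assumption that histories are ordered lists of
revision ids: skip the shared leading ancestry first, then use a set
for hashed membership tests on whatever is left.

```python
def common_prefix_length(revisions_a, revisions_b):
    """Count how many leading revision ids the two histories share."""
    count = 0
    for id_a, id_b in zip(revisions_a, revisions_b):
        if id_a != id_b:
            break
        count += 1
    return count


def unmerged_revisions(tree_a_revisions, tree_b_revisions, tree_b_merged=()):
    """Return tree-a ids that tree-b neither contains nor has merged."""
    skip = common_prefix_length(tree_a_revisions, tree_b_revisions)
    # Hashed lookups: a set gives average O(1) membership tests.
    known = set(tree_b_revisions[skip:]) | set(tree_b_merged)
    return [r for r in tree_a_revisions[skip:] if r not in known]


# Hypothetical case: a root tree with 10k patches, one new patch per branch.
root = ["root-%d" % i for i in range(10000)]
tree_a = root + ["a-1"]
tree_b = root + ["b-1"]
pending = unmerged_revisions(tree_a, tree_b)
```

Here only one revision on each side is left to compare after the common
prefix is dropped.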

What about merges from third parties? E.g. tree-a merges a patch from
tree-c, as does tree-b. When you merge tree-a => tree-b, you *don't*
want to merge the patch from c again.
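
One way this could be handled, sketched in Python. Everything here is an
assumption for illustration: each history entry is a (revision_id,
merged_ids) pair, and a revision that records merges is treated as a
pure merge with no extra edits of its own. bzr does not necessarily
store things this way.

```python
def known_ids(history):
    """Collect every id a tree contains, directly or through a merge."""
    known = set()
    for revision_id, merged_ids in history:
        known.add(revision_id)
        known.update(merged_ids)
    return known


def pending_revisions(source_history, target_history):
    """Return source revision ids that still need merging into target."""
    known = known_ids(target_history)
    pending = []
    for revision_id, merged_ids in source_history:
        if revision_id in known:
            continue
        # Assumption: a merge revision whose merged ids are all known to
        # the target brings nothing new, so it is skipped.
        if merged_ids and all(m in known for m in merged_ids):
            continue
        pending.append(revision_id)
    return pending


# Hypothetical histories: both tree-a and tree-b merged patch c-1.
tree_a = [("root-1", []), ("a-1", []), ("a-2", ["c-1"])]
tree_b = [("root-1", []), ("b-1", ["c-1"])]
to_merge = pending_revisions(tree_a, tree_b)
```

In this example only a-1 is left to merge, and the patch from c is not
applied a second time.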

Also, I know there was still discussion about whether a merge is
presented as that specific revision-id in the revision history, or
whether it would be stored as some sort of roll-up merge.

Because if you don't do the roll-up merge, then you have to merge each
step one at a time. But at no point do you genuinely have the same
tree-snapshot to correspond with the revision id. Doesn't that mean you
*have* to use a roll-up merge any time the ancestry is not
identical?

I'm guessing these are some of the problems with using snapshots rather
than using changesets, but maybe I just have my head turned the wrong way.

Sorry this was so long, but hopefully some good comes out of it.

John
=:->
