Discussion about merging

Fri Jun 3 20:58:37 BST 2005

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

John A Meinel wrote:
| My primary motivation for getting native plugin support added, was
| because I can add functionality in a plugin, and then not have to worry
| about a separate branch as much.

Hear, hear!

| My concern is that bzr records snapshots of the tree. And it doesn't
| really make sense to say "this snapshot was merged", it would make sense
| to say this patch/changeset was merged.

Another way of putting it would be "the changes introduced by this
revision were merged".

| Is it okay in bzr to say that a given revision-id is both the
| tree-snapshot and the changeset between that revision and it's
| precurser? Since the revision-store includes the precurser, it seems
| like this might be valid.

I should point out that the precursor is defined for every revision, but
~ it's conceivable that the precuror is not accessible.  We do want to
provide a way for people to set a "history horizon".

Anyhow, it's defined, but may not be in the revision store.  However,
all we need is for it to be defined.

| So assuming that a merge just needs to include the list of revision ids
| that it has merged, how do we prevent the next merge from having to do
| an O(n^2) search to find the last revision? Because revision-ids are not
| sequential, so you can't really do an in-order search. Trees also don't
| have a unique identifier, right?

Revision-ids aren't sequential, but the revision-history list is.  Do
you really mean last revision, or last mege?

| So to my mind, to merge tree-a into tree-b the appropriate merge has to:
|    get a list of all revision ids in tree-a
|    get a list of all merges in tree-b
|    go sequentially through tree-a ids until you find one that is not
|       present in tree-b
|    alternatively go reverse sequentially until you find one that does
|       exist in tree-b
|    Use the found revision as BASE

Finding the common base should be a symmetrical operation.  It shouldn't
matter whether you're merging A into B or B into A.

So I think this can work:
1. get an iterator of revisions in A
2. get an iterator of revisions in B
3. get the next A revision
4. see if it merged any revisions in B-ANCESTORS.  If so, set it as
A-BASE and stop iteration through B
5. see if it is listed in B-MERGES.  If so, set the revision that merged
it as A-BASE and stop iteration through A
6. add it to A-ANCESTORS
7. add its merges to A-MERGES
8. get the next B revision
9. see if it merged any revisions in A-ANCESTORS.  If so, set the
revision as B-BASE, and stop iteration through A
10. see if it is listed in A-MERGES.  If so, set it as B-BASE and stop
iteration through B
11. add it to B-ANCESTORS
12. add its merges to B-MERGES
13. goto 3
14. compare A-BASE and B-BASE to see which merged the other.

| Also, you probably could keep the revision ids in a dictionary to allow
| for hashed lookups rather than a sequential search.
|
| What about merges from 3rd parties. eg. tree-a merges a patch from
| tree-c, as does tree-b, when you merge tree-a => tree-b, you *don't*
| want to merge the patch from c.

That's a mesh-merge problem.  And as you may have noticed, mesh-merge
sometimes becomes mess-merge.

But the technique, AIUI it is:
1. Find the set intersection of all the merges of both trees.
2. Find the non-cherry-pick merge that has merged everything else in the
~ set intersection.

| Also, I know there was still discussion about whether a merge is
| presented as that specific revision-id in the revision history, or
| whether it would be stored as some sort of roll-up merge.

Pardon?

| Because if you don't do the roll-up merge, then you have to merge each
| step one at a time. But at no point do you genuinely have the same
| tree-snapshot to correspond with the revision id. Doesn't that mean you
| *have* to use a roll-up merge in any time where the ancestry is not
| identical?

If the ancestry is identical, you don't need merge.

But you don't have to do roll-up commits, either.  You can do a series
of commits, starting with the first descendant of BASE, and moving on to
the next until you reach the most recent.

| I'm guessing this is some of the problems with using snapshots rather
| than using changesets, but maybe I just have my head turned the wrong way.

The one converts to the other rather easily, so use whatever makes the
most sense in the problem space.  There's no reason this can't work
exactly the same as Arch does.  (But we'd rather have it work the way
Codeville does.)

Aaron
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFCoLZt0F+nu1YWqI0RAv1/AKCI5N+KA+2LIH+63qAa+zcFDm1ODQCfSuXw
B0NE8UCVU6UkNZ6LFnvc6xk=
=qZ8T
-----END PGP SIGNATURE-----