Changesets feature complete

Thu May 25 15:47:50 BST 2006

John Arbash Meinel wrote:
> Aaron Bentley wrote:
>>More: Branch bundle, repository fragment, repo shard.
> 
> 
> I can't help but think of "The Dark Crystal" when I hear shard.

Heh, that's a pretty good analogy: you put the shard into the main
repository, in order to merge your two branches into one.  "By Gelfling
hand, or else by none."  Does this mean all bzr users become Gelflings?
I gotta try repository shards out on a female programmer, and see if
she grows wings.

> The problem with changeset is that it means 'set of changes'. And the
> issue with 'set' is that a set of a set is still a set. It doesn't
> really have a concept of double grouping. You can think of a delta
> between revisions as a set of changes, and a group of those as a set
> of changesets, or changeset set.
> 
> I'm fine trying to create a new name for it, to help move away from any
> connotations.

Well, the name should be intuitive, and I think 'revision bundle' might be
pretty good that way.

> 'darcs send' calls it a "bundle of one or more patches"
> 
> We could call it 'bzr bundle', but bundle is also a verb. And most
> commands are verbs.
> 
> Certainly we could just have the 'bzr send' command, and avoid the
> naming issues (for now).

I don't think we really can, because the term "changeset" appears in the
format header, which is the first thing people see.

>>In the example, the first diff (Test and fix...) is against the
>>changeset base, which defaults to the common ancestor.  All subsequent
>>diffs are against the rightmost parent.  Since the third revision (Merge
>>mainline) produced a tree that was identical to its rightmost parent, it
>>has no actual diff.
> 
> 
> The rightmost parent was chosen because of patches where another branch
> merges all of you, and then you merge them back. I'm not positive if it
> is the best pick of base, but it is an okay one.

When two branches merge each other periodically, I believe this is the
best choice.  I can't quite formulate my explanation at the moment.

When only one branch merges the other, as with Shelf/Bzrtools, it's not
necessarily a good choice, but it depends on how different the merging
branch is from the source branch.  Since Bzrtools is very different from
Shelf, a -r0..-1 changeset using rightmost ancestors is twice the size
of one using leftmost ancestors.  But I think this case is more rare.
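A minimal sketch of the per-revision base rule discussed in this thread (first diff against the changeset base, each later diff against a parent), assuming a hypothetical `parents` map from revision id to parent ids, leftmost first. The `rightmost` flag switches to the leftmost-parent variant that the Shelf/Bzrtools comparison turns on. None of these names come from bzrlib.

```python
from typing import Dict, List, Tuple


def diff_bases(revisions: List[str],
               parents: Dict[str, List[str]],
               changeset_base: str,
               rightmost: bool = True) -> List[Tuple[str, str]]:
    """Pair each revision with the revision its diff is taken against.

    `revisions` holds the changeset's revisions, oldest first. The first
    diff is against `changeset_base` (by default the common ancestor);
    each later diff is against the rightmost parent, or against the
    leftmost parent when `rightmost` is False.
    """
    pairs = []
    for i, rev in enumerate(revisions):
        if i == 0:
            base = changeset_base
        else:
            rev_parents = parents[rev]
            base = rev_parents[-1] if rightmost else rev_parents[0]
        pairs.append((rev, base))
    return pairs


# Hypothetical history: r1 and r2 are local work on top of mainline m1,
# and r3 merges mainline revision m2 back in.
PARENTS = {"r1": ["m1"], "r2": ["r1"], "r3": ["r2", "m2"]}
print(diff_bases(["r1", "r2", "r3"], PARENTS, "m1"))
# [('r1', 'm1'), ('r2', 'r1'), ('r3', 'm2')]
print(diff_bases(["r1", "r2", "r3"], PARENTS, "m1", rightmost=False))
# [('r1', 'm1'), ('r2', 'r1'), ('r3', 'r2')]
```

For single-parent revisions the flag makes no difference; it only changes the base at merge revisions such as r3.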

Another alternative would be to generate both diffs, see which is
shorter, and use that.  (Bases can be overridden on a per-revision basis.)

> Will we tend to have this behavior in the real world? Obviously we have
> it happen on a few of our bzr trees, when we submit to mainline, and
> then later on we do a merge before we keep developing.

I think the mutual-merge case is the common case, and so it will work best.

> To draw it out:
> 
>   A-B-C-D-H-I
>    \     /   \
>     E-F-G-----J-K
> 
> In this case 'J' is almost always identical to I (and it is our current
> failing that we cannot make it 'I' and thus have convergence).
> 
> However, I think in this case, you would not get an empty merge:
>   A-B-C-D-H-I
>    \     /   \
>     E-F-G--J--K-L

Because the basis is the common ancestor, this case should simplify to

I
 \
  K-L

You won't get an empty diff, but it won't be very long, unless there are
big differences between G and J.
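To check the simplification, here is the second diagram encoded as a parent map (a reading of the ASCII art: H merges G into D's line, J follows G, K merges I into J). The helpers are a hypothetical sketch, not bzrlib's graph API: one finds the revisions reachable from the tip but not from the target, the other finds the common ancestors that are not themselves ancestors of another common ancestor.

```python
from typing import Dict, List, Set

# Revision -> parents, leftmost parent first.
GRAPH: Dict[str, List[str]] = {
    "A": [], "B": ["A"], "C": ["B"], "D": ["C"],
    "E": ["A"], "F": ["E"], "G": ["F"],
    "H": ["D", "G"], "I": ["H"],
    "J": ["G"], "K": ["J", "I"], "L": ["K"],
}


def ancestors(rev: str, parents: Dict[str, List[str]]) -> Set[str]:
    """All ancestors of `rev`, including `rev` itself."""
    seen: Set[str] = set()
    stack = [rev]
    while stack:
        r = stack.pop()
        if r not in seen:
            seen.add(r)
            stack.extend(parents[r])
    return seen


def unique_revisions(tip: str, target: str,
                     parents: Dict[str, List[str]]) -> Set[str]:
    """Revisions the changeset must carry: in tip's ancestry, not target's."""
    return ancestors(tip, parents) - ancestors(target, parents)


def common_ancestor_heads(a: str, b: str,
                          parents: Dict[str, List[str]]) -> Set[str]:
    """Common ancestors that are not ancestors of another common ancestor."""
    common = ancestors(a, parents) & ancestors(b, parents)
    dominated: Set[str] = set()
    for r in common:
        dominated |= ancestors(r, parents) - {r}
    return common - dominated
```

For tip L and target I, the only common-ancestor head is I itself, and the revisions beyond it are J, K and L.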

> My concern is that the patch for 'K' now looks like the G->J
> difference, since that is most likely the delta from I->K. So you would
> have G->J, and then J->K would look like you re-applied G->J.
> (Technically it is I->K, but you still end up with the same patch
> showing up twice.)

Yes, repeated diffs are inevitable, no matter which ancestor you select.

> My other issue with selecting the rightmost parent is that it is not the
> actual changes that the person developing branch 'E' was reviewing. When
> you run 'bzr diff' it shows you the changes relative to your leftmost
> parent, not the rightmost (though sometimes it would be nice to be able
> to specify that :)

This is true.  But what it shows is the changes the committer
originated, which I think is more interesting.  So if they had to do a
lot of conflict resolution, you would see that.

> I'm wondering if we could detect something like this, and switch the
> base if the delta is empty, but otherwise always use leftmost. We would
> have to be explicit about which base the patches are against, rather
> than just using left/right implicitly.

Personally, I think it's less confusing to always do the same thing.
But the current format allows you to use any base you want at any time--
it's just the rightmost bases that are implicit.
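John's heuristic, and what it would mean for a format where only rightmost-parent bases are implicit, can be sketched as follows. Everything here is hypothetical: a tree is modelled as a path-to-contents dict, and the `base:` line is invented for illustration, not the changeset format's actual syntax.

```python
from typing import Dict, List, Optional, Tuple

Tree = Dict[str, str]  # path -> file contents (hypothetical tree model)


def choose_base(tree: Tree, parents: List[Tuple[str, Tree]]) -> str:
    """John's proposed heuristic: use the rightmost parent only when the
    revision's tree is identical to it (an empty delta), otherwise use the
    leftmost parent. `parents` is ordered leftmost first."""
    rightmost_id, rightmost_tree = parents[-1]
    if tree == rightmost_tree:
        return rightmost_id
    return parents[0][0]


def base_header(parent_ids: List[str], chosen: str) -> Optional[str]:
    """Only a rightmost-parent base is implicit, so any other base has to
    be written out. The 'base:' syntax is invented for this sketch."""
    if chosen == parent_ids[-1]:
        return None
    return "base: " + chosen
```

Under this heuristic, every merge revision whose tree differs from its rightmost parent would carry an explicit base line; single-parent revisions never would, because their leftmost and rightmost parents are the same.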

>>On the other hand, it's hard to know how a largely opaque changeset
>>would be received on mailing lists like LKML.
> 
> 
> Well, a lot of our design is based around getting it onto lkml. Since
> they want things that they can directly pipe through 'patch'. However,
> this sounds like a case where we need to hide all the extra patches,
> since otherwise it would apply the primary patch, and then a bunch of
> patches that can't be applied afterwards.

You have a good point.

Should we have a 'patch-compatible' mode?  It would:
 - emit noisy patches for renames (delete with name X, create with name
   Y)
 - not base64-encode binaries (AIUI, patch *can* apply binary patches--
   it's just that diff won't emit them)
 - base64-encode all of the patches after the first
 - fail noisily if there were symlink operations.
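A rough sketch of three of those behaviours, using only the standard library: a rename written as a delete plus a create, every patch after the first base64-encoded, and a hard failure on symlink operations. The function names and the `has_symlink_ops` flag are hypothetical, and binary handling is left out.

```python
import base64
import difflib
from typing import List


def rename_as_delete_create(old_path: str, new_path: str,
                            lines: List[str]) -> List[str]:
    """Express a rename the way plain `patch` can apply it: delete the
    old file, then create the new one with the same contents."""
    delete = difflib.unified_diff(lines, [], fromfile=old_path,
                                  tofile="/dev/null", lineterm="")
    create = difflib.unified_diff([], lines, fromfile="/dev/null",
                                  tofile=new_path, lineterm="")
    return list(delete) + list(create)


def patch_compatible(patches: List[str], has_symlink_ops: bool) -> str:
    """Keep the first patch as plain text and base64-encode the rest.

    The base64 alphabet contains no '-', '*' or spaces, so the encoded
    text should not be mistaken for a diff header by `patch`.
    """
    if has_symlink_ops:
        raise ValueError("symlink operations cannot be expressed for patch")
    out = patches[:1]
    for p in patches[1:]:
        out.append(base64.b64encode(p.encode("utf-8")).decode("ascii"))
    return "\n".join(out)
```

The rename output is noisy by design: the full file contents appear twice, once removed and once added.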

Michael, what do you think of that?

Aaron