Changesets feature complete

John Arbash Meinel john at arbash-meinel.com
Thu May 25 19:34:43 BST 2006


Aaron Bentley wrote:
> John Arbash Meinel wrote:
>>> Aaron Bentley wrote:
>>>> More: Branch bundle, repository fragment, repo shard.
>>>
>>> I can't help but think of "The Dark Crystal" when I hear shard.
> 
> Heh, that's a pretty good analogy: you put the shard into the main
> repository, in order to merge your two branches into one.  "By Gelfling
> hand, or else by none."  Does this mean all bzr users become Gelflings?
>  I gotta try repository shards out on a female programmer, and see if
> she grows wings.
> 

:)

>>> The problem with changeset is that it means 'set of changes'. And the
>>> issue with 'set' is that a set of a set is still a set. It doesn't
>>> really have a concept of double grouping. You can think of a delta
>>> between revisions as a changeset, and a group of those as a set
>>> of changesets, or changeset set.
>>>
>>> I'm fine trying to create a new name for it, to help move away from any
>>> connotations.
> 
> Well, the name should be intuitive, and I think 'revision bundle' might be
> pretty good that way.

# Bazaar revision bundle v0.7
#
# message:
#  Add ghosts to TODO
# committer: Aaron Bentley <aaron.bentley at utoronto.ca>
# date: Sat 2006-05-20 19:04:49.732064009 -0400

It would be nice if bzr went ahead and dropped the resolution on commit
timestamps. I really don't think we need better than 1s resolution
there. And at best it would be 1ms. Having 1ns resolution is a little bit
silly. (I realize some of that is just needing to round-trip a float).
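As a rough sketch of what dropping the resolution could look like,
assuming the commit time is held as a float of seconds since the epoch.
The name truncate_timestamp and the sample value are made up for
illustration; this is not an existing bzr function.

```python
import math


def truncate_timestamp(timestamp, decimals=0):
    """Drop everything below 10**-decimals seconds from a float timestamp.

    decimals=0 keeps whole seconds, decimals=3 keeps milliseconds.
    """
    scale = 10 ** decimals
    return math.floor(timestamp * scale) / scale


# Illustrative value only: a float with sub-microsecond noise in it.
stamp = 1148166289.732064009

print(truncate_timestamp(stamp))      # 1s resolution
print(truncate_timestamp(stamp, 3))   # 1ms resolution
```

Truncating before the value is written out means the header only has to
round-trip a short decimal rather than the full float.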

> 
>>> 'darcs send' calls it a "bundle of one or more patches"
>>>
>>> We could call it 'bzr bundle', but bundle is also a verb. And most
>>> commands are verbs.
>>>
>>> Certainly we could just have the 'bzr send' command, and avoid the
>>> naming issues (for now).
> 
> I don't think we really can, because the term "changeset" appears in the
> format header, which is the first thing people see.
> 

...

> Another alternative would be to generate both diffs, see which was
> shortest, and use that.  (bases can be overridden on a per-revision basis)

I don't think that pure shortest is always best.
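To make that concrete, here is a small sketch of "generate both diffs and
compare", using difflib from the standard library rather than anything in
bzr. The names pick_base, preferred and slack are hypothetical; the slack
argument is one way to say "keep the usual base unless another one is
clearly shorter".

```python
import difflib


def diff_length(base_lines, target_lines):
    """Number of lines in a unified diff from base to target."""
    diff = difflib.unified_diff(base_lines, target_lines, lineterm="")
    return sum(1 for _ in diff)


def pick_base(target_lines, candidates, preferred=None, slack=0):
    """Choose which base to diff against.

    candidates maps a base name to its lines.  The shortest diff wins,
    except that `preferred` is kept when it is at most `slack` lines
    longer than the shortest one.
    """
    lengths = {name: diff_length(lines, target_lines)
               for name, lines in candidates.items()}
    best = min(lengths, key=lengths.get)
    if preferred in lengths and lengths[preferred] - lengths[best] <= slack:
        return preferred
    return best


target = ["a", "b", "c"]
bases = {"left": ["a", "b"], "right": ["x", "y", "z"]}
print(pick_base(target, bases))
print(pick_base(target, bases, preferred="right", slack=3))
```

With slack=0 and no preferred base this is the pure-shortest rule; a
non-zero slack biases the choice toward a fixed parent.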

> 
>>> Will we tend to have this behavior in the real world? Obviously we have
>>> it happen on a few of our bzr trees, when we submit to mainline, and
>>> then later on we do a merge before we keep developing.
> 
> I think the mutual-merge case is the common case, and so it will work best.
> 
>>> To draw it out:
>>>
>>>   A-B-C-D-H-I
>>>    \     /   \
>>>     E-F-G-----J-K
>>>
>>> In this case 'J' is almost always identical to I (and it is our current
>>> failing that we cannot make it 'I' and thus have convergence).
>>>
>>> However, I think in this case, you would not get an empty merge:
>>>   A-B-C-D-H-I
>>>    \     /   \
>>>     E-F-G--J--K-L
> 
> 
> Because the basis is the common ancestor, this case should simplify to
> 
> I
>  \
>   K-L
> 
> You won't get an empty diff, but it won't be very long, unless there are
> big differences between G -- J.

Yeah, I was trying to figure out how exactly pure merges work with
ancestry and what needs to be transmitted.

> 
>>> My concern is that now the patch for 'K' actually looks like the G->J
>>> difference. Since that is most likely the delta from I->K. Which means
>>> you would have G->J, and then J->K looks like you re-applied G->J.
>>> (Technically it is I->K, but you still end up with the same patch
>>> showing up 2 times).
> 
> Yes, repeated diffs are inevitable, no matter which ancestor you select.
> 
>>> My other issue with selecting the rightmost parent is that it is not the
>>> actual changes that the person developing branch 'E' was reviewing. When
>>> you run 'bzr diff' it shows you the changes relative to your leftmost
>>> parent, not the rightmost (though sometimes it would be nice to be able
>>> to specify that :)
> 
> This is true.  But what it shows is the changes the committer
> originated, which I think is more interesting.  So if they had to do a
> lot of conflict resolution, you would see that.

You see conflict resolution, but don't you see the entire set of changes
bundled up as well? If I do:
   A-B-C-D-H-I
    \     /   \
     E-F-G-J-K-L-M

Then the common ancestor for submitting M should be I, and the L change
would be a rollup of all of J,K, and L. You still need to send J and K
because I doesn't have them in its ancestry. So you end up sending
J, K, (J+K+conflict resolution), M

The alternative would be to send
J, K, (H+I+conflict resolution), M

I don't specifically see duplication in the alternative version. Though
I guess H+I is already known to I.

Any way that we could perform the merge, and then just have 'L' be the
changes between that and the merge? Then it would be:

J, K, conflict resolution, M

It would require that merges are deterministic, which we probably don't
have. But it sure would be a nice display. :)
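For what it's worth, the "what needs to be transmitted" part can be
sketched as a plain ancestry difference. This assumes the picture above
reads as H merging G and L merging I, with the leftmost parent listed
first. The PARENTS table and both function names are made up for the
example and are not bzr's API.

```python
def ancestors(parents, revision):
    """Return `revision` plus everything reachable through parent links."""
    seen = set()
    pending = [revision]
    while pending:
        rev = pending.pop()
        if rev not in seen:
            seen.add(rev)
            pending.extend(parents.get(rev, ()))
    return seen


def revisions_to_send(parents, tip, base):
    """Revisions in tip's ancestry that base lacks, parents before children."""
    missing = ancestors(parents, tip) - ancestors(parents, base)
    ordered = []

    def visit(rev):
        if rev in missing and rev not in ordered:
            for parent in parents.get(rev, ()):
                visit(parent)
            ordered.append(rev)

    visit(tip)
    return ordered


# Assumed reading of the graph: leftmost parent first in each list.
PARENTS = {
    "A": [], "B": ["A"], "C": ["B"], "D": ["C"],
    "E": ["A"], "F": ["E"], "G": ["F"],
    "H": ["D", "G"], "I": ["H"],
    "J": ["G"], "K": ["J"], "L": ["K", "I"], "M": ["L"],
}

print(revisions_to_send(PARENTS, "M", "I"))  # ['J', 'K', 'L', 'M']
```

This only says which revisions travel in the bundle; which parent each
one is diffed against is the separate question discussed here.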

> 
>>> I'm wondering if we could detect something like this, and switch the
>>> base if the delta is empty, but otherwise always use leftmost. We would
>>> have to be explicit about which base the patches are against, rather
>>> than just using left/right implicitly.
> 
> Personally, I think it's less confusing to always do the same thing.
> But the current format allows you to use any base you want at any time--
> it's just the rightmost bases that are implicit.
> 
>>>> On the other hand, it's hard to know how a changeset that was largely
>>>> opaque would be received on MLs like the lkml.
>>>
>>> Well, a lot of our design is based around getting it onto lkml. Since
>>> they want things that they can directly pipe through 'patch'. However,
>>> this sounds like a case where we need to hide all the extra patches,
>>> since otherwise it would apply the primary patch, and then a bunch of
>>> patches that can't be applied afterwards.
> 
> You have a good point.
> 
> I wonder whether we should have a 'patch-compatible' mode?  It would:
>  - emit noisy patches for renames (delete with name X, create with name
>    Y)
>  - not base64-encode binaries (AIUI, patch *can* apply binary patches--
>    it's just that diff won't emit them)
>  - base64-encode all of the patches after the first
>  - fail noisily if there were symlink operations.
> 
> Michael, what do you think of that?
> 
> Aaron

Yeah, I wonder about base64-encode of the compressed patches. (Since if
you are encoding, you might as well save some space).
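A quick sketch of the idea, compressing before base64-encoding. The
choice of bz2 is an assumption made for the example (nothing above picks
a compressor), and encode_patch / decode_patch are hypothetical names.

```python
import base64
import bz2


def encode_patch(patch_text, compress=True):
    """Return the patch as ASCII base64, optionally bz2-compressed first."""
    data = patch_text.encode("utf-8")
    if compress:
        data = bz2.compress(data)
    return base64.b64encode(data).decode("ascii")


def decode_patch(encoded, compressed=True):
    """Reverse encode_patch; `compressed` must match what was used to encode."""
    data = base64.b64decode(encoded)
    if compressed:
        data = bz2.decompress(data)
    return data.decode("utf-8")


# A made-up, fairly repetitive patch body to compare sizes on.
patch = "".join("+line %d of a fairly repetitive patch\n" % i
                for i in range(200))
plain = encode_patch(patch, compress=False)
packed = encode_patch(patch, compress=True)
print(len(patch), len(plain), len(packed))
```

Base64 alone grows the text by about a third, so for patches that are
going to be opaque anyway the compression step should usually win that
space back and more.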

John
=:->
