[RFC] Cherry-picking, reordering revisions and 2D versioning

Tue Jan 17 14:35:33 GMT 2006

On Sun, Jan 15, 2006 at 16:19:09 +0100, David Allouche wrote:
> On Fri, 2006-01-13 at 11:13 +0100, Jan Hudec wrote:
> > Well, it was a brain-storming. Yes, it can be simplified.
> 
> Actually, I just carefully re-read the whole thread and I think we are
> basically talking about the same thing here.
> 
> The main distinction I see is that I do not want to introduce
> reincarnation, but just pick-merging. Which is anyway the best you can
> hope since, in your proposal, cherry-picking involves a review step
> where the user can include arbitrary changes. Those arbitrary changes
> cannot be limited to conflict resolution and will normally contain new
> work, for example fixing typos.

I think this is where the users should be careful, for their own good.
The system should be able to fix a mess with clean work, but it does not
need to be able to fix a mess with another mess.

> > All equivalent functionality of quilt/stgit/mq can be built without it.
> > The logic that I want to sort out is what it takes to allow merging the
> > stacks. Ie. when I pull a branch, use stack operations on revisions
> > already there, branch from the middle of the new stack and then attempt
> > to merge back to the original branch. And someone does similar thing and
> > tries to merge back too. It might work with just picked-ancestry, but
> > needs to be checked.
> 
> Oh my. Stgit does NOT support that?

I don't think stgit supports pushing stacks around and merging them. I
have not seen a manual for stgit (no links to one in the first pageful of
Google answers either), but I don't think it actually keeps the old
versions of the stack around, much less in a way that would allow merging
them.

> It shows that I have not carefully checked the prior art before starting
> to think about bzr cherrypicks. I just naively assumed that the
> oh-so-great stgit would support the kind of operations you describe.

I am not sure, but I would not expect it -- the kernel workflow does not
require it.

> So, yes, my picked ancestry idea is very much designed to remove the
> distinction between queues and branches and to support queue publication
> and merging.

Yes, that's what I want too. And I don't think stgit or mq aim for that.

> > > The common case that needs an extension of the current model is indeed
> > > patch commutation, and it's just a special case of cherry-picking. It
> > > would be sufficient to keep a list of "picked ancestors" that contains
> > > only revision ids that are not part of the normal ancestry.
> > 
> > Well, it would need picked-ancestors for the moved down patch and
> > equivalent-revisions for the moved up patch. Consider revision A and
> > revision B based on it. You reorder it, so you get B2[picks=B] and
> > A2[parent=B2, equals=B]. That is so that if you have D[parent=A2] and
> > another branch has C[parent=B], you can merge from that branch, taking
> > A2 as base.
> 
> If I understand you correctly you mean the following:
> 
>          P ---> A ---> B ----------> C
>           \             \
>            `---> B2 -----`-> A2 ---> D
>                  [B]
>          
>         Lines denote ancestry. Names in bracket denote picked-ancestry.
>         
> "Taking A2 as the base" is doubly wrong.
> 
>       * It assumes 3-way merging, which goes quickly insane in
>         real-world cherry-picking situation. Robust merging in the
>         presence of cherry-picks requires weave merging.
>       * Assuming 3-way merging, the right base is B, because A2 may
>         contain additional changes applied during the review steps of B2
>         and A2.

Well, weave-merging is kind-of 3-way merging with per-line bases. And
certainly weave-merge reduces to 3-way merge where the base is defined
(which it is not in the presence of cherry-picks).
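
To make the graph above concrete, here is a rough sketch in Python of a
revision record that keeps picked ancestry next to normal ancestry. The
Revision class, its fields and the ancestry() helper are made up for
illustration; this is not bzr code. It uses the A2[parent=B2,B] variant.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Revision:
    rev_id: str
    parents: tuple = ()  # normal ancestry
    picked: tuple = ()   # picked ancestry: revisions cherry-picked from


def ancestry(revs, rev_id):
    """Return all ids reachable through normal parents, rev_id included."""
    seen = set()
    todo = [rev_id]
    while todo:
        cur = todo.pop()
        if cur in seen:
            continue
        seen.add(cur)
        todo.extend(revs[cur].parents)
    return seen


# The reordering example: B2 picks B, A2 has both B2 and B as parents.
revs = {r.rev_id: r for r in [
    Revision("P"),
    Revision("A", parents=("P",)),
    Revision("B", parents=("A",)),
    Revision("B2", parents=("P",), picked=("B",)),
    Revision("A2", parents=("B2", "B")),
    Revision("C", parents=("B",)),
    Revision("D", parents=("A2",)),
]}

# Revisions shared by the C and D branches through normal ancestry only.
common = ancestry(revs, "C") & ancestry(revs, "D")
```

Here common comes out as P, A and B, so B is available as a base when
merging C and D, while B2 only records B as picked and does not have it
in its normal ancestry.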

> > Thinking of it again, if A2[parent=B2,B], then this case would merge
> > properly too. The question is whether it would suffice for other
> > operations like revision splitting.
> 
> Yes, exactly what I mean. Revision splitting is annoying.
> 
> There is a worse-is-better way of handling splitting: don't. Consider
> that B is split into B1 and B2.
> 
>  P ---> A ---> B
>   \      
>    `---> B1 ---> B2
>          [B]
> 
> In that case, picking just B1 would be the same, ancestry-wise as
> picking B. That would be enough to address the "Quux bugfix" use case
> you describe on PatchReordering, so I'm inclined to just go for it.
> 
> I do not have solid proposal for accurate split support, just fuzzy
> ideas at the moment. But splitting diffs is just such a PITA that I do
> not expect it to be very important.

Well, in what I had in mind, it would happen kind-of by the way. You
would cherry-pick, manually (but you have to do that if all you have is
a mess), the changes you need and bzr would generate the rest for you.
Certainly that's what is done in the "Quux use-case".

> > Yes, that's clear and makes sense. And should probably work, though the
> > details of weave-merge (3-way merge won't be able to deal with this)
> > will still be a bit tricky.
> 
> I have been repeatedly assured by some of the few that really understand
> weave merging (Martin and Aaron) that weaves support cherry-picking by
> design. From what I can tell, weave-merge support for cherry-picks is
> kind of just a matter of actually coding it.

From what I've seen of the current format, I don't think it's ready. It
will need some rewriting. I hope the knits will be ready for it.

> What picked-ancestry provides is the ability to make reasonably
> meaningful diff3 merging, which I regard as very desirable since it's
> more predictable and is generally more user friendly in case of
> conflicts. Also, I believe referencing text ids from picked ancestry
> would be useful to allow store garbage collection in the presence of
> cherry-picked weaves.
>
> > Yes, many revisions will be created this way. For common stbzr
> > operation, where only the final stack should be published, they could be
> > pruned. If you make a new version of already published stack, they would
> > of course stay.
> 
> Basically two approaches there: severing ancestry, ghost revisions. But
> as I said, it's premature optimization.

Yes. Optimizations can wait until it actually works.

> > > If you have real world use case that you do not think would be addressed
> > > adequately by this model, please share it with me.
> > 
> > Well, they probably are addressed adequately. Though I am not sure how
> > the partial cherry-pick (I only pick some of the changes, because the
> > other parts fix code I didn't merge yet, or are screwed or whatever)
> > will work out. Maybe it will, but I don't see it.
> 
> I hand-waved that the "Quux bugfix" use case is supported by my model.
> You are calling my hand-waving. Okay, let's look at it.
> 
> At first, you are just coding on Foo, and you have a Quux bugfix.
> 
>         P ---> Foo1 ---> Quux
> 
> A Bar programmer wants the Quux patch without the Foo goo, so he does a
> partial cherrypick.
> 
>         P ---> Foo1 ---> Quux
>          \
>           `--> Bar1 ---> Bar2
>                         [Quux]
> 
> Note that Bar2 does not contain all the changes in Quux, as the Bar
> programmer removed the bits that were associated changes to the Foo
> code.
> 
> Then comes a qizzy programmer, who has neither Foo1 nor Bar1 on his
> branch, and who also needs the Quux bugfix. I think your use case is a
> bit bogus because at no point did anybody prepare a clean Quux-fix
> branch based on the mainline. Nevertheless, the qizzy programmer now has
> a clean Quux-fix patch to cherrypick.
> 
> 
>         P ---> Foo1 ---> Quux
>         |\
>         | `--> Bar1 ---> Bar2
>         |               [Quux]
>          \
>           `--> qyz1 ---> qyz2
>                      [Quux, Bar2]
> 
> Now, we have not actually used the cherry-picked ancestry yet. So let's
> try to find out where plain diff3 merging is going to fall short.
> 
> Consider that Foo2 gets merged into mainline first, then the Bar
> programmer merges mainline. For the sake of readability I will not show
> the mainline branch on the graph: merging a branch that merged Foo2 is
> the same problem as merging Foo2 directly.
> 
> 
>         P ---> Foo1 ---> Quux ---> Foo2
>          \                           \
>           `--> Bar1 ---> Bar2 --------`-> Bar3
>                         [Quux]
> 
> Either with the current ancestry model or with picked-ancestry, the only
> meaningful ancestor is going to be P. But for weave merge, we expect
> that the Quux changes are already present in the weave in the ancestry
> of Bar2 changes, so weave merge can avoid conflicts generated by diff3.
> However, these conflicts could be meaningful, because they could be
> caused by the removal of changes required by Foo.

The weave-merge should actually do better than that, because it will
not skip the whole of Quux -- it knows which concrete lines were picked.

> Screwed isn't it?
> 
> Let's consider the converse, merging Bar into Foo.
> 
> 
>         P ---> Foo1 ---> Quux --,-> Foo
>          \                     /
>           `--> Bar1 -----> Bar2 
>                           [Quux]
> 
> Again, the only meaningful base for diff3 is P. The conflict situation
> is similar to the previous case: some changes made on the Quux bugfix in
> Bar2 may conflict using diff3 and be applied without conflict with weave
> merge.
> 
> You may think that the Foo programmer would review the changes and easily
> find out about the issue, but in my experience, I never review mainline
> merges. If stuff got into mainline, it has already been audited and
> tested and it is supposed good.
> 
> So, apparently, this scheme is not going to work. That's really
> annoying, do you have concrete suggestions about how to improve that?
> 
> At the moment we need to tell the Foo branch which changes in the Quux
> patch were Foo-specific and need to be preserved when merging branches
> that have picked Quux. So we create a Quux-bugfix branch, and merge it
> into Foo2.

Again, this could actually be a smaller problem than you think, because
Bar2 does not pick all of Quux and undo parts of it, but rather picks
part of Quux and adds a few bits of its own. In that case the remaining
bits of Quux will come in from the Foo branch. It would be a problem if I
actually picked all of Quux and then undid parts of it, but that is
fixing a mess with a mess and does not need to work IMHO.

> The diff(Quux, Foo2) should only contain the substantial changes made in
> Quux1, like fixing typos and style in the Quux bugfix, but no Foo-stuff
> removal or conflict resolution that is apparent in
> diff(Quux, patch(P, diff(Foo1, Quux))).
> 
>         P ---> Foo1 ---> Quux --,-> Foo2
>          \                     /
>           `--> Quux1 ---------'
>                [Quux]
>  
> Now, Bar can merge the Quux-bugfix branch.
> 
>         P ---> Foo1 ---> Quux --,-> Foo2
>          \                     /
>           \-------> Quux1 ----'
>            \        [Quux]
>             \           \
>              `--> Bar1 --`-> Bar2
>                              [Quux]
> 
> And when merging Foo2 and Bar (in any direction), Quux1 can be used as a
> diff3 merge base.

Which is what I, as a programmer knowledgeable about best practices of
versioning, would do. A bit more work now to make the merging a bit less
painful later.
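
A small Python sketch of why the extra branch helps a diff3 merge. The
parents dictionaries and the merge_bases() function are hypothetical
names for illustration, not how bzr actually selects a base. It compares
the graph without the Quux-bugfix branch to the one with it.

```python
def ancestors(parents, rev):
    """Return rev plus everything reachable through normal parents."""
    seen, todo = set(), [rev]
    while todo:
        cur = todo.pop()
        if cur not in seen:
            seen.add(cur)
            todo.extend(parents.get(cur, ()))
    return seen


def merge_bases(parents, left, right):
    """Common ancestors that are not ancestors of another common ancestor."""
    common = ancestors(parents, left) & ancestors(parents, right)
    return {
        c for c in common
        if not any(c != o and c in ancestors(parents, o) for o in common)
    }


# Bar2 only picked Quux; no shared Quux-bugfix branch exists.
without_branch = {
    "Foo1": ["P"], "Quux": ["Foo1"], "Foo2": ["Quux"],
    "Bar1": ["P"], "Bar2": ["Bar1"],
}

# Quux1 is a clean bugfix branch merged into both Foo2 and Bar2.
with_branch = {
    "Foo1": ["P"], "Quux": ["Foo1"], "Quux1": ["P"],
    "Foo2": ["Quux", "Quux1"],
    "Bar1": ["P"], "Bar2": ["Bar1", "Quux1"],
}

base_without = merge_bases(without_branch, "Foo2", "Bar2")
base_with = merge_bases(with_branch, "Foo2", "Bar2")
```

Without the branch the only base is P; with it the base is Quux1, so the
bugfix is already common to both sides when diff3 runs.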

> Then the qyzzy programmer finds out he needs the same bugfix. Ideally,
> the Quux bugfix would have been merged into the trunk at that point, but
> imagine for a moment that reality got in the way and that PQM that
> commits to mainline was unusable for a couple of days.
> 
> 
>         P ---> Foo1 ---> Quux --,-> Foo2
>          \                     /
>           \-------> Quux1 ----'
>            \        [Quux]
>             |            \
>             |            |\
>             |\           | \
>             | `-> Bar1 --|--`-> Bar2
>             |            |      [Quux]
>              \            \
>               `-> qyz1 ----`--> qyz2
>                                 [Quux]  
> 
> As far as the Bar and qyz branches are concerned, the Quux bugfix is
> just a common parent. The cherry-picking does not matter to them. 3-way
> merging of Bar2 and qyz2 will just use Quux1 as a base instead of P.  
> 
> Now, the slightly embarrassing bit for me is that the picked ancestry
> makes no difference. You can get this behavior today with bzr. Which is
> fine with me because the picked ancestry value lies in other use cases.

It does make a lot of difference actually, but mainly by being present as
part of the weave annotation. The per-revision record of it would start to
be important when more than one pick is around -- for deciding which
changes should override each other and which should conflict.

... yes, seems the picked-ancestry (properly represented in the
weave/knit too) would be enough to make this kind of stuff work.
Thanks for the analysis.

-- 
						 Jan 'Bulb' Hudec <bulb at ucw.cz>