Fixing rebase rather than avoiding it

Sat Mar 6 05:25:05 GMT 2010

Stefan Monnier writes:

 > Just because it currently can't [rewrite ancestry of commits "in
 > place"] doesn't mean that it will never be able to do it and even
 > less that it should not do it.

Nevertheless, IMO it should not be able to do it.  Ancestry is what
distinguishes two commits whose other metadata is identical, and for a
number of reasons (forensic analysis as Ben Finney mentions, intended
presentation by authors, etc.) I think that parents should remain part
of the identity of commits.
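
To make "parents are part of the identity" concrete, here is a minimal
sketch assuming a git-style scheme where a commit's id is a hash over
its metadata *including* its parents.  The function name and the
payload layout are made up for illustration; bzr's revision ids are
not computed this way.

```python
import hashlib

def commit_id(tree, parents, author, message):
    """Hypothetical content-addressed commit id; parents feed the hash."""
    payload = "\n".join([
        "tree " + tree,
        *("parent " + p for p in parents),
        "author " + author,
        "",
        message,
    ])
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()

# Same tree, author and message; only the parent differs.
on_old_base = commit_id("t1", ["base0"], "A. Hacker", "fix frob")
on_new_base = commit_id("t1", ["base1"], "A. Hacker", "fix frob")
```

Under that assumption the two ids differ, so rewriting ancestry "in
place" necessarily yields a different commit.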

 > > OTOH It could add the original revision as one of the parents to the
 > > newly created revisions as part of the rebase, but I'm not sure of how
 > > much value this is -

Let's call this the "added-parents implementation".  The alternative
(which I proposed elsewhere) is the "annotation-based implementation".

 > This would make it perfectly acceptable to rebase a branch that's
 > published, i.e. it would eliminate the reason to discourage the use
 > of rebase.

The added-parents implementation is incoherent, however.  Remember, a
rebase conceptually *moves* an arc of the ancestry graph, while a
merge creates new ones.  Also, the content of the *old* base is *not*
merged; the rebased branch really has been severed from that ancestry.

Jelmer writes:

 > > it seems like it would just cause disk usage to go through the roof

I think this is not the case.  In the most common case, suppose you
rebase branch A sprouting from commit 0 on the mainline to A'
sprouting from commit 1 later on the mainline.  Write #(A) for the
number of commits on A.  Then in the best case (no file changed in
0..1 is also changed on A) the cost of this is #(A) commit objects and
O(#(A)) tree objects, because you just mix and match existing file
objects.  Depending on the representation of tree objects (e.g., in
git, common subtrees are shared), the extra tree storage might very
well be just #(A).  Even if there are files that were changed in both
lines of development, they should be a small fraction of the total.
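
A toy model of that best case, assuming a content-addressed store with
shared subtrees in the style of git.  All names here (`store`, `put`,
`put_tree`, `put_commit`) and the file layout are invented for the
sketch; this is not how bzr stores revisions.

```python
import hashlib

store = {}  # object id -> serialized object (hypothetical store)

def put(kind, body):
    data = kind + "\n" + body
    oid = hashlib.sha1(data.encode("utf-8")).hexdigest()
    store[oid] = data  # re-adding an identical object costs nothing
    return oid

def put_tree(entries):
    """entries maps name -> str (file content) or dict (subdirectory)."""
    lines = []
    for name in sorted(entries):
        value = entries[name]
        child = put_tree(value) if isinstance(value, dict) else put("blob", value)
        lines.append(name + " " + child)
    return put("tree", "\n".join(lines))

def put_commit(tree, parent, message):
    return put("commit", "tree %s\nparent %s\n\n%s" % (tree, parent, message))

# Mainline 0..1 touches only doc/; branch A touches only src/.
doc0 = {"x.txt": "x0", "y.txt": "y0"}
doc1 = {"x.txt": "x1", "y.txt": "y0"}
src0 = {"a.c": "a0", "b.c": "b0"}
src_a1 = {"a.c": "a1", "b.c": "b0"}
src_a2 = {"a.c": "a1", "b.c": "b1"}

c0 = put_commit(put_tree({"src": src0, "doc": doc0}), "none", "mainline 0")
c1 = put_commit(put_tree({"src": src0, "doc": doc1}), c0, "mainline 1")
a1 = put_commit(put_tree({"src": src_a1, "doc": doc0}), c0, "A: edit a.c")
a2 = put_commit(put_tree({"src": src_a2, "doc": doc0}), a1, "A: edit b.c")

# Rebase A onto commit 1: same src/ as A, same doc/ as mainline 1.
before = set(store)
r1 = put_commit(put_tree({"src": src_a1, "doc": doc1}), c1, "A: edit a.c")
r2 = put_commit(put_tree({"src": src_a2, "doc": doc1}), r1, "A: edit b.c")
new_objects = set(store) - before
new_kinds = sorted(store[oid].split("\n", 1)[0] for oid in new_objects)
```

In this model the rebase of a two-commit branch adds two commit
objects and two root trees, and no file objects or subtrees at all.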

If bzr doesn't already have such a format, you know where to find
one. :-)

 > > and confuse the hell out of users because their history would include
 > > a lot of (seemingly) duplicate revisions.

This is true in the case of your proposal to include the original base
as a parent of the post-rebase branch.  However, in the case of an
annotation-based implementation, it would not show up as such.

It might also be possible to add a flag to the obsolete commits
somehow so that bzr log would (normally) suppress them in the
added-parents implementation.  I don't think that's a good idea
because the semantics of rebase is that the branch has been *moved*
(rebase) or *copied* (cherry-pick).  In either case, users *do not*
expect there to be a strong ancestry relationship between the
pre-rebase branch and the post-rebase branch; they want the VCS to
handle that automatically, behind the scenes.  Otherwise, they would
have done an explicit merge.

 > > The whole point of rebase (as I understand it from Git users) is to
 > > clean up history and get rid of those old commits.

It's a fine point, but I would say the point is to clean up the
*presentation* of history.  Or as Stefan puts it:

 > I don't think the purpose of rebase is to save space.  So yes, it is to
 > clean up history so you can understand the current head of a branch as
 > "head of parent plus this set of patches" rather than an interleaving of
 > patches and merges.  But if you keep *both* versions of the history, you
 > leave it up to the end user to choose which version of the history he
 > wants to look at.

 > So if he's got the parent's head he'll want to use the rebased
 > history, whereas if he's got some old head of the branch, he'll
 > prefer to use the non-rebased version of the history.

I suggest he probably should be told that he has the non-rebased
version, and be asked whether he wants to update to the post-rebase
version or do something else.
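
As a rough sketch of that check, assume the annotation-based
implementation records a "superseded by" mapping from each pre-rebase
revision to its post-rebase counterpart.  The mapping, the function
names and the message text are all hypothetical.

```python
def newest_version(head, superseded_by):
    """Follow hypothetical 'superseded by' annotations to the last one."""
    seen = {head}
    while head in superseded_by:
        head = superseded_by[head]
        if head in seen:
            raise ValueError("annotation cycle at " + head)
        seen.add(head)
    return head

def advise(head, superseded_by):
    """Return a prompt for the user, or None if head is up to date."""
    latest = newest_version(head, superseded_by)
    if latest == head:
        return None
    return "%s was rebased; the current version is %s. Update?" % (head, latest)

# C1 was rebased to C2, which was later rebased again to C3.
annotations = {"C1": "C2", "C2": "C3"}
```

Here `advise("C1", annotations)` produces the prompt, while
`advise("C3", annotations)` returns `None` and the user is left alone.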

 > > Please also note that while "bzr dpush" has similar behaviour to rebase
 > > in that it appears to rewrite existing revisions, this is done for
 > > different reasons - the revisions are rewritten to exclude anything that
 > > can not be represented in the target vcs.
 > 
 > Yes, the same holds for branch-filter.  These are slightly different
 > because they don't necessarily need to rewrite history.  Instead they
 > need to create each of their new revisions with 2 parents.

But male and female He must create them.  Er, make that as parent and
metaparent. ;-)

 > E.g. when rewriting revisions A1-B1-C1 to A2-B2-C2 by removing some
 > of the files, the resulting revisions should not look like:
 > 
 >        A2 -> B2 -> C2
 > 
 > but
 > 
 >        A2 -> B2 -> C2
 >        ^     ^     ^
 >        |     |     |
 >        A1 -> B1 -> C1

This can only work *well* if the vertical links are different in kind
from the horizontal ones.  Also (nitpick) the implementation of the
links will go in the reverse direction (the child points at the parent).
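
A minimal sketch of what "different in kind" could look like, assuming
each revision keeps ordinary parent links and a separate field for the
rewrite relationship.  The `Revision` class and the `rewritten_from`
field are invented for illustration and are not bzr's data model.
Both kinds of link are stored on the child, pointing back.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Revision:
    rev_id: str
    parents: tuple = ()         # ancestry ("horizontal"), child -> parent
    rewritten_from: tuple = ()  # hypothetical metaparents ("vertical")

def ancestry(revs, head):
    """Walk parent links only; rewritten_from is deliberately ignored."""
    seen, todo = [], [head]
    while todo:
        rid = todo.pop()
        if rid in seen:
            continue
        seen.append(rid)
        todo.extend(revs[rid].parents)
    return seen

revs = {r.rev_id: r for r in [
    Revision("A1"),
    Revision("B1", ("A1",)),
    Revision("C1", ("B1",)),
    Revision("A2", (), ("A1",)),
    Revision("B2", ("A2",), ("B1",)),
    Revision("C2", ("B2",), ("C1",)),
]}

log_of_c2 = ancestry(revs, "C2")
```

With the links kept apart, the history of C2 shows only C2, B2 and A2,
while `revs["C2"].rewritten_from` still records that it came from C1.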