Fixing rebase rather than avoiding it

Thu Mar 4 02:19:17 GMT 2010

>> Rebase should not destroy history: it should only alter history by
>> adding more arcs (and nodes) into the DAG, basically providing an
>> alternate history.  I understand that adding a parent is not something
>> that Bzr supports right now, but that doesn't change anything to the
>> fact that it would be the right thing to do.
> Bazaar (like Git or Mercurial) doesn't support modifying history; this
> includes adding parents. 

Just because it can't do it currently doesn't mean that it will never be
able to, let alone that it shouldn't.

> OTOH It could add the original revision as one of the parents to the
> newly created revisions as part of the rebase, but I'm not sure of how
> much value this is -

This would make it perfectly acceptable to rebase a branch that has
already been published, i.e. it would eliminate the reason for
discouraging the use of rebase.
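A minimal sketch of the idea, assuming a toy in-memory revision graph: the `Revision` class, the `rebase` function and the "add a prime" id scheme are hypothetical illustrations, not bzrlib API. Only the 'rebase-of' property name comes from bzr-rebase. Each rebased revision keeps the revision it replaces as an extra parent, so the old revisions remain ancestors of the new head.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Revision:
    rev_id: str
    parents: List[str]  # parents[0] is the mainline (left-hand) parent
    props: Dict[str, str] = field(default_factory=dict)


def rebase(graph: Dict[str, Revision], to_rebase: List[str], onto: str) -> Dict[str, str]:
    """Replay `to_rebase` (oldest first) on top of `onto`.

    Each new revision keeps the revision it was rebased from as a second
    parent, so the old revisions stay in the DAG instead of being dropped.
    """
    mapping = {}
    prev = onto
    for old_id in to_rebase:
        new_id = old_id + "'"  # hypothetical naming scheme for new revision ids
        graph[new_id] = Revision(new_id, [prev, old_id], {"rebase-of": old_id})
        mapping[old_id] = new_id
        prev = new_id
    return mapping


def ancestors(graph: Dict[str, Revision], rev_id: str) -> Set[str]:
    """All revisions reachable from rev_id, including rev_id itself."""
    seen, stack = set(), [rev_id]
    while stack:
        rev = stack.pop()
        if rev not in seen:
            seen.add(rev)
            stack.extend(graph[rev].parents)
    return seen


# O is the common base, U a new upstream revision, A and B the published branch.
graph = {
    "O": Revision("O", []),
    "U": Revision("U", ["O"]),
    "A": Revision("A", ["O"]),
    "B": Revision("B", ["A"]),
}
mapping = rebase(graph, ["A", "B"], onto="U")

# The old head B is an ancestor of the new head B', so anyone who already
# has B can still pull B' without any of their revisions being orphaned.
print(sorted(ancestors(graph, mapping["B"])))
```

The cost is one extra parent arc per rebased revision; the question of how to present the resulting history is what the rest of this thread is about.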

> it seems like it would just cause disk usage to go through the roof
> and confuse the hell out of users because their history would include
> a lot of (seemingly) duplicate revisions.

You're thinking about it the wrong way: you're thinking of all the
problems that a naive approach would generate.  Try to think about it
from the point of view that "we absolutely want this feature" and then
try to figure out how to make it work well.

> The whole point of rebase (as I understand it from Git users) is to
> clean up history and get rid of those old commits.

I don't think the purpose of rebase is to save space.  So yes, it is to
clean up history, so you can understand the current head of a branch as
"head of parent plus this set of patches" rather than as an interleaving
of patches and merges.  But if you keep *both* versions of the history,
you leave it up to the end user to choose which version he wants to look
at.  If he has the parent branch's head, he'll want the rebased history,
whereas if he has some old head of the branch, he'll prefer the
non-rebased version.
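A rough sketch of how a tool could pick the view, assuming the rebased revisions carry the originals as second parents plus a 'rebase-of' property. The dict layout and the `history_for` policy (show the rebased line if the user's head appears on it, else follow 'rebase-of') are hypothetical choices for illustration, not an existing bzr feature.

```python
from typing import Dict, List

# Hypothetical DAG after a rebase that kept the originals as second
# parents: A' and B' replace A and B on top of upstream's U.
GRAPH: Dict[str, dict] = {
    "O": {"parents": [], "props": {}},
    "U": {"parents": ["O"], "props": {}},
    "A": {"parents": ["O"], "props": {}},
    "B": {"parents": ["A"], "props": {}},
    "A'": {"parents": ["U", "A"], "props": {"rebase-of": "A"}},
    "B'": {"parents": ["A'", "B"], "props": {"rebase-of": "B"}},
}


def first_parent_history(graph: Dict[str, dict], head: str) -> List[str]:
    """Follow only left-hand parents: the 'head of parent plus patches' view."""
    history = []
    rev = head
    while rev is not None:
        history.append(rev)
        parents = graph[rev]["parents"]
        rev = parents[0] if parents else None
    return history


def history_for(graph: Dict[str, dict], head: str, local_head: str) -> List[str]:
    """Pick the view of `head`'s history that suits a user at `local_head`."""
    rebased = first_parent_history(graph, head)
    if local_head in rebased:
        # The user tracks the parent branch: show the rebased history.
        return rebased
    original = graph[head]["props"].get("rebase-of")
    if original is None:
        return rebased
    # The user has an old head of the branch: show the pre-rebase lineage.
    return [head] + first_parent_history(graph, original)


print(history_for(GRAPH, "B'", "U"))  # user who follows upstream
print(history_for(GRAPH, "B'", "B"))  # user holding the old branch head
```

The first call walks B', A', U, O; the second walks B', B, A, O. Both views are derived from the same DAG, so nothing has to be thrown away.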

> bzr-rebase already sets a 'rebase-of' revision property that contains
> the revision id of the revision the newly created revision is
> a rebase of.

Right, this could be part of the answer, but "bzr pull/merge" would need
to make use of it.
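One way pull/merge could make use of it, sketched under simplifying assumptions: the graph below mimics the case where 'rebase-of' is only a property (A' and B' do not list A and B as parents), and `genuinely_new` is a hypothetical helper, not bzrlib code. A revision whose 'rebase-of' target the local branch already has is treated as the same change under a new id.

```python
from typing import Dict, Set

# Hypothetical DAG where A' and B' carry a 'rebase-of' property but do
# NOT have A and B as parents.
GRAPH: Dict[str, dict] = {
    "O": {"parents": [], "props": {}},
    "U": {"parents": ["O"], "props": {}},
    "A": {"parents": ["O"], "props": {}},
    "B": {"parents": ["A"], "props": {}},
    "A'": {"parents": ["U"], "props": {"rebase-of": "A"}},
    "B'": {"parents": ["A'"], "props": {"rebase-of": "B"}},
}


def ancestry(graph: Dict[str, dict], head: str) -> Set[str]:
    """All revisions reachable from head, including head itself."""
    seen, stack = set(), [head]
    while stack:
        rev = stack.pop()
        if rev not in seen:
            seen.add(rev)
            stack.extend(graph[rev]["parents"])
    return seen


def genuinely_new(graph: Dict[str, dict], incoming: str, local: str) -> Set[str]:
    """Revisions that merging `incoming` into `local` would really bring in.

    A revision whose 'rebase-of' target is already in the local ancestry
    is not counted as new: the user already has that change.
    """
    have = ancestry(graph, local)
    new = set()
    for rev in ancestry(graph, incoming) - have:
        original = graph[rev]["props"].get("rebase-of")
        if original is not None and original in have:
            continue
        new.add(rev)
    return new


print(genuinely_new(GRAPH, "B'", "B"))  # only upstream's U is really new
```

For a user at the old head B, only U is new; for a user at U, both A' and B' are. This assumes a rebased revision carries exactly the same change as its original, which a real implementation would have to check (e.g. after conflict resolution during the rebase).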

> Please also note that while "bzr dpush" has similar behaviour to rebase
> in that it appears to rewrite existing revisions, this is done for
> different reasons - the revisions are rewritten to exclude anything that
> can not be represented in the target vcs.

Yes, the same holds for branch-filter.  These are slightly different
because they don't necessarily need to rewrite history.  Instead, they
need to create each of their new revisions with two parents.  E.g. when
rewriting revisions A1-B1-C1 to A2-B2-C2 by removing some of the files,
the resulting revisions should not look like:

       A2 -> B2 -> C2

but

       A2 -> B2 -> C2
       ^     ^     ^
       |     |     |
       A1 -> B1 -> C1

Of course, if the files were removed to save disk space (e.g. removing
90% of the files) or to get rid of copyrighted material, this would only
work if you can have such a DAG while still discarding the data
associated with A1, B1 and C1.

        Stefan