Fixing rebase rather than avoiding it

Sat Mar 6 08:01:28 GMT 2010

Stefan Monnier writes:

 > It is an arc.  Or rather, since the two commits have identical
 > contents, the two commits should be merged, which is why I'm saying
 > it should simply modify the existing commit by adding to its
 > history.

As patches, the two commits have equivalent contents, but in general
not identical ones (in some cases the abstract changes will differ
because of merge conflicts).  As revisions, their contents will be
quite different.  My claim is that we want to be careful not to
conflate patch with revision in this context, although most of the
time we do not need to think too much about the matter.
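
A toy sketch in Python may make the patch/revision distinction
concrete.  Everything in it is an assumption for illustration: a
revision is modelled as a dict from path to content, a patch as a dict
of changed paths, and apply_patch/diff are hypothetical helpers, not
any VCS's API.  Deletions and conflicts are ignored.

```python
# Toy model only: a revision is a whole-tree snapshot, a patch is a set of
# changed paths.  These names are illustrative, not taken from any real VCS.

def apply_patch(tree, patch):
    """Return a new tree: `tree` with each path in `patch` set to its new content."""
    new_tree = dict(tree)
    new_tree.update(patch)
    return new_tree

def diff(old, new):
    """Return the paths whose content differs, mapped to the content in `new`."""
    return {path: text for path, text in new.items() if old.get(path) != text}

old_base = {"a.txt": "one", "b.txt": "two"}
new_base = {"a.txt": "one", "b.txt": "TWO", "c.txt": "three"}  # mainline moved on

patch = {"a.txt": "ONE"}  # the change made on the branch

original = apply_patch(old_base, patch)  # the commit before the rebase
rebased = apply_patch(new_base, patch)   # the commit after the rebase

# As patches (change relative to their own parent) the commits are equivalent...
same_patch = diff(old_base, original) == diff(new_base, rebased)
# ...but as revisions (whole trees) they are not.
same_revision = original == rebased
print(same_patch, same_revision)
```

In this conflict-free case the two patches compare equal; with a
conflict resolved during the rebase they would only be equivalent in
intent, which is the "not identical" case above.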

 > > I don't think it can be represented by an arc between commits,
 > > because that would look like a merge,
 > 
 > What's wrong with that?

It's not a merge!  In particular, the content of the old base revision
is not merged into the rebased revisions.  The diff between a rebased
revision and the old base revision is likely to confuse the hell out
of the user, to borrow a phrase.[1]  It *is* useful to know that
sometimes, but I would normally diff the new base against the old
base to avoid polluting that information with rebased changes.
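
To illustrate why that diff is confusing, here is a small Python
sketch under the same kind of toy assumptions (trees as dicts, a
hypothetical diff helper, no deletions).  It is not how any particular
tool computes diffs; it only shows which changes end up in which
comparison.

```python
def diff(old, new):
    """Paths whose content differs between two tree snapshots (toy model)."""
    return {path: text for path, text in new.items() if old.get(path) != text}

old_base = {"a.txt": "one", "b.txt": "two"}
new_base = {"a.txt": "one", "b.txt": "TWO", "c.txt": "three"}
rebased = {**new_base, "a.txt": "ONE"}  # the branch change replayed on new_base

mixed = diff(old_base, rebased)         # what an arc to the old base would show
base_only = diff(old_base, new_base)    # new base against old base
branch_only = diff(new_base, rebased)   # the rebased change by itself

print(sorted(mixed))        # mainline changes and the branch change together
print(sorted(base_only))    # mainline changes only
print(sorted(branch_only))  # branch change only
```

The first comparison lumps the mainline's changes together with the
rebased change; the other two keep them apart.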

It is true that in the common case of a forward rebase on the same
mainline, the history of the old base is a subset of the history of
the new base, but this is just an accident, not part of the design of
the rebase operation.  Does it help if I mention that this common case
is what GNU Arch calls a "tla update"?

OTOH, rebase itself is implemented in the same way as "tla replay",
but the semantic contexts are different.  "tla replay" is intended to
work as "bzr merge --pull", while "git rebase" is a more generic
replay operation that can be used for cherry-picking.

 > > I agree we'd like to be able to represent that idea, but I'm not sure
 > > how big a loss it is for day-to-day operations.
 > 
 > Let's simply imagine that you want to rebase a published branch.
 > Of course, it's currently not a "day-to-day" operation for one good
 > reason: current "rebase" would screw everyone who branched off of
 > that branch.

That's not what I meant by "day-to-day operation".  I'm talking about
"when would you care to know that the branch was rebased?"  IMO, in
day-to-day operations, the answer is mostly "don't bother me with such
trivia, just DTRT."  Most users would want to be notified of the
rebase at pull time, would then go look at the new log of the rebased
branch a little more carefully than usual ... and very likely never
again refer to the obsolete history's log.  The only occasion I can
imagine for going back to it is forensic analysis of really
mysterious breakage.

When would you refer to the obsolete history?

(BTW, I don't insist on calling it "obsolete" if you feel that it's
useful to use another adjective.  That is the way I understand it,
though.)

 > as long as the connection between the new commit and the old one is
 > not recorded and usable by "pull/merge", you won't be able to use
 > rebase on a published branch.

Of course.  Nobody is saying otherwise.  I'm simply proposing that the
relationship between target and source of a rebase is different from
the relationship between a commit and its parent, and should be
implemented differently.
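
One way this could look, as a purely hypothetical Python sketch: the
"supersedes" field below is my invention for illustration and does not
exist in git, bzr, or any other tool that I know of.  The point is
only that history walks follow parent arcs, while the rebase link is
kept on the side for pull/merge to consult.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Commit:
    name: str
    parents: List["Commit"] = field(default_factory=list)  # ordinary ancestry
    supersedes: Optional["Commit"] = None  # hypothetical link to the pre-rebase commit

def ancestors(commit):
    """Names reachable through parent arcs only; `supersedes` is never followed."""
    seen, stack = set(), list(commit.parents)
    while stack:
        c = stack.pop()
        if c.name not in seen:
            seen.add(c.name)
            stack.extend(c.parents)
    return seen

old_base = Commit("old_base")
new_base = Commit("new_base", parents=[old_base])  # forward rebase on one mainline
original = Commit("original", parents=[old_base])
rebased = Commit("rebased", parents=[new_base], supersedes=original)

print(sorted(ancestors(rebased)))  # the obsolete commit is not in the history
print(rebased.supersedes.name)     # but the rebase relationship is still recorded
```

With this shape, log and diff never see the obsolete commit as a
parent, yet a pull could still notice that "original" has been
replaced by "rebased".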

Footnotes: 
[1]  Any frequent user of Mercurial has surely been confronted with
such wacko diffs.  I assure you, they confused the hell out of me.