Fixing rebase rather than avoiding it

Stephen J. Turnbull stephen at xemacs.org
Fri Mar 5 17:03:14 GMT 2010


Ben Finney writes:
 > "Stephen J. Turnbull" <stephen at xemacs.org> writes:
 > 
 > > Stefan clearly thinks it's possible, and probably useful, to develop
 > > "safe rebasing".
 > 
 > I look forward to an explanation of what this would mean, not least
 > because it will probably clear up various misunderstood details I hold.

OK, with the caveat that sometimes Stefan and I think alike, and
sometimes we don't.  I don't speak *for* Stefan, but I won't be
surprised if he agrees with much of what I say.

IMO, there are two important issues to deal with in rebasing.  The
first is the social issue of overloading branch names (the publication
issue), and the second is quality assurance.  The second is more
straightforward to deal with than the first, although perhaps less
familiar, so I'll start with that.

The quality assurance problem is that a rebase is a (complex) merge.
This is the fundamental reason why it would be a bad idea to allow
editing of the DAG in the sense that Matthew Fuller correctly says is
impossible in all of the DAG-based VCSes -- to understand the context
of the merged content, you need to know its ancestry.  "Grafting" a
branch onto the end of another changes the ancestry of the base of the
grafted branch, and therefore of all its descendants.  Thus each
rebased revision must be considered a *different* revision from the
corresponding revision in the source branch.

Now, "professional" developers have gotten used to running tests on
their changes, but it is less than obvious that rebasing means you
have to run the tests again, on *every* rebased commit for which policy
would demand testing if it were made de novo.  Since this is a matter
of policy (and for rebase might differ from that for other merges), in
the context of Bazaar this would be dealt with by providing a
post-rebase hook, I think.  (Maybe a post-merge hook could do double
duty for this, but only one commit, the merge commit, needs testing
after a normal merge, while all of the rebased commits may need
testing after a rebase, eg, to preserve bisectability.)
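
A minimal sketch of what such a post-rebase check might do, in Python.
Nothing here is an existing Bazaar API: post_rebase_check, needs_testing
and run_tests are hypothetical names, and the two callables stand in for
the project's policy and its test runner.

```python
def post_rebase_check(rebased, needs_testing, run_tests):
    """Return the rebased revisions that policy says to test and that fail.

    rebased       -- revision ids created by the rebase, oldest first
    needs_testing -- hypothetical policy callable: rev -> bool
    run_tests     -- hypothetical test runner: rev -> True if tests pass
    """
    failures = []
    for rev in rebased:
        # Unlike a normal merge, every rebased revision is a candidate,
        # not only the tip.
        if needs_testing(rev) and not run_tests(rev):
            failures.append(rev)
    return failures


# Toy run: three rebased revisions, the middle one is broken.
broken = {"B'"}
failures = post_rebase_check(
    ["A'", "B'", "C'"],
    needs_testing=lambda rev: True,
    run_tests=lambda rev: rev not in broken,
)
print(failures)  # ["B'"]
```

Testing only the tip C' would pass here and leave B' as a trap for a
later bisect.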

The overloading issue is far more complex, and is deeply concerned
with the semantics of "branch".  First, let's look at git, where a
branch is "just" a set of commits, each linked to its parent.[1]  Then
a branch is just a *head*, ie, a reference to a commit which can
become the parent of a new commit.  If that happens, the head is
automatically updated by the commit command to point to the child.
Then the branch is just the sequence of parents, *going all the way
back to the root commit*.[2]

So here the semantics of rebase are entirely determined by the DAG.
Here's probably the most common case of rebasing, rebasing onto a
later commit on the mainline.  *Mainline* commits are denoted by
digits, *upstream* branch commits (ie, by the rebaser) are uppercase
letters, and *local* branch commits (by the victim) are lower case.

mainline    0 --- 1
             \     \
upstream      ` A   ` A'
                 \
local             ` a

The history is that the rebaser branches from the mainline at 0,
making commit A.  The victim then branches from A, making commit a.
The mainline moves on, making commit 1.  The rebaser then rebases 0-A
to 1-A'.  If the victim now pulls, what happens?

1.  The VCS fetches commits 1 and A', and their dependent content.
2.  Since the VCS only knows the branch as a ref, it checks to see
    if the pull is a fast-forward, and it's not.  *Even if commit a
    were not made, the VCS would see this as a local branch unrelated
    to A'.*  The VCS decides to merge.
3.  The VCS looks for the common ancestor of a and A', and finds 0.
4.  The VCS attempts a three-way merge with base 0, left a, and right
    A'.

But this will result in conflicts, because the same changes are
present in A and A'.  The right thing to do here is to rebase a on top
of A' (and in a less trivial context, merge new content from upstream
into that), but the VCS can't know that....
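
The pull steps above can be traced on a toy model of the diagram.  This
is only a sketch in Python: the PARENTS table and the function names are
made up for illustration and are not how git stores or walks commits.

```python
# Toy commit graph from the diagram: commit id -> list of parent ids.
PARENTS = {
    "0": [],
    "1": ["0"],
    "A": ["0"],
    "A'": ["1"],
    "a": ["A"],
}


def ancestors(commit, parents=PARENTS):
    """Return the commit itself plus everything reachable via parents."""
    seen, stack = set(), [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen


def is_fast_forward(old_head, new_head, parents=PARENTS):
    """A pull is a fast-forward if the old head is an ancestor of the new."""
    return old_head in ancestors(new_head, parents)


def merge_bases(left, right, parents=PARENTS):
    """Common ancestors that are not ancestors of another common ancestor."""
    shared = ancestors(left, parents) & ancestors(right, parents)
    return {c for c in shared
            if not any(c != d and c in ancestors(d, parents) for d in shared)}


print(is_fast_forward("a", "A'"))  # False: the VCS decides to merge
print(is_fast_forward("A", "A'"))  # False even if commit a was never made
print(merge_bases("a", "A'"))      # {'0'}: base of the three-way merge
```

With base 0, the changes of A arrive from both sides (via a and via A'),
which is where the conflicts come from.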

How can this be made safe?  Well, in fact git *can* discover the fact
of rebasing, and even make a pretty good guess at what the rebase was.
The reason is that the local git keeps a *tracking branch* which is
identical to the upstream branch (with colocated branches this is very
efficient).  So when you pull, you fetch the new content to a
temporary branch and check if the temporary branch is a fast forward
of your tracking branch for upstream.  If not, it's been rebased.

To identify the new base for the local branch, you find the most
recent common ancestor of your local branch and the tracking branch
for upstream.  Then you look for a commit in the recently fetched
branch with the same metadata (except parents, of course).  For
confirmation you can compare two diffs: that of the common ancestor
against its parent in the tracking branch, and that of the proposed
base against its parent in the fetched branch.
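
A rough Python sketch of this detection, under stated assumptions: the
Commit record, its field names, and find_new_base are hypothetical, the
"diff against parent" is modelled as a plain string, and histories are
assumed to be related (the local branch eventually reaches a commit
known to the old upstream).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    parents: tuple
    author: str
    date: str      # author date, assumed preserved by the rebase
    message: str
    patch: str     # diff against the first parent, as text


def ancestry(head, commits):
    """Ids of head and all of its ancestors."""
    seen, stack = set(), [head]
    while stack:
        cid = stack.pop()
        if cid not in seen:
            seen.add(cid)
            stack.extend(commits[cid].parents)
    return seen


def metadata(c):
    return (c.author, c.date, c.message)


def find_new_base(local, tracking, fetched, commits):
    """None if fetched fast-forwards tracking, else (old_base, new_base).

    new_base is None when no matching commit is found in the fetched branch.
    """
    if tracking in ancestry(fetched, commits):
        return None  # ordinary fast-forward, no rebase upstream
    old_upstream = ancestry(tracking, commits)
    # Walk first parents from the local head to the old upstream branch.
    old_base = local
    while old_base not in old_upstream:
        old_base = commits[old_base].parents[0]
    old = commits[old_base]
    for cid in ancestry(fetched, commits):
        cand = commits[cid]
        # Same metadata (parents excluded), confirmed by comparing diffs.
        if (cid != old_base and metadata(cand) == metadata(old)
                and cand.patch == old.patch):
            return old_base, cid
    return old_base, None


COMMITS = {
    "0": Commit((), "m", "d0", "root", ""),
    "1": Commit(("0",), "m", "d1", "mainline work", "+main"),
    "A": Commit(("0",), "u", "d2", "feature", "+feat"),
    "A'": Commit(("1",), "u", "d2", "feature", "+feat"),
    "a": Commit(("A",), "v", "d3", "local tweak", "+tweak"),
}
result = find_new_base(local="a", tracking="A", fetched="A'", commits=COMMITS)
print(result)  # ('A', "A'"): rebase a from A onto A'
```

Requiring identical diffs is stricter than a real tool could afford,
since a rebase that resolved conflicts changes the diff; treat it as a
confirmation step only.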

To make this robust, you would want to extend the format of commit
objects with an "equivalent" field, I think.  This field would be set
by the cherry-pick and rebase operations.  For backward compatibility,
this could be implemented "out-of-band" using a mechanism similar to
that used in recent git to implement commit notes.  (In fact, it would
have to be done that way to implement "equivalent" symmetrically,
since you can't have an equivalent field in the original object that
points to the future cherry-pick!)
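
To illustrate why the out-of-band form gives symmetry for free, here is
a small Python sketch of such a side table.  EquivalenceNotes is a
hypothetical name and an in-memory stand-in; it is not git's notes
mechanism, only the idea of keeping the links outside the commit objects.

```python
from collections import defaultdict


class EquivalenceNotes:
    """Side table of 'equivalent' links, kept outside the commit objects."""

    def __init__(self):
        self._links = defaultdict(set)

    def record(self, original, copy):
        """Called by cherry-pick or rebase; stores the link both ways."""
        self._links[original].add(copy)
        self._links[copy].add(original)

    def equivalents(self, commit):
        """All commits linked to this one, directly or through a chain."""
        seen, stack = {commit}, [commit]
        while stack:
            c = stack.pop()
            for other in self._links.get(c, ()):
                if other not in seen:
                    seen.add(other)
                    stack.append(other)
        seen.discard(commit)
        return seen


notes = EquivalenceNotes()
notes.record("A", "A'")          # first rebase
notes.record("A'", "A''")        # rebased again later
print(sorted(notes.equivalents("A")))    # ["A'", "A''"]
print(sorted(notes.equivalents("A''")))  # ['A', "A'"]
```

The original commit A is never rewritten, yet a lookup starting from A
still finds its later copies.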

If a rebase were detected, you'd also have to update the reflog (to
satisfy Stefan's requirement of preserving (access to) historical
versions of the branch).  (The reflog is a device used by git to track
changes to heads, which includes not only commits but also pulls,
merges, rebases, resets, and so on.)

How would this work in Bazaar?  I'm not entirely clear on the
implementation of Bazaar, but I suppose for a first cut you could try
something similar to the above.  My understanding is that enough
support for tracking (ie, colocated) branches in the sense described
above is already present (needs to be, to support merging).  However,
Bazaar semantics for user-visible branches are quite a bit heavier.
Mostly I don't think they matter to rebasing.  However, since Bazaar
doesn't support user-visible colocated branches, there's no reflog.
That would need to be implemented, I think.

Footnotes: 
[1]  This isn't quite true; git is somewhat left-handed, more so than
Mercurial, though not nearly so much as Bazaar.  Thus what developers
think of as a branch tends to be a set of commits linked through
*leftmost* parents, but this often gets confused when cherry-picking
or synchronizing a long-lived branch with its parent, especially in
cases where the flows of code are bidirectional.

[2]  Thus, related branches share ancestry from the root up to the
node where they diverge, and there is no way to determine which branch
is the mainline from the commit data alone.


