Will rebasing support be added to Bazaar core?

Mon Apr 20 18:43:07 BST 2009

Robert Collins writes:

 > When users talk about altering history, particularly in the context
 > of rebase, they are not talking about altering the content of
 > commits in the $vcs database.  Rather they are talking about how
 > the order and content of commits reachable from a given human name
 > change when a rebase operation is executed.

But this usage is very problematic.  Such users are in a state of
cognitive dissonance anyway.  Use of rebase only ensures that they'll
come face-to-face with it sooner rather than later.

For one thing, the original history can be made available, with an
intelligible UI for detecting the reference change and examining it.
And the various descriptions of how to untangle an "evil" rebase are
sufficiently algorithmic that I believe it's only a matter of time
before they're formalized as a script.

Second, to the extent that "altering history" is used pejoratively, it
conflates the history of the development process with the history of a
*name*.  If the VCS did a better job of tracking the name and avoided
spurious conflicts in later merges, I bet people would hardly notice
the reordering of commits.

 > To quote Linus 'rebase turns tested code into untested code'.

To quote me in another post: "Just like any other merge does."

 > This is the heart of why rebase is problematic - it has nothing to
 > do with the machinery of doing rebase, and everything to do with
 > what the operation *sets out to achieve*.

A merge.  No more, and no less.  What's wrong with that?  People like
to think of rebase using biological analogies like "graft" and
"transplant", but they are poor analogies, because the *content* is
necessarily altered (unless it is a fast forward, but nobody should be
using rebase for that).

Maybe the world would be a better place if "$vcs rebase" were instead
spelled "$vcs merge --strategy=rebase".

 > A similar issue is at the heart of why bzr asks for merges to be
 > committed rather than just saying 'why this was a merge, enjoy'.

I think this should be workflow policy; the VCS should be agnostic
about it.

 > In the checkout discussion, you seemed to say that git will do a merge,
 > during a push operation where two refs have diverged (that is, the
 > server has B, local has C, both B and C are children of A). There really
 > are only two operations to fix this situation: rebase C on B||B on C, or
 > merge C and B to give D. What does git do, by default, in this
 > situation?

I must have written something very misleading; I'm sorry.  What I
meant to describe was a strategy, implementable in git, that would
allow a commit (push) from a checkout to always succeed.  But git does
not actually do that, because it doesn't implement checkouts.

In the case of a *push*, git by default will succeed if and only if
it's a fast forward.  Otherwise it will complain that it's not a fast
forward, and do nothing.

There are two things you can do: first, use the --force option, which
will push the local branch exactly as is and switch the remote ref to
point to that branch, hiding any commits on the remote branch since
the point of divergence (ie, you can only refer to them by groveling
through the reflog, and eventually they will get gc-ed in git's
default configuration).

The second is to push to a new branch on the remote, by supplying a
new name for it.

In no case is a non-trivial merge done.
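To make the default push rule concrete, here is a minimal Python sketch over a toy commit DAG.  The model (a dict from commit id to parent ids) and the names `is_ancestor` and `push` are hypothetical illustrations, not git's actual code: a push to an existing name is accepted only when the remote tip is an ancestor of the pushed commit, `force` overwrites regardless, and a new name is always accepted.  No merge is attempted on any path.

```python
from collections import deque

# Toy commit DAG (hypothetical model): commit id -> ids of its parents.
Dag = dict[str, list[str]]

def is_ancestor(dag: Dag, ancestor: str, descendant: str) -> bool:
    """True if `ancestor` can be reached from `descendant` via parent links.
    A commit counts as its own ancestor."""
    seen: set[str] = set()
    queue = deque([descendant])
    while queue:
        commit = queue.popleft()
        if commit == ancestor:
            return True
        if commit not in seen:
            seen.add(commit)
            queue.extend(dag.get(commit, []))
    return False

def push(dag: Dag, remote_refs: dict[str, str], name: str, local_tip: str,
         force: bool = False) -> bool:
    """Move remote_refs[name] to local_tip if the name is new, the update is a
    fast forward, or force is set.  Otherwise refuse and change nothing.
    No merge is attempted in any case."""
    old_tip = remote_refs.get(name)
    if old_tip is None or force or is_ancestor(dag, old_tip, local_tip):
        remote_refs[name] = local_tip
        return True
    return False

# The situation from the question: B (remote) and C (local) are both children of A.
dag = {"A": [], "B": ["A"], "C": ["A"]}
remote = {"master": "B"}
print(push(dag, remote, "master", "C"))              # False: not a fast forward
print(push(dag, remote, "c-branch", "C"))            # True: a new name
print(push(dag, remote, "master", "C", force=True))  # True: B is no longer named
print(remote)  # {'master': 'C', 'c-branch': 'C'}
```

The refusal case is the important one: the sketch returns without touching the remote, which is exactly "complain that it's not a fast forward, and do nothing".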

 > >  > At this point in my arguments, you're probably thinking, "So, just
 > >  > don't use rebasing." But I strongly believe that Bazaar core should
 > >  > provide a safe, coherent set of tools by default.
 > > 
 > > Well, it doesn't.  But that has little to do with what operations it
 > > provides.  It's an issue of design and implementation of the history
 > > database.  Bazaar's branch-centric (ie, linear) design makes it very
 > > difficult to safely provide DAG operations other than commit and merge.
 > 
 > Could you expand on this, I don't understand the limitations in play
 > here. Is it a limit on the number of refs we can have in a repository
 > (no limit in the code)? Something else? 

The issue is entirely whether multiple heads are exposed to the user.
If you don't expose multiple heads, then there's really no way to make
"reset" (ie, multiple uncommits) or rebase safe.  Once the operation
is done, you have no handle on the previous state, and it might as
well not be there.

A little experimentation shows that I'm at least partially wrong:

mkdir quux; cd quux; bzr init
echo foo>foo; bzr add foo; bzr commit -m foo
echo bar>>foo; bzr commit -m bar
# OK, let's start editing the DAG
bzr uncommit
# That was a surprise!  (It says how to revert the uncommit.  Cool!)
# Now, let's muddy up the trail.
bzr revert
echo baz >>foo; bzr commit -m baz
# Let's try the revert of uncommit.
bzr pull . -r revid:steve@stephen-turnbulls-macbook-pro.local-20090420151627-l54jir6nb8j5st02
# Can't do that (and shouldn't be able to, but you know about
# curiosity - Meow!)
bzr merge . -r revid:steve@stephen-turnbulls-macbook-pro.local-20090420151627-l54jir6nb8j5st02
# Bingo!

But that's a gawdawful user interface!  For anybody who isn't trying
to turn bzr into git as a proof-of-concept, that possibility might as
well not be there.  (And what about gc?  Would the uncommitted revision
go away if I repacked?)

 > I appreciate the concept that git is about history presentation, but
 > there is a distinct difference - a qualitative difference - between 'log
 > this differently' and 'change the DAG, and the content of commits'.

Ah, but you're still missing the point.  git doesn't "create" or
"change" commits; it looks them up in its local database and, if they
aren't found, records some information about them.  But conceptually
they're already recorded in a universal distributed database.  Using
git is like playing Zork, or being a particle in quantum physics.  All
the histories are *already there*, in the space of numbers between 0
and 16^40 - 1.  The universal DAG exists, in revision space *and time*,
whether anybody uses git or not.  By using git commit, you "collapse
the Schroedinger wave into a particle", or more realistically, you
*discover* a new node, and the arrow from where you were before to
that new commit, in that universal DAG.  But there's no program you
can write whose name git won't already know when they first meet. :-)

(Yes, I know, the map from commits to SHA1s is not an injection;
that's obvious.  The point is to suspend disbelief and think of it as
if it were one.)
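For the curious, the naming works roughly like this.  git names each object by the SHA-1 of a short header (the object type, a space, the content size, and a NUL byte) followed by the content, and a commit's content includes its tree id and parent ids.  The sketch below assumes a plain commit with no signature or extra headers; `git_object_id` and `commit_body` are hypothetical helper names, and the author and timestamp are placeholders.  It shows that identical content always yields the identical name, while the "same" change on a different parent, as after a rebase, is a different commit.

```python
import hashlib

def git_object_id(obj_type: str, content: bytes) -> str:
    """Name an object as git does: SHA-1 over "<type> <size>", a NUL byte, then the content."""
    header = f"{obj_type} {len(content)}".encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()

def commit_body(tree: str, parents: list[str], who: str, when: str,
                message: str) -> bytes:
    """Serialize a plain commit (no signature or extra headers):
    tree, parent lines, author, committer, blank line, message."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines += [f"author {who} {when}", f"committer {who} {when}", "", message]
    return ("\n".join(lines) + "\n").encode()

# Placeholder identity and timestamp; the empty tree keeps the example small.
who, when = "Hypothetical Author <author@example.com>", "0 +0000"
tree = git_object_id("tree", b"")

a = git_object_id("commit", commit_body(tree, [], who, when, "foo"))
b = git_object_id("commit", commit_body(tree, [a], who, when, "bar"))
b_again = git_object_id("commit", commit_body(tree, [a], who, when, "bar"))
print(b == b_again)    # True: same content, same name; nothing new was "created"

c = git_object_id("commit", commit_body(tree, [b], who, when, "baz"))
c_rebased = git_object_id("commit", commit_body(tree, [a], who, when, "baz"))
print(c == c_rebased)  # False: a different parent means a different commit
```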

In git, all the rest is about managing refs.  If you drop a ref on the
floor, and no other ref (or reflog entry) still reaches the
corresponding commit, then the database record about that commit
eventually gets garbage collected.  But if you hang on to the ref, then
not only do you have that commit, but you also have ways to refer to
any of its ancestors.
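The relationship between refs and gc can be sketched as a reachability computation.  This is a simplified Python model, not git's implementation: it ignores reflogs and the grace period real gc gives to recently created unreachable objects, and `reachable` and `gc` are hypothetical names.  Everything reachable from some ref through parent links survives; everything else is dropped.

```python
# Toy model (hypothetical): `dag` maps commit id -> parent ids,
# `refs` maps ref name -> commit id.  Reflogs and prune grace periods are ignored.

def reachable(dag: dict[str, list[str]], tips) -> set[str]:
    """All commits reachable from `tips` by following parent links."""
    seen: set[str] = set()
    stack = list(tips)
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(dag.get(commit, []))
    return seen

def gc(dag: dict[str, list[str]], refs: dict[str, str]) -> dict[str, list[str]]:
    """Return the DAG with every commit unreachable from any ref removed."""
    keep = reachable(dag, refs.values())
    return {commit: parents for commit, parents in dag.items() if commit in keep}

# A <- B <- C on "master"; D is a side commit off A, named only by "old".
history = {"A": [], "B": ["A"], "C": ["B"], "D": ["A"]}
print(sorted(gc(history, {"master": "C", "old": "D"})))  # ['A', 'B', 'C', 'D']
print(sorted(gc(history, {"master": "C"})))              # ['A', 'B', 'C']: D is gone
```

Holding only "master" is enough to keep A and B alive, which is the sense in which one ref gives you a handle on all of its ancestors.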

Another way to say all this is that in git by using rebase you don't
say "log this differently", you say "log a different but equivalent
this".

 > log is very much _not_ the same thing as rebase, not in git, hg or
 > bzr.

True.  But surely the complexity and performance issues that bzr log
has suffered are related to the fact that bzr must recompute the
"appropriate" log every time, whereas git encourages the use of rebase
to ensure fast generation of the default (and only) log sequence, and
hg doesn't seem to care about nice logs at all and just spits out
stuff in order.