Why Darcs users prefer Darcs over Bazaar

Mon Jun 8 06:26:19 BST 2009

Ben Finney writes:
 > "Stephen J. Turnbull" <stephen at xemacs.org> writes:
 > 
 > > Ben Finney writes:
 > > 
 > >  > Yet a patch, even when printed out, is entirely abstract and
 > >  > worthless without what I'm *actually* interested in: the files
 > >  > affected (including their state). So I see a patch as an abstract
 > >  > representation of changes between tangible states of the files.
 > > 
 > > But that is *not* what a patch is. A patch represents an equivalence
 > > class of such changes. All of the versions of the tree to which the
 > > patch can be applied, paired with the versions that result after
 > > application.
 > 
 > I think the fact that you have to explain that "patch" doesn't mean
 > what anyone unfamiliar with Darcs but experienced with Unix programming
 > would expect it to mean, says a lot about the gap in explanation of a
 > phrase like "a change is a patch".

You posted an ambiguous and quite possibly mistaken description of
what a patch is.  If that corresponds to your actual understanding,
you are missing something really important.  Surely everybody (else)
"experienced with Unix programming" understands that patches may be
applied to varying versions, as well as to the exact version against
which they were generated.  I've just explained that in precise terms
based on the expression you posted.
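
To make the "equivalence class" reading concrete, here is a minimal
Python sketch.  It is a toy model, not diff(1)/patch(1) and not Darcs:
the names Hunk and apply_hunk are hypothetical, and matching is by exact
context only.  The same patch object applies to two different versions,
and each (before, after) pair is one member of the class.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hunk:
    """Toy context patch: replace the lines `old` with the lines `new`."""
    old: tuple
    new: tuple


def apply_hunk(hunk, lines):
    """Apply the hunk at the first place its `old` lines match.

    Raises ValueError when the version is outside the set of versions
    to which this patch applies.
    """
    n = len(hunk.old)
    for i in range(len(lines) - n + 1):
        if tuple(lines[i:i + n]) == hunk.old:
            return lines[:i] + list(hunk.new) + lines[i + n:]
    raise ValueError("patch does not apply to this version")


fix = Hunk(old=("x = 1",), new=("x = 2",))

# The version the patch was generated against ...
version_a = ["import os", "x = 1", "print(x)"]
# ... and a different version that has drifted since then.
version_b = ["# header", "import os", "import sys", "x = 1", "print(x)"]

result_a = apply_hunk(fix, version_a)
result_b = apply_hunk(fix, version_b)
```

The pairs (version_a, result_a) and (version_b, result_b) are both
instances of the "same change"; a version lacking the line "x = 1" is
simply not in the class.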

 > > You are really literal-minded, aren't you?
 > 
 > I had pre-existing definitions of words that didn't match what you were
 > telling me about them, so needed clarification on what exactly we're
 > talking about. I don't think that makes me especially literal-minded.

Of course that doesn't.  Everybody has pre-existing definitions.

The point is that a less literal-minded person would be less wedded to
their own definitions, and more able to infer variations in definition
by being sensitive to the context of others' words.  This really
smooths communication, I can tell you from experience. :-)

 > > Cf. makepatch(1) (if you don't have it installed, check CPAN.)
 > 
 > I don't have that installed, and the fact that it's not already
 > installed on my system as a dependency of some other development tool
 > speaks a fair bit to where the burden of explanation is on this usage of
 > the term "patch".

There are plenty of other explanations, and you're doing yourself a
disservice by focusing on this one---it makes it harder for you to
understand.  Specifically, IMO the fact that it's not installed as a
dependency speaks a fair bit to the fact that development practices
were already focusing on VCSes by the time makepatch was developed.
makepatch/applypatch is not a *development* tool, it is a
*distribution* tool, and a highly specialized one at that.

The point is that I doubt that anybody but you would have a problem
understanding what they do and why Vromans named it "makepatch"
instead of "make-augmented-patch".  In fact, I rather suspect that
you're just being perverse here, and you *do* understand.  No?

 > >  > So, you're saying that "a change is a revision"? That doesn't get
 > >  > us any closer to tangibility; it's nearly tautological. My
 > >  > complaint is that the things Darcs tracks are less tangible than
 > >  > those tracked by e.g. Bazaar.
 > > 
 > > No, a change is not a revision (== version). (That is the terminology
 > > used by most VCSes, starting with the venerable Revision Control
 > > System, as enshrined in the "-r" flag universally used for specifying
 > > revisions.) A change is the difference between two revisions. A patch
 > > (also called a delta) is an object which can be used to execute the
 > > "same change" starting from different revisions.
 > 
 > Okay, this is now the third time that I thought I understood what you
 > were saying but now have to conclude I don't. So a Darcs change, the
 > atomic unit of data to be represented, is a delta between two other
 > deltas between file states, yes?

No.  Just like a "diff(1) recursive diff" or a "bzr diff" for that
matter, a "Darcs patch" is a delta between tree states.  A revision
(as can be specified by -r) in a version control system is a snapshot
of tree state plus some metadata.  True, in many VCSes (original git
is an obvious exception) revisions are stored in a delta-compressed
form.  However, even in RCS "co -r 1.8 foo" does not check out the
delta from 1.7 to 1.8 of foo, nor the reverse delta from
1.9 to 1.8.  It checks out a version of foo.
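
The split between what is stored and what is checked out can be
sketched in Python as follows.  This is an illustration under stated
assumptions, not how RCS is implemented: the Store class and its
methods are hypothetical, and it keeps forward deltas from the first
version purely for simplicity.

```python
from difflib import SequenceMatcher


def make_delta(old, new):
    """Keep only the regions of `old` that change, with their replacements."""
    ops = SequenceMatcher(a=old, b=new, autojunk=False).get_opcodes()
    return [(i1, i2, new[j1:j2]) for tag, i1, i2, j1, j2 in ops if tag != "equal"]


def apply_delta(old, delta):
    """Rebuild the newer version from the older one plus a delta."""
    out, pos = [], 0
    for i1, i2, replacement in delta:
        out.extend(old[pos:i1])   # unchanged lines before this region
        out.extend(replacement)   # lines that replace old[i1:i2]
        pos = i2
    out.extend(old[pos:])
    return out


class Store:
    """Stores version 1 in full and every later version as a delta."""

    def __init__(self, first):
        self.first = list(first)
        self.deltas = []
        self._head = list(first)

    def commit(self, lines):
        self.deltas.append(make_delta(self._head, list(lines)))
        self._head = list(lines)
        return len(self.deltas) + 1   # 1-based revision number

    def checkout(self, rev):
        """Return a full version, never a delta."""
        lines = list(self.first)
        for delta in self.deltas[:rev - 1]:
            lines = apply_delta(lines, delta)
        return lines


store = Store(["a", "b"])
rev2 = store.commit(["a", "b", "c"])
rev3 = store.commit(["a", "c"])
```

Whatever sits in store.deltas, store.checkout(rev2) hands back the
complete list of lines for that version.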

In English, of course "revision" is ambiguous.  It can refer to a
*change* in a text ("please incorporate the following revisions") or
to the product of applying those changes ("Joe, you don't seem to be
looking at the latest revision").  I can't do anything about that: you
have to determine which meaning is intended from context.

Except in quotes (which I've left alone), contexts where the accepted
term uses "revision" (eg, Bazaar "revision number"), and where I'm
actually discussing the word "revision", I'll try to disambiguate with
"version" and "delta" or "change".

 > Recall that this discussion of terminology all started with your claim
 > that "a change is a patch" roots Darcs in very tangible atomic units.

Actually, it started because you said you were having trouble
communicating with your coworkers.  That is important.  It could be
that you have a few misconceptions, could it not?

 > Yet I find it to be astonishingly intangible compared to the concept of
 > "working tree state" which is the atomic unit I *actually* want tracked
 > by a VCS. Have I misunderstood?

You seem to be confounding "tangible" with "primitive".  The fact that
a patch is derived as a difference between versions doesn't make it
less well-defined and concrete == tangible.  You also seem to be
confounding "specification" (commands operate on versions) with
"implementation" (the version database is delta-compressed in most
snapshot-oriented VCSes, but the versions are checked out before
anything operates on them; in Darcs, the patches are manipulated
directly).

Of course Darcs tracks working tree state, and can reproduce a given
tree state.  But unlike snapshot-oriented VCSes (all the others that I
know of), Darcs is delta-oriented.  Darcs doesn't worry much about the
order in which deltas are added to a branch (although that information
is available), instead focusing on the set of deltas contained in the
branch.  That doesn't mean it can't reproduce state, just that
reproducing state is a derived operation based on a topologically
ordered set of patches, rather than a primitive, based on a
topologically ordered set of versions.
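
As a rough Python sketch of "state as a derived operation": the branch
is just a set of patch names, and the tree is computed by applying them
in any order that respects their dependencies.  The PATCHES table, its
layout, and tree_state are hypothetical, and real Darcs patches are far
richer than "replace the whole content of one file".  graphlib needs
Python 3.9 or later.

```python
from graphlib import TopologicalSorter

# name -> (dependencies, path touched, content before, content after)
PATCHES = {
    "add-readme": (set(), "README", None, "hello"),
    "add-main": (set(), "main.py", None, "print(1)"),
    "fix-main": ({"add-main"}, "main.py", "print(1)", "print(2)"),
}


def tree_state(patch_names):
    """Derive the tree (path -> content) from an unordered set of patches."""
    names = set(patch_names)
    deps = {name: PATCHES[name][0] for name in names}
    for name, needed in deps.items():
        if not needed <= names:
            raise ValueError(f"{name} is missing a dependency")
    tree = {}
    # static_order() yields each patch after everything it depends on.
    for name in TopologicalSorter(deps).static_order():
        _, path, before, after = PATCHES[name]
        if tree.get(path) != before:
            raise ValueError(f"{name} does not apply")
        tree[path] = after
    return tree


branch = {"fix-main", "add-readme", "add-main"}
state = tree_state(branch)
```

Independent patches such as "add-readme" and "add-main" may come out in
either order; the resulting tree is the same because only the
dependency order matters.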

Note the similarity of "set of deltas in a branch" to the use case of
working on "a number of independent patches at the same time".  This
is AIUI a typical use case Robert proposes to support by "giving us
'Darcs in Bazaar'".

 > >  > Where am I going wrong?
 > > 
 > > You're not going wrong, but you clearly place little importance on the
 > > "high level editor" or "semantic editor" functions that a VCS can
 > > provide. So your point of view is very different from the Darcs fan's.
 > 
 > I place a lot of importance on editing semantics, but that's what my
 > editing tools are for. How does Darcs help me to edit semantics?

Not just Darcs, any VCS that implements "pull" (fetch versions or
deltas, then apply deltas to the branch; applying to the working tree
and committing the result is optional--git does, bzr doesn't).  Darcs
has an exceptionally flexible implementation of "pull", that's all.
Such a VCS can help you to "edit semantics" by allowing you to pull a
collection of deltas that implement a feature.

 > >  > It does not, however, do anything to dispel the concern that the
 > >  > default way of applying revisions *loses history*; i.e. I can end
 > >  > up with a branch in which I have as much information as Darcs can
 > >  > give me about that branch, but it's impossible for me to know that
 > >  > the *state* of files at a particular revision is the same as was
 > >  > recorded in that revision initially.
 > > 
 > > What was that you were saying about hyperbole? :-)
 > > 
 > > There's only one way to ensure that a particular commit satisfies some
 > > condition: the VCS must test the condition itself, as Aegis does.
 > 
 > That's not the issue. The VCS doesn't make the claim, the person
 > who wrote the commit message makes that claim.

That *is* the issue.  People believe (or act as if they believe) that
operations like rebase and merge and commit (!!) do not affect the
"tested" status of a branch, and they're just plain wrong.  Rather
than educate people, who mostly will cut these "theoretical" classes
in favor of meeting deadlines that their boss cares about right now,
it would be more effective to give the responsibility to the VCS.

(Of course there's a practical problem that running a test suite on
every commit really slows things down, so this is "effective" in the
sense that a Turing machine implementation of a computation is
"effective".  This is why Bazaar splits off the responsibility to PQM
AIUI, and why Python and many other projects use Buildbot.)

 > Since they had full access to the state of the working tree when
 > committing, it's reasonable to expect that they believe that claim
 > to be true.

Linus's point about git rebase is that it is indeed the *VCS* that
makes that claim, because it copies a log message containing a correct
claim into a version where it is not true.  The user misunderstands
what the VCS can do in this respect.  But you can reproduce the exact
same effect with diff and patch.  How often do you see posts of the
form "The attached patch fixes bug XYZ; tested.  Please apply"?  That
is a lie, of course.  The true statement is that "version A was
tested", but "version A" is never mentioned!  The submitter clearly
believes that applying the patch will cause any version "sufficiently
like version A" to pass the test, but they do not know or specify what
"sufficiently like" means.

 > The above statement is about the *user* being unable to know whether the
 > commit message refers to anything tangible that they can *themselves*
 > verify whether that commit message's claims are true.

Of course he can verify.  He checks out the version he wants to
verify, and runs the tests.  There's no other way.  That is also true
in Bazaar.  I am pretty sure that many Bazaar users (and a great
majority of those who have not worked in test-driven environments) who
see that "Test A" passed on "Branch A" in the log will assume that
"Test A" is still satisfied after merging "Branch A" into the
mainline.  They're wrong; they must run the tests again.  (I bet you
do re-run the tests after a merge, no?)

The problem with git rebase is that this kind of mistake is reified by
automatically copying the "Test A passed" log message into an untested
commit.

Note that you do *not* have to re-run unit tests after a merge, if the
"unit" hasn't changed.  But to automate this optimization requires
intimate cooperation between VCS and test scaffolding, which AFAIK
doesn't exist as a product, although it probably has been implemented
internally to various projects.

 > > As far as reproducing state, the whole point of the theory of patches
 > > is to ensure that reordering the patches *does* indeed leave the
 > > working tree in *exactly the same state*. You can lose history, yes,
 > > but reproducing state is the sine qua non of a VCS, and Darcs does not
 > > fail on that count.
 > 
 > I find myself less reassured by an algorithmic *inference* of state from
 > changes-between-changes-between-state, than by storing a lower level of
 > abstraction from that state. Based on the focus we've found in this
 > discussion, that seems to be close to the core of my
 > I-don't-see-the-benefit.

But AFAIK not only does Bazaar use delta compression, it has umpteen
different ways to implement it depending on which branch and/or repo
format you're using.  All of which have had bugs in public releases,
too.  If Darcs leaves you "less than reassured", bzr should have you
laundering your shorts after every commit.  Of course I actually mean
the contrapositive.<wink>

In fact, the "inference" that Darcs does is simply an accounting
computation, like the "offset" that patch(1) announces.  The only
difference is that Darcs does this computation when adding a patch to
a set of existing patches that may be different from the original
context, rather than deducing it from matching the content of the
patch against the file it is applied to.  This is eminently suited for
automatic computation; in principle it's not rocket science.
(Optimizing it is rocket science, though.)
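
The accounting can be illustrated with a toy Python model in which the
only kind of patch is "insert one line at a position".  This is a
sketch of the idea, not Darcs's actual commutation code; Insert,
apply_insert and commute are hypothetical names.

```python
from typing import NamedTuple


class Insert(NamedTuple):
    pos: int    # index at which the new line is inserted
    text: str


def apply_insert(patch, lines):
    return lines[:patch.pos] + [patch.text] + lines[patch.pos:]


def commute(p, q):
    """Given p followed by q, return (q2, p2) such that applying q2 and
    then p2 produces the same lines.  Only the positions are adjusted."""
    if q.pos <= p.pos:
        # q lands at or before p's line, so p's line moves down by one
        # once q is applied first.
        return Insert(q.pos, q.text), Insert(p.pos + 1, p.text)
    # q lands after p's line, so without p in place it sits one line higher.
    return Insert(q.pos - 1, q.text), Insert(p.pos, p.text)


base = ["a", "b", "c"]
p = Insert(1, "P")
q = Insert(3, "Q")
q2, p2 = commute(p, q)

original_order = apply_insert(q, apply_insert(p, base))
swapped_order = apply_insert(p2, apply_insert(q2, base))
```

Here original_order and swapped_order are the same list; the offsets
were computed from the patches alone, without looking at the file
content.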

 > > In order to ensure reproducibility of a given state, in Darcs you
 > > *need* to tag the branch.
 > 
 > So why isn't this done every time I commit,

Because you're weird<wink>, at least in the Darcs world.  I'm not
trying to convince you that Darcs itself would be useful to you; I'm
pretty sure it would not, as it stands.  I'm trying to explain what it
is that Darcs fans find attractive, as I understand it, and thus why
many people (such as the Honorable Mr. Collins) might find Darcs-like
features a worthy addition to Bazaar.  I also think you would find
this mode of operation very useful, in the context of an addition to
Bazaar rather than a switch to Darcs.

 > I suppose all the above questions are also asking, what am I buying with
 > this cost that I don't have to pay when using Bazaar? Why would I ever
 > want to commit *without* being able to ensure I can reproduce state?

Reason 1: Because your VCS is being used as an editor; the metadata
would only serve to confuse you.  Take "bzr shelve".  It could create
commits (eg, "git stash" does), but you really don't care about those
commits.  Rather, you want to label the *change* that was shelved, so
you can apply it later in an appropriate context, which often isn't
the original one.  Think of Darcs as a VCS implemented in terms of
"shelve", maybe.

Reason 2: Because your workflow is linear, so the date is a usable
name for the state.  Ie, lack of a tag isn't a problem for Darcs (in a
linear workflow), any more than lack of revision numbers is a problem
for git.  Eg, Darcs has bisect, and it's implemented in terms of the
chronological order in which patches are added to the repo.  (Darcs
has a git-like division between the author date, when the patch was
created, and the committer date, when the patch was added to the
repo.)

The problem that Darcs users face in a general workflow is that in
Darcs a tree *really is* a branch *really is* a repo.  In order to get
reproducibility of Developer A's results by Developer B without a tag,
their branches must be 100% in synch, which would require a clone.
This doesn't often matter to them because the typical Darcs workflow
is very linear.  When it does, they tag rather than clone.

This linearity is possible in Darcs because it handles cherrypicking
100% correctly, and 100% consistently with all other Darcs operations,
and because Darcs treats non-conflicting merges as a set union.  Thus
most "darcs pull" operations do not appear non-linear to the user;
only conflicting ones do.  What in Bazaar or git would require an
explicit merge commit (or a rebase), in Darcs is done by cherry
picking, which is done with the "pull" command.  A default "darcs
pull" is simply "cherrypick everything I don't have yet".  This will
result in the correct state because the sequence of patches in the
resulting Darcs repo conforms to a topological order that guarantees
identical state.
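
Treating repos as plain sets of patch names, "pull" as set union can be
sketched in Python as below.  The DEPENDS_ON table and the functions
closure and pull are hypothetical, and conflicts are left out entirely.

```python
# Hypothetical dependency table: patch name -> patches it requires.
DEPENDS_ON = {
    "add-readme": set(),
    "add-main": set(),
    "fix-main": {"add-main"},
}


def closure(names):
    """The given patches plus everything they depend on, transitively."""
    seen, todo = set(), list(names)
    while todo:
        name = todo.pop()
        if name not in seen:
            seen.add(name)
            todo.extend(DEPENDS_ON[name])
    return seen


def pull(local, remote, wanted=None):
    """Default pull: take everything the remote has that we lack.
    Cherry-pick: take `wanted` plus whatever it depends on."""
    missing = remote - local
    picked = missing if wanted is None else closure(wanted) & missing
    return local | picked


alice = {"add-readme"}
bob = {"add-main", "fix-main"}

alice_after_pull = pull(alice, bob)
bob_after_pull = pull(bob, alice)
alice_cherrypick = pull(alice, bob, wanted={"fix-main"})
```

After pulling from each other, alice_after_pull and bob_after_pull are
the same set no matter who pulled first, which is why no merge commit
is needed in the non-conflicting case.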

(I'm ignoring the issue of merge conflicts, whose efficient handling
remains an unsolved problem in Darcs.)

 > > IMO, the right way to handle this problem is to prohibit QA claims in
 > > commit messages (except in Aegis-like systems, where it's
 > > uninformative since all commits have passed the test suite). Such
 > > claims should be placed in tags.
 > 
 > In a VCS which *can* ensure reproducibility of the state represented in
 > any commit, I don't have to make that choice every time I commit.

You've missed the whole point of Linus's diatribe if you think that.

Reproducibility of a given state has nothing to do with whether state
changes preserve properties of the original state.  A QA claim is
associated with a state, and as soon as state changes *all* such claims
are invalidated.  That is the state of the art (except in Aegis, where
it's trivial because Aegis enforces test success as a precondition for
a commit).  (At least, I don't know any VCSes with correctness proof
automation in their commit operations. :-)
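
One way to picture "a QA claim is associated with a state" is to key
the claim by a hash of the tree rather than by a log message.  This
Python sketch is only an illustration; state_id, record_claim and
claims_for are hypothetical names, not features of any VCS named here.

```python
import hashlib


def state_id(tree):
    """Name an exact tree state (path -> content) by hashing it."""
    digest = hashlib.sha256()
    for path in sorted(tree):
        digest.update(path.encode() + b"\0" + tree[path].encode() + b"\0")
    return digest.hexdigest()


claims = {}   # state id -> set of claims made about that exact state


def record_claim(tree, claim):
    claims.setdefault(state_id(tree), set()).add(claim)


def claims_for(tree):
    return claims.get(state_id(tree), set())


tested = {"main.py": "print(1)"}
record_claim(tested, "test suite passed")

# Any change at all (a merge, a rebase, another commit) is a new state.
merged = dict(tested, README="hello")

claims_on_tested = claims_for(tested)
claims_on_merged = claims_for(merged)
```

The claim is still found for the tested state and is absent for the
merged one, until somebody runs the tests again on that state.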

 > I *can* make such claims backed by the knowledge that every single
 > commit represents a working tree state that can be reproduced later
 > by any user of that branch.

Not in the presence of a rebase command that copies log messages, no,
you absolutely cannot.  Remember, you may be disciplined enough not to
use rebase, but you cannot guarantee that others are so disciplined.
When you put a QA claim into a commit message, you put third parties
at risk.

 > What is worth the loss of that automated assurance, pushing the task
 > back to the user, making them manually manage such assurances in
 > advance?

It's not a question of managing assurances.  To manage assurances "in
your VCS", you can accept "in your bones" that assurances only apply
to the particular state they are attached to, and do not propagate
forward, backward, sideways, or through hyperspace, and eschew all
operations that copy log messages into new commits.  This is pretty
useless as a *management* tool; all it does is restrict what *you* can
do with *your* VCS, and puts all your claims at the mercy of *others*
less disciplined than you.

Or you can go the Aegis route, which does manage assurances, though
not by propagating them across changes, but rather by testing each
version individually.

The difference between Darcs and other VCSes in this regard is not
that you can't attach an assurance to a reproducible state.  It's that
for a state name to be generally valid across repos, it must be a tag.
(This may be changing as Darcs is getting new repo formats in versions
after 2.1, but it's true in Darcs 1 and Darcs 2.0/2.1.)

 > I realise it's not your job to educate me about Darcs, especially in a
 > Bazaar forum, but it's all intended to be in aid of better understanding
 > the competition and whether perceived advantage of one over the other
 > represents something that should be addressed in one or the other.

Agreed, and you're welcome!