Fwd: [Python-Dev] Primer on distributed revision control?

Tue Mar 25 18:45:07 GMT 2008

I'm not Paul but I do have a few opinions here....

Ian Clatworthy writes:

 > Any reason why you prefer the Hg book to the Bazaar User Guide?

FWIW, having followed Arch development from the ArX fork to the Bazaar
fork, I found the Hg book almost useless.  I already knew all that....
As an extended tutorial, it doesn't cover as much of the generic side
of revision control as the old Arch tutorial did, while as a manual
... well, no command is fully documented and most commands are
undocumented; there's not even a reasonably complete listing of the
important extensions.

The BUG (oops), on the other hand, I found to be annoying with its
continuous cheerful plugs for Bazaar.  It's almost smug: "distributed
tree revisioning done right".  I think an official manual should take
itself more seriously.  YMMV.  It is also somewhat hypocritical:
although Arch (and implicitly git and Mercurial) is deprecated as "not
ready for prime time distributed tree revisioning", when bzr
performance is discussed, the implied standard of comparison is very
clearly not the other modern VCSes, which currently are *multiple*
orders of magnitude faster in *naive* use, but rather the
network-based centralized systems like CVS.  (This is especially
grating when the bzr project I'm most interested in, Emacs, is
currently not worth using with bzr -- it is far preferable to maintain
a Tailor gateway to a local git repo.)

To me, Section 1.3 is out of place: at this stage the user either
knows the initial workflow he wants to implement or doesn't know about
workflows.  It takes up a lot of space and interrupts the flow of the
introduction.  I would prefer a couple of side-by-side examples of the
same task done with CVS or svn vs. Bazaar.  The
diagrams in 1.3 are a distraction, because they are terrible as
illustrations.  To figure out what's going on, you need to read the
text and analyze the diagrams to understand the information flows.
People who are pictorially oriented are likely to get rather confused
IMO.  I would move this whole section to just before Chapter 4, and
replace it with a *brief* description of the ideas of workflow:
cooperative development, pushing and pulling branches, shared
repositories and commit control.  Just say that bzr supports these
fundamentals well and comes with tools like Bundle Buggy and PQM, and
point to the details in the later chapter.

Chapter 2 is excellent, except that I would prefer section 1 be replaced
by "We assume you already have Bazaar properly installed, which you
can test by issuing the command 'bzr --version'.  If that doesn't
output something like

Bazaar (bzr) 1.3
[spew elided]

Bazaar is easy to install.  See Appendix A."

Chapter 3 is also excellent.  Use of regexps in the .bzrignore should
be mentioned, though, as people who know how to use them can get great
advantage from them, especially in legacy projects with lots of ad hoc
generated files.
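
A minimal sketch of what such a mention might look like, assuming the
"RE:" prefix that bzr's ignore rules use to mark a line as a regular
expression rather than a glob.  The file names and patterns are
hypothetical, and exactly how the match is anchored is worth checking
in "bzr help ignore" for your version:

```shell
# Write a .bzrignore that mixes one ordinary glob with two regex rules.
# All three patterns are made-up examples for a legacy project.
cat > .bzrignore <<'EOF'
*.o
RE:.*\.generated\.[ch]
RE:build-[0-9]+/.*
EOF
```

The point is only that one regexp line can cover a family of ad hoc
generated names that would otherwise need several globs.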

Chapter 4 again is very good.  One thing I noticed in Ch. 4, which is
possibly true throughout the BUG, is that the concept of shared repo
is a little imprecise.  Specifically, in many operations most dVCSes
will follow parents until they find the meta-info.  But in creating
new branches, is that true?  For example, what happens when you do:

bzr init-repo project
cd project
bzr branch bzr://central.org/project/trunk
mkdir branches
cd branches
bzr branch bzr://outthere.com/project/wacko-branch
bzr branch bzr://outthere.com/libwacko/libwacko

My intuition is that both wacko-branch and libwacko end up stored in
the "project" repo, which is what you always want with wacko-branch,
but it's not obvious if that's appropriate for the separately
developed libwacko.  I see this is discussed in Ch. 8.  Probably just
a note that the organization of a set of branches does matter, and
that it's discussed in detail in Sec. 8.2, is the best way to handle
it.

Note that there's an ambiguity between "shared repo" meaning "stores
revisions for many branches" and "shared repo" meaning "stores
revisions for many users".  I don't recall ever being confused by it,
but I wonder if a more naive user might.

Another thing I noticed in Ch. 4 (but may be more widespread) is a
slight tendency to tell the user what she should want.  Eg, in
discussing tools like gannotate (4.5.2), the BUG says "The GUI tools
typically provide a much richer display of interesting information
(e.g. all the changes in each commit) so they are often preferred over
the text-based command by users."  I would put the period after the
close paren and let the reader figure it out for herself.  For an
example of what I consider good usage, in section 1.3.8: "A companion
tool of Bazaar's called Patch Queue Manager (PQM) can provide the
automated gatekeeper capability."

Hm ... on second thought, after reviewing Ch. 4, the taxonomy of
workflows from Sec. 1.3 should come *after* Ch. 4 and before Ch. 5,
IMO.  The pair workflow is intuitive, and provides the operational
ideas needed to understand more general workflows.

I don't have much to say about Ch. 5-7.  Ch. 5 and 6 seem to have
about the right level of content, showing how centralized and
decentralized workflows can be organized, without going into so much
detail as to suggest this is the *only* right way.  I also liked the
way Ch. 5 deals with the classic problems of a centralized repo.  Eg,
when working disconnected it shows how bzr can help you work offline,
and then merge the work into the normal workflow without disruption.

I'm not sure "Best Practices" is a good title for Ch. 7, which seems
more like a "grab bag of goodies" than The Way The Pros Do It.  That
is, by and large these are descriptions of features and how to invoke
them, rather than explanations of when to use them and why (which is
what "best practice" means to me).  Really, best practices are more
suited to a wiki than the user guide.