[RFC] branch --bind

Mon Jan 11 08:37:42 GMT 2010

Martin Pool writes:
 > 2010/1/9 Ian Clatworthy <ian.clatworthy at canonical.com>:

 > > Implementation details and bugs aside, the biggest problem is that
 > > Bazaar has both bound branches and heavyweight checkouts while no other
 > > DVCS has either. It confuses users having to choose and every new
 > > project moving to Bazaar needs to go through this (IMO) unnecessary
 > > learning curve of understanding the differences between bound branches
 > > and heavyweight checkouts.
 > 
 > Right.

I don't think that *all* the confusion is unnecessary, at least at the
project level.  CVC-in-a-DVCS is inherently a *lot* more complex than
"pure" DVCS, at least at the present state of the art.  There are a
lot of tradeoffs to be made in designing the workflows, and "best
practice" is not yet well established, except maybe in the minds of a
few senior Bazaar developers.  (There are several examples of best
practice out in the wild, but they're fairly project-specific.)

 > > Personally, I'd like to have 3 core concepts only:
 > >
 > > 1. Repository
 > > 2. Branch
 > > 3. Checkout (aka Working Tree).
 > >
 > > Users should then be able to:
 > >
 > > * Make a repository treeless or not.

I have a strong intuition that it would be better to make all
repositories treeless in principle.  You could have a standalone
branch where the repository is in the branch, or a shared repository
where the branches are in the repository.  The tree shouldn't need to
know anything about this.  Conversely, the repository shouldn't need
to know anything about trees.

 > > * Make a branch bound or not.
 > > * Give a checkout a cache or not.
 > >
 > > That's flexible, easy to explain and clean.
 > > Importantly, each concept is orthogonal and adds value in its
 > > own right.

Bound branches are *not* that easy to explain.  Currently they have
some unintuitive restrictions (at least from the point of view of what
Uri Moszcowicz wanted to do, for example).

And that's *four* core concepts: you're sneaking in "cache".  Caches
are *horrible* to explain, because they are complex options that need
to be carefully tuned and usually have rather implementation-dependent
behavior.

 > That's a good list, and I think exposing them orthogonally is good.
 > 
 > It seems to me this still allows both lightweight and heavyweight
 > checkouts, by either having the working tree separate from the branch,
 > or by having it colocated with a bound branch.

IMHO colocation has very little to do with the lightweight vs.
heavyweight distinction.  A lightweight checkout conceptually connects
a tree to a branch (which might be local or remote), and updates the
tree to a revision on that branch.  A heavyweight checkout incurs
substantially more overhead because it connects a tree to a bound
branch (typically by creating a local branch as a cache for the remote
branch, which is the one you're really interested in).

For example, I would typically want to set things up this way:

~/var/bzr/repos/{project1,...}
~/var/bzr/repos/project1/{mirror-branch1,...}
~/work/checkout1

So "checkout1" is a checkout of "mirror-branch1", which is bound to a
remote "branch1" on the home repository of "project1" (on another host).

 > It's not necessarily a bad thing that those options fall out, and
 > we could expose things more cleanly:
 > 
 >   bzr checkout: make a checkout, either on top of an existing
 > checkout-less branch, or pointing to a separate branch

What does "checkout-less branch" mean, and how would you know that
anyway?  Do you mean a branch whose .bzr doesn't happen to be located
in a working tree?

 >   bzr branch --bind: make a bound branch and optionally a working tree
 > (what is now a heavyweight checkout.)

Optional?  When would a bound branch without an associated tree make
sense?

How about the following *hierarchy*:

Tree: contains working copies of managed content, a pointer to the
Branch it was checked out from and the parent revision, and other
metadata associated with this tree (user config, for example), plus
"junk" (e.g., build products and editor backups), of course.

Branch: contains a pointer to the repository which stores its data
(both content and revisions), and other metadata associated with this
branch (head revisions, pointers to bound branches, and user config,
for example).

Repository: contains content and revisions, and other metadata
associated with this repository (list of Branches and user config, for
example).

Any two or all of them may be colocated.  For example, in a
traditional standalone branch the Tree, the Branch, and the Repository
all share .bzr, living in its checkout, branch, and repository
subdirectories respectively (more or less).
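To make the hierarchy concrete, here is a toy Python model of it.  It
is a sketch under my own assumptions, not Bazaar's actual object
model: the class names, fields, and the helpers `standalone_branch`
and `colocated` are hypothetical.  The point it illustrates is the
direction of the pointers: a Tree points at a Branch, a Branch at a
Repository, and the Repository never references a Tree.

```python
# Hypothetical toy model of the Tree -> Branch -> Repository hierarchy.
from dataclasses import dataclass, field
from pathlib import PurePosixPath
from typing import List, Optional, Set


@dataclass
class Repository:
    """Content and revisions; lists its Branches but knows nothing of Trees."""
    control_dir: PurePosixPath
    revisions: Set[str] = field(default_factory=set)
    branches: List[str] = field(default_factory=list)  # Branch locations
    config: dict = field(default_factory=dict)


@dataclass
class Branch:
    """Points at the Repository holding its data; records head and binding."""
    control_dir: PurePosixPath
    repository: Repository
    head: Optional[str] = None      # head revision id
    bound_to: Optional[str] = None  # location of the branch we are bound to
    config: dict = field(default_factory=dict)


@dataclass
class Tree:
    """Working copy: points at the Branch it was checked out from."""
    control_dir: PurePosixPath
    branch: Branch
    parent_revision: Optional[str] = None
    config: dict = field(default_factory=dict)


def standalone_branch(path: str) -> Tree:
    """Traditional standalone layout: all three objects share one .bzr,
    each in its own subdirectory."""
    bzr = PurePosixPath(path) / ".bzr"
    repo = Repository(bzr / "repository")
    branch = Branch(bzr / "branch", repo)
    repo.branches.append(str(branch.control_dir))
    return Tree(bzr / "checkout", branch)


def colocated(a, b) -> bool:
    """In this model, objects are colocated when they share a .bzr directory."""
    return a.control_dir.parent == b.control_dir.parent
```

In the shared-repository layout I described earlier, the same classes
apply; only the control directories differ, so `colocated` is false
for every pair.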

Here's the idealized UI.  It's something of a strawman and will
probably be rather unpopular in practice because of the change in the
meaning of "bzr branch"; dealing with that can come later.  Note that I
don't specify any defaults, because I define some of the commands in
terms of others; in practice, at least the obvious defaults would be
defined.

bzr
  Global option --repo=FILE-URL, if present, names the local shared
    repository (a file:/// URL) in which to store new branches; it is
    also the default starting point for relative branch specs that
    name existing branches.
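As a sketch of how relative branch specs might resolve against
--repo: this assumes the repository is given as a file:/// URL and
that any spec with a URL scheme or a leading "/" is already absolute.
Both are my assumptions, and `resolve_branch_spec` is a hypothetical
helper, not part of bzr.

```python
# Hypothetical helper: resolve a branch spec against the proposed --repo.
from typing import Optional
from urllib.parse import urljoin, urlparse


def resolve_branch_spec(spec: str, repo: Optional[str] = None) -> str:
    """Return `spec` made absolute relative to `repo`, when that applies.

    Specs that already carry a URL scheme (file://, bzr+ssh://, http://, ...)
    or are absolute paths are returned unchanged, as is everything when
    --repo was not given.
    """
    if repo is None or urlparse(spec).scheme or spec.startswith("/"):
        return spec
    if not repo.startswith("file:///"):
        raise ValueError("--repo must be a file:/// URL")
    # urljoin() replaces the last path segment unless the base ends in "/".
    base = repo if repo.endswith("/") else repo + "/"
    return urljoin(base, spec)
```

For example, with --repo=file:///home/user/var/bzr/repos the spec
"project1/mirror-branch1" resolves to
file:///home/user/var/bzr/repos/project1/mirror-branch1.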

bzr branch BRANCH-URL URL
  (Mostly a building block for complex workflows.)
  Copy Branch BRANCH-URL to location URL.
  Does not create a tree.
  BRANCH-URL is an existing Branch.
  URL is a location where the new Branch will be stored.

bzr bind SOURCE-BRANCH TARGET-BRANCH
  (Mostly a building block for complex workflows.)
  Binds SOURCE-BRANCH to TARGET-BRANCH.  That is, committing in
  SOURCE-BRANCH requires an up-to-date check against TARGET-BRANCH.
  If the check succeeds, the commit in SOURCE-BRANCH is automatically
  pushed to TARGET-BRANCH.

bzr unbind SOURCE-BRANCH == bzr bind SOURCE-BRANCH --nowhere
  (Mostly a building block for complex workflows.)
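The bind/unbind semantics can be simulated with in-memory branches.
This is a toy sketch, not bzrlib: `ToyBranch`, `OutOfDateError`, and
the representation of history as a list of revision ids are all
hypothetical, and I assume "up-to-date" means "the target's history is
a prefix of ours".

```python
# Hypothetical toy simulation of bind / unbind / commit.
from dataclasses import dataclass, field
from typing import List, Optional


class OutOfDateError(Exception):
    """The bound-to branch has revisions the local branch lacks."""


@dataclass
class ToyBranch:
    """In-memory branch; history is a list of revision ids."""
    name: str
    history: List[str] = field(default_factory=list)
    bound_to: Optional["ToyBranch"] = None

    def bind(self, target: "ToyBranch") -> None:
        self.bound_to = target

    def unbind(self) -> None:  # i.e. "bind SOURCE-BRANCH --nowhere"
        self.bound_to = None

    def commit(self, revid: str) -> None:
        target = self.bound_to
        # Up-to-date check: everything in the target must already be here,
        # i.e. the target's history is a prefix of ours.
        if target is not None and self.history[:len(target.history)] != target.history:
            raise OutOfDateError(f"{self.name} is out of date with {target.name}")
        self.history.append(revid)
        if target is not None:
            # The new commit is pushed to the target automatically.
            target.history = list(self.history)
```

In the toy, an out-of-date bound branch simply refuses to commit;
bringing it up to date first is left out.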

bzr checkout BRANCH-URL URL
  Check out a working Tree from Branch BRANCH-URL into URL.
  BRANCH-URL is an existing Branch, and is called the *parent* of the
    Tree (sometimes called the "nominal parent" when the parent is
    bound to another branch).
  URL is a location where the working tree will be stored.
  checkout option --cache-branch=CACHE-URL is equivalent to
    "bzr branch BRANCH-URL CACHE-URL
     && bzr bind CACHE-URL BRANCH-URL
     && bzr checkout CACHE-URL URL".
  checkout option --heavyweight is equivalent to
    "bzr checkout --cache-branch=URL BRANCH-URL URL".
  checkout option --standalone is equivalent to
    "bzr checkout --cache-branch=URL BRANCH-URL URL
     ; bzr unbind URL".
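Since the checkout options are defined as equivalences, they can be
expanded mechanically into the primitive commands.  A minimal sketch:
the function `expand_checkout` is hypothetical, and treating the three
options as mutually exclusive is my assumption.

```python
# Hypothetical expansion of the proposed checkout options into primitives.
from typing import List, Optional


def expand_checkout(branch_url: str, url: str, cache_branch: Optional[str] = None,
                    heavyweight: bool = False, standalone: bool = False) -> List[str]:
    """Return the primitive bzr commands a checkout invocation stands for."""
    if sum([cache_branch is not None, heavyweight, standalone]) > 1:
        raise ValueError("--cache-branch, --heavyweight, --standalone are exclusive")
    if heavyweight:
        # --heavyweight: the cache branch is colocated with the tree at URL.
        return expand_checkout(branch_url, url, cache_branch=url)
    if standalone:
        # --standalone: as --heavyweight, then drop the binding.
        return expand_checkout(branch_url, url, cache_branch=url) + [f"bzr unbind {url}"]
    if cache_branch is None:
        # Plain (lightweight) checkout: the tree points straight at BRANCH-URL.
        return [f"bzr checkout {branch_url} {url}"]
    return [
        f"bzr branch {branch_url} {cache_branch}",
        f"bzr bind {cache_branch} {branch_url}",
        f"bzr checkout {cache_branch} {url}",
    ]
```

For instance, expand_checkout("B", "T", heavyweight=True) yields the
three commands "bzr branch B T", "bzr bind T B", and "bzr checkout T T".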

bzr switch BRANCH-URL
  Must be executed in a working Tree, whose real parent is switched to
    BRANCH-URL, and the Tree's content updated.
  If the current nominal parent is not bound, the Tree's parent is set
    to BRANCH-URL.
  If the Tree's current nominal parent is a bound branch, the real
    parent is the bound branch's target.  In this case a new cache
    branch bound to BRANCH-URL is created, and the Tree's parent is
    set to the new cache branch.  (This operation will fail if the
    Tree is colocated with its parent branch.)
  switch option --nominal unconditionally sets the Tree's parent to
    BRANCH-URL, without checking for bindings.
  switch option --no-update sets the parent, but does not update
    content.

  An alternative UI for switch might be to omit the --nominal option,
  and have a --real-parent option that switches the real parent as
  above.  I guess in this case it would make sense to fail the
  operation if the parent is a bound branch but --real-parent was not
  given.
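The decision logic of switch, sketched as a toy Python function.
`BranchRef`, `WorkingTree`, and the `cache_location` parameter are
hypothetical; in particular, the proposal does not say where the new
cache branch is created, so here the caller supplies it.

```python
# Hypothetical toy model of the proposed `bzr switch` rules.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BranchRef:
    location: str
    bound_to: Optional[str] = None  # binding target, if the branch is bound


@dataclass
class WorkingTree:
    location: str
    parent: BranchRef               # the nominal parent
    colocated_with_parent: bool = False


def switch(tree: WorkingTree, branch_url: str, *, nominal: bool = False,
           update: bool = True, cache_location: Optional[str] = None) -> List[str]:
    """Repoint `tree` in place and return a log of the steps taken."""
    log = []
    if nominal or tree.parent.bound_to is None:
        # Unbound parent, or --nominal: repoint the tree directly.
        # (Whether BRANCH-URL is itself bound is not modelled here.)
        tree.parent = BranchRef(branch_url)
        log.append(f"set parent to {branch_url}")
    else:
        # Bound parent: the real parent is tree.parent.bound_to, so switch
        # it by creating a new cache branch bound to BRANCH-URL.
        if tree.colocated_with_parent:
            raise RuntimeError("tree is colocated with its parent branch")
        if cache_location is None:
            raise ValueError("need a location for the new cache branch")
        tree.parent = BranchRef(cache_location, bound_to=branch_url)
        log.append(f"create {cache_location} bound to {branch_url}")
        log.append(f"set parent to {cache_location}")
    if update:  # --no-update corresponds to update=False
        log.append("update tree content")
    return log
```

Under the alternative UI, `nominal` would go away, and the bound-parent
path would instead fail unless --real-parent was given.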

Obviously most of the above would need ways for users to configure
various defaults, and some of the arguments (especially locations)
would get the usual defaults.  Several of the commands should take a
--revision option to set the parent revision and update targets.  I'm
sure there are other obvious things to do here.