Revisit Newbie Bazaar Repository and Branch Setup

Eric Siegerman lists08-bzr at davor.org
Thu Dec 9 23:57:34 GMT 2010


On Thu, 2010-12-09 at 15:48 -0600, Tom Browder wrote:
> > On 2010-12-09 12:47 , Eric Siegerman wrote:
> >> Try unbinding "central" from "company" [...]
> 
> Actually, in this case "central" is NOT a checkout of company.  It's
> the one where I'm working in a tree inside "central".

One thing that will reduce your confusion: stop worrying about
repository vs. thing-inside-the-repository.  Once you've created
a shared repo, you can pretty much just ignore it.

When, in my previous post, I used the name "central" differently
than you had done, it's because I'm so used to ignoring shared
repos that I automatically did so this time, and so used
"central" to label the only object I cared about, i.e. the
thing-inside-the-repo. :-)  Sorry for muddying things up.

The only times I can think of where you *do* have to worry about
shared repos are as follows (anything I say below about branches
applies equally to heavyweight checkouts):

  - Any branches that are to be backed by a given shared repo
    must physically reside (perhaps indirectly) under the repo's
    root directory.  This can, on occasion, force you to organize
    your branches and checkouts in a way you might not like.

  - You must not move a branch outside its shared repo.  If you
    do, the branch will no longer work, since it won't be able to
    find its revision data.  If you need to do this, "bzr
    reconfigure --standalone" the branch *before* moving it.
    That will give the branch its own private copy of its
    revision data, after which it will no longer depend on the
    shared repo, and so can be moved at will.  (This independence
    of any shared repo is precisely what the word "standalone"
    refers to, when describing a branch).

  - For the same reason, you can't copy a shared-repo-using
    branch to somewhere outside its shared repo, except of course
    using bzr itself.  You can use the "bzr reconfigure
    --standalone" trick here too, but easier is just to use "bzr
    branch"; that will do the Right Thing regardless of whether
    the destination branch lives:
      - under the same shared repo as the source branch
      - under a different shared repo
      - under no shared repo at all

> Confusing to me so I
> think I won't work inside the repository tree again (unless I can get
> my head wrapped around this somehow--this is powerful stuff for a cvs
> => svn guy).

Yeah, "repository" means a fundamentally different thing in
Bazaar than it does in either CVS or SVN.  Both kinds of
repository are "where the revision data lives", but that's where
the similarity ends; the way you think about them and interact
with them is totally different.

CVS and SVN are what one might call "repository-centric" -- the
repository is the *main* object you interact with, and branches
are this weird second-class concept built on top of,
respectively, either RCS tags or a cheap copy operation within
the repository's namespace.

Bazaar, on the other hand, is "branch-centric" -- the thing you
interact with is a branch, or a heavyweight checkout (which is a
bound branch), or a lightweight checkout (which is just a
working tree that's physically decoupled from its branch).  In
all of those cases, it's the *branch* that's the focus of your
attention.  In Bazaar, a shared repository is just a revision
store; you don't interact with it directly, but just create it in
the first place and let Bazaar deal with it behind the scenes.


Another way to look at all this is in terms of what's going on
under the hood.  To do work using Bazaar, you need three things:
  - a revision store, which is a big bucketful of revisions,
    connected together into a DAG based on their ancestry

  - branch metadata, at the heart of which is a pointer to some
    revision in the DAG

  - a working tree, which is your working files plus some
    metadata about them

In the simplest case, all three of those entities are colocated
under one directory ("branch" here):
    branch/.bzr/repository      # the revision store
    branch/.bzr/branch          # the branch metadata
    branch/.bzr/checkout        # working-tree metadata
    branch/foo.py               # working files
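
If it helps, here is a toy Python model of those three entities.  It
is only a mental-model sketch: the class and field names are made up
for illustration and are not how Bazaar itself is structured.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Revision:
    """An immutable revision; its parents link it into the ancestry DAG."""
    rev_id: str
    parents: tuple = ()


@dataclass
class RevisionStore:
    """The big bucketful of revisions, keyed by revision id."""
    revisions: dict = field(default_factory=dict)

    def add(self, rev):
        self.revisions[rev.rev_id] = rev

    def ancestry(self, rev_id):
        """Return the ids of all revisions reachable from rev_id via parents."""
        seen, todo = set(), [rev_id]
        while todo:
            current = todo.pop()
            if current not in seen:
                seen.add(current)
                todo.extend(self.revisions[current].parents)
        return seen


@dataclass
class Branch:
    """Branch metadata: at heart, a pointer to one revision in the store."""
    store: RevisionStore
    tip: str


@dataclass
class WorkingTree:
    """Working files plus a note of which revision they are based on."""
    branch: Branch
    basis: str
    files: dict = field(default_factory=dict)


store = RevisionStore()
store.add(Revision("r1"))
store.add(Revision("r2", parents=("r1",)))
store.add(Revision("r3", parents=("r2",)))

branch = Branch(store, tip="r3")
tree = WorkingTree(branch, basis=branch.tip, files={"foo.py": "print('hi')\n"})
print(sorted(store.ancestry(branch.tip)))   # ['r1', 'r2', 'r3']
```

The point of the model is that the branch is small (one pointer),
while the store holds all the actual history.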

A lightweight checkout splits that in two; instead of containing
the branch and revision store directly, it contains a pointer
to them (in the form of a URL or pathname):

    branch/.bzr/repository
    branch/.bzr/branch
                ^
                |
    light-checkout/.bzr/checkout
    light-checkout/foo.py

Note that "branch" can have a working tree too (i.e. working files +
.bzr/checkout metadata); it's just that you're not using it.
You're using the one over in "light-checkout" instead.

Of course, there's nothing stopping you from creating multiple
lightweight checkouts backed by the same branch.  If you commit
from any of them (or from the branch's colocated working tree, if
it has one), all the other working trees will be out of date; you
use "bzr update" in each one to fix this.  This is the pattern
you're familiar with from CVS/SVN (except that in those, the hub
of the wheel is a repository, whereas in bzr it's a branch).
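
The hub-and-spokes behaviour can be sketched as a small Python
simulation.  This is a toy model, not Bazaar code: the class names are
invented, and the rule that a stale tree refuses to commit is an
assumption I've built into the model to keep it simple.

```python
class Branch:
    """Just the branch's pointer: the id of its current tip revision."""

    def __init__(self, tip):
        self.tip = tip


class LightweightCheckout:
    """A working tree that remembers which revision its files are based on."""

    def __init__(self, branch):
        self.branch = branch
        self.basis = branch.tip

    def is_out_of_date(self):
        return self.basis != self.branch.tip

    def commit(self, new_rev_id):
        # Assumption of this toy model: a stale tree must update first.
        if self.is_out_of_date():
            raise RuntimeError("working tree is out of date; update first")
        self.branch.tip = new_rev_id
        self.basis = new_rev_id

    def update(self):
        """Stand-in for running "bzr update" in this working tree."""
        self.basis = self.branch.tip


hub = Branch(tip="r1")
wt_a = LightweightCheckout(hub)
wt_b = LightweightCheckout(hub)

wt_a.commit("r2")                # the commit moves the shared branch's tip
print(wt_b.is_out_of_date())     # True
wt_b.update()
print(wt_b.is_out_of_date())     # False
```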

Unfortunately, I don't understand bound branches (i.e. heavyweight
checkouts) well enough to include them in this discussion; that's
where *my* brain starts to hurt :-(


We haven't talked about shared repos yet, only standalone
branches.  That's the next level of complication: we can separate
the revision store from the branch metadata:

    shared-repo/.bzr/repository     # revision store

    shared-repo/br1                 # a branch
    shared-repo/br1/.bzr/branch     # its branch metadata

    shared-repo/br2                 # another branch
    shared-repo/br2/.bzr/branch     # its branch metadata
    shared-repo/br2/.bzr/checkout   # working-tree metadata
    shared-repo/br2/foo.py          # working files

This is a shared repo that contains two branches, one with a
colocated working tree and one without.  (Presumably br1 is
accessed via a lightweight checkout, but that (a) doesn't have to
live under the "shared-repo" directory, and (b) is irrelevant.)

A branch does *not* contain a pointer to its shared repo; rather,
bzr searches up the directory tree from the branch, looking for
the first of ../.bzr/repository, ../../.bzr/repository, etc.
that it can find.  This is why all of the branches must
physically reside under the "shared-repo" directory.
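
That upward search can be sketched in a few lines of Python.  This
illustrates the idea only and is not Bazaar's actual lookup code; the
function name find_repository and its stop_at parameter are
hypothetical (stop_at just keeps the demo from wandering above its
temporary directory).

```python
import os
import tempfile


def find_repository(branch_dir, stop_at=None):
    """Walk upward from branch_dir and return the first directory that
    contains .bzr/repository, or None if no such directory is found."""
    current = os.path.abspath(branch_dir)
    limit = os.path.abspath(stop_at) if stop_at else None
    while True:
        if os.path.isdir(os.path.join(current, ".bzr", "repository")):
            return current
        parent = os.path.dirname(current)
        if current == limit or parent == current:
            return None          # hit the limit or the filesystem root
        current = parent


# Build the layout shown above, plus a copy of br1 that has been moved out.
root = tempfile.mkdtemp()
shared = os.path.join(root, "shared-repo")
os.makedirs(os.path.join(shared, ".bzr", "repository"))
os.makedirs(os.path.join(shared, "br1", ".bzr", "branch"))
os.makedirs(os.path.join(root, "elsewhere", "br1", ".bzr", "branch"))

inside = find_repository(os.path.join(shared, "br1"), stop_at=root)
outside = find_repository(os.path.join(root, "elsewhere", "br1"), stop_at=root)
print(inside == shared)   # True
print(outside)            # None
```

The second lookup finding nothing is the same failure you'd get by
moving a branch out from under its shared repo.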

The space savings from using a shared repo come from the fact
that it's the revision store (.bzr/repository) that takes up most
of the space; the branch metadata (.bzr/branch) is tiny.  So if
two branches are backed by the same revision store, and they have
revisions in common, only one copy of each common revision needs
to be stored.  That works because a revision, once created, is
immutable.
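
A rough Python illustration of that saving, with a plain dict standing
in for the revision store.  The revision ids and the helper name
add_branch_history are invented for the example.

```python
store = {}   # revision id -> revision data; stands in for .bzr/repository


def add_branch_history(store, rev_ids):
    """Record a branch's revisions in the store; return how many were new."""
    added = 0
    for rev_id in rev_ids:
        if rev_id not in store:      # safe to share: revisions never change
            store[rev_id] = "data for " + rev_id
            added += 1
    return added


# Two branches that diverge after three common revisions.
br1 = ["r1", "r2", "r3", "r4-feature"]
br2 = ["r1", "r2", "r3", "r4-bugfix", "r5-bugfix"]

new_from_br1 = add_branch_history(store, br1)
new_from_br2 = add_branch_history(store, br2)
print(new_from_br1, new_from_br2, len(store))   # 4 2 6
```

As two standalone branches, the same history would take 4 + 5 = 9
stored revisions; sharing one store needs only 6.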

I said above that the revision store contains a DAG of revisions.
That might be an oversimplification.  What happens if you slurp
multiple unrelated branches into the same shared repo?  I'm not
quite sure.  Maybe you get multiple disjoint DAGs.  Or maybe
there's some empty ur-revision that each branch's r1 descends
from, and which ties all the DAGs together into a single large
DAG.  But to end users like us, I don't think it much matters one
way or the other.  (Personally, I find it convenient to think of
it as multiple DAGs, and the ur-revision, if one exists, as an
implementation detail.)


In summary, we've talked about two distinct choices you can make:
  - whether or not a branch is colocated with its revision store
  - whether or not a working tree is colocated with its branch
These two choices are orthogonal; all four combinations are
possible, and sometimes useful.
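
To spell out the four combinations, here is a small Python
enumeration.  The descriptions are my own labels for the layouts
discussed above, not official Bazaar terminology.

```python
from itertools import product

# Key: (branch colocated with its revision store,
#       working tree colocated with its branch)
descriptions = {
    (True, True): "standalone branch with its own working tree",
    (True, False): "standalone branch, used via a lightweight checkout",
    (False, True): "branch in a shared repo, with its own working tree",
    (False, False): "branch in a shared repo, used via a lightweight checkout",
}

combos = list(product([True, False], repeat=2))
for branch_with_store, tree_with_branch in combos:
    print(branch_with_store, tree_with_branch,
          descriptions[(branch_with_store, tree_with_branch)])
```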

Hope this helps.

  - Eric




