Feedback from evaluation in a corporate environment

Robert Collins robertc at
Fri Jan 8 05:07:29 GMT 2010

On Fri, 2010-01-08 at 04:59 +0900, Stephen J. Turnbull wrote:

> I don't mean to push git on the Bazaar list, but git is the only one
> of the three whose object storage layout and semantics I understand
> well.  Hopefully somebody who knows Bazaar better than I do can point
> out features of Bazaar that achieve the effects you need.

I'll take a stab at it ;). Firstly, I'd like to note that bzr's stacking
support at the repository level allows an arbitrary number of fallback
locations: it's entirely possible to write a trivial plugin to add a
lookup to this 10GB store.
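A toy model of what "an arbitrary number of fallback locations" means for lookups. This is a minimal Python 3 sketch, not bzrlib code: the ToyRepository class and its add_fallback and get methods are hypothetical (bzrlib's own call for this is add_fallback_repository on a repository). A lookup that misses locally is tried against each fallback in order.

```python
class ToyRepository:
    """Hypothetical stand-in for a repository with stacked fallbacks."""

    def __init__(self, name, texts=None):
        self.name = name
        self.texts = dict(texts or {})  # revision id -> content held locally
        self.fallbacks = []             # consulted in order on a local miss

    def add_fallback(self, repo):
        self.fallbacks.append(repo)

    def get(self, key):
        if key in self.texts:
            return self.texts[key]
        for repo in self.fallbacks:
            try:
                return repo.get(key)
            except KeyError:
                pass  # not there either; try the next fallback
        raise KeyError(key)


big_store = ToyRepository("bigrepo", {"old-rev": "history from the 10GB store"})
team = ToyRepository("team", {"team-rev": "team work"})
local = ToyRepository("local", {"new-rev": "my work"})
local.add_fallback(team)
local.add_fallback(big_store)
print(local.get("old-rev"))
```

Nothing is copied into the local repository; the large store is only read when a lookup actually needs it.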

> > For distributed read/write support (master-master), Bazaar needs something
> > like a distributed commit transaction. Bind sort of fits this role but not
> > quite yet. First, you want for repositories to be able to be bound to
> > multiple other repositories, not just one.

Repositories do not bind at all. *branches* bind. Branches are the unit
of work for an individual, repositories are just containers.

I don't know why you'd need multiple binding at the branch level, but it
could be done. A simpler thing, though, is transitive binding: it is
reasonably supported, but not for commit. We have this check:

    # If the master branch is bound, we must fail
    master_bound_location = self.master_branch.get_bound_location()
    if master_bound_location:
        raise errors.CommitToDoubleBoundBranch(self.branch,
                self.master_branch, master_bound_location)

This restriction could be lifted with a modicum of work.
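One way the double-bound restriction might be lifted, sketched on a hypothetical in-memory ToyBranch class (not bzrlib's Branch): walk the chain of bound locations, refuse cycles, and update the outermost master first so that no branch records a tip its master lacks. This is an assumption about a possible design, not how bzr behaves today.

```python
class ToyBranch:
    """Hypothetical branch: just a name, a tip and an optional master."""

    def __init__(self, name, bound_to=None):
        self.name = name
        self.bound_to = bound_to  # master ToyBranch, or None
        self.tip = None


def bind_chain(branch):
    """Return [branch, its master, the master's master, ...]."""
    chain, seen = [], set()
    current = branch
    while current is not None:
        if id(current) in seen:
            raise ValueError("bind cycle at %s" % current.name)
        seen.add(id(current))
        chain.append(current)
        current = current.bound_to
    return chain


def commit(branch, revid):
    # Outermost master first, then back down toward the local branch.
    for b in reversed(bind_chain(branch)):
        b.tip = revid
    return revid


top = ToyBranch("top")
mid = ToyBranch("mid", bound_to=top)
local = ToyBranch("local", bound_to=mid)
commit(local, "rev-1")
print([b.tip for b in (top, mid, local)])
```

A real implementation would also need the out-of-date check bzr already performs for a single bound branch, applied at every level of the chain, plus a plan for failures partway down.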

> OK.

cat > ~/.bazaar/plugins/ << EOF
# quick demo version: more error checking etc may be useful.
from bzrlib import branch, repository

def activate_corporate_store(a_branch):
    # only do this for local branches in the /nfs namespace
    if a_branch.base.startswith('file:///nfs/'):
        repo = repository.Repository.open('file:///nfs/bigrepo')
        # For stacked branches or branches in bigrepo this will add a
        # redundant repo, but this shouldn't matter much most of the
        # time :)
        a_branch.repository.add_fallback_repository(repo)

branch.Branch.hooks.install_named_hook('open', activate_corporate_store,
    'activating master readonly repository')
EOF

> > Second, not only do you want the commit to succeed in the parent
> > repository before the child but you want the commit to succeed in
> > all the repositories or none of them.
> This is a noop.  A commit that succeeds in one repository will succeed
> in *all* related repositories once it gets there ("related" meaning
> that the parent commit(s) is (are) available in all repositories).
> All of the modern DVCSes are based on a *DAG* of commits, with new
> heads being created automatically when different commits have the same
> parent.  This is completely unlike the centralized VCS (CVCS) model,
> where creating a new head requires an explicit branch (or tag -b in
> CVS).

Ack. This applies to bzr and git equally. Less so to hg, because they don't
expose a repository view of the universe (or didn't until recently, and,
as with all systems, baggage has accrued).
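The DAG point quoted above can be made concrete in a few lines of plain Python (a sketch, not any VCS's API): represent history as a map from revision id to parent ids; a head is any revision that no other revision lists as a parent. Two commits on the same parent simply yield two heads, and neither commit is invalid.

```python
def heads(parents):
    """Return the sorted revisions that are not a parent of any other.

    `parents` maps revision id -> tuple of parent revision ids.
    """
    referenced = {p for ps in parents.values() for p in ps}
    return sorted(r for r in parents if r not in referenced)


graph = {"A": (), "B": ("A",)}
# Two people commit on top of B independently; both commits are fine.
graph["C1"] = ("B",)
graph["C2"] = ("B",)
print(heads(graph))
# A merge commit with both heads as parents brings it back to one head.
graph["M"] = ("C1", "C2")
print(heads(graph))
```

Only the question of which head a branch *name* points at needs coordination, which is the point made in the next quoted paragraph.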

> What may conflict is the update of the branch reference, the *name* of
> the head of a sequence of commits.  In a CVCS, there is no place to
> put the commit unless an explicit branch is done, so you need locks
> and atomicity guarantees for the commit as well as the branch update.
> This is not true in a DVCS.  Different DVCSes have different
> strategies here.  Mercurial simply creates a nameless head (actually
> it has the name "tip" but this is conceptually a tag, not a branch),
> and eventually you will be forced to merge it or otherwise handle it
> when you try to communicate with other repositories (because that's
> when you need coherency in names of heads).  git goes this one better
> with reflogs (but similar restrictions on communication that manifest
> somewhat differently, typically as a "rebase disaster").  bzr is
> basically like Mercurial, except that IMO it nags you to merge
> somewhat earlier.

I don't see how reflogs really impact things here: yes, they give you a
way to address the two heads, but you still need to resolve things.
> > There are some features that are needed just for the distributed read
> > support (master-slave) which are also needed for distributed read/write.
> > Repositories need to be chained together for this to work. Suppose
> > you have a master server and a read proxy which is bound to the master. In
> > the DVCS world, you either clone the proxy or check it out. Either way,
> > commit/push doesn't result in updating the master.
> Assuming by "proxy bound to master" you mean "bzr bind", I believe
> you're misunderstanding.  True, currently I don't think you can
> recursively bind to (or checkout from) a branch which is bound to yet
> another branch.  However, if you clone the proxy, AIUI a local commit
> in the clone does not update the proxy or the master, but a push to
> the proxy will update not only the proxy but also the bound master.

Indeed - see above. However, commit and our general APIs have come a
long way since binding was introduced.

Autopropagation across a series of branches or repositories is really
pretty easy to arrange: write a plugin like this:

cat > ~/.bazaar/plugins/ << EOF
# quick demo version: more error checking etc may be useful.
from bzrlib import branch

def push_on_change(params):
    """Push to this branch's push location before permitting a tip change."""
    source = params.branch
    tip = params.new_revid
    location = source.get_push_location()
    if not location:
        return
    target = branch.Branch.open(location)
    source.push(target, stop_revision=tip)

branch.Branch.hooks.install_named_hook('pre_change_branch_tip',
    push_on_change, 'push to the push location on tip change')
EOF


This will make commits, pulls and pushes all push immediately to the
configured push location for the branch. Install that on your servers,
and the servers will push for you; install it on the users' workstations
as well if you want this to happen for file:// accessed repositories, or
for users' local work. Configuration can be added pretty easily too. Hmm,
this *may* interact badly if you have a bound branch bound to its push
location - don't do that ;).
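To see the order in which a push-on-change hook like the one above would fire along a chain of servers, here is a toy simulation in plain Python. ToyBranch and set_tip are hypothetical stand-ins, not bzrlib objects. Because the hook pushes before the local tip changes, the outermost branch is updated first.

```python
class ToyBranch:
    """Hypothetical branch with an optional push location."""

    def __init__(self, name, push_to=None):
        self.name = name
        self.push_to = push_to  # next ToyBranch in the chain, or None
        self.tip = None

    def set_tip(self, revid, log):
        # Like a pre-change hook: push onward *before* accepting the tip.
        if self.push_to is not None:
            self.push_to.set_tip(revid, log)
        self.tip = revid
        log.append(self.name)


master = ToyBranch("master")
proxy = ToyBranch("proxy", push_to=master)
workstation = ToyBranch("workstation", push_to=proxy)
log = []
workstation.set_tip("rev-1", log)
print(log)
```

The toy has no protection against loops: if two branches named each other as push locations, set_tip would recurse without end.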

> > It would be nice if the distribution to proxies were non-blocking
> > as well.
> It had better be; that's what DVCS means.  Unless I don't understand
> what you mean by "blocking" here.

I think Uri means 'if the pushing could happen after the commit or first
push completes'. My little plugin above wouldn't be needed in that case
- simply take the inotify email plugin and modify it to push rather than
send email - e.g. using the code I have above.
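A minimal sketch of the non-blocking variant, assuming Python 3's standard queue and threading modules: the commit path only enqueues a job and returns, and one background worker performs the pushes in order. BackgroundPusher and the push_fn callback are hypothetical names; push_fn stands in for whatever performs the real push, for example the body of push_on_change.

```python
import queue
import threading


class BackgroundPusher:
    """Hypothetical helper: run pushes on a worker thread, in order."""

    def __init__(self, push_fn):
        self._push_fn = push_fn  # callable(location, revid)
        self._jobs = queue.Queue()
        worker = threading.Thread(target=self._run, daemon=True)
        worker.start()

    def schedule(self, location, revid):
        # Returns immediately; the caller does not wait for the push.
        self._jobs.put((location, revid))

    def _run(self):
        while True:
            location, revid = self._jobs.get()
            try:
                self._push_fn(location, revid)
            except Exception as exc:  # keep the worker alive on failure
                print("push to %s failed: %s" % (location, exc))
            finally:
                self._jobs.task_done()

    def wait(self):
        # Block until every scheduled push has been attempted.
        self._jobs.join()


pushed = []
pusher = BackgroundPusher(lambda loc, rev: pushed.append((loc, rev)))
pusher.schedule("hypothetical-proxy/trunk", "rev-1")
pusher.wait()
print(pushed)
```

A single worker keeps pushes in commit order; a failed push is only reported here, so a real deployment would want some retry policy.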

> > With distributed repositories, it is possible that on occasion some
> > or all of the servers will become disconnected and so there needs
> > to be some mechanism for resyncing on reconnection. The options I
> > see here are for the repositories to update on reconnection, a
> > periodic resync, or a check for coherency on checkout/branch. I
> > don't think the first option is compatible with the DVCS philosophy.
> Well, your whole set of requirements is incompatible with DVCS
> philosophy (but Bazaar intends to be more than "just DVCS", so that in
> itself is no problem).  However, given those requirements, update on
> reconnection seems like the obvious solution.  Coherency check on
> checkout/branch is insufficient, you'd really need a coherency check
> on every update, so I don't think that idea will work.

Coherency checks happen on every update with some limits, and database
integrity is maintained all the time, so I don't see that a particular
resync would ever be needed.
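Why no special resync step should be needed, in toy form: revisions are immutable and named by revision id, so after a disconnection two repositories can only differ by revisions one side lacks, and reconciling them is a set difference followed by an ordinary fetch. The missing_revisions function is a hypothetical illustration, not a bzrlib API; diverged branch tips still have to be merged, as discussed above.

```python
def missing_revisions(local, remote):
    """Return (only_local, only_remote) as sorted lists of revision ids.

    `local` and `remote` are collections of revision ids held by two
    repositories.
    """
    local, remote = set(local), set(remote)
    return sorted(local - remote), sorted(remote - local)


server_a = {"r1", "r2", "r3"}
server_b = {"r1", "r2", "r4"}
print(missing_revisions(server_a, server_b))
```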

Uri - what in particular do you mean by resync?

> > Last, the only tool that I've found that can robustly read a CVS repository
> > is cvs2svn. It has support for Bazaar now but I ran into some problems with
> > it. I was able to convert trunk with history fairly easily into Bazaar but
> > when I told it to include all the branches and tags (it doesn't even support
> > specification of specific ones, you get all or nothing) I killed the process
> > after >1 day of running and an 80GB fast import file, which didn't even
> > appear to be remotely near completion.
> In this day and age of terabyte disks, I don't understand why a
> one-time cost of 80GB, or even 800GB, sets you back that way.
> However, it might be an interesting idea to convert content, not to
> fastimport format, but to git object (compressed) format or even git
> packs.

Doable but not a particular improvement IMO - just piping straight
through fastimport should behave very well for git itself, and tolerably
for bzr and other systems.
