Feedback from evaluation in a corporate environment

Robert Collins robertc at robertcollins.net
Fri Jan 8 05:07:29 GMT 2010


On Fri, 2010-01-08 at 04:59 +0900, Stephen J. Turnbull wrote:

> I don't mean to push git on the Bazaar list, but git is the only one
> of the three whose object storage layout and semantics I understand
> well.  Hopefully somebody who knows Bazaar better than I do can point
> out features of Bazaar that achieve the effects you need.

I'll take a stab at it ;). Firstly I'd like to note that bzr's stacking
support at the repository level supports an arbitrary number of fallback
locations: it's entirely possible to write a trivial plugin to add a
lookup to this 10GB store.

> > For distributed read/write support (master-master), Bazaar needs something
>  > like a distributed commit transaction. Bind sort of fits this role but not
>  > quite yet. First, you want for repositories to be able to be bound to
>  > multiple other repositories, not just one.

Repositories do not bind at all. *branches* bind. Branches are the unit
of work for an individual, repositories are just containers.

I don't know why you'd need multiple binding at the branch level, but it
could be done. A simpler thing though - transitive binding - is
reasonably supported, but not for commit: We have """
        # If the master branch is bound, we must fail
        master_bound_location = self.master_branch.get_bound_location()
        if master_bound_location:
            raise errors.CommitToDoubleBoundBranch(self.branch,
                    self.master_branch, master_bound_location)

""" in commit.py. This restriction could be lifted with a modicum of
effort.

> OK.

cat > ~/.bazaar/plugins/bigrepo.py << EOF
# quick demo version: more error checking etc may be useful.
from bzrlib import branch, repository
def activate_corporate_store(branch):
    # only do this for local branches in the /nfs namespace
    if branch.base.startswith('file:///nfs/'):
        repo = repository.Repository.open('file:///nfs/bigrepo')
        # For stacked branches or branches in bigrepo this will add a
        # redundant repo, but this shouldn't matter much most of the
        # time :)
        branch.repository.add_fallback_repository(repo)
branch.Branch.hooks.install_named_hook('open', activate_corporate_store,
    'activating master readonly repository')
EOF
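
To illustrate what a fallback location buys you, here is a toy model of
the lookup order. It is only a sketch: ToyRepository, add_fallback and
get_revision are hypothetical names invented for this example, not
bzrlib APIs, and real repositories store far more than a dict of
strings.

```python
# Toy model only: ToyRepository is hypothetical and is not a bzrlib class.
class ToyRepository:
    def __init__(self, name, revisions=None):
        self.name = name
        self.revisions = dict(revisions or {})
        self.fallbacks = []  # consulted in order when a lookup misses locally

    def add_fallback(self, repo):
        self.fallbacks.append(repo)

    def get_revision(self, revid):
        # Local data wins; otherwise ask each fallback in turn.
        if revid in self.revisions:
            return self.revisions[revid]
        for repo in self.fallbacks:
            try:
                return repo.get_revision(revid)
            except KeyError:
                continue
        raise KeyError(revid)


bigrepo = ToyRepository('bigrepo', {'rev-1': 'old history'})
local = ToyRepository('local', {'rev-2': 'new work'})
local.add_fallback(bigrepo)
```

In this model the local repository only has to hold 'rev-2'; asking it
for 'rev-1' is answered from the big store.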


> > Second, not only do you want the commit to succeed in the parent
> > repository before the child but you want the commit to succeed in
>  > all the repositories or none of them.
> 
> This is a noop.  A commit that succeeds in one repository will succeed
> in *all* related repositories once it gets there ("related" meaning
> that the parent commit(s) is (are) available in all repositories).
> 
> All of the modern DVCSes are based on a *DAG* of commits, with new
> heads being created automatically when different commits have the same
> parent.  This is completely unlike the centralized VCS (CVCS) model,
> where creating a new head requires an explicit branch (or tag -b in
> CVS).

Ack. Applies to bzr and git equally. Less so to hg, because hg doesn't
expose a repository view of the universe (or didn't until recently, and
as with all systems, baggage has accrued).
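
A small sketch of the point about the DAG, under the assumption that a
repository can be modelled as a dict from revision id to parent ids.
The helpers add_commit and heads are made up for this example and do
not correspond to any bzr, git or hg API.

```python
# Toy model: a repository is a dict mapping revision id -> tuple of parent ids.
def add_commit(repo, revid, parents):
    """Store a commit; the only precondition is that its parents are present."""
    missing = [p for p in parents if p not in repo]
    if missing:
        raise KeyError('missing parents: %s' % ', '.join(missing))
    repo[revid] = tuple(parents)


def heads(repo):
    """Return the revisions that no other revision names as a parent."""
    referenced = set(p for parents in repo.values() for p in parents)
    return sorted(r for r in repo if r not in referenced)


repo_a = {'base': ()}
repo_b = {'base': ()}
add_commit(repo_a, 'alice-1', ['base'])  # committed in one repository
add_commit(repo_b, 'bob-1', ['base'])    # committed independently in another
# Copying alice-1 into repo_b succeeds because its parent is already there.
add_commit(repo_b, 'alice-1', repo_a['alice-1'])
```

After the copy, heads(repo_b) reports both 'alice-1' and 'bob-1': the
commit data never conflicts, only the choice of which head a branch
name should point at.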

> What may conflict is the update of the branch reference, the *name* of
> the head of a sequence of commits.  In a CVCS, there is no place to
> put the commit unless an explicit branch is done, so you need locks
> and atomicity guarantees for the commit as well as the branch update.
> This is not true in a DVCS.  Different DVCSes have different
> strategies here.  Mercurial simply creates a nameless head (actually
> it has the name "tip" but this is conceptually a tag, not a branch),
> and eventually you will be forced to merge it or otherwise handle it
> when you try to communicate with other repositories (because that's
> when you need coherency in names of heads).  git goes this one better
> with reflogs (but similar restrictions on communication that manifest
> somewhat differently, typically as a "rebase disaster").  bzr is
> basically like Mercurial, except that IMO it nags you to merge
> somewhat earlier.

I don't see how reflogs really impact things here: yes it gives you a
way to address the two heads, but you still need to resolve things
eventually.

> > There are some features that are needed just for the distributed read
>  > support (master-slave) which are also needed for distributed read/write.
>  > Repositories need to be chained together for this to work. Suppose
>  > you have a master server and a read proxy which is bound to the master. In
>  > the DVCS world, you either clone the proxy or check it out. Either way,
>  > commit/push doesn't result in updating the master.
> 
> Assuming by "proxy bound to master" you mean "bzr bind", I believe
> you're misunderstanding.  True, currently I don't think you can
> recursively bind to (or checkout from) a branch which is bound to yet
> another branch.  However, if you clone the proxy, AIUI a local commit
> in the clone does not update the proxy or the master, but a push to
> the proxy will update not only the proxy but also the bound master.

Indeed - see above. However, commit and our general APIs have come a
long way since binding was introduced.

Autopropagation across a series of branches or repositories is really
pretty easy to arrange: Write a plugin like this:

cat > ~/.bazaar/plugins/pushonchange.py << EOF
# quick demo version: more error checking etc may be useful.
from bzrlib import branch

def push_on_change(params):
    """Push to this branch's push location before permitting a tip change."""
    # Use a distinct local name so the bzrlib 'branch' module stays reachable.
    source = params.branch
    tip = params.new_revid
    location = source.get_push_location()
    if not location:
        return
    target = branch.Branch.open(location)
    source.push(target, stop_revision=tip)

branch.Branch.hooks.install_named_hook('pre_change_branch_tip',
    push_on_change, 'pushing')
EOF

This will make commit, pull and push all push immediately to the
configured push location for the branch. Install that on your servers,
and the servers will push for you; install it on the users' workstations
as well if you want this to happen for file:// accessed repositories, or
for users' local work. Configuration can be added pretty easily too. Hmm,
this *may* interact badly if you have a bound branch bound to its push
location - don't do that ;).
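
The chaining effect can be sketched without bzrlib at all. ToyBranch
below is a hypothetical stand-in written for this example: it assumes
only that each branch has at most one push location and that a tip
change is propagated there before being accepted locally.

```python
# Toy model only: ToyBranch is hypothetical and is not a bzrlib class.
class ToyBranch:
    def __init__(self, name, push_location=None):
        self.name = name
        self.push_location = push_location  # another ToyBranch, or None
        self.tip = None

    def set_tip(self, revid):
        # Push first: if the downstream update raises, the local tip is
        # left unchanged, mirroring a pre-change hook that can veto.
        if self.push_location is not None:
            self.push_location.set_tip(revid)
        self.tip = revid


master = ToyBranch('master')
proxy = ToyBranch('proxy', push_location=master)
work = ToyBranch('work', push_location=proxy)
work.set_tip('rev-1')  # travels work -> proxy -> master
```

Note that a cycle of push locations would recurse without end in this
toy, so a real setup wants the chain to terminate at a branch with no
push location configured.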

>  > It would be nice if the distribution to proxies were non-blocking
>  > as well.
> 
> It had better be; that's what DVCS means.  Unless I don't understand
> what you mean by "blocking" here.

I think Uri means 'if the pushing could happen after the commit or first
push completes'. My little plugin above wouldn't be needed in that case
- simply take the inotify email plugin and modify it to push rather than
send email - e.g. using the code I have above.
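
As a rough sketch of the non-blocking variant, the general shape is a
queue plus a background worker. This is not the inotify plugin and not
bzrlib code: start_push_worker is a name invented here, the push
callable stands in for whatever actually does the push, and the
example uses the Python 3 standard library.

```python
import queue
import threading


def start_push_worker(push):
    """Run push(revid) in the background for every queued revision id."""
    pending = queue.Queue()

    def worker():
        while True:
            revid = pending.get()
            if revid is None:  # sentinel: stop the worker
                pending.task_done()
                break
            try:
                push(revid)
            finally:
                pending.task_done()

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return pending, thread


pushed = []  # stands in for the remote branch
pending, thread = start_push_worker(pushed.append)
for revid in ['rev-1', 'rev-2']:
    pending.put(revid)  # returns immediately; the commit is not held up
pending.put(None)
pending.join()
thread.join()
```

The committer only pays for the put(); the push itself happens
whenever the worker gets to it, in the order the revisions were queued.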

>  > With distributed repositories, it is possible that on occasion some
>  > or all of the servers will become disconnected and so there needs
>  > to be some mechanism for resyncing on reconnection. The options I
>  > see here are for the repositories to update on reconnection, a
>  > periodic resync, or a check for coherency on checkout/branch. I
>  > don't think the first option is compatible with the DVCS philosophy
> 
> Well, your whole set of requirements is incompatible with DVCS
> philosophy (but Bazaar intends to be more than "just DVCS", so that in
> itself is no problem).  However, given those requirements, update on
> reconnection seems like the obvious solution.  Coherency check on
> checkout/branch is insufficient, you'd really need a coherency check
> on every update, so I don't think that idea will work.

Coherency checks happen on every update with some limits, and database
integrity is maintained all the time, so I don't see that a particular
resync would ever be needed.

Uri - what in particular do you mean by resync?

>  > Last, the only tool that I've found that can robustly read a CVS repository
>  > is cvs2svn. It has support for Bazaar now but I ran into some problems with
>  > it. I was able to convert trunk with history fairly easily into Bazaar but
>  > when I told it to include all the branches and tags (it doesn't even support
>  > specification of specific ones, you get all or nothing) I killed the process
>  > after >1 day of running and an 80GB fast import file, which didn't even
>  > appear to be remotely near completion.
> 
> In this day and age of terabyte disks, I don't understand why a
> one-time cost of 80GB, or even 800GB, sets you back that way.
> 
> However, it might be an interesting idea to convert content, not to
> fastimport format, but to git object (compressed) format or even git
> packs.

Doable but not a particular improvement IMO - just piping straight
through fastimport should behave very well for git itself, and tolerably
for bzr and other systems.

-Rob