Experience of centralized workflow with NFS-mounted storage?

Thu Nov 20 16:15:20 GMT 2008

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Mikael Karlsson wrote:
> We're setting up a Bazaar environment with a centralized workflow and we
> wonder if anyone has experience of running a Bazaar server with
> NFS-mounted storage where the branch data is stored?
> 
> Clients will check out data from the server over SSH, like this:
> 
> Branch data <=== NFS ===> Bazaar server <=== SSH ===> Developer
> 
> We could be talking about several hundred users checking out and
> committing simultaneously, and several gigabytes of branch data.
> 
> The actual question is how file locking will work when using NFS? I know
> it doesn't work with CVS but it works with Subversion.
> 
> I've tried to find information in the old mailing list archive but have
> not found anything that gives me answers.
> 
> The obvious reasons to store everything on the SAN are availability
> and security, and also to be able to recover quickly if one server
> fails, since there will be a backup server on standby in case of
> failure.
> 
> I would be pleased if the developers of Bazaar could give me a
> recommendation on whether this is a good idea or not.
> Also if anyone has experience of this, good or bad.
> 
> 
> Regards
> 
> Mikael
> 
> 

To describe how Bazaar works with NFS and locking:

1) We only use OS locks for *working trees*. So if you have a repository
+ branches on NFS everything should work fine. What tends to break is a
working tree on NFS (for example when your home directory is
NFS-mounted), because we take an OS lock on one of the files there
(.bzr/checkout/dirstate).
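
To make "OS lock" concrete, here is a minimal sketch of an exclusive,
non-blocking file lock using Python's fcntl module on a POSIX system.
It is an illustration only, not Bazaar's actual locking code, and the
helper name try_os_lock is hypothetical. Whether such a lock is honoured
between NFS clients depends on the NFS version and lock service in use,
which is why this kind of lock is the usual trouble spot.

```python
import fcntl
import os
import tempfile


def try_os_lock(path):
    """Try to take an exclusive OS lock on `path` without waiting.

    Returns the open file object on success (keep it open to hold the
    lock), or None if the lock is already held through another open file.
    """
    f = open(path, "a+b")
    try:
        # LOCK_NB makes the call fail immediately instead of blocking.
        fcntl.flock(f.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        f.close()
        return None
    return f


# Hypothetical stand-in for a working tree's dirstate file.
demo_path = os.path.join(tempfile.mkdtemp(), "dirstate")
holder = try_os_lock(demo_path)      # first taker gets the lock
contender = try_os_lock(demo_path)   # second taker is refused
holder.close()                       # closing the file releases the lock
```

The lock lives in the kernel (or, on NFS, in the lock service), so it
disappears automatically when the holder exits. That is convenient on a
local disk but is exactly the part that can misbehave on a network mount.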

2) For Branch and Repository, we use directory locking (create a
directory, add a file with your lock code, rename the directory into
place, check if your lock code is the one in that directory.) Generally
this makes it safe for any filesystem.

(The exception is old versions of Twisted's SFTP server, which had a
buggy implementation that caused

    rename(dir, existing)

to be treated as

    rename(dir, existing/dir)

which matches what the "mv" command does, but not what the low-level
rename() OS call does.)
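
The directory-locking steps in point 2 can be sketched roughly as
follows. This is a simplified illustration, not Bazaar's
implementation: the function names, the "info" file name and the use of
a random hex string as the lock code are all assumptions. It relies on
rename() refusing to replace an existing non-empty directory, which is
the behaviour the buggy SFTP server got wrong.

```python
import os
import tempfile
import uuid


def acquire_dir_lock(lock_path):
    """Return our lock code if we took the lock, or None if it is held."""
    lock_path = os.path.abspath(lock_path)
    code = uuid.uuid4().hex
    # 1. Create a private directory next to the final lock location.
    pending = tempfile.mkdtemp(prefix="pending-",
                               dir=os.path.dirname(lock_path))
    info = os.path.join(pending, "info")
    # 2. Add a file containing our lock code.
    with open(info, "w") as f:
        f.write(code)
    # 3. Rename the directory into place. If a held lock (a non-empty
    #    directory) is already there, rename() fails instead of nesting.
    try:
        os.rename(pending, lock_path)
    except OSError:
        os.remove(info)
        os.rmdir(pending)
        return None
    # 4. Check that the lock code now in place is really ours.
    with open(os.path.join(lock_path, "info")) as f:
        return code if f.read() == code else None


def release_dir_lock(lock_path, code):
    """Release a lock previously returned by acquire_dir_lock()."""
    lock_path = os.path.abspath(lock_path)
    with open(os.path.join(lock_path, "info")) as f:
        if f.read() != code:
            raise RuntimeError("lock is held by someone else")
    # Move the directory aside first so the lock name frees up in one step.
    doomed = lock_path + ".releasing-" + code
    os.rename(lock_path, doomed)
    os.remove(os.path.join(doomed, "info"))
    os.rmdir(doomed)
```

Because only mkdir, file creation and rename are involved, nothing here
depends on the filesystem supporting OS-level locks. The trade-off is
that a crashed holder leaves the directory behind, so a stale lock has
to be broken explicitly.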

3) Things that are mounted into the local filesystem are generally
accessed with different patterns than things that are accessed via
bzr+ssh or sftp. For example, if you are accessing via bzr+ssh we buffer
64kB for all index reads, while if you access via file:/// we don't do
any buffering.

This may be better or worse for you. But local access is generally assumed
to have very high bandwidth and very low latency. If that isn't true for
your NFS mounts, then you might consider accessing via bzr+ssh:// which
tries harder to hide latency.
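
As a rough illustration of why buffering helps on a high-latency link,
the sketch below reads in fixed 64kB pages and serves later nearby
reads from memory. It is not Bazaar's index code: the PagedReader class
and the fetch(offset, length) callback are hypothetical, and each call
to fetch stands in for one network round trip.

```python
PAGE_SIZE = 64 * 1024  # 64kB, the figure quoted in point 3


class PagedReader:
    """Serve small reads from cached, page-aligned 64kB fetches."""

    def __init__(self, fetch, page_size=PAGE_SIZE):
        self._fetch = fetch          # fetch(offset, length) -> bytes
        self._page_size = page_size
        self._pages = {}             # page index -> bytes
        self.round_trips = 0

    def _page(self, index):
        if index not in self._pages:
            self.round_trips += 1
            self._pages[index] = self._fetch(index * self._page_size,
                                             self._page_size)
        return self._pages[index]

    def read(self, offset, length):
        out = bytearray()
        end = offset + length
        while offset < end:
            index, start = divmod(offset, self._page_size)
            chunk = self._page(index)[start:start + (end - offset)]
            if not chunk:
                break                # read past the end of the data
            out += chunk
            offset += len(chunk)
        return bytes(out)


# Two small reads that fall in the same page cost a single round trip.
data = bytes(range(256)) * 1024
reader = PagedReader(lambda off, n: data[off:off + n])
first = reader.read(10, 20)
second = reader.read(100, 50)
```

With unbuffered access the same two reads would be two separate
requests, which is harmless on a local disk but adds up quickly if
every request crosses a slow NFS link.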

Also, bzr+ssh:// does have 2 processes running (the local and remote),
which means you share the workload a little bit, but it also means we
have to (re-)serialize the data to send it over the wire.

4) Why mount over NFS versus direct access to the Branch Data? Just
because it is on a SAN which doesn't let you install anything? (This can
be reason enough, I just want to make sure I understand what is going on.)

As for a backup server, it is pretty easy to replicate a Bazaar
repository with a cron script. It could be easier to provide 2 Bazaar
servers rather than a SAN solution. (Except you probably already have
invested the time, effort and money into the SAN solution. :)
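
A cron-driven mirror along these lines might look like the sketch
below. Everything in it is an assumption made for illustration: the
paths and URLs would be your own, the mirror branches are assumed to
exist already on the backup server, and the exact "bzr pull" options
should be checked against "bzr help pull" for your version.

```python
import subprocess


def build_pull_command(mirror_dir, source_url):
    """Return the argv that refreshes one mirror branch from its source."""
    return ["bzr", "pull", "-d", mirror_dir, source_url]


def mirror_all(branches, run=subprocess.call):
    """Pull every (mirror_dir, source_url) pair.

    Returns the list of mirror directories whose pull failed, so the
    caller can report them (cron mails any output to the job's owner).
    """
    failed = []
    for mirror_dir, source_url in branches:
        if run(build_pull_command(mirror_dir, source_url)) != 0:
            failed.append(mirror_dir)
    return failed
```

Saved as a script that calls mirror_all() with your branch list, this
could be run from a crontab entry every few minutes. The `run`
parameter exists only so the logic can be exercised without bzr
installed.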

Bazaar would probably handle multiple "master" repositories slightly
better, though. (You still have problems if 2 people commit to the
"same" branch on each repo, but the repository storage itself would be
quite capable of figuring out what revisions need to be copied.)

To put it a different way: you could scale "up" in the number of
Bazaar servers you have, as long as there was only one official master
for each branch. Users could get read access to branches from all
servers, but write access to branches 1-10 only from master 1, branches
11-20 only from master 2, etc.

This may be more complexity than you want/need. Just mentioning it as a
way to scale up.
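
The "one official master per branch" rule amounts to a fixed mapping
from branch to server. A minimal sketch, assuming branches are simply
numbered and grouped ten per master as in the example above (the
function name and the numbering scheme are hypothetical):

```python
def master_for_branch(branch_number, branches_per_master=10):
    """Return the 1-based number of the master that accepts writes."""
    if branch_number < 1:
        raise ValueError("branch numbers start at 1")
    return (branch_number - 1) // branches_per_master + 1


# Branches 1-10 are written via master 1, 11-20 via master 2, and so on.
write_master = {n: master_for_branch(n) for n in (1, 10, 11, 20, 21)}
```

Reads can go to any server; only the write path needs to consult the
mapping.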

5) I highly recommend using the latest repository format if you aren't
concerned about backwards compatibility (--1.9-rich-root if you can,
--1.9 otherwise).
I'll mention that we've been focusing a lot lately on how we scale to
very large projects, so I would expect at least one more repository
format update.

As mentioned, you can do things like lightweight checkouts, stacked
branches, local shared repositories, etc to help minimize the impact of
very large histories. If you are doing strictly centralized development,
then lightweight checkouts should work well for you. I'll admit that
lightweight checkouts of network repositories haven't had as much
optimization time as some of the other arrangements, mostly because when
you are as distributed as Open Source, you don't have much "local
network" to rely on. :)

6) Asking questions here, and giving us feedback about how things are
working is a great way to help ensure that as Bazaar evolves we continue
to fit your needs and make things better for you. We try to be
responsive, especially if there are "hundreds of developers" being impacted.

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkkljRgACgkQJdeBCYSNAAO9VwCghSaZE0kfsLRdIpvsAxwb/NVf
hDYAn3HVx52pp7VYyP8qweiEVo2K3+P1
=yxW5
-----END PGP SIGNATURE-----