Replacing an expensive proprietary CM system with bzr.

Talden talden at gmail.com
Wed Feb 27 00:56:25 GMT 2008


>  Talden wrote:
>  > Is there a commentary anywhere in the docs on known scaling behaviour?
>  > ...
>  > We have a CVS tree that, in an experimental conversion to Subversion
>  > produced around 25,000 revisions for a working tree with 17,000 files
>  > in 3,500 folders.  The working tree totals (not counting any VCS
>  > book-keeping) 800MB with about 1000 files accounting for 75% of that
>  > volume.
>  >
>  > There are a couple of dozen branches but I would expect 2/3rds of the
>  > revisions to be on HEAD.

John Arbash Meinel wrote:
>  Well, Ian is currently working on importing the OOo repository as "one
>  big tree" which makes it ~75k files, 500k revisions, and several GB of
>  data. (In SVN it is ~50GB on disk, it was smaller than that in CVS.)
>  That will be the largest tree I know of which should give a bit of an
>  upper limit to scaling.
>
>  25k revisions is pretty small (Bazaar itself is 15k). 17k files probably
>  puts "bzr status" at < 2s. (I think on a Moz tree with 50+k files it was
>  2-3s on decent hardware.)

Our revision rate has increased. Although the CVS archive dates back a
decade, we're now adding about 4,000 revisions per year (in Bazaar
terms, not CVS per-file revisions).
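
For a rough sense of scale, those figures can be projected forward. This
is only a sketch: it assumes the commit rate stays constant at 4,000 per
year, which is a guess and not a measurement, and the function names are
made up for illustration.

```python
# Rough projection of revision count, using only figures quoted in this thread.
# Assumption: the commit rate stays constant; in practice it will drift.

def projected_revisions(current: int, per_year: int, years: int) -> int:
    """Return the revision count after `years` at a constant yearly rate."""
    return current + per_year * years


def years_to_reach(current: int, per_year: int, target: int) -> float:
    """Return how many years it takes to reach `target` revisions."""
    if target <= current:
        return 0.0
    return (target - current) / per_year


CURRENT_REVISIONS = 25_000   # from the experimental Subversion conversion
REVISIONS_PER_YEAR = 4_000   # current commit rate
OOO_REVISIONS = 500_000      # the OOo import John mentions as an upper limit

if __name__ == "__main__":
    # Revision count five years from now at the current rate.
    print(projected_revisions(CURRENT_REVISIONS, REVISIONS_PER_YEAR, 5))
    # Years until this tree would match the OOo import's revision count.
    print(years_to_reach(CURRENT_REVISIONS, REVISIONS_PER_YEAR, OOO_REVISIONS))
```

At that rate the tree stays well below the OOo import's revision count
for the foreseeable future.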

>  With 7k files my "bzr status" time is < 1s (0.8s) on 5 year old
>  hardware. That tree is 131MB with 56k revisions (600MB in the repository.)

A quick test shows status taking about 5 seconds on newer hardware, but
that's on WinXP machines with NTFS. A Linux machine with ext2 or ext3
would be considerably quicker on the same hardware.
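
A comparison like this is easier to repeat with a small timing script.
The following is a minimal sketch, not the method used for the numbers
above: `time_command` is a hypothetical helper, and it assumes `bzr` is
on the PATH as a directly executable program.

```python
import subprocess
import time


def time_command(cmd, cwd=None, runs=3):
    """Run `cmd` `runs` times and return the best wall-clock time in seconds.

    Taking the best of several runs mostly measures the warm-cache case;
    the first run after a reboot can be much slower.
    """
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        # Output is discarded; only the elapsed time is of interest here.
        subprocess.run(cmd, cwd=cwd, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, check=False)
        best = min(best, time.perf_counter() - start)
    return best
```

Calling `time_command(["bzr", "status"], cwd=tree_path)` on the same
working tree on each machine, where `tree_path` is wherever the tree is
checked out, gives figures that can be compared directly.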

>  Your tree shape is a bit different (~2x the files, but 8x the bytes).
>
>  Probably my biggest concern would just be having a lot of data as your
>  history grows large. I don't know how much churn your files have (are
>  they big compressed binaries which roll over completely with every minor
>  change, or are they just big text files that have 2 lines change in a
>  given commit...)

Our changes are incremental and there's little movement in the
binaries, but a revision would typically change more than 10 files,
each with varying amounts of diff data.  We've added about 5000 of the
files in the last two years, but we're now seeing older files walk the
plank, so this rate will decrease dramatically.

>  If your history is 2GB, that makes initial copy to a new machine have to
>  download 2GB of data. We are working on Shallow Branches that wouldn't
>  copy the data, but then you don't have the history locally for
>  introspection, etc.

The SVN repo was much smaller than the CVS it was sourced from (about
60%) because early on in the CVS history there were a lot of binary
changes, which SVN stores as binary diffs.  The SVN repo was ~1.5GB.
Since we won't have all of it in one branch, I'm assuming the Bazaar
branches will be lighter than the 1.5GB + 800MB working tree.
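
The sizes quoted in this thread work out roughly as follows. This is
back-of-the-envelope arithmetic only: it treats the Subversion repository
size as a stand-in for Bazaar history, which is an assumption since the
two store data differently, and the helper names are made up for
illustration.

```python
# Back-of-the-envelope sizes from the figures quoted in this thread.
# All inputs are approximate; MB and KB are treated as decimal units.

TREE_MB = 800          # working tree, excluding VCS book-keeping
TOTAL_FILES = 17_000
LARGE_FILES = 1_000    # files said to hold about 75% of the volume
LARGE_SHARE = 0.75
SVN_REPO_MB = 1_500    # size of the experimental Subversion repository


def tree_breakdown(tree_mb, total_files, large_files, large_share):
    """Return (mean MB per large file, mean KB per remaining file)."""
    large_mb = tree_mb * large_share
    small_mb = tree_mb - large_mb
    small_files = total_files - large_files
    return large_mb / large_files, small_mb * 1000 / small_files


def full_copy_ceiling_mb(repo_mb, tree_mb):
    """History plus one checked-out working tree, as a rough ceiling."""
    return repo_mb + tree_mb


if __name__ == "__main__":
    mean_large_mb, mean_small_kb = tree_breakdown(
        TREE_MB, TOTAL_FILES, LARGE_FILES, LARGE_SHARE)
    print("large files: ~%.1f MB each" % mean_large_mb)
    print("other files: ~%.1f KB each" % mean_small_kb)
    print("ceiling: %d MB" % full_copy_ceiling_mb(SVN_REPO_MB, TREE_MB))
```

So the bulk of the files are small, and the 2.3GB figure is a ceiling
for one full copy rather than a prediction.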

>  In general, I don't think your tree is in the "running into problems"
>  size yet. It might be just outside of the "simple comfort" zone.
>
>  John

CVS is so far outside the comfort zone it's not funny.  I still have
to sell the PHBs on Bazaar but I'm hopeful I can demonstrate the
necessary benefits.

Changes I'd like to see:
- Some equivalent (or better) of the Hg forest extensions
- EOL conversion (Subversion provides the right amount of VCS
  involvement, I think)
- Subversion-style property support and a distributable form of the
  Subversion autoprops feature.

--
Talden


