Bazaar repository size benchmarks

Ian Clatworthy ian.clatworthy at internode.on.net
Mon Jun 2 00:40:36 BST 2008


Pieter de Bie wrote:
> Hello,
> 
> I did some benchmarks on repository size for repositories with full
> history. I compared the size of Git, Bazaar as well as Mercurial
> repositories. The results of this experiment can be seen here:
> http://vcscompare.blogspot.com/2008/06/git-mercurial-bazaar-repository-size.html

Pieter,

Thanks for the benchmarks. Can you please make the repositories available,
particularly the Bazaar ones? I did some (completely unrelated) space analysis
experiments on the weekend looking to tune how bzr-fastimport decides when to
store an inventory full-text. FWIW, fastimport is not as smart as the core
Bazaar code in deciding when to do this, so fastimport gives worse results
than a team would see using Bazaar normally. That was an explicit design
decision in the name of making fastimport faster.

Some other comments:

* Recent versions of fastimport implicitly pack and delete the obsolete packs
  so you don't need to do that by hand any more.

* The SpaceEfficiency benchmarks I published don't just look at a single branch
  but several. It's very common for developers to have at *least* two - the
  current trunk and the current released code. If using feature branches, the
  number can climb a lot. The space taken up by the working trees themselves
  becomes important in that case, and being able to save space across them - as
  Bazaar lets you do using the --hardlink option to bzr branch - can yield
  large savings. (Git tends to encourage one working tree and 'switch' instead
  of multiple, but that doesn't always fit in with how everyone likes to work.)
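
As a rough illustration of why hard-linked working trees save space, the
sketch below creates one file and a hard link to it, then checks that both
names refer to the same on-disk data. It only demonstrates the underlying
filesystem mechanism; it is not how bzr implements --hardlink, and the file
names are hypothetical stand-ins for the same file in two working trees.

```python
import os
import tempfile


def hardlink_demo():
    """Create a file and a hard link to it; report whether they share storage."""
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical names standing in for one file in two working trees.
        original = os.path.join(tmp, "trunk_file.txt")
        linked = os.path.join(tmp, "feature_file.txt")
        with open(original, "w") as f:
            f.write("same content in both working trees\n")
        # A hard link adds a second name for the same data; nothing is copied.
        os.link(original, linked)
        a, b = os.stat(original), os.stat(linked)
        same_inode = (a.st_ino == b.st_ino) and (a.st_dev == b.st_dev)
        return same_inode, a.st_nlink
```

If the filesystem supports hard links, both names share one inode, so the
content is stored once no matter how many trees reference it.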

I'd like to analyse where the space is being used in the repositories you've
generated. I *think* it's probably in inventories but I'd like to confirm that.
To give an idea of the difference possible, on one test repository with 6000+
revisions (wordpress), I'm seeing a variation from 17MB to 78MB depending on
how often inventory fulltexts are stored. The current fastimport algorithm -
create an inventory fulltext every 200 and only every 200 - gives a repository
size of 17.5MB so it's acceptable on my test repository but could well be
lousy on other data sets.
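
To make the trade-off concrete, here is a minimal sketch of a "fulltext every
N, deltas otherwise" policy. It is a toy model, not the fastimport code: the
function name, the fixed fulltext and delta sizes, and the sample numbers are
all assumptions for illustration, and it does not reproduce the 17MB to 78MB
figures above.

```python
def estimate_inventory_storage(num_revisions, fulltext_bytes, delta_bytes, interval):
    """Estimate storage when a fulltext is stored every `interval` revisions.

    All other revisions are assumed to store a fixed-size delta.
    Returns (total_bytes, longest_delta_chain).
    """
    if num_revisions < 0 or interval < 1:
        raise ValueError("num_revisions must be >= 0 and interval >= 1")
    # Revisions 0, interval, 2*interval, ... get a fulltext.
    fulltexts = (num_revisions + interval - 1) // interval
    deltas = num_revisions - fulltexts
    total_bytes = fulltexts * fulltext_bytes + deltas * delta_bytes
    # Worst case to reconstruct: the last revision before the next fulltext.
    longest_chain = max(min(interval, num_revisions) - 1, 0)
    return total_bytes, longest_chain


if __name__ == "__main__":
    # Hypothetical sizes: 10 kB per fulltext, 200 bytes per delta.
    for interval in (1, 10, 50, 200, 1000):
        size, chain = estimate_inventory_storage(6000, 10_000, 200, interval)
        print(f"interval={interval:5d}  size={size / 1e6:7.2f} MB  longest chain={chain}")
```

In this model a larger interval always shrinks the total, but it lengthens
the chain of deltas that must be applied to rebuild an inventory. Real
inventories and deltas vary in size, which is why a fixed interval could be
fine on one data set and poor on another.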

Ian C.


