Bazaar repository size benchmarks

Pieter de Bie frimmirf at gmail.com
Mon Jun 2 11:08:39 BST 2008


On Mon, Jun 2, 2008 at 1:40 AM, Ian Clatworthy
<ian.clatworthy at internode.on.net> wrote:
>
> Pieter,
>
> Thanks for the benchmarks. Can you please make the repositories available,
> particularly the Bazaar ones? I did some (completely unrelated) space analysis
> experiments on the weekend looking to tune how bzr-fastimport decides when to
> store an inventory full-text.

I will take a look at what I can put online; the repositories are
quite big and I don't have a fast upload. Perhaps I can upload some of
the smaller repos (that is, not the mozilla one)?

> FWIW, fastimport is not as smart as the core
> Bazaar code in deciding when to do this, so fastimport gives worse results
> than a team using Bazaar normally. That was an explicit design decision
> in the name of making fastimport faster.

Git's fast-import behaves in much the same way: it writes poorly
packed packs in order to be faster. That is also why I ran git
repack -adf after the conversion. If Bazaar has a similar option, I
can try it, but I couldn't find one.

> Some other comments:
>
> * Recent versions of fastimport implicitly pack and delete the obsolete packs
>  so you don't need to do that by hand any more.

Yes, that is what I found too, but I thought it couldn't hurt :)

> * The SpaceEfficiency benchmarks I published don't just look at a single branch
>  but several. It's very common for developers to have at *least* two - the
>  current trunk and the current released code. If using feature branches, the
>  number can climb a lot. The space taken up by the working trees themselves
>  becomes important in that case and being able to save space across them - as
>  Bazaar lets you do using the --hardlink option to bzr branch - can make some
>  large savings. (Git tends to encourage one working tree and 'switch' instead
>  of multiple but that doesn't always fit in with how everyone likes to work.)

Do you mean you use hard links for the working tree? Won't that change
the file in both working trees if you use an editor or command that
modifies files in place (as opposed to deleting and recreating them)?
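
To make the concern concrete, here is a minimal sketch of how hard
links behave, assuming a filesystem that supports them. The file names
and the hardlink_demo function are made up for illustration; this is
not how bzr itself manages hard-linked trees.

```python
import os
import tempfile


def hardlink_demo():
    """Show in-place edits vs. write-and-rename on a hard-linked file."""
    with tempfile.TemporaryDirectory() as d:
        # Hypothetical stand-ins for the same file in two working trees.
        a = os.path.join(d, "tree_a.txt")
        b = os.path.join(d, "tree_b.txt")
        with open(a, "w") as f:
            f.write("original\n")
        os.link(a, b)  # both names now refer to the same data on disk

        # In-place edit: opening for write truncates the shared file,
        # so the change is visible through both names.
        with open(b, "w") as f:
            f.write("edited in place\n")
        with open(a) as f:
            after_in_place = f.read()

        # Write-and-rename: a new file replaces the name "b" only,
        # so "a" keeps the old content and the link is broken.
        tmp = b + ".tmp"
        with open(tmp, "w") as f:
            f.write("replaced\n")
        os.replace(tmp, b)
        with open(a) as f:
            after_replace = f.read()

        return after_in_place, after_replace
```

Whether a given editor shares changes across trees therefore depends on
which of the two strategies it uses when saving.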

I may look into how branching is used. I don't think it is a useful
thing to benchmark, as you can simply calculate how big a branch will
get once you have this kind of information. But comparing how branching
works in the different DVCSs, and comparing their disk usage per
branch, might still be interesting. I think comparing _only_ disk usage
when the approaches to branching differ so much between the systems is
misleading and not very useful.


> I'd like to analyse where the space is being used in the repositories you've
> generated. I *think* it's probably in inventories but I'd like to confirm that.
> To give an idea of the difference possible, on one test repository with 6000+
> revisions (wordpress), I'm seeing a variation from 17MB to 78MB depending on
> how often inventory fulltexts are stored. The current fastimport algorithm -
> create an inventory fulltext every 200 and only every 200 - gives a repository
> size of 17.5MB so it's acceptable on my test repository but could well be
> lousy on other data sets.
>
> Ian C.
>
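
The trade-off Ian describes can be sketched with a back-of-the-envelope
model, assuming every revision stores either a full inventory text or a
delta against the previous one. The function name and the byte sizes
used below are hypothetical, not measurements, and this is not the
fastimport code itself.

```python
def estimated_inventory_bytes(revisions, fulltext_bytes, delta_bytes, interval):
    """Rough size estimate when a fulltext is stored every `interval` revisions.

    All other revisions are assumed to store a delta of fixed size.
    """
    # Revisions 0, interval, 2*interval, ... get a fulltext.
    fulltexts = (revisions + interval - 1) // interval
    deltas = revisions - fulltexts
    return fulltexts * fulltext_bytes + deltas * delta_bytes


# Hypothetical sizes: 100 kB per fulltext, 1 kB per delta, 6000 revisions.
every_200 = estimated_inventory_bytes(6000, 100_000, 1_000, 200)
every_1 = estimated_inventory_bytes(6000, 100_000, 1_000, 1)
```

The model only covers space; it says nothing about the time needed to
rebuild an inventory from a long chain of deltas.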


