Bazaar repository size benchmarks

Pieter de Bie frimmirf at gmail.com
Tue Jun 3 13:27:52 BST 2008


On Tue, Jun 3, 2008 at 1:56 PM, Ian Clatworthy
<ian.clatworthy at internode.on.net> wrote:

> I'm sure your benchmarking was sound but I'm curious about a few of
> the results. The Mozilla benchmark doesn't gel with the figures given
> in http://www.infoq.com/articles/dvcs-guide. Any ideas why? I'd also
> like to know more about the heritage of the Emacs repo. Was it
> converted by bzr-svn or by fastimport? The former uses a different
> scheme for revision-ids. I wonder if bzr-fastexport/bzr-fastimport
> would change the size or not?

I have looked into this a bit. As for the Mozilla benchmark, I think
that they might have used the cvs-trunk-import as opposed to the
mozilla-central repository I used. That would explain why they used "a
snapshot of 12456 changesets (from 20080303, 70853 total revisions
from the hg Repository)" whereas my mozilla-central repository is just
15000 revisions with full history.

The Mercurial size is perhaps so high because they did something
equivalent to "hg uncommit", which did not decrease the repository
size. Bazaar imported only the 12456 changesets, thereby giving a
lower repository size than the Mercurial one. For the Git import, they
did a repack, but not one quite as aggressive as mine (which really
makes a difference after using git-fast-import), which explains the
relatively high repository size for Git.

For the Emacs repository, I used the Bazaar repository that is
available here: http://bzr.notengoamigos.org. In retrospect, I don't
know if the choice of Emacs was good, as this is just a CVS import
and does not contain any branching / merging or rename tracking, which
might influence the results. I wanted to put a really large repository
in the test, but Emacs might not have been the right choice. Also, I
don't know how this repository was created.

> FWIW, I tweaked bzr-fastimport yesterday to have a new parameter that
> controls how often an inventory fulltext is stored. That might prove
> useful on repositories with deep histories. I also noticed in my
> experiments that bzr-fastimport can give repos of quite different
> sizes for *some* front-ends depending on how it's run. In particular:
>
>  bzr fast-import ../wordpress.fi -> 17.5MB
>  bzr fast-import ../wordpress.fi --info ../wordpress.cfg -> 20.5MB
>
> That's *bad* and indicates a bug, possibly in the blob caching?
> I have no idea whether that bug is impacting your results or not.
> I'm really busy for the next week or two so I honestly can't look
> into this right now. I do promise to come back to fastimport and
> space efficiency in general once other priorities are addressed.
> If anyone wants to jump in and investigate more before then,
> please go ahead.

I did not use the --info parameter so I don't know how this would have
influenced my results. The importing process took a long time, with
the Python-based im/exporters being a factor of 10-100 slower than
Git's, so I don't feel like importing it again :)
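
For anyone who does want to re-check sizes such as the two wordpress
figures quoted above, here is a minimal sketch of one way to measure
them. It is not the method used for the benchmarks in this thread: the
names dir_size_bytes and relative_growth are made up for illustration,
and it simply sums file sizes under a directory, which may differ from
what a tool like "du" reports.

```python
import os


def dir_size_bytes(root):
    """Sum the sizes of all regular files under root (symlinks are skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def relative_growth(base, other):
    """Return how much larger `other` is than `base`, as a fraction of `base`."""
    return (other - base) / base


if __name__ == "__main__":
    # The two sizes quoted above, in MB (without and with --info).
    print(f"{relative_growth(17.5, 20.5):.1%}")  # prints 17.1%
```

With the quoted figures, the --info run comes out roughly 17% larger.
To compare two real imports, dir_size_bytes could be called on each
repository directory and the results passed to relative_growth.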


