bzr-usertest - a benchmarking toolkit for Bazaar and other command-line (VCS) tools
Ian Clatworthy
ian.clatworthy at internode.on.net
Mon Mar 31 01:45:58 BST 2008
Teemu Likonen wrote:
> I did testing from this mentioned point of view: I dropped
> emacs-22.2.tar.gz into git and bzr repositories (add+commit), ensured
> that the repos are fully packed and then checked with "du -sh" how much
> the .git and .bzr directories weight. With bzr I ensured that there are
> no .bzr/repository/obsolete_packs. I got this:
>
> .git 39 MB
> .bzr 41 MB
>
> With Linux 2.6.24.4 tarball:
>
> .git 68 MB
> .bzr 78 MB
>
> I don't have a bzr repository of Linux kernel with full history but
> I have git and bzr repos of GNU Emacs. Repos aren't exactly the same as
> the git repo has over 2000 commits more. I don't know if there are some
> other things that may cause them to be uncomparable. Anyway, this is
> what I got:
>
> .git 166 MB
> .bzr 293 MB
Thanks for these numbers. The more hard data we have to work from, the
better. Some comments ...
I'm gradually working towards getting bzr-usertest capable of spitting
out figures like this, so we can easily re-run the benchmarks from
release to release, from OS to OS. Please consider extending
bzr-usertest to incorporate the steps you took.
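As a rough sketch of the measurement step quoted above (this is not part
of bzr-usertest; the function names are made up for illustration), the
size of the .git and .bzr control directories could be collected like
this. Note that it sums apparent file sizes, whereas "du -sh" reports
disk blocks used, so the figures will differ slightly.

```python
import os


def dir_size_bytes(root):
    """Sum the apparent sizes of all files below root (symlinks not followed)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            total += os.lstat(os.path.join(dirpath, name)).st_size
    return total


def control_dir_sizes(project_root, names=(".git", ".bzr")):
    """Return {control_dir_name: size_in_bytes} for each one that exists."""
    sizes = {}
    for name in names:
        path = os.path.join(project_root, name)
        if os.path.isdir(path):
            sizes[name] = dir_size_bytes(path)
    return sizes


if __name__ == "__main__":
    # Report the control directories of the current directory in MB.
    for name, size in control_dir_sizes(".").items():
        print(f"{name} {size / 1e6:.0f} MB")
```

Repacking the repositories (and clearing .bzr/repository/obsolete_packs)
would still need to be done beforehand, as in the original test.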
One of the main reasons why git is currently more efficient than bzr for
deep repos (Emacs is 90k+ revisions, right?) is that bzr stores a
fulltext copy of the inventory every 200 revisions by default. In the
short term, one way to get a smaller repository is to do the import into
Bazaar using bzr-fastimport - it lets you specify a different inventory
fulltext frequency (every 2000 for example). In the medium term, Robert
Collins has patches for a more efficient inventory format that will make
a difference here as well.
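To put rough numbers on the fulltext frequency, here is a
back-of-the-envelope calculation. It only counts how many full inventory
copies end up stored; it says nothing about how large each copy is or how
the deltas in between grow, so treat it as an illustration rather than a
size prediction. The revision count is the approximate Emacs figure
mentioned above.

```python
def fulltext_count(revisions, interval):
    """Approximate number of full inventory copies stored when one
    fulltext is written every `interval` revisions."""
    return revisions // interval


REVISIONS = 90_000  # approximate depth of the Emacs history (assumption)

for interval in (200, 2000):
    print(interval, fulltext_count(REVISIONS, interval))
# prints:
# 200 450
# 2000 45
```

So moving from every 200 to every 2000 revisions cuts the number of
stored fulltexts by a factor of ten, at the cost of longer delta chains
between them.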
More generally, the heritage of a Bazaar repo import can have a
noticeable impact on overall size thanks to the way different importers
use revision-ids differently. Once again, I know that deep repositories
generated by bzr-fastimport (which uses random ids like Bazaar does
normally) tend to be a fair amount smaller than those generated by the
bzr-git plugin (which uses deterministic ids based on the git SHAs). For
that reason, I think it is advisable to note how the Bazaar repo was
generated when including it in benchmarks, and to be aware that bzr-git
(and I'm guessing bzr-svn) produces repos that aren't size optimised.
The other important data I'd like to see you mention is the size of the
working trees for the various projects. In my space benchmarking, I've
been careful to explicitly include them when comparing the tools. In my
experience, the working tree sizes often dominate the overall figures.
That's one of the reasons why Bazaar 1.3 now includes a --hardlink
option on the branch (and checkout) commands that lets advanced users
hardlink files across working trees. That can save huge amounts even
with just 2 working trees, let alone for developers working in
feature-branch style.
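When measuring several working trees, hardlinked files should only be
counted once. The sketch below is one possible way to do that, assuming a
filesystem that reports inode numbers; the function name and the list of
skipped control directories are illustrative choices, not anything taken
from bzr or bzr-usertest.

```python
import os


def working_tree_usage(roots, skip=(".bzr", ".git")):
    """Return (naive_total, deduplicated_total) in bytes across the trees.

    naive_total adds up every file in every tree; deduplicated_total
    counts files that share an inode (hardlinks) only once.
    """
    seen = set()
    naive = 0
    unique = 0
    for root in roots:
        for dirpath, dirnames, filenames in os.walk(root):
            # Prune control directories so only working tree files count.
            dirnames[:] = [d for d in dirnames if d not in skip]
            for name in filenames:
                st = os.lstat(os.path.join(dirpath, name))
                naive += st.st_size
                key = (st.st_dev, st.st_ino)
                if key not in seen:
                    seen.add(key)
                    unique += st.st_size
    return naive, unique
```

The difference between the two totals is the space saved by hardlinking,
for example across trees created with the --hardlink option.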
Ian C.