Bazaar-NG vs. Mercurial -- speed comparison

John A Meinel john at arbash-meinel.com
Sat May 13 07:44:08 BST 2006


Diwaker Gupta wrote:
> Bryan O'Sullivan (of Mercurial) recently posted this benchmark:
> <http://lists.freestandards.org/pipermail/lsb-futures/2006-May/002080.html>
> 
> I know that some of the speed difference is due to the fact that bzr
> doesn't need a specialized server at the other end point -- it can
> pull natively over HTTP, SFTP and so on. But in that case, I think we
> should emphasize this point strongly in the feature list. As someone
> who is new to both systems, the above speed comparison numbers will
> easily bias one towards Mercurial. I'm not saying Mercurial is bad --
> I use it on a daily basis and it's *great*. All I'm saying is that bzr
> should play to its strengths.
> 
> Diwaker

I wanted to post a bit of a rebuttal to this. But first I would like to
say that Mercurial really does show itself to be a fast little system.

First, Mercurial uses a custom server rather than working off of
'plain' http/sftp. This gives it a huge latency advantage: over a dumb
transport the client pays roughly one round trip per file it fetches,
while a smart server can stream everything back in a single response.
The custom server has drawbacks, too: it is one more thing that needs
to be set up, holes have to be opened in firewalls, etc. Though
honestly, 'hg serve' isn't really hard to set up, and it is something
we want to pay attention to. I think we want to support such a server,
just not require it.
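
To make the latency point concrete, here is a back-of-the-envelope
model in Python. All of the numbers (round-trip time, file count,
bandwidth) are assumptions chosen for illustration; the point is just
that a dumb transport's time scales with the number of requests rather
than the number of bytes:

# toy model: dumb transport vs. smart server (all numbers are guesses)
round_trip = 0.05             # assume 50ms of latency per request
files_fetched = 2000          # assume ~2000 small files make up a branch
bytes_total = 28 * 1024**2    # ~28MB of repository data (see du below)
bandwidth = 10.0 * 1024**2    # assume 10MB/s of throughput

dumb = files_fetched * round_trip + bytes_total / bandwidth
smart = 1 * round_trip + bytes_total / bandwidth
print("dumb transport: %6.1fs" % dumb)   # ~102.8s, dominated by latency
print("smart server:   %6.1fs" % smart)  # ~2.9s, dominated by bandwidth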

Second, Mercurial uses a Python extension (C code) to do diff & patch,
which can certainly be a bottleneck for bzr (though I'm thinking bzr's
bottleneck might be its use of XML, and that certainly used to be the
case when it handled weaves). I should revisit my performance testing
with knits. Mercurial has used 'revfiles' for a long time, which are
very similar to knits.
We might consider adopting a similar C diff & patch, though it would
mean requiring some sort of build stage for bzr. Right now it is very
nice that bzr works straight from the source tree. (Also, I don't
install gcc on one of my production servers, which makes Mercurial
difficult to install there, and that is where I host some repositories.)
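
For anyone who hasn't looked at the storage layer, here is a toy
sketch in Python of the delta-chain idea that revfiles and knits
share: store one full base text, then only the changed hunks for each
later revision. The DeltaChain class is mine and purely illustrative,
not either tool's real format; the win Mercurial gets is from doing
the equivalent patch application in C:

import difflib

class DeltaChain:
    """Toy delta-chain store: one full base text plus per-revision
    line deltas. Illustrative only; not bzr's or hg's actual format."""

    def __init__(self, base_lines):
        self.base = list(base_lines)
        self.deltas = []   # deltas[i] turns revision i into revision i+1

    def add(self, new_lines):
        old = self.get(len(self.deltas))
        matcher = difflib.SequenceMatcher(None, old, new_lines)
        # store only the changed hunks: (start, end, replacement lines)
        self.deltas.append(
            [(i1, i2, new_lines[j1:j2])
             for op, i1, i2, j1, j2 in matcher.get_opcodes()
             if op != 'equal'])

    def get(self, revno):
        """Reconstruct revision number 'revno' (0 is the base text)."""
        lines = list(self.base)
        for delta in self.deltas[:revno]:
            # apply hunks last-to-first so earlier offsets stay valid
            for i1, i2, repl in reversed(delta):
                lines[i1:i2] = repl
        return lines

store = DeltaChain(["a\n", "b\n", "c\n"])
store.add(["a\n", "B\n", "c\n", "d\n"])
assert store.get(1) == ["a\n", "B\n", "c\n", "d\n"]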

hg also imports a lot less code by default. So a plain 'hg root' takes
only 0.1s, while 'bzr root' takes 1.1s. If you switch to bzrtools'
'bzr shell' command, which keeps the bzrlib code in memory, commands
like 'bzr root' become very fast again. So again, we could look at
what we import, and when. I did a cleanup at one point which helped
simplistic tests like that one, though in the long run we found that
delayed imports can be costly inside inner loops.

Bzr could be better about not loading support for all of its features
until they are actually needed. hg does this with a module called
'demandload', which we could probably move directly into the bzr code.
I'm not sure how much runtime overhead it imposes, though I do see at
least one extra __getattribute__ function call per module access.
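
As a sketch of the general technique (the idea only, not hg's actual
demandload code, which is more elaborate), a lazy module proxy can
look something like this:

import importlib

class DemandModule(object):
    """Stand-in that imports the real module on first attribute access.
    A minimal sketch of the demand-loading idea."""

    def __init__(self, name):
        self._name = name
        self._module = None

    def __getattr__(self, attr):
        # only invoked when normal lookup fails, i.e. for attributes of
        # the real module; _name/_module resolve without recursion
        if self._module is None:
            self._module = importlib.import_module(self._name)
        return getattr(self._module, attr)

# instead of paying for the import at startup:
minidom = DemandModule('xml.dom.minidom')
# ...the real import only happens on first use:
doc = minidom.parseString('<root/>')

The runtime cost mentioned above comes from exactly this: every
attribute access on the proxy goes through one extra Python-level
call, even after the real module has been loaded.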


As to the specific benchmarks...
Timing an 'hg clone' of the hg code isn't quite the same as timing a
copy of the bzr.dev code.
bzr.dev has 5002 revisions, while hg has 2253 changesets (4553 changes).
So there is at least a factor of 2 there. Not huge, but not trivial.

Also, there is the raw amount of data:
$ du -ksh bzr.dev/ mercurial/
28M     bzr.dev
5.5M    mercurial

Mercurial has 2MB of source files, and bzr.dev has 4MB. So while the
Mercurial repository is about 1/5th the size of bzr's, it also covers
1/2 the code and 1/2 the revisions. Scale it up by both factors and you
get roughly 4 x 5.5M = 22M against bzr's 28M, so I'm guessing it is
only compressing slightly better than bzr.


In a local network, this is what I get:

$ time hg clone http://juju.arbash-meinel.com:8000/
real    0m18.448s  user    0m5.906s   sys     0m4.346s

$ time bzr get http://bzr.arbash-meinel.com/mirrors/bzr/bzr.dev/ http
real    1m49.052s  user    0m34.059s  sys     0m10.676s

$ time bzr get sftp://juju/srv/bzr/public/mirrors/bzr/bzr.dev/ sftp
real    1m41.964s  user    0m36.068s  sys     0m10.979s

So bzr still needs to do some catching up, but on a local network it is
only about 6x slower. (Honestly, I thought sftp would handily beat
http; I don't know whether this result is good or bad :)

For comparison, the theoretical maximum speed for bzr would be a raw
copy of the repository:
$ time rsync -av juju:/srv/bzr/public/mirrors/bzr/.bzr/ xxx
real    0m15.583s  user    0m1.890s   sys     0m2.482s

hg does a lot better, but it also isn't copying nearly as much data around:
$ time rsync -av juju:dev/hg/mercurial
real    0m3.940s  user    0m0.381s    sys     0m0.849s

hg is 4-5 times slower than rsync, while bzr is 6.5 times slower.


Remote network:
$ time hg clone http://catharsis.i-clic.uihc.uiowa.edu:8000/ tmp
real    0m17.449s  user    0m5.931s   sys     0m4.561s

$ time bzr get http://src.i-clic.uihc.uiowa.edu/bzr/bzr/bzr.dev xxx
real    6m25.014s  user    0m32.949s  sys     0m9.863s

$ time bzr get sftp://src/srv/bzr/bzr/bzr.dev/ yyy
real    7m46.863s  user    0m43.455s  sys     0m13.142s

This shows hg as about 21x faster, though you still have to account for
the fact that hg is copying only about 1/5th as much data.


Local clone:
hg uses hardlinks when copying its repositories locally, which means it
saves disk space and definitely saves time (on Linux).
The downside is that FAT32 doesn't support hardlinks at all.
Technically NTFS does, though some people are wary of using them there
(I think they should be fine, but I don't have much experience myself).
I *do* know that on Mac OS X (HFS+) they are abysmal. Coming from Arch,
which loved hardlinked revlibs, I could measure how much slower it was
to use hardlinks than to plainly copy. (It comes down to how HFS+
stores hardlink entries: *badly*.)
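
For reference, the hardlink-with-fallback trick is simple enough to
sketch in Python (this shows the general technique; the function is
mine, not hg's actual clone code):

import os
import shutil

def link_or_copy_tree(src, dst):
    """Clone a store of immutable files, hardlinking where possible.
    A sketch of the general technique, not hg's actual clone code."""
    for root, dirs, files in os.walk(src):
        target = os.path.join(dst, os.path.relpath(root, src))
        os.makedirs(target, exist_ok=True)
        for name in files:
            try:
                # instant, and both names share one copy on disk
                os.link(os.path.join(root, name),
                        os.path.join(target, name))
            except OSError:
                # FAT32 and friends: fall back to a real copy
                shutil.copy2(os.path.join(root, name),
                             os.path.join(target, name))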

bzr decided to use a different solution: shared repositories.

So to be fair, let's try this:

$ \time hg clone mercurial/ xxx
2.20user 0.36system 0:02.58elapsed

$ \time bzr init-repo bzr
1.24user 0.17system 0:01.41elapsed

$ cd bzr/
$ \time bzr branch ~/bzr/mirrors/bzr/bzr.dev/ bzr.dev
51.69user 4.19system 0:59.35elapsed

(Yes, this first copy is much slower)

$ \time bzr branch bzr.dev bzr-dev2
Branched 1706 revision(s).

1.71user 0.23system 0:01.97elapsed

Notice that this time we actually create a new branch faster than hg,
without using hardlinks, and in a system that has approximately 1.1s of
startup overhead. Also, this can be done remotely, so you can create a
new branch over sftp with very little overhead. (I don't know whether
sftp supports hardlinking.) Though I suppose since hg uses a custom
server, they could still make branching on the remote end cheap.

Now, I admit to a little bit of cheating, in that I didn't create
working trees here. That is how you would run a public repository
(where it keeps new branches fairly cheap), but it isn't what you would
do for local work, so let me go fix that:

$ rm .bzr/repository/no-working-trees
$ \time bzr branch bzr.dev bzr-dev3
12.37user 1.17system 0:14.43elapsed

Creating the working tree is pretty slow in bzr, and it is something we
need to look closely at.


So I think hg definitely does have a lot of things we should look
closely at. But I did want to make people aware that hg isn't 30x faster
than bzr. In many cases it is more in the 4-7x range.

John
=:->
