Bazaar-NG vs. Mercurial -- speed comparison
John Arbash Meinel
john at arbash-meinel.com
Thu May 18 21:52:09 BST 2006
Bryan O'Sullivan wrote:
> On 5/18/06, Jan Hudec <bulb at ucw.cz> wrote:
>
>> No, it's not a plain http. It's a mercurial protocol over http and
>> requires mercurial server.
>
> No, you can serve a plain repository over HTTP (i.e. just the files in
> .hg) without a CGI server. It's just quite slow (i.e. much slower than
> using the CGI), so we don't push it as a feature.
And I think this is a very valid statement. I think it would be nice for
bzr to also support a more advanced protocol (and this is indeed in our
TODO list), but be able to fall back to plain http. In the meantime, I
think we want to get plain http support as fast as possible, since this
will also likely speed up ftp and sftp support, and means people don't
need to do anything on the server end.
>> What I don't know is how knits and revlogs compare in number of blocks
>> in the scatter/gather read request.
>
> My observation was that knit files and indices seem to be bigger than
> our files (i.e. .bzr is almost 2x the size of .hg when storing the
> same data), so I don't know how they compare on individual accesses,
> but more data on disk presumably translates to more reading at some
> point.
>
Well, there have been a few specific design differences.
1) revlog index files are fixed-size binary chunks, while
   knit index files are ASCII text delimited by ':\n'.
   We chose this because:
   a) You can open up a .kndx file in a text editor, which is good while
      debugging.
   b) Our revision ids are not fixed size like revlog's. So while we could
      pick a size which should contain everything, it isn't guaranteed.
2) knit files are chunks compressed with gzip rather than zlib (which I
   think is what revlog uses).
   a) You pay about 10% for this; in return you can do
        zcat foo.knit | vim -
      and read the raw data. This is good both for debugging and
      because, if there were ever a problem, there are still common tools
      which would give you access to the original data.
   b) We annotate each line with the complete revision id, whereas revlog
      doesn't annotate at all. At one point we annotated with a
      dictionary-compressed integer, but the full revision ids were only
      10% bigger after gzip compression, and using them means you don't
      have to modify the annotations when you merge them into a different
      branch (so you don't have to uncompress them at all).
   c) mercurial uses a binary delta algorithm. I assume this means
      that it stores deltas that are smaller than one line. So if I do:
        some long sentence with a small typeo
        =>
        some long sentence with a small typo
      mercurial can just store the 'remove "e"' while bzr will store a
      line delta, which requires the whole line.
   d) Our inventory is still XML, and we store all attributes on a single
      line, which means that for any change to a file, all of the
      attributes for that file are saved again. This only affects
      inventory.knit, but it has a pretty large effect on it. (I did some
      testing about changing the inventory format, and found you could get
      rather large savings if you were careful about the format and how it
      interacted with the delta algorithm.)
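
To make point 2c concrete, here is a rough Python sketch. It uses difflib as a stand-in for both delta styles; it is not the actual delta code of either mercurial or bzr, and the function names are made up for illustration. It only counts how much new text each style of delta would have to carry for the typo example, under the assumption stated in 2c that a binary delta can work below line granularity.

```python
import difflib


def line_delta_payload(old_lines, new_lines):
    """Characters of new text a line-based delta has to store.

    Any changed line is carried in full, as in a line delta.
    """
    sm = difflib.SequenceMatcher(None, old_lines, new_lines, autojunk=False)
    return sum(
        len("".join(new_lines[j1:j2]))
        for tag, _i1, _i2, j1, j2 in sm.get_opcodes()
        if tag in ("replace", "insert")
    )


def char_delta_payload(old_text, new_text):
    """Characters of new text a character-level delta has to store.

    Hypothetical stand-in for a sub-line (binary) delta; pure deletions
    carry no new text at all.
    """
    sm = difflib.SequenceMatcher(None, old_text, new_text, autojunk=False)
    return sum(
        j2 - j1
        for tag, _i1, _i2, j1, j2 in sm.get_opcodes()
        if tag in ("replace", "insert")
    )


if __name__ == "__main__":
    # Sample file contents made up around the example sentence.
    old = ["first line\n", "some long sentence with a small typeo\n", "last line\n"]
    new = ["first line\n", "some long sentence with a small typo\n", "last line\n"]
    print("line delta carries:", line_delta_payload(old, new), "chars")
    print("char delta carries:", char_delta_payload("".join(old), "".join(new)), "chars")
```

Both functions ignore the bookkeeping (offsets, lengths) a real delta format also stores, so the numbers only show the relative cost of resending a whole line versus a sub-line edit.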
Anyway, we should be aware of what the differences are, so we can decide
which ones we want to keep, and what we can get rid of. Just to say, I
wouldn't be surprised if bzr's knits take 50% more space, but I would be
surprised if it was >2x.
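
On the gzip versus zlib cost in 2a: part of it is per-chunk framing, since a gzip member has a larger header and trailer than a zlib stream wrapped around the same deflate data. The following is a minimal sketch using Python's standard gzip and zlib modules on made-up sample text; the helper name is hypothetical, and real knit chunks will give different numbers.

```python
import gzip
import zlib


def framed_sizes(chunk, level=9):
    """Return (gzip_size, zlib_size) for one independently compressed chunk."""
    gz = gzip.compress(chunk, compresslevel=level)
    zl = zlib.compress(chunk, level)
    # Both containers must round-trip to the same bytes.
    assert gzip.decompress(gz) == chunk
    assert zlib.decompress(zl) == chunk
    return len(gz), len(zl)


if __name__ == "__main__":
    # Made-up sample data, not taken from a real repository.
    small = b"some long sentence with a small typo\n"
    large = small * 200
    for name, chunk in (("small", small), ("large", large)):
        gz, zl = framed_sizes(chunk)
        print(f"{name}: gzip={gz} zlib={zl} extra={gz - zl} bytes")
```

The extra bytes are a fixed cost per chunk, so the relative overhead is largest when each stored chunk is small.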
John
=:->