robertc at robertcollins.net
Mon Jan 19 05:45:31 GMT 2009
I've been working up a benchmark for compression.
It's a little rough, so only go peeking at this if you intend to hack on it.
It runs against the inventories in one of my repos today. In summary:
3.4GB of raw data
141MB in knits
500KB in gc
gc takes 31 seconds to compress
knits take more - typically 121 seconds
gc is currently slower than knit to decompress: 16ms vs 5ms (average).
Much of this overhead is decompression: knits cap the compressed size
between full texts at twice the compressed size of a text, while the gc
code is currently creating a single group with all 22874 texts in it.
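To illustrate the cost (a hypothetical sketch using plain zlib, not the
actual groupcompress extraction code; extract_from_group is a made-up
helper): pulling one text out of a single compressed group means
decompressing everything that precedes it, so late entries in a
22874-text group pay close to the whole group's decompression cost.

import zlib

# Hypothetical sketch: the texts are concatenated and compressed as one
# group; extracting the text at (start, length) requires decompressing
# every byte up to the end of that text, however deep in the group it is.
def extract_from_group(compressed_group, start, length):
    decomp = zlib.decompressobj()
    raw = decomp.decompress(compressed_group, start + length)
    return raw[start:start + length]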
I'm going to investigate adding a cap on the raw size, at twice the raw
input size, and see how that changes the size/performance tradeoffs.
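Roughly the shape of what I mean (assumed semantics only, not the current
groupcompress code; build_groups is an illustrative name): close off the
current group and start a new one once the raw bytes already in it exceed
twice the raw size of the text being added, which bounds how much has to
be decompressed to reach any one text.

import zlib

# Assumed semantics, not the actual implementation: accumulate raw texts
# into a group and cut a new group once the accumulated raw size passes
# twice the raw size of the incoming text.
def build_groups(texts, cap_factor=2):
    groups, current, current_raw = [], [], 0
    for text in texts:
        if current and current_raw > cap_factor * len(text):
            groups.append(zlib.compress(b''.join(current)))
            current, current_raw = [], 0
        current.append(text)
        current_raw += len(text)
    if current:
        groups.append(zlib.compress(b''.join(current)))
    return groups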
Example output (there was some system noise during the knit compression,
so those numbers are somewhat inflated):
$ bzr compressbench
Extracting corpus to test with
Corpus size 3461880213 bytes in 22874 texts
Testing gc compression
compression finished in 31.5650629997 seconds. 0.00137995379032 s/text
Compressed size 508555
Testing knit compression
compression finished in 235.808960915 seconds. 0.0103090391237 s/text
Compressed size 141152371
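The measurement itself is nothing fancy; a minimal stand-in for the loop
(not the actual compressbench plugin; bench and compress_all are
illustrative names) would just time a compressor over the corpus and
report the same three figures as above.

import time

# Minimal stand-in for the benchmark loop (not the real plugin): time a
# compressor over the corpus and print wall time, seconds per text, and
# total compressed size.
def bench(name, compress_all, texts):
    start = time.time()
    compressed_size = compress_all(texts)
    elapsed = time.time() - start
    print("Testing %s compression" % name)
    print("compression finished in %s seconds. %s s/text"
          % (elapsed, elapsed / len(texts)))
    print("Compressed size %d" % compressed_size)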
GPG key available at: <http://www.robertcollins.net/keys.txt>.