Robert Collins robertc at
Mon Jan 19 05:45:31 GMT 2009

I've been working up a benchmark for compression.

It's a little rough, so only peek at it if you intend to hack
on it.

It runs against the inventories in one of my repos today. In summary:
3.4GB of raw data
141MB in knits
500KB in gc

gc takes 31 seconds to compress
knits take more - typically 121 seconds

gc is currently slower than knit to decompress: 16ms vs 5ms (average).
Much of this overhead is decompression: knits cap the data between full
texts at twice the compressed size of a text, while gc is creating a
single group containing all 22874 texts.

I'm going to investigate capping each group's raw size at twice the raw
input size, and see how that changes the size/performance tradeoffs.

Example output (there was some system noise during the knit compression,
so take these numbers with a grain of salt):
$ bzr compressbench
Extracting corpus to test with
Corpus size 3461880213 bytes in 22874 texts
Testing gc compression
compression finished in 31.5650629997 seconds. 0.00137995379032 s/text
Compressed size 508555
min 0.00339198112488
max 0.0679371356964
avg 0.0165324013394
dev 0.00705757324336
min 7.37465481619e-05
max 0.00147705104043
avg 0.000359438182797
dev 0.000153441792845
Testing knit compression
compression finished in 235.808960915 seconds. 0.0103090391237 s/text
Compressed size 141152371
min 0.00239777565002
max 0.674364089966
avg 0.00578923002139
dev 0.0128237186079
min 5.21310912254e-05
max 0.0146616452181
avg 0.000125866187008
dev 0.000278806086212
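The min/max/avg/dev figures above read as plain summary statistics over
repeated per-text timings. A minimal helper in that shape (a sketch, not
the benchmark's actual code; the function name and sampling loop are
assumptions):

```python
import math
import time

def time_stats(fn, repeats):
    """Time fn() `repeats` times; return (min, max, avg, stddev) in seconds."""
    samples = []
    for _ in range(repeats):
        start = time.time()
        fn()
        samples.append(time.time() - start)
    avg = sum(samples) / len(samples)
    # Population standard deviation of the samples.
    dev = math.sqrt(sum((s - avg) ** 2 for s in samples) / len(samples))
    return min(samples), max(samples), avg, dev
```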


More information about the bazaar mailing list