Recompressing knits

Aaron Bentley aaron.bentley at utoronto.ca
Sat Aug 26 19:05:04 BST 2006


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Matthieu Moy wrote:
> Hi,
> 
> Just a remark which might allow a performance gain of a few
> percent.
> 
> Today, bzr's knit format is a concatenation of gzipped hunks. It has
> many advantages (in particular, the famous "append-only property"),
> but it's not as space-efficient as a globally gzipped file.
> 
> For example, on the knit file for builtins.py, I get this:
> 
> $ wc -c builtins.py-20050830033751-fc01482b9ca23183.knit recompressed.gz 
> 1566727 builtins.py-20050830033751-fc01482b9ca23183.knit
> 1363079 recompressed.gz
> 
> where recompressed.gz is the result of gunzip+gzip on the knit file.
> 
> It might be good to have a command like "bzr optimize-repository"
> which could (optionally) be run from time to time (as a cron job, for
> example) to do this gunzip+gzip operation.
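To make the size difference concrete, here is a minimal Python sketch (Python because bzr itself is written in it). The hunks are made-up stand-ins, not real knit records, and the layout is simplified to "one gzip member per hunk, concatenated". It only illustrates why gunzip+gzip saves space: a single stream can reuse text shared between neighbouring hunks and pays the gzip header/trailer cost once, while separate members cannot.

```python
import gzip

# Hypothetical stand-ins for knit hunks: each "version" shares most of its
# lines with its neighbours, as successive revisions of a file usually do.
hunks = [
    "".join(f"def cmd_{i}(self):\n    return {i}\n" for i in range(k, k + 50)).encode()
    for k in range(0, 500, 10)
]

# Simplified knit-style layout: every hunk is its own gzip member, and the
# members are simply concatenated into one file.
per_hunk = b"".join(gzip.compress(h) for h in hunks)

# The proposed "gunzip+gzip": gzip.decompress() decodes all concatenated
# members back to the original bytes, then the result is recompressed as
# one single stream.
plain = gzip.decompress(per_hunk)
single = gzip.compress(plain)

print(len(per_hunk), len(single))
```

The exact byte counts depend on the data and the zlib build, so none are quoted here; the single stream comes out smaller for the same reason recompressed.gz does above.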

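We don't usually gunzip the entire file.  We open the knit index, seek
to the piece we want, and gunzip just that.  This should be faster than
gunzipping the whole file.  Gzipping the whole file as one piece would
likely slow things down, because a gzip stream can't be entered in the
middle: reading any one piece would mean decompressing everything before
it.  It would also break existing clients, which expect each piece to be
gzipped separately.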
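The access pattern described above can be sketched the same way. Here the index is a hypothetical dict mapping a version name to the (offset, length) of its gzip member; bzr's real knit index is a separate file with its own format, so treat this purely as an illustration. Appending a member never rewrites earlier bytes (the append-only property), and reading one version decompresses only that version's bytes. The hypothetical `read_from_single_stream` shows the alternative: with one gzip stream, `GzipFile.seek()` in read mode can only reach a position by decompressing everything before it.

```python
import gzip
import io


def append_hunk(knit: io.BytesIO, index: dict, version: str, text: bytes) -> None:
    """Append `text` as a new gzip member and record where it landed."""
    member = gzip.compress(text)
    knit.seek(0, io.SEEK_END)  # append-only: earlier bytes are never rewritten
    index[version] = (knit.tell(), len(member))
    knit.write(member)


def read_hunk(knit: io.BytesIO, index: dict, version: str) -> bytes:
    """Seek straight to one member and decompress only that member."""
    offset, length = index[version]
    knit.seek(offset)
    return gzip.decompress(knit.read(length))


def read_from_single_stream(blob: bytes, start: int, size: int) -> bytes:
    """With one gzip stream there is no shortcut: seeking forward in read
    mode decompresses (and discards) everything before `start`."""
    with gzip.GzipFile(fileobj=io.BytesIO(blob)) as f:
        f.seek(start)
        return f.read(size)


knit = io.BytesIO()
index = {}  # hypothetical: version name -> (offset, length) of its gzip member
for n in range(3):
    append_hunk(knit, index, f"rev-{n}", f"contents of revision {n}\n".encode())

print(read_hunk(knit, index, "rev-1"))  # b'contents of revision 1\n'

# The "optimized" single-stream file can only be addressed by plain-text
# offset, and reaching that offset costs a decompression of the prefix.
single = gzip.compress(gzip.decompress(knit.getvalue()))
start = len(b"contents of revision 0\n")
print(read_from_single_stream(single, start, start))  # b'contents of revision 1\n'
```

With three tiny hunks the difference is invisible, but the cost of the single-stream read grows with the position of the wanted piece, while the indexed read stays proportional to the piece itself.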

Aaron
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFE8I1Q0F+nu1YWqI0RApffAJ9cBXdjam3KzYYsFErikdH5/pfG/QCdHyuq
ictzSfcmGFQ3PFu4jNWEn18=
=2830
-----END PGP SIGNATURE-----




More information about the bazaar mailing list