Recompressing knits

John Arbash Meinel john at arbash-meinel.com
Sat Aug 26 14:10:36 BST 2006


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Matthieu Moy wrote:
> Hi,
> 
> Just a remark which might allow a gain of a few percent in
> performance.
> 
> Today, bzr's knit format is a concatenation of gzipped hunks. It has
> many advantages (in particular, the famous "append-only property"),
> but it's not as efficient as a globally zipped file.
> 
> For example, on the knit file for builtins.py, I get this:
> 
> $ wc -c builtins.py-20050830033751-fc01482b9ca23183.knit recompressed.gz 
> 1566727 builtins.py-20050830033751-fc01482b9ca23183.knit
> 1363079 recompressed.gz
> 
> where recompressed.gz is the result of gunzip+gzip on the knit file.
> 
> It might be good to have a command like "bzr optimize-repository"
> which could (optionally) be run from time to time (cron job for
> example), to do this "gunzip+gzip" operation.
> 

But then you lose the ability to download only pieces of the knit. You
have to download the whole compressed chunk to extract pieces from the
middle.
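
To make the size trade-off concrete, here is a minimal sketch of the
"gunzip+gzip" operation on a concatenation of gzipped hunks. The sample
hunks are invented stand-ins, not real knit data, and the helper names
are hypothetical; actual savings depend on the content.

```python
import gzip

# Invented stand-ins for knit hunks: similar text, as successive
# revisions of one file tend to be.
hunks = [
    ("def cmd_%d(self):\n    return self.run(%d)\n" % (i, i)).encode() * 20
    for i in range(50)
]

def per_hunk(hunks):
    """Concatenate independently gzipped hunks (append-only layout)."""
    return b"".join(gzip.compress(h) for h in hunks)

def recompress(data):
    """gunzip + gzip: decode every gzip member, then compress as one stream."""
    return gzip.compress(gzip.decompress(data))

knit = per_hunk(hunks)
whole = recompress(knit)
print(len(knit), len(whole))
```

The single stream is smaller because the compressor can reuse matches
across hunks, but nothing in the middle of it can be decoded without
reading from the start.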

Right now, because of the index, if you already have entries 1, 2, 5, and
7, you can make a request for just 3, 4, and 6.
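
A rough sketch of that idea, assuming an index that maps each entry to
the byte offset and length of its gzipped hunk. This is an illustration
only, not bzr's actual knit or index format; build_knit and fetch are
hypothetical names.

```python
import gzip

def build_knit(records):
    """Return (data, index); index maps entry id -> (offset, length)."""
    data = bytearray()
    index = {}
    for rec_id, text in records:
        blob = gzip.compress(text)          # each hunk is its own gzip member
        index[rec_id] = (len(data), len(blob))
        data += blob
    return bytes(data), index

def fetch(data, index, wanted):
    """Read only the byte ranges of the wanted entries and decompress them."""
    out = {}
    for rec_id in wanted:
        offset, length = index[rec_id]
        out[rec_id] = gzip.decompress(data[offset:offset + length])
    return out

records = [(i, b"text of revision %d\n" % i) for i in range(1, 8)]
data, index = build_knit(records)

have = {1, 2, 5, 7}
missing = sorted(set(index) - have)         # entries still to be requested
fetched = fetch(data, index, missing)
```

Over a network, each slice in fetch would correspond to a byte-range
request, so only the missing hunks are transferred.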

It would probably be possible to change how knits work, such that you
can combine things into larger compressed hunks, and then bzr will
download the large hunk, and just extract the pieces it needs. (It might
be reasonable to do something like this for old, seldom-accessed revisions.)

But it is a pretty big change.
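
For illustration, the grouped layout described above might look roughly
like this. It is a sketch under assumptions, not a proposed format: the
group size, the index shape, and the names pack_groups and extract are
all hypothetical.

```python
import gzip

def pack_groups(records, group_size):
    """Compress entries in groups.

    Returns (blocks, index); index maps entry id -> (block_no, start, end),
    where start/end are offsets into the block's uncompressed bytes.
    """
    blocks, index = [], {}
    for i in range(0, len(records), group_size):
        raw = bytearray()
        for rec_id, text in records[i:i + group_size]:
            index[rec_id] = (len(blocks), len(raw), len(raw) + len(text))
            raw += text
        blocks.append(gzip.compress(bytes(raw)))
    return blocks, index

def extract(blocks, index, rec_id):
    """Fetch the whole block holding rec_id, then slice out that entry."""
    block_no, start, end = index[rec_id]
    return gzip.decompress(blocks[block_no])[start:end]

records = [(i, b"text of revision %d\n" % i) for i in range(1, 8)]
blocks, index = pack_groups(records, group_size=4)
piece = extract(blocks, index, 6)
```

The cost is visible in extract: getting one entry means transferring and
decompressing its entire block, which is why this fits old, rarely read
revisions better than recent ones.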

John
=:->

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.0 (Darwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFE8Eg8JdeBCYSNAAMRAn4qAJ96omoq5yiFXMhuIHYGKdfTMrWDEQCfey+W
RJbtVJR86DAUsr/jh/2MYqM=
=rJwn
-----END PGP SIGNATURE-----



