Compressing packages with bzip2 instead of gzip?

Phillip Susi psusi at cfl.rr.com
Thu Jan 19 03:07:54 GMT 2006


Aigars Mahinovs wrote:

>That would require either a huge amount of disk space or additional
>software on the server + a significant load on the server. That is no
>better than the rsync situation today as this is simply not acceptable
>to most mirrors.

An Apache module could be written to decompress the deb on demand, cache
it, and serve the decompressed contents out of the cache.  That would
not require much CPU (assuming a good cache hit rate) or disk space on
the server, just some added software.  In the end it would save disk
space on the servers, because the packages could be compressed with
7zip instead of gzip, and it would save bandwidth, because people would
not be downloading nearly as much data.
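
To make the decompress-on-demand idea concrete, here is a minimal sketch
in Python.  It is not an Apache module; the class name DecompressCache,
its open() method, and the cache key scheme are all made up for
illustration, and Python's lzma module (xz format) stands in for 7zip.

```python
import hashlib
import lzma
import os
import shutil
import tempfile


class DecompressCache:
    """Hypothetical decompress-on-demand cache (illustration only)."""

    def __init__(self, cache_dir):
        self.cache_dir = cache_dir
        self.hits = 0
        self.misses = 0
        os.makedirs(cache_dir, exist_ok=True)

    def _cache_path(self, compressed_path):
        # Key on path, mtime and size so a replaced package is not
        # served from a stale cache entry.
        st = os.stat(compressed_path)
        key = f"{os.path.abspath(compressed_path)}:{st.st_mtime_ns}:{st.st_size}"
        digest = hashlib.sha256(key.encode()).hexdigest()
        return os.path.join(self.cache_dir, digest)

    def open(self, compressed_path):
        """Return a binary file object over the decompressed contents."""
        cached = self._cache_path(compressed_path)
        if os.path.exists(cached):
            self.hits += 1  # CPU is only spent on the first request
        else:
            self.misses += 1
            # Decompress into a temp file, then rename it into place so
            # readers never see a half-written cache entry.
            fd, tmp = tempfile.mkstemp(dir=self.cache_dir)
            with os.fdopen(fd, "wb") as dst, lzma.open(compressed_path, "rb") as src:
                shutil.copyfileobj(src, dst)
            os.replace(tmp, cached)
        return open(cached, "rb")
```

Only the first request for a package pays the decompression cost; later
requests are plain file reads.  A real server would also need to evict
old cache entries, which this sketch leaves out.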


John C. McCabe-Dansted wrote:

>This does *not* require that the uncompressed tar be stored on the server. In 
>fact the zsync manual recommends that you do not decompress the tar, so that 
>zsync can read the smaller compressed data.

The zsync manual is foolish then.  The tar does need to be
uncompressed on the server, because you want to zsync the uncompressed
data, not the compressed data.  Change one byte in the uncompressed
data and the change ripples through the compressed stream from there to
the end of the block, so zsync has to send a LOT more data.  The gzip
--rsyncable option makes gzip use small blocks to contain the ripples,
but if you want good compression, you compress the entire file in one
pass with a better algorithm like LZMA, and then zsync won't work very
well because of the massive change ripples.
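
The ripple effect can be checked with a rough sketch like the one
below.  It uses Python's zlib (deflate, the algorithm behind gzip) on
made-up pseudo-text standing in for data.tar, changes one byte in the
middle, and counts how many fixed-offset 1024-byte blocks differ before
and after compression.  The helper names, the payload, and the block
size are assumptions for illustration, and the fixed-offset comparison
is a simplification of zsync's rolling-checksum matching.

```python
import random
import zlib

BLOCK = 1024  # comparison block size in bytes (arbitrary for this sketch)


def make_payload(size=200_000, seed=0):
    """Build deterministic, compressible pseudo-text standing in for data.tar."""
    rng = random.Random(seed)
    words = [b"deb", b"control", b"data", b"tar", b"usr", b"lib", b"share", b"doc"]
    out = bytearray()
    while len(out) < size:
        out += rng.choice(words) + b" "
    return bytes(out[:size])


def changed_blocks(a, b, block=BLOCK):
    """Count fixed-offset blocks that differ between two byte strings."""
    n = max(len(a), len(b))
    return sum(a[i:i + block] != b[i:i + block] for i in range(0, n, block))


def ripple_demo():
    """Return (changed plain blocks, changed compressed blocks)."""
    original = make_payload()
    edited = bytearray(original)
    edited[len(edited) // 2] ^= 0xFF  # change exactly one byte in the middle
    edited = bytes(edited)
    plain = changed_blocks(original, edited)
    # Compress each version in one pass, with no --rsyncable style resets.
    packed = changed_blocks(zlib.compress(original, 9), zlib.compress(edited, 9))
    return plain, packed


if __name__ == "__main__":
    plain, packed = ripple_demo()
    print(f"uncompressed blocks changed: {plain}")
    print(f"compressed blocks changed:   {packed}")
```

In the uncompressed data exactly one block changes.  In the compressed
streams more than one block changes, even though the compressed data is
much smaller overall, and those are the blocks zsync would have to
fetch again.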

>Zsync uses a clever trick which I think goes basically like this:
>
>$gzip_opts=detect which options were used to create data.tar.gz
>tail -f data.tar data.tar | gzip $gzip_opts > data.tar.gz &
>if next block b matches
>then 
>	cat b >> data.tar
>else
>	read data from http://server/data.tar.gz until there is enough data in the 
>local copy of data.tar.gz to reconstruct block b. 
>fi


That code contradicts your original statement by using the uncompressed 
tars, which you said the server did not need to store.  Other than that, 
I can't make much sense out of it. 





