Compressing packages with bzip2 instead of gzip?

Phillip Susi psusi at cfl.rr.com
Wed Jan 18 17:04:31 GMT 2006


Paul Sladen wrote:
> 
> Assuming there is a small change and zsync, there are three options:
> 
>   gzip, treated as data:  Download ~100% of file
>   gzip with look-inside:  Download ~30%-40% of file
>   gzip with --rsyncable:  Download ~ 5%-10% of file
> 


You got that backwards.  gzip with look-inside will give smaller deltas
than gzip with --rsyncable.  The look-inside method is also usable on
bzip2 and 7zip, so you can get better compression.
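
To make "look inside" concrete, here is a rough Python sketch of the idea.
This is not zsync's actual code; the block size, the use of MD5 and the
file names are only placeholders.  The point is that the comparison runs
over the uncompressed tar data, which is why the same trick works whether
the outer compressor is gzip, bzip2 or 7zip:

import gzip       # only used in the commented example at the bottom
import hashlib

BLOCK = 4096      # placeholder block size; zsync chooses its own


def block_hashes(old_data, block=BLOCK):
    """Hash every fixed-size block of the old (uncompressed) payload."""
    return {hashlib.md5(old_data[i:i + block]).digest(): i
            for i in range(0, len(old_data), block)}


def reusable_bytes(old_data, new_data, block=BLOCK):
    """Count how many bytes of the new payload already exist as whole
    blocks of the old payload -- those never have to be downloaded."""
    known = block_hashes(old_data, block)
    reused, i = 0, 0
    while i + block <= len(new_data):
        if hashlib.md5(new_data[i:i + block]).digest() in known:
            reused += block
            i += block   # matched a whole block, skip past it
        else:
            i += 1       # slide one byte; zsync uses a cheap rolling
                         # checksum here so it never rehashes from scratch
    return reused


# "Look inside" means decompressing both members first, so a small change
# in data.tar shows up as a small delta even though the compressed bytes
# of the two .tar.gz files have nothing in common:
#   old_tar = gzip.decompress(open("old_data.tar.gz", "rb").read())
#   new_tar = gzip.decompress(open("new_data.tar.gz", "rb").read())
#   print(reusable_bytes(old_tar, new_tar))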


> (zsync needs teaching where to find the start of the two zlib streams within
> a .deb for the second two.  The .debs need repacking for the last one).
> 

Also not true.  zsync does not need teaching to find the zlib streams
unless it is going to look inside them.  If it isn't looking inside the
gzip streams, then it is syncing the entire deb, which is just an ar
archive of the two gzipped tars, so there are only a few extra bytes of
header, which won't bother zsync at all.  You also don't need to repack
the debs for the last option, since dpkg already basically builds with
--rsyncable.
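
For anyone who hasn't looked inside one: the outer layer of a .deb really
is that trivial.  Here is a small Python sketch that walks the ar headers;
the member names in the comment are the usual ones and the package file
name is just a placeholder:

def ar_members(path):
    """Yield (name, data_offset, size) for every member of an ar archive,
    which is all a .deb is at the outermost layer."""
    with open(path, "rb") as f:
        if f.read(8) != b"!<arch>\n":
            raise ValueError("not an ar archive")
        while True:
            header = f.read(60)             # fixed-size ar member header
            if len(header) < 60:
                break
            name = header[0:16].decode("ascii").strip()
            size = int(header[48:58].decode("ascii").strip())
            yield name, f.tell(), size
            f.seek(size + (size & 1), 1)    # member data is 2-byte aligned


# A stock package gives something like:
#   debian-binary    offset 68    4 bytes  (just the format version)
#   control.tar.gz   offset 132   the metadata and maintainer scripts
#   data.tar.gz      offset ...   the bulk of the package
# i.e. apart from a couple hundred bytes of ar headers, the whole .deb
# *is* the two gzip streams, so whole-file zsync sees them as they are.
for name, offset, size in ar_members("example_1.0-1_i386.deb"):
    print(name, offset, size)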

> 
> Zsync can be taught that restart points occur every 900kB, likewise zsync
> could be taught to find the restart markers in LZMA, but there's even less
> of those.  You have a choice of optimising for two cases:

The data stream DOESN'T restart every 900 KB; that's the problem.
That's what gzip --rsyncable does, and bzip2 and 7zip don't support it
(and don't want to, since it sacrifices compression).  When you do
force data restarts every x KB, zsync doesn't need any special handling
to sync them efficiently: the restarts themselves contain the fallout
from a small change in the original uncompressed stream, which means
there is less changed data for zsync to find and send.
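
To illustrate what those restarts actually buy you, here is a rough Python
sketch of the --rsyncable idea on top of zlib.  The window size, the
boundary rule and the use of raw deflate are my own choices for the demo,
not the exact constants from the gzip patch; the point is just that a full
flush resets the compressor, so a small edit only disturbs one chunk of
the compressed output:

import zlib
from collections import deque

WINDOW = 4096   # rolling-sum window, in the spirit of gzip --rsyncable


def _raw_deflater():
    # raw deflate (negative wbits): no header and no checksum trailer,
    # which keeps the tails of the streams comparable in the demo below
    return zlib.compressobj(6, zlib.DEFLATED, -zlib.MAX_WBITS)


def rsyncable_compress(data):
    """Deflate `data`, doing a full flush whenever a rolling sum over the
    last WINDOW bytes hits a content-defined boundary.  Each flush resets
    the compressor, so output after a boundary no longer depends on what
    came before it -- a small edit stays contained in one chunk."""
    comp = _raw_deflater()
    out, window, rolling, start = [], deque(maxlen=WINDOW), 0, 0
    for i, byte in enumerate(data):
        if len(window) == WINDOW:
            rolling -= window[0]            # byte falling out of the window
        window.append(byte)
        rolling += byte
        if len(window) == WINDOW and rolling % WINDOW == 0:
            out.append(comp.compress(data[start:i + 1]))
            out.append(comp.flush(zlib.Z_FULL_FLUSH))   # the "restart"
            start = i + 1
    out.append(comp.compress(data[start:]))
    out.append(comp.flush())
    return b"".join(out)


def plain_compress(data):
    comp = _raw_deflater()
    return comp.compress(data) + comp.flush()


def common_suffix(a, b):
    """How many bytes at the end of two streams are identical."""
    n = 0
    while n < min(len(a), len(b)) and a[-1 - n] == b[-1 - n]:
        n += 1
    return n


# One-byte edit near the front of ~1 MB of compressible data: the plain
# stream diverges from the edit onwards, while the flushed stream goes
# back to being byte-identical after the next boundary.
old = b"".join(b"Package: example-%06d\n" % i for i in range(40000))
new = old[:50] + b"!" + old[51:]
print(common_suffix(plain_compress(old), plain_compress(new)))          # tiny
print(common_suffix(rsyncable_compress(old), rsyncable_compress(new)))  # large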


> 
>   Smallest size of .debs on the CD image
>   Least bandwidth used during a network upgrade
> 
> Pick either.
> 


I'll take both ;)


>> My solution would be to just use bsdiff patches against data.tar and 
>> control.tar.
> 
> This is O(N^2), right?  Whereas zsync is O(1).
> 
> Yes, 'apt-torrent' could be used to fetch the chunks (instead of partial
> HTTP), once 'zsync' has figured out what they are.
> 
> 	-Paul

apt-torrent doesn't make any sense: torrents need to be rather large to
be effective, and packages are rather small, so fetching individual
packages over torrents would take forever.

BitTorrent makes sense when downloading the entire install CD image, not
for downloading individual packages.



