Compressing packages with bzip2 instead of gzip?

Phillip Susi psusi at cfl.rr.com
Wed Jan 18 15:41:16 GMT 2006


John C. McCabe-Dansted wrote:
> 
> Just to be clear, we do not have to generate any deb files using --rsyncable. We 
> only have to put up .zsync files; zsync should do the rest (although at 
> present zsync cannot recognise that the archive contains gzipped files).
> 


Therein lies the problem.  Since zsync isn't smart enough to look inside 
the deb and gunzip the tars before computing the deltas, it will compute 
the deltas against the deb as a whole, which will be very large without 
--rsyncable.  Even by using --rsyncable and giving up some compression, 
the diffs will still be larger than if zsync were smart enough to open 
the deb and gunzip the tars, and diff those.
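
To illustrate the point, here is a rough Python sketch, not zsync itself. It 
inserts one byte near the start of some made-up data and then counts how many 
fixed-size blocks of the new version can be found anywhere in the old one, 
first on the raw bytes and then on a single zlib stream. The block size, the 
test data and the name reusable_fraction are all assumptions for the demo.

```python
import random
import zlib

BLOCK = 256  # hypothetical block size, chosen only for this demo


def reusable_fraction(old: bytes, new: bytes, block: int = BLOCK) -> float:
    """Fraction of new's aligned blocks that occur at any offset in old."""
    targets = [new[i:i + block] for i in range(0, len(new) - block + 1, block)]
    wanted = set(targets)
    found = set()
    # Slide a window over the old data one byte at a time, the way a
    # rolling-checksum tool searches for blocks it already has.
    for off in range(len(old) - block + 1):
        window = old[off:off + block]
        if window in wanted:
            found.add(window)
    return sum(t in found for t in targets) / len(targets)


# Made-up, compressible test data: random words from a small vocabulary.
rng = random.Random(0)
alphabet = b"abcdefghijklmnopqrstuvwxyz"
words = [bytes(rng.choices(alphabet, k=rng.randint(3, 9))) for _ in range(500)]
old_raw = b" ".join(rng.choice(words) for _ in range(20000))

# The "new version" differs by a single inserted byte near the start.
new_raw = old_raw[:10] + b"X" + old_raw[10:]

raw_frac = reusable_fraction(old_raw, new_raw)
gz_frac = reusable_fraction(zlib.compress(old_raw, 9), zlib.compress(new_raw, 9))

print(f"reusable blocks, uncompressed:     {raw_frac:.1%}")
print(f"reusable blocks, one zlib stream:  {gz_frac:.1%}")
```

On the raw data only the block containing the change is lost. On the 
compressed streams far fewer blocks should be reusable, since everything 
after the change is encoded differently.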


Diffing the tars, of course, still requires that the client have the 
original deb, which they likely won't, so if you are going to make zsync 
smart enough to gunzip the tars, you may as well make it smart enough to 
untar them as well, and sync the files contained inside.
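
A minimal sketch of that per-file idea, in Python: hash every regular file in 
the old and new tarballs and fetch only the members whose contents differ. 
The helper names (make_tar_gz, member_digests, members_to_fetch) and the 
sample paths are hypothetical, and the old tarball here is only a stand-in 
for whatever the client already has on disk.

```python
import hashlib
import io
import tarfile


def make_tar_gz(files: dict[str, bytes]) -> bytes:
    """Build an in-memory .tar.gz from a name -> contents mapping (demo helper)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def member_digests(tar_gz: bytes) -> dict[str, str]:
    """Map each regular file in the tarball to the SHA-256 of its contents."""
    digests = {}
    with tarfile.open(fileobj=io.BytesIO(tar_gz), mode="r:gz") as tar:
        for member in tar:
            if member.isfile():
                data = tar.extractfile(member).read()
                digests[member.name] = hashlib.sha256(data).hexdigest()
    return digests


def members_to_fetch(old_tar_gz: bytes, new_tar_gz: bytes) -> list[str]:
    """Names of files that are new or whose contents changed."""
    old = member_digests(old_tar_gz)
    new = member_digests(new_tar_gz)
    return sorted(name for name, digest in new.items() if old.get(name) != digest)


old = make_tar_gz({"usr/bin/tool": b"v1 binary", "usr/share/doc/README": b"docs"})
new = make_tar_gz({"usr/bin/tool": b"v2 binary", "usr/share/doc/README": b"docs"})
print(members_to_fetch(old, new))
```

Each changed member could then be synced or patched on its own, so unchanged 
files cost nothing beyond their digest.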


> 
> I don't think it is possible to use zsync for .bz2 files. Between
> koffice-libs_1%3a1.4.1-0ubuntu7.{1,2}_i386.deb (4.7MB), a 99% saving is achieved 
> with gzip, but with bzip2 no saving occurs. Even with "bzip2 -1", only 12.3% 
> of the bandwidth is saved.
> 


It seems the reason it works well on the original .deb is that dpkg 
actually does roughly what --rsyncable does when it gzips the tars, which 
is to say, it compresses smaller blocks at a time rather than the entire 
file, so a 1 byte change in the uncompressed file will only radically 
change that compressed block, rather than the entire compressed stream 
from that point on.

This is why if you gunzip the tar then gzip -9 it again, it will be 
smaller.
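
The trade-off can be sketched in Python as follows. This is a simplification 
under stated assumptions: it compresses fixed-size chunks independently, 
whereas gzip --rsyncable picks its reset points from the data itself, and 
CHUNK, the test data and compress_in_chunks are made up for the demo.

```python
import random
import zlib

CHUNK = 8192  # hypothetical chunk size, chosen only for this demo


def compress_in_chunks(data: bytes, chunk: int = CHUNK) -> list[bytes]:
    """Compress each fixed-size chunk on its own, with no shared history."""
    return [zlib.compress(data[i:i + chunk], 9) for i in range(0, len(data), chunk)]


# Made-up, compressible test data: random words from a small vocabulary.
rng = random.Random(0)
alphabet = b"abcdefghijklmnopqrstuvwxyz"
words = [bytes(rng.choices(alphabet, k=rng.randint(3, 9))) for _ in range(500)]
old_raw = b" ".join(rng.choice(words) for _ in range(20000))

# Overwrite one byte in the middle (same length, so chunks stay aligned).
pos = len(old_raw) // 2
new_raw = old_raw[:pos] + b"X" + old_raw[pos + 1:]

old_chunks = compress_in_chunks(old_raw)
new_chunks = compress_in_chunks(new_raw)

# Only the chunk holding the modified byte compresses differently.
changed = sum(a != b for a, b in zip(old_chunks, new_chunks))

# Independent chunks cannot reference earlier data, so they compress worse.
chunked_size = sum(len(c) for c in new_chunks)
whole_size = len(zlib.compress(new_raw, 9))

print(f"compressed chunks that changed: {changed} of {len(new_chunks)}")
print(f"chunked total: {chunked_size} bytes, single stream: {whole_size} bytes")
```

The change stays confined to one compressed chunk, and the price is a larger 
total than compressing the whole input as one stream.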


> Zsync files are only about 1%-2% of the size of the deb, but also typically 
> require ~30% of the new deb to be downloaded.
> 
> My solution would be to just use bsdiff patches against data.tar and 
> control.tar. My experimentation has led me to believe that a bsdiff patch is 
> typically 8% of the size of the whole deb. Hence putting up a bsdiff against 
> every file in the official i386 Ubuntu CD should not use more than 60MB. 
> Putting up (n->n+1) patches for ten days would also allow people to follow 
> the latest version with minimal bandwidth.
> 


If zsync were hacked to decompress the tar.(gz, bz2, 7z) then sync that, 
the diffs would require even less space, and not be limited to working 
decently on the --rsyncable gzip debs.


> I know there is more than one official CD, but I suspect the total extra space 
> required on the mirrors would be insignificant anyway.
> 
> Perhaps we could add this as a feature of apt-torrent so that patches remain 
> so long as there exist seeds for them?
> 




More information about the ubuntu-devel mailing list