Compressing packages with bzip2 instead of gzip?

John C. McCabe-Dansted gmatht at gmail.com
Thu Jan 19 02:47:24 GMT 2006


On Thursday 19 January 2006 04:50, Paul Sladen wrote:
> On Thu, 19 Jan 2006, John C. McCabe-Dansted wrote:
> > On Thursday 19 January 2006 02:41, Paul Sladen wrote:
> > > The big problem is that the server-side data also needs to be
> > > uncompressed.
> >
> > Just to be clear, we do not have to generate any deb files using
> > --rsyncable.
>
> Assuming there is a small change and zsync, there are three options:
>
>   gzip, treated as data:  Download ~100% of file
>   gzip with look-inside:  Download ~30%-40% of file
>   gzip with --rsyncable:  Download ~ 5%-10% of file

In my testing, at
  http://www.livejournal.com/users/flyingreptile/101020.html
I found that with look-inside, zsync needed to download about 30% of
data.tar.gz.

Take for example tetex...3.3 -> tetex...3.4:

Adding the --rsyncable flag did not help: with look-inside, it required 30.8%
of the file either way. Using --rsyncable without look-inside (using the -Z
flag to generate the .zsync file), we had to download 43.6% of the file.
Using -Z without --rsyncable, we had to download 87.6% of the file.
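To illustrate why treating a .gz file as opaque data costs nearly the whole download, here is a much-simplified, rsync-style block matcher. It is a sketch, not zsync's implementation: the block size, the checksums and the test data are all assumptions chosen for illustration. It indexes fixed-size blocks of the old file, slides a rolling checksum over the new file, and counts how many bytes could be reused from the local copy.

```python
import hashlib
import random
import zlib

BLOCK = 64       # hypothetical block size; real zsync chooses its own
MOD = 1 << 16    # modulus for the rsync-style weak checksum


def weak_sum(block: bytes) -> tuple[int, int]:
    """rsync-style weak checksum (a, b) of one block."""
    n = len(block)
    a = sum(block) % MOD
    b = sum((n - i) * x for i, x in enumerate(block)) % MOD
    return a, b


def reused_bytes(old: bytes, new: bytes, n: int = BLOCK) -> int:
    """Count bytes of `new` that can be copied from whole blocks of `old`."""
    # Index every full, aligned block of the old file by weak and strong hash.
    index: dict[tuple[int, int], set[bytes]] = {}
    for off in range(0, len(old) - n + 1, n):
        blk = old[off:off + n]
        index.setdefault(weak_sum(blk), set()).add(hashlib.md5(blk).digest())

    if len(new) < n:
        return 0
    reused, i = 0, 0
    a, b = weak_sum(new[:n])
    while i + n <= len(new):
        strong = index.get((a, b))
        if strong and hashlib.md5(new[i:i + n]).digest() in strong:
            # Block already present locally: skip over it.
            reused += n
            i += n
            if i + n <= len(new):
                a, b = weak_sum(new[i:i + n])
            continue
        # No match: slide the window one byte, updating the checksum in O(1).
        if i + n < len(new):
            out, inc = new[i], new[i + n]
            a = (a - out + inc) % MOD
            b = (b - n * out + a) % MOD
        i += 1
    return reused


def fraction_to_download(old: bytes, new: bytes) -> float:
    return 1 - reused_bytes(old, new) / len(new)


if __name__ == "__main__":
    random.seed(0)
    words = [b"alpha", b"beta", b"gamma", b"delta", b"epsilon"]
    old = b" ".join(random.choice(words) for _ in range(6000))
    mid = len(old) // 2
    new = old[:mid] + b"PATCHED " + old[mid:]  # a small edit in the middle

    print("uncompressed:", round(fraction_to_download(old, new), 3))
    print("compressed:  ",
          round(fraction_to_download(zlib.compress(old), zlib.compress(new)), 3))
```

On the uncompressed text, the small insertion only costs the blocks around it. Once both versions are deflated, the edit changes the compressed bitstream from that point on, so far fewer blocks match; this is the "gzip, treated as data" case above.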

To get downloads in the range 5%~10%, you basically have to use bsdiff.

> > My solution would be to just use bsdiff patches against data.tar and
> > control.tar.
>
> This is O(N^2), right?  Whereas zsync is O(1).

You mean O(N^2) vs O(N), right?

Yes, if you put up patches between every possible combination of debs, it is
O(N^2). However, if you only put up patches of the form:

1) diff X-in-OfficialCD X-latest
2) diff Xn-to-Xn+1 (occurred in the last ten days)
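A quick count makes the O(N^2) point concrete for a single package with N versions. This is an illustrative sketch only: `recent_updates` is a hypothetical parameter standing for the number of consecutive updates that fall inside the ten-day window.

```python
def all_pairs_patches(n_versions: int) -> int:
    """Forward patches from every older version to every newer one: O(N^2)."""
    return n_versions * (n_versions - 1) // 2


def proposed_patches(n_versions: int, recent_updates: int) -> int:
    """Patches kept for one package under rules (1) and (2).

    (1) one patch from the version on the official CD to the latest version;
    (2) one patch per consecutive update made within the recent window.
    """
    if n_versions < 2:
        return 0  # nothing to patch
    return 1 + min(recent_updates, n_versions - 1)


if __name__ == "__main__":
    for n in (5, 10, 50):
        print(n, all_pairs_patches(n), proposed_patches(n, recent_updates=2))
```

The all-pairs count grows quadratically with the number of versions, while the two-rule scheme stays bounded by the number of recent updates.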

At 8% per patch, (1) should require only 56MB per platform (8% of a ~700MB
CD), even if every single package on the official CD has to be updated. This
would also mean that a dial-up user like my mother, with a monthly quota of
80MB, could put an Ubuntu CD in and bring it up to date without blowing their
whole quota in one go.
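The 56MB figure is just 8% of a full CD's worth of packages. The sketch below spells out that arithmetic, assuming the CD holds roughly 700MB of packages; the constant names are illustrative.

```python
def catch_up_download_mb(cd_payload_mb: float, patch_percent: float) -> float:
    """Worst case for scheme (1): every package on the CD has an update."""
    return cd_payload_mb * patch_percent / 100


CD_PAYLOAD_MB = 700   # assumption: a full ~700 MB CD of packages
PATCH_PERCENT = 8     # the 8%-per-patch figure from the text
QUOTA_MB = 80         # the monthly dial-up quota from the example

need = catch_up_download_mb(CD_PAYLOAD_MB, PATCH_PERCENT)
print(f"{need:.0f} MB needed, {QUOTA_MB - need:.0f} MB of quota left")
# prints "56 MB needed, 24 MB of quota left"
```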

Since you should upgrade regularly to get the latest security patches, (2)
should be sufficient to keep yourself up to date. Given the frequency of
updates, I would expect (2) to require only several MB per platform on the
mirrors.
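Rule (2) amounts to a simple retention policy on the mirror: keep consecutive-version patches published within the last ten days and drop the rest. The sketch below assumes a hypothetical `Patch` record; the package names, versions and sizes in any example are made up, not measured.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

WINDOW = timedelta(days=10)  # "occurred in the last ten days"


@dataclass
class Patch:
    """Hypothetical record for one Xn-to-Xn+1 patch on the mirror."""
    package: str
    from_version: str
    to_version: str
    size_mb: float
    published: datetime


def prune(patches: list[Patch], now: datetime) -> list[Patch]:
    """Keep only patches published inside the retention window."""
    return [p for p in patches if now - p.published <= WINDOW]


def mirror_storage_mb(patches: list[Patch]) -> float:
    """Total space the retained patches occupy."""
    return sum(p.size_mb for p in patches)
```

Because old patches expire, the storage needed depends on the update rate over ten days rather than on the total number of versions ever released.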

The bsdiff approach has two main advantages over zsync:
- it requires about four times less bandwidth (roughly 8% versus the 30.8%
  measured above).
- it works with compression formats other than gzip.
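The second advantage holds because the binary diff is computed on the uncompressed data.tar, not on the compressed file. A minimal sketch of that first step, assuming the archive member is either gzip or bzip2 and identifying it by its magic bytes; the actual diff (bsdiff, in the proposal) would then run on the returned payloads.

```python
import bz2
import gzip


def payload(compressed: bytes) -> bytes:
    """Return the uncompressed tar payload, whichever compressor was used."""
    if compressed[:2] == b"\x1f\x8b":   # gzip magic number
        return gzip.decompress(compressed)
    if compressed[:3] == b"BZh":        # bzip2 magic
        return bz2.decompress(compressed)
    raise ValueError("unknown compression format")


if __name__ == "__main__":
    tar_bytes = b"pretend this is data.tar" * 50  # hypothetical payload
    as_gzip = gzip.compress(tar_bytes)
    as_bzip2 = bz2.compress(tar_bytes)
    # Different compressed bytes, identical input to the binary diff step.
    print(as_gzip != as_bzip2, payload(as_gzip) == payload(as_bzip2))
```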

> Yes, 'apt-torrent' could be used to fetch the chunks (instead of partial
> HTTP), once 'zsync' has figured out what they are.

I was thinking more that if users for some reason need a patch other than (1)
or (2), they could look for another user who has the same patch. BitTorrent
might be too heavyweight for this. A lighter protocol, where the tracker just
allows clients to add and follow links to ad hoc FTP servers, might be more
appropriate.
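The lighter tracker described above only needs to map a patch identifier to the places that serve it. A hypothetical in-memory sketch: the class, method names, patch IDs and URLs are all invented for illustration, and the tracker stores links only, never patch data.

```python
from collections import defaultdict


class PatchTracker:
    """Hypothetical lightweight tracker: it stores links, never patch data."""

    def __init__(self) -> None:
        self._links: defaultdict[str, set[str]] = defaultdict(set)

    def add_link(self, patch_id: str, url: str) -> None:
        """A client announces that `url` serves the patch `patch_id`."""
        self._links[patch_id].add(url)

    def remove_link(self, patch_id: str, url: str) -> None:
        """A client withdraws a link, e.g. when its ad hoc server goes away."""
        self._links[patch_id].discard(url)

    def follow(self, patch_id: str) -> list[str]:
        """Return known locations for a patch, in a stable order."""
        return sorted(self._links.get(patch_id, ()))
```

A client needing, say, a patch between two older versions would call `follow()` and then fetch the patch directly from one of the listed FTP servers.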


-- 
John C. McCabe-Dansted
Masters Student



More information about the ubuntu-devel mailing list