Compressing packages with bzip2 instead gzip?
John C. McCabe-Dansted
gmatht at gmail.com
Thu Jan 19 02:47:24 GMT 2006
On Thursday 19 January 2006 04:50, Paul Sladen wrote:
> On Thu, 19 Jan 2006, John C. McCabe-Dansted wrote:
> > On Thursday 19 January 2006 02:41, Paul Sladen wrote:
> > > The big problem is that the server-side data also needs to be
> > > uncompressed.
> >
> > Just to be clear, we do not have to generate any deb files using
> > --rsyncable.
>
> Assuming there is a small change and zsync, there are three options:
>
> gzip, treated as data: Download ~100% of file
> gzip with look-inside: Download ~30%-40% of file
> gzip with --rsyncable: Download ~ 5%-10% of file
In my testing, at
http://www.livejournal.com/users/flyingreptile/101020.html
I found that with look-inside zsync needed to download about 30% of
data.tar.gz.
Take for example tetex...3.3 -> tetex...3.4:
Adding the --rsyncable flag did not help: with look-inside, zsync had to
download 30.8% of the file either way. Using --rsyncable without look-inside
(using the -Z flag to generate the .zsync file), we had to download 43.6% of
the file. Using -Z without --rsyncable, we had to download 87.6% of the file.
To get downloads in the range 5%~10%, you basically have to use bsdiff.
> > My solution would be to just use bsdiff patches against data.tar and
> > control.tar.
>
> This is O(N^2), right? Whereas zsync is O(1).
You mean O(N^2) vs O(N), right?
Yes, if you put up patches for every possible pair of debs it is
O(N^2). However, the number of patches stays small if you only put up
patches of the form:
1) diff X-in-OfficialCD X-latest
2) diff Xn-to-Xn+1 (occurred in the last ten days)
At 8% per patch, (1) should require only 56MB per platform (8% of a roughly
700MB CD) even if every single package on the official CD has to be updated.
This would also mean that a dial-up user like my mother, with a monthly quota
of 80MB, could put an Ubuntu CD in and get it up to date without blowing her
whole quota in one go.
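The arithmetic behind the 56MB figure can be sketched in Python. The 700MB CD
size is an assumption (it is the size at which 8% comes out at 56MB); the 8%
ratio and the 80MB quota are the figures quoted above. The names are only
illustrative.

```python
# Rough check of the "56MB per platform" estimate.
# ASSUMPTION: the official CD holds about 700MB of packages.
CD_SIZE_MB = 700
PATCH_PERCENT = 8        # bsdiff patch size as a percentage of the package
MONTHLY_QUOTA_MB = 80    # dial-up quota from the example above


def worst_case_patch_mb(cd_size_mb, patch_percent):
    """Total patch download if every package on the CD needs updating."""
    return cd_size_mb * patch_percent // 100


total = worst_case_patch_mb(CD_SIZE_MB, PATCH_PERCENT)
print(f"patches: {total}MB, quota left: {MONTHLY_QUOTA_MB - total}MB")
```

Under these assumptions the worst case still fits inside a single month's
quota, which is the point of option (1).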
Given that you should upgrade regularly to keep up with the latest security
patches, (2) should be sufficient to keep yourself up to date. Given the
frequency of updates, I would imagine (2) would only require several MB per
platform on the mirrors.
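To see how the number of patches per package grows under each policy, here is
a small Python sketch. It assumes the versions of one package are numbered
0..n-1, with version 0 being the one on the official CD, and that
recent_steps counts the uploads from the last ten days. These names and the
numbering are made up for illustration.

```python
def all_pairs_patches(n_versions):
    """Patches from every older version to every newer one: O(N^2)."""
    return {(a, b) for a in range(n_versions) for b in range(a + 1, n_versions)}


def proposed_patches(n_versions, recent_steps):
    """Policy (1) + (2): one CD-to-latest patch plus recent single steps."""
    patches = set()
    if n_versions >= 2:
        # (1) official CD version (assumed to be version 0) to the latest
        patches.add((0, n_versions - 1))
    # (2) Xn -> Xn+1 for the most recent uploads only
    first = max(0, n_versions - 1 - recent_steps)
    for a in range(first, n_versions - 1):
        patches.add((a, a + 1))
    return patches


for n in (2, 5, 10, 20):
    print(n, len(all_pairs_patches(n)), len(proposed_patches(n, recent_steps=3)))
```

With a fixed window of recent steps, the second count stops growing once a
package has more versions than the window covers, while the all-pairs count
keeps growing quadratically.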
The bsdiff approach has two main advantages over zsync:
- needs about a quarter of the bandwidth (roughly 8% of the file versus
  about 30% with look-inside).
- works with compression other than gzip.
> Yes, 'apt-torrent' could be used to fetch the chunks (instead of partial
> HTTP), once 'zsync' has figured out what they are.
I was thinking more that if users for some reason need a patch other than (1)
or (2), they could look for another user who has the same patch. BitTorrent
might be too heavyweight for this. A lighter protocol, where the tracker just
allows clients to add and follow links to ad hoc FTP servers, might be more
appropriate.
--
John C. McCabe-Dansted
Masters Student