Binary diffs for deb files

James Hall rio at x5g.com
Tue May 2 22:14:50 BST 2006


On Tue, 2006-05-02 at 22:30 +1000, dave kempe wrote:
> what about if you used the --rsyncable switch to gzip?
> from man gzip:
> --rsyncable

Matt Zimmerman wrote:
> zsync seems to be the most promising solution to this class of
> problems.

But the zsync website states:
> If the content of a file is compressed, but not in a simple gzip
> format, bear in mind that zsync may not be effective. Each compressed
> stream can typically only be efficiently updated via the rsync method if
> it is either completely unchanged, or the compression has been made
> rsync-friendly (with, for example, gzip --rsync).
>
> So, for example, zsync is useless for individual Debian or RPM package
> files, and is useless for bzip2 files.

Some other issues I have with zsync are:

1. How easy would it be to recompress all the packages with the gzip
--rsyncable flag?
2. How do we re-sign all these packages?
3. How much bandwidth would we save after doing both these things?
4. Is zsync finished? The site says it's still in beta.

Some advantages of doing a binary diff on uncompressed deb packages:

1. *Much* smaller diffs (than binary diffs on gzip --rsyncable files)
2. The diffs themselves could be signed

Bandwidth Saving Stats:
------------------------

These are differences between the current versions of software in Breezy
and Dapper, to illustrate the possible bandwidth savings on a
dist-upgrade. Sizes are in bytes; each saving is relative to the size of
the new package.

2805228 gimp_2.2.8-2ubuntu6_i386.deb
2781226 gimp_2.2.11-1ubuntu1_i386.deb
607138  gimp_2.2.8-2ubuntu6_2.2.11-1ubuntu1_i386.diff
Saved bandwidth: 78.2%

1061534 synaptic_0.57.4ubuntu10_i386.deb
1035092 synaptic_0.57.8ubuntu10_i386.deb
235820  synaptic_0.57.4ubuntu10_0.57.8ubuntu10_i386.diff
Saved bandwidth: 77.2%

476670 gedit_2.12.1-0ubuntu1_i386.deb
590960 gedit_2.14.2-0ubuntu3_i386.deb
379656 gedit_2.12.1-0ubuntu1_2.14.2-0ubuntu3_i386.diff
Saved bandwidth: 35.8%

861612 gedit-common_2.12.1-0ubuntu1_all.deb
958216 gedit-common_2.14.2-0ubuntu3_all.deb
379656 gedit_2.12.1-0ubuntu1_2.14.2-0ubuntu3_i386.diff
Saved bandwidth: 60.3%

2508272 abiword_2.4.1-1ubuntu1_i386.deb
2510884 abiword_2.4.2-0ubuntu5_i386.deb
409418  abiword_2.4.1-1ubuntu1_2.4.2-0ubuntu5_i386.diff
Saved bandwidth: 83.7%
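
The percentages above appear to be one minus the diff size divided by the
size of the new package, i.e. the share of the download avoided when only
the diff is fetched. A minimal Python sketch of that arithmetic, using the
byte counts listed above (the name saved_percent is mine, not part of any
tool; gedit-common is left out because the diff line listed for it names
the gedit diff):

```python
def saved_percent(new_deb_size, diff_size):
    """Percentage of the new package's download avoided by fetching the diff."""
    return 100.0 * (1.0 - diff_size / new_deb_size)


# (package, new .deb size in bytes, diff size in bytes), copied from the list above
SAMPLES = [
    ("gimp", 2781226, 607138),
    ("synaptic", 1035092, 235820),
    ("gedit", 590960, 379656),
    ("abiword", 2510884, 409418),
]

for name, new_size, diff_size in SAMPLES:
    print(f"{name}: {saved_percent(new_size, diff_size):.1f}% saved")
```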

The last time I did a dist-upgrade I spent over 6 hours just downloading
packages. Imagine having this cut down to 3 hours! The biggest savings
to be made are on huge packages like OpenOffice, where a lot of the
resources (e.g. clip-art) remain exactly the same. These packages REALLY
waste bandwidth.

I will carry on testing compression methods and will fully evaluate
zsync before dismissing it completely, although my initial feeling is it
will do very little. 

Currently there is no easy way I know of to make the repackaged ar files
have the same checksum as the original, even though the contents are
identical. It seems dpkg does not use the proper 'ar' to make the
packages in the first place (as shown by the absence of forward slashes
inside the 'ar'). I'll keep working on this.
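
To see where two ar files with identical contents can still differ, it
helps to dump the member headers. The sketch below is only an
illustration, not how dpkg does it: it assumes the common ar layout (an
8-byte "!<arch>" magic line, then a 60-byte header per member holding
name, mtime, uid, gid, mode and size, with data padded to an even
length). The names ArMember and list_ar_members are my own. Any
difference in these fields, including a name written as "debian-binary/"
by one tool and "debian-binary" by another, changes the checksum of the
whole file.

```python
import io
from collections import namedtuple

AR_MAGIC = b"!<arch>\n"
HEADER_LEN = 60

ArMember = namedtuple("ArMember", "name mtime uid gid mode size")


def _num(field, base=10):
    """Parse a space-padded numeric header field; blank fields count as 0."""
    text = field.decode("ascii").strip()
    return int(text, base) if text else 0


def list_ar_members(stream):
    """Yield an ArMember for each member header found in a binary stream."""
    if stream.read(len(AR_MAGIC)) != AR_MAGIC:
        raise ValueError("not an ar archive")
    while True:
        header = stream.read(HEADER_LEN)
        if not header:
            break  # clean end of archive
        if len(header) != HEADER_LEN or header[58:60] != b"`\n":
            raise ValueError("corrupt ar member header")
        member = ArMember(
            # The name is kept as stored, so a trailing "/" stays visible.
            name=header[0:16].decode("ascii").rstrip(" "),
            mtime=_num(header[16:28]),
            uid=_num(header[28:34]),
            gid=_num(header[34:40]),
            mode=_num(header[40:48], 8),
            size=_num(header[48:58]),
        )
        yield member
        # Skip the member data, plus one pad byte if the size is odd.
        stream.seek(member.size + (member.size % 2), io.SEEK_CUR)
```

Opening the original and the repackaged .deb in binary mode and comparing
the two lists returned by list_ar_members should show which header
fields the repacking changed.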

Regards,
James



More information about the ubuntu-devel mailing list