Binary diffs for deb files

Matt Zimmerman mdz at ubuntu.com
Tue May 2 22:50:50 BST 2006


On Tue, May 02, 2006 at 10:14:50PM +0100, James Hall wrote:
> Matt Zimmerman wrote:
> > zsync seems to be the most promising solution to this class of
> > problems.
> 
> But the zsync website states:
> > If the content of a file is compressed, but not in a simple gzip
> > format, bear in mind that zsync may not be effective. Each compressed
> > stream can typically only be efficiently updated via the rsync method if
> > it is either completely unchanged, or the compression has been made
> > rsync-friendly (with, for example, gzip --rsync).
> > 
> > So, for example, zsync is useless for individual Debian or RPM package
> > files, and is useless for bzip2 files.

This seems like a limitation of the implementation; surely the technique is
extensible to these formats.

> Some other issues I have with zsync are:
> 
> 1. How easy would it be to recompress all the packages with the gzip
> --rsync flag?

Trivial.  At the appropriate time, we enable this flag by default for
dpkg-deb, and all builds from that point forward produce rsyncable debs.

> 2. How do we resign all these packages?

Packages aren't signed; the archive is, and it's automatically signed every
time new packages are published.  Your binary diffs, on the other hand,
create packages with new checksums which will fail validation.
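
A rough sketch of why a rebuilt package would fail validation, assuming the
client patches the uncompressed data and then recompresses it.  zlib stands
in for the compression used inside a .deb, and the payload and compression
levels are made up for illustration:

```python
import hashlib
import zlib

# Hypothetical stand-in for the uncompressed contents of a package.
payload = b"contents of an example package\n" * 200

# The bytes the archive published (and whose checksum it signed),
# assumed here to have been compressed at level 9.
published = zlib.compress(payload, 9)

# The bytes a client rebuilds after patching the uncompressed data,
# assumed here to be recompressed with a different setting.
rebuilt = zlib.compress(payload, 1)


def sha256(data: bytes) -> str:
    """Hex digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


# The unpacked contents match, but the compressed bytes (and therefore
# the checksum that the signed archive index would list) do not.
same_contents = zlib.decompress(published) == zlib.decompress(rebuilt)
same_checksum = sha256(published) == sha256(rebuilt)
print("same contents:", same_contents)
print("same checksum:", same_checksum)
```

The point is only that identical contents do not imply identical compressed
bytes; any difference in compressor version or settings would have the same
effect.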

> 3. How much bandwidth would we save after doing both these things?

That's a question which requires extensive analysis, for any proposed
implementation.  How much bandwidth would your proposed approach save, if
the problems with it were addressed somehow?

> 4. Is zsync finished? The site says it's still in beta.

Software is never finished. :-)

> Some advantages of doing a binary diff on uncompressed deb packages:
> 
> 1. *Much* smaller diffs (than binary diffs on gzip --rsyncable files)

Not if you consider the general case.  Using binary diffs, you need to store
deltas between multiple versions of the package, whereas the rsync/zsync
approach works between any pair of files.
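
To illustrate the "any pair of files" point, here is a minimal sketch of
rsync-style block matching.  It is not how rsync or zsync is actually
implemented: real tools use a cheap rolling checksum rather than rehashing
every window, and the block size and function names below are assumptions
made for the example.

```python
import hashlib

BLOCK = 8  # hypothetical block size, kept tiny for illustration


def block_signatures(old: bytes, block: int = BLOCK) -> dict:
    """Map the hash of each fixed-size block of `old` to its offset."""
    sigs = {}
    for off in range(0, len(old) - block + 1, block):
        digest = hashlib.sha256(old[off:off + block]).digest()
        sigs.setdefault(digest, off)
    return sigs


def plan_transfer(old: bytes, new: bytes, block: int = BLOCK) -> tuple:
    """Return (reused, fetched) byte counts for rebuilding `new` from `old`."""
    sigs = block_signatures(old, block)
    reused = fetched = 0
    i = 0
    while i < len(new):
        window = new[i:i + block]
        if len(window) == block and hashlib.sha256(window).digest() in sigs:
            # This block already exists somewhere in the old file.
            reused += block
            i += block
        else:
            # No match at this offset: this byte must be downloaded.
            fetched += 1
            i += 1
    return reused, fetched


old = b"AAAAAAAABBBBBBBBCCCCCCCC"
new = b"AAAAAAAAxyBBBBBBBBCCCCCCCC"  # two bytes inserted after the first block
print(plan_transfer(old, new))
```

No delta between these two particular versions had to be prepared in
advance; the same function works for whatever old file the client happens
to have.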

> 2. The diffs themselves could be signed

Any stream of bytes can be signed.

> Bandwidth Saving Stats:
> ------------------------
> 
> These are differences between the current version of software on Breezy
> and Dapper to illustrate the possible bandwidth savings on a
> dist-upgrade

Here are some examples of cases you haven't considered:

- The user doesn't have a copy of the original .deb from the installation
  media (these aren't saved on the system)

- The original installed package may have been superseded by a security or
  bugfix update

- Users who incrementally upgrade during development releases

-- 
 - mdz



More information about the ubuntu-devel mailing list