Binary diffs for deb files
Matt Zimmerman
mdz at ubuntu.com
Tue May 2 22:50:50 BST 2006
On Tue, May 02, 2006 at 10:14:50PM +0100, James Hall wrote:
> Matt Zimmerman wrote:
> > zsync seems to be the most promising solution to this class of
> > problems.
>
> But the zsync website states:
> > If the content of a file is compressed, but not in a simple gzip
> > format, bear in mind that zsync may not be effective. Each compressed
> > stream can typically only be efficiently updated via the rsync method if
> > it is either completely unchanged, or the compression has been made
> > rsync-friendly (with, for example, gzip --rsync).
> >
> > So, for example, zsync is useless for individual Debian or RPM package
> > files, and is useless for bzip2 files.
This seems like a limitation of the implementation; surely the technique is
extensible to these formats.
> Some other issues I have with zsync are:
>
> 1. How easy would it be to recompress all the packages with the gzip
> --rsync flag?
Trivial. At the appropriate time, we enable this flag by default for
dpkg-deb, and all builds from that point forward produce rsyncable debs.
> 2. How do we re-sign all these packages?
Packages aren't signed; the archive is, and it's automatically signed every
time new packages are published. Your binary diffs, on the other hand,
create packages with new checksums which will fail validation.
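To illustrate the checksum problem, here is a minimal sketch. It assumes
the archive index records one checksum per package file; SHA-256 and zlib
are stand-ins chosen for the example, and the payload and the names
(published, rebuilt, validates) are hypothetical. A package rebuilt from an
uncompressed diff and then recompressed can have identical contents and
still fail validation, because the compressed bytes differ.

```python
import hashlib
import zlib


def sha256_hex(data: bytes) -> str:
    """Hex digest of the exact bytes, as an index entry would record it."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical package payload (stand-in for the uncompressed contents).
payload = b"contents of a hypothetical package payload\n" * 50

# The file the archive published, and a client-side rebuild of the same
# payload that happens to use different compression settings.
published = zlib.compress(payload, 9)
rebuilt = zlib.compress(payload, 1)

# Checksum listed in the (signed) index for the published file.
index_checksum = sha256_hex(published)


def validates(candidate: bytes) -> bool:
    """True only if the candidate is byte-identical to what was published."""
    return sha256_hex(candidate) == index_checksum
```

The rebuilt file decompresses to the same payload, but validates(rebuilt)
is False; only a byte-for-byte identical file passes.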
> 3. How much bandwidth would we save after doing both these things?
That's a question which requires extensive analysis, for any proposed
implementation. How much bandwidth would your proposed approach save, if
the problems with it were addressed somehow?
> 4. Is zsync finished? The site says it's still in beta.
Software is never finished. :-)
> Some advantages of doing a binary diff on uncompressed deb packages:
>
> 1. *Much* smaller diffs (than binary diffs on gzip --rsyncable files)
Not if you consider the general case. Using binary diffs, you need to store
deltas between multiple versions of the package, whereas the rsync/zsync
approach works between any pair of files.
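A rough sketch of why the rsync/zsync approach works between any pair of
files: the publisher lists checksums for fixed-size blocks of the new file,
and the client slides a window over whatever old file it has, looking for
those blocks at any offset. This is a simplified illustration of the
general rolling-checksum technique, not zsync's actual code; the 4-byte
block size, the use of SHA-1 as the strong hash, and all function names are
assumptions made for the example.

```python
import hashlib

BLOCK = 4  # tiny block size for illustration only


def weak_sum(block):
    """rsync-style weak checksum (a, b), each reduced modulo 2**16."""
    n = len(block)
    a = sum(block) & 0xFFFF
    b = sum((n - i) * x for i, x in enumerate(block)) & 0xFFFF
    return a, b


def roll(a, b, out_byte, in_byte, n):
    """Slide the window one byte: drop out_byte, take in in_byte."""
    a = (a - out_byte + in_byte) & 0xFFFF
    b = (b - n * out_byte + a) & 0xFFFF
    return a, b


def block_signatures(new_file):
    """Checksums of each full block of the new file, keyed by weak sum."""
    sigs = {}
    for off in range(0, len(new_file) - BLOCK + 1, BLOCK):
        blk = new_file[off:off + BLOCK]
        strong = hashlib.sha1(blk).digest()
        sigs.setdefault(weak_sum(blk), []).append((strong, off // BLOCK))
    return sigs


def blocks_found_locally(old_file, sigs):
    """Indices of new-file blocks that already exist somewhere in old_file."""
    found = set()
    if len(old_file) < BLOCK:
        return found
    a, b = weak_sum(old_file[:BLOCK])
    pos = 0
    while True:
        # Only compute the strong hash when the cheap weak sum matches.
        for strong, idx in sigs.get((a, b), []):
            if hashlib.sha1(old_file[pos:pos + BLOCK]).digest() == strong:
                found.add(idx)
        if pos + BLOCK >= len(old_file):
            break
        a, b = roll(a, b, old_file[pos], old_file[pos + BLOCK], BLOCK)
        pos += 1
    return found
```

For example, with old file b"AAAABBBBCCCCDDDD" and new file
b"AAAAXBBBBCCCCDDDD", blocks 0, 2 and 3 of the new file are found locally,
so only block 1 and the trailing partial block would need to be fetched. No
per-version delta has to be stored on the server for this to work.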
> 2. The diffs themselves could be signed
Any stream of bytes can be signed.
> Bandwidth Saving Stats:
> ------------------------
>
> These are differences between the current version of software on Breezy
> and Dapper to illustrate the possible bandwidth savings on a
> dist-upgrade
Here are some examples of cases you haven't considered:
- The user doesn't have a copy of the original .deb from the installation
media (these aren't saved on the system)
- The original installed package may have been superseded by a security or
bugfix update
- Users who incrementally upgrade during development releases
--
- mdz