Binary diffs for deb files

Matt Zimmerman mdz at ubuntu.com
Tue May 2 23:51:27 BST 2006


On Tue, May 02, 2006 at 11:24:26PM +0100, James Hall wrote:
> On Tue, 2006-05-02 at 14:50 -0700, Matt Zimmerman wrote:
> > This seems like a limitation of the implementation; surely the technique is
> > extensible to these formats.
> 
> The problem is it doesn't currently work on deb files and it's not clear
> how much work would be needed to achieve this.

That seems like a minor hurdle compared to the more fundamental issues that
I see with your approach.

> On Tue, 2006-05-02 at 14:50 -0700, Matt Zimmerman wrote:
> > How much bandwidth would your proposed approach save, if
> > the problems with it were addressed somehow?
> 
> Binary diffs on uncompressed content will achieve better results than
> diffs on compressed content, even with gzip's --rsyncable option, since
> a diff of compressed data only saves space when large chunks of the gzip
> data are the same.
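
The quoted claim about compressed content can be illustrated with a rough
sketch. It assumes Python's zlib module as a stand-in for gzip (both use
deflate, but this is not the gzip file format and it does not model
--rsyncable), and the payload is made-up sample data, not a real .deb. It
changes a single byte of the input and compares how much of the raw and
compressed byte streams differ.

```python
import random
import zlib


def differing_fraction(a: bytes, b: bytes) -> float:
    """Fraction of byte positions that differ, counting any length mismatch."""
    mismatches = sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))
    return mismatches / max(len(a), len(b))


# Hypothetical stand-in for a package payload: repetitive, compressible text.
rng = random.Random(0)
words = [b"alpha", b"beta", b"gamma", b"delta", b"epsilon"]
old = b" ".join(rng.choice(words) for _ in range(20000))

# The "new version" changes a single byte at the very start.
new = b"X" + old[1:]

raw_change = differing_fraction(old, new)
gz_change = differing_fraction(zlib.compress(old), zlib.compress(new))

print(f"uncompressed: {raw_change:.4%} of bytes differ")
print(f"compressed:   {gz_change:.4%} of bytes differ")
```

A positional byte comparison is cruder than a real binary diff tool, so the
numbers only indicate the trend: a small edit to the input tends to disturb a
much larger share of the compressed stream.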

In practice, having the absolute smallest delta is not the most important
consideration.  This is a problem which has been repeatedly discussed in
Debian, and deployment and maintainability issues have been paramount.  For
example, rsync itself has been found unsuitable because it scales very
poorly on the server side.

An rsync-like approach is much simpler and more flexible (note that it works
very well for our CD images, including the live CD whose compressed
structure is comparable to a .deb).

> This link seems to show zsync is no more efficient than rsync -z 
> http://zsync.moria.org.uk/paper/ch03s05.html

This is unsurprising, since zsync uses a variation of the rsync algorithm.
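
For readers unfamiliar with the rsync algorithm, its core trick is a weak
checksum that can be "rolled" along a file one byte at a time, so a block can
be looked for at every offset cheaply. The following is a minimal sketch in
the style of that checksum, not code taken from rsync or zsync; the function
names and the window size are illustrative assumptions.

```python
M = 1 << 16  # both running sums are kept modulo 2**16


def weak_checksum(block: bytes) -> tuple[int, int]:
    """Compute the two running sums (a, b) directly over a block."""
    n = len(block)
    a = sum(block) % M
    # The first byte gets weight n, the last byte gets weight 1.
    b = sum((n - i) * byte for i, byte in enumerate(block)) % M
    return a, b


def roll(a: int, b: int, outgoing: int, incoming: int, n: int) -> tuple[int, int]:
    """Slide a window of length n one byte to the right in constant time."""
    a = (a - outgoing + incoming) % M
    b = (b - n * outgoing + a) % M
    return a, b


data = b"the quick brown fox jumps over the lazy dog"
window = 8

# Roll the checksum across every window position.
a, b = weak_checksum(data[:window])
rolled = [(a, b)]
for start in range(1, len(data) - window + 1):
    a, b = roll(a, b, data[start - 1], data[start + window - 1], window)
    rolled.append((a, b))

# Recompute each window from scratch for comparison.
direct = [weak_checksum(data[s:s + window]) for s in range(len(data) - window + 1)]
print(rolled == direct)
```

Because the weak checksum can collide, a match is normally confirmed with a
stronger hash before a block is treated as identical.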

> On Tue, 2006-05-02 at 14:50 -0700, Matt Zimmerman wrote:
> > - The user doesn't have a copy of the original .deb from the
> > installation media (these aren't saved on the system)
> 
> Synaptic asks for the installation media; if none is given, the new version
> is downloaded in its entirety.

It would be interesting to calculate a realistic total.

> On Tue, 2006-05-02 at 14:50 -0700, Matt Zimmerman wrote:
> > - The original installed package may have been superseded by a security
> > or bugfix update
> 
> This is solved by providing diffs between each package. Then towards the
> end of a distro's life, diffs between these packages and the next distro are
> provided. All in the simple format:
> 
> packagename_oldversion_newversion_arch.diff (for example)
> 
> Diffs wouldn't *have* to be done for everything of course, maybe just the
> most useful ones such as OpenOffice if server space is an issue. If the
> client finds the diff they need is sat there, good stuff, if not, the
> download is only as slow as it was originally.

The space is a minor issue compared to the complexity.  Your approach
requires an NxM array of diffs, whereas an rsync-like approach requires only
one block manifest per package.  This is much easier to implement, and
doesn't need to be updated as frequently.
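
To make the "one block manifest per package" idea concrete, here is a minimal
sketch, assuming a manifest is simply a list of strong hashes, one per
fixed-size block of the new file. The names make_manifest and plan_download,
the block size, and the sample data are all hypothetical; this is not the
format used by rsync or zsync. The server publishes the manifest once, and
each client works out for itself which blocks it already has, whatever old
version it happens to hold.

```python
import hashlib

BLOCK_SIZE = 4  # tiny, for illustration only; real tools use far larger blocks


def make_manifest(new_data: bytes, block_size: int = BLOCK_SIZE) -> list[str]:
    """Server side: one strong hash per fixed-size block of the new file."""
    return [
        hashlib.sha256(new_data[i:i + block_size]).hexdigest()
        for i in range(0, len(new_data), block_size)
    ]


def plan_download(old_data: bytes, manifest: list[str], block_size: int = BLOCK_SIZE):
    """Client side: find which blocks of the new file already exist locally.

    Returns (reusable, missing): reusable maps block index -> offset in
    old_data, missing lists the block indexes that still have to be fetched.
    """
    wanted = {}
    for index, digest in enumerate(manifest):
        wanted.setdefault(digest, []).append(index)

    reusable = {}
    # Check every offset of the old data, not just block boundaries, so that
    # an insertion which shifts later content is still matched.
    for offset in range(0, len(old_data) - block_size + 1):
        digest = hashlib.sha256(old_data[offset:offset + block_size]).hexdigest()
        for index in wanted.get(digest, ()):
            reusable.setdefault(index, offset)

    missing = [i for i in range(len(manifest)) if i not in reusable]
    return reusable, missing


old = b"AAAABBBBCCCCDDDD"
new = b"AAAAxBBBBCCCCDDDD"  # one byte inserted after the first block
manifest = make_manifest(new)
reusable, missing = plan_download(old, manifest)
print("reuse from local copy:", reusable)
print("fetch from server:", missing)
```

Hashing every window with SHA-256 is wasteful; it keeps the sketch short.
The point is that nothing on the server depends on which old version the
client has, so no per-version-pair files are needed.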

> On Tue, 2006-05-02 at 14:50 -0700, Matt Zimmerman wrote:
> > - Users who incrementally upgrade during development releases
> 
> Why would this need special thought?

Because an rsync-like approach works for those cases also, while yours does
not.  If one of the goals is to reduce overall bandwidth consumption, we
need to consider how the bandwidth is being used.

> Making diffs between each version is the only way to substantially reduce
> bandwidth. Security updates and text changes could be tiny.

Given that I reduce the size of my downloads using rsync every day, I can't
agree with this generalization.  It's up to you whether you would like to
pursue it, of course; you don't need to convince me.

-- 
 - mdz



More information about the ubuntu-devel mailing list