Using zsync for .deb downloads: initial benchmark results

Lars Wirzenius lars at ubuntu.com
Tue Jul 14 17:19:31 BST 2009


I'm working on a spec to add apt-sync support for karmic.
See https://wiki.ubuntu.com/AptSyncInKarmicSpec for details.

I've done some initial benchmarking of using zsync to download .debs for
updates, to see if apt-sync can be worth it. For simplicity, I've done
the benchmarking using the underlying zsync tool rather than apt-sync.

    Background explanation: A .deb is, essentially, a very thin wrapper
    around a couple of tar files, which can be compressed with gzip,
    bzip2, or lzma. Zsync uses the rsync algorithm and HTTP range
    requests to get rsync-style transfers without needing an rsync
    server, making it feasible to use on all mirrors. apt-sync is
    an implementation of the idea that when downloading updates to a
    package you already have installed, there is no point in
    re-downloading the unchanged parts of the package.
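
To make the "thin wrapper" concrete, here is a minimal Python sketch that
lists the members of an ar archive, which is the container format a .deb
uses. The function names (list_ar_members, make_ar) are made up for this
illustration, and the sketch only handles the simple short-name layout
that a .deb normally has (debian-binary, control.tar.*, data.tar.*). It
is not how dpkg or apt-sync do it.

```python
AR_MAGIC = b"!<arch>\n"
HEADER_LEN = 60  # fixed-size ar member header


def list_ar_members(data: bytes):
    """Return a list of (name, size) for each member of an in-memory ar archive."""
    if not data.startswith(AR_MAGIC):
        raise ValueError("not an ar archive")
    members = []
    pos = len(AR_MAGIC)
    while pos + HEADER_LEN <= len(data):
        header = data[pos:pos + HEADER_LEN]
        if header[58:60] != b"`\n":
            raise ValueError("bad member header at offset %d" % pos)
        # Name is bytes 0..15, space padded; GNU ar may append a '/'.
        name = header[0:16].decode("ascii").rstrip().rstrip("/")
        # Size is a decimal string in bytes 48..57.
        size = int(header[48:58].decode("ascii"))
        members.append((name, size))
        # Member data is padded to an even length.
        pos += HEADER_LEN + size + (size % 2)
    return members


def make_ar(members):
    """Build a tiny ar archive in memory (hypothetical helper for the demo)."""
    out = bytearray(AR_MAGIC)
    for name, payload in members:
        header = "%-16s%-12d%-6d%-6d%-8s%-10d" % (
            name, 0, 0, 0, "100644", len(payload))
        out += header.encode("ascii") + b"`\n" + payload
        if len(payload) % 2:
            out += b"\n"
    return bytes(out)


demo = make_ar([
    ("debian-binary", b"2.0\n"),
    ("control.tar.gz", b"abc"),   # placeholder bytes, not a real tarball
    ("data.tar.gz", b"xy"),       # placeholder bytes, not a real tarball
])
for name, size in list_ar_members(demo):
    print(name, size)
```

On a real package you would pass the bytes of the .deb file to
list_ar_members() instead of the synthetic archive.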

My benchmark consists of downloading all the security updates to hardy.
The goal of the benchmark is to get some figures for how good zsync
(and therefore apt-sync) actually is, or can be, at downloading .debs.

Here are the summary results:

    scenario        % saved     comment
    -----------------------------------
    plain            3.7        plain zsync, original .debs
    zsyncmakeZ       3.7        use zsync gzip magic, original .debs
    rsyncable       25          gzip magic, recompress tarballs with --rsyncable
    rsyncable2      33          gzip magic, compress whole ar, not tarballs within
    rsyncable3      33          gzip magic, convert lzma, bzip2 to gzip
    uncompressed    50          uncompress tarballs within .debs

The percent saved is the number of bytes zsync did NOT need to download, 
i.e., how much it reused from the previous package.
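
As a rough illustration of how such a figure comes about, here is a
simplified Python sketch of rsync-style block matching. It is not zsync's
actual implementation: the function names, the 16-bit checksum modulus,
the tiny default block size, and the use of SHA-256 as the strong hash
are all assumptions made for the example. It splits the new file into
fixed-size blocks, slides a rolling checksum over the old file to find
those blocks at any offset, and reports the share of the new file that
would not need downloading. A trailing partial block is counted as not
reusable.

```python
import hashlib

MOD = 1 << 16  # modulus for the weak checksum (assumption for this sketch)


def weak_sum(block: bytes):
    """rsync-style weak checksum: (sum of bytes, position-weighted sum)."""
    n = len(block)
    a = sum(block) % MOD
    b = sum((n - i) * x for i, x in enumerate(block)) % MOD
    return a, b


def percent_saved(old: bytes, new: bytes, block_size: int = 4) -> float:
    """Percentage of `new` covered by blocks that also occur in `old`."""
    if not new:
        return 0.0

    # Index every full block of the new file by weak, then strong, checksum.
    wanted = {}
    for off in range(0, len(new) - block_size + 1, block_size):
        block = new[off:off + block_size]
        strong = hashlib.sha256(block).digest()
        wanted.setdefault(weak_sum(block), {}).setdefault(strong, []).append(off)

    found = set()  # offsets in `new` whose block was located in `old`
    if len(old) >= block_size and wanted:
        a, b = weak_sum(old[:block_size])
        last = len(old) - block_size
        for pos in range(last + 1):
            candidates = wanted.get((a, b))
            if candidates:
                # Weak match: confirm with the strong hash before trusting it.
                strong = hashlib.sha256(old[pos:pos + block_size]).digest()
                found.update(candidates.get(strong, ()))
            if pos < last:
                # Roll the window one byte forward in constant time.
                out_byte = old[pos]
                in_byte = old[pos + block_size]
                a = (a - out_byte + in_byte) % MOD
                b = (b - block_size * out_byte + a) % MOD

    return 100.0 * len(found) * block_size / len(new)


print(percent_saved(b"AAAABBBBCCCCDDDD", b"AAAAXXXXCCCCDDDD"))
```

In this toy input three of the four blocks of the new data are present
in the old data, so three quarters of it would be reused.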

Further explanation and discussion:

* plain: This is using plain zsync, and original .debs.

* zsyncmakeZ: This uses the -Z option to zsyncmake when creating the
  .zsync file, in order to use zsync's magic gzip handling. Turns out
  that this doesn't help at all, compared to the plain scenario.
  Unless we change how the archive generates .debs, plain or zsyncmakeZ 
  are the only options we can choose between, and they seem to be
  identical, and neither of them is likely to be worth the effort.

* rsyncable: This makes it easier for zsync to do magic things with gzip,
  by recompressing the gzipped tarballs within the .deb files with gzip
  --rsyncable. This provides a lot of improvement. Saving a quarter of
  the bandwidth is already fairly significant, especially since the
  size impact on the .debs is less than 1%.

* rsyncable2: This tests whether zsync's gzip magic works better if the
  gzip compression is the outermost layer. This is not a realistic option
  for the archive, but provides a data point for comparisons. Turns out,
  the differences are insignificant.
  
* rsyncable3: Some packages use lzma or bzip2 compression of the
  tarballs within the .deb. This benchmark converts those to be
  compressed with gzip --rsyncable. This improves things a bit compared
  to just rsyncable, at a 17% increase in size compared to rsyncable.
  Because most of the packages using lzma are OpenOffice.org related,
  it is probably not realistic to make them use gzip --rsyncable due
  to CD size limits, but it might be possible to use gzip --rsyncable
  for updates that don't get put onto CDs.
  
* uncompressed: This uncompresses all tarballs within the .deb, to give
  a baseline for just how much zsync could save in an optimal situation.

I suspect the 25% value is a bit optimistic, since it comes from a
rather special case: security updates don't typically change the package
all that much. Backports and updates within a development cycle are
likely to change the packages much more. Upgrades from release to release
are also likely to change so much that zsync won't save a whole lot.

Before I continue working on this, I'd like to have some feedback on
this: is a 25% reduction in download sizes worthwhile to pursue? It
would seem to require changing dpkg to call the external gzip binary to
use --rsyncable, rather than use the internal zlib library.
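
To illustrate the difference between the two approaches (compressing
in-process with zlib versus piping through an external gzip), here is a
small Python sketch. It is only an analogy for what a dpkg change might
look like, not dpkg code: the function names are invented, and
compress_external() assumes a gzip binary on PATH that understands
--rsyncable.

```python
import shutil
import subprocess
import zlib


def compress_in_process(data: bytes) -> bytes:
    """Compress with the zlib library; there is no rsyncable mode here."""
    # wbits=31 asks zlib to write a gzip container instead of a raw zlib stream.
    comp = zlib.compressobj(9, zlib.DEFLATED, 31)
    return comp.compress(data) + comp.flush()


def gzip_command(rsyncable: bool = True) -> list:
    """Build the argument list for the external gzip binary."""
    cmd = ["gzip", "-9", "-c"]
    if rsyncable:
        # Assumes the installed gzip supports this option.
        cmd.append("--rsyncable")
    return cmd


def compress_external(data: bytes, rsyncable: bool = True) -> bytes:
    """Compress by piping the data through an external gzip process."""
    if shutil.which("gzip") is None:
        raise RuntimeError("no gzip binary found on PATH")
    result = subprocess.run(
        gzip_command(rsyncable),
        input=data,
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout


payload = b"example tarball contents\n" * 100
print(len(compress_in_process(payload)))
```

Either path produces a gzip stream that the same decompressor can read;
the external route costs an extra process per tarball.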

What do other people think?

PS. I've fully automated my benchmark, and am happy to share the
scripts. If anyone wants to play with them, drop me a note, and I'll set
up a public bzr branch. You'll need fast access to a mirror, since the
scripts download snapshots of hardy-security and the corresponding
packages from hardy, for a total of about three gigabytes.





More information about the ubuntu-devel mailing list