Using zsync for .deb downloads: initial benchmark results
Lars Wirzenius
lars at ubuntu.com
Tue Jul 14 17:19:31 BST 2009
I'm working on a spec to add apt-sync support for karmic.
See https://wiki.ubuntu.com/AptSyncInKarmicSpec for details.
I've done some initial benchmarking of using zsync to download .debs for
updates, to see whether apt-sync would be worth it. For simplicity, I've done
the benchmarking using the underlying zsync tool rather than apt-sync.
Background explanation: A .deb is, essentially, a very thin wrapper
around a couple of tar files, which can be compressed with gzip,
bzip2, or lzma. Zsync uses the rsync algorithm and HTTP range
requests to implement rsync-style transfers without needing an rsync
server, making it feasible to use for all mirrors. apt-sync is
an implementation of the idea that when downloading updates to a
package you already have installed, there is no point in
re-downloading the unchanged parts of the package.
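To make the "thin wrapper" concrete: a .deb is an ar archive whose members are normally debian-binary, control.tar.gz and data.tar.gz (the tarballs may instead be compressed with bzip2 or lzma, as noted above). The following Python sketch lists the members of such an archive. It is a minimal illustration, assuming the common ar layout (60-byte member headers, names of at most 16 characters, no long-name tables); read_ar_members and build_ar are hypothetical helpers, not part of dpkg or apt-sync.

```python
AR_MAGIC = b"!<arch>\n"
HEADER_LEN = 60  # every ar member header is 60 bytes


def read_ar_members(data: bytes) -> list[tuple[str, bytes]]:
    """Return (name, payload) pairs from a plain ar archive such as a .deb."""
    if not data.startswith(AR_MAGIC):
        raise ValueError("not an ar archive")
    members = []
    pos = len(AR_MAGIC)
    while pos < len(data):
        header = data[pos:pos + HEADER_LEN]
        if len(header) < HEADER_LEN or header[58:60] != b"`\n":
            raise ValueError(f"bad member header at offset {pos}")
        # 16-byte space-padded name; some tools append a trailing '/'.
        name = header[0:16].decode("ascii").rstrip(" ").rstrip("/")
        size = int(header[48:58].decode("ascii"))
        start = pos + HEADER_LEN
        members.append((name, data[start:start + size]))
        # Member data is padded with a newline to an even offset.
        pos = start + size + (size % 2)
    return members


def build_ar(members: list[tuple[str, bytes]]) -> bytes:
    """Build a minimal ar archive (zero timestamps and owners, mode 100644)."""
    out = bytearray(AR_MAGIC)
    for name, payload in members:
        if len(name) > 16:
            raise ValueError("this sketch only supports names up to 16 chars")
        header = (
            f"{name:<16}"          # name, space padded
            f"{0:<12}"             # mtime
            f"{0:<6}{0:<6}"        # uid, gid
            f"{100644:<8}"         # mode, written as octal digits
            f"{len(payload):<10}"  # payload size in bytes
        ).encode("ascii") + b"`\n"
        out += header + payload
        if len(payload) % 2:
            out += b"\n"
    return bytes(out)


# Placeholder payloads stand in for real compressed tarballs.
deb = build_ar([
    ("debian-binary", b"2.0\n"),
    ("control.tar.gz", b"<control tarball>"),
    ("data.tar.gz", b"<data tarball>"),
])
print([name for name, _ in read_ar_members(deb)])
# ['debian-binary', 'control.tar.gz', 'data.tar.gz']
```

Since the ar layer adds only a fixed header per member, almost all of a .deb's bytes are the compressed tarballs, which is why the compression scheme dominates how much zsync can reuse.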
My benchmark consists of downloading all the security updates to hardy.
The goal of the benchmark is to get some figures for how good zsync
(and therefore apt-sync) actually is, or can be, at downloading .debs.
Here are the summary results:
scenario      % saved  comment
-----------------------------------------------------------------------
plain             3.7  plain zsync, original .debs
zsyncmakeZ        3.7  use zsync gzip magic, original .debs
rsyncable          25  gzip magic, recompress tarballs with --rsyncable
rsyncable2         33  gzip magic, compress the ar, not the tarballs
rsyncable3         33  gzip magic, convert lzma and bzip2 to gzip
uncompressed       50  uncompress tarballs within .debs
The percent saved is the number of bytes zsync did NOT need to download,
i.e., how much it reused from the previous package.
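The "percent saved" figure can be illustrated with a toy version of rsync-style matching: split the old package into fixed-size blocks, slide a window over the new package using rsync's rolling weak checksum, and count the bytes covered by blocks the client already has. This is a sketch under stated assumptions, not zsync's implementation: the 1024-byte block size, SHA-1 as the strong check, and the function names are all chosen for illustration.

```python
import hashlib

MOD = 1 << 16


def weak_checksum(block: bytes) -> tuple[int, int]:
    """rsync-style weak checksum (a, b) of a block."""
    n = len(block)
    a = sum(block) % MOD
    b = sum((n - i) * x for i, x in enumerate(block)) % MOD
    return a, b


def roll(a: int, b: int, out_byte: int, in_byte: int, n: int) -> tuple[int, int]:
    """Slide the window by one byte: drop out_byte, append in_byte."""
    a = (a - out_byte + in_byte) % MOD
    b = (b - n * out_byte + a) % MOD
    return a, b


def reused_bytes(old: bytes, new: bytes, block_size: int = 1024) -> int:
    """Count bytes of `new` that can be copied from whole blocks of `old`."""
    # Index the old file's blocks: weak checksum -> set of strong hashes.
    index: dict[tuple[int, int], set[bytes]] = {}
    for off in range(0, len(old) - block_size + 1, block_size):
        blk = old[off:off + block_size]
        index.setdefault(weak_checksum(blk), set()).add(hashlib.sha1(blk).digest())

    reused = 0
    pos = 0
    weak = None
    while pos + block_size <= len(new):
        if weak is None:
            weak = weak_checksum(new[pos:pos + block_size])
        candidates = index.get(weak)
        if candidates and hashlib.sha1(new[pos:pos + block_size]).digest() in candidates:
            reused += block_size  # block already on disk: no download needed
            pos += block_size
            weak = None           # recompute after jumping a whole block
        else:
            if pos + block_size < len(new):
                weak = roll(*weak, new[pos], new[pos + block_size], block_size)
            pos += 1
    return reused


def percent_saved(old: bytes, new: bytes, block_size: int = 1024) -> float:
    """Share of the new package that did not need to be downloaded."""
    return 100.0 * reused_bytes(old, new, block_size) / len(new)
```

For instance, inserting 100 bytes into the middle of a 64 KiB file leaves every block except the one straddling the insertion reusable, so 63 of the 64 blocks are found.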
Further explanation and discussion:
* plain: This is using plain zsync, and original .debs.
* zsyncmakeZ: This uses the -Z option to zsyncmake when creating the
.zsync file, in order to use zsync's magic gzip handling. It turns out
that this doesn't help at all compared to the plain scenario.
Unless we change how the archive generates .debs, plain and zsyncmakeZ
are the only options we can choose between. They seem to be
equivalent, and neither is likely to be worth the effort.
* rsyncable: This makes it easier for zsync to do its gzip magic, by
recompressing the gzipped tarballs within the .deb files with gzip
--rsyncable. This provides a large improvement. Saving a quarter of
the bandwidth is already fairly significant, especially since the
size impact on the .debs is less than 1%.
* rsyncable2: This tests whether zsync's gzip magic works better if the
gzip compression is the outermost layer. This is not a realistic option
for the archive, but it provides a data point for comparison. It turns
out that the differences are insignificant.
* rsyncable3: Some packages use lzma or bzip2 compression of the
tarballs within the .deb. This benchmark converts those to be
compressed with gzip --rsyncable. This improves things a bit compared
to just rsyncable, at a 17% increase in size compared to rsyncable.
Because most of the packages using lzma are OpenOffice.org related,
it is probably not realistic to switch them to gzip --rsyncable, due
to CD size limits, but it might be possible to do so for updates
that don't get put onto CDs.
* uncompressed: This uncompresses all tarballs within the .deb, to give
a baseline for just how much zsync could save in an optimal situation.
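The reason --rsyncable matters is that an ordinary deflate stream is not byte-aligned between blocks and carries history forward, so a small change early in a tarball can alter essentially all compressed bytes after it. gzip --rsyncable instead resynchronises at points chosen from the data itself, using a rolling sum over recent input. The Python sketch below imitates that idea: it picks content-defined boundaries with a rolling byte sum and deflates each chunk with a fresh compressor, so unchanged chunks yield identical compressed bytes. The window size and the full compressor restart are simplifications chosen for illustration, not gzip's exact implementation, and the function names are hypothetical.

```python
import zlib

WINDOW = 4096  # rolling-sum window size, chosen for illustration


def chunk_ends(data: bytes, window: int = WINDOW) -> list[int]:
    """Content-defined chunk end offsets.

    A chunk ends after byte i when the sum of the last `window` bytes is a
    multiple of `window`. The decision depends only on nearby bytes, so an
    edit can move boundaries only within `window` bytes after it.
    """
    ends = []
    rolling = 0
    for i, byte in enumerate(data):
        rolling += byte
        if i >= window:
            rolling -= data[i - window]
        if i >= window - 1 and rolling % window == 0:
            ends.append(i + 1)
    if not ends or ends[-1] != len(data):
        ends.append(len(data))
    return ends


def rsyncable_segments(data: bytes, level: int = 9) -> list[bytes]:
    """Deflate each chunk independently; returns raw-deflate segments."""
    segments = []
    start = 0
    for end in chunk_ends(data):
        comp = zlib.compressobj(level, zlib.DEFLATED, -15)  # raw deflate
        # Z_FULL_FLUSH ends the segment byte-aligned without a final block,
        # so segments can be concatenated into one valid stream.
        segments.append(comp.compress(data[start:end]) + comp.flush(zlib.Z_FULL_FLUSH))
        start = end
    return segments


def join_segments(segments: list[bytes]) -> bytes:
    """Concatenate segments and terminate the stream with an empty final block."""
    terminator = zlib.compressobj(9, zlib.DEFLATED, -15).flush()
    return b"".join(segments) + terminator
```

Restarting the compressor loses matches across chunk boundaries, so the output is somewhat larger than a single deflate stream; that is the same kind of trade-off as the small size increase reported above for --rsyncable.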
I suspect the 25% value is a bit optimistic, since it comes from a
rather special case: security updates don't typically change the package
all that much. Backports and updates within a development cycle are
likely to change the packages much more. Upgrades from release to release
are also likely to change so much that zsync won't save a whole lot.
Before I continue working on this, I'd like some feedback: is a 25%
reduction in download size worth pursuing? It would seem to require
changing dpkg to call the external gzip binary with --rsyncable,
rather than using the internal zlib library.
What do other people think?
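Roughly, "call the external gzip binary" means piping each tarball through gzip -9 --rsyncable instead of compressing in-process. The Python sketch below (dpkg itself is C) shows that shape, with an assumed fallback to the built-in zlib-based gzip module when the binary is missing or rejects the option, since not every gzip build accepts --rsyncable. The helper name is hypothetical.

```python
import gzip
import subprocess


def gzip_rsyncable(payload: bytes) -> bytes:
    """Compress with the external `gzip -9 --rsyncable`, if available.

    Falls back to Python's gzip module (zlib, no --rsyncable equivalent)
    when the binary is missing or exits with an error.
    """
    try:
        result = subprocess.run(
            ["gzip", "-9", "--rsyncable", "-c", "-n"],
            input=payload,
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,
            check=True,
        )
        return result.stdout
    except (OSError, subprocess.CalledProcessError):
        return gzip.compress(payload, compresslevel=9)
```

The fallback keeps builds working on systems whose gzip lacks the option, at the cost of silently losing the rsync-friendly output; a real packaging tool might prefer to fail loudly instead.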
PS. I've fully automated my benchmark, and am happy to share the
scripts. If anyone wants to play with them, drop me a note, and I'll set
up a public bzr branch. You'll need fast access to a mirror, since they
download snapshots of hardy-security and the corresponding packages from
hardy, for a total of about three gigabytes.