Using zsync for .deb downloads: initial benchmark results
Paul Sladen
ubuntu at paul.sladen.org
Wed Jul 15 01:37:43 BST 2009
On Tue, 14 Jul 2009, Martin Pitt wrote:
> Lars Wirzenius [2009-07-14 19:19 +0300]:
> > is a 25% reduction in download sizes worthwhile to pursue?
A daily fetch of Packages.gz for main/restricted/universe/multiverse/* is
~10MB (IIRC). Multiply that by 30 days per month and X million users: this
is where the low-hanging fruit is likely to be.
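To put a rough number on that, here is a back-of-envelope sketch. The 10MB
and 30 days are the figures above; the user count is a made-up placeholder
for "X million" (X = 1 here), not a measured value.

```python
# Back-of-envelope estimate of monthly Packages.gz traffic.
# DAILY_FETCH_MB and DAYS_PER_MONTH come from the message; ASSUMED_USERS is
# a hypothetical placeholder for "X million users".
DAILY_FETCH_MB = 10
DAYS_PER_MONTH = 30
ASSUMED_USERS = 1_000_000  # assumption: X = 1


def monthly_traffic_tb(users, daily_mb=DAILY_FETCH_MB, days=DAYS_PER_MONTH):
    """Total monthly transfer in terabytes, taking 1 TB = 10**6 MB."""
    return users * daily_mb * days / 1_000_000


per_user_mb = DAILY_FETCH_MB * DAYS_PER_MONTH
print("per user per month (MB):", per_user_mb)
print("all users per month (TB):", monthly_traffic_tb(ASSUMED_USERS))
```

The total scales linearly with X, so any saving on this one file is
multiplied by the whole user base every day.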
> I had expected something like a 90% saving
To do it properly requires fixing the compiler and linker to make the same
address/load/optimisation choices as in the previous run. (So that a
two-line patch only creates ten bytes of binary churn).
It then requires a gzip that compresses using the same restart and
dictionary choices as in the previous run.[1] (So that the ten-byte binary
change only results in <1kB of compressed bitstream churn).
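The second half of that can be tried out with a small sketch. It uses
Python's zlib (deflate) on synthetic data as a stand-in for a real binary,
overwrites ten bytes, and measures how far the two compressed streams agree
from the start. The data, the offset and the helper name are all
illustrative assumptions; how early the streams diverge will depend on the
input and on the zlib build, so no particular figure is claimed here.

```python
import random
import zlib


def common_prefix_len(a: bytes, b: bytes) -> int:
    """Number of leading bytes that two byte strings share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


# Synthetic stand-in for a binary: compressible, but not trivially repetitive.
rng = random.Random(0)
words = [b"mov", b"push", b"call", b"ret", b"jmp", b"add", b"xor", b"nop"]
old = b" ".join(rng.choice(words) for _ in range(50_000))

# A ten-byte change roughly 1 kB into the data; everything else is identical.
patched = bytearray(old)
patched[1000:1010] = b"XXXXXXXXXX"
new = bytes(patched)

old_z = zlib.compress(old, 9)
new_z = zlib.compress(new, 9)

changed = sum(x != y for x, y in zip(old, new))
shared = common_prefix_len(old_z, new_z)
print("uncompressed bytes changed:", changed)
print("compressed sizes:", len(old_z), len(new_z))
print("compressed bytes identical from the start:", shared)
```

Whatever follows the point of divergence is what a block-matching tool has
to fetch again, unless the compressor is made to repeat its earlier choices.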
...
Three years ago, zsync fitted on the time/disk-space curve very well and
represented an excellent opportunity for a clever optimisation without
burning diskspace. With the increased deployment of LZMA/Bzip2, my personal
humble opinion is that zsync-style methods make sense *only* for
Packages.gz, and that the focus should instead be on bsdiff-based
delta-diffs for the .deb packages themselves. Diskspace is now cheaper than
bandwidth (particularly the type over GPRS), so the mirror growing 20% to
store those deltadebs is not a problem.
Deflate, bzip2 and lzma are merely transmission encodings.
The thing that is holding back *all* of these fancy delta methods is that
our secure distribution is based on signing hashes of the *encoded data*,
not of the content within it.[2] Until that is changed, deploying most of
these optimisations is over-complicated (and held back) by the requirement
to recompress to the exact same data.tar.gz, rather than just produce the
_equivalent_ uncompressed data.tar.
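A minimal sketch of the point, using Python's standard gzip, bz2 and hashlib
modules: one payload, three encodings, and two hashes per encoding. The
payload is a placeholder for the bytes of data.tar, and SHA-256 is chosen
only for illustration; neither is taken from the archive tooling itself.

```python
import bz2
import gzip
import hashlib


def sha256(data: bytes) -> str:
    """Hex SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


# Placeholder for the uncompressed data.tar contents.
payload = b"stand-in for the bytes of data.tar\n" * 2000

# Same content, three different transmission encodings.
# mtime=0 keeps the gzip header timestamp fixed between runs.
encodings = {
    "gzip -1": (gzip.compress(payload, compresslevel=1, mtime=0), gzip.decompress),
    "gzip -9": (gzip.compress(payload, compresslevel=9, mtime=0), gzip.decompress),
    "bzip2": (bz2.compress(payload), bz2.decompress),
}

encoded_hashes = {}
content_hashes = {}
for name, (blob, decode) in encodings.items():
    encoded_hashes[name] = sha256(blob)            # hash of the encoded data
    content_hashes[name] = sha256(decode(blob))    # hash of the content within
    print(name, encoded_hashes[name][:16], content_hashes[name][:16])
```

The content hash is the same in every row while the encoded hash differs in
every row, so a signature over the content would survive recompression and a
signature over the encoding does not.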
-Paul
[1] --rsyncable adds lots of restart markers, to increase the likelihood
that some will align. There is no need to do this (for this application),
it is merely necessary to re-use the ones in the previous compression run.
[2] It is not possible to just recompress a data.tar.gz->data.tar.bz2
because the signature will change; despite the *content* being the same.
It is not possible to use the advcomp implementation instead of gzip as this
would produce a different (hopefully smaller) encoding, with a different
hash.
--
Why do one side of a triangle when you can do all three. Somewhere, GB.
More information about the ubuntu-devel
mailing list