Using zsync for .deb downloads: initial benchmark results
scott at open-vote.org
Wed Jul 15 00:59:53 BST 2009
Lars Wirzenius wrote:
> Before I continue working on this, I'd like to have some feedback on
> this: is a 25% reduction in download sizes worthwhile to pursue? It
> would seem to require changing dpkg to call the external gzip binary to
> use --rsyncable, rather than use the internal zlib library.
> What do other people think?
I think it's worth doing even if it's only a 3% reduction in package
size. The reason is that a small reduction in package size can result
in a much bigger reduction in actual download wait time.
Imagine a mirror with 1000 kb/sec of upload bandwidth on the day a new
update package comes out, and the package is 1000 kb in size. Suppose,
initially, 10 users try to download the update simultaneously, each
getting 100 kb/sec. Each of them would take 10 seconds to get the
update -- until, 5 seconds later, with everyone half done, another 30
users jump online.
So now we have 10 people half done, downloading at 25 kb/sec, and 30
people just starting, also downloading at 25 kb/sec.
20 seconds later (500 kb at 25 kb/sec), the initial 10 people finish,
making their total download time 25 seconds. By then the other 30 have
also downloaded half the file, and their speed goes up from 25 kb/sec
to 33.3 kb/sec (1000 kb/sec split 30 ways). They finish the other half
in 500/33.3 = 15 seconds.
So the first 10 people took a total of 5+20=25 seconds, and the other 30
people took a total of 20+15=35 seconds. This gives an average wait
time of (10*25 + 30*35) / 40 = 32.5 seconds.
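The same bookkeeping can be written in general form. This is a sketch under the example's own assumptions: the mirror's upload B (kb/sec) is split evenly among everyone downloading, n_1 users start at t = 0, n_2 more join at t_0 before the first group finishes, and the package is S kb. These symbols are introduced here for illustration. During the shared phase the latecomers fetch the same S - d as the early users, which leaves them d kb to fetch at B/n_2 once the early users are gone:

```latex
\begin{aligned}
d   &= \frac{B\,t_0}{n_1}
      && \text{(kb each early user has when the others join)}\\
T_1 &= t_0 + \frac{(S - d)(n_1 + n_2)}{B}
      && \text{(early users' total time)}\\
T_2 &= \frac{(S - d)(n_1 + n_2)}{B} + \frac{d\,n_2}{B}
      && \text{(latecomers' total time)}\\
\bar{T} &= \frac{n_1 T_1 + n_2 T_2}{n_1 + n_2}
\end{aligned}
```

Plugging in S = B = 1000, n_1 = 10, n_2 = 30 and t_0 = 5 gives d = 500, T_1 = 5 + 20 = 25, T_2 = 20 + 15 = 35 and an average of (10*25 + 30*35) / 40 = 32.5 seconds. The last term, d*n_2/B, does not depend on S, so shrinking the package only shortens the middle, shared phase.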
Now, suppose the download amount is reduced by 25% with zsync, and the
package is only 750 kb in size. Again, 10 users go at 100 kb/sec for 5
seconds, but this time they get 2/3 of the file (500/750 kb). As
before, those other 30 jump on, so everyone's back down to 25 kb/sec.
This time the initial 10 people finish only 10 seconds later (250 kb
at 25 kb/sec), leaving the server with 30 people, each with 500 kb
still to go, at 33.3 kb/sec. As before, they finish this last 500 kb
in 15 seconds.
So the first 10 people took 5+10 = 15 seconds, and the other 30 people
took 10+15 = 25 seconds. This time the average wait is
(10*15 + 30*25) / 40 = 22.5 seconds.
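Both scenarios can be checked with a small event-driven simulation. This is a minimal Python 3 sketch assuming, as the example does, that the mirror's upload is split evenly among everyone currently downloading and that no client is limited by its own connection; `Download` and `simulate` are hypothetical names for illustration, not part of zsync or dpkg.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Download:
    arrival: float                  # seconds after the package is published
    remaining: float                # kb still to fetch
    finish: Optional[float] = None  # set when the download completes


def simulate(file_kb: float, upload_kbps: float,
             arrivals: List[Tuple[float, int]]) -> List[float]:
    """Return each user's download time, assuming (hypothetically) the
    mirror's upload is split evenly among everyone currently downloading."""
    users = [Download(t, file_kb) for t, count in arrivals for _ in range(count)]
    now = 0.0
    while any(u.finish is None for u in users):
        active = [u for u in users if u.finish is None and u.arrival <= now]
        future = [u.arrival for u in users if u.arrival > now]
        next_arrival = min(future) if future else float("inf")
        if not active:
            now = next_arrival  # idle until the next user shows up
            continue
        rate = upload_kbps / len(active)  # fair share per connection
        # Advance to whichever comes first: a download finishing or a new arrival.
        step_end = min(now + min(u.remaining for u in active) / rate, next_arrival)
        for u in active:
            u.remaining -= rate * (step_end - now)
            if u.remaining <= 1e-9:  # tolerate floating-point rounding
                u.remaining = 0.0
                u.finish = step_end
        now = step_end
    return [u.finish - u.arrival for u in users]


for size_kb in (1000, 750):
    times = simulate(size_kb, 1000, [(0, 10), (5, 30)])
    print(size_kb, round(sum(times) / len(times), 1))
# 1000 32.5
# 750 22.5
```

The simulation only advances time to the next "interesting" moment (a download completing or a new user arriving), since between those events every active connection's rate is constant.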
If we look at the numbers, we find something very curious: a 25% drop
in file size resulted in a drop of over 30% in average download time!
If you were one of those 10 initial users, your download time dropped
from 25 to 15 seconds, a 40% reduction.
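The percentages follow directly from the times worked out above; the last line, for the 30 latecomers, is computed from the same figures:

```latex
\begin{aligned}
\text{average wait:}\quad    & \frac{32.5 - 22.5}{32.5} \approx 30.8\% \\
\text{first 10 users:}\quad  & \frac{25 - 15}{25} = 40\% \\
\text{last 30 users:}\quad   & \frac{35 - 25}{35} \approx 28.6\%
\end{aligned}
```

Every group saves more than the 25% cut in file size.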
The reason is that downloading packages from a crowded mirror is a lot
like waiting in line to buy a sandwich. If it takes 4 minutes to make a
sandwich, and a new customer gets in line every 3 minutes, then there's
going to be a huge line by the end of lunch. Cutting the time down to 3
minutes per sandwich means the line never grows, and for the poor guy
who comes in at the end of lunch it means he saves a whole lot of time.
As it turns out, 25% is the _minimum_ amount of time you save; when
you're alone at the deli you get your sandwich a minute quicker.
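The sandwich line can be modelled as a deterministic single-server queue. In this minimal Python sketch the 4- and 3-minute sandwich times and the 3-minute arrival gap come from the analogy; the 40-customer lunch rush (about two hours) is a hypothetical figure chosen for illustration, and `deli_waits` is a made-up name.

```python
from typing import List


def deli_waits(service_min: float, arrival_gap_min: float = 3.0,
               customers: int = 40) -> List[float]:
    """Each customer's total time at the deli (waiting in line plus being
    served), in minutes: one sandwich maker, first come first served,
    customer i arriving at i * arrival_gap_min.
    customers=40 is a hypothetical lunch rush, not a measured figure."""
    waits = []
    maker_free_at = 0.0  # when the current sandwich will be done
    for i in range(customers):
        arrival = i * arrival_gap_min
        start = max(arrival, maker_free_at)  # wait if the maker is busy
        maker_free_at = start + service_min
        waits.append(maker_free_at - arrival)
    return waits


slow, fast = deli_waits(4.0), deli_waits(3.0)
print(slow[0], fast[0])    # 4.0 3.0   (alone at the deli: one minute saved)
print(slow[-1], fast[-1])  # 43.0 3.0  (last customer of the rush)
```

With 4-minute sandwiches, customer i spends i + 4 minutes at the deli, so the line grows by a minute with every arrival; with 3-minute sandwiches everyone spends exactly 3 minutes.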
This isn't even considering secondary effects. If we reduce the burden
on individual mirrors (and thus the expense of bandwidth), we'll likely
get more people willing to run mirrors, further easing the problem.
So, yes, I think it's worth it - and thank you Lars for doing these
More information about the ubuntu-devel