Scott Ritchie scott at open-vote.org
Wed Jul 15 00:59:53 BST 2009

```
Lars Wirzenius wrote:
> Before I continue working on this, I'd like to have some feedback on
> this: is a 25% reduction in download sizes worthwhile to pursue? It
> would seem to require changing dpkg to call the external gzip binary to
> use --rsyncable, rather than use the internal zlib library.
>
> What do other people think?
>

I think it's worth doing even if it's only a 3% reduction in package
size.  The reason is that a small reduction in package size can result
in a more than proportional reduction in download time.

Imagine a mirror with 1000 kb/sec of upload bandwidth on the day a new
update package comes out, and the package is 1000 kb in size.  Suppose,
initially, 10 users try to download it at once.  At 100 kb/sec each,
every one of them would take 10 seconds to get the update -- until the
connection slows down.

5 seconds later, with everyone half done, another 30 users jump online,
so all 40 now share the mirror at 25 kb/sec each.

20 seconds later, the initial 10 people finish, making their total
time 25 seconds.  The other 30 people, having downloaded half of the
file, up their speed from 25 kb/sec to 33.3 kb/sec.  They finish the
other half in 500/33.3 = 15 seconds.

So the first 10 people took a total of 5+20=25 seconds, and the other 30
people took a total of 20+15=35 seconds.  This gives an average wait
time of (10*25 + 30*35) / 40 = 32.5 seconds.

Now, suppose the download amount is reduced by 25% with zsync, and the
package is only 750 kb in size.  Again, 10 users go at 100 kb/sec for 5
seconds, but this time they get 2/3 of the file (500/750 kb).  As
before, those other 30 jump on, so everyone's back down to 25 kb/sec.

Now it's only 10 seconds later until the initial 10 people finish,
leaving the other 30 with 250 kb done and 500 kb still to go.  As
before, they finish this last 500 kb in 15 seconds.

So the first 10 people took 5+10 = 15 seconds, and the other 30 people
took 10+15 = 25 seconds.  This time the average wait is (10*15 + 30*25)
/ 40 = 22.5 seconds.

If we look at the numbers we find something very curious: A 25% drop in
file size resulted in over a 30% drop in average download time!  If you
look only at the first 10 users, their wait went from 25 seconds down
to 15 seconds, a 40% reduction.

It's like waiting in line to buy a sandwich.  If it takes 4 minutes to make a
sandwich, and a new customer gets in line every 3 minutes, then there's
going to be a huge line by the end of lunch.  Cutting the time down to 3
minutes per sandwich means the line never grows, and for the poor guy
who comes in at the end of lunch it means he saves a whole lot of time.
As it turns out, 25% is the _minimum_ amount of time you save; when
you're alone at the deli you get your sandwich a minute quicker.

This isn't even considering secondary effects.  If we reduce the burden
on individual mirrors (and thus the expense of bandwidth), we'll likely
get more people willing to run mirrors, further easing the problem.

So, yes, I think it's worth it - and thank you Lars for doing these
benchmarks!

Thanks,
Scott Ritchie

```
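The arithmetic in the message can be checked with a small fair-share simulation. The following is only a sketch under the message's own assumptions: the mirror's 1000 kb/sec is split evenly among whoever is downloading, 10 users start at time 0, and 30 more arrive 5 seconds later. The function name `average_wait` and its parameters are made up for illustration and do not come from dpkg, zsync, or any mirror software.

```python
def average_wait(file_kb, mirror_kb_per_s=1000.0, arrivals=((0.0, 10), (5.0, 30))):
    """Simulate evenly shared mirror bandwidth (hypothetical helper).

    arrivals is a sequence of (arrival_time_s, user_count) groups.
    Returns (average wait in seconds, list of per-group waits in finish order).
    """
    pending = sorted(arrivals)
    groups = []      # active groups: remaining kb per user, start time, user count
    finished = []    # (user_count, wait_seconds)
    now = 0.0
    i = 0
    while i < len(pending) or groups:
        users = sum(g["count"] for g in groups)
        rate = mirror_kb_per_s / users if users else 0.0  # kb/sec per user
        # Time until the next download completes or the next group arrives.
        t_finish = min((g["remaining"] / rate for g in groups), default=float("inf"))
        t_arrive = pending[i][0] - now if i < len(pending) else float("inf")
        step = min(t_finish, t_arrive)
        for g in groups:
            g["remaining"] -= rate * step
        now += step
        # Retire every group whose download is complete (small float tolerance).
        still_active = []
        for g in groups:
            if g["remaining"] <= 1e-9:
                finished.append((g["count"], now - g["start"]))
            else:
                still_active.append(g)
        groups = still_active
        # Admit the next group if its arrival was the event we stepped to.
        if i < len(pending) and t_arrive <= t_finish:
            groups.append({"remaining": float(file_kb), "start": now,
                           "count": pending[i][1]})
            i += 1
    total_users = sum(count for count, _ in finished)
    avg = sum(count * wait for count, wait in finished) / total_users
    return avg, [wait for _, wait in finished]


if __name__ == "__main__":
    avg_full, waits_full = average_wait(1000)
    avg_small, waits_small = average_wait(750)
    print("1000 kb:", waits_full, "average", round(avg_full, 2))
    print(" 750 kb:", waits_small, "average", round(avg_small, 2))
    print("drop in average wait:", round(100 * (1 - avg_small / avg_full), 1), "%")
```

Under these assumptions the per-group waits should come out at roughly 25 and 35 seconds for the 1000 kb package and roughly 15 and 25 seconds for the 750 kb one, matching the hand calculation. Changing `arrivals` or `file_kb` gives a quick way to see how the saving varies with load.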