Binary diffs for deb files
James Hall
rio at x5g.com
Thu May 4 01:29:13 BST 2006
On Wed, 2006-05-03 at 21:15 +1200, john wrote:
> > 2. How do we resign all these packages?
>
> I believe the consensus was that it would be best to maintain binary
> equality.
My current set of scripts will preserve checksums so a patched deb can
share exactly the same signature as a full deb.
> My results suggested that zsync would reduce bandwidth to ~33% for
> regular upgrades (not necessarily dist-upgrades) compared to 10%-5% for bsdiff.
Bsdiff does very little for compressed debs. As I illustrated earlier,
using bsdiff on _uncompressed_ debs reduces bandwidth by around 70%, and
by even more for larger packages.
Looking at other mailing lists and blogs, the main argument against
implementing something like this seems to be server disk space. Ideally
there would be a diff between each pair of consecutive package versions
while the original packages themselves remain intact, but there are
alternatives.
Here are a few scenarios we could try. (A script could maybe do some
clever sums and figure out how best to use the space available.)
-------------------------------------------
1. As it is now, download 10MB for each update:
Version 1     Version 2     Version 3
  10MB          10MB          10MB     = 30MB
2. Diff between all with originals:
(The best but most disk space consuming)
Version 1 <--diff--> Version 2 <--diff--> Version 3
  10MB       3MB       10MB       3MB       10MB     = 36MB
3. Diff between all with one original:
(Worst, saves space)
Version 1 <--diff--> <--diff-->
  10MB       3MB        3MB      = 16MB
4. Some originals, diffs between all
(A mix of 2 and 3, this one shows 7 package versions instead of 3)
F = Full package 10MB
D = Difference 3MB
Version:  1     2     3     4     5     6     7
Stored:   F  D     D     D  F  D     D     D  F   = 48MB (vs. 70MB for Scenario 1)
-------------------------------------------
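The totals above can be checked with a few lines of Python. This is only a sketch: storage_mb is a made-up helper name, the 10MB and 3MB sizes are the illustrative figures from the scenarios rather than measurements, and placing Scenario 4's full packages at versions 1, 4 and 7 is my reading of the diagram.

```python
FULL_MB = 10  # illustrative full-package size used in the scenarios
DIFF_MB = 3   # illustrative diff size used in the scenarios


def storage_mb(num_versions, full_versions, with_diffs=True):
    """Server disk space for one package's history (hypothetical helper).

    full_versions: set of version numbers kept as full packages.
    with_diffs: whether a diff is stored between each consecutive pair.
    """
    diffs = num_versions - 1 if with_diffs else 0
    return len(full_versions) * FULL_MB + diffs * DIFF_MB


print(storage_mb(3, {1, 2, 3}, with_diffs=False))  # Scenario 1 -> 30
print(storage_mb(3, {1, 2, 3}))                    # Scenario 2 -> 36
print(storage_mb(3, {1}))                          # Scenario 3 -> 16
print(storage_mb(7, {1, 4, 7}))                    # Scenario 4 -> 48 (assumed layout)
```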
Scenario 3 saves the most server space but could be inconvenient for
users who don't update regularly, since they have to download a long
chain of diffs. It could be improved by keeping an occasional full
package so that it doesn't become substantially less efficient for
people who update infrequently (as shown in Scenario 4).
Looking at Scenario 4 more carefully
====================================
Say the client already has version 3, and needs version 7:
a. The client calculates how much downloading is needed just with diffs:
Diff between 3 and 4: 3MB
Diff between 4 and 5: 3MB
Diff between 5 and 6: 3MB
Diff between 6 and 7: 3MB
Total: 12MB
b. The client calculates again using the latest full package size plus
any diffs still needed after it:
Full package 7: 10MB
Total: 10MB
After doing these two simple sums, it goes ahead and downloads whichever
is smaller, which in this case would be b.
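The client-side comparison could look roughly like the sketch below. It is not taken from my scripts: cheapest_download is a hypothetical name, the sizes are the same illustrative 10MB and 3MB, and the set of full versions (1, 4 and 7) is assumed from the Scenario 4 diagram.

```python
FULL_MB = 10  # illustrative full-package size
DIFF_MB = 3   # illustrative diff size


def cheapest_download(have, want, full_versions):
    """Pick the smaller of the two plans; returns (plan, megabytes).

    Hypothetical helper: assumes a diff exists between every pair of
    consecutive versions and that all sizes are the fixed values above.
    """
    # Plan a: chain of diffs from the installed version to the target.
    options = {"diffs": (want - have) * DIFF_MB}
    # Plan b: newest full package newer than what we have, plus any
    # diffs still needed after it.
    newer_fulls = [v for v in full_versions if have < v <= want]
    if newer_fulls:
        base = max(newer_fulls)
        options["full+diffs"] = FULL_MB + (want - base) * DIFF_MB
    plan = min(options, key=options.get)
    return plan, options[plan]


print(cheapest_download(3, 7, {1, 4, 7}))  # ('full+diffs', 10)
print(cheapest_download(5, 7, {1, 4, 7}))  # ('diffs', 6)
```

A client only two versions behind still comes out ahead with diffs; the full package only wins once the chain gets long enough.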
---------------------------------
So there are other ways around the server space problem. The bigger the
gap between full packages, the less efficient it is for people who
don't update regularly. The smaller the gap, the more disk space
required by the mirrors. 'Real-world' numbers need to be gathered to find a
healthy compromise between the two. The goal of binary diffs is to make
regular upgrades as painless as possible, and security updates are TINY with
bsdiff: the diff between firefox 1.0.7 and 1.0.8 is over 20 times
smaller than a normal update. The advantages are obvious, but we need to
reduce the disadvantages if we're going to see this implemented in my
lifetime ;).
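To get a feel for the gap trade-off before real numbers exist, something like this sketch could be a starting point. Everything in it is an assumption for illustration: the helper names, the fixed 10MB and 3MB sizes, a history of 7 versions, and full packages placed at version 1 and every `gap` versions after it.

```python
FULL_MB = 10  # illustrative full-package size
DIFF_MB = 3   # illustrative diff size


def cheapest_mb(have, want, full_versions):
    """Smallest download to get from `have` to `want` (hypothetical helper)."""
    diff_only = (want - have) * DIFF_MB
    newer_fulls = [v for v in full_versions if have < v <= want]
    if not newer_fulls:
        return diff_only
    via_full = FULL_MB + (want - max(newer_fulls)) * DIFF_MB
    return min(diff_only, via_full)


def tradeoff(num_versions, gap):
    """Return (mirror storage, worst-case client download) in MB."""
    full_versions = set(range(1, num_versions + 1, gap))
    storage = len(full_versions) * FULL_MB + (num_versions - 1) * DIFF_MB
    worst = max(cheapest_mb(have, num_versions, full_versions)
                for have in range(1, num_versions))
    return storage, worst


for gap in (1, 3, 7):
    print(gap, tradeoff(7, gap))
# 1 (88, 10)
# 3 (48, 10)
# 7 (28, 18)
```

The worst case depends on how far the latest version is from the last full package, so real package histories and real diff sizes would be needed before drawing any conclusions.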
Kind Regards,
James Hall