Binary diffs to reduce update download sizes

Lourens Veen lourens at rainbowdesert.net
Thu Aug 3 11:52:11 BST 2006


Note: I'm not subscribed to the list. Please CC me; I will also keep an 
eye on the archives.

The Problem

Some time ago I installed Ubuntu on my mother's computer, and noticed 
that there is a lot of downloading involved to keep the system up to 
date. After installing Ubuntu 6.06 LTS, a full update means a new 
kernel, a new X.org, a new GNOME, and a whole bunch of miscellaneous 
applications. And after that, new updates keep arriving regularly.

Now, I have a fast broadband connection at home, so downloading a few 
dozen megabytes is no problem. On a 56k modem it's a different story 
however: a simple kernel update to fix a security problem means 
downloading a 21MB kernel package. That takes at least an hour, and 
much longer if your connection isn't very good. Kernel updates aren't 
that rare.

I "solved" my mother's problem by disabling updates, but of course that 
is not a good idea in the long term (and probably not in the short term 
either). She has since upgraded to a DSL line, and updates have been 
reenabled, so my immediate problem has been solved.

This got me thinking however. There are a lot of places in the world 
where Internet access isn't ubiquitous and cheap, and people in those 
places should be able to run free software, and Ubuntu, as well. They 
too should be able to get updates easily, even on a slow connection.

Experiment

Most updates are only small source patches, and security updates doubly 
so. I figured that the binaries of the different versions of a 
programme would probably be similar too, and that we should be able to 
make much smaller updates. I decided to experiment a little.

I installed bsdiff, which is a binary diff tool. I grabbed 
linux-image-2.6.15-26-386_2.6.15-26.44_i386.deb and 
linux-image-2.6.15-26-386_2.6.15-26.45_i386.deb out of 
my /var/cache/apt/archives. Next, I unpacked both packages (ar x, then 
tar xf on the control and data files within). I then wrote a simple 
script to traverse the two trees (bsdiff doesn't traverse directories 
on its own), find differing files, bsdiff them, and write the diff file 
to a third directory tree mirroring the first two. Finally, I zipped up 
the resulting patch tree into a tarball.

The resulting diff.tar.gz is 1482935 bytes, or about 1.4MB. That's less 
than 7% of the original size.

Proposed Solution

Now, instead of downloading the new package, a user (or rather, their 
update manager applet) could download this patch tree, take the old 
package out of the cache, unpack it, apply the patches and then repack 
it, to obtain the updated package. The result will be exactly the same, 
except that the download time (assuming 5kB/s) on a 56k modem is 
reduced from 70 minutes to 4 minutes.

There are a few details to be worked out when implementing this deb-diff 
scheme.

First the downloader would need the exact same ar, tar, and gzip as the 
original packager to get the exact same package file. This is necessary 
for the checksum and signature to be correct. This is doable I think, 
as almost all Ubuntu installs will have the same packages installed, 
and ar, tar and gzip are very stable. Also, file modification dates 
need to be looked at, because they also need to be the same, but that 
too is a solvable problem.

Second, the maintainer would have to use deb-diff to create the patches 
and supply them along with the full packages.

Third, apt-get, or the update client, would have to be modified to fetch 
diffs when possible, and use deb-patch to recreate the new package. We 
should probably avoid cleaning the package cache too often.

Note that aside from creating the diff there is no extra work for 
package maintainers: the exact same package will be installed whether 
it was downloaded as a whole or created through a patch versus an older 
version. No extra configurations are introduced. Also, nothing breaks: 
packages without diffs can still be downloaded as a whole.


This will probably have to be taken up with the Debian developers as 
well, but as I use Ubuntu I figured I'd post it here first and see what 
people think.

So, do you agree that there is a demand for smaller updates? Can you see 
any unforeseen problems with this approach? Other comments?

Cheers, and thanks for all the great work so far,

Lourens
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 191 bytes
Desc: not available
Url : https://lists.ubuntu.com/archives/ubuntu-devel/attachments/20060803/11a0ec74/attachment-0001.pgp 


More information about the ubuntu-devel mailing list