Binary diffs to reduce update download sizes
Lourens Veen
lourens at rainbowdesert.net
Thu Aug 3 11:52:11 BST 2006
Note: I'm not subscribed to the list. Please CC me; I will also keep an
eye on the archives.
The Problem
Some time ago I installed Ubuntu on my mother's computer, and noticed
that there is a lot of downloading involved to keep the system up to
date. After installing Ubuntu 6.06 LTS, a full update means a new
kernel, a new X.org, a new GNOME, and a whole bunch of miscellaneous
applications. And after that, new updates keep arriving regularly.
Now, I have a fast broadband connection at home, so downloading a few
dozen megabytes is no problem. On a 56k modem it's a different story
however: a simple kernel update to fix a security problem means
downloading a 21MB kernel package. That takes at least an hour, and
much longer if your connection isn't very good. Kernel updates aren't
that rare.
I "solved" my mother's problem by disabling updates, but of course that
is not a good idea in the long term (and probably not in the short term
either). She has since upgraded to a DSL line, and updates have been
reenabled, so my immediate problem has been solved.
This got me thinking however. There are a lot of places in the world
where Internet access isn't ubiquitous and cheap, and people in those
places should be able to run free software, and Ubuntu, as well. They
too should be able to get updates easily, even on a slow connection.
Experiment
Most updates are only small source patches, and security updates doubly
so. I figured that the binaries of the different versions of a
programme would probably be similar too, and that we should be able to
make much smaller updates. I decided to experiment a little.
I installed bsdiff, which is a binary diff tool. I grabbed
linux-image-2.6.15-26-386_2.6.15-26.44_i386.deb and
linux-image-2.6.15-26-386_2.6.15-26.45_i386.deb out of
my /var/cache/apt/archives. Next, I unpacked both packages (ar x, then
tar xf on the control and data files within). I then wrote a simple
script to traverse the two trees (bsdiff doesn't traverse directories
on its own), find differing files, bsdiff them, and write the diff file
to a third directory tree mirroring the first two. Finally, I zipped up
the resulting patch tree into a tarball.
The resulting diff.tar.gz is 1482935 bytes, or about 1.4MB. That's less
than 7% of the original size.
Proposed Solution
Now, instead of downloading the new package, a user (or rather, their
update manager applet) could download this patch tree, take the old
package out of the cache, unpack it, apply the patches and then repack
it, to obtain the updated package. The result will be exactly the same,
except that the download time (assuming 5kB/s) on a 56k modem is
reduced from 70 minutes to 4 minutes.
There are a few details to be worked out when implementing this deb-diff
scheme.
First the downloader would need the exact same ar, tar, and gzip as the
original packager to get the exact same package file. This is necessary
for the checksum and signature to be correct. This is doable I think,
as almost all Ubuntu installs will have the same packages installed,
and ar, tar and gzip are very stable. Also, file modification dates
need to be looked at, because they also need to be the same, but that
too is a solvable problem.
Second, the maintainer would have to use deb-diff to create the patches
and supply them along with the full packages.
Third, apt-get, or the update client, would have to be modified to fetch
diffs when possible, and use deb-patch to recreate the new package. We
should probably avoid cleaning the package cache too often.
Note that aside from creating the diff there is no extra work for
package maintainers: the exact same package will be installed whether
it was downloaded as a whole or created through a patch versus an older
version. No extra configurations are introduced. Also, nothing breaks:
packages without diffs can still be downloaded as a whole.
This will probably have to be taken up with the Debian developers as
well, but as I use Ubuntu I figured I'd post it here first and see what
people think.
So, do you agree that there is a demand for smaller updates? Can you see
any unforeseen problems with this approach? Other comments?
Cheers, and thanks for all the great work so far,
Lourens
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 191 bytes
Desc: not available
Url : https://lists.ubuntu.com/archives/ubuntu-devel/attachments/20060803/11a0ec74/attachment-0001.pgp
More information about the ubuntu-devel
mailing list