Binary diffs to reduce update download sizes
Alex Jones
alex at weej.com
Sat Sep 16 14:52:24 BST 2006
Hi
Not a solution to the problem, but to reduce the volume of updates to
only essential security updates, disable dapper-updates but leave
dapper-security enabled.
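
For anyone who wants to script that change, here is a minimal sketch. It assumes the classic one-line sources.list format (deb URI suite components); the function name and the mirror URLs in the example are only illustrative, so check them against your own file.

```python
def disable_pocket(sources_text, pocket="dapper-updates"):
    """Comment out active deb/deb-src lines whose suite field equals `pocket`."""
    out = []
    for line in sources_text.splitlines():
        fields = line.split()
        # One-line format assumed: deb <uri> <suite> <component> ...
        if len(fields) >= 3 and fields[0] in ("deb", "deb-src") and fields[2] == pocket:
            line = "# " + line
        out.append(line)
    return "\n".join(out) + "\n"


# Illustrative input only; a real /etc/apt/sources.list will differ.
example = (
    "deb http://archive.ubuntu.com/ubuntu dapper main restricted\n"
    "deb http://archive.ubuntu.com/ubuntu dapper-updates main restricted\n"
    "deb http://security.ubuntu.com/ubuntu dapper-security main restricted\n"
)
print(disable_pocket(example), end="")
```

Only the dapper-updates line gets commented out; the dapper and dapper-security lines pass through untouched.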
HTH!
On Thu, 2006-08-03 at 12:51 +0200, Lourens Veen wrote:
> Note: I'm not subscribed to the list. Please CC me; I will also keep an
> eye on the archives.
>
> The Problem
>
> Some time ago I installed Ubuntu on my mother's computer, and noticed
> that there is a lot of downloading involved to keep the system up to
> date. After installing Ubuntu 6.06 LTS, a full update means a new
> kernel, a new X.org, a new GNOME, and a whole bunch of miscellaneous
> applications. And after that, new updates keep arriving regularly.
>
> Now, I have a fast broadband connection at home, so downloading a few
> dozen megabytes is no problem. On a 56k modem it's a different story
> however: a simple kernel update to fix a security problem means
> downloading a 21MB kernel package. That takes at least an hour, and
> much longer if your connection isn't very good. Kernel updates aren't
> that rare.
>
> I "solved" my mother's problem by disabling updates, but of course that
> is not a good idea in the long term (and probably not in the short term
> either). She has since upgraded to a DSL line, and updates have been
> reenabled, so my immediate problem has been solved.
>
> This got me thinking however. There are a lot of places in the world
> where Internet access isn't ubiquitous and cheap, and people in those
> places should be able to run free software, and Ubuntu, as well. They
> too should be able to get updates easily, even on a slow connection.
>
> Experiment
>
> Most updates are only small source patches, and security updates doubly
> so. I figured that the binaries of the different versions of a
> programme would probably be similar too, and that we should be able to
> make much smaller updates. I decided to experiment a little.
>
> I installed bsdiff, which is a binary diff tool. I grabbed
> linux-image-2.6.15-26-386_2.6.15-26.44_i386.deb and
> linux-image-2.6.15-26-386_2.6.15-26.45_i386.deb out of
> my /var/cache/apt/archives. Next, I unpacked both packages (ar x, then
> tar xf on the control and data files within). I then wrote a simple
> script to traverse the two trees (bsdiff doesn't traverse directories
> on its own), find differing files, bsdiff them, and write the diff file
> to a third directory tree mirroring the first two. Finally, I zipped up
> the resulting patch tree into a tarball.
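
The original script was not posted, so the following is only a sketch of what such a tree walk might look like. The function names and the ".bsdiff" suffix are made up for illustration; the only external fact relied on is bsdiff's command line (bsdiff oldfile newfile patchfile). Files that exist only in the new tree are skipped here and would have to be shipped whole.

```python
import filecmp
import os
import subprocess


def bsdiff_file(old, new, patch):
    # Invokes the bsdiff binary: bsdiff <oldfile> <newfile> <patchfile>
    subprocess.run(["bsdiff", old, new, patch], check=True)


def diff_trees(old_root, new_root, patch_root, make_patch=bsdiff_file):
    """Write one patch per changed file under patch_root; return their relative paths."""
    patched = []
    for dirpath, _dirs, files in os.walk(new_root):
        rel_dir = os.path.relpath(dirpath, new_root)
        for name in files:
            rel = os.path.normpath(os.path.join(rel_dir, name))
            old = os.path.join(old_root, rel)
            new = os.path.join(new_root, rel)
            if not os.path.isfile(old):
                continue  # no old version to diff against
            if filecmp.cmp(old, new, shallow=False):
                continue  # byte-identical, nothing to patch
            patch = os.path.join(patch_root, rel + ".bsdiff")
            os.makedirs(os.path.dirname(patch), exist_ok=True)
            make_patch(old, new, patch)
            patched.append(rel)
    return sorted(patched)
```

Calling diff_trees("old", "new", "patches") on the two unpacked packages and then tarring up "patches" would correspond to the steps described above. The make_patch parameter is there so that another diff tool can be swapped in.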
>
> The resulting diff.tar.gz is 1482935 bytes, or about 1.4MB. That's less
> than 7% of the original size.
>
> Proposed Solution
>
> Now, instead of downloading the new package, a user (or rather, their
> update manager applet) could download this patch tree, take the old
> package out of the cache, unpack it, apply the patches and then repack
> it, to obtain the updated package. The result will be exactly the same,
> except that the download time (assuming 5kB/s) on a 56k modem is
> reduced from about 70 minutes to about 5 minutes.
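
A quick check of the arithmetic, as a sketch. The assumptions are mine: "21MB" is read as 21 MiB, "5kB/s" as 5 KiB/s, and the patch size is the 1482935 bytes reported above.

```python
def download_minutes(size_bytes, rate_bytes_per_s):
    """Transfer time in minutes at a constant rate, ignoring protocol overhead."""
    return size_bytes / rate_bytes_per_s / 60


RATE = 5 * 1024                # "5kB/s", read as 5 KiB/s (assumption)
FULL_SIZE = 21 * 1024 * 1024   # "21MB" kernel package, read as 21 MiB (assumption)
PATCH_SIZE = 1482935           # size of diff.tar.gz as reported in the post

full_min = download_minutes(FULL_SIZE, RATE)
patch_min = download_minutes(PATCH_SIZE, RATE)
ratio_pct = 100 * PATCH_SIZE / FULL_SIZE

print(f"full: {full_min:.1f} min, patch: {patch_min:.1f} min, ratio: {ratio_pct:.1f}%")
```

Under these assumptions the full package takes roughly 72 minutes and the patch just under 5, with the patch at a little under 7% of the full size.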
>
> There are a few details to be worked out when implementing this deb-diff
> scheme.
>
> First, the downloader would need the exact same ar, tar, and gzip as the
> original packager to get the exact same package file. This is necessary
> for the checksum and signature to be correct. This is doable I think,
> as almost all Ubuntu installs will have the same packages installed,
> and ar, tar and gzip are very stable. Also, file modification dates
> need to be looked at, because they also need to be the same, but that
> too is a solvable problem.
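
To illustrate the modification-date point, here is a sketch using Python's tarfile and gzip modules, with every piece of metadata pinned so that packing the same content twice gives byte-identical output. The function name is made up, and this is not how dpkg builds packages. It also does not address the harder half of the problem: matching the output of the original packager's own gzip binary and compression level.

```python
import gzip
import io
import tarfile


def pack_deterministic(files, mtime=0):
    """Build a .tar.gz from a {name: bytes} mapping with all metadata pinned."""
    raw = io.BytesIO()
    with tarfile.open(fileobj=raw, mode="w", format=tarfile.GNU_FORMAT) as tar:
        for name in sorted(files):          # fixed member order
            info = tarfile.TarInfo(name)
            info.size = len(files[name])
            info.mtime = mtime              # fixed modification time
            info.mode = 0o644
            info.uid = info.gid = 0
            info.uname = info.gname = "root"
            tar.addfile(info, io.BytesIO(files[name]))
    out = io.BytesIO()
    # The gzip header carries its own timestamp and file name; pin both.
    with gzip.GzipFile(filename="", fileobj=out, mode="wb", mtime=mtime) as gz:
        gz.write(raw.getvalue())
    return out.getvalue()
```

If either the tar member times or the gzip header time were left at "now", two otherwise identical runs would produce different bytes, and the checksum of the rebuilt package would not match.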
>
> Second, the maintainer would have to use deb-diff to create the patches
> and supply them along with the full packages.
>
> Third, apt-get, or the update client, would have to be modified to fetch
> diffs when possible, and use deb-patch to recreate the new package. We
> should probably avoid cleaning the package cache too often.
>
> Note that aside from creating the diff there is no extra work for
> package maintainers: the exact same package will be installed whether
> it was downloaded as a whole or created through a patch versus an older
> version. No extra configurations are introduced. Also, nothing breaks:
> packages without diffs can still be downloaded as a whole.
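
The "nothing breaks" property could be sketched as the following client-side logic. None of these names exist in apt; the fetch and patch callables are hypothetical stand-ins, and SHA-256 is used purely for illustration in place of whatever checksum the archive index actually publishes.

```python
import hashlib


def obtain_package(expected_sha256, cached_old, fetch_patch, apply_patch, fetch_full):
    """Return (package_bytes, route), preferring the patch route when it verifies.

    cached_old  -- bytes of the old package from the cache, or None
    fetch_patch -- callable returning patch bytes, or None if no diff is published
    apply_patch -- callable (old_bytes, patch_bytes) -> rebuilt package bytes
    fetch_full  -- callable returning the full new package
    """
    if cached_old is not None:
        patch = fetch_patch()
        if patch is not None:
            rebuilt = apply_patch(cached_old, patch)
            # Accept the rebuilt package only if it matches the published checksum.
            if hashlib.sha256(rebuilt).hexdigest() == expected_sha256:
                return rebuilt, "patched"
    # No cached package, no diff, or a checksum mismatch: download as before.
    return fetch_full(), "full"
```

Any failure on the patch route falls through to the ordinary full download, so the installed package is the same either way.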
>
>
> This will probably have to be taken up with the Debian developers as
> well, but as I use Ubuntu I figured I'd post it here first and see what
> people think.
>
> So, do you agree that there is a demand for smaller updates? Can you see
> any unforeseen problems with this approach? Other comments?
>
> Cheers, and thanks for all the great work so far,
>
> Lourens
--
Alex Jones <alex at weej.com>