Binary diffs to reduce update download sizes

Alex Jones alex at weej.com
Sat Sep 16 14:52:24 BST 2006


Hi,

This isn't a solution to the problem, but you can cut the volume of
updates down to essential security fixes only: disable dapper-updates
and leave dapper-security enabled.

HTH!

On Thu, 2006-08-03 at 12:51 +0200, Lourens Veen wrote:
> Note: I'm not subscribed to the list. Please CC me; I will also keep an 
> eye on the archives.
> 
> The Problem
> 
> Some time ago I installed Ubuntu on my mother's computer, and noticed 
> that there is a lot of downloading involved to keep the system up to 
> date. After installing Ubuntu 6.06 LTS, a full update means a new 
> kernel, a new X.org, a new GNOME, and a whole bunch of miscellaneous 
> applications. And after that, new updates keep arriving regularly.
> 
> Now, I have a fast broadband connection at home, so downloading a few 
> dozen megabytes is no problem. On a 56k modem it's a different story 
> however: a simple kernel update to fix a security problem means 
> downloading a 21MB kernel package. That takes at least an hour, and 
> much longer if your connection isn't very good. Kernel updates aren't 
> that rare.
> 
> I "solved" my mother's problem by disabling updates, but of course that 
> is not a good idea in the long term (and probably not in the short term 
> either). She has since upgraded to a DSL line, and updates have been 
> reenabled, so my immediate problem has been solved.
> 
> This got me thinking however. There are a lot of places in the world 
> where Internet access isn't ubiquitous and cheap, and people in those 
> places should be able to run free software, and Ubuntu, as well. They 
> too should be able to get updates easily, even on a slow connection.
> 
> Experiment
> 
> Most updates are only small source patches, and security updates doubly 
> so. I figured that the binaries of the different versions of a 
> programme would probably be similar too, and that we should be able to 
> make much smaller updates. I decided to experiment a little.
> 
> I installed bsdiff, which is a binary diff tool. I grabbed 
> linux-image-2.6.15-26-386_2.6.15-26.44_i386.deb and 
> linux-image-2.6.15-26-386_2.6.15-26.45_i386.deb out of 
> my /var/cache/apt/archives. Next, I unpacked both packages (ar x, then 
> tar xf on the control and data files within). I then wrote a simple 
> script to traverse the two trees (bsdiff doesn't traverse directories 
> on its own), find differing files, bsdiff them, and write the diff file 
> to a third directory tree mirroring the first two. Finally, I packed 
> the resulting patch tree into a gzipped tarball.
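
A minimal sketch of the traversal script described above, in Python. The original script was not posted, so the function names, the ".bsdiff" suffix, and the choice to skip added files and symlinks are all assumptions. The only external command it relies on is "bsdiff OLDFILE NEWFILE PATCHFILE".

```python
import filecmp
import os
import subprocess


def changed_files(old_root, new_root):
    """Yield relative paths of regular files present in both trees that differ."""
    for dirpath, _dirnames, filenames in os.walk(new_root):
        for name in filenames:
            new_path = os.path.join(dirpath, name)
            rel = os.path.relpath(new_path, new_root)
            old_path = os.path.join(old_root, rel)
            # Assumption: symlinks and files that only exist in the new tree
            # are shipped some other way; this sketch only diffs pairs.
            if os.path.islink(new_path) or os.path.islink(old_path):
                continue
            if not os.path.isfile(old_path):
                continue
            # shallow=False forces a byte-by-byte comparison.
            if not filecmp.cmp(old_path, new_path, shallow=False):
                yield rel


def build_patch_tree(old_root, new_root, patch_root, differ=("bsdiff",)):
    """Write one patch per changed file into patch_root, mirroring the layout."""
    written = []
    for rel in changed_files(old_root, new_root):
        patch_path = os.path.join(patch_root, rel + ".bsdiff")
        os.makedirs(os.path.dirname(patch_path), exist_ok=True)
        # Runs: bsdiff <old file> <new file> <patch file>
        subprocess.run(
            [*differ, os.path.join(old_root, rel),
             os.path.join(new_root, rel), patch_path],
            check=True,
        )
        written.append(rel)
    return written
```

Calling build_patch_tree("old", "new", "patch") on the two unpacked trees would leave the patch tree in "patch", ready to be packed with tar. Files that were added or removed between versions are not handled here.
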
> 
> The resulting diff.tar.gz is 1482935 bytes, or about 1.4MB. That's less 
> than 7% of the original size.
> 
> Proposed Solution
> 
> Now, instead of downloading the new package, a user (or rather, their 
> update manager applet) could download this patch tree, take the old 
> package out of the cache, unpack it, apply the patches and then repack 
> it, to obtain the updated package. The result will be exactly the same, 
> except that the download time (assuming 5kB/s) on a 56k modem is 
> reduced from about 70 minutes to about 5 minutes.
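
The size and time figures can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: it assumes "21MB" and "5kB/s" use binary units (the exact size of the kernel package is not given), and the constant and function names are made up for illustration. Under those assumptions the full package takes a little over 70 minutes and the patch just under 5.

```python
# Assumption: "21MB" means 21 * 2**20 bytes; the exact package size is not stated.
FULL_PACKAGE_BYTES = 21 * 2**20
# Size of diff.tar.gz as reported in the experiment.
PATCH_BYTES = 1_482_935
# Assumption: "5kB/s" means 5 * 1024 bytes per second.
MODEM_BYTES_PER_SECOND = 5 * 1024


def download_minutes(size_bytes, rate=MODEM_BYTES_PER_SECOND):
    """Return the transfer time in minutes at a constant rate."""
    return size_bytes / rate / 60


size_ratio = PATCH_BYTES / FULL_PACKAGE_BYTES

if __name__ == "__main__":
    print(f"patch is {size_ratio:.1%} of the full package")
    print(f"full package: {download_minutes(FULL_PACKAGE_BYTES):.0f} min")
    print(f"patch only:   {download_minutes(PATCH_BYTES):.1f} min")
```
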
> 
> There are a few details to be worked out when implementing this deb-diff 
> scheme.
> 
> First, the downloader would need the exact same ar, tar, and gzip as the 
> original packager to get the exact same package file. This is necessary 
> for the checksum and signature to be correct. This is doable I think, 
> as almost all Ubuntu installs will have the same packages installed, 
> and ar, tar and gzip are very stable. Also, file modification dates 
> need to be looked at, because they also need to be the same, but that 
> too is a solvable problem.
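
Whatever the repacking details turn out to be, the client can guard against a mismatch by hashing the rebuilt .deb and comparing it with the checksum published for the full package. A minimal sketch, assuming the expected digest has already been read from the archive's package index; the function names are hypothetical:

```python
import hashlib


def file_digest(path, algorithm="sha256", chunk_size=65536):
    """Hash a file in chunks so large packages need not fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def rebuilt_package_ok(path, expected_hex, algorithm="sha256"):
    """Return True only if the rebuilt package matches the published digest."""
    return file_digest(path, algorithm) == expected_hex.strip().lower()
```

If rebuilt_package_ok() returns False, the client would discard the rebuilt file and fall back to downloading the full package, so a failed reconstruction costs bandwidth but never installs a wrong file.
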
> 
> Second, the maintainer would have to use deb-diff to create the patches 
> and supply them along with the full packages.
> 
> Third, apt-get, or the update client, would have to be modified to fetch 
> diffs when possible, and use deb-patch to recreate the new package. We 
> should probably avoid cleaning the package cache too often.
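
The client-side decision could look roughly like this. It is a sketch only: plan_download() and its arguments are hypothetical and not part of apt; the one thing taken from the post is the name_version_arch.deb file naming seen in the package cache.

```python
import os


def deb_filename(name, version, arch):
    """Build a cache file name such as foo_1.0-1_i386.deb."""
    return f"{name}_{version}_{arch}.deb"


def plan_download(cache_dir, name, installed_version, arch, patch_available):
    """Return ("patch", old_deb_path) if a diff can be applied, else ("full", None).

    A diff is only usable when the archive offers one AND the old package
    file is still in the local cache to patch against.
    """
    old_deb = os.path.join(cache_dir, deb_filename(name, installed_version, arch))
    if patch_available and os.path.isfile(old_deb):
        return ("patch", old_deb)
    return ("full", None)
```

This is also why the cache should not be cleaned too eagerly: once the old .deb is gone, the only option left is the full download.
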
> 
> Note that aside from creating the diff there is no extra work for 
> package maintainers: the exact same package will be installed whether 
> it was downloaded as a whole or rebuilt by patching an older 
> version. No extra configurations are introduced. Also, nothing breaks: 
> packages without diffs can still be downloaded as a whole.
> 
> 
> This will probably have to be taken up with the Debian developers as 
> well, but as I use Ubuntu I figured I'd post it here first and see what 
> people think.
> 
> So, do you agree that there is a demand for smaller updates? Can you see 
> any problems with this approach that I have overlooked? Other comments?
> 
> Cheers, and thanks for all the great work so far,
> 
> Lourens
-- 
Alex Jones <alex at weej.com>



