Diff-debs: bsdiff (92% reduction) vs zsync (~70%)

Phillip Susi psusi at cfl.rr.com
Mon Jan 16 16:37:15 GMT 2006

I say again, you do not want to diff the compressed data stream
(data.tar.gz), because small changes in the original files cause large
changes in the compressed stream.  You need to take an xdelta of the
original uncompressed files, then compress the delta, and that
compressed delta is what you send as the patch.
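
To make the order of operations concrete, here is a rough sketch in
Python.  It is not xdelta or bsdiff: make_delta and apply_delta are
made-up helpers that build a naive line-based copy/insert delta with
difflib, purely to show "decompress, diff, then compress the delta".
The function names and the delta format are my own assumptions.

```python
import difflib
import gzip
import struct


def make_delta(old: bytes, new: bytes) -> bytes:
    """Encode `new` as copy/insert instructions against `old` (line granularity)."""
    old_lines = old.splitlines(keepends=True)
    new_lines = new.splitlines(keepends=True)
    # Byte offset at which each line of `old` starts.
    offsets = [0]
    for line in old_lines:
        offsets.append(offsets[-1] + len(line))
    out = bytearray()
    matcher = difflib.SequenceMatcher(None, old_lines, new_lines, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Copy a byte range out of the old file.
            out += b"C" + struct.pack(">II", offsets[i1], offsets[i2] - offsets[i1])
        elif tag in ("replace", "insert"):
            # Ship the new bytes literally.
            data = b"".join(new_lines[j1:j2])
            out += b"I" + struct.pack(">I", len(data)) + data
        # "delete" needs no instruction: the old bytes are simply not copied.
    return bytes(out)


def apply_delta(old: bytes, delta: bytes) -> bytes:
    """Rebuild the new file from `old` plus a delta made by make_delta."""
    out = bytearray()
    pos = 0
    while pos < len(delta):
        op = delta[pos:pos + 1]
        pos += 1
        if op == b"C":
            start, length = struct.unpack_from(">II", delta, pos)
            pos += 8
            out += old[start:start + length]
        elif op == b"I":
            (length,) = struct.unpack_from(">I", delta, pos)
            pos += 4
            out += delta[pos:pos + length]
            pos += length
        else:
            raise ValueError("corrupt delta")
    return bytes(out)


def make_patch(old_gz: bytes, new_gz: bytes) -> bytes:
    """Decompress both streams, diff the raw data, then compress the delta."""
    old = gzip.decompress(old_gz)
    new = gzip.decompress(new_gz)
    return gzip.compress(make_delta(old, new))
```

The receiver does the reverse: gunzip the patch, gunzip the old file,
and run apply_delta to get the new uncompressed data back.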

I would not be surprised if you changed your test case to apply bsdiff 
to the uncompressed files and found that it reduced the patch size by 
99%.

John C. McCabe-Dansted wrote:
> I wrote a utility to compare the effectiveness of different tools for creating 
> Deb-diffs. (attached)
> I quickly eliminated xdelta, on the basis that the bsdiff patches were always 
> smaller; sometimes as much as 5 times smaller. Bsdiff on average reduced the 
> amount that needed to be downloaded by 92%. Higher compression rates could be 
> achieved if we repack the .gz files contained in data.tar.gz
> Interestingly bsdiff was able to save 33% on the download of the i686 kernel 
> image if patched against the i386 image. However, this reduction is probably 
> not worth the effort. 
> Although zsync only reduced the required bandwidth, in my tests, by about 70%, 
> it stayed in the race because of its greater flexibility - it should be able 
> to use any existing .deb on the system, even ones regenerated from installed 
> files by dpkg-repack. If we use bz2 to compress instead of gz, however, 
> zsync is likely to become useless. (bz2 would also have the problem that 
> any processing of deb files will take much more CPU time).
> The extra space required by either of these methods should be minimal. The 
> size of a zsync file seems to be about 1% of the size of the original. With 
> bsdiff files, we should be able to limit the extra storage required on the 
> mirrors to under a gig by only including 
>  a) patches against the files on the official CD-ROM(s).
>  b) updates that occurred in the last (e.g.) 10 days.
> (a) should allow (e.g. dial-up) users to easily get up-to-date immediately 
> after install, while they still have the official CD-ROM in their computer.
>  (b) would help users keep up to date, and perhaps also help keep mirrors 
> up-to-date regardless of network congestion.
> Clearly, if an appropriate patch isn't found we can still download the 
> whole .deb normally.
> Some more info is available on my blog at:
> 	http://www.livejournal.com/users/flyingreptile/101020.html
> These results seem very promising to me. I am very busy at the moment, but if 
> no-one else steps up, I'll start work in a couple of months.
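
The fallback John describes (use a patch when the mirror has one for
the installed version, otherwise fetch the whole .deb) could look
roughly like the Python sketch below.  The choose_download function
and the "<old>_to_<new>.bsdiff" naming scheme are hypothetical, made
up only for illustration; nothing in the proposal fixes a file layout.

```python
from typing import Optional, Set


def choose_download(installed_version: Optional[str],
                    target_version: str,
                    available_patches: Set[str]) -> str:
    """Return the file to fetch: a matching patch if present, else the full .deb."""
    if installed_version is not None:
        # Hypothetical naming scheme for patches published on the mirror.
        patch = f"{installed_version}_to_{target_version}.bsdiff"
        if patch in available_patches:
            return patch
    # No suitable patch: fall back to the ordinary full download.
    return f"{target_version}.deb"
```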

More information about the ubuntu-devel mailing list