Using zsync for .deb downloads: Courgette

Paul Sladen ubuntu at paul.sladen.org
Sat Jul 18 03:14:03 BST 2009


On Thu, 16 Jul 2009, Matthew Paul Thomas wrote:
> replaced bsdiff with a more efficient algorithm called "Courgette"

It's not so much a new algorithm as a new domain-specific preprocessor (one
specific to x86 .code sections extracted from PE/COFF executables).  bsdiff
is still used to do the actual diff and LZMA is still used to compress that
resulting .bsdiff.  Source is at:

  http://src.chromium.org/viewvc/chrome/trunk/src/courgette/

The .code sections in executables are disassembled into a symbolic
representation and the bsdiff is taken between the transformed symbolic
representations (think of diffing .S files).  The new pre-processing is a
more refined version of the BCJ reversible absolute<->relative jump
transformation already performed by the .7z/MS-LZX stacks.

In time, equivalent domain-specific code could also be added for other
executable/library formats+instruction sets, or for PNG images, or for HTML
files....

The secret to better diffing is to work in the most _un_compressed space
possible.  Eg. looking inside .deb files, looking inside .gz/.bz2 streams,
looking inside .tar streams, looking inside ELF binaries, looking inside
.code sections, and finally looking inside individual instructions
(disassembling).

As long as the same pre-processing steps can be undertaken at each end, and
as long as the steps are fully reversible[0] then the system works and the
more layers you work down and inspect, the better the diffing can be.

	-Paul

[0] Encoding to a compressed bitstream is not, as there are multiple
equivalent compressed encoding for one uncompressed input.
-- 
Why do one side of a triangle when you can do all three.  Somewhere, GB.




More information about the ubuntu-devel mailing list