Using zsync for .deb downloads: Courgette
Paul Sladen
ubuntu at paul.sladen.org
Sat Jul 18 03:14:03 BST 2009
On Thu, 16 Jul 2009, Matthew Paul Thomas wrote:
> replaced bsdiff with a more efficient algorithm called "Courgette"
It's not so much a new algorithm as a new domain-specific preprocessor (one
specific to x86 .code sections extracted from PE/COFF executables). bsdiff
is still used to do the actual diff and LZMA is still used to compress that
resulting .bsdiff. Source is at:
http://src.chromium.org/viewvc/chrome/trunk/src/courgette/
The .code sections in executables are disassembled into a symbolic
representation and the bsdiff is taken between the transformed symbolic
representations (think of diffing .S files). The new pre-processing is a
more refined version of the BCJ reversible absolute<->relative jump
transformation already performed by the .7z/MS-LZX stacks.
In time, equivalent domain-specific code could also be added for other
executable/library formats+instruction sets, or for PNG images, or for HTML
files....
The secret to better diffing is to work in the most _un_compressed space
possible. Eg. looking inside .deb files, looking inside .gz/.bz2 streams,
looking inside .tar streams, looking inside ELF binaries, looking inside
.code sections, and finally looking inside individual instructions
(disassembling).
As long as the same pre-processing steps can be undertaken at each end, and
as long as the steps are fully reversible[0] then the system works and the
more layers you work down and inspect, the better the diffing can be.
-Paul
[0] Encoding to a compressed bitstream is not, as there are multiple
equivalent compressed encoding for one uncompressed input.
--
Why do one side of a triangle when you can do all three. Somewhere, GB.
More information about the ubuntu-devel
mailing list