Better compression

Robert Collins robertc at robertcollins.net
Wed Jul 30 10:57:17 BST 2008


On Mon, 2008-07-28 at 22:31 -0500, John Arbash Meinel wrote:
> So I was talking to Martin about this, and I realize that we can
> actually transform the delta without having to extract all of the full
> texts. I'll outline it here. In the short term, we should know what the
> CPU overhead is, before we decide to spend the full work to implement
> this. But here goes:

Paraphrasing: reuse the matching line data by translating the byte-reuse
information into lines-in-texts information, and then applying that to
the output stream for the texts that this references, followed by
insertion statements for texts that are being omitted.
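
As a rough illustration of the "byte-reuse to lines-in-texts" step, here is
a minimal Python sketch. It assumes a hypothetical delta format in which a
copy instruction is just (start_byte, length) against an earlier text; the
names line_starts and copy_to_line_range are made up for this example and
are not part of bzrlib. It only reports the whole lines a copy fully covers,
so partial lines at either end would still need insertion statements.

```python
from bisect import bisect_left, bisect_right


def line_starts(lines):
    """Return the byte offset where each line starts, plus the end offset."""
    offsets = [0]
    for line in lines:
        offsets.append(offsets[-1] + len(line))
    return offsets


def copy_to_line_range(offsets, start, length):
    """Map a byte copy (start, length) onto the whole lines it covers.

    Returns (first_line, line_count). line_count is 0 when the copy does
    not contain any complete line of the source text.
    """
    end = start + length
    # First line boundary at or after the start of the copied bytes.
    first = bisect_left(offsets, start)
    # Last line boundary at or before the end of the copied bytes.
    last = bisect_right(offsets, end) - 1
    return first, max(0, last - first)


# Hypothetical source text, split into lines with their terminators.
source = [b"a\n", b"bb\n", b"ccc\n"]
offsets = line_starts(source)
whole_lines = copy_to_line_range(offsets, 2, 7)
```

Each lookup is a binary search over the line offsets of one earlier text, so
the cost per text grows with the number of copy instructions and the number
of earlier texts they point into.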

I actually considered some variations of this when Aaron raised the
issue of getting efficient 'matching_blocks' information about texts
stored in this format.

My feeling about this was that it's probably cheaper just to diff on the
fly. I'd certainly want to hold off on this until we've done other less
complex things first. I could be wrong: the complexity of this
transform is O(text-length * texts in stream before this text), whereas
re-diffing can be O(text-length ^ 2) [though I haven't rigorously
analysed this; possibly it's lower].
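
For comparison, "diffing on the fly" can be sketched with the standard
library. This uses difflib.SequenceMatcher as a stand-in (an assumption for
illustration, not necessarily the matcher bzr would use), and the helper
name matching_blocks is hypothetical. difflib documents its worst case as
quadratic in the sequence length.

```python
from difflib import SequenceMatcher


def matching_blocks(old_lines, new_lines):
    """Return (old_index, new_index, size) triples for two extracted texts.

    The final triple is difflib's sentinel (len(old), len(new), 0).
    """
    # autojunk=False disables difflib's popular-element heuristic so that
    # frequently repeated lines are still considered for matching.
    matcher = SequenceMatcher(None, old_lines, new_lines, autojunk=False)
    return [tuple(block) for block in matcher.get_matching_blocks()]


old = ["a\n", "b\n", "c\n"]
new = ["a\n", "x\n", "c\n"]
blocks = matching_blocks(old, new)
```

This needs both full texts extracted first, which is the cost the transform
above is trying to avoid.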

-Rob
-- 
GPG key available at: <http://www.robertcollins.net/keys.txt>.