groupcompress extraction 10x faster

Robert Collins robert.collins at canonical.com
Thu Feb 19 22:23:53 GMT 2009


On Thu, 2009-02-19 at 16:04 -0600, John Arbash Meinel wrote:
> 
> Looking at the raw bits I'm seeing, I can often see a lot of "copy 1
> line, copy 2 lines, copy 1 line, copy 1 line, copy 10 lines, etc". It
> might be reasonable to collapse only the "copy <2 lines" runs into an
> "insert".

If you think about this like knit compression - inserting full texts to
stop really deep delta chains - what you're proposing is inserting full
texts here and there. Another way is just to start a new group when the
weight of these things gets too high. But seeing these runs of copies
is precisely the goal of groupcompress, so it's not actually concerning
me *that it's like that*. What concerns me more is what actual overhead
we're paying and what benefit we're getting.
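
For concreteness, the sort of post-processing you're describing might
look roughly like this - a minimal sketch, assuming a simple list of
('copy', offset, length) / ('insert', bytes) instructions and an
invented threshold, not the real groupcompress structures:

MIN_COPY_BYTES = 16  # invented: below this a copy costs more than it saves

def collapse_short_copies(instructions, source):
    """Rewrite tiny copies as inserts and merge adjacent inserts."""
    out = []
    for op in instructions:
        if op[0] == 'copy':
            _, offset, length = op
            if length < MIN_COPY_BYTES:
                # Too small to be worth a copy command: emit the bytes.
                op = ('insert', source[offset:offset + length])
        if op[0] == 'insert' and out and out[-1][0] == 'insert':
            # Merge runs of inserts into a single instruction.
            out[-1] = ('insert', out[-1][1] + op[1])
        else:
            out.append(op)
    return out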

Consider a file whose successive versions are
ACEG
ABCEG
ABCDEG
ABCDEFG

we'll output the newest version in full
ABCDEFG
then
c1,5 c7,1
then
c1,3 c5,1 c7,1
then
c1,1 c3,1 c5,1 c7,1
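
Spelled out, applying those copy instructions against the stored full
text looks something like this (1-based character offsets to match the
notation above; the real format works in byte ranges):

def apply_copies(base, copies):
    # Rebuild a text from (offset, length) copy runs against `base`.
    return ''.join(base[off - 1:off - 1 + length] for off, length in copies)

base = 'ABCDEFG'
assert apply_copies(base, [(1, 5), (7, 1)]) == 'ABCDEG'
assert apply_copies(base, [(1, 3), (5, 1), (7, 1)]) == 'ABCEG'
assert apply_copies(base, [(1, 1), (3, 1), (5, 1), (7, 1)]) == 'ACEG'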

So block copying a reduced region, so that later texts can reference it
more pithily, is a decent idea; it's hard, though, to know where the
tradeoff will sit - I don't have any quick suggestions :(.
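
One crude way to picture that idea (the names and cutoff below are
invented, purely to illustrate the tradeoff): once a text's delta gets
too fragmented, store the text as one literal block so later texts can
reference the whole region with a single copy rather than repeating the
same scattered runs.

MAX_FRAGMENTS = 8  # invented cutoff

def maybe_store_literally(instructions, text):
    # If the delta is mostly tiny copies, pay the bytes up front and
    # store the text whole; later texts then need only one copy run.
    copies = sum(1 for op in instructions if op[0] == 'copy')
    if copies > MAX_FRAGMENTS:
        return [('insert', text)]
    return instructions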

-Rob
