Better compression

Ian Clatworthy ian.clatworthy at canonical.com
Thu Jul 17 08:50:19 BST 2008


Robert Collins wrote:
> On Thu, 2008-07-17 at 16:00 +1000, Rob Weir wrote:
>> On 17 Jul 2008, raindog at macrohmasheen.com wrote:
>>> What level perf increase is seen with this? Do you have any
>>> comparisons?
>> With regards to space, I converted bzr.dev from pack-0.92 to
>> btree+groupcompress:
>>
>> 85664   backup.bzr
>> 40232   .bzr/
>>
>> a saving of over 50%.

I saw a similar gain on Python last night: 346.5 MB -> 181.4 MB.
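
As a quick check of the arithmetic behind both sets of figures, here is a
small Python sketch. The function name is made up for illustration, and the
units of the first pair are not stated in the thread (they do not matter
for a ratio):

```python
def saving_percent(before, after):
    """Return the space saved, as a percentage of the original size."""
    return 100.0 * (before - after) / before

# bzr.dev, pack-0.92 -> btree+groupcompress (figures quoted above)
bzr_dev = saving_percent(85664, 40232)

# Python repository, 346.5 MB -> 181.4 MB
python_repo = saving_percent(346.5, 181.4)

print("bzr.dev: %.1f%%" % bzr_dev)
print("Python:  %.1f%%" % python_repo)
```

So the bzr.dev conversion saves a little over half, and the Python one a
little under half.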

> The compressor would have been getting texts in topological order there.
> It does much better with reverse-topological order, which is why I need
> a new InterRepository object or something... I tested a bzr.dev
> repository and saw more like 30 MB with better ordering in use, but it
> was a custom-hack rather than a reusable thing.

So I'm wondering what that implies w.r.t. optimising how fast-import
works. If we know that compressing in a forward direction will generally
take more space than compressing in the other direction, we'll
probably always want to consume the stream *quickly*, then compress as
much as we can during the final pack that fastimport does, yes?

So I guess I'm requesting enough flexibility in our API to say
"load quickly", so that fastimport doesn't take forever over-compressing
stuff on a first pass, only for that work to be thrown away.
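
To make the request concrete, here is a rough sketch of the shape such an
API could take. Everything in it is hypothetical: SketchStore, load_quickly
and final_pack are invented names, not bzrlib's repository API, and zlib
stands in for groupcompress. It only illustrates "cheap compression while
loading, expensive compression once at the end". zlib will not reproduce
the ordering benefit Robert describes; the reversed order below merely
mirrors the idea.

```python
import zlib


class SketchStore:
    """Hypothetical two-phase store; not a real bzrlib class."""

    def __init__(self, load_quickly=True):
        self.load_quickly = load_quickly
        self._blobs = []  # one compressed blob per text, in arrival order

    def add_text(self, text):
        # First pass: the cheapest zlib level when asked to load quickly.
        level = 1 if self.load_quickly else 9
        self._blobs.append(zlib.compress(text, level))

    def loaded_size(self):
        return sum(len(blob) for blob in self._blobs)

    def final_pack(self):
        # Final pass: recompress everything as one stream, newest text
        # first, at the highest zlib level.
        texts = [zlib.decompress(blob) for blob in self._blobs]
        compressor = zlib.compressobj(9)
        parts = [compressor.compress(t) for t in reversed(texts)]
        parts.append(compressor.flush())
        return b"".join(parts)


# Fifty made-up "revisions" that differ only in their last line.
base = b"".join(b"def f%d(): return %d\n" % (i, i) for i in range(40))
texts = [base + b"# revision %d\n" % rev for rev in range(50)]

store = SketchStore(load_quickly=True)
for text in texts:
    store.add_text(text)

loaded_size = store.loaded_size()
packed = store.final_pack()
print("after load: %d bytes, after final pack: %d bytes"
      % (loaded_size, len(packed)))
```

The point of the flag is that the first pass spends as little time as
possible, since whatever it writes gets replaced by the final pack anyway.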

Ian C.


