Compressing weaved revisions?

John A Meinel john at arbash-meinel.com
Sat Oct 1 06:31:49 BST 2005


John A Meinel wrote:
> Aaron Bentley wrote:
> 
>> Hi all,
>>
>> For trees with, e.g. 500 revisions, the revision storage may actually be
>> larger than tree storage.
>>
>> The new format doesn't weave revisions, because they're almost
>> completely different for each revision.  They also don't compress well.
>>  For my example tree, the revisions are 47 k uncompressed and 32 k
>> compressed.
>>
>> But if they're tarred and gzipped, the tarfile is 7.2k, and if they're
>> tarred and bzipped, that's 5.7k.
>>
>> I expect if revisions were weaved and gzipped, we'd see roughly the same
>> level of compression, i.e. 6x
>>
>> They would have the disadvantages of weaves, but we've already accepted
>> those disadvantages for texts and inventories.
>>
>> Aaron
> 
> 
> Well, just to get the numbers straight, I went ahead and created a 
> plugin, which can turn the current revision-store into a weave file. It 
> requires that the branch be upgraded, simply because I trusted the 
> newformat weave code more than the old weave code.

I decided to add another data point: what would it look like as an sqlite 
database?

Available from here:
http://bzr.arbash-meinel.com/plugins/revstore2sql/

It should be compatible with the old and new bzr formats. There are 
quite a few checks that you have to do now, and naturally you have to be 
running the bzr that supports the given branch format.

Basically, I normalized out the revisions into their own table, so that 
I can reference them with just a number, and I did the same to the 
committer. Ancestry is represented as another id->id table, with 
appropriate indexes.
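
For illustration, here is a minimal sketch of that kind of normalized 
layout using Python's sqlite3 module. The table names, column names and 
the helper functions (intern_value, add_revision) are hypothetical and 
are not taken from the plugin; the sketch only shows the idea of integer 
ids for revisions and committers plus an indexed id->id ancestry table.

```python
import sqlite3

# Hypothetical schema: all names here are assumptions for illustration.
SCHEMA = """
CREATE TABLE committer (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE revision (
    id           INTEGER PRIMARY KEY,
    revision_id  TEXT NOT NULL UNIQUE,
    committer_id INTEGER REFERENCES committer(id),
    message      TEXT
);
CREATE TABLE ancestry (
    child_id  INTEGER NOT NULL REFERENCES revision(id),
    parent_id INTEGER NOT NULL REFERENCES revision(id),
    PRIMARY KEY (child_id, parent_id)
);
CREATE INDEX ancestry_parent_idx ON ancestry (parent_id);
"""


def intern_value(conn, table, column, value):
    """Return the integer id for value, inserting a row on first sight."""
    conn.execute(
        f"INSERT OR IGNORE INTO {table} ({column}) VALUES (?)", (value,)
    )
    row = conn.execute(
        f"SELECT id FROM {table} WHERE {column} = ?", (value,)
    ).fetchone()
    return row[0]


def add_revision(conn, revision_id, committer, message, parents):
    """Store one revision, referring to committer and parents by number."""
    committer_id = intern_value(conn, "committer", "name", committer)
    rev = intern_value(conn, "revision", "revision_id", revision_id)
    conn.execute(
        "UPDATE revision SET committer_id = ?, message = ? WHERE id = ?",
        (committer_id, message, rev),
    )
    for parent in parents:
        # A parent not stored yet gets a stub row that is filled in later.
        parent_rev = intern_value(conn, "revision", "revision_id", parent)
        conn.execute(
            "INSERT OR IGNORE INTO ancestry (child_id, parent_id) "
            "VALUES (?, ?)",
            (rev, parent_rev),
        )
    return rev
```

Each long revision-id string and committer name is stored once, and 
every other reference to it is a small integer, which is where the size 
saving in a layout like this would come from.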

First, the time to convert is *way* faster. I don't remember what the 
times used to be, but it is something like 10min down to 30s.

   1894400 revisions.tar
    879000 du -sh --apparent
    859138 revisions.zip
    547840 revisions.sqlite
    246171 revisions.tar.gz

So with just that amount of compression, I am able to beat zip 
compression, though I come in behind the gzip'd tarfile (or weave).
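
To make the comparison easier to read, the sizes above can be expressed 
relative to a baseline. This is only a sketch: it assumes the figures 
are byte counts and picks the "du -sh --apparent" figure as the 
baseline, which is my choice for illustration rather than anything the 
measurements dictate.

```python
# Sizes copied from the listing above (assumed to be bytes).
sizes = {
    "revisions.tar": 1894400,
    "du -sh --apparent": 879000,
    "revisions.zip": 859138,
    "revisions.sqlite": 547840,
    "revisions.tar.gz": 246171,
}

# Assumption: use the apparent size of the revision store as the baseline.
baseline = sizes["du -sh --apparent"]
ratios = {name: size / baseline for name, size in sizes.items()}

# Print largest first, with each size as a fraction of the baseline.
for name, size in sorted(sizes.items(), key=lambda item: -item[1]):
    print(f"{size:>9}  {ratios[name]:5.2f}  {name}")
```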

I'm not advocating anything; I just thought it would be an interesting 
point of comparison.

John
=:->