Compressing weaved revisions?
John A Meinel
john at arbash-meinel.com
Sat Oct 1 06:31:49 BST 2005
John A Meinel wrote:
> Aaron Bentley wrote:
>
>> Hi all,
>>
>> For trees with, e.g. 500 revisions, the revision storage may actually be
>> larger than tree storage.
>>
>> The new format doesn't weave revisions, because they're almost
>> completely different for each revision. They also don't compress well.
>> For my example tree, the revisions are 47 k uncompressed and 32 k
>> compressed.
>>
>> But if they're tarred and gzipped, the tarfile is 7.2k, and if they're
>> tarred and bzipped, that's 5.7k.
>>
>> I expect if revisions were weaved and gzipped, we'd see roughly the same
>> level of compression, i.e. 6x
>>
>> They would have the disadvantages of weaves, but we've already accepted
>> those disadvantages for texts and inventories.
>>
>> Aaron
>
>
> Well, just to get the numbers straight, I went ahead and created a
> plugin, which can turn the current revision-store into a weave file. It
> requires that the branch be upgraded, simply because I trusted the
> newformat weave code more than the old weave code.
I decided to add another data point: what would it look like as an
SQLite database?
Available from here:
http://bzr.arbash-meinel.com/plugins/revstore2sql/
It should be compatible with the old and new bzr formats. There are
quite a few checks that you have to do now, and naturally you have to be
running the bzr that supports the given branch format.
Basically, I normalized out the revisions into their own table, so that
I can reference them with just a number, and I did the same to the
committer.
Ancestry is represented as another id->id table, with appropriate indexes.
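
To make the layout concrete, here is a minimal sketch of that kind of
normalized schema using Python's sqlite3 module. The table names, column
names, and the intern() and add_revision() helpers are all hypothetical;
they are not taken from the revstore2sql plugin, which may well be
organized differently.

```python
import sqlite3

# Hypothetical schema: every name below is illustrative only.
SCHEMA = """
CREATE TABLE revision_name (
    id   INTEGER PRIMARY KEY,
    name TEXT UNIQUE NOT NULL          -- the long textual revision id
);
CREATE TABLE committer (
    id   INTEGER PRIMARY KEY,
    name TEXT UNIQUE NOT NULL
);
CREATE TABLE revision (
    id           INTEGER PRIMARY KEY REFERENCES revision_name(id),
    committer_id INTEGER NOT NULL REFERENCES committer(id),
    timestamp    REAL,
    message      TEXT
);
CREATE TABLE ancestry (                -- the id->id table
    child_id  INTEGER NOT NULL REFERENCES revision_name(id),
    parent_id INTEGER NOT NULL REFERENCES revision_name(id),
    PRIMARY KEY (child_id, parent_id)
);
CREATE INDEX ancestry_parent_idx ON ancestry(parent_id);
"""


def intern(conn, table, value):
    """Return the small integer that stands for value, adding it if unseen."""
    if table not in ("revision_name", "committer"):
        raise ValueError("unexpected table: %r" % (table,))
    conn.execute("INSERT OR IGNORE INTO %s(name) VALUES (?)" % table, (value,))
    row = conn.execute("SELECT id FROM %s WHERE name = ?" % table, (value,))
    return row.fetchone()[0]


def add_revision(conn, name, committer, timestamp, message, parents):
    """Store one revision, referring to ids and committer by number only."""
    rev = intern(conn, "revision_name", name)
    conn.execute(
        "INSERT INTO revision(id, committer_id, timestamp, message) "
        "VALUES (?, ?, ?, ?)",
        (rev, intern(conn, "committer", committer), timestamp, message),
    )
    conn.executemany(
        "INSERT OR IGNORE INTO ancestry(child_id, parent_id) VALUES (?, ?)",
        [(rev, intern(conn, "revision_name", p)) for p in parents],
    )
    return rev


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    add_revision(conn, "rev-a", "Jane <jane@example.com>", 0.0, "first", [])
    add_revision(conn, "rev-b", "Jane <jane@example.com>", 1.0, "second", ["rev-a"])
    print(conn.execute("SELECT child_id, parent_id FROM ancestry").fetchall())
```

The point of the sketch is only that each long revision id and committer
string is stored once, and every other row refers to it by integer.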
First, the conversion is *way* faster than converting to a weave. I don't
remember the exact times, but it went from something like 10 min down to
30 s.
Sizes in bytes:

  1894400  revisions.tar
   879000  du -sh --apparent
   859138  revisions.zip
   547840  revisions.sqlite
   246171  revisions.tar.gz
So with just that much normalization, I am able to beat zip
compression, though I come in behind the gzip'd tarfile (or weave).
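
For a quick sense of scale, the sizes above can be expressed relative to
the apparent size of the loose revision files (the du figure). This is
only arithmetic on the numbers already listed; choosing the du figure as
the baseline is my assumption, and the relative_sizes() helper is
hypothetical.

```python
# Byte counts copied from the listing above.
SIZES = {
    "revisions.tar": 1894400,
    "du --apparent": 879000,
    "revisions.zip": 859138,
    "revisions.sqlite": 547840,
    "revisions.tar.gz": 246171,
}


def relative_sizes(sizes, baseline="du --apparent"):
    """Map each entry to its size as a fraction of the baseline entry."""
    base = sizes[baseline]
    return {name: size / base for name, size in sizes.items()}


if __name__ == "__main__":
    ratios = relative_sizes(SIZES)
    # Largest first, matching the order of the listing.
    for name, ratio in sorted(ratios.items(), key=lambda kv: -kv[1]):
        print("%-18s %9d  %.2f" % (name, SIZES[name], ratio))
```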
I just thought it would be an interesting comparison. I'm not advocating
it or anything, just offering it as a possible comparison point.
John
=:->