compressed weaves, and revision.weave

John A Meinel john at arbash-meinel.com
Fri Oct 28 02:33:05 BST 2005


Michael Ellerman wrote:
> On Thu, 27 Oct 2005 17:00, Michael Ellerman wrote:
>> A whole lot of useless junk
>> ... 
>> Actually it doesn't look so bad when you run it a few times, I've got a
>> fair amount of RAM so I'm probably caching the whole tree.
> 
> Ooops. Those numbers yesterday were on a format 7 branch, but not compressed, 
> so they're all bogus. I'll try and get some compressed numbers today 
> sometime.

Thanks.
I would be curious to compare format 6 versus 7 compressed and uncompressed.

Also, could you run "wc -l" against the inventory.weave in each branch?
(Naturally you would need to uncompress the compressed format 7 one first.)

I'm curious whether what you are seeing is simply because the format 7
inventory.weave file has almost 2x the lines in it.
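
If it is easier than uncompressing by hand, something like the following
would give the same number as "wc -l" for either a plain or a gzipped
file. This is only a sketch: the helper name count_lines is mine, and I am
assuming the compressed file simply carries a ".gz" suffix.

```python
import gzip


def count_lines(path):
    """Count newline characters, like `wc -l`, for a plain or .gz file."""
    # Assumption: compressed weaves are recognisable by a ".gz" suffix.
    opener = gzip.open if path.endswith(".gz") else open
    total = 0
    with opener(path, "rb") as f:
        # Read in 1 MiB chunks so a large weave never has to fit in memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            total += chunk.count(b"\n")
    return total
```

Calling count_lines() on the format 6 file and on the format 7 file should
show directly whether the line counts differ by roughly a factor of two.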

Basically, format 7 has 2 lines per inventory entry, while format 6 only
has 1. Per revision, both of them generally add only 1 line per modified
entry (though in format 7 this line is shorter than in format 6).

For a tree like bzr.dev, this means format 7 has only a few hundred extra
lines: the number of files is small, and because the number of revisions
is large, those extra lines are a small fraction of the total.

For your kernel archive, you have the opposite effect. Your number of
revisions is small, but your number of files is large. So you probably
have thousands of extra lines.
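
To make the arithmetic concrete, here is a rough model of the line counts
described above. It is a back-of-the-envelope sketch only: it ignores files
added after the first revision and the weave's own control lines, and the
tree sizes in the example are made-up numbers, not measurements of bzr.dev
or of your kernel branch.

```python
def estimated_lines(files, revisions, modified_per_revision, lines_per_entry):
    """Rough inventory.weave line count under the simple model above."""
    initial = files * lines_per_entry           # every entry, stored once
    growth = revisions * modified_per_revision  # 1 new line per modified entry
    return initial + growth


# Hypothetical trees, chosen only to illustrate the two extremes.
trees = {
    "few files, many revisions": dict(files=300, revisions=1500,
                                      modified_per_revision=3),
    "many files, few revisions": dict(files=18000, revisions=100,
                                      modified_per_revision=20),
}

for name, tree in trees.items():
    f6 = estimated_lines(lines_per_entry=1, **tree)
    f7 = estimated_lines(lines_per_entry=2, **tree)
    print(f"{name}: format 6 ~{f6}, format 7 ~{f7}, extra {f7 - f6}")
```

Under this model the extra lines always equal the number of files, so the
first tree grows from about 4800 to 5100 lines, while the second goes from
about 20000 to 38000, which is close to 2x.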

I would guess that as the number of revisions increases, both formats
would slow down at approximately the same rate. So if you had 1000
revisions, format 7 would still be 3 seconds slower than format 6,
though both would take 30+ seconds instead of 10s.

I'm curious what changes we could make to speed up weave extraction.
I know Martin is working on an indexed append-only structure, which
might mean you don't have to read all the lines.
I'm wondering about using some sort of binary tree rather than a stack
to keep track of which lines are active in which revision, so that you
could skip large chunks of lines at a time.

But I haven't really sorted out in my head what that might look like.
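
For reference, the stack-based scan I have in mind looks roughly like the
sketch below. This is a simplified model, not the actual bzrlib code: the
in-memory layout (tuples for control markers, plain strings for text lines)
and the function name extract are assumptions for illustration, and it
assumes the set of included versions already contains all ancestors.

```python
def extract(weave, included):
    """Return the text lines of `weave` that are active for `included`.

    `weave` is a list whose items are either text lines (str) or control
    tuples: ("{", v) / ("}", v) open and close a block inserted by version
    v, and ("[", v) / ("]", v) open and close a region deleted by version v.
    `included` must be closed under ancestry.
    """
    insert_stack = []      # versions whose insertion blocks we are inside
    open_deletes = set()   # versions whose deletion regions we are inside
    result = []
    for item in weave:
        if isinstance(item, tuple):
            op, version = item
            if op == "{":
                insert_stack.append(version)
            elif op == "}":
                insert_stack.pop()
            elif op == "[":
                open_deletes.add(version)
            elif op == "]":
                open_deletes.discard(version)
            else:
                raise ValueError(f"unknown control marker {op!r}")
        elif (insert_stack and insert_stack[-1] in included
              and not (open_deletes & included)):
            # Inserted by an included version, deleted by none of them.
            result.append(item)
    return result


# Version 0 adds a, b, c; version 1 deletes b and inserts x in its place.
weave = [("{", 0), "a\n", ("[", 1), "b\n", ("]", 1),
         ("{", 1), "x\n", ("}", 1), "c\n", ("}", 0)]
print(extract(weave, {0}))      # lines of version 0
print(extract(weave, {0, 1}))   # lines of version 1
```

The point is that every item in the weave is visited no matter which
version is requested, so the cost grows with the total number of lines
rather than with the size of the text being extracted.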

Thanks for your performance testing. Is there somewhere that you could
put the inventory.weave(.gz) file? I would like to test python's gzip
against native gzip on a large, real-life file.
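
The comparison I am thinking of would be something along these lines. It is
a minimal sketch: the helper names are mine, it assumes a "gzip" binary is
on the PATH, and it only measures wall-clock time for a full decompress
into memory.

```python
import gzip
import subprocess
import time


def read_with_python_gzip(path):
    """Decompress the whole file with python's gzip module."""
    with gzip.open(path, "rb") as f:
        return f.read()


def read_with_native_gzip(path):
    """Decompress the whole file by piping it through `gzip -dc`."""
    done = subprocess.run(["gzip", "-dc", path], check=True,
                          stdout=subprocess.PIPE)
    return done.stdout


def best_time(func, path, repeat=3):
    """Return the fastest of `repeat` runs of func(path), in seconds."""
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        func(path)
        best = min(best, time.perf_counter() - start)
    return best
```

Running best_time() with each reader on the same inventory.weave.gz, after
one warm-up read so the file is in the disk cache, should show how much of
the difference comes from the python module itself.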

John
=:->

> 
> cheers
> 

