Compressing weaved revisions?

James Blackwell jblack at merconline.com
Sat Oct 8 14:38:38 BST 2005


On Thu, Oct 06, 2005 at 12:26:11PM -0500, John A Meinel wrote:
> James Blackwell wrote:
> >On Fri, Sep 30, 2005 at 10:04:41AM -0400, Aaron Bentley wrote:
> >
> >>For trees with, e.g. 500 revisions, the revision storage may actually be
> >>larger than tree storage.
> >
> >
> >I converted the Bazaar-NG tree to newformat. I got a ratio of about 7:1,
> >which is much better than the old 29:1, but not as good as Mercurial's
> >2.5:1.
> >
> 
> Are you using --apparent, and are you considering just revision-store or 
> all of .bzr?

Not at that time, I wasn't. Rob mentioned it to me this morning, and with
--apparent I ended up with 5.1:1.

jblack at pluto:~/nf$ du -s --apparent .
10735   .
jblack at pluto:~/nf$ du -s --apparent .bzr
8644    .bzr


10735 / (10735 - 8644) = 5.13


I'm not sure why we end up with different apparent numbers (the
non-apparent ones could easily differ because of block size).
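
For comparing numbers like these: plain du counts allocated blocks, while
du --apparent-size (which --apparent abbreviates) sums file lengths, so a
store made of many small files looks much larger without it. Here is a
rough Python 3 sketch of both measurements plus the ratio above. It
assumes st_blocks is in 512-byte units (Linux and most Unix systems),
skips directory entries and hard-link deduplication (so it won't match du
exactly), and the helper names are made up for illustration, not part of
bzrlib:

```python
import os


def tree_sizes(root):
    """Return (apparent_bytes, allocated_bytes) for the files under root.

    apparent_bytes sums st_size, roughly what `du --apparent-size` counts;
    allocated_bytes sums st_blocks * 512, roughly what plain `du` counts.
    Assumes st_blocks is in 512-byte units. Directory entries are not
    counted, symlinks are not followed, and hard-linked files are counted
    once per path (du counts each inode once), so totals differ slightly
    from du.
    """
    apparent = 0
    allocated = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            apparent += st.st_size
            allocated += st.st_blocks * 512
    return apparent, allocated


def storage_ratio(total, metadata):
    """Size of the whole checkout divided by the working tree alone.

    The working tree is taken to be everything except the metadata
    directory (.bzr here), i.e. total - metadata.
    """
    working = total - metadata
    if working <= 0:
        raise ValueError("metadata must be smaller than the total")
    return total / working
```

storage_ratio(10735, 8644) reproduces the 5.13 figure above; calling
tree_sizes() on the checkout and on its .bzr directory gives the inputs
for the apparent and allocated ratios side by side.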

The Mercurial numbers were non-apparent, btw. I think they have fewer files
in their revision stores, so their ratio may actually be _higher_.


> So .bzr vs working is 5:1 (3.2:1 apparent)
> .bzr/everything-else vs .bzr/revision-store is 1:1 (7:1 apparent)
> 
> With my revstore2sql plugin, if I trim out all the inventory stuff, I 
> can get a revisions.sqlite down to 561K; it gets compression mostly by 
> not duplicating the revision_id everywhere (switching to just a number).
> It loses some by having indexes, but in theory that would make access 
> very fast.
> 
> I don't know if we would want to use an sqlite store for everything, 
> since there isn't a way to remotely access it, or download only part of 
> it. But it is small, and you could download it locally and then upload it.
> 
> If I pack all of the inventory into the sqlite db, then the size goes up 
> to 9.4M, but 8.1M of that is because I don't do any delta compression of 
> inventories (so each revision has a complete list of inventory entries). 
> (Though this is better than the old 24M version.)
> But, I can get any inventory out of it in about 0.05s, whereas on this 
> machine branch.get_inventory() takes 1.2s. (That is with bzrlib already 
> loaded)
> 
> John
> =:->
> 
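
The saving John describes from not repeating revision_id everywhere is
essentially string interning. A minimal sketch with Python's standard
sqlite3 module, assuming a made-up schema (the table and column names and
helper functions are hypothetical, not revstore2sql's actual layout): each
revision id string is stored once, and the other tables refer to it by
integer. The UNIQUE constraint is also where some index overhead comes
from, since SQLite backs it with an index.

```python
import sqlite3


def open_store(path=":memory:"):
    """Create a tiny revision store where revision ids are interned.

    Hypothetical schema: each revision_id string lives once in
    `revision_ids`; `revisions` and `parents` refer to it by integer key.
    """
    conn = sqlite3.connect(path)
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS revision_ids (
            id INTEGER PRIMARY KEY,
            revision_id TEXT UNIQUE NOT NULL
        );
        CREATE TABLE IF NOT EXISTS revisions (
            rev INTEGER PRIMARY KEY REFERENCES revision_ids(id),
            committer TEXT,
            message TEXT
        );
        CREATE TABLE IF NOT EXISTS parents (
            rev INTEGER REFERENCES revision_ids(id),
            parent INTEGER REFERENCES revision_ids(id),
            idx INTEGER,
            PRIMARY KEY (rev, idx)
        );
        """
    )
    return conn


def intern_id(conn, revision_id):
    """Return the integer key for revision_id, inserting it if new."""
    conn.execute(
        "INSERT OR IGNORE INTO revision_ids (revision_id) VALUES (?)",
        (revision_id,),
    )
    row = conn.execute(
        "SELECT id FROM revision_ids WHERE revision_id = ?", (revision_id,)
    ).fetchone()
    return row[0]


def add_revision(conn, revision_id, committer, message, parent_ids):
    """Store one revision; parents are recorded as integer keys, in order."""
    rev = intern_id(conn, revision_id)
    conn.execute(
        "INSERT INTO revisions (rev, committer, message) VALUES (?, ?, ?)",
        (rev, committer, message),
    )
    for idx, parent_id in enumerate(parent_ids):
        conn.execute(
            "INSERT INTO parents (rev, parent, idx) VALUES (?, ?, ?)",
            (rev, intern_id(conn, parent_id), idx),
        )
    conn.commit()
    return rev


def get_parents(conn, revision_id):
    """Map the integer keys back to revision_id strings, in parent order."""
    rows = conn.execute(
        """
        SELECT p.revision_id FROM parents
        JOIN revision_ids AS r ON r.id = parents.rev
        JOIN revision_ids AS p ON p.id = parents.parent
        WHERE r.revision_id = ?
        ORDER BY parents.idx
        """,
        (revision_id,),
    ).fetchall()
    return [row[0] for row in rows]
```

A long revision id then costs its full length once, plus a small integer
each time another row refers to it.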






More information about the bazaar mailing list