B+Tree indices: ongoing progress
Robert Collins
robertc at robertcollins.net
Wed Jul 2 04:19:54 BST 2008
On Tue, 2008-07-01 at 21:59 -0500, John Arbash Meinel wrote:
>
> So... If you trust that we will always get at least 2:1 compression
> (it
> seems reasonable given the amount of entropy in our revision_ids),
> then
> I can cut the time down quite a bit, and retain *most* of the saving.
>
> (9b) gives 4,612,096 versus 6,172,672 without repacking, is 1:1.34,
> instead of the maximum 1:1.4.
>
> Note that this is also one of those places where you would probably
> like
> "bzr pack" to set max_repacks = 100, and have it go to town.
>
> Interestingly, if I set compression_level = 9 I get exactly the same
> compression.
re safety:
As long as in the event of an overrun it *at worst* asserts during
writing, I'm _ok_. To be happy I'd like it to always work no matter what
(short of 4K long values :P). I expect that (bzr-git revids will be very
high entropy and thus unlikely to get 2:1 compression.
re: compression level - thats because of the window size - we're dealing
in a very small stream anyway. We can drop the compression but we can't
raise it using zlib.
re: performance - I found while tuning commit to be hg-speed that we
really need to eliminate fat all the way through the process. At the end
of that process I was seeking 0.1 second wins throughout the code base.
(On a 20K file tree - or 1/4 the size of this data set. So any expansion
here is something we will need to counteract to prevent commit()
becoming slower).
And some good news:
Baseline overhead: Read 92994 keys in 7.075
GraphIndex: Wrote 10147567 bytes in 10.949
GraphIndex: iter_all_entries in 9.671
GraphIndex: iter_random_one in 11.915
GraphIndex: get_components_positions(2000) in 20.535
GraphIndex: search_from_revid in 6.516
GraphIndex: miss torture in 343.138
GraphIndex: -------Done---------
Baseline overhead: Read 92994 keys in 7.072
BTreeIndex: Wrote 4104336 bytes in 20.299
BTreeIndex: iter_all_entries in 1.850
BTreeIndex: iter_random_one in 115.365
BTreeIndex: get_components_positions(2000) in 33.332
BTreeIndex: search_from_revid in 4.581
BTreeIndex: miss torture in 41.803
BTreeIndex: -------Done---------
Note the search_from_revid and miss torture results.
-Rob
--
GPG key available at: <http://www.robertcollins.net/keys.txt>.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: indexbench.py
Type: text/x-python
Size: 5802 bytes
Desc: not available
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20080702/4d66948a/attachment-0001.py
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: This is a digitally signed message part
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20080702/4d66948a/attachment-0001.pgp
More information about the bazaar
mailing list