b+tree index update

Robert Collins robertc at robertcollins.net
Sun Jul 13 18:45:20 BST 2008


I've merged John's index tweaks; this brings in a dependency on a newer
pybloom from http://bazaar.launchpad.net/~jameinel/+junk/bzr-pybloom.
I've also implemented disk-spilling during writing. Basically, it writes
out to disk every 100000 nodes, and does a merge sort with exponential
backoff between the in-memory nodes and the previously written disk nodes.
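
A minimal sketch of the spill-and-merge idea, in Python since that is
what bzr is written in. Everything here is hypothetical: the class name
SpillingIndexBuilder, the tab-separated run files, and the reading of
"exponential backoff" as binary-counter merging (level k holds one run
of about spill_at * 2**k nodes) are assumptions, not the actual b+tree
code. It also assumes keys are unique and contain no tabs or newlines.

```python
import heapq
import os
import tempfile


class SpillingIndexBuilder:
    """Hypothetical sketch: buffer nodes in memory, spill sorted runs to disk.

    Runs are kept in levels like a binary counter: level k holds at most one
    run of roughly spill_at * 2**k nodes. A new run is merged with any run
    already sitting at the same level and carried upwards, so each node is
    rewritten only about log2(total / spill_at) times.
    """

    def __init__(self, spill_at=100000):
        self.spill_at = spill_at
        self._nodes = {}    # in-memory key -> value, never more than spill_at
        self._levels = []   # _levels[k] is a sorted run file path, or None

    def add_node(self, key, value):
        # Assumption: keys/values are strings without tabs or newlines.
        self._nodes[key] = value
        if len(self._nodes) >= self.spill_at:
            self._spill()

    def _write_run(self, items):
        # Write already-sorted (key, value) pairs to a new temporary file.
        fd, path = tempfile.mkstemp(suffix=".run")
        with os.fdopen(fd, "w") as f:
            for key, value in items:
                f.write(f"{key}\t{value}\n")
        return path

    @staticmethod
    def _read_run(path):
        # Stream a run back in key order without loading it all into memory.
        with open(path) as f:
            for line in f:
                key, value = line.rstrip("\n").split("\t", 1)
                yield key, value

    def _spill(self):
        path = self._write_run(sorted(self._nodes.items()))
        self._nodes = {}
        level = 0
        # Carry upwards: while this level is occupied, merge and move up.
        while level < len(self._levels) and self._levels[level] is not None:
            older = self._levels[level]
            self._levels[level] = None
            merged = self._write_run(
                heapq.merge(self._read_run(older), self._read_run(path)))
            os.remove(older)
            os.remove(path)
            path = merged
            level += 1
        if level == len(self._levels):
            self._levels.append(path)
        else:
            self._levels[level] = path

    def iter_all_sorted(self):
        """Merge the in-memory nodes with every spilled run, in key order."""
        runs = [self._read_run(p) for p in self._levels if p is not None]
        runs.append(iter(sorted(self._nodes.items())))
        return heapq.merge(*runs)
```

With spill_at=3 and 10 nodes added, the builder spills three times,
ends with one 6-node run, one 3-node run and a single node still in
memory, and iter_all_sorted() yields all 10 nodes in key order. Memory
stays bounded by spill_at regardless of how many nodes are added.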

Currently it builds bloom filters for this spilled-to-disk data, but I
think this is counterproductive; they should be turned off instead. That
will come later.

The key message, though, is that no matter how big your tree is, the
index layer's memory use will cap at whatever is needed to hold 100000
node-values (about 250MB, I think). We could push this limit up or down
once we have some experience with it. The most important thing about
this is scaling, of course.

I've also implemented (crudely) the missing iter_entries_prefix, so it
is now ready to be a full drop-in replacement for GraphIndex.
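
A minimal sketch of prefix lookup over a sorted key list, assuming
GraphIndex-style semantics: keys are tuples, trailing None elements in
a search prefix act as wildcards, and matching is per tuple element
rather than per substring. The function below is hypothetical and works
on an in-memory list; it is not the b+tree plugin's implementation.

```python
import bisect


def iter_entries_prefix(sorted_entries, prefixes):
    """Hypothetical sketch of tuple-prefix matching.

    sorted_entries: list of (key_tuple, value), sorted by key_tuple.
    prefixes: key tuples whose trailing elements may be None ("any value").
    Assumption: None only appears as trailing elements of a prefix.
    Matching is per element, so ('foo', None) matches ('foo', 'bar')
    but not ('foobar', 'gam').
    """
    keys = [key for key, _ in sorted_entries]
    seen = set()
    for prefix in prefixes:
        # Collect the fixed leading elements, up to the first None.
        fixed = []
        for element in prefix:
            if element is None:
                break
            fixed.append(element)
        fixed = tuple(fixed)
        # Keys starting with `fixed` form one contiguous range in sorted order.
        start = bisect.bisect_left(keys, fixed)
        for i in range(start, len(keys)):
            key = keys[i]
            if key[:len(fixed)] != fixed:
                break
            # Only keys of the same width as the prefix can match it.
            if len(key) == len(prefix) and key not in seen:
                seen.add(key)
                yield sorted_entries[i]
```

Because tuples sort element by element, every key sharing the fixed
leading elements sits in one contiguous block, so a single bisect plus
a short forward scan is enough per prefix.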

-Rob


-- 
GPG key available at: <http://www.robertcollins.net/keys.txt>.