New toy for bzr developers: B+Tree index sketch

Tue Jul 1 14:57:51 BST 2008

On Tue, 2008-07-01 at 13:17 +1000, Robert Collins wrote:
> I have been tinkering with a B+Tree approach to the disk indices. I
> prefer this over a hash based index because of the ability to do
> in-order traversal when desired.
> 
> In short, I think we can save substantial size with the currently
> developed B+Tree and it allows addressing some of the memory problems
> Andrew has spent some time debugging.

I have turned this into a repository implementation - currently only
handling plain repositories - but its easy enough to extend to rich
roots and subtrees if there is interest in looking at this.

lp:~lifeless/+junk/bzr-index2

Known defects:
 - iter_entries_prefix is not yet implemented (but its not needed in
bzr.dev either)
 - multi-key searches do not use the index as well as it can be to
reduce IO. In particular we just bisect each time.

I'd love to see Ian's usertest run against --btree-plain, and any other
low-human-time to perform benchmarks. For instance, bzr's repository
size has been compared against gits, seeing what this does to help with
such comparisons might be nice, on a bzr-svn converted repository
(though this needs a rich-root format - see above). I'll toss one of
those in tomorrow probably.

As a reminder:
http://bazaar-vcs.org/RobertCollins#head-b2dc9ae46f1860c46f79682db455c3512152d148
has my notes on improving the index layer. This btree index -should-
address:
 size (roughly half as big as our current indices)
 performance (theoretically much cheaper find-key logic)
 memory utilisation (fixed size lru cache)

To compare two indices - (using launchpad as a handy, sizable code base):
 - btree : 7.8M
 - graphindex : 22M

A linear scan across all the keys in the indices:
 - btree : 3.554 seconds
 - graphindex : 13.013 seconds

Random access - grab all the keys, permute, then access each key individually once:
 - btree : 297 seconds (100 page cache)
 - btree : 95 seconds (1000 page cache)
 - graphindex : 25 seconds

Not tested yet:
 - high latency (NFS or Remote or http) links

I think the btree random access is so much slower because it is totally
unoptimised. One thing is that is always bisecting to find the page of a
key. If we had a plain dict of keys to probe in first, we could avoid
that extra work. We could further eliminate the leaf nodes from the page
cache (so getting more hits on locator nodes), but that would then
require an IO on every key miss; so I'd prefer to not do that. Another
possibility would be to put the keys from a leaf node into a plain dict,
and have a callback on leaf node removal from the _node_cache to remove
those keys from the dict.

Clearly the raw access to all keys is cheaper, so it should be possible
to have random access be roughly the same (modulo cache thrashing
situations - and we should try to avoid those from higher up in the
code).

-Rob

-- 
GPG key available at: <http://www.robertcollins.net/keys.txt>.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: This is a digitally signed message part
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20080701/6abb45ff/attachment.pgp