B+Tree indices: ongoing progress
John Arbash Meinel
john at arbash-meinel.com
Wed Jul 2 04:46:48 BST 2008
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Robert Collins wrote:
| On Wed, 2008-07-02 at 13:19 +1000, Robert Collins wrote:
|
|
|> And some good news:
|> Baseline overhead: Read 92994 keys in 7.075
|> GraphIndex: Wrote 10147567 bytes in 10.949
|> GraphIndex: iter_all_entries in 9.671
|> GraphIndex: iter_random_one in 11.915
|> GraphIndex: get_components_positions(2000) in 20.535
|> GraphIndex: search_from_revid in 6.516
|> GraphIndex: miss torture in 343.138
|> GraphIndex: -------Done---------
|> Baseline overhead: Read 92994 keys in 7.072
|> BTreeIndex: Wrote 4104336 bytes in 20.299
|> BTreeIndex: iter_all_entries in 1.850
|> BTreeIndex: iter_random_one in 115.365
|> BTreeIndex: get_components_positions(2000) in 33.332
|> BTreeIndex: search_from_revid in 4.581
|> BTreeIndex: miss torture in 41.803
|> BTreeIndex: -------Done---------
|>
|> Note the search_from_revid and miss torture results.
|
| Some next steps:
| - I plan to integrate bloom filters into internal nodes. I'll do this
| by reserving some fixed byte count (say 200 bytes) and assuming the
| filter won't compress at all. I'll also propogate every key seen all the
| way up the in-progress tree. This will also need a 'complete' filter
| which will accumulate every key so that when we do add a new row we
| don't have to pass over seen keys again.
| - iter_random_one concerns me, but I see John is hacking on the lookup
| performance, which could help that. A large cache size or even dynamic
| cache may help, but I'd like a bloom in place before that to avoid
| misses causing cache exhaustion and skewing results.
| - I'll add a rich root/subtree formats shortly for more real world
| testing.
|
| -Rob
|
bloom filters generally are 8 bits per number of nodes to get a decent
false positive rate. Specifically, 9.6 bits gives 1%, 4.8 bits gives
10%. I would consider not having a bloom filter if it would have more
referred keys than 4 * num bytes (fpr > 50%).
So if you are reserving 200 bytes, then any time you have more than 800
keys underneath that node, the bloom filter is pretty worthless.
John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iEYEARECAAYFAkhq+igACgkQJdeBCYSNAAN0MwCeK0SnE1Kr6r53dyU+e0d7gW1D
0kkAnjh8csSGNqsvfMH2S0td2hBna8K0
=IJ+9
-----END PGP SIGNATURE-----
More information about the bazaar
mailing list