b+tree + bloom status
John Arbash Meinel
john at arbash-meinel.com
Wed Jul 16 17:42:12 BST 2008
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
So I did a bit more work on bloom filters, and finally have some results
showing that they help. :)
Specifically, I switched the hash function, so rather than computing a
sha1 and extracting 5 offsets from that, I use the Murmur hash
(http://murmurhash.googlepages.com/) in its 'slow but safe' form (which
could easily be tweaked), and I implement some of the bit checking in a
pyrex extension.
(One nice thing about Murmur is that it is simple enough to be
implemented in Python, even though it is slow enough I would recommend
disabling blooms if the compiled extension is not available.)
I ran the 'indexbench' suite against both bzr.dev[1] and an
'unpacked'[2] mysql[3] repository.
The performance improvement from using blooms basically depends on how
much of the file you are going to read, and how many requested keys are
going to miss. Some examples
1) Overhead of creating the global bloom is approx 15-25% (10s => 12s,
and 44s => 51s). Insertion is still being done in python, though, so
there is probably some room to help here.
2) iter_all_entries is uneffected, iter_random_one and
iter_random_reload are both slower. This is because they only iterate
entries that exist. Approx 15% slower (6.7s=>8.0s, 29s=>33s)
3) get_components_positions() depends a lot on how many nodes you are
actually extracting. For bzr.dev, even with a great filter, we still
read 576/577 (filtered/unfiltered) nodes. So for bzr.dev and 2000 texts,
we see a slight slowdown (1.75 => 1.79), but for 10 texts we see a large
speed up (0.063 => 0.046) and for mysql we see bigger speedups (10.4 =>
9.4 for 2000 texts, 0.374 => 0.046 for 10 texts).
So best case is about 8x faster.
4) search_from_revid is ~ equivalent. We probably save a bit on lookups,
but as it is doing a whole ancestry graph search, we lose again by
reading all the leaf nodes. (0.967/0.998 and 3.291/3.120)
5) miss torture is also approx the same (9.6/9.0, 40.5/41.9). Part of
the issue here is that we have quite a few nodes with no blooms, because
everything fits on a single page. Looking at the details, blooms are
filtering about 2/3rds of the revision requests and dropping the number
of leaves processed from 3.3k => 2.6k. I would have to look a bit closer
to see if our FPR isn't what I expect it to be.
6) These are not yet bandwidth minimal. I have in mind to update the
bloom class to work in 'pages' internally. Then it should be able to
read only a portion of the total bloom, and still do its job. (Of
course, the more keys you filter, the higher the chance you have to read
the whole thing.)
Actually, for the 1-key case, we might even consider doing the
completely minimal request of reading just the 5 bytes necessary to do
the job.
We certainly can do a lot of tuning for how much to read when, versus
the expectation of needing the data later and extra round trips to get them.
(For example, if you were searching for 20,000 keys, you could read just
the pages with the highest hit rate, try to filter them, then go on to
read the next pages, etc. By filtering before you read all the pages,
you have a good chance of not even needing the other pages. At the
expense of doing more CPU and another round trip.)
7) I really want to do a bit more real-world testing in a network
latency case.
John
=:->
[1] bzr.dev repo has .rix of:
20070, 1001, 999, 930, 10, 10, 10, 10, 10, 10, 10, 1, 1, 1, 1, 1, 1, 1
and .tix of:
44087, 2187, 1932, 1403, 36, 29, 29, 29, 29, 29, 29, 18, 17, 17, 16, 15,
11, 1
[2] lp:~jameinel/+junk/bzr-unpack-repo will do multiple fetches
approximating the 'optimal' unpacked ratio. It ends up skipping some
revisions, so it isn't perfect, but it is a good starting point.
[3] mysql repo has .rix of:
13181, 10637, 10162, 9033, 8799, 8614, 676, 8, 1, 1, 1, 1
and .tix of:
47152, 46012, 42222, 38497, 35579, 31946, 4132, 9, 8, 1, 0, 0
[4] Raw results:
$ bzr indexbench repo_snapshot/bzr.dev/
Baseline overhead: Read 105645 keys in 4.040
BTreeIndex: Wrote 4694016 bytes in 10.297
BTreeIndex: iter_all_entries in 1.029
BTreeIndex: iter_random_one in 6.693,
~ bloom(0,0) leaf_value(0,0) miss(0), internal(64150,7),
~ leaf_node(104049,1067) bisect(169273,0) dupes(0)
BTreeIndex: iter_random_reload in 1.185,
~ bloom(0,0) leaf_value(0,0) miss(0), internal(0,100),
~ leaf_node(0,800) bisect(900,0) dupes(0)
BTreeIndex: get_components_positions(2000) in 1.748,
~ bloom(0,0) leaf_value(0,0) miss(124067), internal(338,5),
~ leaf_node(11166,564) bisect(28,69330) dupes(0)
BTreeIndex: get_components_positions(10) in 0.063,
~ bloom(0,0) leaf_value(0,0) miss(659), internal(99,5),
~ leaf_node(215,37) bisect(126,246) dupes(0)
BTreeIndex: search_from_revid in 0.967 found 18135,
~ bloom(0,0) leaf_value(0,0) miss(103105), internal(0,0),
~ leaf_node(11121,206) bisect(1562,36577) dupes(0)
BTreeIndex: miss torture in 9.579,
~ bloom(0,0) leaf_value(0,0) miss(1795965), internal(0,7),
~ leaf_node(0,621) bisect(0,326298) dupes(0)
BTreeIndex: -------Done---------
Baseline overhead: Read 105645 keys in 4.118
BTreeIndex_bloom: Wrote 4890624 bytes in 12.730
BTreeIndex_bloom: iter_all_entries in 1.045
BTreeIndex_bloom: iter_random_one in 8.065,
~ bloom(0,0) leaf_value(0,0) miss(0), internal(64150,7),
~ leaf_node(104049,1067) bisect(169273,0) dupes(0)
BTreeIndex_bloom: iter_random_reload in 1.357,
~ bloom(0,0) leaf_value(0,0) miss(0), internal(0,100),
~ leaf_node(0,800) bisect(900,0) dupes(0)
BTreeIndex_bloom: get_components_positions(2000) in 1.794,
~ bloom(24277,0) leaf_value(0,0) miss(99790), internal(338,5),
~ leaf_node(9892,563) bisect(37,43836) dupes(0)
BTreeIndex_bloom: get_components_positions(10) in 0.046,
~ bloom(128,0) leaf_value(0,0) miss(531), internal(99,5),
~ leaf_node(98,26) bisect(122,115) dupes(0)
BTreeIndex_bloom: search_from_revid in 0.998 found 18135,
~ bloom(20005,0) leaf_value(0,0) miss(83100), internal(0,0),
~ leaf_node(8622,203) bisect(824,17310) dupes(0)
BTreeIndex_bloom: miss torture in 9.091,
~ bloom(317245,0) leaf_value(0,0) miss(1478720), internal(0,6),
~ leaf_node(0,115) bisect(4,270) dupes(0)
BTreeIndex_bloom: -------Done---------
Baseline overhead: Read 105645 keys in 4.056
GraphIndex: Wrote 11242000 bytes in 5.398
GraphIndex: iter_all_entries in 4.103
GraphIndex: iter_random_one in 6.162,
~ bloom(0,0) leaf_value(0,0) miss(0), internal(0,0),
~ leaf_node(0,0) bisect(0,0) dupes(0)
GraphIndex: iter_random_reload in 3.759,
~ bloom(0,0) leaf_value(0,0) miss(0), internal(0,0),
~ leaf_node(0,0) bisect(0,0) dupes(0)
GraphIndex: get_components_positions(2000) in 3.791,
~ bloom(0,0) leaf_value(0,0) miss(0), internal(0,0),
~ leaf_node(0,0) bisect(0,0) dupes(0)
GraphIndex: get_components_positions(10) in 0.094,
~ bloom(0,0) leaf_value(0,0) miss(0), internal(0,0),
~ leaf_node(0,0) bisect(0,0) dupes(0)
GraphIndex: search_from_revid in 1.544 found 18135,
~ bloom(0,0) leaf_value(0,0) miss(0), internal(0,0),
~ leaf_node(0,0) bisect(0,0) dupes(0)
GraphIndex: miss torture in 77.548,
~ bloom(0,0) leaf_value(0,0) miss(0), internal(0,0),
~ leaf_node(0,0) bisect(0,0) dupes(0)
GraphIndex: -------Done---------
$ bzr indexbench mysql-unpacked/mysql-6.0/
Baseline overhead: Read 367786 keys in 25.210
BTreeIndex: Wrote 14804148 bytes in 43.508
BTreeIndex: iter_all_entries in 3.869
BTreeIndex: iter_random_one in 28.938,
~ bloom(0,0) leaf_value(0,0) miss(0), internal(241377,31),
~ leaf_node(364195,3549) bisect(609152,0) dupes(0)
BTreeIndex: iter_random_reload in 1.591,
~ bloom(0,0) leaf_value(0,0) miss(0), internal(0,300),
~ leaf_node(0,1050) bisect(1350,0) dupes(0)
BTreeIndex: get_components_positions(2000) in 10.421,
~ bloom(0,0) leaf_value(0,0) miss(112868), internal(2152,31),
~ leaf_node(29876,2259) bisect(369,156267) dupes(0)
BTreeIndex: get_components_positions(10) in 0.374,
~ bloom(0,0) leaf_value(0,0) miss(355), internal(106,19),
~ leaf_node(214,74) bisect(93,419) dupes(0)
BTreeIndex: search_from_revid in 3.291 found 59351,
~ bloom(0,0) leaf_value(0,0) miss(359946), internal(0,0),
~ leaf_node(51096,482) bisect(139,255471) dupes(0)
BTreeIndex: miss torture in 40.513,
~ bloom(0,0) leaf_value(0,0) miss(3554530), internal(0,31),
~ leaf_node(0,3354) bisect(0,3438698) dupes(0)
BTreeIndex: -------Done---------
Baseline overhead: Read 367786 keys in 25.303
BTreeIndex_bloom: Wrote 15410356 bytes in 51.074
BTreeIndex_bloom: iter_all_entries in 3.854
BTreeIndex_bloom: iter_random_one in 33.508,
~ bloom(0,0) leaf_value(0,0) miss(0), internal(241377,31),
~ leaf_node(364195,3549) bisect(609152,0) dupes(0)
BTreeIndex_bloom: iter_random_reload in 1.888,
~ bloom(0,0) leaf_value(0,0) miss(0), internal(0,300),
~ leaf_node(0,1050) bisect(1350,0) dupes(0)
BTreeIndex_bloom: get_components_positions(2000) in 9.376,
~ bloom(66326,0) leaf_value(0,0) miss(46542), internal(1620,31),
~ leaf_node(15339,2005) bisect(350,46662) dupes(0)
BTreeIndex_bloom: get_components_positions(10) in 0.046,
~ bloom(221,0) leaf_value(0,0) miss(134), internal(49,13),
~ leaf_node(46,22) bisect(92,43) dupes(0)
BTreeIndex_bloom: search_from_revid in 3.120 found 59351,
~ bloom(195580,0) leaf_value(0,0) miss(164366), internal(0,0),
~ leaf_node(28795,479) bisect(210,59820) dupes(0)
BTreeIndex_bloom: miss torture in 41.902,
~ bloom(2195302,0) leaf_value(0,0) miss(1359228), internal(0,31),
~ leaf_node(0,2667) bisect(0,20308) dupes(0)
BTreeIndex_bloom: -------Done---------
Baseline overhead: Read 367786 keys in 24.976
GraphIndex: Wrote 50985363 bytes in 25.303
GraphIndex: iter_all_entries in 25.100
GraphIndex: iter_random_one in 27.612,
~ bloom(0,0) leaf_value(0,0) miss(0), internal(0,0),
~ leaf_node(0,0) bisect(0,0) dupes(0)
GraphIndex: iter_random_reload in 6.630,
~ bloom(0,0) leaf_value(0,0) miss(0), internal(0,0),
~ leaf_node(0,0) bisect(0,0) dupes(0)
GraphIndex: get_components_positions(2000) in 12.558,
~ bloom(0,0) leaf_value(0,0) miss(0), internal(0,0),
~ leaf_node(0,0) bisect(0,0) dupes(0)
GraphIndex: get_components_positions(10) in 0.453,
~ bloom(0,0) leaf_value(0,0) miss(0), internal(0,0),
~ leaf_node(0,0) bisect(0,0) dupes(0)
GraphIndex: search_from_revid in 6.209 found 59351,
~ bloom(0,0) leaf_value(0,0) miss(0), internal(0,0),
~ leaf_node(0,0) bisect(0,0) dupes(0)
GraphIndex: miss torture in 1035.225,
~ bloom(0,0) leaf_value(0,0) miss(0), internal(0,0),
~ leaf_node(0,0) bisect(0,0) dupes(0)
GraphIndex: -------Done---------
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iEYEARECAAYFAkh+JOQACgkQJdeBCYSNAAOCeACdHMmGT1+ynyJl8xeGgtl1KTkZ
wdwAnjNjYJm5aUUv7fSbu6592H+Mqqcz
=71g/
-----END PGP SIGNATURE-----
More information about the bazaar
mailing list