Rev 18: Merge Johns improvements. in http://people.ubuntu.com/~robertc/baz2.0/plugins/index2/trunk
Robert Collins
robertc at robertcollins.net
Wed Jul 2 23:18:45 BST 2008
At http://people.ubuntu.com/~robertc/baz2.0/plugins/index2/trunk
------------------------------------------------------------
revno: 18
revision-id: robertc at robertcollins.net-20080702221844-aw67l2n2hdt2hvw0
parent: robertc at robertcollins.net-20080702220240-73x9ykl98pqpg8by
parent: john at arbash-meinel.com-20080702210419-4xfq1jb9k4cuksk1
committer: Robert Collins <robertc at robertcollins.net>
branch nick: trunk
timestamp: Thu 2008-07-03 08:18:44 +1000
message:
Merge Johns improvements.
modified:
btree_index.py index.py-20080624222253-p0x5f92uyh5hw734-7
chunk_writer.py chunk_writer.py-20080630234519-6ggn4id17nipovny-1
indexbench.py indexbench.py-20080702083855-5tju02y79rw7kkzh-1
tests/test_btree_index.py test_index.py-20080624222253-p0x5f92uyh5hw734-13
------------------------------------------------------------
revno: 8.1.23
revision-id: john at arbash-meinel.com-20080702210419-4xfq1jb9k4cuksk1
parent: john at arbash-meinel.com-20080702210046-j00t51pj8kod09hz
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: index2
timestamp: Wed 2008-07-02 16:04:19 -0500
message:
Restore the multi-way bisecting
modified:
btree_index.py index.py-20080624222253-p0x5f92uyh5hw734-7
------------------------------------------------------------
revno: 8.1.22
revision-id: john at arbash-meinel.com-20080702210046-j00t51pj8kod09hz
parent: john at arbash-meinel.com-20080702204409-s512bff7tus0cb8n
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: index2
timestamp: Wed 2008-07-02 16:00:46 -0500
message:
temporary work to revert back to key-at-a-time to test performance
modified:
btree_index.py index.py-20080624222253-p0x5f92uyh5hw734-7
tests/test_btree_index.py test_index.py-20080624222253-p0x5f92uyh5hw734-13
------------------------------------------------------------
revno: 8.1.21
revision-id: john at arbash-meinel.com-20080702204409-s512bff7tus0cb8n
parent: john at arbash-meinel.com-20080702202600-3q9fnpmwedog3j5v
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: index2
timestamp: Wed 2008-07-02 15:44:09 -0500
message:
disable the alternative multi-way bisecting, and allow the leaf_value cache to be disabled.
LRUCaches and random access do not play well together.
modified:
btree_index.py index.py-20080624222253-p0x5f92uyh5hw734-7
------------------------------------------------------------
revno: 8.1.20
revision-id: john at arbash-meinel.com-20080702202600-3q9fnpmwedog3j5v
parent: john at arbash-meinel.com-20080702201701-f40mgcxxf229qhei
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: index2
timestamp: Wed 2008-07-02 15:26:00 -0500
message:
Consider bisecting more often when relevant, but it doesn't seem to help a whole lot.
modified:
btree_index.py index.py-20080624222253-p0x5f92uyh5hw734-7
------------------------------------------------------------
revno: 8.1.19
revision-id: john at arbash-meinel.com-20080702201701-f40mgcxxf229qhei
parent: john at arbash-meinel.com-20080702195000-hgd1exet98nfxt7e
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: index2
timestamp: Wed 2008-07-02 15:17:01 -0500
message:
More instrumenting
modified:
btree_index.py index.py-20080624222253-p0x5f92uyh5hw734-7
indexbench.py indexbench.py-20080702083855-5tju02y79rw7kkzh-1
------------------------------------------------------------
revno: 8.1.18
revision-id: john at arbash-meinel.com-20080702195000-hgd1exet98nfxt7e
parent: john at arbash-meinel.com-20080702182656-cudnvqe2kfstxits
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: index2
timestamp: Wed 2008-07-02 14:50:00 -0500
message:
Add an LRU cache for leaf values rather than leaf nodes.
Add a bunch more hit/miss rate metrics.
It turns out that the critical path for iter_ancestry() is the miss rate.
However, when searching, any node that misses in an index is not requested
again from that index. However, it does cause us to load the leaf node
where it *might* have hit. Causing us to repeatedly load the missing nodes.
So we do need to cache leaf nodes, so we can answer 'missing' results quickly.
Blooms don't answer the question fast enough yet, or at least they
don't have a very high hit rate for filtering out new requests.
Out of 58,000 actual misses, blooms only caught 6000.
modified:
btree_index.py index.py-20080624222253-p0x5f92uyh5hw734-7
indexbench.py indexbench.py-20080702083855-5tju02y79rw7kkzh-1
------------------------------------------------------------
revno: 8.1.17
revision-id: john at arbash-meinel.com-20080702182656-cudnvqe2kfstxits
parent: john at arbash-meinel.com-20080702181746-xlca6kf0iusfmxsy
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: index2
timestamp: Wed 2008-07-02 13:26:56 -0500
message:
Make sure we find the right nodes, even if the keys are randomly permutated.
modified:
btree_index.py index.py-20080624222253-p0x5f92uyh5hw734-7
tests/test_btree_index.py test_index.py-20080624222253-p0x5f92uyh5hw734-13
------------------------------------------------------------
revno: 8.1.16
revision-id: john at arbash-meinel.com-20080702181746-xlca6kf0iusfmxsy
parent: john at arbash-meinel.com-20080702174605-j9oaswfan8zvg2k7
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: index2
timestamp: Wed 2008-07-02 13:17:46 -0500
message:
Updates for when bloom filters are active.
Set NodeBloom._num_entries because it is required for Bloom.__repr__
Allow requesting more nodes than can be cached, just mutter that it won't
actually be stored. But since the function returns the objects, they will
be valid for a little while.
When using bloom filters, we need to sort our keys if they weren't already
sorted.
Update the indexbench tests, to use graph.iter_ancestry() and to assert
that the value is actually correct.
modified:
btree_index.py index.py-20080624222253-p0x5f92uyh5hw734-7
indexbench.py indexbench.py-20080702083855-5tju02y79rw7kkzh-1
tests/test_btree_index.py test_index.py-20080624222253-p0x5f92uyh5hw734-13
------------------------------------------------------------
revno: 8.1.15
revision-id: john at arbash-meinel.com-20080702174605-j9oaswfan8zvg2k7
parent: john at arbash-meinel.com-20080702172707-906pywtz35vgv99f
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: index2
timestamp: Wed 2008-07-02 12:46:05 -0500
message:
Update the test suite to take a branch, so we know that we are iterating a reasonable amount of history
modified:
indexbench.py indexbench.py-20080702083855-5tju02y79rw7kkzh-1
------------------------------------------------------------
revno: 8.1.14
revision-id: john at arbash-meinel.com-20080702172707-906pywtz35vgv99f
parent: john at arbash-meinel.com-20080702165051-rt6pl41h8tt0wo3u
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: index2
timestamp: Wed 2008-07-02 12:27:07 -0500
message:
pass the parameter in correctly
modified:
indexbench.py indexbench.py-20080702083855-5tju02y79rw7kkzh-1
------------------------------------------------------------
revno: 8.1.13
revision-id: john at arbash-meinel.com-20080702165051-rt6pl41h8tt0wo3u
parent: john at arbash-meinel.com-20080702162254-tgks103u1pyz2fl6
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: index2
timestamp: Wed 2008-07-02 11:50:51 -0500
message:
Split out the NodeBloom.get_raw_offsets into a static function.
Break the node caching into 2 caches, one for internal nodes, one for leaf nodes.
modified:
btree_index.py index.py-20080624222253-p0x5f92uyh5hw734-7
tests/test_btree_index.py test_index.py-20080624222253-p0x5f92uyh5hw734-13
------------------------------------------------------------
revno: 8.1.12
revision-id: john at arbash-meinel.com-20080702162254-tgks103u1pyz2fl6
parent: john at arbash-meinel.com-20080702161654-vnxiym2kv1ygtxn6
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: index2
timestamp: Wed 2008-07-02 11:22:54 -0500
message:
Handle when the only node is a leaf node
modified:
btree_index.py index.py-20080624222253-p0x5f92uyh5hw734-7
------------------------------------------------------------
revno: 8.1.11
revision-id: john at arbash-meinel.com-20080702161654-vnxiym2kv1ygtxn6
parent: john at arbash-meinel.com-20080702161204-uvj9yxiq464wngc0
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: index2
timestamp: Wed 2008-07-02 11:16:54 -0500
message:
Start using the raw offsets when iterating.
modified:
btree_index.py index.py-20080624222253-p0x5f92uyh5hw734-7
------------------------------------------------------------
revno: 8.1.10
revision-id: john at arbash-meinel.com-20080702161204-uvj9yxiq464wngc0
parent: john at arbash-meinel.com-20080702160822-4q8qdi10120l1aen
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: index2
timestamp: Wed 2008-07-02 11:12:04 -0500
message:
Assert that the raw offsets are independent of the bloom size.
modified:
tests/test_btree_index.py test_index.py-20080624222253-p0x5f92uyh5hw734-13
------------------------------------------------------------
revno: 8.1.9
revision-id: john at arbash-meinel.com-20080702160822-4q8qdi10120l1aen
parent: john at arbash-meinel.com-20080702155655-z77lyi16cvfkpwn6
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: index2
timestamp: Wed 2008-07-02 11:08:22 -0500
message:
Add the ability to deal in 'raw' offsets, which are independent of bloom length
modified:
btree_index.py index.py-20080624222253-p0x5f92uyh5hw734-7
tests/test_btree_index.py test_index.py-20080624222253-p0x5f92uyh5hw734-13
------------------------------------------------------------
revno: 8.1.8
revision-id: john at arbash-meinel.com-20080702155655-z77lyi16cvfkpwn6
parent: john at arbash-meinel.com-20080702145412-3qulu1g2k4cm3r57
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: index2
timestamp: Wed 2008-07-02 10:56:55 -0500
message:
Reset the current bloom, when we add a new internal node.
modified:
btree_index.py index.py-20080624222253-p0x5f92uyh5hw734-7
------------------------------------------------------------
revno: 8.1.7
revision-id: john at arbash-meinel.com-20080702145412-3qulu1g2k4cm3r57
parent: john at arbash-meinel.com-20080702145001-vz3rohjcio2226cb
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: index2
timestamp: Wed 2008-07-02 09:54:12 -0500
message:
Get rid of the left bisect, as we don't use it
modified:
btree_index.py index.py-20080624222253-p0x5f92uyh5hw734-7
tests/test_btree_index.py test_index.py-20080624222253-p0x5f92uyh5hw734-13
------------------------------------------------------------
revno: 8.1.6
revision-id: john at arbash-meinel.com-20080702145001-vz3rohjcio2226cb
parent: john at arbash-meinel.com-20080702144538-m492wdmmu7li4ceq
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: index2
timestamp: Wed 2008-07-02 09:50:01 -0500
message:
move _use_blooms to a member variable to make the tests independent of default settings
modified:
btree_index.py index.py-20080624222253-p0x5f92uyh5hw734-7
tests/test_btree_index.py test_index.py-20080624222253-p0x5f92uyh5hw734-13
------------------------------------------------------------
revno: 8.1.5
revision-id: john at arbash-meinel.com-20080702144538-m492wdmmu7li4ceq
parent: john at arbash-meinel.com-20080702055951-2aadto8k1552yvd5
parent: robertc at robertcollins.net-20080702125159-o72w19dm8g5yrtt0
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: index2
timestamp: Wed 2008-07-02 09:45:38 -0500
message:
Merge in Robert's latest work and get the tests passing again.
Note that the bloom tests require use_blooms = True for them to actually work.
added:
indexbench.py indexbench.py-20080702083855-5tju02y79rw7kkzh-1
modified:
__init__.py __init__.py-20080624222253-p0x5f92uyh5hw734-5
btree_index.py index.py-20080624222253-p0x5f92uyh5hw734-7
chunk_writer.py chunk_writer.py-20080630234519-6ggn4id17nipovny-1
tests/test_btree_index.py test_index.py-20080624222253-p0x5f92uyh5hw734-13
tests/test_chunk_writer.py test_chunk_writer.py-20080630234519-6ggn4id17nipovny-2
------------------------------------------------------------
revno: 8.1.4
revision-id: john at arbash-meinel.com-20080702055951-2aadto8k1552yvd5
parent: john at arbash-meinel.com-20080702054002-tripfvw7uca0jyn2
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: index2
timestamp: Wed 2008-07-02 00:59:51 -0500
message:
Read leaf nodes in one go without caching.
modified:
btree_index.py index.py-20080624222253-p0x5f92uyh5hw734-7
------------------------------------------------------------
revno: 8.1.3
revision-id: john at arbash-meinel.com-20080702054002-tripfvw7uca0jyn2
parent: john at arbash-meinel.com-20080702053739-zxv81xne6i1pl1gx
parent: john at arbash-meinel.com-20080702045350-i5vjt3tqkyvsh3re
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: index2
timestamp: Wed 2008-07-02 00:40:02 -0500
message:
Bring in the iter_entries changes, and clean them up slightly.
modified:
btree_index.py index.py-20080624222253-p0x5f92uyh5hw734-7
tests/test_btree_index.py test_index.py-20080624222253-p0x5f92uyh5hw734-13
------------------------------------------------------------
revno: 7.1.8
revision-id: john at arbash-meinel.com-20080702045350-i5vjt3tqkyvsh3re
parent: john at arbash-meinel.com-20080702032221-9qn5l8n7owpg11u7
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: index2
timestamp: Tue 2008-07-01 23:53:50 -0500
message:
Change the looping structure for iter_entries
modified:
btree_index.py index.py-20080624222253-p0x5f92uyh5hw734-7
------------------------------------------------------------
revno: 7.1.7
revision-id: john at arbash-meinel.com-20080702032221-9qn5l8n7owpg11u7
parent: john at arbash-meinel.com-20080701231851-rvuj7146p23sr25v
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: index2
timestamp: Tue 2008-07-01 22:22:21 -0500
message:
Implement and test left-way bisecting, revert to the bisect_XX builtin for len(in_keys) == 1
modified:
btree_index.py index.py-20080624222253-p0x5f92uyh5hw734-7
tests/test_btree_index.py test_index.py-20080624222253-p0x5f92uyh5hw734-13
------------------------------------------------------------
revno: 7.1.6
revision-id: john at arbash-meinel.com-20080701231851-rvuj7146p23sr25v
parent: john at arbash-meinel.com-20080701231134-73vr4qkr6dogf3h0
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: index2
timestamp: Tue 2008-07-01 18:18:51 -0500
message:
Return the keys themselves along with the node we care about.
modified:
btree_index.py index.py-20080624222253-p0x5f92uyh5hw734-7
tests/test_btree_index.py test_index.py-20080624222253-p0x5f92uyh5hw734-13
------------------------------------------------------------
revno: 7.1.5
revision-id: john at arbash-meinel.com-20080701231134-73vr4qkr6dogf3h0
parent: john at arbash-meinel.com-20080701192734-9mbeqa4zszat3z1f
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: index2
timestamp: Tue 2008-07-01 18:11:34 -0500
message:
Write a helper function that can turn 2 lists of keys into a position list.
modified:
btree_index.py index.py-20080624222253-p0x5f92uyh5hw734-7
tests/test_btree_index.py test_index.py-20080624222253-p0x5f92uyh5hw734-13
------------------------------------------------------------
revno: 8.1.2
revision-id: john at arbash-meinel.com-20080702053739-zxv81xne6i1pl1gx
parent: john at arbash-meinel.com-20080702052941-xfx5k9vhlph5seut
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: index2
timestamp: Wed 2008-07-02 00:37:39 -0500
message:
Use >= to make it clearer what is going on.
modified:
chunk_writer.py chunk_writer.py-20080630234519-6ggn4id17nipovny-1
------------------------------------------------------------
revno: 8.1.1
revision-id: john at arbash-meinel.com-20080702052941-xfx5k9vhlph5seut
parent: robertc at robertcollins.net-20080701224410-2lbqoqc2dc5v3iey
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: index2
timestamp: Wed 2008-07-02 00:29:41 -0500
message:
Bring in some limitations on the repacking, which shows decent performance wins.
modified:
chunk_writer.py chunk_writer.py-20080630234519-6ggn4id17nipovny-1
tests/test_btree_index.py test_index.py-20080624222253-p0x5f92uyh5hw734-13
Diff too large for email (1058 lines, the limit is 1000).
More information about the bazaar-commits
mailing list