Rev 18: Merge Johns improvements. in http://people.ubuntu.com/~robertc/baz2.0/plugins/index2/trunk

Robert Collins robertc at robertcollins.net
Wed Jul 2 23:18:45 BST 2008


At http://people.ubuntu.com/~robertc/baz2.0/plugins/index2/trunk

------------------------------------------------------------
revno: 18
revision-id: robertc at robertcollins.net-20080702221844-aw67l2n2hdt2hvw0
parent: robertc at robertcollins.net-20080702220240-73x9ykl98pqpg8by
parent: john at arbash-meinel.com-20080702210419-4xfq1jb9k4cuksk1
committer: Robert Collins <robertc at robertcollins.net>
branch nick: trunk
timestamp: Thu 2008-07-03 08:18:44 +1000
message:
  Merge Johns improvements.
modified:
  btree_index.py                 index.py-20080624222253-p0x5f92uyh5hw734-7
  chunk_writer.py                chunk_writer.py-20080630234519-6ggn4id17nipovny-1
  indexbench.py                  indexbench.py-20080702083855-5tju02y79rw7kkzh-1
  tests/test_btree_index.py      test_index.py-20080624222253-p0x5f92uyh5hw734-13
    ------------------------------------------------------------
    revno: 8.1.23
    revision-id: john at arbash-meinel.com-20080702210419-4xfq1jb9k4cuksk1
    parent: john at arbash-meinel.com-20080702210046-j00t51pj8kod09hz
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: index2
    timestamp: Wed 2008-07-02 16:04:19 -0500
    message:
      Restore the multi-way bisecting
    modified:
      btree_index.py                 index.py-20080624222253-p0x5f92uyh5hw734-7
    ------------------------------------------------------------
    revno: 8.1.22
    revision-id: john at arbash-meinel.com-20080702210046-j00t51pj8kod09hz
    parent: john at arbash-meinel.com-20080702204409-s512bff7tus0cb8n
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: index2
    timestamp: Wed 2008-07-02 16:00:46 -0500
    message:
      temporary work to revert back to key-at-a-time to test performance
    modified:
      btree_index.py                 index.py-20080624222253-p0x5f92uyh5hw734-7
      tests/test_btree_index.py      test_index.py-20080624222253-p0x5f92uyh5hw734-13
    ------------------------------------------------------------
    revno: 8.1.21
    revision-id: john at arbash-meinel.com-20080702204409-s512bff7tus0cb8n
    parent: john at arbash-meinel.com-20080702202600-3q9fnpmwedog3j5v
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: index2
    timestamp: Wed 2008-07-02 15:44:09 -0500
    message:
      disable the alternative multi-way bisecting, and allow the leaf_value cache to be disabled.
      
      LRUCaches and random access do not play well together.
    modified:
      btree_index.py                 index.py-20080624222253-p0x5f92uyh5hw734-7
    ------------------------------------------------------------
    revno: 8.1.20
    revision-id: john at arbash-meinel.com-20080702202600-3q9fnpmwedog3j5v
    parent: john at arbash-meinel.com-20080702201701-f40mgcxxf229qhei
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: index2
    timestamp: Wed 2008-07-02 15:26:00 -0500
    message:
      Consider bisecting more often when relevant, but it doesn't seem to help a whole lot.
    modified:
      btree_index.py                 index.py-20080624222253-p0x5f92uyh5hw734-7
    ------------------------------------------------------------
    revno: 8.1.19
    revision-id: john at arbash-meinel.com-20080702201701-f40mgcxxf229qhei
    parent: john at arbash-meinel.com-20080702195000-hgd1exet98nfxt7e
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: index2
    timestamp: Wed 2008-07-02 15:17:01 -0500
    message:
      More instrumenting
    modified:
      btree_index.py                 index.py-20080624222253-p0x5f92uyh5hw734-7
      indexbench.py                  indexbench.py-20080702083855-5tju02y79rw7kkzh-1
    ------------------------------------------------------------
    revno: 8.1.18
    revision-id: john at arbash-meinel.com-20080702195000-hgd1exet98nfxt7e
    parent: john at arbash-meinel.com-20080702182656-cudnvqe2kfstxits
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: index2
    timestamp: Wed 2008-07-02 14:50:00 -0500
    message:
      Add an LRU cache for leaf values rather than leaf nodes.
      
      Add a bunch more hit/miss rate metrics.
      It turns out that the critical path for iter_ancestry() is the miss rate.
      However, when searching, any node that misses in an index is not requested
      again from that index. However, it does cause us to load the leaf node
      where it *might* have hit. Causing us to repeatedly load the missing nodes.
      So we do need to cache leaf nodes, so we can answer 'missing' results quickly.
      Blooms don't answer the question fast enough yet, or at least they
      don't have a very high hit rate for filtering out new requests.
      Out of 58,000 actual misses, blooms only caught 6000.
    modified:
      btree_index.py                 index.py-20080624222253-p0x5f92uyh5hw734-7
      indexbench.py                  indexbench.py-20080702083855-5tju02y79rw7kkzh-1
    ------------------------------------------------------------
    revno: 8.1.17
    revision-id: john at arbash-meinel.com-20080702182656-cudnvqe2kfstxits
    parent: john at arbash-meinel.com-20080702181746-xlca6kf0iusfmxsy
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: index2
    timestamp: Wed 2008-07-02 13:26:56 -0500
    message:
      Make sure we find the right nodes, even if the keys are randomly permutated.
    modified:
      btree_index.py                 index.py-20080624222253-p0x5f92uyh5hw734-7
      tests/test_btree_index.py      test_index.py-20080624222253-p0x5f92uyh5hw734-13
    ------------------------------------------------------------
    revno: 8.1.16
    revision-id: john at arbash-meinel.com-20080702181746-xlca6kf0iusfmxsy
    parent: john at arbash-meinel.com-20080702174605-j9oaswfan8zvg2k7
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: index2
    timestamp: Wed 2008-07-02 13:17:46 -0500
    message:
      Updates for when bloom filters are active.
      Set NodeBloom._num_entries because it is required for Bloom.__repr__
      Allow requesting more nodes than can be cached, just mutter that it won't
      actually be stored. But since the function returns the objects, they will
      be valid for a little while.
      When using bloom filters, we need to sort our keys if they weren't already
      sorted.
      Update the indexbench tests, to use graph.iter_ancestry() and to assert
      that the value is actually correct.
    modified:
      btree_index.py                 index.py-20080624222253-p0x5f92uyh5hw734-7
      indexbench.py                  indexbench.py-20080702083855-5tju02y79rw7kkzh-1
      tests/test_btree_index.py      test_index.py-20080624222253-p0x5f92uyh5hw734-13
    ------------------------------------------------------------
    revno: 8.1.15
    revision-id: john at arbash-meinel.com-20080702174605-j9oaswfan8zvg2k7
    parent: john at arbash-meinel.com-20080702172707-906pywtz35vgv99f
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: index2
    timestamp: Wed 2008-07-02 12:46:05 -0500
    message:
      Update the test suite to take a branch, so we know that we are iterating a reasonable amount of history
    modified:
      indexbench.py                  indexbench.py-20080702083855-5tju02y79rw7kkzh-1
    ------------------------------------------------------------
    revno: 8.1.14
    revision-id: john at arbash-meinel.com-20080702172707-906pywtz35vgv99f
    parent: john at arbash-meinel.com-20080702165051-rt6pl41h8tt0wo3u
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: index2
    timestamp: Wed 2008-07-02 12:27:07 -0500
    message:
      pass the parameter in correctly
    modified:
      indexbench.py                  indexbench.py-20080702083855-5tju02y79rw7kkzh-1
    ------------------------------------------------------------
    revno: 8.1.13
    revision-id: john at arbash-meinel.com-20080702165051-rt6pl41h8tt0wo3u
    parent: john at arbash-meinel.com-20080702162254-tgks103u1pyz2fl6
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: index2
    timestamp: Wed 2008-07-02 11:50:51 -0500
    message:
      Split out the NodeBloom.get_raw_offsets into a static function.
      Break the node caching into 2 caches, one for internal nodes, one for leaf nodes.
    modified:
      btree_index.py                 index.py-20080624222253-p0x5f92uyh5hw734-7
      tests/test_btree_index.py      test_index.py-20080624222253-p0x5f92uyh5hw734-13
    ------------------------------------------------------------
    revno: 8.1.12
    revision-id: john at arbash-meinel.com-20080702162254-tgks103u1pyz2fl6
    parent: john at arbash-meinel.com-20080702161654-vnxiym2kv1ygtxn6
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: index2
    timestamp: Wed 2008-07-02 11:22:54 -0500
    message:
      Handle when the only node is a leaf node
    modified:
      btree_index.py                 index.py-20080624222253-p0x5f92uyh5hw734-7
    ------------------------------------------------------------
    revno: 8.1.11
    revision-id: john at arbash-meinel.com-20080702161654-vnxiym2kv1ygtxn6
    parent: john at arbash-meinel.com-20080702161204-uvj9yxiq464wngc0
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: index2
    timestamp: Wed 2008-07-02 11:16:54 -0500
    message:
      Start using the raw offsets when iterating.
    modified:
      btree_index.py                 index.py-20080624222253-p0x5f92uyh5hw734-7
    ------------------------------------------------------------
    revno: 8.1.10
    revision-id: john at arbash-meinel.com-20080702161204-uvj9yxiq464wngc0
    parent: john at arbash-meinel.com-20080702160822-4q8qdi10120l1aen
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: index2
    timestamp: Wed 2008-07-02 11:12:04 -0500
    message:
      Assert that the raw offsets are independent of the bloom size.
    modified:
      tests/test_btree_index.py      test_index.py-20080624222253-p0x5f92uyh5hw734-13
    ------------------------------------------------------------
    revno: 8.1.9
    revision-id: john at arbash-meinel.com-20080702160822-4q8qdi10120l1aen
    parent: john at arbash-meinel.com-20080702155655-z77lyi16cvfkpwn6
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: index2
    timestamp: Wed 2008-07-02 11:08:22 -0500
    message:
      Add the ability to deal in 'raw' offsets, which are independent of bloom length
    modified:
      btree_index.py                 index.py-20080624222253-p0x5f92uyh5hw734-7
      tests/test_btree_index.py      test_index.py-20080624222253-p0x5f92uyh5hw734-13
    ------------------------------------------------------------
    revno: 8.1.8
    revision-id: john at arbash-meinel.com-20080702155655-z77lyi16cvfkpwn6
    parent: john at arbash-meinel.com-20080702145412-3qulu1g2k4cm3r57
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: index2
    timestamp: Wed 2008-07-02 10:56:55 -0500
    message:
      Reset the current bloom, when we add a new internal node.
    modified:
      btree_index.py                 index.py-20080624222253-p0x5f92uyh5hw734-7
    ------------------------------------------------------------
    revno: 8.1.7
    revision-id: john at arbash-meinel.com-20080702145412-3qulu1g2k4cm3r57
    parent: john at arbash-meinel.com-20080702145001-vz3rohjcio2226cb
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: index2
    timestamp: Wed 2008-07-02 09:54:12 -0500
    message:
      Get rid of the left bisect, as we don't use it
    modified:
      btree_index.py                 index.py-20080624222253-p0x5f92uyh5hw734-7
      tests/test_btree_index.py      test_index.py-20080624222253-p0x5f92uyh5hw734-13
    ------------------------------------------------------------
    revno: 8.1.6
    revision-id: john at arbash-meinel.com-20080702145001-vz3rohjcio2226cb
    parent: john at arbash-meinel.com-20080702144538-m492wdmmu7li4ceq
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: index2
    timestamp: Wed 2008-07-02 09:50:01 -0500
    message:
      move _use_blooms to a member variable to make the tests independent of default settings
    modified:
      btree_index.py                 index.py-20080624222253-p0x5f92uyh5hw734-7
      tests/test_btree_index.py      test_index.py-20080624222253-p0x5f92uyh5hw734-13
    ------------------------------------------------------------
    revno: 8.1.5
    revision-id: john at arbash-meinel.com-20080702144538-m492wdmmu7li4ceq
    parent: john at arbash-meinel.com-20080702055951-2aadto8k1552yvd5
    parent: robertc at robertcollins.net-20080702125159-o72w19dm8g5yrtt0
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: index2
    timestamp: Wed 2008-07-02 09:45:38 -0500
    message:
      Merge in Robert's latest work and get the tests passing again.
      
      Note that the bloom tests require use_blooms = True for them to actually work.
    added:
      indexbench.py                  indexbench.py-20080702083855-5tju02y79rw7kkzh-1
    modified:
      __init__.py                    __init__.py-20080624222253-p0x5f92uyh5hw734-5
      btree_index.py                 index.py-20080624222253-p0x5f92uyh5hw734-7
      chunk_writer.py                chunk_writer.py-20080630234519-6ggn4id17nipovny-1
      tests/test_btree_index.py      test_index.py-20080624222253-p0x5f92uyh5hw734-13
      tests/test_chunk_writer.py     test_chunk_writer.py-20080630234519-6ggn4id17nipovny-2
    ------------------------------------------------------------
    revno: 8.1.4
    revision-id: john at arbash-meinel.com-20080702055951-2aadto8k1552yvd5
    parent: john at arbash-meinel.com-20080702054002-tripfvw7uca0jyn2
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: index2
    timestamp: Wed 2008-07-02 00:59:51 -0500
    message:
      Read leaf nodes in one go without caching.
    modified:
      btree_index.py                 index.py-20080624222253-p0x5f92uyh5hw734-7
    ------------------------------------------------------------
    revno: 8.1.3
    revision-id: john at arbash-meinel.com-20080702054002-tripfvw7uca0jyn2
    parent: john at arbash-meinel.com-20080702053739-zxv81xne6i1pl1gx
    parent: john at arbash-meinel.com-20080702045350-i5vjt3tqkyvsh3re
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: index2
    timestamp: Wed 2008-07-02 00:40:02 -0500
    message:
      Bring in the iter_entries changes, and clean them up slightly.
    modified:
      btree_index.py                 index.py-20080624222253-p0x5f92uyh5hw734-7
      tests/test_btree_index.py      test_index.py-20080624222253-p0x5f92uyh5hw734-13
        ------------------------------------------------------------
        revno: 7.1.8
        revision-id: john at arbash-meinel.com-20080702045350-i5vjt3tqkyvsh3re
        parent: john at arbash-meinel.com-20080702032221-9qn5l8n7owpg11u7
        committer: John Arbash Meinel <john at arbash-meinel.com>
        branch nick: index2
        timestamp: Tue 2008-07-01 23:53:50 -0500
        message:
          Change the looping structure for iter_entries
        modified:
          btree_index.py                 index.py-20080624222253-p0x5f92uyh5hw734-7
        ------------------------------------------------------------
        revno: 7.1.7
        revision-id: john at arbash-meinel.com-20080702032221-9qn5l8n7owpg11u7
        parent: john at arbash-meinel.com-20080701231851-rvuj7146p23sr25v
        committer: John Arbash Meinel <john at arbash-meinel.com>
        branch nick: index2
        timestamp: Tue 2008-07-01 22:22:21 -0500
        message:
          Implement and test left-way bisecting, revert to the bisect_XX builtin for len(in_keys) == 1
        modified:
          btree_index.py                 index.py-20080624222253-p0x5f92uyh5hw734-7
          tests/test_btree_index.py      test_index.py-20080624222253-p0x5f92uyh5hw734-13
        ------------------------------------------------------------
        revno: 7.1.6
        revision-id: john at arbash-meinel.com-20080701231851-rvuj7146p23sr25v
        parent: john at arbash-meinel.com-20080701231134-73vr4qkr6dogf3h0
        committer: John Arbash Meinel <john at arbash-meinel.com>
        branch nick: index2
        timestamp: Tue 2008-07-01 18:18:51 -0500
        message:
          Return the keys themselves along with the node we care about.
        modified:
          btree_index.py                 index.py-20080624222253-p0x5f92uyh5hw734-7
          tests/test_btree_index.py      test_index.py-20080624222253-p0x5f92uyh5hw734-13
        ------------------------------------------------------------
        revno: 7.1.5
        revision-id: john at arbash-meinel.com-20080701231134-73vr4qkr6dogf3h0
        parent: john at arbash-meinel.com-20080701192734-9mbeqa4zszat3z1f
        committer: John Arbash Meinel <john at arbash-meinel.com>
        branch nick: index2
        timestamp: Tue 2008-07-01 18:11:34 -0500
        message:
          Write a helper function that can turn 2 lists of keys into a position list.
        modified:
          btree_index.py                 index.py-20080624222253-p0x5f92uyh5hw734-7
          tests/test_btree_index.py      test_index.py-20080624222253-p0x5f92uyh5hw734-13
    ------------------------------------------------------------
    revno: 8.1.2
    revision-id: john at arbash-meinel.com-20080702053739-zxv81xne6i1pl1gx
    parent: john at arbash-meinel.com-20080702052941-xfx5k9vhlph5seut
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: index2
    timestamp: Wed 2008-07-02 00:37:39 -0500
    message:
      Use >= to make it clearer what is going on.
    modified:
      chunk_writer.py                chunk_writer.py-20080630234519-6ggn4id17nipovny-1
    ------------------------------------------------------------
    revno: 8.1.1
    revision-id: john at arbash-meinel.com-20080702052941-xfx5k9vhlph5seut
    parent: robertc at robertcollins.net-20080701224410-2lbqoqc2dc5v3iey
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: index2
    timestamp: Wed 2008-07-02 00:29:41 -0500
    message:
      Bring in some limitations on the repacking, which shows decent performance wins.
    modified:
      chunk_writer.py                chunk_writer.py-20080630234519-6ggn4id17nipovny-1
      tests/test_btree_index.py      test_index.py-20080624222253-p0x5f92uyh5hw734-13

Diff too large for email (1058 lines, the limit is 1000).



More information about the bazaar-commits mailing list