Rev 3791: Merge John and Andrew's outstanding work. in http://people.ubuntu.com/~robertc/baz2.0/repository

Robert Collins robertc at robertcollins.net
Mon Dec 1 04:36:46 GMT 2008


At http://people.ubuntu.com/~robertc/baz2.0/repository

------------------------------------------------------------
revno: 3791
revision-id: robertc at robertcollins.net-20081201043625-77tcktr1dfxc7w5l
parent: robertc at robertcollins.net-20081201041249-l3t4hpbv4jqlfbek
parent: john at arbash-meinel.com-20081120203437-t2n4xy1emhjhkx0n
committer: Robert Collins <robertc at robertcollins.net>
branch nick: repository
timestamp: Mon 2008-12-01 15:36:25 +1100
message:
  Merge John and Andrew's outstanding work.
modified:
  bzrlib/chk_map.py              chk_map.py-20081001014447-ue6kkuhofvdecvxa-1
  bzrlib/fetch.py                fetch.py-20050818234941-26fea6105696365d
  bzrlib/repofmt/pack_repo.py    pack_repo.py-20070813041115-gjv5ma7ktfqwsjgn-1
  bzrlib/tests/test_chk_map.py   test_chk_map.py-20081001014447-ue6kkuhofvdecvxa-2
    ------------------------------------------------------------
    revno: 3778.1.19
    revision-id: john at arbash-meinel.com-20081120203437-t2n4xy1emhjhkx0n
    parent: john at arbash-meinel.com-20081120200322-oqospzaz4uw7z80j
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: fetch
    timestamp: Thu 2008-11-20 14:34:37 -0600
    message:
      Refactor iter_interesting a little bit.
      
      Move the search for uninteresting nodes into a helper so that it can
      short-circuit the 'no uninteresting' case (i.e. the initial-branch case).
      Also change the logic so that we yield interesting records as we process
      them, rather than buffering them at that point.
      We still buffer the first request when there is an uninteresting key,
      but overall we buffer less (and we buffer nothing at all in the
      initial-branch case, which would otherwise be our highest memory
      consumption).
      It fails for the 'branch -r1 && pull' case, but it should work well
      for the 'pull the last 10' and the 'initial branch' cases.
    modified:
      bzrlib/chk_map.py              chk_map.py-20081001014447-ue6kkuhofvdecvxa-1
      bzrlib/fetch.py                fetch.py-20050818234941-26fea6105696365d
      bzrlib/tests/test_chk_map.py   test_chk_map.py-20081001014447-ue6kkuhofvdecvxa-2
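
    As an illustration only (get_children, walk_interesting and the plain
    sets below are stand-ins, not the bzrlib API; the real logic lives in
    chk_map.iter_interesting_nodes in the diff further down), the shape of
    the "yield as we process" change is roughly:

        def walk_interesting(get_children, interesting_roots,
                             uninteresting_roots):
            """Yield batches of interesting keys instead of buffering them."""
            # Phase 1: expand everything reachable from the uninteresting
            # roots (skipped entirely when there are none -- the
            # initial-branch shortcut).
            uninteresting = set(uninteresting_roots)
            pending = set(uninteresting_roots)
            while pending:
                next_pending = set()
                for key in pending:
                    for child in get_children(key):
                        if child not in uninteresting:
                            uninteresting.add(child)
                            next_pending.add(child)
                pending = next_pending
            # Phase 2: walk the interesting roots, yielding each batch as it
            # is classified rather than accumulating the whole result.
            pending = set(interesting_roots) - uninteresting
            while pending:
                yield sorted(pending)
                next_pending = set()
                for key in pending:
                    for child in get_children(key):
                        if child not in uninteresting:
                            # Once scheduled, a page is treated as
                            # uninteresting everywhere else, so it is only
                            # sent once.
                            uninteresting.add(child)
                            next_pending.add(child)
                pending = next_pending

    Each yielded batch corresponds to one round of store reads, which is what
    lets a caller stream records out without holding the whole set in memory.
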
    ------------------------------------------------------------
    revno: 3778.1.18
    revision-id: john at arbash-meinel.com-20081120200322-oqospzaz4uw7z80j
    parent: john at arbash-meinel.com-20081120194126-qemvcimvnlbjrgfz
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: fetch
    timestamp: Thu 2008-11-20 14:03:22 -0600
    message:
      Make the versionedfile import lazy.
    modified:
      bzrlib/chk_map.py              chk_map.py-20081001014447-ue6kkuhofvdecvxa-1
    ------------------------------------------------------------
    revno: 3778.1.17
    revision-id: john at arbash-meinel.com-20081120194126-qemvcimvnlbjrgfz
    parent: john at arbash-meinel.com-20081120190006-9v8a0lwk1cdws66m
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: fetch
    timestamp: Thu 2008-11-20 13:41:26 -0600
    message:
      Pass around a progress bar and switch to using an adapter.
      
      The code previously used include_delta_closure=True, but that causes
      KVF.get_record_stream() to buffer everything before it starts returning
      data, which means we get no opportunity to tick the progress bar until
      everything has been extracted.
      Using an adapter also means that we can return the records as they exist,
      rather than casting them up to fulltext only to have them compressed again
      during insert_record_stream.
      It also means we *could* handle compressed records and copy them as
      compressed, though currently only the knit-ft-gz => fulltext adapter is
      supported.

      This doesn't make the copy any faster; it seems to be a fraction slower.
      However, we get progress as we go, rather than nothing.
      We only use pb.tick(), since we don't really know how many intermediate
      records we are going to need to process.
    modified:
      bzrlib/chk_map.py              chk_map.py-20081001014447-ue6kkuhofvdecvxa-1
      bzrlib/fetch.py                fetch.py-20050818234941-26fea6105696365d
      bzrlib/repofmt/pack_repo.py    pack_repo.py-20070813041115-gjv5ma7ktfqwsjgn-1
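
    The record-adaptation pattern referred to here appears in the chk_map.py
    hunk below; condensed into a hypothetical helper for illustration (the
    name _iter_node_bytes is made up; store is a chk_bytes VersionedFiles
    object and pb an optional progress bar):

        from bzrlib import versionedfile

        def _iter_node_bytes(store, keys, pb=None):
            """Yield (key, fulltext bytes) for chk records, ticking pb."""
            # Adapt compressed knit records back into fulltexts.
            adapter_class = versionedfile.adapter_registry.get(
                ('knit-ft-gz', 'fulltext'))
            adapter = adapter_class(store)
            for record in store.get_record_stream(keys, 'unordered', True):
                if pb is not None:
                    pb.tick()  # no total is known, so just tick
                if record.storage_kind != 'fulltext':
                    bytes = adapter.get_bytes(record,
                                record.get_bytes_as(record.storage_kind))
                else:
                    bytes = record.get_bytes_as('fulltext')
                yield record.key, bytes
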
    ------------------------------------------------------------
    revno: 3778.1.16
    revision-id: john at arbash-meinel.com-20081120190006-9v8a0lwk1cdws66m
    parent: john at arbash-meinel.com-20081120184852-hbed0mka3y5suuum
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: fetch
    timestamp: Thu 2008-11-20 13:00:06 -0600
    message:
      Fix _find_file_keys_to_fetch to break things down into groups.
      
      This is fairly important, as we do get duplicate items from iter_interesting_nodes.
    modified:
      bzrlib/fetch.py                fetch.py-20050818234941-26fea6105696365d
      bzrlib/repofmt/pack_repo.py    pack_repo.py-20070813041115-gjv5ma7ktfqwsjgn-1
    ------------------------------------------------------------
    revno: 3778.1.15
    revision-id: john at arbash-meinel.com-20081120184852-hbed0mka3y5suuum
    parent: john at arbash-meinel.com-20081120180956-6m56ocdoyno1buvz
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: fetch
    timestamp: Thu 2008-11-20 12:48:52 -0600
    message:
      Found a bug in iter_interesting_nodes and its test suite.
      
      It seems we weren't properly testing the items that were returned, and
      we were only returning the items found on the last page.
    modified:
      bzrlib/chk_map.py              chk_map.py-20081001014447-ue6kkuhofvdecvxa-1
      bzrlib/tests/test_chk_map.py   test_chk_map.py-20081001014447-ue6kkuhofvdecvxa-2
    ------------------------------------------------------------
    revno: 3778.1.14
    revision-id: john at arbash-meinel.com-20081120180956-6m56ocdoyno1buvz
    parent: john at arbash-meinel.com-20081118193129-o7ucw3zyl6ft0in1
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: fetch
    timestamp: Thu 2008-11-20 12:09:56 -0600
    message:
      Start using iter_interesting_nodes.
      
      It can be used both for fetch() and for determining which file texts to
      fetch. This clearly highlights how we are double-handling (and could be
      triple-handling) the inventory nodes.
      
      There is also a subtle bug right now which causes us not to fetch any of
      the actual file content; that still needs to be debugged.
    modified:
      bzrlib/chk_map.py              chk_map.py-20081001014447-ue6kkuhofvdecvxa-1
      bzrlib/fetch.py                fetch.py-20050818234941-26fea6105696365d
      bzrlib/repofmt/pack_repo.py    pack_repo.py-20070813041115-gjv5ma7ktfqwsjgn-1
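
    For context, the fetch-side usage (see the bzrlib/fetch.py hunk further
    down) boils down to something like the following sketch;
    copy_interesting_pages is a hypothetical wrapper, not part of bzrlib:

        from bzrlib import chk_map

        def copy_interesting_pages(source_chk_bytes, target_chk_bytes,
                                   interesting_roots, uninteresting_roots,
                                   pb=None):
            """Stream every chk page reachable only via interesting_roots."""
            interesting = chk_map.iter_interesting_nodes(
                source_chk_bytes, interesting_roots, uninteresting_roots,
                pb=pb)
            def record_stream():
                # Each batch is ({key: record}, [(key, value), ...]); only
                # the records are needed here, the items feed the
                # file-text pass.
                for records, items in interesting:
                    for record in records.itervalues():
                        yield record
            target_chk_bytes.insert_record_stream(record_stream())
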
    ------------------------------------------------------------
    revno: 3778.1.13
    revision-id: john at arbash-meinel.com-20081118193129-o7ucw3zyl6ft0in1
    parent: john at arbash-meinel.com-20081118192746-hwqxewgi1nxx7ewb
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: fetch
    timestamp: Tue 2008-11-18 13:31:29 -0600
    message:
      Remove the CHKMap.copy_to() completely.
    modified:
      bzrlib/chk_map.py              chk_map.py-20081001014447-ue6kkuhofvdecvxa-1
      bzrlib/tests/test_chk_map.py   test_chk_map.py-20081001014447-ue6kkuhofvdecvxa-2
    ------------------------------------------------------------
    revno: 3778.1.12
    revision-id: john at arbash-meinel.com-20081118192746-hwqxewgi1nxx7ewb
    parent: john at arbash-meinel.com-20081116004615-i6nhr9xyk27aj0f4
    parent: vila at scythe-20081114084456-smayxy58w72rut67
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: fetch
    timestamp: Tue 2008-11-18 13:27:46 -0600
    message:
      Merge brisbane-core, cut out the copy_to shortcut.
    modified:
      bzrlib/chk_map.py              chk_map.py-20081001014447-ue6kkuhofvdecvxa-1
      bzrlib/inventory.py            inventory.py-20050309040759-6648b84ca2005b37
      bzrlib/repository.py           rev_storage.py-20051111201905-119e9401e46257e3
      bzrlib/tests/intertree_implementations/__init__.py __init__.py-20060724101752-09ysswo1a92uqyoz-3
      bzrlib/tests/intertree_implementations/test_compare.py test_compare.py-20060724101752-09ysswo1a92uqyoz-2
      bzrlib/tests/test_chk_map.py   test_chk_map.py-20081001014447-ue6kkuhofvdecvxa-2
      bzrlib/tests/test_inv.py       testinv.py-20050722220913-1dc326138d1a5892
      bzrlib/tree.py                 tree.py-20050309040759-9d5f2496be663e77
    ------------------------------------------------------------
    revno: 3778.1.11
    revision-id: john at arbash-meinel.com-20081116004615-i6nhr9xyk27aj0f4
    parent: john at arbash-meinel.com-20081115233003-a2hl72788umwk1od
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: fetch
    timestamp: Sat 2008-11-15 18:46:15 -0600
    message:
      Handle when an InternalNode decides it needs to split.
    modified:
      bzrlib/chk_map.py              chk_map.py-20081001014447-ue6kkuhofvdecvxa-1
      bzrlib/tests/test_chk_map.py   test_chk_map.py-20081001014447-ue6kkuhofvdecvxa-2
    ------------------------------------------------------------
    revno: 3778.1.10
    revision-id: john at arbash-meinel.com-20081115233003-a2hl72788umwk1od
    parent: john at arbash-meinel.com-20081115231836-tox5mvgrf6stegmx
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: fetch
    timestamp: Sat 2008-11-15 17:30:03 -0600
    message:
      Use copy_to when directly inserting inventories, even though it is chatty.
      
      Update it to also copy the parent_id_basename_to_file_id map.
    modified:
      bzrlib/chk_map.py              chk_map.py-20081001014447-ue6kkuhofvdecvxa-1
      bzrlib/inventory.py            inventory.py-20050309040759-6648b84ca2005b37
    ------------------------------------------------------------
    revno: 3778.1.9
    revision-id: john at arbash-meinel.com-20081115231836-tox5mvgrf6stegmx
    parent: john at arbash-meinel.com-20081115165421-c5o1bloqpbid1ob8
    parent: andrew.bennetts at canonical.com-20081111070622-14i0n8g8meqo4yl8
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: fetch
    timestamp: Sat 2008-11-15 17:18:36 -0600
    message:
      Merge in the copy_to code in preparation for fixing it up.
    modified:
      bzrlib/chk_map.py              chk_map.py-20081001014447-ue6kkuhofvdecvxa-1
      bzrlib/inventory.py            inventory.py-20050309040759-6648b84ca2005b37
      bzrlib/tests/test_chk_map.py   test_chk_map.py-20081001014447-ue6kkuhofvdecvxa-2
        ------------------------------------------------------------
        revno: 3756.1.6
        revision-id: andrew.bennetts at canonical.com-20081111070622-14i0n8g8meqo4yl8
        parent: andrew.bennetts at canonical.com-20081111043629-ojx8u4wob9kwuatt
        committer: Andrew Bennetts <andrew.bennetts at canonical.com>
        branch nick: chk
        timestamp: Tue 2008-11-11 17:06:22 +1000
        message:
          Make fetch much much faster.  Adds CHKMap.copy_to(other_store)
        modified:
          bzrlib/chk_map.py              chk_map.py-20081001014447-ue6kkuhofvdecvxa-1
          bzrlib/inventory.py            inventory.py-20050309040759-6648b84ca2005b37
          bzrlib/tests/test_chk_map.py   test_chk_map.py-20081001014447-ue6kkuhofvdecvxa-2
    ------------------------------------------------------------
    revno: 3778.1.8
    revision-id: john at arbash-meinel.com-20081115165421-c5o1bloqpbid1ob8
    parent: john at arbash-meinel.com-20081115165023-cq0tm2epbzaswfif
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: fetch
    timestamp: Sat 2008-11-15 10:54:21 -0600
    message:
      Simplify the interface.
      
      The set of records we read was always identical to the set of records to
      read, so now we just yield the records we read. Consider changing that to
      yield just the keys in the future, since we may not want callers to keep
      references to record objects.
    modified:
      bzrlib/chk_map.py              chk_map.py-20081001014447-ue6kkuhofvdecvxa-1
      bzrlib/tests/test_chk_map.py   test_chk_map.py-20081001014447-ue6kkuhofvdecvxa-2
    ------------------------------------------------------------
    revno: 3778.1.7
    revision-id: john at arbash-meinel.com-20081115165023-cq0tm2epbzaswfif
    parent: john at arbash-meinel.com-20081115163058-0ofyl1lmpi8n6ncr
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: fetch
    timestamp: Sat 2008-11-15 10:50:23 -0600
    message:
      Cleanup pass.
    modified:
      bzrlib/chk_map.py              chk_map.py-20081001014447-ue6kkuhofvdecvxa-1
    ------------------------------------------------------------
    revno: 3778.1.6
    revision-id: john at arbash-meinel.com-20081115163058-0ofyl1lmpi8n6ncr
    parent: john at arbash-meinel.com-20081115152959-0fu4wkl5d0jaabiv
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: fetch
    timestamp: Sat 2008-11-15 10:30:58 -0600
    message:
      Add a first pass to the interesting search.
      
      This does a quick match between the known-uninteresting and the
      possibly-interesting keys; exact matches can be culled from all further
      searches. In the future we could do this more incrementally and pay
      attention to the (serialized) key prefixes, but that takes a lot more
      work.
    modified:
      bzrlib/chk_map.py              chk_map.py-20081001014447-ue6kkuhofvdecvxa-1
      bzrlib/tests/test_chk_map.py   test_chk_map.py-20081001014447-ue6kkuhofvdecvxa-2
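
    The culling itself amounts to a set subtraction; a toy version of the
    check performed in _find_all_uninteresting (see the chk_map.py hunk
    below), with plain sets standing in for the chk key sets:

        def cull_exact_matches(uninteresting_keys, interesting_keys):
            """Return the uninteresting keys that still need to be walked.

            A key present on both sides points at an identical page, so
            nothing underneath it can be a change and it need not be read
            again.  Keys that differ are kept, because a differing page may
            still lead to an exact match further down.
            """
            return uninteresting_keys - interesting_keys
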
    ------------------------------------------------------------
    revno: 3778.1.5
    revision-id: john at arbash-meinel.com-20081115152959-0fu4wkl5d0jaabiv
    parent: john at arbash-meinel.com-20081115005743-uekt9z9xvygy5bj5
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: fetch
    timestamp: Sat 2008-11-15 09:29:59 -0600
    message:
      Don't allow an InternalNode to add a key that doesn't fit.
      
      Instead, split out a new prefix and add self as a child of the new node.
      This seems to fix the stability issues from earlier.
    modified:
      bzrlib/chk_map.py              chk_map.py-20081001014447-ue6kkuhofvdecvxa-1
      bzrlib/tests/test_chk_map.py   test_chk_map.py-20081001014447-ue6kkuhofvdecvxa-2
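
    The splitting rule is: take the longest common prefix of the node's own
    prefix and the new key's serialised form, create a new InternalNode for
    that prefix, and re-add the current node under one more character of its
    prefix. A standalone sketch of just the prefix arithmetic (a toy
    function, not the bzrlib code, which lives in InternalNode.map below):

        def split_prefix(node_prefix, new_serialised_key):
            """Work out the new parent prefix and where the old node hangs."""
            # Longest common prefix of the existing node and the new key.
            common = []
            for a, b in zip(node_prefix, new_serialised_key):
                if a != b:
                    break
                common.append(a)
            new_parent_prefix = ''.join(common)
            # The existing node becomes a child of the new parent, keyed by
            # one more character of its own prefix.
            child_key_for_self = node_prefix[:len(new_parent_prefix) + 1]
            return new_parent_prefix, child_key_for_self
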
    ------------------------------------------------------------
    revno: 3778.1.4
    revision-id: john at arbash-meinel.com-20081115005743-uekt9z9xvygy5bj5
    parent: john at arbash-meinel.com-20081114223003-sknu0dgjzqclknqb
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: fetch
    timestamp: Fri 2008-11-14 18:57:43 -0600
    message:
      Some small cleanups, and fix _dump_tree to handle in-progress nodes.
      
      The bug seems to be that LeafNodes split based on what keys have been
      added, whereas InternalNodes never consider when they would want to split
      or combine.
    modified:
      bzrlib/chk_map.py              chk_map.py-20081001014447-ue6kkuhofvdecvxa-1
      bzrlib/tests/test_chk_map.py   test_chk_map.py-20081001014447-ue6kkuhofvdecvxa-2
    ------------------------------------------------------------
    revno: 3778.1.3
    revision-id: john at arbash-meinel.com-20081114223003-sknu0dgjzqclknqb
    parent: john at arbash-meinel.com-20081114221130-gtoytnuh4rvdnb7h
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: fetch
    timestamp: Fri 2008-11-14 16:30:03 -0600
    message:
      Add a direct test case that shows how the map() function is not
      properly deterministic.
    modified:
      bzrlib/chk_map.py              chk_map.py-20081001014447-ue6kkuhofvdecvxa-1
      bzrlib/tests/test_chk_map.py   test_chk_map.py-20081001014447-ue6kkuhofvdecvxa-2
    ------------------------------------------------------------
    revno: 3778.1.2
    revision-id: john at arbash-meinel.com-20081114221130-gtoytnuh4rvdnb7h
    parent: john at arbash-meinel.com-20081114084445-a0en1xevr1oef4xj
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: fetch
    timestamp: Fri 2008-11-14 16:11:30 -0600
    message:
      Add a _dump_tree helper that assists in debugging what is going on.
    modified:
      bzrlib/chk_map.py              chk_map.py-20081001014447-ue6kkuhofvdecvxa-1
      bzrlib/tests/test_chk_map.py   test_chk_map.py-20081001014447-ue6kkuhofvdecvxa-2
    ------------------------------------------------------------
    revno: 3778.1.1
    revision-id: john at arbash-meinel.com-20081114084445-a0en1xevr1oef4xj
    parent: robertc at robertcollins.net-20081114021953-ckqpcsakzrk1ns1l
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: fetch
    timestamp: Fri 2008-11-14 02:44:45 -0600
    message:
      Start working on an iter_interesting_nodes function, which can find nodes
      to transmit in 'parallel'. There are still some small problems when
      looking at multiple dicts.
    modified:
      bzrlib/chk_map.py              chk_map.py-20081001014447-ue6kkuhofvdecvxa-1
      bzrlib/tests/test_chk_map.py   test_chk_map.py-20081001014447-ue6kkuhofvdecvxa-2
=== modified file 'bzrlib/chk_map.py'
--- a/bzrlib/chk_map.py	2008-11-14 03:29:57 +0000
+++ b/bzrlib/chk_map.py	2008-11-20 20:34:37 +0000
@@ -38,7 +38,10 @@
 """
 
 import heapq
-import osutils
+from bzrlib import lazy_import
+lazy_import.lazy_import(globals(), """
+from bzrlib import versionedfile
+""")
 
 
 class CHKMap(object):
@@ -101,6 +104,30 @@
         stream = self._store.get_record_stream([key], 'unordered', True)
         return stream.next().get_bytes_as('fulltext')
 
+    def _dump_tree(self):
+        """Return the tree in a string representation."""
+        self._ensure_root()
+        res = self._dump_tree_node(self._root_node, prefix='', indent='')
+        return '\n'.join(res)
+
+    def _dump_tree_node(self, node, prefix, indent):
+        """For this node and all children, generate a string representation."""
+        result = []
+        node_key = node.key()
+        if node_key is not None:
+            node_key = node_key[0]
+        result.append('%s%r %s %s' % (indent, prefix, node.__class__.__name__,
+                                        node_key))
+        if isinstance(node, InternalNode):
+            # Trigger all child nodes to get loaded
+            list(node._iter_nodes(self._store))
+            for prefix, sub in sorted(node._items.iteritems()):
+                result.extend(self._dump_tree_node(sub, prefix, indent + '  '))
+        else:
+            for key, value in sorted(node._items.iteritems()):
+                result.append('      %r %r' % (key, value))
+        return result
+
     @classmethod
     def from_dict(klass, store, initial_value, maximum_size=0, key_width=1):
         """Create a CHKMap in store with initial_value as the content.
@@ -315,7 +342,7 @@
         if len(node_details) == 1:
             self._root_node = node_details[0][1]
         else:
-            self._root_node = InternalNode()
+            self._root_node = InternalNode(prefix)
             self._root_node.set_maximum_size(node_details[0][1].maximum_size)
             self._root_node._key_width = node_details[0][1]._key_width
             for split, node in node_details:
@@ -363,6 +390,14 @@
         # The pointers/values this node has - meaning defined by child classes.
         self._items = {}
 
+    def __repr__(self):
+        items_str = sorted(self._items)
+        if len(items_str) > 20:
+            items_str = items_str[16] + '...]'
+        return '%s(key:%s len:%s size:%s max:%s items:%s)' % (
+            self.__class__.__name__, self._key, self._len, self._size,
+            self._maximum_size, items_str)
+
     def key(self):
         return self._key
 
@@ -451,13 +486,18 @@
             for item in self._items.iteritems():
                 yield item
 
+    def _key_value_len(self, key, value):
+        # TODO: Should probably be done without actually joining the key, but
+        #       then that can be done via the C extension
+        return 2 + len('\x00'.join(key)) + len(value)
+
     def map(self, store, key, value):
         """Map key to value."""
         if key in self._items:
-            self._size -= 2 + len('\x00'.join(key)) + len(self._items[key])
+            self._size -= self._key_value_len(key, self._items[key])
             self._len -= 1
         self._items[key] = value
-        self._size += 2 + len('\x00'.join(key)) + len(value)
+        self._size += self._key_value_len(key, value)
         self._len += 1
         self._key = None
         if (self._maximum_size and self._current_size() > self._maximum_size and
@@ -541,12 +581,21 @@
     nodes. It is greedy - it will defer splitting itself as long as possible.
     """
 
-    def __init__(self):
+    def __init__(self, prefix=''):
         Node.__init__(self)
         # The size of an internalnode with default values and no children.
         # self._size = 12
         # How many octets key prefixes within this node are.
         self._node_width = 0
+        self._prefix = prefix
+
+    def __repr__(self):
+        items_str = sorted(self._items)
+        if len(items_str) > 20:
+            items_str = items_str[16] + '...]'
+        return '%s(key:%s len:%s size:%s max:%s prefix:%s items:%s)' % (
+            self.__class__.__name__, self._key, self._len, self._size,
+            self._maximum_size, self._prefix, items_str)
 
     def add_node(self, prefix, node):
         """Add a child node with prefix prefix, and node node.
@@ -554,9 +603,13 @@
         :param prefix: The serialised key prefix for node.
         :param node: The node being added.
         """
+        assert self._prefix is not None
+        assert prefix.startswith(self._prefix)
+        assert len(prefix) == len(self._prefix) + 1
         self._len += len(node)
         if not len(self._items):
             self._node_width = len(prefix)
+        assert self._node_width == len(self._prefix) + 1
         self._items[prefix] = node
         self._key = None
 
@@ -591,6 +644,7 @@
         result._key_width = width
         result._size = len(bytes)
         result._node_width = len(prefix)
+        result._prefix = result.unique_serialised_prefix()
         return result
 
     def iteritems(self, store, key_filter=None):
@@ -644,17 +698,35 @@
         """Map key to value."""
         if not len(self._items):
             raise AssertionError("cant map in an empty InternalNode.")
+        serialised_key = self._serialised_key(key)
+        assert self._node_width == len(self._prefix) + 1
+        if not serialised_key.startswith(self._prefix):
+            # This key doesn't fit in this index, so we need to split at the
+            # point where it would fit.
+            # XXX: Do we need the serialised_key in its maximum length?
+            new_prefix = self.unique_serialised_prefix(serialised_key)
+            new_parent = InternalNode(new_prefix)
+            new_parent.set_maximum_size(self._maximum_size)
+            new_parent._key_width = self._key_width
+            new_parent.add_node(self._prefix[:len(new_prefix)+1], self)
+            assert new_parent._node_width == len(new_parent._prefix) + 1
+            return new_parent.map(store, key, value)
         children = self._iter_nodes(store, key_filter=[key])
-        serialised_key = self._serialised_key(key)
         if children:
             child = children[0]
+            # if isinstance(child, InternalNode):
+            #     child_prefix = child._prefix
+            #     child_ser_key = child._serialised_key(key)
+            #     if not child_ser_key.startswith(child_prefix):
+            #         import pdb; pdb.set_trace()
         else:
             # new child needed:
             child = self._new_child(serialised_key, LeafNode)
         old_len = len(child)
         prefix, node_details = child.map(store, key, value)
         if len(node_details) == 1:
-            # child may have shrunk, or might be the same.
+            # child may have shrunk, or might be a new node
+            child = node_details[0][1]
             self._len = self._len - old_len + len(child)
             self._items[serialised_key] = child
             self._key = None
@@ -663,7 +735,9 @@
         # XXX: This is where we might want to try and expand our depth
         # to refer to more bytes of every child (which would give us
         # multiple pointers to child nodes, but less intermediate nodes)
+        # TODO: Is this mapped as serialised_key or as prefix?
         child = self._new_child(serialised_key, InternalNode)
+        child._prefix = prefix
         for split, node in node_details:
             child.add_node(split, node)
         self._len = self._len - old_len + len(child)
@@ -744,7 +818,7 @@
                 refs.append(value.key())
         return refs
 
-    def unique_serialised_prefix(self):
+    def unique_serialised_prefix(self, extra_key=None):
         """Return the unique key prefix for this node.
 
         :return: A bytestring of the longest serialised key prefix that is
@@ -753,6 +827,8 @@
         # may want to cache this eventually :- but wait for enough
         # functionality to profile.
         keys = list(self._items.keys())
+        if extra_key is not None:
+            keys.append(extra_key)
         if not keys:
             return ""
         current_prefix = keys.pop(-1)
@@ -799,3 +875,174 @@
         return InternalNode.deserialise(bytes, key)
     else:
         raise AssertionError("Unknown node type.")
+
+
+def _find_children_info(store, interesting_keys, uninteresting_keys,
+                        adapter, pb):
+    """Read the associated records, and determine what is interesting."""
+    uninteresting_keys = set(uninteresting_keys)
+    chks_to_read = uninteresting_keys.union(interesting_keys)
+    next_uninteresting = set()
+    next_interesting = set()
+    uninteresting_items = set()
+    interesting_items = set()
+    interesting_records = []
+    # records_read = set()
+    for record in store.get_record_stream(chks_to_read, 'unordered', True):
+        # records_read.add(record.key())
+        if pb is not None:
+            pb.tick()
+        if record.storage_kind != 'fulltext':
+            bytes = adapter.get_bytes(record,
+                        record.get_bytes_as(record.storage_kind))
+        else:
+            bytes = record.get_bytes_as('fulltext')
+        node = _deserialise(bytes, record.key)
+        if record.key in uninteresting_keys:
+            if isinstance(node, InternalNode):
+                next_uninteresting.update(node.refs())
+            else:
+                # We know we are at a LeafNode, so we can pass None for the
+                # store
+                uninteresting_items.update(node.iteritems(None))
+        else:
+            interesting_records.append(record)
+            if isinstance(node, InternalNode):
+                next_interesting.update(node.refs())
+            else:
+                interesting_items.update(node.iteritems(None))
+    # TODO: Filter out records that have already been read, as node splitting
+    #       can cause us to reference the same nodes via shorter and longer
+    #       paths
+    return (next_uninteresting, uninteresting_items,
+            next_interesting, interesting_records, interesting_items)
+
+
+def _find_all_uninteresting(store, interesting_root_keys,
+                            uninteresting_root_keys, adapter, pb):
+    """Determine the full set of uninteresting keys."""
+    # What about duplicates between interesting_root_keys and
+    # uninteresting_root_keys?
+    if not uninteresting_root_keys:
+        # Shortcut case. We know there is nothing uninteresting to filter out
+        # So we just let the rest of the algorithm do the work
+        # We know there is nothing uninteresting, and we didn't have to read
+        # any interesting records yet.
+        return (set(), set(), set(interesting_root_keys), [], set())
+    all_uninteresting_chks = set(uninteresting_root_keys)
+    all_uninteresting_items = set()
+
+    # First step, find the direct children of both the interesting and
+    # uninteresting set
+    (uninteresting_keys, uninteresting_items,
+     interesting_keys, interesting_records,
+     interesting_items) = _find_children_info(store, interesting_root_keys,
+                                              uninteresting_root_keys,
+                                              adapter=adapter, pb=pb)
+    all_uninteresting_chks.update(uninteresting_keys)
+    all_uninteresting_items.update(uninteresting_items)
+    del uninteresting_items
+    # Note: Exact matches between interesting and uninteresting do not need
+    #       to be search further. Non-exact matches need to be searched in case
+    #       there is a future exact-match
+    uninteresting_keys.difference_update(interesting_keys)
+
+    # Second, find the full set of uninteresting bits reachable by the
+    # uninteresting roots
+    chks_to_read = uninteresting_keys
+    while chks_to_read:
+        next_chks = set()
+        for record in store.get_record_stream(chks_to_read, 'unordered', False):
+            # TODO: Handle 'absent'
+            if pb is not None:
+                pb.tick()
+            if record.storage_kind != 'fulltext':
+                bytes = adapter.get_bytes(record,
+                            record.get_bytes_as(record.storage_kind))
+            else:
+                bytes = record.get_bytes_as('fulltext')
+            node = _deserialise(bytes, record.key)
+            if isinstance(node, InternalNode):
+                # uninteresting_prefix_chks.update(node._items.iteritems())
+                chks = node._items.values()
+                # TODO: We remove the entries that are already in
+                #       uninteresting_chks ?
+                next_chks.update(chks)
+                all_uninteresting_chks.update(chks)
+            else:
+                all_uninteresting_items.update(node._items.iteritems())
+        chks_to_read = next_chks
+    return (all_uninteresting_chks, all_uninteresting_items,
+            interesting_keys, interesting_records, interesting_items)
+
+
+def iter_interesting_nodes(store, interesting_root_keys,
+                           uninteresting_root_keys, pb=None):
+    """Given root keys, find interesting nodes.
+
+    Evaluate nodes referenced by interesting_root_keys. Ones that are also
+    referenced from uninteresting_root_keys are not considered interesting.
+
+    :param interesting_root_keys: keys which should be part of the
+        "interesting" nodes (which will be yielded)
+    :param uninteresting_root_keys: keys which should be filtered out of the
+        result set.
+    :return: Yield
+        (interesting records, interesting chk's, interesting key:values)
+    """
+    # TODO: consider that it may be more memory efficient to use the 20-byte
+    #       sha1 string, rather than tuples of hexidecimal sha1 strings.
+
+    # A way to adapt from the compressed texts back into fulltexts
+    # In a way, this seems like a layering inversion to have CHKMap know the
+    # details of versionedfile
+    adapter_class = versionedfile.adapter_registry.get(
+        ('knit-ft-gz', 'fulltext'))
+    adapter = adapter_class(store)
+
+    (all_uninteresting_chks, all_uninteresting_items, interesting_keys,
+     interesting_records, interesting_items) = _find_all_uninteresting(store,
+        interesting_root_keys, uninteresting_root_keys, adapter, pb)
+
+    # Now that we know everything uninteresting, we can yield information from
+    # our first request
+    interesting_items.difference_update(all_uninteresting_items)
+    records = dict((record.key, record) for record in interesting_records
+                    if record.key not in all_uninteresting_chks)
+    if records or interesting_items:
+        yield records, interesting_items
+    interesting_keys.difference_update(all_uninteresting_chks)
+
+    chks_to_read = interesting_keys
+    while chks_to_read:
+        next_chks = set()
+        for record in store.get_record_stream(chks_to_read, 'unordered', False):
+            if pb is not None:
+                pb.tick()
+            # TODO: Handle 'absent'?
+            if record.storage_kind != 'fulltext':
+                bytes = adapter.get_bytes(record,
+                            record.get_bytes_as(record.storage_kind))
+            else:
+                bytes = record.get_bytes_as('fulltext')
+            node = _deserialise(bytes, record.key)
+            if isinstance(node, InternalNode):
+                chks = set(node.refs())
+                chks.difference_update(all_uninteresting_chks)
+                # Is set() and .difference_update better than:
+                # chks = [chk for chk in node.refs()
+                #              if chk not in all_uninteresting_chks]
+                next_chks.update(chks)
+                # These are now uninteresting everywhere else
+                all_uninteresting_chks.update(chks)
+                interesting_items = []
+            else:
+                interesting_items = [item for item in node._items.iteritems()
+                                     if item not in all_uninteresting_items]
+                # TODO: Do we need to filter out items that we have already
+                #       seen on other pages? We don't really want to buffer the
+                #       whole thing, but it does mean that callers need to
+                #       understand they may get duplicate values.
+                # all_uninteresting_items.update(interesting_items)
+            yield {record.key: record}, interesting_items
+        chks_to_read = next_chks

=== modified file 'bzrlib/fetch.py'
--- a/bzrlib/fetch.py	2008-11-11 04:28:43 +0000
+++ b/bzrlib/fetch.py	2008-11-20 20:34:37 +0000
@@ -241,9 +241,12 @@
             # know for unselected inventories whether all their required
             # texts are present in the other repository - it could be
             # corrupt.
-            if (self.from_repository._format.supports_chks or
-                self.to_repository._format.supports_chks):
-                # Hack to make chk->chk fetch: copy the inventories as
+            if (self.from_repository._format.supports_chks and
+                self.to_repository._format.supports_chks):
+                self._fetch_chk_inventories(revs, pb)
+            elif (self.from_repository._format.supports_chks or
+                self.to_repository._format.supports_chks):
+                # Hack to make not-chk->chk fetch: copy the inventories as
                 # inventories.
                 total = len(revs)
                 for pos, inv in enumerate(
@@ -283,6 +286,67 @@
         # revision stream, when you weren't ever supposed to have deltas.
         # So we now *force* fulltext copying for signatures and revisions
 
+    def _fetch_chk_inventories(self, revs, pb):
+        """Fetch the inventory texts, along with the associated chk maps."""
+        from bzrlib import inventory, chk_map
+        # We want an inventory outside of the search set, so that we can filter
+        # out uninteresting chk pages. For now we use
+        # _find_revision_outside_set, but if we had a Search with cut_revs, we
+        # could use that instead.
+        start_rev_id = self.from_repository._find_revision_outside_set(revs)
+        start_rev_key = (start_rev_id,)
+        inv_keys_to_fetch = [(rev_id,) for rev_id in revs]
+        if start_rev_id != NULL_REVISION:
+            inv_keys_to_fetch.append((start_rev_id,))
+        # Any repo that supports chk_bytes must also support out-of-order
+        # insertion. At least, that is how we expect it to work
+        # We use get_record_stream instead of iter_inventories because we want
+        # to be able to insert the stream as well. We could instead fetch
+        # allowing deltas, and then iter_inventories, but we don't know whether
+        # source or target is more 'local' anway.
+        inv_stream = self.from_repository.inventories.get_record_stream(
+            inv_keys_to_fetch, 'unordered',
+            True) # We need them as full-texts so we can find their references
+        uninteresting_chk_roots = set()
+        interesting_chk_roots = set()
+        for record in inv_stream:
+            bytes = record.get_bytes_as('fulltext')
+            chk_inv = inventory.CHKInventory.deserialise(
+                self.from_repository.chk_bytes, bytes, record.key)
+            if record.key == start_rev_key:
+                uninteresting_chk_roots.add(chk_inv.id_to_entry.key())
+                p_id_map = chk_inv.parent_id_basename_to_file_id
+                if p_id_map is not None:
+                    uninteresting_chk_roots.add(p_id_map.key())
+            else:
+                self.to_repository.inventories.insert_record_stream([record])
+                interesting_chk_roots.add(chk_inv.id_to_entry.key())
+                p_id_map = chk_inv.parent_id_basename_to_file_id
+                if p_id_map is not None:
+                    interesting_chk_roots.add(p_id_map.key())
+        # Now that we have worked out all of the interesting root nodes, grab
+        # all of the interesting pages and insert them
+        interesting = chk_map.iter_interesting_nodes(
+            self.from_repository.chk_bytes, interesting_chk_roots,
+            uninteresting_chk_roots, pb=pb)
+        def to_stream_adapter():
+            """Adapt the iter_interesting_nodes result to a single stream.
+
+            iter_interesting_nodes returns records as it processes them, which
+            can be in batches. But we only want a single stream to be inserted.
+            """
+            for record, items in interesting:
+                for value in record.itervalues():
+                    yield value
+        # XXX: We could instead call get_record_stream(records.keys())
+        #      ATM, this will always insert the records as fulltexts, and
+        #      requires that you can hang on to records once you have gone
+        #      on to the next one. Further, it causes the target to
+        #      recompress the data. Testing shows it to be faster than
+        #      requesting the records again, though.
+        self.to_repository.chk_bytes.insert_record_stream(
+            to_stream_adapter())
+
     def _generate_root_texts(self, revs):
         """This will be called by __fetch between fetching weave texts and
         fetching the inventory weave.

=== modified file 'bzrlib/repofmt/pack_repo.py'
--- a/bzrlib/repofmt/pack_repo.py	2008-11-14 01:28:40 +0000
+++ b/bzrlib/repofmt/pack_repo.py	2008-11-20 19:41:26 +0000
@@ -2182,59 +2182,32 @@
         rich_root = self.supports_rich_root()
         revision_outside_set = self._find_revision_outside_set(revision_ids)
         if revision_outside_set == _mod_revision.NULL_REVISION:
-            uninteresting_chk_refs = set()
+            uninteresting_root_keys = set()
         else:
             uninteresting_inv = self.get_inventory(revision_outside_set)
-            uninteresting_map = uninteresting_inv.id_to_entry
-            uninteresting_map._ensure_root()
-            uninteresting_chk_refs = set(uninteresting_map._root_node.refs())
-            pending_nodes = set([uninteresting_map._root_node])
-            while pending_nodes:
-                nodes = pending_nodes
-                pending_nodes = set()
-                for node in nodes:
-                    uninteresting_chk_refs.add(node.key())
-                    if not isinstance(node, chk_map.InternalNode):
-                        continue
-                    for subnode in node._iter_nodes(uninteresting_map._store):
-                        pending_nodes.add(subnode)
-        # XXX: This currently only avoids traversing top level unchanged trees.
-        # What we need is parallel traversal with uninteresting-only trees
-        # pruned, interesting-only trees included, common trees with identical
-        # pointers pruned, common trees with different pointers examined
-        # further.
+            uninteresting_root_keys = set([uninteresting_inv.id_to_entry.key()])
+        interesting_root_keys = set()
         for idx, inv in enumerate(self.iter_inventories(revision_ids)):
-            if pb is not None:
-                pb.update('fetch', idx, len(revision_ids))
-            inv_chk_map = inv.id_to_entry
-            inv_chk_map._ensure_root()
-            pending_nodes = set([inv_chk_map._root_node])
-            while pending_nodes:
-                nodes = pending_nodes
-                pending_nodes = set()
-                for node in nodes:
-                    # Do not examine this node again
-                    uninteresting_chk_refs.add(node.key())
-                    if not isinstance(node, chk_map.InternalNode):
-                        # Leaf node: pull out its contents:
-                        for name, bytes in node.iteritems(inv_chk_map._store):
-                            entry = inv._bytes_to_entry(bytes)
-                            if entry.name == '' and not rich_root:
-                                continue
-                            if entry.revision == inv.revision_id:
-                                yield ("file", entry.file_id, [entry.revision])
-                        continue
-                    # Recurse deeper
-                    # Two-pass; api fixup needed to allow exclusion
-                    wanted_keys = set()
-                    for key, value in node._items.iteritems():
-                        if key in uninteresting_chk_refs:
-                            continue
-                        wanted_keys.add((key,))
-                    for subnode in node._iter_nodes(inv_chk_map._store,
-                        key_filter=wanted_keys):
-                        pending_nodes.add(subnode)
-        
+            interesting_root_keys.add(inv.id_to_entry.key())
+        revision_ids = frozenset(revision_ids)
+        file_id_revisions = {}
+        for records, items in chk_map.iter_interesting_nodes(self.chk_bytes,
+                    interesting_root_keys, uninteresting_root_keys,
+                    pb=pb):
+            # This is cheating a bit to use the last grabbed 'inv', but it
+            # works
+            for name, bytes in items:
+                entry = inv._bytes_to_entry(bytes)
+                if entry.name == '' and not rich_root:
+                    continue
+                if entry.revision in revision_ids:
+                    # Would we rather build this up into file_id => revision
+                    # maps?
+                    s = file_id_revisions.setdefault(entry.file_id, set())
+                    s.add(entry.revision)
+        for file_id, revisions in file_id_revisions.iteritems():
+            yield ('file', file_id, revisions)
+
     def fileids_altered_by_revision_ids(self, revision_ids, _inv_weave=None):
         """Find the file ids and versions affected by revisions.
 

=== modified file 'bzrlib/tests/test_chk_map.py'
--- a/bzrlib/tests/test_chk_map.py	2008-11-14 03:29:57 +0000
+++ b/bzrlib/tests/test_chk_map.py	2008-12-01 04:36:25 +0000
@@ -16,6 +16,9 @@
 
 """Tests for maps built on a CHK versionedfiles facility."""
 
+from itertools import izip
+
+from bzrlib import chk_map
 from bzrlib.chk_map import (
     CHKMap,
     InternalNode,
@@ -109,6 +112,146 @@
         # updated key.
         self.assertEqual(new_root, chkmap._root_node._key)
 
+    def test_apply_delta_is_deterministic(self):
+        chk_bytes = self.get_chk_bytes()
+        chkmap1 = CHKMap(chk_bytes, None)
+        chkmap1._root_node.set_maximum_size(10)
+        chkmap1.apply_delta([(None, ('aaa',), 'common'),
+                             (None, ('bba',), 'target2'),
+                             (None, ('bbb',), 'common')])
+        root_key1 = chkmap1._save()
+        chkmap2 = CHKMap(chk_bytes, None)
+        chkmap2._root_node.set_maximum_size(10)
+        chkmap2.apply_delta([(None, ('bbb',), 'common'),
+                             (None, ('bba',), 'target2'),
+                             (None, ('aaa',), 'common')])
+        root_key2 = chkmap2._save()
+        self.assertEqualDiff(chkmap1._dump_tree(), chkmap2._dump_tree())
+        self.assertEqual(root_key1, root_key2)
+
+    def test_stable_splitting(self):
+        store = self.get_chk_bytes()
+        chkmap = CHKMap(store, None)
+        # Should fit 2 keys per LeafNode
+        chkmap._root_node.set_maximum_size(30)
+        chkmap.map(('aaa',), 'v')
+        self.assertEqualDiff("'' LeafNode None\n"
+                             "      ('aaa',) 'v'",
+                             chkmap._dump_tree())
+        chkmap.map(('aab',), 'v')
+        self.assertEqualDiff("'' LeafNode None\n"
+                             "      ('aaa',) 'v'\n"
+                             "      ('aab',) 'v'",
+                             chkmap._dump_tree())
+        # Creates a new internal node, and splits the others into leaves
+        chkmap.map(('aac',), 'v')
+        self.assertEqualDiff("'' InternalNode None\n"
+                             "  'aaa' LeafNode None\n"
+                             "      ('aaa',) 'v'\n"
+                             "  'aab' LeafNode None\n"
+                             "      ('aab',) 'v'\n"
+                             "  'aac' LeafNode None\n"
+                             "      ('aac',) 'v'",
+                             chkmap._dump_tree())
+        # Splits again, because it can't fit in the current structure
+        chkmap.map(('bbb',), 'v')
+        self.assertEqualDiff("'' InternalNode None\n"
+                             "  'a' InternalNode None\n"
+                             "    'aaa' LeafNode None\n"
+                             "      ('aaa',) 'v'\n"
+                             "    'aab' LeafNode None\n"
+                             "      ('aab',) 'v'\n"
+                             "    'aac' LeafNode None\n"
+                             "      ('aac',) 'v'\n"
+                             "  'b' LeafNode None\n"
+                             "      ('bbb',) 'v'",
+                             chkmap._dump_tree())
+
+    def test_deep_splitting(self):
+        store = self.get_chk_bytes()
+        chkmap = CHKMap(store, None)
+        # Should fit 2 keys per LeafNode
+        chkmap._root_node.set_maximum_size(40)
+        chkmap.map(('aaaaaaaa',), 'v')
+        chkmap.map(('aaaaabaa',), 'v')
+        self.assertEqualDiff("'' LeafNode None\n"
+                             "      ('aaaaaaaa',) 'v'\n"
+                             "      ('aaaaabaa',) 'v'",
+                             chkmap._dump_tree())
+        chkmap.map(('aaabaaaa',), 'v')
+        chkmap.map(('aaababaa',), 'v')
+        self.assertEqualDiff("'' InternalNode None\n"
+                             "  'aaaa' LeafNode None\n"
+                             "      ('aaaaaaaa',) 'v'\n"
+                             "      ('aaaaabaa',) 'v'\n"
+                             "  'aaab' LeafNode None\n"
+                             "      ('aaabaaaa',) 'v'\n"
+                             "      ('aaababaa',) 'v'",
+                             chkmap._dump_tree())
+        chkmap.map(('aaabacaa',), 'v')
+        chkmap.map(('aaabadaa',), 'v')
+        self.assertEqualDiff("'' InternalNode None\n"
+                             "  'aaaa' LeafNode None\n"
+                             "      ('aaaaaaaa',) 'v'\n"
+                             "      ('aaaaabaa',) 'v'\n"
+                             "  'aaab' InternalNode None\n"
+                             "    'aaabaa' LeafNode None\n"
+                             "      ('aaabaaaa',) 'v'\n"
+                             "    'aaabab' LeafNode None\n"
+                             "      ('aaababaa',) 'v'\n"
+                             "    'aaabac' LeafNode None\n"
+                             "      ('aaabacaa',) 'v'\n"
+                             "    'aaabad' LeafNode None\n"
+                             "      ('aaabadaa',) 'v'",
+                             chkmap._dump_tree())
+        chkmap.map(('aaababba',), 'v')
+        chkmap.map(('aaababca',), 'v')
+        self.assertEqualDiff("'' InternalNode None\n"
+                             "  'aaaa' LeafNode None\n"
+                             "      ('aaaaaaaa',) 'v'\n"
+                             "      ('aaaaabaa',) 'v'\n"
+                             "  'aaab' InternalNode None\n"
+                             "    'aaabaa' LeafNode None\n"
+                             "      ('aaabaaaa',) 'v'\n"
+                             "    'aaabab' InternalNode None\n"
+                             "      'aaababa' LeafNode None\n"
+                             "      ('aaababaa',) 'v'\n"
+                             "      'aaababb' LeafNode None\n"
+                             "      ('aaababba',) 'v'\n"
+                             "      'aaababc' LeafNode None\n"
+                             "      ('aaababca',) 'v'\n"
+                             "    'aaabac' LeafNode None\n"
+                             "      ('aaabacaa',) 'v'\n"
+                             "    'aaabad' LeafNode None\n"
+                             "      ('aaabadaa',) 'v'",
+                             chkmap._dump_tree())
+        # Now we add a node that should fit around an existing InternalNode,
+        # but has a slightly different key prefix, which causes a new
+        # InternalNode split
+        chkmap.map(('aaabDaaa',), 'v')
+        self.assertEqualDiff("'' InternalNode None\n"
+                             "  'aaaa' LeafNode None\n"
+                             "      ('aaaaaaaa',) 'v'\n"
+                             "      ('aaaaabaa',) 'v'\n"
+                             "  'aaab' InternalNode None\n"
+                             "    'aaabD' LeafNode None\n"
+                             "      ('aaabDaaa',) 'v'\n"
+                             "    'aaaba' InternalNode None\n"
+                             "      'aaabaa' LeafNode None\n"
+                             "      ('aaabaaaa',) 'v'\n"
+                             "      'aaabab' InternalNode None\n"
+                             "        'aaababa' LeafNode None\n"
+                             "      ('aaababaa',) 'v'\n"
+                             "        'aaababb' LeafNode None\n"
+                             "      ('aaababba',) 'v'\n"
+                             "        'aaababc' LeafNode None\n"
+                             "      ('aaababca',) 'v'\n"
+                             "      'aaabac' LeafNode None\n"
+                             "      ('aaabacaa',) 'v'\n"
+                             "      'aaabad' LeafNode None\n"
+                             "      ('aaabadaa',) 'v'",
+                             chkmap._dump_tree())
+
     def test_iter_changes_empty_ab(self):
         # Asking for changes between an empty dict to a dict with keys returns
         # all the keys.
@@ -350,6 +493,38 @@
         leaf_node = LeafNode()
         self.assertEqual([key], leaf_node.serialise(chkmap._store))
 
+    def test__dump_tree(self):
+        chkmap = self._get_map({("aaa",): "value1", ("aab",): "value2",
+                                ("bbb",): "value3",},
+                               maximum_size=10)
+        self.assertEqualDiff('\n'.join([
+            "'' InternalNode sha1:cd9b68f18c9754a79065b06379fba543f9031742",
+            "  'a' InternalNode sha1:ed0ceb5aeb87c56df007a17997134328ff4d0b8d",
+            "    'aaa' LeafNode sha1:16fa5a38b80d29b529afc45f7a4f894650fc067f",
+            "      ('aaa',) 'value1'",
+            "    'aab' LeafNode sha1:8fca5400dc99ef1b464e60ca25da53b57406ed38",
+            "      ('aab',) 'value2'",
+            "  'b' LeafNode sha1:67f15d1dfa451d388ed08ff17b4f9578ba010d01",
+            "      ('bbb',) 'value3'",
+            ]), chkmap._dump_tree())
+
+    def test__dump_tree_in_progress(self):
+        chkmap = self._get_map({("aaa",): "value1", ("aab",): "value2"},
+                               maximum_size=10)
+        chkmap.map(('bbb',), 'value3')
+        # XXX: Note that this representation is different than the one for
+        #      test__dump_tree, even though they have the same values
+        self.assertEqualDiff('\n'.join([
+            "'' InternalNode None",
+            "  'a' InternalNode sha1:ed0ceb5aeb87c56df007a17997134328ff4d0b8d",
+            "    'aaa' LeafNode sha1:16fa5a38b80d29b529afc45f7a4f894650fc067f",
+            "      ('aaa',) 'value1'",
+            "    'aab' LeafNode sha1:8fca5400dc99ef1b464e60ca25da53b57406ed38",
+            "      ('aab',) 'value2'",
+            "  'b' LeafNode None",
+            "      ('bbb',) 'value3'",
+            ]), chkmap._dump_tree())
+
 
 class TestLeafNode(TestCaseWithStore):
 
@@ -522,7 +697,7 @@
 class TestInternalNode(TestCaseWithStore):
 
     def test_add_node_empty_new(self):
-        node = InternalNode()
+        node = InternalNode('fo')
         child = LeafNode()
         child.set_maximum_size(100)
         child.map(None, ("foo",), "bar")
@@ -550,7 +725,7 @@
         self.assertEqual(3, node._node_width)
 
     def test_add_node_resets_key_new(self):
-        node = InternalNode()
+        node = InternalNode('fo')
         child = LeafNode()
         child.set_maximum_size(100)
         child.map(None, ("foo",), "bar")
@@ -756,3 +931,131 @@
 # BA
 # AB - split, but we want to end up with AB, BA, in one node, with 
 # 1-4K get0
+
+
+class TestIterInterestingNodes(TestCaseWithStore):
+
+    def get_chk_bytes(self):
+        if getattr(self, '_chk_bytes', None) is None:
+            self._chk_bytes = super(TestIterInterestingNodes,
+                                    self).get_chk_bytes()
+        return self._chk_bytes
+
+    def get_map_key(self, a_dict):
+        c_map = self._get_map(a_dict, maximum_size=10,
+                              chk_bytes=self.get_chk_bytes())
+        return c_map.key()
+
+    def assertIterInteresting(self, expected, interesting_keys,
+                              uninteresting_keys):
+        """Check the result of iter_interesting_nodes.
+
+        :param expected: A list of (record_keys, interesting_chk_pages,
+                                    interesting key value pairs)
+        """
+        store = self.get_chk_bytes()
+        iter_nodes = chk_map.iter_interesting_nodes(store, interesting_keys,
+                                                    uninteresting_keys)
+        for count, (exp, act) in enumerate(izip(expected, iter_nodes)):
+            exp_record_keys, exp_items = exp
+            records, items = act
+            exp_tuple = (sorted(exp_record_keys), sorted(exp_items))
+            act_tuple = (sorted(records.keys()), sorted(items))
+            self.assertEqual(exp_tuple, act_tuple)
+        self.assertEqual(len(expected), count + 1)
+
+    def test_empty_to_one_keys(self):
+        target = self.get_map_key({('a',): 'content'})
+        self.assertIterInteresting(
+            [([target], [(('a',), 'content')])],
+            [target], [])
+
+    def test_none_to_one_key(self):
+        basis = self.get_map_key({})
+        target = self.get_map_key({('a',): 'content'})
+        self.assertIterInteresting(
+            [([target], [(('a',), 'content')])],
+            [target], [basis])
+
+    def test_one_to_none_key(self):
+        basis = self.get_map_key({('a',): 'content'})
+        target = self.get_map_key({})
+        self.assertIterInteresting(
+            [([target], [])],
+            [target], [basis])
+
+    def test_common_pages(self):
+        basis = self.get_map_key({('a',): 'content',
+                                  ('b',): 'content',
+                                  ('c',): 'content',
+                                 })
+        target = self.get_map_key({('a',): 'content',
+                                   ('b',): 'other content',
+                                   ('c',): 'content',
+                                  })
+        # Is there a way to get this more directly?
+        b_key = ('sha1:1d7a45ded01ab77c069350c0e290ae34db5b549b',)
+        # This should return the root node, and the node for the 'b' key
+        self.assertIterInteresting(
+            [([target], []),
+             ([b_key], [(('b',), 'other content')])],
+            [target], [basis])
+
+    def test_common_sub_page(self):
+        basis = self.get_map_key({('aaa',): 'common',
+                                  ('c',): 'common',
+                                 })
+        target = self.get_map_key({('aaa',): 'common',
+                                   ('aab',): 'new',
+                                   ('c',): 'common',
+                                  })
+        self.assertEqualDiff(
+            "'' InternalNode sha1:f88b38806015efe27013260d7402219b7b4d4332\n"
+            "  'a' InternalNode sha1:2ce01860338a614b93883a5bbeb89920137ac7ef\n"
+            "    'aaa' LeafNode sha1:0b38f800c49ff9ffae346ca6f7e80a4626a5eaca\n"
+            "      ('aaa',) 'common'\n"
+            "    'aab' LeafNode sha1:10567a3bfcc764fb8d8d9edaa28c0934ada366c5\n"
+            "      ('aab',) 'new'\n"
+            "  'c' LeafNode sha1:263208de2fce0a8f9db614c1ca39e8f6de8b3802\n"
+            "      ('c',) 'common'",
+            CHKMap(self.get_chk_bytes(), target)._dump_tree())
+        # The key for the internal aa node
+        aa_key = ('sha1:2ce01860338a614b93883a5bbeb89920137ac7ef',)
+        # The key for the leaf aab node
+        aab_key = ('sha1:10567a3bfcc764fb8d8d9edaa28c0934ada366c5',)
+        self.assertIterInteresting(
+            [([target], []),
+             ([aa_key], []),
+             ([aab_key], [(('aab',), 'new')])],
+            [target], [basis])
+
+    def test_multiple_maps(self):
+        basis1 = self.get_map_key({('aaa',): 'common',
+                                   ('aab',): 'basis1',
+                                  })
+        basis2 = self.get_map_key({('bbb',): 'common',
+                                   ('bbc',): 'basis2',
+                                  })
+        target1 = self.get_map_key({('aaa',): 'common',
+                                    ('aac',): 'target1',
+                                    ('bbb',): 'common',
+                                   })
+        target2 = self.get_map_key({('aaa',): 'common',
+                                    ('bba',): 'target2',
+                                    ('bbb',): 'common',
+                                   })
+        # The key for the target1 internal aa node
+        aa_key = ('sha1:4c6b1e3e6ecb68fe039d2b00c9091bc037ebf203',)
+        # The key for the leaf aac node
+        aac_key = ('sha1:8089f6b4f3bd2a058c41be199ef5af0c5b9a0c4f',)
+        # The key for the target2 internal bb node
+        bb_key = ('sha1:bcc229e6bd1d606ef4630073dc15756e60508365',)
+        # The key for the leaf bba node
+        bba_key = ('sha1:5ce6a69a21060222bb0a5b48fdbfcca586cc9183',)
+        self.assertIterInteresting(
+            [([target1, target2], []),
+             ([aa_key], []),
+             ([bb_key], []),
+             ([aac_key], [(('aac',), 'target1')]),
+             ([bba_key], [(('bba',), 'target2')]),
+            ], [target1, target2], [basis1, basis2])



