Rev 3796: Merge in the CHKMap._check_remap() code. in http://bazaar.launchpad.net/%7Ebzr/bzr/brisbane-core

John Arbash Meinel john at arbash-meinel.com
Thu Dec 4 17:02:28 GMT 2008


At http://bazaar.launchpad.net/%7Ebzr/bzr/brisbane-core

------------------------------------------------------------
revno: 3796
revision-id: john at arbash-meinel.com-20081204170109-3g9j0j762y594bgg
parent: jelmer at samba.org-20081203231026-rt42wz5iiq4u32ht
parent: john at arbash-meinel.com-20081203035643-0x4npc4s8mh8nqlh
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: brisbane-core
timestamp: Thu 2008-12-04 11:01:09 -0600
message:
  Merge in the CHKMap._check_remap() code.
  
  Ensures canonical form even when nodes shrink.
modified:
  bzrlib/chk_map.py              chk_map.py-20081001014447-ue6kkuhofvdecvxa-1
  bzrlib/repofmt/pack_repo.py    pack_repo.py-20070813041115-gjv5ma7ktfqwsjgn-1
  bzrlib/tests/test_chk_map.py   test_chk_map.py-20081001014447-ue6kkuhofvdecvxa-2
    ------------------------------------------------------------
    revno: 3791.1.14
    revision-id: john at arbash-meinel.com-20081203035643-0x4npc4s8mh8nqlh
    parent: john at arbash-meinel.com-20081202235625-h5fo44xy2hxeopo2
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: chk_map
    timestamp: Tue 2008-12-02 21:56:43 -0600
    message:
      obsolete the .cix index along with the rest.
    modified:
      bzrlib/repofmt/pack_repo.py    pack_repo.py-20070813041115-gjv5ma7ktfqwsjgn-1
    ------------------------------------------------------------
    revno: 3791.1.13
    revision-id: john at arbash-meinel.com-20081202235625-h5fo44xy2hxeopo2
    parent: john at arbash-meinel.com-20081202234745-80n34m4wg4gcgx8i
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: chk_map
    timestamp: Tue 2008-12-02 17:56:25 -0600
    message:
      Refactor the LeafNode.map() code so we can do _check_remap more cheaply.
      
      LeafNode.map() was actually doing all the work to split into many leaf nodes, but
      all we care about is *if* we would have to do that.
      Also, we don't have to do as many safety checks because we know that all the child
      keys that we would be adding cannot already be present.
    modified:
      bzrlib/chk_map.py              chk_map.py-20081001014447-ue6kkuhofvdecvxa-1
    ------------------------------------------------------------
    revno: 3791.1.12
    revision-id: john at arbash-meinel.com-20081202234745-80n34m4wg4gcgx8i
    parent: john at arbash-meinel.com-20081202234434-b2ahyaq5vqg6jylz
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: chk_map
    timestamp: Tue 2008-12-02 17:47:45 -0600
    message:
      Add a TODO describing a possible optimization.
    modified:
      bzrlib/chk_map.py              chk_map.py-20081001014447-ue6kkuhofvdecvxa-1
    ------------------------------------------------------------
    revno: 3791.1.11
    revision-id: john at arbash-meinel.com-20081202234434-b2ahyaq5vqg6jylz
    parent: john at arbash-meinel.com-20081202220545-sflppo9zeho3z3j3
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: chk_map
    timestamp: Tue 2008-12-02 17:44:34 -0600
    message:
      Add logic to map() so that it can also collapse when necessary.
      
      It would be nice if check_remap could use size information to determine this
      without actually having to build up a LeafNode, but that will depend on the
      serializer in the end. Profiling will show if it ever really matters.
      
      One other possibility would be to limit the number of new nodes we pull in at
      a time. If we are going to have to load 50 LeafNodes then we are rather positive
      that it won't fit into a new LeafNode.
    modified:
      bzrlib/chk_map.py              chk_map.py-20081001014447-ue6kkuhofvdecvxa-1
      bzrlib/tests/test_chk_map.py   test_chk_map.py-20081001014447-ue6kkuhofvdecvxa-2
    ------------------------------------------------------------
    revno: 3791.1.10
    revision-id: john at arbash-meinel.com-20081202220545-sflppo9zeho3z3j3
    parent: john at arbash-meinel.com-20081202205054-ajsix3e8w8jgnnod
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: chk_map
    timestamp: Tue 2008-12-02 16:05:45 -0600
    message:
      Change how _check_remap works so it doesn't have to load all keys.
    modified:
      bzrlib/chk_map.py              chk_map.py-20081001014447-ue6kkuhofvdecvxa-1
      bzrlib/tests/test_chk_map.py   test_chk_map.py-20081001014447-ue6kkuhofvdecvxa-2
    ------------------------------------------------------------
    revno: 3791.1.9
    revision-id: john at arbash-meinel.com-20081202205054-ajsix3e8w8jgnnod
    parent: john at arbash-meinel.com-20081202204350-i7r9uithhqv1aieq
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: chk_map
    timestamp: Tue 2008-12-02 14:50:54 -0600
    message:
      Switch _dump_tree to returning trailing '\n' for nicer results
    modified:
      bzrlib/chk_map.py              chk_map.py-20081001014447-ue6kkuhofvdecvxa-1
      bzrlib/tests/test_chk_map.py   test_chk_map.py-20081001014447-ue6kkuhofvdecvxa-2
    ------------------------------------------------------------
    revno: 3791.1.8
    revision-id: john at arbash-meinel.com-20081202204350-i7r9uithhqv1aieq
    parent: john at arbash-meinel.com-20081202204314-uwfpn7zplj4muwm7
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: chk_map
    timestamp: Tue 2008-12-02 14:43:50 -0600
    message:
      more comment
    modified:
      bzrlib/chk_map.py              chk_map.py-20081001014447-ue6kkuhofvdecvxa-1
    ------------------------------------------------------------
    revno: 3791.1.7
    revision-id: john at arbash-meinel.com-20081202204314-uwfpn7zplj4muwm7
    parent: john at arbash-meinel.com-20081202190056-1d7abe90mtf0kkag
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: chk_map
    timestamp: Tue 2008-12-02 14:43:14 -0600
    message:
      Turns out that the LeafNode split code needs to be aware of having a key
      that will fit in the '\x00' location.
    modified:
      bzrlib/chk_map.py              chk_map.py-20081001014447-ue6kkuhofvdecvxa-1
    ------------------------------------------------------------
    revno: 3791.1.6
    revision-id: john at arbash-meinel.com-20081202190056-1d7abe90mtf0kkag
    parent: john at arbash-meinel.com-20081202185728-tt9nrap4t9fx97p2
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: chk_map
    timestamp: Tue 2008-12-02 13:00:56 -0600
    message:
      It seems that map()'s split code fails when the new key is simply longer.
      
      Add a failing test so I don't forget to come back and fix it.
    modified:
      bzrlib/tests/test_chk_map.py   test_chk_map.py-20081001014447-ue6kkuhofvdecvxa-2
    ------------------------------------------------------------
    revno: 3791.1.5
    revision-id: john at arbash-meinel.com-20081202185728-tt9nrap4t9fx97p2
    parent: john at arbash-meinel.com-20081202184452-830u3t32hjagsk1y
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: chk_map
    timestamp: Tue 2008-12-02 12:57:28 -0600
    message:
      Ensure that unmap() work even when the LeafNode isn't empty.
    modified:
      bzrlib/chk_map.py              chk_map.py-20081001014447-ue6kkuhofvdecvxa-1
      bzrlib/tests/test_chk_map.py   test_chk_map.py-20081001014447-ue6kkuhofvdecvxa-2
    ------------------------------------------------------------
    revno: 3791.1.4
    revision-id: john at arbash-meinel.com-20081202184452-830u3t32hjagsk1y
    parent: john at arbash-meinel.com-20081202183902-pzwr8vxzdubumrjy
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: chk_map
    timestamp: Tue 2008-12-02 12:44:52 -0600
    message:
      Add a test that unmap() properly chains back up multiple levels.
    modified:
      bzrlib/tests/test_chk_map.py   test_chk_map.py-20081001014447-ue6kkuhofvdecvxa-2
    ------------------------------------------------------------
    revno: 3791.1.3
    revision-id: john at arbash-meinel.com-20081202183902-pzwr8vxzdubumrjy
    parent: john at arbash-meinel.com-20081202163706-g5ibq2cot6x31l5d
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: chk_map
    timestamp: Tue 2008-12-02 12:39:02 -0600
    message:
      At the end of unmap() see if children can be packed into a single Leaf.
    modified:
      bzrlib/chk_map.py              chk_map.py-20081001014447-ue6kkuhofvdecvxa-1
      bzrlib/tests/test_chk_map.py   test_chk_map.py-20081001014447-ue6kkuhofvdecvxa-2
    ------------------------------------------------------------
    revno: 3791.1.2
    revision-id: john at arbash-meinel.com-20081202163706-g5ibq2cot6x31l5d
    parent: john at arbash-meinel.com-20081202161210-0ffdlfr9c0t0sowv
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: chk_map
    timestamp: Tue 2008-12-02 10:37:06 -0600
    message:
      Add some CHKMap test helpers to make sure that maps end up in 'canonical' form.
    modified:
      bzrlib/tests/test_chk_map.py   test_chk_map.py-20081001014447-ue6kkuhofvdecvxa-2
    ------------------------------------------------------------
    revno: 3791.1.1
    revision-id: john at arbash-meinel.com-20081202161210-0ffdlfr9c0t0sowv
    parent: robertc at robertcollins.net-20081201043625-77tcktr1dfxc7w5l
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: chk_map
    timestamp: Tue 2008-12-02 10:12:10 -0600
    message:
      Clean up some trailing whitespace.
    modified:
      bzrlib/chk_map.py              chk_map.py-20081001014447-ue6kkuhofvdecvxa-1
-------------- next part --------------
=== modified file 'bzrlib/chk_map.py'
--- a/bzrlib/chk_map.py	2008-11-20 20:34:37 +0000
+++ b/bzrlib/chk_map.py	2008-12-02 23:56:25 +0000
@@ -22,7 +22,7 @@
 with internal nodes of 8-bit fan out; The key tuples are mapped to strings by
 joining them by \x00, and \x00 padding shorter keys out to the length of the
 longest key. Leaf nodes are packed as densely as possible, and internal nodes
-are all and additional 8-bits wide leading to a sparse upper tree. 
+are all and additional 8-bits wide leading to a sparse upper tree.
 
 Updates to a CHKMap are done preferentially via the apply_delta method, to
 allow optimisation of the update operation; but individual map/unmap calls are
@@ -90,7 +90,7 @@
         Node that this does not update the _items dict in objects containing a
         reference to this node. As such it does not prevent subsequent IO being
         performed.
-        
+
         :param node: A tuple key or node object.
         :return: A node object.
         """
@@ -108,6 +108,7 @@
         """Return the tree in a string representation."""
         self._ensure_root()
         res = self._dump_tree_node(self._root_node, prefix='', indent='')
+        res.append('') # Give a trailing '\n'
         return '\n'.join(res)
 
     def _dump_tree_node(self, node, prefix, indent):
@@ -131,7 +132,7 @@
     @classmethod
     def from_dict(klass, store, initial_value, maximum_size=0, key_width=1):
         """Create a CHKMap in store with initial_value as the content.
-        
+
         :param store: The store to record initial_value in, a VersionedFiles
             object with 1-tuple keys supporting CHK key generation.
         :param initial_value: A dict to store in store. Its keys and values
@@ -176,7 +177,7 @@
         #    after reading the 'a' subtree, but it is encountered in the first
         #    tree immediately. Variations on this may have read internal nodes like this.
         #    we want to cut the entire pending subtree when we realise we have a common node.
-        #    For this we use a list of keys - the path to a node - and check the entire path is 
+        #    For this we use a list of keys - the path to a node - and check the entire path is
         #    clean as we process each item.
         if self._node_key(self._root_node) == self._node_key(basis._root_node):
             return
@@ -210,7 +211,7 @@
             # Note that this is N^2, it depends on us trimming trees
             # aggressively to not become slow.
             # A better implementation would probably have a reverse map
-            # back to the children of a node, and jump straight to it when 
+            # back to the children of a node, and jump straight to it when
             # a common node is detected, the proceed to remove the already
             # pending children. bzrlib.graph has a searcher module with a
             # similar problem.
@@ -358,7 +359,8 @@
     def unmap(self, key):
         """remove key from the map."""
         self._ensure_root()
-        self._root_node.unmap(self._store, key)
+        unmapped = self._root_node.unmap(self._store, key)
+        self._root_node = unmapped
 
     def _save(self):
         """Save the map completely.
@@ -420,7 +422,7 @@
 
 class LeafNode(Node):
     """A node containing actual key:value pairs.
-    
+
     :ivar _items: A dict of key->value items. The key is in tuple form.
     """
 
@@ -491,23 +493,46 @@
         #       then that can be done via the C extension
         return 2 + len('\x00'.join(key)) + len(value)
 
+    def _map_no_split(self, key, value):
+        """Map a key to a value.
+
+        This assumes either the key does not already exist, or you have already
+        removed its size and length from self.
+
+        :return: True if adding this node should cause us to split.
+        """
+        self._items[key] = value
+        self._size += self._key_value_len(key, value)
+        self._len += 1
+        if (self._len > 1
+            and self._maximum_size
+            and self._current_size() > self._maximum_size):
+            return True
+        return False
+
     def map(self, store, key, value):
         """Map key to value."""
         if key in self._items:
             self._size -= self._key_value_len(key, self._items[key])
             self._len -= 1
-        self._items[key] = value
-        self._size += self._key_value_len(key, value)
-        self._len += 1
         self._key = None
-        if (self._maximum_size and self._current_size() > self._maximum_size and
-            self._len > 1):
+        if self._map_no_split(key, value):
             common_prefix = self.unique_serialised_prefix()
             split_at = len(common_prefix) + 1
             result = {}
             for key, value in self._items.iteritems():
                 serialised_key = self._serialised_key(key)
                 prefix = serialised_key[:split_at]
+                # TODO: Generally only 1 key can be exactly the right length,
+                #       which means we can only have 1 key in the node pointed
+                #       at by the 'prefix\0' key. We might want to consider
+                #       folding it into the containing InternalNode rather than
+                #       having a fixed length-1 node.
+                #       Note this is probably not true for hash keys, as they
+                #       may get a '\00' node anywhere, but won't have keys of
+                #       different lengths.
+                if len(prefix) < split_at:
+                    prefix += '\x00'*(split_at - len(prefix))
                 if prefix not in result:
                     node = LeafNode()
                     node.set_maximum_size(self._maximum_size)
@@ -576,7 +601,7 @@
 
 class InternalNode(Node):
     """A node that contains references to other nodes.
-    
+
     An InternalNode is responsible for mapping serialised key prefixes to child
     nodes. It is greedy - it will defer splitting itself as long as possible.
     """
@@ -723,6 +748,10 @@
             # new child needed:
             child = self._new_child(serialised_key, LeafNode)
         old_len = len(child)
+        if isinstance(child, LeafNode):
+            old_size = child._current_size()
+        else:
+            old_size = None
         prefix, node_details = child.map(store, key, value)
         if len(node_details) == 1:
             # child may have shrunk, or might be a new node
@@ -730,7 +759,15 @@
             self._len = self._len - old_len + len(child)
             self._items[serialised_key] = child
             self._key = None
-            return self.unique_serialised_prefix(), [("", self)]
+            new_node = self
+            if (isinstance(child, LeafNode)
+                and (old_size is None or child._current_size() < old_size)):
+                # The old node was an InternalNode which means it has now
+                # collapsed, so we need to check if it will chain to a collapse
+                # at this level. Or the LeafNode has shrunk in size, so we need
+                # to check that as well.
+                new_node = self._check_remap(store)
+            return new_node.unique_serialised_prefix(), [("", new_node)]
         # child has overflown - create a new intermediate node.
         # XXX: This is where we might want to try and expand our depth
         # to refer to more bytes of every child (which would give us
@@ -845,26 +882,107 @@
         """Remove key from this node and it's children."""
         if not len(self._items):
             raise AssertionError("cant unmap in an empty InternalNode.")
-        serialised_key = self._serialised_key(key)
         children = self._iter_nodes(store, key_filter=[key])
-        serialised_key = self._serialised_key(key)
         if children:
             child = children[0]
         else:
             raise KeyError(key)
         self._len -= 1
         unmapped = child.unmap(store, key)
+        self._key = None
+        serialised_key = self._serialised_key(key)
         if len(unmapped) == 0:
             # All child nodes are gone, remove the child:
             del self._items[serialised_key]
+            unmapped = None
         else:
             # Stash the returned node
             self._items[serialised_key] = unmapped
         if len(self._items) == 1:
             # this node is no longer needed:
             return self._items.values()[0]
-        self._key = None
-        return self
+        if isinstance(unmapped, InternalNode):
+            return self
+        return self._check_remap(store)
+
+    def _check_remap(self, store):
+        """Check if all keys contained by children fit in a single LeafNode.
+
+        :param store: A store to use for reading more nodes
+        :return: Either self, or a new LeafNode which should replace self.
+        """
+        # Logic for how we determine when we need to rebuild
+        # 1) Implicitly unmap() is removing a key which means that the child
+        #    nodes are going to be shrinking by some extent.
+        # 2) If all children are LeafNodes, it is possible that they could be
+        #    combined into a single LeafNode, which can then completely replace
+        #    this internal node with a single LeafNode
+        # 3) If *one* child is an InternalNode, we assume it has already done
+        #    all the work to determine that its children cannot collapse, and
+        #    we can then assume that those nodes *plus* the current nodes don't
+        #    have a chance of collapsing either.
+        #    So a very cheap check is to just say if 'unmapped' is an
+        #    InternalNode, we don't have to check further.
+
+        # TODO: Another alternative is to check the total size of all known
+        #       LeafNodes. If there is some formula we can use to determine the
+        #       final size without actually having to read in any more
+        #       children, it would be nice to have. However, we have to be
+        #       careful with stuff like nodes that pull out the common prefix
+        #       of each key, as adding a new key can change the common prefix
+        #       and cause size changes greater than the length of one key.
+        #       So for now, we just add everything to a new Leaf until it
+        #       splits, as we know that will give the right answer
+        new_leaf = LeafNode()
+        new_leaf.set_maximum_size(self._maximum_size)
+        new_leaf._key_width = self._key_width
+        keys = {}
+        # There is some overlap with _iter_nodes here, but not a lot, and it
+        # allows us to do quick evaluation without paging everything in
+        for prefix, node in self._items.iteritems():
+            if type(node) == tuple:
+                keys[node] = prefix
+            else:
+                if isinstance(node, InternalNode):
+                    # Without looking at any leaf nodes, we are sure
+                    return self
+                for key, value in node._items.iteritems():
+                    if new_leaf._map_no_split(key, value):
+                        # Adding this key would cause a split, so we know we
+                        # don't need to collapse
+                        return self
+        # So far, everything fits. Page in the rest of the nodes, and see if it
+        # holds true.
+        if keys:
+            # TODO: Consider looping over a limited set of keys (like 25 or so
+            #       at a time). If we have to read more than 25 we almost
+            #       certainly won't fit them all into a single new LeafNode, so
+            #       reading the extra nodes is a waste.
+            #       This will probably matter more with hash serialised keys,
+            #       as we will get InternalNodes with more references.
+            #       The other argument is that unmap() is uncommon, so we don't
+            #       need to optimize it. But map() with a slightly shorter
+            #       value may happen a lot.
+            stream = store.get_record_stream(keys, 'unordered', True)
+            nodes = []
+            # Fully consume the stream, even if we could determine that we
+            # don't need to continue. We requested the bytes, we may as well
+            # use them
+            for record in stream:
+                node = _deserialise(record.get_bytes_as('fulltext'), record.key)
+                self._items[keys[record.key]] = node
+                nodes.append(node)
+            for node in nodes:
+                if isinstance(node, InternalNode):
+                    # We know we won't fit
+                    return self
+                for key, value in node._items.iteritems():
+                    if new_leaf._map_no_split(key, value):
+                        return self
+
+        # We have gone to every child, and everything fits in a single leaf
+        # node, we no longer need this internal node
+        return new_leaf
 
 
 def _deserialise(bytes, key):

=== modified file 'bzrlib/repofmt/pack_repo.py'
--- a/bzrlib/repofmt/pack_repo.py	2008-12-03 23:10:26 +0000
+++ b/bzrlib/repofmt/pack_repo.py	2008-12-04 17:01:09 +0000
@@ -1530,7 +1530,10 @@
             # TODO: Probably needs to know all possible indices for this pack
             # - or maybe list the directory and move all indices matching this
             # name whether we recognize it or not?
-            for suffix in ('.iix', '.six', '.tix', '.rix'):
+            suffixes = ['.iix', '.six', '.tix', '.rix']
+            if self.chk_index is not None:
+                suffixes.append('.cix')
+            for suffix in suffixes:
                 self._index_transport.rename(pack.name + suffix,
                     '../obsolete_packs/' + pack.name + suffix)
 

=== modified file 'bzrlib/tests/test_chk_map.py'
--- a/bzrlib/tests/test_chk_map.py	2008-12-01 04:36:25 +0000
+++ b/bzrlib/tests/test_chk_map.py	2008-12-02 23:44:34 +0000
@@ -70,6 +70,73 @@
         self.assertEqual("chkleaf:\n0\n1\n0\n", self.read_bytes(chk_bytes, root_key))
         return root_key
 
+    def assertMapLayoutEqual(self, map_one, map_two):
+        """Assert that the internal structure is identical between the maps."""
+        map_one._ensure_root()
+        node_one_stack = [map_one._root_node]
+        map_two._ensure_root()
+        node_two_stack = [map_two._root_node]
+        while node_one_stack:
+            node_one = node_one_stack.pop()
+            node_two = node_two_stack.pop()
+            if node_one.__class__ != node_two.__class__:
+                self.assertEqualDiff(map_one._dump_tree(),
+                                     map_two._dump_tree())
+            if isinstance(node_one, InternalNode):
+                # Internal nodes must have identical references
+                self.assertEqual(sorted(node_one._items.keys()),
+                                 sorted(node_two._items.keys()))
+                self.assertEqual(node_one._prefix, node_two._prefix)
+                node_one_stack.extend(node_one._iter_nodes(map_one._store))
+                node_two_stack.extend(node_two._iter_nodes(map_two._store))
+            else:
+                # Leaf nodes must have identical contents
+                self.assertEqual(node_one._items, node_two._items)
+
+    def assertCanonicalForm(self, chkmap):
+        """Assert that the chkmap is in 'canonical' form.
+
+        We do this by adding all of the key value pairs from scratch, both in
+        forward order and reverse order, and assert that the final tree layout
+        is identical.
+        """
+        items = list(chkmap.iteritems())
+        map_forward = chk_map.CHKMap(None, None)
+        map_forward._root_node.set_maximum_size(chkmap._root_node.maximum_size)
+        for key, value in items:
+            map_forward.map(key, value)
+        self.assertMapLayoutEqual(map_forward, chkmap)
+        map_reverse = chk_map.CHKMap(None, None)
+        map_reverse._root_node.set_maximum_size(chkmap._root_node.maximum_size)
+        for key, value in reversed(items):
+            map_reverse.map(key, value)
+        self.assertMapLayoutEqual(map_reverse, chkmap)
+
+    def test_assert_map_layout_equal(self):
+        store = self.get_chk_bytes()
+        map_one = CHKMap(store, None)
+        map_one._root_node.set_maximum_size(20)
+        map_two = CHKMap(store, None)
+        map_two._root_node.set_maximum_size(20)
+        self.assertMapLayoutEqual(map_one, map_two)
+        map_one.map('aaa', 'value')
+        self.assertRaises(AssertionError,
+            self.assertMapLayoutEqual, map_one, map_two)
+        map_two.map('aaa', 'value')
+        self.assertMapLayoutEqual(map_one, map_two)
+        # Split the tree, so we ensure that internal nodes and leaf nodes are
+        # properly checked
+        map_one.map('aab', 'value')
+        self.assertIsInstance(map_one._root_node, InternalNode)
+        self.assertRaises(AssertionError,
+            self.assertMapLayoutEqual, map_one, map_two)
+        map_two.map('aab', 'value')
+        self.assertMapLayoutEqual(map_one, map_two)
+        map_one.map('aac', 'value')
+        self.assertRaises(AssertionError,
+            self.assertMapLayoutEqual, map_one, map_two)
+        self.assertCanonicalForm(map_one)
+
     def test_from_dict_empty(self):
         chk_bytes = self.get_chk_bytes()
         root_key = CHKMap.from_dict(chk_bytes, {})
@@ -120,6 +187,8 @@
                              (None, ('bba',), 'target2'),
                              (None, ('bbb',), 'common')])
         root_key1 = chkmap1._save()
+        self.assertCanonicalForm(chkmap1)
+
         chkmap2 = CHKMap(chk_bytes, None)
         chkmap2._root_node.set_maximum_size(10)
         chkmap2.apply_delta([(None, ('bbb',), 'common'),
@@ -128,6 +197,7 @@
         root_key2 = chkmap2._save()
         self.assertEqualDiff(chkmap1._dump_tree(), chkmap2._dump_tree())
         self.assertEqual(root_key1, root_key2)
+        self.assertCanonicalForm(chkmap2)
 
     def test_stable_splitting(self):
         store = self.get_chk_bytes()
@@ -136,13 +206,15 @@
         chkmap._root_node.set_maximum_size(30)
         chkmap.map(('aaa',), 'v')
         self.assertEqualDiff("'' LeafNode None\n"
-                             "      ('aaa',) 'v'",
+                             "      ('aaa',) 'v'\n",
                              chkmap._dump_tree())
         chkmap.map(('aab',), 'v')
         self.assertEqualDiff("'' LeafNode None\n"
                              "      ('aaa',) 'v'\n"
-                             "      ('aab',) 'v'",
+                             "      ('aab',) 'v'\n",
                              chkmap._dump_tree())
+        self.assertCanonicalForm(chkmap)
+
         # Creates a new internal node, and splits the others into leaves
         chkmap.map(('aac',), 'v')
         self.assertEqualDiff("'' InternalNode None\n"
@@ -151,8 +223,10 @@
                              "  'aab' LeafNode None\n"
                              "      ('aab',) 'v'\n"
                              "  'aac' LeafNode None\n"
-                             "      ('aac',) 'v'",
+                             "      ('aac',) 'v'\n",
                              chkmap._dump_tree())
+        self.assertCanonicalForm(chkmap)
+
         # Splits again, because it can't fit in the current structure
         chkmap.map(('bbb',), 'v')
         self.assertEqualDiff("'' InternalNode None\n"
@@ -164,8 +238,19 @@
                              "    'aac' LeafNode None\n"
                              "      ('aac',) 'v'\n"
                              "  'b' LeafNode None\n"
-                             "      ('bbb',) 'v'",
+                             "      ('bbb',) 'v'\n",
                              chkmap._dump_tree())
+        self.assertCanonicalForm(chkmap)
+
+    def test_map_splits_with_longer_key(self):
+        store = self.get_chk_bytes()
+        chkmap = CHKMap(store, None)
+        # Should fit 1 key per LeafNode
+        chkmap._root_node.set_maximum_size(10)
+        chkmap.map(('aaa',), 'v')
+        chkmap.map(('aaaa',), 'v')
+        self.assertCanonicalForm(chkmap)
+        self.assertIsInstance(chkmap._root_node, InternalNode)
 
     def test_deep_splitting(self):
         store = self.get_chk_bytes()
@@ -176,7 +261,7 @@
         chkmap.map(('aaaaabaa',), 'v')
         self.assertEqualDiff("'' LeafNode None\n"
                              "      ('aaaaaaaa',) 'v'\n"
-                             "      ('aaaaabaa',) 'v'",
+                             "      ('aaaaabaa',) 'v'\n",
                              chkmap._dump_tree())
         chkmap.map(('aaabaaaa',), 'v')
         chkmap.map(('aaababaa',), 'v')
@@ -186,7 +271,7 @@
                              "      ('aaaaabaa',) 'v'\n"
                              "  'aaab' LeafNode None\n"
                              "      ('aaabaaaa',) 'v'\n"
-                             "      ('aaababaa',) 'v'",
+                             "      ('aaababaa',) 'v'\n",
                              chkmap._dump_tree())
         chkmap.map(('aaabacaa',), 'v')
         chkmap.map(('aaabadaa',), 'v')
@@ -202,7 +287,7 @@
                              "    'aaabac' LeafNode None\n"
                              "      ('aaabacaa',) 'v'\n"
                              "    'aaabad' LeafNode None\n"
-                             "      ('aaabadaa',) 'v'",
+                             "      ('aaabadaa',) 'v'\n",
                              chkmap._dump_tree())
         chkmap.map(('aaababba',), 'v')
         chkmap.map(('aaababca',), 'v')
@@ -223,7 +308,7 @@
                              "    'aaabac' LeafNode None\n"
                              "      ('aaabacaa',) 'v'\n"
                              "    'aaabad' LeafNode None\n"
-                             "      ('aaabadaa',) 'v'",
+                             "      ('aaabadaa',) 'v'\n",
                              chkmap._dump_tree())
         # Now we add a node that should fit around an existing InternalNode,
         # but has a slightly different key prefix, which causes a new
@@ -249,8 +334,247 @@
                              "      'aaabac' LeafNode None\n"
                              "      ('aaabacaa',) 'v'\n"
                              "      'aaabad' LeafNode None\n"
-                             "      ('aaabadaa',) 'v'",
-                             chkmap._dump_tree())
+                             "      ('aaabadaa',) 'v'\n",
+                             chkmap._dump_tree())
+
+    def test_map_collapses_if_size_changes(self):
+        store = self.get_chk_bytes()
+        chkmap = CHKMap(store, None)
+        # Should fit 2 keys per LeafNode
+        chkmap._root_node.set_maximum_size(30)
+        chkmap.map(('aaa',), 'v')
+        chkmap.map(('aab',), 'very long value that splits')
+        self.assertEqualDiff("'' InternalNode None\n"
+                             "  'aaa' LeafNode None\n"
+                             "      ('aaa',) 'v'\n"
+                             "  'aab' LeafNode None\n"
+                             "      ('aab',) 'very long value that splits'\n",
+                             chkmap._dump_tree())
+        self.assertCanonicalForm(chkmap)
+        # Now changing the value to something small should cause a rebuild
+        chkmap.map(('aab',), 'v')
+        self.assertEqualDiff("'' LeafNode None\n"
+                             "      ('aaa',) 'v'\n"
+                             "      ('aab',) 'v'\n",
+                             chkmap._dump_tree())
+        self.assertCanonicalForm(chkmap)
+
+    def test_map_double_deep_collapses(self):
+        store = self.get_chk_bytes()
+        chkmap = CHKMap(store, None)
+        # Should fit 3 small keys per LeafNode
+        chkmap._root_node.set_maximum_size(40)
+        chkmap.map(('aaa',), 'v')
+        chkmap.map(('aab',), 'very long value that splits')
+        chkmap.map(('abc',), 'v')
+        self.assertEqualDiff("'' InternalNode None\n"
+                             "  'aa' InternalNode None\n"
+                             "    'aaa' LeafNode None\n"
+                             "      ('aaa',) 'v'\n"
+                             "    'aab' LeafNode None\n"
+                             "      ('aab',) 'very long value that splits'\n"
+                             "  'ab' LeafNode None\n"
+                             "      ('abc',) 'v'\n",
+                             chkmap._dump_tree())
+        chkmap.map(('aab',), 'v')
+        self.assertCanonicalForm(chkmap)
+        self.assertEqualDiff("'' LeafNode None\n"
+                             "      ('aaa',) 'v'\n"
+                             "      ('aab',) 'v'\n"
+                             "      ('abc',) 'v'\n",
+                             chkmap._dump_tree())
+
+    def test_stable_unmap(self):
+        store = self.get_chk_bytes()
+        chkmap = CHKMap(store, None)
+        # Should fit 2 keys per LeafNode
+        chkmap._root_node.set_maximum_size(30)
+        chkmap.map(('aaa',), 'v')
+        chkmap.map(('aab',), 'v')
+        self.assertEqualDiff("'' LeafNode None\n"
+                             "      ('aaa',) 'v'\n"
+                             "      ('aab',) 'v'\n",
+                             chkmap._dump_tree())
+        # Creates a new internal node, and splits the others into leaves
+        chkmap.map(('aac',), 'v')
+        self.assertEqualDiff("'' InternalNode None\n"
+                             "  'aaa' LeafNode None\n"
+                             "      ('aaa',) 'v'\n"
+                             "  'aab' LeafNode None\n"
+                             "      ('aab',) 'v'\n"
+                             "  'aac' LeafNode None\n"
+                             "      ('aac',) 'v'\n",
+                             chkmap._dump_tree())
+        self.assertCanonicalForm(chkmap)
+        # Now lets unmap one of the keys, and assert that we collapse the
+        # structures.
+        chkmap.unmap(('aac',))
+        self.assertEqualDiff("'' LeafNode None\n"
+                             "      ('aaa',) 'v'\n"
+                             "      ('aab',) 'v'\n",
+                             chkmap._dump_tree())
+        self.assertCanonicalForm(chkmap)
+
+    def test_unmap_double_deep(self):
+        store = self.get_chk_bytes()
+        chkmap = CHKMap(store, None)
+        # Should fit 3 keys per LeafNode
+        chkmap._root_node.set_maximum_size(40)
+        chkmap.map(('aaa',), 'v')
+        chkmap.map(('aaab',), 'v')
+        chkmap.map(('aab',), 'very long value')
+        chkmap.map(('abc',), 'v')
+        self.assertEqualDiff("'' InternalNode None\n"
+                             "  'aa' InternalNode None\n"
+                             "    'aaa' LeafNode None\n"
+                             "      ('aaa',) 'v'\n"
+                             "      ('aaab',) 'v'\n"
+                             "    'aab' LeafNode None\n"
+                             "      ('aab',) 'very long value'\n"
+                             "  'ab' LeafNode None\n"
+                             "      ('abc',) 'v'\n",
+                             chkmap._dump_tree())
+        # Removing the 'aab' key should cause everything to collapse back to a
+        # single node
+        chkmap.unmap(('aab',))
+        self.assertEqualDiff("'' LeafNode None\n"
+                             "      ('aaa',) 'v'\n"
+                             "      ('aaab',) 'v'\n"
+                             "      ('abc',) 'v'\n",
+                             chkmap._dump_tree())
+
+    def test_unmap_double_deep_non_empty_leaf(self):
+        store = self.get_chk_bytes()
+        chkmap = CHKMap(store, None)
+        # Should fit 3 keys per LeafNode
+        chkmap._root_node.set_maximum_size(40)
+        chkmap.map(('aaa',), 'v')
+        chkmap.map(('aab',), 'long value')
+        chkmap.map(('aabb',), 'v')
+        chkmap.map(('abc',), 'v')
+        self.assertEqualDiff("'' InternalNode None\n"
+                             "  'aa' InternalNode None\n"
+                             "    'aaa' LeafNode None\n"
+                             "      ('aaa',) 'v'\n"
+                             "    'aab' LeafNode None\n"
+                             "      ('aab',) 'long value'\n"
+                             "      ('aabb',) 'v'\n"
+                             "  'ab' LeafNode None\n"
+                             "      ('abc',) 'v'\n",
+                             chkmap._dump_tree())
+        # Removing the 'aab' key should cause everything to collapse back to a
+        # single node
+        chkmap.unmap(('aab',))
+        self.assertEqualDiff("'' LeafNode None\n"
+                             "      ('aaa',) 'v'\n"
+                             "      ('aabb',) 'v'\n"
+                             "      ('abc',) 'v'\n",
+                             chkmap._dump_tree())
+
+    def test_unmap_with_known_internal_node_doesnt_page(self):
+        store = self.get_chk_bytes()
+        chkmap = CHKMap(store, None)
+        # Should fit 3 keys per LeafNode
+        chkmap._root_node.set_maximum_size(30)
+        chkmap.map(('aaa',), 'v')
+        chkmap.map(('aab',), 'v')
+        chkmap.map(('aac',), 'v')
+        chkmap.map(('abc',), 'v')
+        chkmap.map(('acd',), 'v')
+        self.assertEqualDiff("'' InternalNode None\n"
+                             "  'aa' InternalNode None\n"
+                             "    'aaa' LeafNode None\n"
+                             "      ('aaa',) 'v'\n"
+                             "    'aab' LeafNode None\n"
+                             "      ('aab',) 'v'\n"
+                             "    'aac' LeafNode None\n"
+                             "      ('aac',) 'v'\n"
+                             "  'ab' LeafNode None\n"
+                             "      ('abc',) 'v'\n"
+                             "  'ac' LeafNode None\n"
+                             "      ('acd',) 'v'\n",
+                             chkmap._dump_tree())
+        # Save everything to the map, and start over
+        chkmap = CHKMap(store, chkmap._save())
+        # Mapping an 'aa' key loads the internal node, but should not map the
+        # 'ab' and 'ac' nodes
+        chkmap.map(('aad',), 'v')
+        self.assertIsInstance(chkmap._root_node._items['aa'], InternalNode)
+        self.assertIsInstance(chkmap._root_node._items['ab'], tuple)
+        self.assertIsInstance(chkmap._root_node._items['ac'], tuple)
+        # Unmapping 'acd' can notice that 'aa' is an InternalNode and not have
+        # to map in 'ab'
+        chkmap.unmap(('acd',))
+        self.assertIsInstance(chkmap._root_node._items['aa'], InternalNode)
+        self.assertIsInstance(chkmap._root_node._items['ab'], tuple)
+
+    def test_unmap_without_fitting_doesnt_page_in(self):
+        store = self.get_chk_bytes()
+        chkmap = CHKMap(store, None)
+        # Should fit 2 keys per LeafNode
+        chkmap._root_node.set_maximum_size(20)
+        chkmap.map(('aaa',), 'v')
+        chkmap.map(('aab',), 'v')
+        self.assertEqualDiff("'' InternalNode None\n"
+                             "  'aaa' LeafNode None\n"
+                             "      ('aaa',) 'v'\n"
+                             "  'aab' LeafNode None\n"
+                             "      ('aab',) 'v'\n",
+                             chkmap._dump_tree())
+        # Save everything to the map, and start over
+        chkmap = CHKMap(store, chkmap._save())
+        chkmap.map(('aac',), 'v')
+        chkmap.map(('aad',), 'v')
+        chkmap.map(('aae',), 'v')
+        chkmap.map(('aaf',), 'v')
+        # At this point, the previous nodes should not be paged in, but the
+        # newly added nodes would be
+        self.assertIsInstance(chkmap._root_node._items['aaa'], tuple)
+        self.assertIsInstance(chkmap._root_node._items['aab'], tuple)
+        self.assertIsInstance(chkmap._root_node._items['aac'], LeafNode)
+        self.assertIsInstance(chkmap._root_node._items['aad'], LeafNode)
+        self.assertIsInstance(chkmap._root_node._items['aae'], LeafNode)
+        self.assertIsInstance(chkmap._root_node._items['aaf'], LeafNode)
+        # Now unmapping one of the new nodes will use only the already-paged-in
+        # nodes to determine that we don't need to do more.
+        chkmap.unmap(('aaf',))
+        self.assertIsInstance(chkmap._root_node._items['aaa'], tuple)
+        self.assertIsInstance(chkmap._root_node._items['aab'], tuple)
+        self.assertIsInstance(chkmap._root_node._items['aac'], LeafNode)
+        self.assertIsInstance(chkmap._root_node._items['aad'], LeafNode)
+        self.assertIsInstance(chkmap._root_node._items['aae'], LeafNode)
+
+    def test_unmap_pages_in_if_necessary(self):
+        store = self.get_chk_bytes()
+        chkmap = CHKMap(store, None)
+        # Should fit 2 keys per LeafNode
+        chkmap._root_node.set_maximum_size(20)
+        chkmap.map(('aaa',), 'v')
+        chkmap.map(('aab',), 'v')
+        chkmap.map(('aac',), 'v')
+        self.assertEqualDiff("'' InternalNode None\n"
+                             "  'aaa' LeafNode None\n"
+                             "      ('aaa',) 'v'\n"
+                             "  'aab' LeafNode None\n"
+                             "      ('aab',) 'v'\n"
+                             "  'aac' LeafNode None\n"
+                             "      ('aac',) 'v'\n",
+                             chkmap._dump_tree())
+        # Save everything to the map, and start over
+        chkmap = CHKMap(store, chkmap._save())
+        chkmap.map(('aad',), 'v')
+        # At this point, the previous nodes should not be paged in, but the
+        # newly added node would be
+        self.assertIsInstance(chkmap._root_node._items['aaa'], tuple)
+        self.assertIsInstance(chkmap._root_node._items['aab'], tuple)
+        self.assertIsInstance(chkmap._root_node._items['aac'], tuple)
+        self.assertIsInstance(chkmap._root_node._items['aad'], LeafNode)
+        # Unmapping the new node will check the existing nodes to see if they
+        # would fit, and find out that they do not
+        chkmap.unmap(('aad',))
+        self.assertIsInstance(chkmap._root_node._items['aaa'], LeafNode)
+        self.assertIsInstance(chkmap._root_node._items['aab'], LeafNode)
+        self.assertIsInstance(chkmap._root_node._items['aac'], LeafNode)
 
     def test_iter_changes_empty_ab(self):
         # Asking for changes between an empty dict to a dict with keys returns
@@ -506,7 +830,7 @@
             "      ('aab',) 'value2'",
             "  'b' LeafNode sha1:67f15d1dfa451d388ed08ff17b4f9578ba010d01",
             "      ('bbb',) 'value3'",
-            ]), chkmap._dump_tree())
+            ""]), chkmap._dump_tree())
 
     def test__dump_tree_in_progress(self):
         chkmap = self._get_map({("aaa",): "value1", ("aab",): "value2"},
@@ -523,7 +847,7 @@
             "      ('aab',) 'value2'",
             "  'b' LeafNode None",
             "      ('bbb',) 'value3'",
-            ]), chkmap._dump_tree())
+            ""]), chkmap._dump_tree())
 
 
 class TestLeafNode(TestCaseWithStore):
@@ -859,10 +1183,8 @@
         # children.
         result = node.unmap(chkmap._store, ('k23',))
         # check the pointed-at object within node - k2 should now point at the
-        # k22 leaf (which should not even have been paged in).
-        ptr = node._items['k2']
-        self.assertIsInstance(ptr, tuple)
-        child = _deserialise(self.read_bytes(chkmap._store, ptr), ptr)
+        # k22 leaf (which has been paged in to see if we can collapse the tree)
+        child = node._items['k2']
         self.assertIsInstance(child, LeafNode)
         self.assertEqual(1, len(child))
         self.assertEqual({('k22',): 'bar'},
@@ -1017,7 +1339,7 @@
             "    'aab' LeafNode sha1:10567a3bfcc764fb8d8d9edaa28c0934ada366c5\n"
             "      ('aab',) 'new'\n"
             "  'c' LeafNode sha1:263208de2fce0a8f9db614c1ca39e8f6de8b3802\n"
-            "      ('c',) 'common'",
+            "      ('c',) 'common'\n",
             CHKMap(self.get_chk_bytes(), target)._dump_tree())
         # The key for the internal aa node
         aa_key = ('sha1:2ce01860338a614b93883a5bbeb89920137ac7ef',)



More information about the bazaar-commits mailing list