Rev 3807: Getting rid of Inv.copy() and changing the LRU into a FIFO drops the in http://bzr.arbash-meinel.com/branches/bzr/brisbane/merge_dev

John Arbash Meinel john at arbash-meinel.com
Tue Dec 9 04:27:12 GMT 2008


At http://bzr.arbash-meinel.com/branches/bzr/brisbane/merge_dev

------------------------------------------------------------
revno: 3807
revision-id: john at arbash-meinel.com-20081209042646-2ncxnd5hxio03et8
parent: john at arbash-meinel.com-20081207182513-00r47hr8fevohotb
parent: john at arbash-meinel.com-20081208183149-0m60m72x4pw378ib
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: merge_dev
timestamp: Mon 2008-12-08 22:26:46 -0600
message:
  Getting rid of Inv.copy() and changing the LRU into a FIFO drops the
  time down to 2m19s (down from 4m originally).
  Also hacked up Inventory.add() to be minimal.
modified:
  NEWS                           NEWS-20050323055033-4e00b5db738777ff
  bzrlib/inventory.py            inventory.py-20050309040759-6648b84ca2005b37
  bzrlib/lru_cache.py            lru_cache.py-20070119165515-tlw203kuwh0id5gv-1
  bzrlib/repofmt/pack_repo.py    pack_repo.py-20070813041115-gjv5ma7ktfqwsjgn-1
  bzrlib/tests/test_lru_cache.py test_lru_cache.py-20070119165535-hph6rk4h9rzy4180-1
  bzrlib/xml5.py                 xml5.py-20080328030717-t9guwinq8hom0ar3-1
  bzrlib/xml8.py                 xml5.py-20050907032657-aac8f960815b66b1
    ------------------------------------------------------------
    revno: 3801.1.3
    revision-id: john at arbash-meinel.com-20081208183149-0m60m72x4pw378ib
    parent: john at arbash-meinel.com-20081207184233-fu65des8q4jhr8ig
    parent: john at arbash-meinel.com-20081208183041-9r88rfoms0y8cr5b
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: debug_hacks
    timestamp: Mon 2008-12-08 12:31:49 -0600
    message:
      Merge the XML entry cache.
    modified:
      NEWS                           NEWS-20050323055033-4e00b5db738777ff
      bzrlib/lru_cache.py            lru_cache.py-20070119165515-tlw203kuwh0id5gv-1
      bzrlib/tests/test_lru_cache.py test_lru_cache.py-20070119165535-hph6rk4h9rzy4180-1
      bzrlib/xml8.py                 xml5.py-20050907032657-aac8f960815b66b1
        ------------------------------------------------------------
        revno: 3735.28.110
        revision-id: john at arbash-meinel.com-20081208183041-9r88rfoms0y8cr5b
        parent: john at arbash-meinel.com-20081208183004-gaxfel9xzzwrc093
        committer: John Arbash Meinel <john at arbash-meinel.com>
        branch nick: xml_cache
        timestamp: Mon 2008-12-08 12:30:41 -0600
        message:
          If we are going to thrash the inventory entry cache, increase its size.
        modified:
          bzrlib/xml8.py                 xml5.py-20050907032657-aac8f960815b66b1
        ------------------------------------------------------------
        revno: 3735.28.109
        revision-id: john at arbash-meinel.com-20081208183004-gaxfel9xzzwrc093
        parent: john at arbash-meinel.com-20081208182757-2rls8q1ri36ub6e9
        parent: john at arbash-meinel.com-20081208182300-u1qmnxafwt2rr5dz
        committer: John Arbash Meinel <john at arbash-meinel.com>
        branch nick: xml_cache
        timestamp: Mon 2008-12-08 12:30:04 -0600
        message:
          Merge the lru cache changes.
        modified:
          NEWS                           NEWS-20050323055033-4e00b5db738777ff
          bzrlib/lru_cache.py            lru_cache.py-20070119165515-tlw203kuwh0id5gv-1
          bzrlib/tests/test_lru_cache.py test_lru_cache.py-20070119165535-hph6rk4h9rzy4180-1
            ------------------------------------------------------------
            revno: 3735.137.1
            revision-id: john at arbash-meinel.com-20081208182300-u1qmnxafwt2rr5dz
            parent: pqm at pqm.ubuntu.com-20081205181554-ofrdnafloc43bxkh
            committer: John Arbash Meinel <john at arbash-meinel.com>
            branch nick: lru_cache
            timestamp: Mon 2008-12-08 12:23:00 -0600
            message:
              Add LRUCache.resize(), and change the init arguments for LRUCache.

              The old name was confusing, and caused LRUSizeCache to re-use
              LRUCache's variables in a confusing way.

              Also, this changes the default cleanup size to be 80% of max_size.
              This should be better, as it gives us a little room when adding
              keys: rather than having to clean up after every add, we can do
              it in batches.
            modified:
              NEWS                           NEWS-20050323055033-4e00b5db738777ff
              bzrlib/lru_cache.py            lru_cache.py-20070119165515-tlw203kuwh0id5gv-1
              bzrlib/tests/test_lru_cache.py test_lru_cache.py-20070119165535-hph6rk4h9rzy4180-1
        ------------------------------------------------------------
        revno: 3735.28.108
        revision-id: john at arbash-meinel.com-20081208182757-2rls8q1ri36ub6e9
        parent: pqm at pqm.ubuntu.com-20081205181554-ofrdnafloc43bxkh
        committer: John Arbash Meinel <john at arbash-meinel.com>
        branch nick: xml_cache
        timestamp: Mon 2008-12-08 12:27:57 -0600
        message:
          Add an InventoryEntry cache to the xml deserializer.
        modified:
          bzrlib/xml8.py                 xml5.py-20050907032657-aac8f960815b66b1
    ------------------------------------------------------------
    revno: 3801.1.2
    revision-id: john at arbash-meinel.com-20081207184233-fu65des8q4jhr8ig
    parent: john at arbash-meinel.com-20081207174338-vx3sm4yzedmk338j
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: debug_hacks
    timestamp: Sun 2008-12-07 12:42:33 -0600
    message:
      Don't execute pack ops twice.
    modified:
      bzrlib/repofmt/pack_repo.py    pack_repo.py-20070813041115-gjv5ma7ktfqwsjgn-1
    ------------------------------------------------------------
    revno: 3801.1.1
    revision-id: john at arbash-meinel.com-20081207174338-vx3sm4yzedmk338j
    parent: john at arbash-meinel.com-20081207174043-xjxckelplsqdey73
    parent: john at arbash-meinel.com-20081203043230-f0riipwg6wkjr6ae
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: merge_dev
    timestamp: Sun 2008-12-07 11:43:38 -0600
    message:
      Merge in the debug_hacks.
    modified:
      bzrlib/repofmt/pack_repo.py    pack_repo.py-20070813041115-gjv5ma7ktfqwsjgn-1
    ------------------------------------------------------------
    revno: 3791.1.16
    revision-id: john at arbash-meinel.com-20081203043230-f0riipwg6wkjr6ae
    parent: john at arbash-meinel.com-20081203041138-ecssp5m0mqxjgzhu
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: chk_map
    timestamp: Tue 2008-12-02 22:32:30 -0600
    message:
      Hack in some code so we can determine how much compression we get.
      
      This just tracks the 'old size' of all the packs that are getting combined versus the
      'new size' of the newly created pack file.
    modified:
      bzrlib/repofmt/pack_repo.py    pack_repo.py-20070813041115-gjv5ma7ktfqwsjgn-1
    ------------------------------------------------------------
    revno: 3791.1.15
    revision-id: john at arbash-meinel.com-20081203041138-ecssp5m0mqxjgzhu
    parent: john at arbash-meinel.com-20081203035643-0x4npc4s8mh8nqlh
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: chk_map
    timestamp: Tue 2008-12-02 22:11:38 -0600
    message:
      Add size information to the mutter when -Dpack is used.
      
      Also fix a bug in -Dpack when the repository doesn't support chk_bytes.
    modified:
      bzrlib/repofmt/pack_repo.py    pack_repo.py-20070813041115-gjv5ma7ktfqwsjgn-1
-------------- next part --------------
=== modified file 'NEWS'
--- a/NEWS	2008-12-07 17:40:43 +0000
+++ b/NEWS	2008-12-08 18:31:49 +0000
@@ -9,6 +9,11 @@
 --------------
 
   CHANGES:
+  
+   * ``LRUCache(after_cleanup_size)`` was renamed to
+     ``after_cleanup_count`` and the old name deprecated. The new name is
+     used for clarity, and to avoid confusion with
+     ``LRUSizeCache(after_cleanup_size)``. (John Arbash Meinel)
 
   NEW FEATURES:
 

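A minimal sketch of the rename described above, based on the lru_cache.py
diff below (the values here are arbitrary): passing the old
``after_cleanup_size`` argument to ``LRUCache`` still works, but issues a
DeprecationWarning and is mapped onto ``after_cleanup_count``.

    from bzrlib import lru_cache

    # Old spelling: warns, but the value is mapped onto after_cleanup_count.
    cache = lru_cache.LRUCache(max_cache=50, after_cleanup_size=25)
    assert cache._after_cleanup_count == 25

    # New spelling: equivalent, and warning-free.
    cache = lru_cache.LRUCache(max_cache=50, after_cleanup_count=25)
    assert cache._after_cleanup_count == 25
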
=== modified file 'bzrlib/inventory.py'
--- a/bzrlib/inventory.py	2008-12-07 17:40:43 +0000
+++ b/bzrlib/inventory.py	2008-12-09 04:26:46 +0000
@@ -1137,6 +1137,16 @@
             parent.children[entry.name] = entry
         return self._add_child(entry)
 
+    def _add_one_entry_no_checks(self, entry, last_parent):
+        assert last_parent is not None
+        if last_parent.file_id == entry.parent_id:
+            parent = last_parent
+        else:
+            parent = self._byid[entry.parent_id]
+        parent.children[entry.name] = entry
+        self._byid[entry.file_id] = entry
+        return parent
+
     def add_path(self, relpath, kind, file_id=None, parent_id=None):
         """Add entry from a path.
 

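A hypothetical caller sketch for the new helper (add_entries_fast is not
part of the patch): serialized inventories list entries roughly grouped by
directory, so threading the returned parent through the loop lets runs of
siblings skip the _byid lookup. As the name says, no duplicate or orphan
checks are performed.

    def add_entries_fast(inv, entries):
        # inv is an Inventory; entries must arrive parents-before-children.
        last_parent = inv.root
        for entry in entries:
            last_parent = inv._add_one_entry_no_checks(entry, last_parent)
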
=== modified file 'bzrlib/lru_cache.py'
--- a/bzrlib/lru_cache.py	2008-10-14 20:19:06 +0000
+++ b/bzrlib/lru_cache.py	2008-12-08 18:23:00 +0000
@@ -1,4 +1,4 @@
-# Copyright (C) 2006 Canonical Ltd
+# Copyright (C) 2006, 2008 Canonical Ltd
 #
 # This program is free software; you can redistribute it and/or modify
 # it under the terms of the GNU General Public License as published by
@@ -17,25 +17,26 @@
 """A simple least-recently-used (LRU) cache."""
 
 from collections import deque
-import gc
+
+from bzrlib import symbol_versioning
 
 
 class LRUCache(object):
     """A class which manages a cache of entries, removing unused ones."""
 
-    def __init__(self, max_cache=100, after_cleanup_size=None):
-        self._max_cache = max_cache
-        if after_cleanup_size is None:
-            self._after_cleanup_size = self._max_cache
-        else:
-            self._after_cleanup_size = min(after_cleanup_size, self._max_cache)
-
-        self._compact_queue_length = 4*self._max_cache
-
+    def __init__(self, max_cache=100, after_cleanup_count=None,
+                 after_cleanup_size=symbol_versioning.DEPRECATED_PARAMETER):
+        if symbol_versioning.deprecated_passed(after_cleanup_size):
+            symbol_versioning.warn('LRUCache.__init__(after_cleanup_size) was'
+                                   ' deprecated in 1.11. Use'
+                                   ' after_cleanup_count instead.',
+                                   DeprecationWarning)
+            after_cleanup_count = after_cleanup_size
         self._cache = {}
         self._cleanup = {}
         self._queue = deque() # Track when things are accessed
         self._refcount = {} # number of entries in self._queue for each key
+        self._update_max_cache(max_cache, after_cleanup_count)
 
     def __contains__(self, key):
         return key in self._cache
@@ -89,11 +90,13 @@
         """Clear the cache until it shrinks to the requested size.
 
         This does not completely wipe the cache, just makes sure it is under
-        the after_cleanup_size.
+        the after_cleanup_count.
         """
         # Make sure the cache is shrunk to the correct size
-        while len(self._cache) > self._after_cleanup_size:
+        while len(self._cache) > self._after_cleanup_count:
             self._remove_lru()
+        # No need to compact the queue at this point, because the code that
+        # calls this would have already triggered it based on queue length
 
     def __setitem__(self, key, value):
         """Add a value to the cache, there will be no cleanup function."""
@@ -150,6 +153,23 @@
         while self._cache:
             self._remove_lru()
 
+    def resize(self, max_cache, after_cleanup_count=None):
+        """Change the number of entries that will be cached."""
+        self._update_max_cache(max_cache,
+                               after_cleanup_count=after_cleanup_count)
+
+    def _update_max_cache(self, max_cache, after_cleanup_count=None):
+        self._max_cache = max_cache
+        if after_cleanup_count is None:
+            self._after_cleanup_count = self._max_cache * 8 / 10
+        else:
+            self._after_cleanup_count = min(after_cleanup_count, self._max_cache)
+
+        self._compact_queue_length = 4*self._max_cache
+        if len(self._queue) > self._compact_queue_length:
+            self._compact_queue()
+        self.cleanup()
+
 
 class LRUSizeCache(LRUCache):
     """An LRUCache that removes things based on the size of the values.
@@ -175,20 +195,15 @@
             The function should take the form "compute_size(value) => integer".
             If not supplied, it defaults to 'len()'
         """
-        # This approximates that texts are > 0.5k in size. It only really
-        # effects when we clean up the queue, so we don't want it to be too
-        # large.
-        LRUCache.__init__(self, max_cache=int(max_size/512))
-        self._max_size = max_size
-        if after_cleanup_size is None:
-            self._after_cleanup_size = self._max_size
-        else:
-            self._after_cleanup_size = min(after_cleanup_size, self._max_size)
-
         self._value_size = 0
         self._compute_size = compute_size
         if compute_size is None:
             self._compute_size = len
+        # This approximates that texts are > 0.5k in size. It only really
+        # matters when we clean up the queue, so we don't want it to be too
+        # large.
+        self._update_max_size(max_size, after_cleanup_size=after_cleanup_size)
+        LRUCache.__init__(self, max_cache=max(int(max_size/512), 1))
 
     def add(self, key, value, cleanup=None):
         """Add a new value to the cache.
@@ -229,3 +244,16 @@
         """Remove an entry, making sure to maintain the invariants."""
         val = LRUCache._remove(self, key)
         self._value_size -= self._compute_size(val)
+
+    def resize(self, max_size, after_cleanup_size=None):
+        """Change the number of bytes that will be cached."""
+        self._update_max_size(max_size, after_cleanup_size=after_cleanup_size)
+        max_cache = max(int(max_size/512), 1)
+        self._update_max_cache(max_cache)
+
+    def _update_max_size(self, max_size, after_cleanup_size=None):
+        self._max_size = max_size
+        if after_cleanup_size is None:
+            self._after_cleanup_size = self._max_size * 8 / 10
+        else:
+            self._after_cleanup_size = min(after_cleanup_size, self._max_size)

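A small sketch of the new defaults, based on the diff above (and assuming,
as the tests below exercise, that an add triggers a cleanup once the cache
exceeds max_cache): with no explicit after_cleanup_count, cleanup shrinks
the cache to 80% of max_cache, so we evict in batches of ~20 per hundred
rather than one entry on every add. resize() applies the same rule to the
new limits and trims immediately.

    from bzrlib import lru_cache

    cache = lru_cache.LRUCache(max_cache=100)
    assert cache._after_cleanup_count == 80   # the new 80% default

    for i in range(150):
        cache[i] = str(i)
    # Exceeding max_cache triggers a cleanup back down to 80 entries,
    # so we never stay above 100.
    assert len(cache) <= 100

    cache.resize(max_cache=10)                # triggers an immediate cleanup
    assert cache._after_cleanup_count == 8
    assert len(cache) <= 10
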
=== modified file 'bzrlib/repofmt/pack_repo.py'
--- a/bzrlib/repofmt/pack_repo.py	2008-12-07 18:25:13 +0000
+++ b/bzrlib/repofmt/pack_repo.py	2008-12-09 04:26:46 +0000
@@ -418,9 +418,15 @@
                 '../packs/' + self.name + '.pack')
         self._state = 'finished'
         if 'pack' in debug.debug_flags:
+            try:
+                size = self.pack_transport.stat(self.name + '.pack').st_size
+                size /= 1024.*1024
+            except errors.TransportNotPossible:
+                size = -1
             # XXX: size might be interesting?
-            mutter('%s: create_pack: pack renamed into place: %s%s->%s%s t+%6.3fs',
-                time.ctime(), self.upload_transport.base, self.random_name,
+            mutter('%s: create_pack: pack renamed into place (%.3fMB): %s%s->%s%s'
+                   ' t+%6.3fs',
+                time.ctime(), size, self.upload_transport.base, self.random_name,
                 self.pack_transport, self.name,
                 time.time() - self.start_time)
 
@@ -815,10 +821,17 @@
                 rev_count = len(self.revision_ids)
             else:
                 rev_count = 'all'
-            mutter('%s: create_pack: creating pack from source packs: '
+            size = 0
+            for a_pack in self.packs:
+                try:
+                    size += a_pack.pack_transport.stat(a_pack.name + '.pack').st_size
+                except errors.TransportNotPossible:
+                    pass
+            size /= 1024.*1024
+            mutter('%s: create_pack: creating pack from source packs (%.3fMB): '
                 '%s%s %s revisions wanted %s t=0',
-                time.ctime(), self._pack_collection._upload_transport.base, new_pack.random_name,
-                plain_pack_list, rev_count)
+                time.ctime(), size, self._pack_collection._upload_transport.base,
+                new_pack.random_name, plain_pack_list, rev_count)
         self._copy_revision_texts()
         self._copy_inventory_texts()
         self._copy_text_texts()
@@ -1376,8 +1389,17 @@
             'containing %d revisions. Packing %d files into %d affecting %d'
             ' revisions', self, total_packs, total_revisions, num_old_packs,
             num_new_packs, num_revs_affected)
-        self._execute_pack_operations(pack_operations)
-        mutter('Auto-packing repository %s completed', self)
+        old_size, new_size = self._execute_pack_operations(pack_operations)
+        if old_size is None:
+            old_size = -1
+        else:
+            old_size /= (1024.0*1024)
+        if new_size is None:
+            new_size = -1
+        else:
+            new_size /= (1024.0*1024)
+        mutter('Auto-packing repository %s completed %.3fMB => %.3fMB',
+               self.transport.base, old_size, new_size)
         return True
 
     def _execute_pack_operations(self, pack_operations, _packer_class=Packer):
@@ -1387,19 +1409,32 @@
         :param _packer_class: The class of packer to use (default: Packer).
         :return: None.
         """
+        new_size = 0
         for revision_count, packs in pack_operations:
             # we may have no-ops from the setup logic
             if len(packs) == 0:
                 continue
-            _packer_class(self, packs, '.autopack').pack()
+            new_pack = _packer_class(self, packs, '.autopack').pack()
+            try:
+                new_size += new_pack.pack_transport.stat(new_pack.name + '.pack').st_size
+            except errors.TransportNotPossible:
+                new_size = None
             for pack in packs:
                 self._remove_pack_from_memory(pack)
         # record the newly available packs and stop advertising the old
         # packs
+        if new_size is None:
+            old_size = None
+        else:
+            old_size = 0
+            for revision_count, packs in pack_operations:
+                for a_pack in packs:
+                    old_size += a_pack.pack_transport.stat(a_pack.name + '.pack').st_size
         self._save_pack_names(clear_obsolete_packs=True)
         # Move the old packs out of the way now they are no longer referenced.
         for revision_count, packs in pack_operations:
             self._obsolete_packs(packs)
+        return old_size, new_size
 
     def lock_names(self):
         """Acquire the mutex around the pack-names index.

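The stat-and-sum pattern above appears in a couple of places; a
hypothetical consolidation (_total_pack_size is not part of the patch)
shows the shape: sum the on-disk size of each pack via transport.stat(),
and fall back to None when the transport (e.g. HTTP) cannot stat files.

    from bzrlib import errors

    def _total_pack_size(packs):
        total = 0
        for a_pack in packs:
            try:
                total += a_pack.pack_transport.stat(
                    a_pack.name + '.pack').st_size
            except errors.TransportNotPossible:
                return None  # size unknown on this transport
        return total
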
=== modified file 'bzrlib/tests/test_lru_cache.py'
--- a/bzrlib/tests/test_lru_cache.py	2008-10-14 20:19:06 +0000
+++ b/bzrlib/tests/test_lru_cache.py	2008-12-08 18:23:00 +0000
@@ -1,4 +1,4 @@
-# Copyright (C) 2006 Canonical Ltd
+# Copyright (C) 2006, 2008 Canonical Ltd
 #
 # This program is free software; you can redistribute it and/or modify
 # it under the terms of the GNU General Public License as published by
@@ -38,7 +38,7 @@
 
     def test_overflow(self):
         """Adding extra entries will pop out old ones."""
-        cache = lru_cache.LRUCache(max_cache=1)
+        cache = lru_cache.LRUCache(max_cache=1, after_cleanup_count=1)
 
         cache['foo'] = 'bar'
         # With a max cache of 1, adding 'baz' should pop out 'foo'
@@ -113,7 +113,7 @@
         self.assertEqual([(2, 20), (2, 25)], cleanup_called)
 
     def test_len(self):
-        cache = lru_cache.LRUCache(max_cache=10)
+        cache = lru_cache.LRUCache(max_cache=10, after_cleanup_count=10)
 
         cache[1] = 10
         cache[2] = 20
@@ -140,8 +140,8 @@
         # We hit the max
         self.assertEqual(10, len(cache))
 
-    def test_cleanup_shrinks_to_after_clean_size(self):
-        cache = lru_cache.LRUCache(max_cache=5, after_cleanup_size=3)
+    def test_cleanup_shrinks_to_after_clean_count(self):
+        cache = lru_cache.LRUCache(max_cache=5, after_cleanup_count=3)
 
         cache.add(1, 10)
         cache.add(2, 20)
@@ -156,15 +156,16 @@
         self.assertEqual(3, len(cache))
 
     def test_after_cleanup_larger_than_max(self):
-        cache = lru_cache.LRUCache(max_cache=5, after_cleanup_size=10)
-        self.assertEqual(5, cache._after_cleanup_size)
+        cache = lru_cache.LRUCache(max_cache=5, after_cleanup_count=10)
+        self.assertEqual(5, cache._after_cleanup_count)
 
     def test_after_cleanup_none(self):
-        cache = lru_cache.LRUCache(max_cache=5, after_cleanup_size=None)
-        self.assertEqual(5, cache._after_cleanup_size)
+        cache = lru_cache.LRUCache(max_cache=5, after_cleanup_count=None)
+        # By default _after_cleanup_count is 80% of max_cache
+        self.assertEqual(4, cache._after_cleanup_count)
 
     def test_cleanup(self):
-        cache = lru_cache.LRUCache(max_cache=5, after_cleanup_size=2)
+        cache = lru_cache.LRUCache(max_cache=5, after_cleanup_count=2)
 
         # Add these in order
         cache.add(1, 10)
@@ -214,7 +215,7 @@
         self.assertIs(obj, cache.get(3, obj))
 
     def test_keys(self):
-        cache = lru_cache.LRUCache(max_cache=5)
+        cache = lru_cache.LRUCache(max_cache=5, after_cleanup_count=5)
 
         cache[1] = 2
         cache[2] = 3
@@ -225,6 +226,52 @@
         cache[6] = 7
         self.assertEqual([2, 3, 4, 5, 6], sorted(cache.keys()))
 
+    def test_after_cleanup_size_deprecated(self):
+        obj = self.callDeprecated([
+            'LRUCache.__init__(after_cleanup_size) was deprecated in 1.11.'
+            ' Use after_cleanup_count instead.'],
+            lru_cache.LRUCache, 50, after_cleanup_size=25)
+        self.assertEqual(obj._after_cleanup_count, 25)
+
+    def test_resize_smaller(self):
+        cache = lru_cache.LRUCache(max_cache=5, after_cleanup_count=4)
+        cache[1] = 2
+        cache[2] = 3
+        cache[3] = 4
+        cache[4] = 5
+        cache[5] = 6
+        self.assertEqual([1, 2, 3, 4, 5], sorted(cache.keys()))
+        cache[6] = 7
+        self.assertEqual([3, 4, 5, 6], sorted(cache.keys()))
+        # Now resize to something smaller, which triggers a cleanup
+        cache.resize(max_cache=3, after_cleanup_count=2)
+        self.assertEqual([5, 6], sorted(cache.keys()))
+        # Adding something will use the new size
+        cache[7] = 8
+        self.assertEqual([5, 6, 7], sorted(cache.keys()))
+        cache[8] = 9
+        self.assertEqual([7, 8], sorted(cache.keys()))
+
+    def test_resize_larger(self):
+        cache = lru_cache.LRUCache(max_cache=5, after_cleanup_count=4)
+        cache[1] = 2
+        cache[2] = 3
+        cache[3] = 4
+        cache[4] = 5
+        cache[5] = 6
+        self.assertEqual([1, 2, 3, 4, 5], sorted(cache.keys()))
+        cache[6] = 7
+        self.assertEqual([3, 4, 5, 6], sorted(cache.keys()))
+        cache.resize(max_cache=8, after_cleanup_count=6)
+        self.assertEqual([3, 4, 5, 6], sorted(cache.keys()))
+        cache[7] = 8
+        cache[8] = 9
+        cache[9] = 10
+        cache[10] = 11
+        self.assertEqual([3, 4, 5, 6, 7, 8, 9, 10], sorted(cache.keys()))
+        cache[11] = 12 # triggers cleanup back to new after_cleanup_count
+        self.assertEqual([6, 7, 8, 9, 10, 11], sorted(cache.keys()))
+
 
 class TestLRUSizeCache(tests.TestCase):
 
@@ -232,7 +279,7 @@
         cache = lru_cache.LRUSizeCache()
         self.assertEqual(2048, cache._max_cache)
         self.assertEqual(4*2048, cache._compact_queue_length)
-        self.assertEqual(cache._max_size, cache._after_cleanup_size)
+        self.assertEqual(int(cache._max_size*0.8), cache._after_cleanup_size)
         self.assertEqual(0, cache._value_size)
 
     def test_add_tracks_size(self):
@@ -332,3 +379,37 @@
         cache[2] = 'b'
         cache[3] = 'cdef'
         self.assertEqual([1, 2, 3], sorted(cache.keys()))
+
+    def test_resize_smaller(self):
+        cache = lru_cache.LRUSizeCache(max_size=10, after_cleanup_size=9)
+        cache[1] = 'abc'
+        cache[2] = 'def'
+        cache[3] = 'ghi'
+        cache[4] = 'jkl'
+        # Triggers a cleanup
+        self.assertEqual([2, 3, 4], sorted(cache.keys()))
+        # Resize should also cleanup again
+        cache.resize(max_size=6, after_cleanup_size=4)
+        self.assertEqual([4], sorted(cache.keys()))
+        # Adding should use the new max size
+        cache[5] = 'mno'
+        self.assertEqual([4, 5], sorted(cache.keys()))
+        cache[6] = 'pqr'
+        self.assertEqual([6], sorted(cache.keys()))
+
+    def test_resize_larger(self):
+        cache = lru_cache.LRUSizeCache(max_size=10, after_cleanup_size=9)
+        cache[1] = 'abc'
+        cache[2] = 'def'
+        cache[3] = 'ghi'
+        cache[4] = 'jkl'
+        # Triggers a cleanup
+        self.assertEqual([2, 3, 4], sorted(cache.keys()))
+        cache.resize(max_size=15, after_cleanup_size=12)
+        self.assertEqual([2, 3, 4], sorted(cache.keys()))
+        cache[5] = 'mno'
+        cache[6] = 'pqr'
+        self.assertEqual([2, 3, 4, 5, 6], sorted(cache.keys()))
+        cache[7] = 'stu'
+        self.assertEqual([4, 5, 6, 7], sorted(cache.keys()))
+

=== modified file 'bzrlib/xml5.py'
--- a/bzrlib/xml5.py	2008-04-24 07:22:53 +0000
+++ b/bzrlib/xml5.py	2008-12-09 04:26:46 +0000
@@ -44,13 +44,28 @@
         if data_revision_id is not None:
             revision_id = cache_utf8.encode(data_revision_id)
         inv = inventory.Inventory(root_id, revision_id=revision_id)
+        last = (inv.root.file_id, inv.root, inv.root.children)
+        byid = inv._byid
+        unpack_entry = self._unpack_entry
         for e in elt:
-            ie = self._unpack_entry(e)
-            if ie.parent_id is None:
-                ie.parent_id = root_id
-            inv.add(ie)
+            ie = unpack_entry(e)
+            parent_id = ie.parent_id
+            if parent_id is None:
+                ie.parent_id = parent_id = root_id
+            if last[0] == parent_id:
+                next = last
+            else:
+                parent = byid[parent_id]
+                next = (parent_id, parent, parent.children)
+            next[2][ie.name] = ie
+            byid[ie.file_id] = ie
+            last = next
         if revision_id is not None:
             inv.root.revision = revision_id
+        if len(inv) > xml8._entry_cache._max_cache:
+            new_len = len(inv) * 1.2
+            trace.note('Resizing inventory cache to %s', new_len)
+            xml8._entry_cache.resize(new_len)
         return inv
 
     def _check_revisions(self, inv):

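A condensed, standalone illustration of the 'last parent' trick in the
loop above (attach_entries is hypothetical): entries in a serialized
inventory are mostly grouped by directory, so remembering the previous
parent avoids a _byid lookup for runs of siblings.

    def attach_entries(byid, root, entries):
        last = (root.file_id, root.children)
        for ie in entries:
            if last[0] != ie.parent_id:
                parent = byid[ie.parent_id]
                last = (ie.parent_id, parent.children)
            last[1][ie.name] = ie       # hook under the parent's children
            byid[ie.file_id] = ie       # and index by file_id
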
=== modified file 'bzrlib/xml8.py'
--- a/bzrlib/xml8.py	2008-04-24 07:22:53 +0000
+++ b/bzrlib/xml8.py	2008-12-09 04:26:46 +0000
@@ -21,7 +21,9 @@
     cache_utf8,
     errors,
     inventory,
+    lru_cache,
     revision as _mod_revision,
+    trace,
     )
 from bzrlib.xml_serializer import SubElement, Element, Serializer
 from bzrlib.inventory import ROOT_ID, Inventory, InventoryEntry
@@ -38,6 +40,8 @@
     "<":"&lt;",
     ">":"&gt;",
     }
+# A cache of InventoryEntry objects
+_entry_cache = lru_cache.LRUCache(10*1024)
 
 
 def _ensure_utf8_re():
@@ -352,44 +356,65 @@
         for e in elt:
             ie = self._unpack_entry(e)
             inv.add(ie)
+        if len(inv) > _entry_cache._max_cache:
+            new_len = len(inv) * 1.2
+            trace.note('Resizing inventory cache to %s', new_len)
+            _entry_cache.resize(new_len)
         return inv
 
-    def _unpack_entry(self, elt):
+    def _unpack_entry(self, elt, _entry_cache=_entry_cache,
+                      _raw_cache=_entry_cache._cache):
+        elt_get = elt.get
+        file_id = elt_get('file_id')
+        revision = elt_get('revision')
+        # Check and see if we have already unpacked this exact entry
+        key = (file_id, revision)
+        try:
+            # Using the raw cache (basically FIFO instead of LRU) saves 30s
+            raw_ie = _raw_cache[key]
+        except KeyError:
+            pass
+        else:
+            # calling .copy() only on directories saves 15s
+            if raw_ie.kind == 'directory':
+                return raw_ie.copy()
+            return raw_ie
+
         kind = elt.tag
         if not InventoryEntry.versionable_kind(kind):
             raise AssertionError('unsupported entry kind %s' % kind)
 
         get_cached = _get_utf8_or_ascii
 
-        parent_id = elt.get('parent_id')
+        file_id = get_cached(file_id)
+        if revision is not None:
+            revision = get_cached(revision)
+        parent_id = elt_get('parent_id')
         if parent_id is not None:
             parent_id = get_cached(parent_id)
-        file_id = get_cached(elt.get('file_id'))
 
         if kind == 'directory':
             ie = inventory.InventoryDirectory(file_id,
-                                              elt.get('name'),
+                                              elt_get('name'),
                                               parent_id)
         elif kind == 'file':
             ie = inventory.InventoryFile(file_id,
-                                         elt.get('name'),
+                                         elt_get('name'),
                                          parent_id)
-            ie.text_sha1 = elt.get('text_sha1')
-            if elt.get('executable') == 'yes':
+            ie.text_sha1 = elt_get('text_sha1')
+            if elt_get('executable') == 'yes':
                 ie.executable = True
-            v = elt.get('text_size')
+            v = elt_get('text_size')
             ie.text_size = v and int(v)
         elif kind == 'symlink':
             ie = inventory.InventoryLink(file_id,
-                                         elt.get('name'),
+                                         elt_get('name'),
                                          parent_id)
-            ie.symlink_target = elt.get('symlink_target')
+            ie.symlink_target = elt_get('symlink_target')
         else:
             raise errors.UnsupportedInventoryKind(kind)
-        revision = elt.get('revision')
-        if revision is not None:
-            revision = get_cached(revision)
         ie.revision = revision
+        _entry_cache[key] = ie
 
         return ie
 
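Why reading _entry_cache._cache directly behaves like a FIFO: a lookup
through LRUCache.__getitem__ records the access, which is what keeps
recently-used keys alive (LRU). Going straight to the backing dict skips
that bookkeeping, so eviction order degrades to insertion order, but each
hit costs a single dict lookup, which is where the ~30s saving comes from.
A sketch of the trick, using the internals shown above:

    from bzrlib import lru_cache

    entry_cache = lru_cache.LRUCache(10*1024)
    raw_cache = entry_cache._cache                # the plain dict behind it

    entry_cache[('file-id', 'rev-id')] = 'entry'  # writes go via the cache
    hit = raw_cache.get(('file-id', 'rev-id'))    # reads bypass the LRU queue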
