[MERGE] Better LRUCache implementation

Ian Clatworthy ian.clatworthy at internode.on.net
Tue Mar 24 05:23:39 GMT 2009


John Arbash Meinel wrote:

> I realized that my submission still has some "assert" statement, which
> I'll certainly clean out before I submit.

Please. There's also an "import pdb; ..." in _remove_node that needs to go.
I'd be tempted to remove the asserts in _record_access altogether, since
that code path ought to be hit often, but I'll let you make the call on that.

bb:tweak

Some other tweaks ...

>      def add(self, key, value, cleanup=None):
>          """Add a new value to the cache.
>  
> @@ -61,20 +137,29 @@
>                          'value' sohuld be cleaned up.
>          """

The docstrings for the add methods need some tweaks:

1. "..., call cleanup, passing it the key and value ..."
2. s/sohuld/should/
3. It refers to the "queue" where "cache" would be better.

The docstring for LRUSizeCache() also needs a tweak: it says that
the values must support len() when in fact the compute_size parameter
can be used instead.

> -        # This approximates that texts are > 0.5k in size. It only really
> -        # effects when we clean up the queue, so we don't want it to be too
> -        # large.
>          self._update_max_size(max_size, after_cleanup_size=after_cleanup_size)
>          LRUCache.__init__(self, max_cache=max(int(max_size/512), 1))

This int(max_size/512) magic appears twice in the code and it's a complete
mystery to me. I gather the deleted comment was trying to explain it in one
of those places. For the sake of code clarity, I'd appreciate it if that
magic were moved into a method that explains the basis for the heuristic.

>          value_len = self._compute_size(value)
>          if value_len >= self._after_cleanup_size:
> +            if node is not None:
> +                # The new value is too large, we won't be replacing the value,
> +                # just removing the node completely
> +                self._remove_node(node)
>              return

Hmm. I was uncomfortable about this at first but I think your logic here
is sound. I wonder if it's worth a mutter() so we get a clue that the
cache size needs consideration/tuning?

> -        self.assertEqual([1, 2, 3, 4, 5, 2, 5, 3, 2], list(cache._queue))
> -        self.assertEqual({1:1, 2:3, 3:2, 4:1, 5:2}, cache._refcount)
> -
> -        # Compacting should save the last position
> -        cache._compact_queue()
> -        self.assertEqual([1, 4, 5, 3, 2], list(cache._queue))
> -        self.assertEqual({1:1, 2:1, 3:1, 4:1, 5:1}, cache._refcount)
> +        self.assertEqual([2, 3, 5, 4, 1], [n.key for n in cache._walk_lru()])
>  

I'm happy with your changes to the tests but test_compact_preserves_last_access_order()
needs renaming if the test is no longer testing that. :-)

Otherwise, well done on this.

Ian C.



More information about the bazaar mailing list