[MERGE] Better LRUCache implementation

John Arbash Meinel john at arbash-meinel.com
Tue Mar 24 14:28:41 GMT 2009


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Ian Clatworthy wrote:
> John Arbash Meinel wrote:
> 
>> I realized that my submission still has some "assert" statement, which
>> I'll certainly clean out before I submit.
> 
> Please. There's also an "import pdb; ..." in _remove_node which needs to go.
> I'd be tempted to remove the asserts altogether in _record_access as that
> code path ought to be hit often, but I'll let you make the call on that.

Yeah, I did.

> 
> bb:tweak
> 

...

> 
>> -        # This approximates that texts are > 0.5k in size. It only really
>> -        # effects when we clean up the queue, so we don't want it to be too
>> -        # large.
>>          self._update_max_size(max_size, after_cleanup_size=after_cleanup_size)
>>          LRUCache.__init__(self, max_cache=max(int(max_size/512), 1))
> 
> This int(max_size/512) magic appears twice in the code and it's a complete
> mystery to me. I gather that comment was trying to explain it in one of
> those places. For the sake of code clarity, I'd appreciate it if that magic
> was moved into a method that explained the basis for the heuristic.
> 

So it actually doesn't matter anymore, because it isn't used. Originally
 we had a 'deque()' that tracked the number of accesses, which would
then be 'flushed' periodically. (based approximately on max_size*4). I
think I can actually set it to 0 now, I'll check on that.

>>          value_len = self._compute_size(value)
>>          if value_len >= self._after_cleanup_size:
>> +            if node is not None:
>> +                # The new value is too large, we won't be replacing the value,
>> +                # just removing the node completely
>> +                self._remove_node(node)
>>              return
> 
> Hmm. I was uncomfortable about this at first but I think your logic here
> is sound. I wonder if it's worthy of a mutter() so we get a clue that the
> cache size needs consideration/tuning?

I went ahead and added it. I also realized that we need to call
cleanup() if it exists at this point. (The use case for cleanup is stuff
like Branch.unlock(), and I think it is best to assume that if you have
called add() the cleanup will always fire.)

> 
>> -        self.assertEqual([1, 2, 3, 4, 5, 2, 5, 3, 2], list(cache._queue))
>> -        self.assertEqual({1:1, 2:3, 3:2, 4:1, 5:2}, cache._refcount)
>> -
>> -        # Compacting should save the last position
>> -        cache._compact_queue()
>> -        self.assertEqual([1, 4, 5, 3, 2], list(cache._queue))
>> -        self.assertEqual({1:1, 2:1, 3:1, 4:1, 5:1}, cache._refcount)
>> +        self.assertEqual([2, 3, 5, 4, 1], [n.key for n in cache._walk_lru()])
>>  
> 
> I'm happy with your changes to the tests but test_compact_preserves_last_access_order()
> needs renaming if the test is no longer testing that. :-)
> 
> Otherwise, well done on this.
> 
> Ian C.
> 

Thanks,
John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAknI7hkACgkQJdeBCYSNAAP5OACfVtt43jCm8pYACmZT/nnjZ8Rg
72MAnjycVJYjoKsPJGv7IurSWK39brf7
=etgc
-----END PGP SIGNATURE-----



More information about the bazaar mailing list