[MERGE] Better LRUCache implementation

John Arbash Meinel john at arbash-meinel.com
Sun Mar 22 22:27:52 GMT 2009


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

John Arbash Meinel wrote:
> In some of my memory consumption and time profiling, I've seen LRUCache
> actually being a bit excessive. I've been thinking about rewriting it in
> pyrex, but I realized the biggest win is to just change the data
> structure a bit. Instead of using a couple dictionaries, this changes it
> to use a dictionary mapped to a double linked list (for cheap lookup to
> any object, and cheap removal and inserting at the head of the list).
> 
> In my testing, it both significantly improves memory consumption (dicts
> are a bit memory hungry, as is keeping a deque() of accesses). It is
> also considerably faster.
> 
> To test this, I did "repo.revision_trees([all_of_bzrtools])" with
> various objects as the _inventory_entry_cache. Here are the results:
> 
> $ FIFOCache()
> WorkingSize   46848KB   PeakWorking   46848KB
> real    0m4.691s
> 
> $ dict()
> WorkingSize   47608KB   PeakWorking   47608KB
> real    0m4.640s
> 
> $ old LRUCache
> WorkingSize   53676KB   PeakWorking   54164KB
> real    0m5.402s
> 
> $ new LRUCache
> WorkingSize   46952KB   PeakWorking   47452KB
> real    0m5.020s
> 
> 

I realized that my submission still has some "assert" statement, which
I'll certainly clean out before I submit. I also wanted to point out
another data point. For doing the same "revision_trees(all_revs)" w/
bzr.dev:

$ FIFOCache()
WorkingSize  149236KB   PeakWorking  375000KB
real    7m52.688s

$ new LRUCache()
WorkingSize  141972KB   PeakWorking  373624KB
real    7m47.646s

So even though we know that LRUCache.__getitem__ is a bit slower than
FIFOCache.__getitem__, it seems the better cache coherency ends up
winning a tiny amount over the whole time. (If you don't remember,
because of how we access inventory items, FIFO tends to completely empty
itself and start over when it fills up, while LRU would continue to have
the most relevant items.)

Anyway, it is nice to see, because before, the LRUCache overhead would
sometimes make it slower than not having a cache at all... (It really
isn't *that* slow to deserialize XML inventories, we just need a way to
not have to do the entire tree every time.)

John
=:->

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAknGu2cACgkQJdeBCYSNAAMvNACgnv9NPsC4IyJxzsL1G6wZ39CC
+8gAniUXd9j82mkVFu5BT4CDV8P8uBTN
=NZlF
-----END PGP SIGNATURE-----



More information about the bazaar mailing list