[MERGE] Better LRUCache implementation
John Arbash Meinel
john at arbash-meinel.com
Sun Mar 22 22:27:52 GMT 2009
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
John Arbash Meinel wrote:
> In some of my memory consumption and time profiling, I've seen LRUCache
> actually being a bit excessive. I've been thinking about rewriting it in
> pyrex, but I realized the biggest win is to just change the data
> structure a bit. Instead of using a couple dictionaries, this changes it
> to use a dictionary mapped to a double linked list (for cheap lookup to
> any object, and cheap removal and inserting at the head of the list).
>
> In my testing, it both significantly improves memory consumption (dicts
> are a bit memory hungry, as is keeping a deque() of accesses). It is
> also considerably faster.
>
> To test this, I did "repo.revision_trees([all_of_bzrtools])" with
> various objects as the _inventory_entry_cache. Here are the results:
>
> $ FIFOCache()
> WorkingSize 46848KB PeakWorking 46848KB
> real 0m4.691s
>
> $ dict()
> WorkingSize 47608KB PeakWorking 47608KB
> real 0m4.640s
>
> $ old LRUCache
> WorkingSize 53676KB PeakWorking 54164KB
> real 0m5.402s
>
> $ new LRUCache
> WorkingSize 46952KB PeakWorking 47452KB
> real 0m5.020s
>
>
I realized that my submission still has some "assert" statement, which
I'll certainly clean out before I submit. I also wanted to point out
another data point. For doing the same "revision_trees(all_revs)" w/
bzr.dev:
$ FIFOCache()
WorkingSize 149236KB PeakWorking 375000KB
real 7m52.688s
$ new LRUCache()
WorkingSize 141972KB PeakWorking 373624KB
real 7m47.646s
So even though we know that LRUCache.__getitem__ is a bit slower than
FIFOCache.__getitem__, it seems the better cache coherency ends up
winning a tiny amount over the whole time. (If you don't remember,
because of how we access inventory items, FIFO tends to completely empty
itself and start over when it fills up, while LRU would continue to have
the most relevant items.)
Anyway, it is nice to see, because before, the LRUCache overhead would
sometimes make it slower than not having a cache at all... (It really
isn't *that* slow to deserialize XML inventories, we just need a way to
not have to do the entire tree every time.)
John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iEYEARECAAYFAknGu2cACgkQJdeBCYSNAAMvNACgnv9NPsC4IyJxzsL1G6wZ39CC
+8gAniUXd9j82mkVFu5BT4CDV8P8uBTN
=NZlF
-----END PGP SIGNATURE-----
More information about the bazaar
mailing list