[MERGE] Better LRUCache implementation

John Arbash Meinel john at arbash-meinel.com
Sun Mar 22 19:49:50 GMT 2009


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

In some of my memory consumption and time profiling, I've seen LRUCache
actually being a bit excessive. I've been thinking about rewriting it in
pyrex, but I realized the biggest win is to just change the data
structure a bit. Instead of using a couple dictionaries, this changes it
to use a dictionary mapped to a double linked list (for cheap lookup to
any object, and cheap removal and inserting at the head of the list).

In my testing, it both significantly improves memory consumption (dicts
are a bit memory hungry, as is keeping a deque() of accesses). It is
also considerably faster.

To test this, I did "repo.revision_trees([all_of_bzrtools])" with
various objects as the _inventory_entry_cache. Here are the results:

$ FIFOCache()
WorkingSize   46848KB   PeakWorking   46848KB
real    0m4.691s

$ dict()
WorkingSize   47608KB   PeakWorking   47608KB
real    0m4.640s

$ old LRUCache
WorkingSize   53676KB   PeakWorking   54164KB
real    0m5.402s

$ new LRUCache
WorkingSize   46952KB   PeakWorking   47452KB
real    0m5.020s


The two big wins to notice are the total memory size, and that we save
400ms versus the previous implementation. Note that FIFOCache is just as
fast as dict, which is what it was designed for (FIFO is a subclass of
__dict__ to get the fasted __getitem__ time I could). The peak memory
size with a dict is bigger, because we never flush.

So it still costs ~300ms to use the LRUCache over using the FIFOCache,
so there is still room for improvement. If I was writing it in C/Pyrex,
I would combine the dict + linked list into a single data structure,
which would allow allocated all needed linked list items in a big slab,
instead of allocating/deallocating each time we add/remove an item. We
can still do that, but this took me 1-2hrs for a good improvement, and I
don't have to re-implement dict() :).

I'm not 100% sure where the size gains come from. But not maintaining a
large access queue probably helps, as does not having 3 dictionaries.
(key=>value, key=>cleanup, key=>refcount_in_queue).

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAknGll4ACgkQJdeBCYSNAAOQ6ACg0LKcRRFfAhpxQemudLIfKrqP
/e8AoMASjj8y3Hp5fLoBuP0TLgr0Kut1
=jSbF
-----END PGP SIGNATURE-----
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: lru_cache_linked_list.patch
Url: https://lists.ubuntu.com/archives/bazaar/attachments/20090322/227bd67e/attachment-0001.diff 


More information about the bazaar mailing list