[MERGE] Better LRUCache implementation
John Arbash Meinel
john at arbash-meinel.com
Sun Mar 22 19:49:50 GMT 2009
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
In some of my memory consumption and time profiling, I've seen LRUCache
actually being a bit excessive. I've been thinking about rewriting it in
pyrex, but I realized the biggest win is to just change the data
structure a bit. Instead of using a couple dictionaries, this changes it
to use a dictionary mapped to a double linked list (for cheap lookup to
any object, and cheap removal and inserting at the head of the list).
In my testing, it both significantly improves memory consumption (dicts
are a bit memory hungry, as is keeping a deque() of accesses). It is
also considerably faster.
To test this, I did "repo.revision_trees([all_of_bzrtools])" with
various objects as the _inventory_entry_cache. Here are the results:
$ FIFOCache()
WorkingSize 46848KB PeakWorking 46848KB
real 0m4.691s
$ dict()
WorkingSize 47608KB PeakWorking 47608KB
real 0m4.640s
$ old LRUCache
WorkingSize 53676KB PeakWorking 54164KB
real 0m5.402s
$ new LRUCache
WorkingSize 46952KB PeakWorking 47452KB
real 0m5.020s
The two big wins to notice are the total memory size, and that we save
400ms versus the previous implementation. Note that FIFOCache is just as
fast as dict, which is what it was designed for (FIFO is a subclass of
__dict__ to get the fasted __getitem__ time I could). The peak memory
size with a dict is bigger, because we never flush.
So it still costs ~300ms to use the LRUCache over using the FIFOCache,
so there is still room for improvement. If I was writing it in C/Pyrex,
I would combine the dict + linked list into a single data structure,
which would allow allocated all needed linked list items in a big slab,
instead of allocating/deallocating each time we add/remove an item. We
can still do that, but this took me 1-2hrs for a good improvement, and I
don't have to re-implement dict() :).
I'm not 100% sure where the size gains come from. But not maintaining a
large access queue probably helps, as does not having 3 dictionaries.
(key=>value, key=>cleanup, key=>refcount_in_queue).
John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iEYEARECAAYFAknGll4ACgkQJdeBCYSNAAOQ6ACg0LKcRRFfAhpxQemudLIfKrqP
/e8AoMASjj8y3Hp5fLoBuP0TLgr0Kut1
=jSbF
-----END PGP SIGNATURE-----
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: lru_cache_linked_list.patch
Url: https://lists.ubuntu.com/archives/bazaar/attachments/20090322/227bd67e/attachment-0001.diff
More information about the bazaar
mailing list