[storm] Limit cache size by memory usage?

Stuart Bishop stuart.bishop at canonical.com
Thu Jul 30 05:05:06 BST 2009


On Thu, Jul 30, 2009 at 1:29 AM, Gustavo Niemeyer <gustavo at niemeyer.net> wrote:
>> Generally, trying to manage memory yourself inside an application can
>> work against you. See the ArchitectNotes for Varnish for a general
>> overview of why and how:
>
> Yeah, I'm kind of concerned about it too, since there are so many
> details in the way (Python's internal memory buffers, C library
> allocation behavior, the application's use of objects, etc).
> Promising an auto-tweaking cache which behaves poorly would be worse
> than advising developers to tweak their cache size to fit their needs.

I agree that auto-tweaking caches that behave poorly are not a good
idea :-) I'm fishing for ideas on how to get one that doesn't behave
poorly, or at least better than my first experiment. It may be
impossible.

I do feel that if I had a half-decent implementation and could tell a
program 'Use up to 2GB of RAM', that would be useful for almost all of
the code we deploy. Our current mechanism is to make an educated guess
as to the cache size and forget about it unless performance or RAM
consumption is an issue (multiple bits of code sharing a server means
swapping is bad). Even if we went to the trouble of calculating the ideal
cache size, soon it would no longer be ideal as our data is constantly
changing - the optimal cache size for importing a pofile with small
translations is different to the optimal cache size for importing a
pofile with lengthy translations because the objects are radically
different in size.
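A 'use up to N bytes' cache might be sketched roughly as below. This is a
hypothetical illustration, not Storm's Cache API; it also shows one reason
auto-sizing is hard in Python, since sys.getsizeof() measures only the
object itself, not the data it references:

```python
import sys
from collections import OrderedDict

class MemoryBudgetCache:
    """Hypothetical LRU cache that evicts once an estimated byte
    budget is exceeded. sys.getsizeof() ignores referenced objects,
    so the estimate is rough at best."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self._items = OrderedDict()   # insertion order = LRU order
        self._size = 0

    def add(self, key, obj):
        if key in self._items:
            self._size -= sys.getsizeof(self._items.pop(key))
        self._items[key] = obj
        self._size += sys.getsizeof(obj)
        # Evict least-recently-used entries until back under budget.
        while self._size > self.max_bytes and self._items:
            _, evicted = self._items.popitem(last=False)
            self._size -= sys.getsizeof(evicted)

    def get(self, key):
        obj = self._items[key]
        self._items.move_to_end(key)  # mark as recently used
        return obj
```

With a budget instead of an item count, a run importing lengthy
translations would simply hold fewer objects than one importing short
ones, without retuning.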

>> For something like Storm though, I suspect having an externally
>> managed cache which has a proven track record would only do you good.
>> For this reason, I would suggest creating/investigating a
>> 'MemcachedCache' if such thing doesn't exist yet.
>
> That's something to be considered indeed, but let's please not mix
> these two conversations together.  The current cache mechanism
> prevents objects from being deallocated, effectively saving the cost
> of instantiation, and that's a very different scenario than preventing
> the database from being hit by caching data in a memory mapped
> database.

Yes - serializing and deserializing objects from memcache may not be
much faster than serializing and deserializing them from the
relational database, and you pay the overhead of storing objects
in memcache even if it turns out you never need the object
again. To a lesser extent, the same applies to having a huge cache and
letting the OS swap things to disk.
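The distinction Gustavo draws can be made concrete: an external cache
still pays a deserialize on every hit, while an alive-object cache hands
back the same live instance. A rough, hypothetical comparison (the dict
stands in for a loaded ORM row; network latency to memcached would only
widen the gap):

```python
import pickle
import timeit

# Stand-in for a loaded ORM row -- hypothetical, not a Storm object.
row = {"id": 1, "msgid": "hello", "translation": "bonjour " * 50}

# A memcached-style cache pays a deserialize on every hit...
blob = pickle.dumps(row, protocol=pickle.HIGHEST_PROTOCOL)
external_hit = timeit.timeit(lambda: pickle.loads(blob), number=10000)

# ...while an in-process alive-object cache just returns a reference.
cache = {1: row}
alive_hit = timeit.timeit(lambda: cache[1], number=10000)

print("pickle round-trip hit: %.4fs" % external_hit)
print("alive-object hit:      %.4fs" % alive_hit)
```

The alive-object hit also skips re-instantiation, which is the cost the
current cache mechanism is saving in the first place.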

-- 
Stuart Bishop <stuart at stuartbishop.net>
http://www.stuartbishop.net/
