[storm] Limit cache size by memory usage?

Stuart Bishop stuart.bishop at canonical.com
Wed Aug 5 08:30:12 BST 2009



On Wed, Aug 5, 2009 at 1:27 PM, James Henstridge <james at jamesh.id.au> wrote:
> On Thu, Jul 30, 2009 at 12:05 PM, Stuart Bishop <stuart.bishop at canonical.com> wrote:
>> On Thu, Jul 30, 2009 at 1:29 AM, Gustavo Niemeyer <gustavo at niemeyer.net> wrote:
>>>> Generally, trying to manage memory yourself inside an application can
>>>> work against you. See the ArchitectNotes for Varnish for a general
>>>> overview of why and how:
>>>
>>> Yeah, I'm kind of concerned about it too, since there are so many
>>> details in the way (Python's internal memory buffers, C library
>>> allocation behavior, the application's use of objects, etc).
>>> Promising an auto-tweaking cache which behaves poorly would be worse
>>> than advising developers to tweak their cache size to fit their needs.
>>
>> I agree that auto-tweaking caches that behave poorly are not a good
>> idea :-) I'm fishing for ideas on how to get one that doesn't behave
>> poorly, or at least better than my first experiment. It may be
>> impossible.
>
> Perhaps implementing some kind of profiling cache would be
> worthwhile then: something that would record cache hits and misses,
> plus some utilities to analyse that data.

Hmm... yes. That might be a better approach.
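
Something along those lines could be as simple as wrapping an LRU
dict with hit/miss counters. A rough sketch (deliberately generic,
not Storm's internal Cache API; the class and method names are just
illustrative):

    from collections import OrderedDict

    class ProfilingCache(object):
        """LRU cache that also counts hits and misses for analysis."""

        def __init__(self, size):
            self.size = size
            self.hits = 0
            self.misses = 0
            self._data = OrderedDict()  # key -> object, oldest first

        def get(self, key):
            if key in self._data:
                self.hits += 1
                # Re-insert to mark the entry as most recently used.
                self._data[key] = self._data.pop(key)
                return self._data[key]
            self.misses += 1
            return None

        def add(self, key, obj):
            if key in self._data:
                del self._data[key]
            self._data[key] = obj
            while len(self._data) > self.size:
                self._data.popitem(last=False)  # evict oldest entry

        def set_size(self, size):
            self.size = size
            while len(self._data) > self.size:
                self._data.popitem(last=False)

        def hit_rate(self):
            total = self.hits + self.misses
            return float(self.hits) / total if total else 0.0

The counters (or a full trace of keys) could then be dumped and
analysed offline to pick a sensible size for each cache.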


>> I do feel that if I had a half decent implementation and could tell a
>> program 'Use up to 2GB of RAM', that would be useful for almost all of
>> the code we deploy. Our current mechanism is to make an educated guess
>> as to the cache size and forget about it unless performance or RAM
>> consumption is an issue (multiple bits of code sharing a server means
>> swapping is bad). Even if we went to the trouble of calculating the ideal
>> cache size, soon it would no longer be ideal as our data is constantly
>> changing - the optimal cache size for importing a pofile with small
>> translations is different to the optimal cache size for importing a
>> pofile with lengthy translations because the objects are radically
>> different in size.
>
> Note that a larger cache isn't always going to be better: some
> operations scan over the list of alive objects (e.g. ResultSet.set), so
> keeping more objects alive could slow those operations down.  If you
> end up caching objects that never get used again, then you'd probably
> be better off with a smaller cache.

I wasn't aware of that. That would certainly screw up using an
external cache like memcached.
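
Coming back to the 'use up to 2GB of RAM' idea quoted above: one
rough way to approximate it is to watch the process's resident set
size and shrink every cache together when it crosses the ceiling. A
Linux-only sketch, assuming caches with the set_size() hook from the
ProfilingCache sketch earlier:

    def resident_bytes():
        """Current RSS of this process, read from /proc (Linux)."""
        with open("/proc/self/status") as status:
            for line in status:
                if line.startswith("VmRSS:"):
                    return int(line.split()[1]) * 1024  # kB -> bytes
        return 0

    def enforce_budget(caches, limit_bytes, shrink=0.8):
        """Shrink all caches together when over the memory ceiling."""
        if resident_bytes() > limit_bytes:
            for cache in caches:
                cache.set_size(max(1, int(cache.size * shrink)))

Of course, as Gustavo points out, allocator behaviour means RSS will
not necessarily drop as soon as objects are released, so this can
only ever be a crude control loop.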

> And as I said in the previous email, a moderately complex Storm
> application will likely have many caches, so having each cache make
> decisions independently will likely give poor results (i.e. one cache
> ends up really large and the others really small).

Yes - it needs to be treated as a single big cache. My first attempt
did this by bumping all caches (not just the one that triggered the
event). A profiling approach would be able to profile the individual
caches though, so a Store with a high hit rate gets a larger cache
than one with a terrible hit rate.
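
Roughly: hand each cache a share of one global size budget in
proportion to its recent hit rate. Another sketch, reusing the
hit_rate() and set_size() helpers from the ProfilingCache sketch
above:

    def rebalance(caches, total_size, floor=100):
        """Split one global size budget across caches by hit rate."""
        # Keep a small floor so a cold cache still gets a chance to
        # warm up and improve its hit rate later.
        rates = [max(cache.hit_rate(), 0.01) for cache in caches]
        weight = sum(rates)
        for cache, rate in zip(caches, rates):
            cache.set_size(max(floor, int(total_size * rate / weight)))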


-- 
Stuart Bishop <stuart at stuartbishop.net>
http://www.stuartbishop.net/
