[storm] Limit cache size by memory usage?
James Henstridge
james at jamesh.id.au
Wed Jul 29 10:38:35 BST 2009
On Mon, Jul 27, 2009 at 6:31 PM, Stuart
Bishop <stuart.bishop at canonical.com> wrote:
> So at the moment we have to choose our cache sizes by educated guess
> and trial and error. Ideally I'd have a cache that tunes itself,
> adapting to my changing data. I was experimenting with adjusting the
> cache size based on memory usage. Unfortunately, it looks like Python
> or maybe the OS is not aggressive enough at freeing up RAM, so my
> first experiment failed (I get spikes of cache size growth followed
> by lengthy lulls sitting at the minimum size).
One thing that might be affecting things here is the pymalloc
allocator. For block sizes that it handles, allocations are made
within arenas. Those arenas can't be freed unless all objects
allocated inside them are freed (and for Python < 2.5, they are never
released).
So checking the overall process data size won't necessarily give you a
good idea of how much of that memory is in use.
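To see this effect concretely, here is a small sketch that watches the process's resident set size around a large allocation. It assumes Linux (it reads /proc/self/statm, so other platforms would need psutil or similar); the function name rss_bytes is made up for illustration. Even after the objects are freed, the RSS usually does not fall back to the pre-allocation baseline, because pymalloc cannot release an arena while any block in it is still live:

```python
import os

def rss_bytes():
    # Resident set size via /proc (Linux-specific assumption; the
    # second field of statm is resident pages).
    with open("/proc/self/statm") as f:
        pages = int(f.read().split()[1])
    return pages * os.sysconf("SC_PAGE_SIZE")

# Allocate a large number of small objects, which pymalloc serves
# from its arenas, then free them all.
blocks = [object() for _ in range(1_000_000)]
before = rss_bytes()
del blocks
after = rss_bytes()
# "after" typically stays well above the pre-allocation baseline:
# partially used arenas cannot be returned to the OS, and on
# Python < 2.5 arenas were never returned at all.
```

So a cache that shrinks when RSS is high may never see the number come back down, which matches the spike-then-lull behaviour Stuart describes.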
> Can anyone think of any alternative approaches? I'm thinking my next
> attempt when I'm bored will be a GenerationalCache that keeps bumping
> up a global cache size when the cache is bumped until the memory limit
> is reached, at which point it is fixed for the life of the program.
> This should work for many applications, although it will of course
> fail for applications that delay loading their large objects until
> later in the run time. I also thought of just growing the cache until
> I detect
> swapping, but when I have multiple scripts running on the same server
> that doesn't work.
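A minimal sketch of the grow-then-freeze scheme Stuart proposes, assuming simple LRU eviction, a doubling growth policy, and an injected rss() callable for the memory check (the class and its internals are hypothetical, not Storm's GenerationalCache API):

```python
from collections import OrderedDict

class AdaptiveCache:
    """Grow the cache limit whenever the cache overflows, until a
    memory ceiling is reached; then freeze the limit for the life
    of the process."""

    def __init__(self, initial_size, memory_limit_bytes, rss_func):
        self._size = initial_size
        self._memory_limit = memory_limit_bytes
        self._rss = rss_func          # callable returning current RSS
        self._frozen = False
        self._items = OrderedDict()   # oldest entries first

    def add(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        while len(self._items) > self._size:
            if not self._frozen:
                if self._rss() < self._memory_limit:
                    self._size *= 2   # bump the global cache size
                    continue
                self._frozen = True   # limit hit: size is now fixed
            self._items.popitem(last=False)  # evict oldest entry
```

As Stuart notes, this only adapts upward: an application that loads its large objects late will freeze the size based on memory conditions it saw early in the run.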
If there were just one cache, then that might work okay. But for many
Storm based applications there will be multiple caches (one per
store). In such a case, you'd want to ensure that one store's cache
didn't expand to fill the memory limit and then leave no room for the
others.
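One way to address that would be a shared budget object that all the per-store caches register with, so the global limit is divided among them rather than claimed first-come-first-served. The sketch below uses a naive even split and hypothetical names (BoundedCache, SharedBudget); Storm's real per-store cache API differs:

```python
from collections import OrderedDict

class BoundedCache:
    """Minimal LRU cache whose size limit can be adjusted externally."""

    def __init__(self):
        self._size = 0
        self._items = OrderedDict()

    def set_size(self, size):
        self._size = size
        while len(self._items) > self._size:
            self._items.popitem(last=False)  # evict oldest

    def add(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        while len(self._items) > self._size:
            self._items.popitem(last=False)


class SharedBudget:
    """Divide one global item budget across all registered caches, so
    no single store's cache can expand to consume the whole limit."""

    def __init__(self, total_items):
        self._total = total_items
        self._caches = []

    def register(self, cache):
        self._caches.append(cache)
        self._rebalance()

    def _rebalance(self):
        share = self._total // len(self._caches)  # naive even split
        for cache in self._caches:
            cache.set_size(share)
```

A smarter policy could weight the split by each cache's hit rate, but even the even split prevents the first store's cache from starving the others.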
James.