[storm] Performance Problems using List and Pickle properties

Dan Halbert halbert at halwitz.org
Tue Oct 27 15:09:17 GMT 2009


I'd like to follow up on a thread earlier this month between Jürgen Kartnaller and Thomas Hervé about a performance problem with MutableValueVariables.
See https://lists.ubuntu.com/archives/storm/2009-October/001173.html .

I tracked down a very similar problem yesterday, and then discovered that thread today. However, Thomas' patch (http://bazaar.launchpad.net/~therve/storm/mutable-variables-flush-leak/revision/330) didn't work for me. My similar test program is below.

In my test, my instrumented get_state() is called an increasing number of times with each successive query. I think this is because previously fetched Storm objects that have a List() property remain in the cache. I see this both with and without the patch (applied to 0.15). I never saw an "object-deleted" event go by, even after adding a "del" as indicated below.
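
The suspected pattern can be sketched with a toy model. This is not Storm's implementation: ToyStore and ToyListVariable are hypothetical names, and the model simply assumes that each query is preceded by a flush and that every fetched variable stays strongly referenced by a cache. Under those assumptions the number of get_state() calls per query grows by one each time.

```python
# Toy model only: ToyStore and ToyListVariable are hypothetical names,
# not Storm classes.

class ToyListVariable(object):
    calls = 0  # total get_state() calls across all instances

    def __init__(self, value):
        self._value = value

    def get_state(self):
        ToyListVariable.calls += 1
        return list(self._value)


class ToyStore(object):
    def __init__(self):
        # Assumption: strong references keep fetched variables alive.
        self._cache = []

    def flush(self):
        # A list can change in place without the store being told,
        # so every cached variable is checked on every flush.
        for variable in self._cache:
            variable.get_state()

    def find_one(self, i):
        self.flush()  # assumption: a flush precedes each query
        variable = ToyListVariable([i])
        self._cache.append(variable)
        return variable


store = ToyStore()
calls_per_query = []
for i in range(5):
    before = ToyListVariable.calls
    store.find_one(i)  # the result is discarded, yet it stays in the cache
    calls_per_query.append(ToyListVariable.calls - before)

print(calls_per_query)  # [0, 1, 2, 3, 4]
```

In this model, dropping the caller's reference changes nothing, because the cache still holds the variable. Over N queries the total number of checks is N*(N-1)/2.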

Should I be doing something else, or is this a slightly different case of the problem?

My thought after studying this (but before seeing the emails) is that I might like Storm to provide Tuple and TupleVariable classes in addition to List and ListVariable. Since the Tuple would be immutable, it would not have to be checked for changes when a flush() occurs.
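
A rough sketch of that idea follows. The class names are hypothetical and none of this is Storm API; it only illustrates why a list value needs a snapshot comparison at flush time, while a tuple value can only change through an explicit assignment that records itself.

```python
# Hypothetical sketch; these are not Storm classes.

class ToyMutableVariable(object):
    """Holds a list, so changes must be found by comparing snapshots."""

    def __init__(self, value):
        self._value = value
        self._checkpoint = list(value)  # snapshot taken at load time

    def needs_flush_check(self):
        return True  # must be compared on every flush

    def has_changed(self):
        return self._value != self._checkpoint


class ToyTupleVariable(object):
    """Holds a tuple; the only way to change it is to assign a new one."""

    def __init__(self, value):
        self._value = tuple(value)
        self._dirty = False

    def set(self, value):
        self._value = tuple(value)
        self._dirty = True  # the assignment itself records the change

    def needs_flush_check(self):
        return False  # nothing can change behind the variable's back

    def has_changed(self):
        return self._dirty


ids = [1]
mutable = ToyMutableVariable(ids)
ids.append(2)  # in-place change, invisible without a comparison

untouched = ToyTupleVariable((1,))
reassigned = ToyTupleVariable((1,))
reassigned.set((1, 2))

print(mutable.has_changed())     # True, but only found by comparing
print(untouched.has_changed())   # False, with no comparison needed
print(reassigned.has_changed())  # True, recorded at assignment time
```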

Thanks,
Dan

#--------------------------------------------------------------------

from storm.locals import *
from storm.variables import ListVariable

class T1(Storm):
    __storm_table__ = 't1'
    id = Int(primary=True)
    id_in_array = List(default_factory=list)

SIZE = 20

def fill(store):
    store.execute("create temporary table t1 (id INTEGER PRIMARY KEY, id_in_array INTEGER[])")
    for i in xrange(SIZE):
        # Fill the table without using any Storm objects
        store.execute("INSERT INTO t1 values(?, ?)", (i, [i]))
    store.commit()

def fetch(store):
    for i in xrange(SIZE):
        t1 = store.find(T1, T1.id == i).one()     # each query is different
        print t1.id, t1.id_in_array
        #del t1  # does not help

if __name__ == '__main__':
    # Instrument ListVariable.get_state
    def wrap(f):
        def instrumented_get_state(self):
            print self._value,
            return f(self)
        return instrumented_get_state

    ListVariable.get_state = wrap(ListVariable.get_state)
    
    db = create_database('postgres://someuser@localhost/somedb')
    store = Store(db)
    fill(store)    # Set up for the test.
    fetch(store)








