[storm] Cache for nonexistent result

Ivan Zakrevskyi ivan.zakrevskyi at rebelmouse.com
Fri Jan 16 16:55:20 UTC 2015


Hi all. Thanks for the answer. I'll try to explain.

First, let's get an existing object.

In [2]: store.get(StTwitterProfile, (1,3))
base.py:50 =>
u'(0.001) SELECT ... FROM twitterprofile WHERE twitterprofile.context_id =
%s AND twitterprofile.user_id = %s LIMIT 1; args=(1, 3)'
Out[2]: <users.orm.TwitterProfile at 0x7f1e93b6d450>

In [3]: store.get(StTwitterProfile, (1,3))
Out[3]: <users.orm.TwitterProfile at 0x7f1e93b6d450>

In [4]: store.get(StTwitterProfile, (1,3))
Out[4]: <users.orm.TwitterProfile at 0x7f1e93b6d450>

As you can see, Storm made only one query.

OK, now let's try to get a nonexistent Twitter profile for a given context:

In [5]: store.get(StTwitterProfile, (10,3))
base.py:50 =>
u'(0.001) SELECT ... FROM twitterprofile WHERE twitterprofile.context_id =
%s AND twitterprofile.user_id = %s LIMIT 1; args=(1, 10)'

In [6]: store.get(StTwitterProfile, (10,3))
base.py:50 =>
u'(0.001) SELECT ... FROM twitterprofile WHERE twitterprofile.context_id =
%s AND twitterprofile.user_id = %s LIMIT 1; args=(1, 10)'

In [7]: store.get(StTwitterProfile, (10,3))
base.py:50 =>
u'(0.001) SELECT ... FROM twitterprofile WHERE twitterprofile.context_id =
%s AND twitterprofile.user_id = %s LIMIT 1; args=(1, 10)'

Storm sends a query to the database each time.

For example, suppose we have a utility function like this:

def myutil(user_id, *args, **kwargs):
    context_id = get_context_from_mongodb_redis_memcache_environment_etc(
        user_id, *args, **kwargs)
    twitter_profile = store.get(TwitterProfile, (context_id, user_id))
    return twitter_profile.some_attr

In this case, Storm will send a query to the database each time.
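
To make the difference concrete, here is a minimal, self-contained sketch. It is not Storm code: ToyStore, NegativeCachingStore, and the dict standing in for the table are hypothetical names made up for illustration. It only counts how many simulated queries are issued when missing keys are, or are not, remembered.

```python
class ToyStore(object):
    """Hypothetical stand-in for a store that caches only found rows."""

    def __init__(self, rows):
        self._rows = rows      # dict playing the role of the database table
        self._alive = {}       # key -> object, a positive-only cache
        self.queries = 0       # number of simulated database round trips

    def _query(self, key):
        self.queries += 1
        return self._rows.get(key)

    def get(self, key):
        if key in self._alive:
            return self._alive[key]
        obj = self._query(key)
        if obj is not None:
            self._alive[key] = obj
        return obj


class NegativeCachingStore(ToyStore):
    """Same toy store, but it also remembers keys that returned nothing."""

    def __init__(self, rows):
        ToyStore.__init__(self, rows)
        self._missing = set()

    def get(self, key):
        if key in self._missing:
            return None
        obj = ToyStore.get(self, key)
        if obj is None:
            self._missing.add(key)
        return obj


rows = {(1, 3): "profile-1-3"}

plain = ToyStore(rows)
for _ in range(3):
    plain.get((1, 3))       # found: one query, then served from the cache
for _ in range(3):
    plain.get((10, 3))      # missing: queried every time

patched = NegativeCachingStore(rows)
for _ in range(3):
    patched.get((1, 3))
for _ in range(3):
    patched.get((10, 3))    # missing: queried once, then remembered

print(plain.queries, patched.queries)
```

In this toy model the plain store issues four queries for the six lookups and the negative-caching one issues two. The sketch never expires a remembered miss, so it ignores rows that are created later.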

The situation is similar for a nonexistent relation:

In [20]: u = store.get(StUser, 10)
base.py:50 =>
u'(0.001) SELECT ... FROM user WHERE user.id = %s LIMIT 1; args=(10,)'


In [22]: u.profile
base.py:50 =>
u'(0.001) SELECT ... FROM userprofile WHERE userprofile.user_id = %s LIMIT
1; args=(10,)'

In [23]: u.profile
base.py:50 =>
u'(0.001) SELECT ... FROM userprofile WHERE userprofile.user_id = %s LIMIT
1; args=(10,)'

In [24]: u.profile
base.py:50 =>
u'(0.001) SELECT ... FROM userprofile WHERE userprofile.user_id = %s LIMIT
1; args=(10,)'

I've created a temporary patch to reduce the number of DB queries (see
below). But I am sure that a more elegant solution is possible (at the
library level).


class NonexistentCache(list):

    _size = 1000

    def add(self, val):
        if val in self:
            self.remove(val)
        self.insert(0, val)
        if len(self) > self._size:
            self.pop()


class Store(StoreOrig):

    def __init__(self, database, cache=None):
        StoreOrig.__init__(self, database, cache)
        self.nonexistent_cache = NonexistentCache()

    def get(self, cls, key, exists=False):
        """Get object of type cls with the given primary key from the
        database.

        This method is patched to cache nonexistent values to reduce the
        number of DB queries.
        If the object is alive the database won't be touched.

        @param cls: Class of the object to be retrieved.
        @param key: Primary key of object. May be a tuple for composed keys.
        @param exists: If True, ignore the nonexistent-key cache and query
            the database even if the key was previously not found.

        @return: The object found with the given primary key, or None
            if no object is found.
        """

        if self._implicit_flush_block_count == 0:
            self.flush()

        if type(key) != tuple:
            key = (key,)

        cls_info = get_cls_info(cls)

        assert len(key) == len(cls_info.primary_key)

        primary_vars = []
        for column, variable in zip(cls_info.primary_key, key):
            if not isinstance(variable, Variable):
                variable = column.variable_factory(value=variable)
            primary_vars.append(variable)

        primary_values = tuple(var.get(to_db=True) for var in primary_vars)

        # Patched
        alive_key = (cls_info.cls, primary_values)
        obj_info = self._alive.get(alive_key)
        if obj_info is not None and not obj_info.get("invalidated"):
            return self._get_object(obj_info)

        if (obj_info is None and not exists
                and alive_key in self.nonexistent_cache):
            return None
        # End of patch

        where = compare_columns(cls_info.primary_key, primary_vars)

        select = Select(cls_info.columns, where,
                        default_tables=cls_info.table, limit=1)

        result = self._connection.execute(select)
        values = result.get_one()
        if values is None:
            # Patched
            self.nonexistent_cache.add(alive_key)
            # End of patch
            return None
        return self._load_object(cls_info, result, values)

    def get_multi(self, cls, keys, exists=False):
        """Get objects of type cls with the given primary keys from the
        database.

        If an object is alive the database won't be touched for it.

        @param cls: Class of the objects to be retrieved.
        @param keys: Collection of primary keys of objects (each may be a
            tuple for composed keys).
        @param exists: If True, ignore the nonexistent-key cache.

        @return: A dict mapping each given key to the object found, or to
            None if no object is found for that key.
        """
        result = {}
        missing = {}
        if self._implicit_flush_block_count == 0:
            self.flush()

        for key in keys:
            key_orig = key
            if type(key) != tuple:
                key = (key,)

            cls_info = get_cls_info(cls)

            assert len(key) == len(cls_info.primary_key)

            primary_vars = []
            for column, variable in zip(cls_info.primary_key, key):
                if not isinstance(variable, Variable):
                    variable = column.variable_factory(value=variable)
                primary_vars.append(variable)

            primary_values = tuple(
                var.get(to_db=True) for var in primary_vars)

            alive_key = (cls_info.cls, primary_values)
            obj_info = self._alive.get(alive_key)
            if obj_info is not None and not obj_info.get("invalidated"):
                result[key_orig] = self._get_object(obj_info)
                continue

            if (obj_info is None and not exists
                    and alive_key in self.nonexistent_cache):
                result[key_orig] = None
                continue

            missing[primary_values] = key_orig

        if not missing:
            return result

        wheres = []
        for i, column in enumerate(cls_info.primary_key):
            wheres.append(In(column, tuple(v[i] for v in missing)))
        where = And(*wheres) if len(wheres) > 1 else wheres[0]

        for obj in self.find(cls, where):
            primary_values = tuple(
                var.get(to_db=True)
                for var in get_obj_info(obj).get("primary_vars"))
            # For composed keys, the per-column IN filters can also match
            # rows whose key combination was not requested; skip those.
            if primary_values not in missing:
                continue
            result[missing.pop(primary_values)] = obj

        for primary_values, key_orig in missing.items():
            self.nonexistent_cache.add((cls, primary_values))
            result[key_orig] = None

        return result

    def reset(self):
        StoreOrig.reset(self)
        del self.nonexistent_cache[:]
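
One possible refinement of the NonexistentCache above: membership tests and removals on a list are linear in its size, so a variant built on collections.OrderedDict could keep the same "most recently added keys survive" behaviour with constant-time lookups. The sketch below is only an illustration under that assumption. The class name NonexistentCacheLRU and its discard() and clear() methods are hypothetical and not part of Storm or of the patch.

```python
from collections import OrderedDict


class NonexistentCacheLRU(object):
    """Hypothetical bounded set of keys known to be absent from the DB."""

    def __init__(self, size=1000):
        self._size = size
        self._keys = OrderedDict()

    def add(self, key):
        # Re-inserting moves the key to the most recent position.
        self._keys.pop(key, None)
        self._keys[key] = True
        if len(self._keys) > self._size:
            # Evict the key that was added or refreshed longest ago.
            self._keys.popitem(last=False)

    def discard(self, key):
        # Forget a key, e.g. once a row with that key has been created.
        self._keys.pop(key, None)

    def clear(self):
        self._keys.clear()

    def __contains__(self, key):
        return key in self._keys

    def __len__(self):
        return len(self._keys)


cache = NonexistentCacheLRU(size=2)
cache.add(("TwitterProfile", (10, 3)))
cache.add(("TwitterProfile", (11, 3)))
cache.add(("TwitterProfile", (10, 3)))   # refreshes the first key
cache.add(("TwitterProfile", (12, 3)))   # evicts (11, 3)
```

With such a class, reset() would call self.nonexistent_cache.clear() instead of deleting a list slice.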



2015-01-16 9:03 GMT+02:00 Free Ekanayaka <free at 64studio.com>:

> Hi Ivan
>
> On Thu, Jan 15, 2015 at 10:23 PM, Ivan Zakrevskyi <
> ivan.zakrevskyi at rebelmouse.com> wrote:
>
>> Hi all.
>>
>> Storm has excellent caching behavior, but stores in Store._alive only
>> existing objects. If an object does not exist for some key, Storm makes
>> the DB query again and again.
>>
>> Are you planning to add caching for keys of nonexistent objects to
>> prevent the DB query?
>>
>
> If an object doesn't exist in the cache it means that either it was not yet
> loaded at all, or it was loaded but it's now marked as "invalidated" (for
> example the transaction in which it was loaded fresh has terminated).
>
> So I'm not sure what you mean in your question, but I don't think there is
> anything more that could be cached (in terms of key->object values).
>
> Cheers
>
>