[storm] why weakrefdict for cache?

Gustavo Niemeyer gustavo at niemeyer.net
Mon Sep 3 18:07:32 BST 2007


> I'll tell you why we currently don't have a strong-referencing "dirty"
> list... it's because our session detects "dirty" changes at flush time.
(...)
> comparison operation on datatypes that are known to be "mutable".  So
> even if we do reinstate the strong dirty list and the weakrefed identity
> map, that case would still remain as a caveat.

Ah, right... it took me a while to get that stuff working as well.  The
way we implemented mutable types in Storm is by offering variables the
chance to register themselves for an "object-deleted" event, which gets
called when the object is being deallocated (we use the callback in the
weakref).  E.g. PickleVariable does the following:

    self.event.hook("object-deleted", self._detect_changes)
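
In case the mechanism isn't obvious, here is a small self-contained
sketch of the idea (the PickleValue/Owner names are made up for
illustration; only the hook above is Storm's actual API):

    import pickle
    import weakref

    class PickleValue(object):
        """Illustrative stand-in for PickleVariable: detects in-place
        mutation of a mutable value once its owner goes away."""

        def __init__(self, owner, value):
            self._value = value
            self._original = pickle.dumps(value)  # snapshot to compare with
            # The weakref callback fires when `owner` is deallocated,
            # playing the role of the "object-deleted" event above.
            self._ref = weakref.ref(owner, self._detect_changes)

        def _detect_changes(self, ref):
            if pickle.dumps(self._value) != self._original:
                print("value was mutated in place; would flush it here")

    class Owner(object):
        pass

    obj = Owner()
    var = PickleValue(obj, {"numbers": [1, 2]})
    var._value["numbers"].append(3)  # in-place change, invisible otherwise
    del obj                          # callback fires, change is detected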


> Well, I think the "weak referencing" option is probably not widely
> used; people just know to expunge/clear objects from the session which
> they don't need.  We went with Hibernate's example in this area as
> "not that big a deal".

Gotcha.  Thanks for explaining.


> We never "gave up"; as far as "caching" goes, we've never "begun" that.
> I don't really consider the Session's identity map to be much of a
> "cache"; while we do use it as a cache in cases where we need to locate
> an object by primary key (such as lazy-loading a many-to-one attribute),
> I would consider a "more flexible" cache to be a second-level cache,
> which is a
(...)
> page caching or "sub-template" caching, which is something Mako/Pylons
> supports.

Understood.  I have the same opinion.


> Our ORM's system of loading objects for a particular query still needs
> to store the full results of that query in a single in-memory
> collection; since we support queries which add left outer joins of
> additional objects to be loaded as part of a collection, we can't just
> load a row, create an instance for it, then throw it away; the next row
> might also represent the same instance, which needs to be "uniqued"
(...)

Yeah, we have a similar feature in Storm, so I see what you mean.
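
To make the "uniquing" concrete, here is roughly what that loop looks
like while reading joined rows (the table and column names below are
invented for the example):

    # Rows as they might come back from:
    #   SELECT users.id, users.name, addresses.email
    #   FROM users LEFT OUTER JOIN addresses
    #        ON addresses.user_id = users.id
    rows = [
        (1, "alice", "alice@work.example"),
        (1, "alice", "alice@home.example"),  # same user, second address
        (2, "bob", None),                    # user with no addresses
    ]

    identity_map = {}  # primary key -> instance: the "uniquing" table
    users = []         # the resulting collection, in row order

    for user_id, name, email in rows:
        user = identity_map.get(user_id)
        if user is None:
            user = {"id": user_id, "name": name, "addresses": []}
            identity_map[user_id] = user
            users.append(user)
        if email is not None:
            user["addresses"].append(email)

    # The first row for user 1 can't be thrown away once mapped: the
    # second row contributes another address to the same instance.
    assert users[0]["addresses"] == ["alice@work.example",
                                     "alice@home.example"]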


> don't have any of these requirements.  Though as it turns out, DBAPIs
> like psycopg2 already buffer all the rows of a result set by default,
> so there's a lot more "load it all into memory" going on than people
> might think anyway.

Right... but won't the maximum number of items kept in memory be
cursor.arraysize, if you are using cursor.fetchmany()?
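
That's the plain DB-API pattern I have in mind, sketched below with
sqlite3 just to keep it runnable (big_table and the row count are
invented).  Though, as you note, psycopg2's default client-side cursor
fetches the whole result set at execute() time, so there fetchmany()
only bounds memory once a named (server-side) cursor is used:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE big_table (n INTEGER)")
    conn.executemany("INSERT INTO big_table VALUES (?)",
                     ((i,) for i in range(1000)))

    cursor = conn.cursor()
    cursor.arraysize = 100  # batch size used by fetchmany() below
    cursor.execute("SELECT n FROM big_table")

    total = 0
    while True:
        rows = cursor.fetchmany()  # at most cursor.arraysize rows per call
        if not rows:
            break
        total += len(rows)
    assert total == 1000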


> More commonly, people who are representing thousands of objects will
> only be displaying a subset of those on a single page, and so only
> need to load a range of objects.  Our "eager loading" does support the
> use of LIMIT and OFFSET in such a way that you limit the "primary"
> entities but still get the full list of "collection entities"
> associated with them.  This is another area where we've looked at
> Hibernate, seen that there's no problem with their "non-streamed"
> approach, so for now it's "good enough", with the door open to improve
> upon it if needed.

Understood. Thanks for the nice explanations.
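
For the archives, here is a small runnable sketch of why that LIMIT has
to apply to the "primary" entities rather than to the joined rows (the
schema and data are invented, and this shows the general subquery
technique rather than SQLAlchemy's literal SQL):

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE addresses (user_id INTEGER, email TEXT);
        INSERT INTO users VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
        INSERT INTO addresses VALUES (1, 'a1'), (1, 'a2'), (2, 'b1');
    """)

    # Naive: LIMIT on the joined rows cuts collections short; the two
    # rows returned are both alice's, and bob is missing entirely.
    naive = conn.execute("""
        SELECT users.id, addresses.email
        FROM users LEFT OUTER JOIN addresses
             ON addresses.user_id = users.id
        ORDER BY users.id LIMIT 2
    """).fetchall()

    # Instead, LIMIT the primary entities in a subquery and join the
    # collections onto that, so the first two users arrive complete.
    limited = conn.execute("""
        SELECT u.id, addresses.email
        FROM (SELECT id FROM users ORDER BY id LIMIT 2) AS u
        LEFT OUTER JOIN addresses ON addresses.user_id = u.id
        ORDER BY u.id
    """).fetchall()

    print(naive)    # two rows, both belonging to user 1
    print(limited)  # three rows covering users 1 and 2, collections intact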

-- 
Gustavo Niemeyer
http://niemeyer.net


