[storm] Issue with find(), group by, and Single aggregates error

Wed Nov 10 11:30:52 GMT 2010

On 10/11/2010 12:16, Ian Booth wrote:
> Hi Tom
>
> Thanks for the reply. I had done some more poking around and had come to
> a similar conclusion. What I am doing now is defining a subclass of
> ResultSet and overriding __len__ to do something similar to what you
> suggest. One issue I have come up against is that for this to work, I
> have to set the _result_set_factory of the store instance to be my
> subclass. Pseudo code:
>
> store = getStoretoUse()
> store._result_set_factory = MyDerivedResultSet
> store.find(xxxx)
>
> There's another reason why I need to use a derived ResultSet (to do with
> transforming the loaded data into objects) but that's beyond the scope
> of this email.

Hum, I'd be interested to see that :).

> The above is the only way I could find to get the storm infrastructure
> to use a user specified subclass of ResultSet when performing a find().
> It works, but the store is a singleton, so I have to remember to reset
> the _result_set_factory variable back to the default Storm ResultSet
> when I am done. There are a couple of ways to do this, e.g. a try/finally
> block or a context manager. This strategy is of course not thread-safe,
> but I think everything is single-threaded, so it works.

Storm is not "thread-safe" in the sense that you can't use a Store from 
multiple threads anyway.

> Plus there are still
> places in the Store implementation that call ResultSet() directly,
> so any _result_set_factory override will be ignored.

Hum, the only place I see is ResultSet._set_expr, which should probably 
use self.__class__ instead.
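A minimal sketch of why self.__class__ matters, using hypothetical stand-in classes rather than Storm's real ResultSet. A method that builds a new result set by naming the base class explicitly loses the subclass; building it via self.__class__ keeps it. `BaseResultSet`, `CountingResultSet`, `union_hardcoded` and `union` are illustrative names only.

```python
class BaseResultSet:
    """Hypothetical stand-in for a result set; not Storm's ResultSet."""

    def __init__(self, rows):
        self._rows = list(rows)

    def union_hardcoded(self, other):
        # Hardcoding the base class silently drops any subclass behaviour.
        return BaseResultSet(self._rows + other._rows)

    def union(self, other):
        # self.__class__ builds an instance of whatever subclass self is.
        return self.__class__(self._rows + other._rows)


class CountingResultSet(BaseResultSet):
    """A subclass adding __len__, similar in spirit to the override above."""

    def __len__(self):
        return len(self._rows)


left = CountingResultSet([1, 2])
right = CountingResultSet([3])
print(type(left.union_hardcoded(right)).__name__)  # BaseResultSet
print(type(left.union(right)).__name__)  # CountingResultSet
```

One caveat of this design: self.__class__(...) assumes every subclass accepts the same constructor arguments as the base class.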

> For my case, I am
> using the store to only perform queries to populate a view so it all works.

Store._result_set_factory is definitely not a public interface, so I 
wouldn't encourage you to rely on it. Of course, there is no better way for now.

> What would perhaps be better is to allow the user to specify a ResultSet
> implementation to be used whenever find() is called rather than using an
> instance variable on the store object. The user specified ResultSet
> would be tied to the specific find operation, not the Store instance.

Passing more arguments to find would be tricky from a compatibility 
point of view. Maybe we could create a context manager to do that, with 
a public API to customize _result_set_factory.
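A minimal sketch of such a context manager, using only the standard library. Since no public hook exists yet, it temporarily replaces the store's private `_result_set_factory` attribute and restores the previous value on exit, even if the body raises. `result_set_factory`, `FakeStore`, `DefaultResultSet` and `MyDerivedResultSet` are hypothetical names so the sketch runs without Storm; they are not Storm API.

```python
from contextlib import contextmanager


@contextmanager
def result_set_factory(store, factory):
    """Hypothetical helper: use `factory` for result sets inside the block."""
    previous = store._result_set_factory
    store._result_set_factory = factory
    try:
        yield store
    finally:
        # Runs on normal exit and also when the with-body raises.
        store._result_set_factory = previous


# Hypothetical stand-ins so the sketch runs without Storm.
class DefaultResultSet:
    pass


class MyDerivedResultSet(DefaultResultSet):
    pass


class FakeStore:
    def __init__(self):
        self._result_set_factory = DefaultResultSet


store = FakeStore()
with result_set_factory(store, MyDerivedResultSet):
    assert store._result_set_factory is MyDerivedResultSet
assert store._result_set_factory is DefaultResultSet
```

Like the try/finally workaround, this mutates shared state on the store, so it relies on the Store not being used from several threads at once.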

> Thoughts? Are there any alternatives to what I am currently doing? Is my
> idea to specify a ResultSet implementation using something other than a
> Store instance variable valid? I'm only new to storm so I could be
> missing something obvious :-)

Customizing ResultSet has never been something we thought would be 
useful. For the problem with count and group by, it seems to be a 
simple missing feature that should be fixed in Storm itself. I don't 
know about your other use case, so I don't want to dismiss it, but large 
applications have been written without needing to customize ResultSet.
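The underlying SQL issue can be shown without Storm: COUNT(*) over a grouped query returns one row per group rather than a single value, which is presumably why a count that expects a single result fails here. One common remedy, sketched below with Python's built-in sqlite3 and a hypothetical `item` table, is to count the rows of the grouped query through a subselect. This illustrates the general SQL technique only; it is not necessarily how Storm does or should implement the fix.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, category TEXT)")
conn.executemany(
    "INSERT INTO item (category) VALUES (?)",
    [("a",), ("a",), ("b",), ("c",), ("c",), ("c",)],
)

# Adding COUNT(*) directly to a grouped query yields one count per group.
per_group = conn.execute(
    "SELECT COUNT(*) FROM item GROUP BY category ORDER BY category"
).fetchall()
print(per_group)  # [(2,), (1,), (3,)]

# Wrapping the grouped query in a subselect counts the groups themselves.
grouped = "SELECT category FROM item GROUP BY category"
(num_groups,) = conn.execute(
    f"SELECT COUNT(*) FROM ({grouped}) AS grouped_rows"
).fetchone()
print(num_groups)  # 3
```

The subselect alias (`grouped_rows`) is optional in SQLite but required by some databases, such as PostgreSQL.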

-- 
Thomas