[storm] Issue with find(), group by, and Single aggregates error

Wed Nov 10 11:16:56 GMT 2010

Hi Tom

Thanks for the reply. I had done some more poking around and had come to
a similar conclusion. What I am doing now is defining a subclass of
ResultSet and overriding the __len__ to do something similar to what you
suggest. One issue I have come up against is that for this to work, I
have to set the _result_set_factory of the store instance to be my
subclass. Pseudo code:

store = getStoretoUse()
store._result_set_factory = MyDerivedResultSet
store.find(xxxx)

There's another reason why I need to use a derived ResultSet (to do with
transforming the loaded data into objects) but that's beyond the scope
of this email.

The above is the only way I could find to get the storm infrastructure
to use a user specified subclass of ResultSet when performing a find().
It works but the store is a singleton and so I have to remember to reset
the _result_set_factory variable back to the default storm ResultSet
when I am done. There are a couple of ways to do this: use a try/finally
or a context manager, etc. This strategy is of course not thread safe, but
I think everything is single threaded so it works. Plus there are still
places in the Store implementation which call ResultSet() directly,
so any _result_set_factory override will be ignored there. In my case, I am
only using the store to perform queries to populate a view, so it all works.
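
A minimal sketch of the context manager option, assuming only that the
store object exposes the private _result_set_factory attribute mentioned
above. The helper name temporary_result_set_factory is hypothetical and
is not part of storm. It is no more thread safe than the manual approach.

```python
from contextlib import contextmanager


@contextmanager
def temporary_result_set_factory(store, factory):
    """Swap store._result_set_factory for the duration of a with block.

    Hypothetical helper: it relies on a private storm attribute, so it
    may break between storm releases.
    """
    previous = store._result_set_factory
    store._result_set_factory = factory
    try:
        yield store
    finally:
        # Restore the previous factory even if the query raised.
        store._result_set_factory = previous
```

Usage would then look like "with temporary_result_set_factory(store,
MyDerivedResultSet): store.find(xxxx)", and the default ResultSet is back
in place as soon as the block exits.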

What would perhaps be better is to allow the user to specify a ResultSet
implementation to be used whenever find() is called rather than using an
instance variable on the store object. The user specified ResultSet
would be tied to the specific find operation, not the Store instance.

Thoughts? Are there any alternatives to what I am currently doing? Is my
idea to specify a ResultSet implementation using something other than a
Store instance variable valid? I'm only new to storm so I could be
missing something obvious :-)

On 10/11/10 18:02, Thomas Hervé wrote:
> On 09/11/2010 23:39, Ian Booth wrote:
> 
>> result = store.using(xxxx).find(yyyy).group_by(zzzz)
>> return result
>>
>> This however causes an error:
>>
>> storm FeatureError: Single aggregates aren't supported after a GROUP BY
>> clause
>>
>> So it seems that using a view with navigation links on the table results
>> in an attempt to call count() on the resultset from the find() and this
>> is failing. Tracing the sql and executing this sql directly against the
>> db works just fine and produces the right data. But when using storm to
>> try and do the same thing it doesn't work.
> 
> Hi Ian,
> 
> The problem is that we can't guess which query to run when calling 
> count. In the normal scenario, you have a query like that:
> 
> SELECT a, b FROM c WHERE x;
> 
> Storm generates:
> 
> SELECT COUNT(*) FROM c WHERE x;
> 
> But when a GROUP BY is involved, this mechanism doesn't work anymore. 
> What Storm could do is:
> 
> SELECT COUNT(*) FROM (SELECT a, b FROM c GROUP BY a, b) AS tmp_
> 
> It's not the default behavior for now. You may be able to simulate it 
> using the get_select_expr method of ResultSet, and manually building a 
> count.
>
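
To illustrate the SQL point in the quoted reply, here is a small sketch
using Python's sqlite3 module rather than storm. The table c and columns
a, b follow the quoted example; the sample rows are made up for
illustration. It shows why a plain COUNT(*) stops working once a GROUP BY
is present, and what the wrapped subquery returns instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE c (a INTEGER, b INTEGER)")
# Made-up sample data: four rows falling into three distinct (a, b) groups.
conn.executemany(
    "INSERT INTO c VALUES (?, ?)", [(1, 1), (1, 1), (1, 2), (2, 2)]
)

# Keeping the GROUP BY on a COUNT(*) query yields one row per group,
# not the single total that a count() call expects.
per_group = conn.execute(
    "SELECT COUNT(*) FROM c GROUP BY a, b"
).fetchall()

# Wrapping the grouped query in a subselect yields a single row holding
# the number of groups, i.e. the number of rows the original query returns.
(group_count,) = conn.execute(
    "SELECT COUNT(*) FROM (SELECT a, b FROM c GROUP BY a, b) AS tmp_"
).fetchone()

print(len(per_group), group_count)
```

The second query is the shape a custom __len__ would need to produce; how
to build it from a storm ResultSet is not shown here.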