[storm] Return subselect results

James Henstridge james at jamesh.id.au
Fri Oct 1 04:59:34 BST 2010


On Mon, Sep 20, 2010 at 10:11 PM, Shaun McCance <shaunm at gnome.org> wrote:
> Hi folks,
>
> I'm trying to figure out how to return the result of a
> subselect in a ResultSet. The Storm tutorial shows how
> to use a subselect for an IS IN expression. I want to
> return the result of a SELECT COUNT(*). Here's the SQL:
>
> select Branch.ident,
>  (select count(*) from Branch as Translation
>    where Translation.type = "Translation"
>    and Translation.parent_ident = Branch.ident)
>  from Branch
>  join SetModule on SetModule.pred_ident = Branch.parent_ident
>  where SetModule.subj_ident = "/set/gnome-3-0-desktop"
>  and Branch.type = "Document";
>
> This SQL works fine, and returns the expected results.
> Here's what I naively expected would work in Storm:
>
> Translation = ClassAlias(Branch)
> tables = store.using(Branch,
>  Join(SetModule, SetModule.pred_ident == Branch.parent_ident))
> tables.find((Branch.ident,
>             Select(Count(Translation.ident),
>               where=And(Translation.parent_ident == Branch.ident,
>                         Translation.type == u'Translation'),
>               tables=Translation)),
>  SetModule.subj_ident == u'/set/gnome-3-0-desktop',
>  Branch.type == u'Document')
>
> But the resultant SQL looks like this:
>
> SELECT Branch.ident, SELECT COUNT(`_1`.ident) FROM ...
>
> It's very nearly correct, except the subselect needs
> to have parentheses around it. I also tried this, at
> the suggestion from somebody on IRC:
>
> tables.find((Branch.ident,
>             Count(Select(Translation.ident, ...)),
>  ...)
>
> The resultant SQL:
>
> SELECT Branch.ident, COUNT((SELECT `_1`.ident FROM ...
>
> This always returns nothing in Storm. If I run the SQL
> Storm generates myself in MySQL, I get this:
>
> ERROR 1242 (21000): Subquery returns more than 1 row
>
> The first approach is what I think ought to work, and
> it comes very close to working, in terms of the SQL it
> generates. (I have no idea how much more effort it is
> to figure out how to marshal that back into something
> Pythonic in Storm.)
>
> I found a similar problem on the mailing list in 2009.
> I have something that works, thanks to James:
>
> tables = store.using(Branch,
>  Join(SetModule, SetModule.pred_ident == Branch.parent_ident),
>  LeftJoin(Translation,
>    Translation.parent_ident = Branch.ident,
>    Translation.type == u'Translation'
>  ))
> tables.find((Branch.ident, Count(translation.ident)),
>  SetModule.subj_ident == u'/set/gnome-3-0-desktop',
>  Branch.type == u'Document'
> ).group_by(Branch.ident)
>
> Same results, but a very different approach in terms
> of the generated SQL. My tests against my database
> show this approach to be about 60% slower than the
> subselect approach.
>
> So what are my options? Is the subselect approach
> something that could work in Storm?

Sorry for not getting round to replying to this.  This looks like a
bug in Storm's expression compiler.  It looks like it needs to adjust
the precedence level when compiling the column list for the Select
object so that parentheses are added.

Try modifying compile_select() in storm/expr.py and add
"state.precedence += 0.5" on the line before it compiles
select.columns, and then reduce it again afterwards.

James.



More information about the storm mailing list