[storm] some questions about Storm (from the perspective of Grok)

James Henstridge james at jamesh.id.au
Mon Mar 17 04:34:22 GMT 2008


On 16/03/2008, Martijn Faassen <faassen at startifact.com> wrote:
> Stuart Bishop wrote:
>  [snip]
>
> > An application that doesn't need to scale to vast amounts of users or massive
>  > quantities of data is a Noddy application by my definition.
>
> I don't know what "Noddy" means, so you can define it as anything you
>  like. :) I take it you don't want Storm to be a fit for Noddy applications?

Noddy is a character from a series of children's books.  Stuart is
most likely using the word to indicate simple/trivial applications (in
the sense that the schema is so simple it never needs changing, or
that the data stored in the DB isn't that important).

This is orthogonal to providing a simple API: it should be possible to
provide a simple API that still allows the schema to evolve along with
your application.

My understanding of Grok is that it takes the Zope framework and makes
good default decisions for many of the options the framework provides.
Those decisions should be in line with the best practices for
developing Zope apps, so that when a developer hits Grok's limits they
don't have to rewrite half their app.

So while I agree that it is good to provide a simple way for Grok apps
to manage an RDBMS schema, it'd be nice if they didn't have to rip out
and replace that infrastructure as soon as they have data they care
about and need to change the schema.

>  > That doesn't
>  > mean they are not useful (and this is why I don't use the real world term
>  > here). You don't care about the scalability issues with generated schemas
>  > because scalability isn't an issue. You don't care about integrating with
>  > existing data sources because there are none. You don't care about
>  > maintainability or upgrades because the data model required by the application
>  > is so simple it doesn't matter.
>
>
> Quibble: I can see a schema generation system care about maintainability
>  and upgrades, such as apparently RoR is able to do. Just not in the same
>  way as you'd do with a Vast Application.
>
>
>  > Just don't try to retrofit scalability later.
>
>
> Actually, it may still be worthwhile to do this in some cases. Time to
>  market matters, and apart from that, many developers are encouraged when
>  they get something running quickly, even if it's a bit dirty. Such an
>  application may still grow into something scalable later on. The
>  transition will in many cases be painful, but it might nonetheless be
>  worthwhile to go this route for a project.
>
>  [snip: argument that advanced features of relational database schemas
>  are hard to express in a Python-driven schema]
>
>  I am not debating that Python-driven schemas are what everybody should
>  use, or that they should be used in all circumstances. I accept the arguments in
>  favor of hand-written schemas. I don't think that schema generation is
>  the unadulterated uselessness/evil that I am picking up from the vibe here.

I agree that it isn't useful to require all schemas/migration scripts
to be pure SQL or pure Python.  I think there are some aspects that
are easier to express in Python and some easier to express in SQL.
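
To make that concrete, the sort of mixed patch I have in mind might
look something like this.  The patch itself is purely hypothetical, and
the ?-style parameters assume an SQLite backend; other backends will
want whatever parameter style their driver uses:

    # Hypothetical migration step; "store" is assumed to be a
    # storm.store.Store connected to the application's database.
    def patch_2(store):
        # The schema change is easiest to state directly in SQL.
        store.execute("ALTER TABLE person ADD COLUMN display_name TEXT")
        # The accompanying data fix-up is easier to express in Python.
        rows = store.execute("SELECT id, name FROM person").get_all()
        for person_id, name in rows:
            store.execute("UPDATE person SET display_name = ? WHERE id = ?",
                          (name.title(), person_id))
        store.commit()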


>  >> Do you think that SQL *queries* created by tools such as Storm are
>  >> also so inferior to hand-written queries that they are only suitable
>  >> for toy applications? If not, why is query generation already there
>  >> while schema generation is not?
>  >
>  > The SQL queries generated by some ORMs are vastly inferior. Storm and other
>  > ORMs targeting non-Noddy applications are designed to allow formulation of
>  > complex queries with enough control to have them perform efficiently.
>
>
> I find the differences here fascinating. What makes generating queries,
>  with all their complexities, all right and useful but generating schemas
>  something that shouldn't be done? Is generating queries correctly that
>  much easier?
>
>
>  > And if
>  > the Python syntax turns out to not offer fine grained enough control or
>  > access to that proprietary feature you need to access, then you can fall
>  > back to using SQL fragments but still have the results retrieved into your
>  > object model.
>
>
> Yes, same argument applies to schema generation too, though.
>
>  [snip]
>
>  I think I'm dealing with a cultural issue here: it's an important part
>  of the Storm approach to want to write these schemas in SQL. There are
>  of course excellent reasons to do this in many circumstances. I do have
>  the feeling it's like arguing about threading to Twisted developers - it
>  might be instructive but it is not very useful. :)
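
As an aside on the point above about falling back to SQL fragments,
the query side already works roughly like this (following Storm's
tutorial-level API; the table and the fragment are just illustrative):

    from storm.locals import create_database, Store, Int, Unicode
    from storm.expr import SQL

    class Person(object):
        __storm_table__ = "person"
        id = Int(primary=True)
        name = Unicode()

    store = Store(create_database("sqlite:"))
    store.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")

    # A query generated entirely from Python expressions...
    result = store.find(Person, Person.name == u"James")
    # ...or the same find() with a raw SQL fragment as the condition,
    # still giving back Person objects from the object model.
    result = store.find(Person, SQL("person.name LIKE 'J%'"))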

Rather than focusing on generated schemas vs. the alternatives, let's
look at a concrete use case:

1. I am developing a database-backed application.
2. I've released three versions of the application so far, each of
which contains code and database schema changes.
3. I've rolled out three instances of the application: one on version
1 of the code, one on version 2 and one on version 3.  Each instance
now contains data that I can't throw away.
4. I want to upgrade the first two instances to the latest version of the code.

My requirements for the upgrade are as follows (there's a code sketch
after the list):
* Each instance of the application should end up with identical schemas.
* All data is preserved.
* The upgrade should be automated and able to detect what changes need
to be made to the database, rather than requiring the admin to figure
out what changes are needed.
* Preferably upgrading from v1 to v3 should not require me to install
v2 of the software as an intermediate step.
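
To sketch the kind of infrastructure I mean (hypothetical code, not
something Storm ships today): the application records its schema
version in the database and ships every patch since v1, so the upgrade
step can work out which patches a given instance is missing; a v1
database then goes straight to v3 without v2 ever being installed.

    # Hypothetical upgrade driver.  Each patch takes the schema from
    # version N-1 to version N; the patch bodies here are only examples.
    def patch_2(store):
        store.execute("ALTER TABLE person ADD COLUMN display_name TEXT")

    def patch_3(store):
        store.execute("CREATE INDEX person_name_idx ON person (name)")

    PATCHES = {2: patch_2, 3: patch_3}

    def upgrade(store):
        # Detect what this instance currently has (assumes a one-row
        # schema_version table created with the v1 schema).
        version = store.execute(
            "SELECT version FROM schema_version").get_one()[0]
        # Apply only the patches this instance is missing, in order.
        for number in sorted(PATCHES):
            if number > version:
                PATCHES[number](store)
                # The version is an integer we control, so plain
                # interpolation is safe enough for this sketch.
                store.execute("UPDATE schema_version SET version = %d"
                              % number)
                store.commit()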

I believe that this use case will be relevant to the majority of
applications fairly early on in their lifetime.  Providing
infrastructure that drives people toward a non-upgradeable schema
seems like a disservice to your users.

James.


