[storm] proposal for improvement

Jean Daniel jeandaniel.browne at gmail.com
Mon Jun 8 15:36:41 BST 2009


Hello,

While I understand the difficulties caused by duplicating the schema
code in SQL, in the Storm objects, and sometimes even in the
zope.schema validators, I see great difficulties in implementing the
SQL Data Definition Language (ALTER, CREATE, DROP...) directly in Storm.
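
To make the duplication concrete, here is a minimal example (the
class follows the Storm tutorial; the table and column names are
mine):

    from storm.locals import Storm, Int, Unicode

    class Person(Storm):
        # The Storm object repeats what the SQL DDL already states.
        __storm_table__ = "person"
        id = Int(primary=True)
        name = Unicode()

    # The same structure must be maintained by hand in SQL:
    #   CREATE TABLE person (id INTEGER PRIMARY KEY, name VARCHAR)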

An implementation of the SQL DDL is useful for initial schema
generation, but not very useful unless it is also functional and rock
solid for schema upgrades. Database migration through schema
upgrades, though less frequent than data manipulation, is the second
most important database use case. In an enterprise context, it is
usually a critical scenario, with a DBA on call at night.

Compared to the data manipulation language, which Storm covers well
despite the differences between vendor implementations, I suspect
that defining the subset of features available across several vendors
is harder in the case of the DDL. For instance, in MySQL alone, which
ALTER operations can run online rather than offline depends on the
storage backend. Asking Storm to handle such diversity is a big workload.
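
To illustrate how non-standard the DDL already is for a simple
change, compare the statements needed to widen a column (the table
and column names are mine):

    # MySQL
    MYSQL = "ALTER TABLE person MODIFY name VARCHAR(100)"
    # PostgreSQL (8.0 and later)
    POSTGRES = "ALTER TABLE person ALTER COLUMN name TYPE varchar(100)"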

The distinction between online and offline operations does not apply
to the data manipulation language (SELECT, INSERT, ...), where every
operation is online. Offline operations change the rules of the game
in an enterprise context, since they incur an interruption of service
which needs to be planned and communicated across teams, departments
or even companies. DDL operations are less standard, and usually have
more impact.

I feel that it is possible to implement a simple module that does
basic table generation, but it will fall short of many of the
enterprise expectations that were part of Storm's design. Whenever
performance matters, features such as indexes and partitioning are
needed. Shouldn't a serious Python DDL module handle those too,
consistently across database vendors? The core developers will need help.
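
As a hint of what those expectations mean in DDL terms, here are two
performance-related statements that a basic table generator would not
cover, and which are vendor specific (the partitioning example uses
MySQL 5.1 syntax; the names are mine):

    PG_INDEX = "CREATE INDEX person_name_idx ON person (name)"

    MYSQL_PARTITIONED_TABLE = """
    CREATE TABLE log (id INT, created DATE)
    PARTITION BY RANGE (YEAR(created)) (
        PARTITION p2008 VALUES LESS THAN (2009),
        PARTITION p2009 VALUES LESS THAN (2010))"""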

Also, how should the migration operation be described without duplicating code?
1. either both versions of the Storm objects (old and new) are given
to a module which computes the delta and constructs the ALTER
statements (a sketch follows this list)
2. or the new objects are compared against the existing database schema
3. or the new objects come with a high-level Python script which
handles the schema delta (there is code duplication in this case, but
it is easier to control)
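
For the first option, here is a minimal sketch of the delta
computation, assuming the module reduces each version of the schema
to a mapping of column names to SQL types (the function and its
inputs are hypothetical):

    def alter_statements(table, old_columns, new_columns):
        """Build ALTER TABLE statements from two column mappings,
        e.g. {"id": "INTEGER", "name": "VARCHAR"}."""
        statements = []
        # Columns present only in the new schema: add them.
        for name in sorted(set(new_columns) - set(old_columns)):
            statements.append("ALTER TABLE %s ADD COLUMN %s %s"
                              % (table, name, new_columns[name]))
        # Columns present only in the old schema: drop them.
        for name in sorted(set(old_columns) - set(new_columns)):
            statements.append("ALTER TABLE %s DROP COLUMN %s"
                              % (table, name))
        return statements

    # alter_statements("person",
    #                  {"id": "INTEGER", "name": "VARCHAR"},
    #                  {"id": "INTEGER", "name": "VARCHAR",
    #                   "email": "VARCHAR"})
    # -> ['ALTER TABLE person ADD COLUMN email VARCHAR']

The second option would build old_columns by querying the database
catalog instead of the old Storm objects; type changes and renames
are where the real difficulty starts.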

If Storm or an independent module implements schema generation
features, shouldn't it also address the third source of duplication:
the validation of the input? In that case, I feel the ideal solution
would be to describe the objects with zope.schema, which offers
powerful constraints and invariants, and to use Storm as a backend.
At that point, I would like the SQL code needed for table generation
or database upgrades to be generated from the abstract syntax tree
of the zope schemas.
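
Here is a minimal sketch of that direction, assuming a hand-written
mapping from zope.schema field classes to SQL types (the interface,
the mapping and the table name are mine):

    from zope.interface import Interface
    from zope.schema import Int, TextLine, getFieldsInOrder

    class IPerson(Interface):
        id = Int(title=u"Primary key")
        name = TextLine(title=u"Full name", required=False)

    # Hypothetical mapping from zope.schema field classes to SQL types.
    SQL_TYPES = {Int: "INTEGER", TextLine: "VARCHAR"}

    def create_table(iface, table):
        # Walk the fields in definition order and emit one column each.
        columns = ["%s %s" % (name, SQL_TYPES[type(field)])
                   for name, field in getFieldsInOrder(iface)]
        return "CREATE TABLE %s (%s)" % (table, ", ".join(columns))

    # create_table(IPerson, "person")
    # -> 'CREATE TABLE person (id INTEGER, name VARCHAR)'

The constraints and invariants attached to the fields would then be
the single source for both the SQL generation and the validation.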

Do you have a simpler solution?

Regards, Jean Daniel Browne




References :

CubicWeb is a semantic web framework which offers Python modules for
schema generation and schema upgrade (it provides scripting functions
to handle the schema). It is a recent project and has not been tested
against a DBMS other than PostgreSQL. The schema is generated
directly from the OWL format.
http://www.cubicweb.org/doc/en/A020-tutorial.en.html#define-your-data-schema

Sqlparse is a non-validating SQL parser module:
http://pypi.python.org/pypi/sqlparse/0.1.1

DdlUtils is a small, easy-to-use component for working with Database
Definition (DDL) files: http://db.apache.org/

The Wikipedia page on the ALTER DDL statement: http://en.wikipedia.org/wiki/Alter_(SQL)

The doc for the alter command in MySQL:
http://dev.mysql.com/doc/refman/5.1/en/alter-table.html

The doc for the alter command in PostgreSQL:
http://www.postgresql.org/docs/8.3/static/sql-altertable.html


