[storm] How can I make my own "store"?

Tue Sep 29 01:39:26 BST 2009

On Fri, Sep 25, 2009 at 9:52 PM, Jason Baker <jbaker at zeomega.com> wrote:
> On Wed, Sep 23, 2009 at 10:09 AM, James Henstridge <james at jamesh.id.au>
> wrote:
>>
>> If you're instantiating a new object, then Storm will issue an INSERT to
>> add the new row.
>
> This is actually what we're trying to prevent.  We want to make an object
> that has the same interface as the store, but will save objects to a CSV
> file to be loaded by SQL*Loader later instead of having an INSERT statement
> generated.
> Perhaps some code snippets will help illustrate what we're trying to do
> (note that this is just the relevant parts of course):
> class BulkStore(object):
>     def add(self, entity, has_identity_property=True):
>         """
>         Add an object to this store.  Will take no action if the object has
>         been added since the last flush operation.
>         """
>         entity_store = Store.of(entity)
>         if entity_store is not None and entity_store is not self:
>             msg = '%s is part of another store.' % repr(entity)
>             raise WrongStoreError(msg)
>         elif entity_store is None:
>             # <snip> - do whatever this store
>             # should do on adding an object
>             self.set_as_store(entity)
>     def set_as_store(self, entity):
>         """
>         Set this saver as the storm store.
>         """
>         obj_info = get_obj_info(entity)
>         obj_info['store'] = self
>     def add_flush_order(self, *args, **kwargs):
>         """This is a stub method that is for compatibility with the storm
>            store.  Currently, this does nothing."""
> The add_flush_order method is just there because it will be called
> automatically sometimes.  For the most part, we're ignoring it because we're
> determining our own flush ordering (although it would be nice if storm had a
> function that we can call to determine flush ordering the same way the store
> does).
> In my testing thus far, this approach seems to be working flawlessly.
> However, I get the feeling that this is something that you guys hadn't
> planned for and may cause problems at some point.

I wonder if you would be better off hooking in with a custom Database
class rather than a custom Store here?  Here is a sketch of a possible
implementation:

1. the Database implementation's constructor takes another Database
instance as an argument.
2. the Database.connect() implementation does the following:
 * call connect() on the wrapped Database instance to create a connection
 * create a custom Connection() instance, passing in that connection.
 * copy the compile and param_mark attributes from the wrapped
connection to the wrapper
 * return the wrapper connection.
3. In the wrapper connection class, override execute():
 * if the statement is an Insert() instance, save the data off to CSV.
 You'll probably also need to set statement.primary_variables to some
marker values that you can detect in other Insert() statements and that
won't be valid primary keys in the DB.
 * if the statement is an Update() or Delete(), raise an exception
(unless you know some way to handle this).
 * otherwise, pass the call through to the wrapped connection.
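
The steps above could be sketched roughly as follows. This is only an
illustration of the wrapping pattern and deliberately does not import Storm:
CsvCapturingDatabase, CsvCapturingConnection, and the insert_type,
rejected_types and row_from_insert hooks are hypothetical names, and the Fake*
classes are stand-ins so the sketch runs on its own. With Storm, the hooks
would presumably be the Insert class, the Update and Delete classes, and a
function that extracts the values from an Insert statement. The code assumes
Python 3.

```python
import csv
import io


class CsvCapturingConnection(object):
    """Wraps a connection; diverts insert statements to a CSV writer.

    insert_type, rejected_types and row_from_insert are hypothetical
    hooks for this sketch, not names taken from Storm.
    """

    def __init__(self, wrapped, csv_file, insert_type, rejected_types,
                 row_from_insert):
        self._wrapped = wrapped
        self._writer = csv.writer(csv_file)
        self._insert_type = insert_type
        self._rejected_types = rejected_types
        self._row_from_insert = row_from_insert
        # Step 2: copy compile and param_mark from the wrapped connection.
        self.compile = wrapped.compile
        self.param_mark = wrapped.param_mark

    def execute(self, statement, *args, **kwargs):
        if isinstance(statement, self._insert_type):
            # Step 3a: save the row to CSV instead of running the INSERT.
            self._writer.writerow(self._row_from_insert(statement))
            return None  # what the caller expects back is left open here
        if isinstance(statement, self._rejected_types):
            # Step 3b: refuse statements we cannot express as CSV rows.
            raise NotImplementedError(
                "%s is not supported" % type(statement).__name__)
        # Step 3c: everything else goes to the real connection.
        return self._wrapped.execute(statement, *args, **kwargs)

    def __getattr__(self, name):
        # Only reached for attributes not defined on the wrapper:
        # delegate the rest (commit, rollback, ...) to the real connection.
        return getattr(self._wrapped, name)


class CsvCapturingDatabase(object):
    """Step 1: a database object that wraps another one."""

    def __init__(self, wrapped_database, csv_file, **hooks):
        self._wrapped_database = wrapped_database
        self._csv_file = csv_file
        self._hooks = hooks

    def connect(self, *args, **kwargs):
        real = self._wrapped_database.connect(*args, **kwargs)
        return CsvCapturingConnection(real, self._csv_file, **self._hooks)


# Stand-ins so the sketch runs without Storm or a real database.
class FakeInsert(object):
    def __init__(self, values):
        self.values = values


class FakeUpdate(object):
    pass


class FakeSelect(object):
    pass


class FakeConnection(object):
    compile = "compile-stub"
    param_mark = "?"

    def __init__(self):
        self.executed = []

    def execute(self, statement, *args, **kwargs):
        self.executed.append(statement)
        return "real-result"


class FakeDatabase(object):
    def connect(self):
        return FakeConnection()


buffer = io.StringIO()
database = CsvCapturingDatabase(
    FakeDatabase(), buffer,
    insert_type=FakeInsert, rejected_types=(FakeUpdate,),
    row_from_insert=lambda statement: statement.values)
connection = database.connect()
connection.execute(FakeInsert([1, "alice"]))      # captured, never executed
select_result = connection.execute(FakeSelect())  # passed through
print(buffer.getvalue())
```

The sketch leaves open what execute() should return for a captured insert and
how the marker primary keys would be assigned. Both depend on what the Store
does with the result of an INSERT.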

If you create a store for one of these Database instances and then add
objects to it, you should be able to trap all the inserts.  I'm not
sure how you'd handle primary keys exactly, but I guess that would be
a problem if you were subclassing Store anyway, right?

James.