[storm] Type checking

Stuart Bishop stuart at stuartbishop.net
Tue Feb 17 09:11:32 GMT 2009


On Tue, Feb 17, 2009 at 2:18 AM, Jamu Kakar <jkakar at kakar.ca> wrote:
> Hi,
>
> On Mon, Feb 16, 2009 at 7:03 PM, Ben Wilber <benwilber at gmail.com> wrote:
>> This can, however, get overly complicated with complex types like
>> datetimes and whatnot, but for something reasonably simple, like
>> Unicode or Int, I feel defining the type once, in the model
>> definition, and then trapping on impossible conversion exceptions
>> should be sufficient.  But you're right that this is something pretty
>> easy to do outside Storm.  In fact, I've started subclassing the Storm
>> types to do the conversion as was suggested earlier, which is a pretty
>> simple fix.
>
> I'm on the "the default behaviour should not include automatic
> coercion" side of the fence; however, I do think that many people
> want what you want.  I wonder about providing this as an optional
> feature of the column types.  For example, we could have a coerce
> keyword:
>
> id = Int(allow_none=False, coerce=True)
>
> Another option would be to provide a global setting (or maybe even
> per Store), something like:
>
> storm.set_automatic_coercion(True)
>
> or
>
> store.set_automatic_coercion(True)
>
> I think of those options, I like the coerce keyword option the best.
> What do you think?

In most cases, coercion is just digging your hole deeper. If you have
wired the wrong form validation function to the wrong database column,
I'd like to know now, not when the code reaches production and starts
raising exceptions on real-world input. I can't imagine a useful case
where I would want this, and the previous arguments just don't make
sense to me.

Coercion sweeps potential bugs under the carpet. Even a simple
conversion between Int and Unicode can lose data (leading zeros, for
example) and needs to be trapped. In many environments it would also
increase the lines of code (LOC), because there are so many more cases
to test. Even if a database column is used to store multiple datatypes
(e.g. a key/type/value table), you still need to be explicit about which
values can be stored, because you must ensure nothing gets stored that
your application doesn't know how to cope with. Otherwise you end up
with 'Dear <CustomerName at 0x123456>' or 'Dear False' because Storm
happily coerced something to a Unicode string for you, or with code that
explodes in production because the browser occasionally sends data in an
unexpected character set and your byte string -> Unicode conversion
fails or, worse, silently stores corrupt data. Proving that automatic
coercion introduces no data-loss bugs would take more work than simply
being explicit.
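To make those failure modes concrete, here is a minimal Python 3 illustration using only built-ins rather than Storm itself. The helpers naive_to_int and naive_to_text and the Customer class are hypothetical stand-ins for what a blanket "coerce everything" layer might do.

```python
# Hypothetical "coerce everything" helpers, for illustration only (not Storm API).
def naive_to_int(value):
    return int(value)

def naive_to_text(value):
    return str(value)

class Customer:
    def __init__(self, name):
        self.name = name

# Leading zeros vanish when a text value is forced into an integer column.
zip_code = naive_to_int("007")

# A misrouted object or boolean silently becomes text instead of raising.
greeting_from_object = "Dear " + naive_to_text(Customer("Alice"))
greeting_from_bool = "Dear " + naive_to_text(False)

# Bytes sent in Latin-1 but decoded as UTF-8 fail outright...
latin1_bytes = "café".encode("latin-1")
try:
    latin1_bytes.decode("utf-8")
    decode_error = None
except UnicodeDecodeError as exc:
    decode_error = exc

# ...while UTF-8 bytes decoded as Latin-1 "succeed" and yield corrupt text.
mojibake = "café".encode("utf-8").decode("latin-1")
```

Each of these either raises far from the real bug or stores a value nobody intended, which is the point being made above.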

A coerce argument might be useful in some cases, but I tend to think
it should be a callable that actually performs the coercion rather than
an all-or-nothing toggle. You could use it to convert byte strings to
Unicode, for instance, if you know what the encoding will be, or if you
know you have to use a heuristic to guess it.
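As a rough sketch of the callable idea, the following Python 3 code defines a hypothetical Column descriptor (not Storm's real property classes) that rejects values of the wrong type unless the column was given an explicit coerce callable. The names Column, utf8_bytes_to_text and Person, and the assumption that incoming byte strings are UTF-8, are illustrative only.

```python
class Column:
    """Hypothetical typed attribute; NOT Storm's actual property API."""

    def __init__(self, expected_type, coerce=None, allow_none=True):
        self.expected_type = expected_type
        self.coerce = coerce          # optional callable chosen per column
        self.allow_none = allow_none
        self.name = None

    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return obj.__dict__.get(self.name)

    def __set__(self, obj, value):
        if value is None:
            if not self.allow_none:
                raise TypeError(f"{self.name} may not be None")
        else:
            # Only coerce when the column explicitly asked for it.
            if self.coerce is not None:
                value = self.coerce(value)
            # Anything the callable did not convert still fails loudly.
            if not isinstance(value, self.expected_type):
                raise TypeError(
                    f"{self.name} expects {self.expected_type.__name__}, "
                    f"got {type(value).__name__}"
                )
        obj.__dict__[self.name] = value


def utf8_bytes_to_text(value):
    """Explicit coercion: decode bytes as UTF-8 (assumed), pass other values through."""
    if isinstance(value, bytes):
        return value.decode("utf-8")
    return value


class Person:
    id = Column(int, allow_none=False)               # strict: no coercion
    name = Column(str, coerce=utf8_bytes_to_text)    # opt-in, known encoding
```

For example, assigning b"caf\xc3\xa9" to Person().name stores "café", while assigning the string "42" to id raises TypeError. Coercion stays opt-in per column and states its encoding assumption; note that isinstance() also accepts bool for an int column, so a stricter check would need to special-case it.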

I do think Storm should remain Pythonic.

-- 
Stuart Bishop <stuart at stuartbishop.net>
http://www.stuartbishop.net/


