[storm] This is a noob question: storing lists

Gerdus van Zyl gerdusvanzyl at gmail.com
Fri May 8 10:30:00 BST 2009


Well, for generating a schema automagically I normally just use Mako
templates (http://www.makotemplates.org/) to generate the Python code.
Then you can use for loops, etc. in the template to write out the
repetitive column definitions.
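
To illustrate the idea of generating the repetitive parts with a loop,
here is a minimal sketch. It uses plain Python string formatting from
the standard library rather than Mako, so it is not the template
approach itself, just the same loop-driven generation. The column names
(f001 ... f400), the class name Main, and the choice of REAL/Float for
every feature are assumptions made for the example.

```python
import sqlite3

N_FEATURES = 400  # assumed count, taken from the thread


def column_names(n):
    """Return n generated column names: f001, f002, ..."""
    return ["f%03d" % i for i in range(1, n + 1)]


def create_table_sql(table, names):
    """Build a CREATE TABLE statement with one REAL column per name."""
    cols = ",\n    ".join("%s REAL" % name for name in names)
    return (
        "CREATE TABLE %s (\n"
        "    id INTEGER PRIMARY KEY,\n"
        "    name TEXT,\n"
        "    %s\n"
        ")" % (table, cols)
    )


def storm_class_source(class_name, table, names):
    """Build Python source text for a Storm model class (hypothetical layout)."""
    lines = [
        "from storm.locals import Int, Float, Unicode",
        "",
        "",
        "class %s(object):" % class_name,
        '    __storm_table__ = "%s"' % table,
        "    id = Int(primary=True)",
        "    name = Unicode()",
    ]
    lines += ["    %s = Float()" % name for name in names]
    return "\n".join(lines) + "\n"


names = column_names(N_FEATURES)
ddl = create_table_sql("main", names)
source = storm_class_source("Main", "main", names)

# Check that SQLite accepts the generated schema.
conn = sqlite3.connect(":memory:")
conn.execute(ddl)
table_columns = [row[1] for row in conn.execute("PRAGMA table_info(main)")]
```

The generated `source` string is only text here; it would be written to
a .py file and imported where Storm is installed.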

And again, querying a 400-column table is going to be difficult. What
kind of values are the 400 data points? If they are 400 different
measurements of different things then the 400 columns are unavoidable;
otherwise they are not. Maybe go ask one of the Java users how they
would prefer the data.
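
For the case where the 400 values are the same kind of measurement, the
two-table layout from earlier in the thread (main and marks, quoted
below) could look roughly like this in SQLite. This is a sketch under
assumptions: the idx column, the threshold, and all the numbers are
made up for illustration and are not from the original data.

```python
import sqlite3

N_MARKS = 400  # values per person, as in the thread

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE main (
    pri_id INTEGER PRIMARY KEY,
    name   TEXT
);
CREATE TABLE marks (
    pri_id INTEGER REFERENCES main(pri_id),
    idx    INTEGER,  -- hypothetical column: position of the value, 1..400
    mark   REAL
);
""")

people = [(1, "John"), (2, "mary"), (3, "greg")]
conn.executemany("INSERT INTO main (pri_id, name) VALUES (?, ?)", people)

# Made-up sample values, with one deliberate outlier for greg at idx 7.
rows = []
for pri_id, _name in people:
    for idx in range(1, N_MARKS + 1):
        value = float(idx % 10)
        if pri_id == 3 and idx == 7:
            value = 999.0
        rows.append((pri_id, idx, value))
conn.executemany(
    "INSERT INTO marks (pri_id, idx, mark) VALUES (?, ?, ?)", rows
)

# A quick "find abnormalities" query: any mark above an assumed threshold.
THRESHOLD = 100.0
abnormal = conn.execute(
    """
    SELECT m.name, k.idx, k.mark
    FROM marks AS k
    JOIN main AS m ON m.pri_id = k.pri_id
    WHERE k.mark > ?
    ORDER BY m.pri_id, k.idx
    """,
    (THRESHOLD,),
).fetchall()

marks_per_person = conn.execute(
    "SELECT pri_id, COUNT(*) FROM marks GROUP BY pri_id ORDER BY pri_id"
).fetchall()
```

With this layout one WHERE clause covers all 400 values, instead of
naming 400 columns in the query.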

~G

On Fri, May 8, 2009 at 11:13 AM, chandramouli s <naruvimama at gmail.com> wrote:
> Yes, you are right, a relational db is overkill. I am working on
> scientific data and am more or less playing around with the data (so
> no fixed schema). The right way to do this would be to use PyTables
> (much better than flat files, super fast, and supports large amounts
> of data), but there always remains the ease with which Java people
> could browse and query my data, and SQLite seems to be the best option.
> But does anyone know of a way I could generate a schema on the fly
> (like the A, B, C... column names in Excel)? I do not require
> relational features (no updates, no deletes), just quick queries to
> find abnormalities in the data, and the redundancy in the main part
> (id, name) is insignificant compared to the 400 or so fields.
>
> 1 John 400 ...features
> 2 mary 400 ... features
> 3 greg 400 ... features
>
> In fact even the names are insignificant; I could replace them with a
> key, just to group all the features belonging to one person.
> BTW I am making it for a machine learning project ...
>
> Thank you
> Chandramouli
>
> On Fri, May 8, 2009 at 6:52 AM, Gerdus van Zyl <gerdusvanzyl at gmail.com> wrote:
>> Well, if you want to share the db with Java people, Python pickling is
>> out of the question, and a list would be problematic since it's not a
>> native SQLite data type. I would go with the two-table approach
>> already suggested:
>> main->pri_id,name
>> marks->pri_id,mark (400 of these per main)
>>
>> Also, a relational database might be overkill in this case, but that
>> depends on whether you want to query the data (no query = no
>> database), how large the data will be (large = db), whether it will
>> be updated (db easier), etc.
>>
>> ~G


