[storm] This is a noob question: storing lists
chandramouli s
naruvimama at gmail.com
Fri May 8 10:13:14 BST 2009
Yes, you are right that a relational db is overkill. I am working on
scientific data and am more or less playing around with it (so there is
no fixed schema). The right way to do this would be to use PyTables
(much better than flat files, very fast, and it supports large amounts
of data), but I also need Java people to be able to browse and query my
data easily, and SQLite seems to be the best option for that. Does
anyone know of a way I could generate the schema on the fly (like the
A, B, C... column names in Excel)? I do not require relational features
(no updates, no deletes), just quick queries to find abnormalities in
the data. Also, the redundancy in the main part (id, name) is
insignificant compared to the 400 or so fields.
1 John 400 ...features
2 mary 400 ... features
3 greg 400 ... features
In fact, even the names are insignificant; I could replace them with a
key, just to group all the features belonging to one person.
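
One way to generate the schema on the fly is to build the CREATE TABLE
statement as a string in Python. Below is a minimal sketch using the
standard sqlite3 module directly (not Storm). The table name "samples",
the column names f001..f400, and the feature values are all made up for
illustration. 400 columns fits within SQLite's default limit of 2000
columns per table.

```python
import sqlite3

N_FEATURES = 400  # number of feature columns per record, as in the post

# Hypothetical generated names f001..f400, standing in for Excel-style A, B, C...
columns = ["f%03d" % i for i in range(1, N_FEATURES + 1)]

conn = sqlite3.connect(":memory:")  # use a file path instead to share the db

# Build the schema on the fly: one id column plus one REAL column per feature.
col_defs = ", ".join('"%s" REAL' % name for name in columns)
conn.execute("CREATE TABLE samples (id INTEGER PRIMARY KEY, %s)" % col_defs)

# One placeholder for the id plus one per feature.
placeholders = ", ".join("?" * (N_FEATURES + 1))
insert_sql = "INSERT INTO samples VALUES (%s)" % placeholders

# Dummy data: three records, each with 400 identical feature values.
rows = [
    [1] + [0.5] * N_FEATURES,
    [2] + [1.5] * N_FEATURES,
    [3] + [2.5] * N_FEATURES,
]
conn.executemany(insert_sql, rows)
conn.commit()

# A quick query to look for abnormal values in a single feature.
suspicious = conn.execute('SELECT id FROM samples WHERE "f007" > 2.0').fetchall()
print(suspicious)
```

Because the result is an ordinary SQLite file with plain numeric columns,
it should be readable from Java through any SQLite driver without
Python-specific decoding.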
BTW I am making it for a machine learning project ...
Thank you
Chandramouli
On Fri, May 8, 2009 at 6:52 AM, Gerdus van Zyl <gerdusvanzyl at gmail.com> wrote:
> Well if you want to share the db with java people python pickling is
> out of the question and a list would be problematic since it's not a
> native sqlite data type. I would suggest the two tables approach as
> suggested:
> main->pri_id,name
> marks->pri_id,mark (400 of these per main)
>
> Also maybe a relational database in this case might be overkill but
> that depends on if you want to query the data(no query=no database)
> and how large the data will be(large=db), if it will be updated(db
> easier), etc.
>
> ~G
>
> On Thu, May 7, 2009 at 8:18 PM, chandramouli s <naruvimama at gmail.com> wrote:
>> Yes what I was looking for is a way to store the values(400 fields) in
>> 400 columns with just an ID column to identify the record. In the
>> absence of a convenient method to do that (automatically generate 400
>> column names) I decided that storing it as a list would be convenient.
>> Yes pickling also might work, but ideally I would like to generate 400
>> columns or store it as a list. I intend to share the data with people
>> coding in Java but a sqlite db seemed more convenient than a flatfile
>> (to explore the data or to verify parts of it or correct it).
>
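
For comparison, the two-table layout suggested in the quoted reply above
could be sketched roughly as follows, again with the plain sqlite3 module.
The "pos" column is an assumption that is not in the original suggestion;
it is added here only so the order of each list can be recovered. The
feature values are dummy data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # use a file path instead to share the db
conn.executescript("""
    CREATE TABLE main (pri_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE marks (
        pri_id INTEGER REFERENCES main(pri_id),
        pos    INTEGER,   -- assumed column: position of the value in the list
        mark   REAL,
        PRIMARY KEY (pri_id, pos)
    );
""")

people = {1: "John", 2: "mary", 3: "greg"}
# Dummy data: 400 values per person, all equal to the person's id.
features = {pid: [float(pid)] * 400 for pid in people}

conn.executemany("INSERT INTO main VALUES (?, ?)", list(people.items()))
conn.executemany(
    "INSERT INTO marks VALUES (?, ?, ?)",
    [
        (pid, pos, value)
        for pid, values in features.items()
        for pos, value in enumerate(values)
    ],
)
conn.commit()

# Rebuild one person's list in its original order.
greg = [
    mark
    for (mark,) in conn.execute(
        "SELECT mark FROM marks WHERE pri_id = ? ORDER BY pos", (3,)
    )
]

# Count, per person, how many values exceed a threshold.
high = conn.execute(
    "SELECT pri_id, COUNT(*) FROM marks WHERE mark > 2.5 GROUP BY pri_id"
).fetchall()
```

This trades one wide row per person for 400 narrow rows, so no column
names need to be generated, but a query that compares specific features
has to filter on "pos" rather than on a column name.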