Off-Topic: Parse an html file and transfer the text found

NoOp glgxg at sbcglobal.net
Wed Aug 6 16:16:04 UTC 2008


On 08/05/2008 07:35 PM, John Toliver wrote:
> I have a CD which came with a textbook I use for school.  The CD is a
> list of commonly prescribed drugs.  I am entering these drugs one by
> one into a database I've created.  Since the files are HTML files
> loaded in a number of ways via what looks like JavaScript, I was
> thinking that I could write a script in some language, maybe Perl or
> Python, to parse the HTML files and transfer the information into
> the proper fields in my hsqldb database.
> 
> So my question to start is which language should I use to pull the
> data out of an html file?  Is perl better for this application, or is
> python better or some other language?
> 
> I'm probably going to need to brush up on my regular expressions for
> this but that's ok too.
> 
> Any thoughts would be appreciated...
> 

Have you tried opening the HTML files in OpenOffice.org Calc? Give it
a try; you may find that the files are structured well enough for Calc
to import the drug names in an orderly fashion. You could then use that
spreadsheet to create an OpenOffice.org Base database directly.
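
If Calc does not import the files cleanly, a short script can do the
same job and produce a CSV file that Calc or Base can open. The
following is only a minimal sketch in Python, using the standard
library's html.parser and csv modules. It assumes the drug data sits in
ordinary <table> rows, which may not match the files on the CD. The
names TableRowParser, html_to_rows, rows_to_csv and the SAMPLE markup
are made up for illustration.

```python
import csv
import io
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            if self._cell is not None and self._row is not None:
                # Join fragments and collapse runs of whitespace.
                self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr":
            if self._row:
                self.rows.append(self._row)
            self._row = None
            self._cell = None


def html_to_rows(html_text):
    """Return the table rows found in html_text as lists of strings."""
    parser = TableRowParser()
    parser.feed(html_text)
    parser.close()
    return parser.rows


def rows_to_csv(rows):
    """Format rows as CSV text that a spreadsheet can import."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()


# Hypothetical markup; the real files on the CD may be laid out differently.
SAMPLE = """
<table>
  <tr><th>Name</th><th>Class</th></tr>
  <tr><td>Drug A</td><td>Example class 1</td></tr>
  <tr><td>Drug  B</td><td>Example class 2</td></tr>
</table>
"""

if __name__ == "__main__":
    print(rows_to_csv(html_to_rows(SAMPLE)), end="")
```

Using a real HTML parser here rather than regular expressions tends to
cope better with nested tags and odd whitespace. If the pages build
their content with JavaScript, the data may live in the script source
instead of in table markup, and this approach would need adjusting.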






More information about the ubuntu-users mailing list