Off-Topic: Parse an html file and transfer the text found
NoOp
glgxg at sbcglobal.net
Wed Aug 6 16:16:04 UTC 2008
On 08/05/2008 07:35 PM, John Toliver wrote:
> I have a CD which came with a textbook I use for school. The CD is a
> list of commonly prescribed drugs. I am entering these drugs one by
> one into a database I've created. I thought about it and since the
> files are html files called in a number of ways via what looks like
> javascript, I was thinking that I could build a script using some
> language, maybe Perl or Python, and program it to parse the html file,
> transfer the data to my hsqldb, and place the information into the
> proper fields in the database.
>
> So my question to start is which language should I use to pull the
> data out of an html file? Is perl better for this application, or is
> python better or some other language?
>
> I'm probably going to need to brush up on my regular expressions for
> this but that's ok too.
>
> Any thoughts would be appreciated...
>
Have you tried opening the html files in Calc (OpenOffice.org)? Give it
a try; you may find that the files are structured well enough for the
drug names to land in orderly rows and columns. You could then use that
spreadsheet to create a Base (OOo) database directly.
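
If Calc does not cope with the files and you end up scripting it after
all, here is a rough Python sketch using only the standard library's
html.parser module rather than regular expressions. It assumes the drug
data sits in ordinary table rows (<tr> with <td>/<th> cells), which I
have not verified against your CD. The sample markup, the class name
TableTextParser and the helpers html_to_rows and rows_to_csv are all
made up for illustration. It writes CSV, which Calc or Base can import;
loading into hsqldb itself is left out.

```python
import csv
import io
from html.parser import HTMLParser


class TableTextParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row being read, or None
        self._cell = None   # text fragments of the cell being read, or None

    def _finish_cell(self):
        # Store the current cell (if any) with whitespace runs collapsed.
        if self._cell is not None and self._row is not None:
            self._row.append(" ".join("".join(self._cell).split()))
        self._cell = None

    def _finish_row(self):
        # Store the current row (if any), skipping rows without cells.
        self._finish_cell()
        if self._row:
            self.rows.append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._finish_row()      # tolerate a missing </tr>
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._finish_cell()     # tolerate a missing </td>
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._finish_cell()
        elif tag == "tr":
            self._finish_row()


def html_to_rows(html_text):
    """Return the table rows found in html_text as lists of strings."""
    parser = TableTextParser()
    parser.feed(html_text)
    parser.close()
    return parser.rows


def rows_to_csv(rows):
    """Render rows as CSV text suitable for a spreadsheet import."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()


# Hypothetical sample; the real files on the CD may look quite different.
SAMPLE = """
<table>
  <tr><th>Name</th><th>Class</th></tr>
  <tr><td>Example Drug A</td><td>Example  class 1</td></tr>
  <tr><td>Example Drug B</td><td>Example class 2</td></tr>
</table>
"""

if __name__ == "__main__":
    print(rows_to_csv(html_to_rows(SAMPLE)), end="")
```

If the pages are built by javascript rather than stored as plain
tables, this will not find much, and you would need to look at where
the script gets its data from instead.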
More information about the ubuntu-users
mailing list