Off-Topic: Parse an html file and transfer the text found
Derek Broughton
news at pointerstop.ca
Wed Aug 6 14:37:14 UTC 2008
John Toliver wrote:
> So my question to start is which language should I use to pull the
> data out of an html file? Is perl better for this application, or is
> python better or some other language?
Yes :-) Any language you're comfortable with that has tools for parsing
HTML would be good. If the files are guaranteed to be valid XHTML, you
probably have even more choices, but Perl or Python should certainly be
fine, and I'd use Python.
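
As a rough illustration of the Python route, here is a minimal sketch that
uses only the standard library's html.parser module to collect the visible
text of a page. The TextExtractor class, the extract_text helper and the
sample markup are all made up for this example; real files will need their
own rules about which text counts as data.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.chunks = []        # text fragments in document order
        self._skip_depth = 0    # > 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.chunks.append(text)


def extract_text(html):
    """Return the non-empty text fragments found in an HTML string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return parser.chunks


# Hypothetical sample input, not taken from the original poster's files.
sample = (
    "<html><head><title>Report</title>"
    "<style>p {color: red}</style></head>"
    "<body><h1>Totals</h1><p>Apples: <b>3</b></p></body></html>"
)
print(extract_text(sample))
```

This parser is tolerant of sloppy markup such as unclosed tags, which is
the main reason to prefer it over regular expressions for ordinary HTML.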
>
> I'm probably going to need to brush up on my regular expressions for
> this but that's ok too.
That's why it's easier if they're XHTML: the files should then parse with
an XML parser, which makes it really easy to extract the meaningful data
rather than picking it out with regular expressions.
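
To show what the XML-parser route might look like, here is a small sketch
using Python's standard xml.etree.ElementTree. It assumes the files are
well-formed XHTML in the usual XHTML namespace and that the data of
interest sits in table cells; both the table_rows helper and the sample
document are hypothetical.

```python
import xml.etree.ElementTree as ET

# XHTML elements live in this namespace; "x" is just a local prefix.
XHTML_NS = {"x": "http://www.w3.org/1999/xhtml"}


def table_rows(xhtml):
    """Return the text of every table row as a list of cell strings."""
    root = ET.fromstring(xhtml)
    rows = []
    for tr in root.iterfind(".//x:tr", XHTML_NS):
        # itertext() gathers text from nested tags such as <b> as well.
        cells = ["".join(td.itertext()).strip()
                 for td in tr.findall("x:td", XHTML_NS)]
        rows.append(cells)
    return rows


# Hypothetical sample input, not taken from the original poster's files.
sample = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <table>
      <tr><td>Apples</td><td>3</td></tr>
      <tr><td>Pears</td><td><b>5</b></td></tr>
    </table>
  </body>
</html>"""
print(table_rows(sample))
```

One caveat: ElementTree raises a ParseError on anything that is not
well-formed XML, and that includes HTML-only entities such as &nbsp; when
no DTD is loaded, so this only works if the files really are clean XHTML.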
--
derek
More information about the ubuntu-users
mailing list