Off-Topic: Parse an html file and transfer the text found

John Toliver john.toliver at gmail.com
Thu Aug 7 02:17:07 UTC 2008


On Wed, Aug 6, 2008 at 19:37, Derek Broughton <news at pointerstop.ca> wrote:
> NoOp wrote:
>> Have you tried opening the html files in Calc (OpenOffice.org)? Give it
>> a try; you may find that the files are structured sufficiently to parse
>> the drug names in an orderly fashion & then use that spreadsheet to
>> directly create a Base (OOo) database.
I will try the calc idea first just because it's intriguing.

With the help of some syntax highlighting I can see the raw data, so
the actual text really is there in the html file.  I was just thinking
that I could write some combination of regular expressions that looks
for the brackets used by the javascript or html, use that to "see" the
text, and then store it in a neat position in a file to be grabbed
later.
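
To make the idea concrete, here is a rough sketch of what I mean, in
Python only because it is short to write down.  Everything in it is an
assumption on my part: the sample markup is made up, the function names
are mine, and it assumes the text I want sits between tags and that
nothing inside script or style blocks is worth keeping.  Regular
expressions are fragile on messy html, so treat it as a starting point
and not as the right way to do it.

```python
import html
import re

# Assumption: script and style blocks contain no text worth keeping.
SCRIPT_OR_STYLE = re.compile(
    r"<(script|style)\b.*?</\1\s*>", re.DOTALL | re.IGNORECASE
)
# Anything between angle brackets is treated as a tag.
TAG = re.compile(r"<[^>]+>")


def extract_text(markup):
    """Return the text chunks found between tags, in document order."""
    without_code = SCRIPT_OR_STYLE.sub(" ", markup)
    # Turn every tag into a newline so text from separate elements
    # ends up on separate lines.
    separated = TAG.sub("\n", without_code)
    chunks = []
    for line in separated.splitlines():
        # Decode entities such as &amp; and drop surrounding whitespace.
        text = html.unescape(line).strip()
        if text:
            chunks.append(text)
    return chunks


def save_text(chunks, path):
    """Write one chunk per line so the file is easy to read back later."""
    with open(path, "w", encoding="utf-8") as out:
        out.write("\n".join(chunks) + "\n")


# Hypothetical sample; the real files will look different.
sample = (
    "<html><body><script>var x = '<b>';</script>"
    "<table><tr><td>Aspirin</td><td>Ibuprofen &amp; Co</td></tr></table>"
    "</body></html>"
)

print(extract_text(sample))  # ['Aspirin', 'Ibuprofen & Co']
```

Calling save_text(extract_text(sample), "names.txt") would then put one
entry per line in a file (the file name is just an example).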

As for religious wars, hey, I like a good constructive argument as
much as anyone, but the last thing I had in mind was ticking off coders
from opposing camps over javascript vs. perl.  Honestly, I think the
better you understand languages in general, the more you see that some
languages are just better for certain things while others are better
for others.  They are all 'tools', aren't they?  I'll admit I'd love to
find one language that does it all, but I don't think that exists...
anyway...

I'll let you know what I come up with in trying to open it in calc,
and then afterwards, perhaps a NICE PEACEFUL consideration of the pros
/ cons of using one over the other.


Thanks all for the responses.  The group is like a Swiss Army knife
for getting help with different problems.

As for top posting, sorry.  Gmail puts replies on top and sometimes I
forget to move mine before I send.
-- 

I've discovered the key to success is to never give up. You either
learn the right way, or you run out of ways to do it wrong. A win/win
situation!




More information about the ubuntu-users mailing list