extract info from web pages

Derek Broughton news at pointerstop.ca
Thu Mar 22 12:21:53 UTC 2007


Dimitri Mallis wrote:

> No, it's for a university third-year computer science project.
> I thought wget only downloads the whole website, as in it makes a mirror of
> it on my hard drive, which I don't want to do exactly, but I'll man wget in
> case you are talking about something else.
> 
> I was hoping for some script where I could type the URL and the keywords, and
> it would extract the information into a new page on my hard drive...

If it's well-formed XHTML (unlikely; few pages are even well-formed HTML),
then you can use XSLT or various XML parsers to extract the info.
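
For example, here's a rough Python sketch of the XML-parser approach. It
assumes the page really is well-formed XHTML (a strict parser will reject
anything else), and the URL, keywords, and output filename are placeholders,
not anything from this thread:

    #!/usr/bin/env python3
    # Minimal sketch: pull paragraphs containing given keywords from a
    # well-formed XHTML page and write them to a new page on disk.
    import html
    import sys
    import urllib.request
    import xml.etree.ElementTree as ET

    XHTML = "{http://www.w3.org/1999/xhtml}"   # namespace used by XHTML

    def extract(url, keywords, outfile="extracted.html"):
        # Fetch the page and parse it with a strict XML parser;
        # this raises ParseError on anything that isn't well-formed.
        with urllib.request.urlopen(url) as resp:
            tree = ET.parse(resp)

        # Keep every <p> whose text mentions one of the keywords.
        matches = []
        for p in tree.iter(XHTML + "p"):
            text = "".join(p.itertext())
            if any(k.lower() in text.lower() for k in keywords):
                matches.append(text)

        # Write the matching paragraphs to a new page on the hard drive.
        with open(outfile, "w", encoding="utf-8") as out:
            out.write("<html><body>\n")
            for text in matches:
                out.write("<p>%s</p>\n" % html.escape(text))
            out.write("</body></html>\n")

    if __name__ == "__main__":
        # usage: extract.py <url> <keyword> [keyword ...]
        extract(sys.argv[1], sys.argv[2:])

For real-world pages that aren't well-formed, a forgiving HTML parser
(e.g. BeautifulSoup or lxml.html) would be the usual substitute.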
-- 
derek
