extract info from web pages
Derek Broughton
news at pointerstop.ca
Thu Mar 22 12:21:53 UTC 2007
Dimitri Mallis wrote:
> No, it's for a university 3rd-year computer science project.
> I thought wget only downloads the whole website, as in it makes a mirror of
> it on my hard drive, which isn't exactly what I want to do, but I'll man wget
> in case you are talking about something else.
>
> I was hoping for some script where I could type the URL and the keywords, and it
> would extract the information into a new page on my hard drive...
If it's well-formed XHTML (unlikely, few pages are even well-formed HTML)
then you can use XSLT or various XML parsers to extract the info.
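For example, something like this in Python would do roughly what you describe,
assuming the page really parses as XML; the URL and keywords below are just
placeholders:

    # Minimal sketch: fetch a page, parse it as XML, pull out the text of
    # any element containing one of the keywords, and write the matches
    # to a new HTML file on disk.
    import urllib.request
    import xml.etree.ElementTree as ET

    url = "http://example.com/page.xhtml"   # placeholder URL
    keywords = ["ubuntu", "linux"]           # placeholder keywords

    with urllib.request.urlopen(url) as resp:
        xhtml = resp.read()

    # Raises ParseError if the page is not well-formed XML/XHTML
    root = ET.fromstring(xhtml)

    matches = []
    for elem in root.iter():
        text = (elem.text or "").strip()
        if text and any(k.lower() in text.lower() for k in keywords):
            matches.append(text)

    # Write the extracted snippets to a new page on the hard drive
    with open("extracted.html", "w", encoding="utf-8") as out:
        out.write("<html><body>\n")
        for m in matches:
            out.write("<p>{}</p>\n".format(m))
        out.write("</body></html>\n")

If the page isn't valid XML, ET.fromstring will simply refuse it, which is
why a forgiving HTML parser is usually needed for real-world pages.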
--
derek