how to find out dead links
Derek Broughton
derek at pointerstop.ca
Mon Nov 16 02:35:35 UTC 2009
Loïc Grenié wrote:
> 2009/11/14 Derek Broughton <derek at pointerstop.ca>:
>> Loïc Grenié wrote:
>>
>>> 2009/11/14 Eugeneapolinary Ju <eugeneapolinary81 at yahoo.com>:
>>>> wget -r -p -U Firefox "http://www.somesite.com/" 2>&1 | grep 404 > 404.txt
>>>>
>>>>
>>>> how come 404.txt is 0 bytes? How do I get the STDOUT into a file with wget?
>>>
>>> Have you tried
>>>
>>> wget -r -p -U Firefox "http://www.somesite.com/"
>>>
>>> There is no 404 message (at least here). To be more precise, there is
>>> no 404 message because there is no web server there to send one.
>>> A web page can fail to load for (at least) three different reasons:
>>
>> I imagine that "somesite.com" was an example, likely because his actual
>> site isn't accessible to the Internet.
>>
>> The real problem is:
>>
>> $ wget http://localhost/test.htm
>> --2009-11-14 10:43:23-- http://localhost/test.htm
>> Resolving localhost... 127.0.0.1, ::1
>> Connecting to localhost|127.0.0.1|:80... connected.
>> HTTP request sent, awaiting response... 404 Not Found
>> 2009-11-14 10:43:23 ERROR 404: Not Found.
>>
>>
>> In this case, 404 is ONLY a status, and not a page.
>
> Of course, but the status is delivered by a web server.
> We'll need a better understanding of what the original poster
> wants: to detect non-existent sites, or non-existent pages
> on an existing site (or both).
Why does it matter? My point is that even when the site exists, you _still_ 
won't get a page back for a dead link. So you need to be checking the server 
responses, not the contents of a page.
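
For what it's worth, here is a rough sketch of one way to do that with wget 
alone (the host is just the placeholder from the original post, and the log 
file name is my own choice). wget's -o option writes its log to a file, and 
--spider makes it check the links without saving the pages, so you can grep 
the log for the 404 responses afterwards:

  # crawl the site without saving pages, logging everything wget reports
  wget -r -p --spider -U Firefox -o wget.log "http://www.somesite.com/"

  # each failed request ends in "ERROR 404: Not Found"; keep a few lines
  # of context so you can see which URL produced the error
  grep -B 4 'ERROR 404' wget.log > 404.txt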
--
derek