Question about wget
Bill
beau at billbeau.net
Mon Oct 25 19:44:22 UTC 2010
On 10/25/2010 9:01 AM, Preston Hagar wrote:
> On Sat, Oct 23, 2010 at 8:53 PM, Anthony Papillion<papillion at gmail.com> wrote:
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA1
>>
>> Hello Everyone,
>>
>> I know this doesn't specifically have to do with Ubuntu but I hope
>> someone here can offer their Kung-foo to help me solve this.
>>
>> I need to grab all images from a specific category on Craigslist (not
>> used for spam but rather research). The category is located at
>>
>> http://tulsa.craigslist.com/m4w
>>
>> When I browse the category with my browser, there are images there. Yet
>> when I issue the command
>>
>> wget -r -l2 --no-parent -A.jpg http://tulsa.craigslist.com/m4w
>>
>> The result is what looks like the entire folder structure of the site
>> but all in empty folders.
>>
>> Am I doing something wrong?
>> Can anyone offer help on how I might structure this?
>>
>> Thanks,
>> Anthony
>
> It doesn't work because Craig's List uses JavaScript and CSS to load
> their images, so there aren't "normal" image tags. Wget just sees
> the text-only, non-CSS version of the site and never finds any image
> tags in it. Take off the -A.jpg, look at the file that gets
> downloaded, and you will see what I am talking about.
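To follow Preston's suggestion of inspecting the downloaded page, a small shell helper can list whatever `<img>` tags wget actually received. This is a rough sketch, not a real HTML parser. It assumes each `<img>` tag fits on a single line and that `src` uses double quotes. The function name `list_img_srcs` and the file name `page.html` are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the src value of every <img> tag in a saved HTML file.
# Rough sketch: assumes each <img> tag is on one line and src="..." uses double quotes.
list_img_srcs() {
    # grep -o prints each <img ...> tag it finds on its own line;
    # sed -n '...p' prints only the text inside src="..." for each tag.
    grep -o '<img[^>]*>' "$1" | sed -n 's/.*src="\([^"]*\)".*/\1/p'
}
```

For example, save the page without `-A.jpg` using `wget -O page.html http://tulsa.craigslist.com/m4w`, then run `list_img_srcs page.html`. Empty output means wget saw no image tags, which is consistent with the explanation above. If the output lists URLs on a different host (such as images.craigslist.org), keep in mind that wget's recursive mode stays on the starting host unless `-H` (`--span-hosts`) is given.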
>
> That said, even if this is to be used "for research", you are
> violating Section 7.u of Craig's List terms of service:
>
> http://www.craigslist.org/about/terms.of.use
>
> and could owe them $3000 USD for each day you use your script per
> Section 19.f of their Terms of Service.
>
> I am not a lawyer, though, nor am I in any way affiliated with Craigslist.
>
> Preston
>
Why wouldn't you just use the URL for the images directly, like
http://images.craigslist.org/*.jpg ?
More information about the ubuntu-users mailing list