Question about wget

Preston Hagar prestonh at gmail.com
Mon Oct 25 16:01:31 UTC 2010


On Sat, Oct 23, 2010 at 8:53 PM, Anthony Papillion <papillion at gmail.com> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Hello Everyone,
>
> I know this doesn't specifically have to do with Ubuntu but I hope
> someone here can offer their Kung-foo to help me solve this.
>
> I need to grab all images from a specific category on Craigslist (not
> used for spam but rather research). The category is located at
>
> http://tulsa.craigslist.com/m4w
>
> When I browse the category with my browser, there are images there. Yet
> when I issue the command
>
> wget -r -l2 --no-parent -A.jpg http://tulsa.craigslist.com/m4w
>
> The result is what looks like the entire folder structure of the site
> but all in empty folders.
>
> Am I doing something wrong?
> Can anyone offer help on how I might structure this?
>
> Thanks,
> Anthony


It doesn't work because Craigslist uses JavaScript to load its
images with CSS, so there aren't "normal" image tags.  Wget just sees
the text-only, non-CSS version of the site, so it never finds any
image links to follow.  Take off the -A.jpg and look at the
file downloaded and you will see what I am talking about.
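
If you want a quick way to confirm that, here is a minimal sketch. It
assumes you have saved a single page to a local file; the names
count_img_tags and listing.html are made up for illustration, and the
pattern only looks for literal <img> tags in the HTML that wget
received, nothing that a browser would add later.

```shell
# Count literal <img ...> tags in a saved HTML file.
# count_img_tags is a hypothetical helper name, not part of wget.
count_img_tags() {
    # grep -o prints every match on its own line, -i ignores case;
    # wc -l counts the matches and tr strips any padding from wc.
    grep -o -i '<img[^>]*>' "$1" | wc -l | tr -d ' '
}

# Typical use, after fetching one page without the -A filter:
#   wget -O listing.html http://tulsa.craigslist.com/m4w
#   count_img_tags listing.html
```

A count of 0 on the saved page would be consistent with what I
described: with no image links in the HTML, -A.jpg has nothing to keep.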

That said, even if this is to be used "for research", you would be
violating Section 7.u of Craigslist's terms of service:

http://www.craigslist.org/about/terms.of.use

and could owe them $3000 USD for each day you use your script per
Section 19.f of their Terms of Service.

I am not a lawyer, though, nor am I in any way affiliated with Craigslist.

Preston




More information about the ubuntu-users mailing list