How to do a bulk download of password-protected data?
Peter Flynn
peter at silmaril.ie
Thu May 10 21:36:01 UTC 2018
On 10/05/18 17:27, Kevin O'Gorman wrote:
> I've had to do this manually many times. If it weren't for the
> password-protection, it would be easy with wget or rsync. As it is, I
> have to log in to whatever site contains the data with my web browser,
> then click on each and every one of what is sometimes over one hundred
> different links.
>
> I understand that wget has some capability of sending cookies, and that
> that's how password-related credentials are presented, but I do not know
> how to set it up. Are there any howtos, youtubes, or the like about this?
wget can send an HTTP username and password in the URL in the normal way,
so it shouldn't need to do anything with cookies (cookies only come into
it if the site uses a form-based login rather than HTTP authentication).
wget http://username:password@site.com/dir/file
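One catch with putting the password in the URL: if it contains reserved
characters such as '@' or ':', it must be percent-encoded first or wget
will mis-parse the URL. A minimal POSIX-shell sketch (urlencode is my own
helper name here, not a standard tool):

```shell
# Percent-encode a string so it can safely appear in the userinfo part
# of a URL.  Unreserved characters pass through; everything else becomes
# %XX.  ('urlencode' is a hypothetical helper, not part of wget.)
urlencode() {
  s=$1 out=
  while [ -n "$s" ]; do
    rest=${s#?}            # the string minus its first character
    c=${s%"$rest"}         # the first character itself
    case $c in
      [a-zA-Z0-9._~-]) out="$out$c" ;;
      *) out="$out$(printf '%%%02X' "'$c")" ;;  # char -> %XX
    esac
    s=$rest
  done
  printf '%s\n' "$out"
}

# e.g.  wget "http://username:$(urlencode 'p@ss:w0rd')@site.com/dir/file"
```

Alternatively, wget's --user and --password options (or --ask-password
to prompt interactively) sidestep the encoding problem entirely.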
So start at the top of your site with whatever index page it uses, use
the tools of your choice to extract all the link filenames, and then loop
over them with wget:
wget -O - http://username:password@site.com/dir |\
  tidy5 -i -n -c -asxml - |\
  lxprintf -e a "%s\n" . |\
  while read -r file; do
    wget "http://username:password@site.com/dir/$file"
  done
Depending on how the names are exposed you may need to do more massaging
to get filenames in a format you can append to a URI.
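If tidy5 and lxprintf aren't to hand, a cruder sed filter will often do
for a plain directory-listing page, pulling the href value out of each
anchor (extract_hrefs is just an illustrative wrapper; it assumes at most
one link per line and double-quoted attributes):

```shell
# Crude href extractor: keeps only the href="..." value from lines
# containing an <a> tag.  Fragile against minified or unquoted HTML.
extract_hrefs() {
  sed -n 's/.*<a [^>]*href="\([^"]*\)".*/\1/p'
}

# e.g.  wget -O - http://username:password@site.com/dir | extract_hrefs
```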
tidy5 is the HTML5 version of HTML Tidy, from W3C.
lxprintf is part of the LTxml2 toolkit from Edinburgh.
///Peter
More information about the ubuntu-users
mailing list