How to do a bulk download of password-protected data?

Peter Flynn peter at silmaril.ie
Thu May 10 21:36:01 UTC 2018


On 10/05/18 17:27, Kevin O'Gorman wrote:
> I've had to do this manually many times.  If it weren't for the 
> password-protection, it would be easy with wget or rsync.  As it is, I 
> have to log in to whatever site contains the data with my web browser, 
> then click on each and every one of what is sometimes over one hundred 
> different links.
> 
> I understand that wget has some capability of sending cookies, and that 
> that's how password-related credentials are presented, but I do not know 
> how to set it up.  Are there any howtos, youtubes, or the like about this?

If the site uses HTTP authentication (the kind where the browser pops 
up a username/password dialog), wget can send the credentials in the 
URL in the normal way. It shouldn't need to do anything with cookies.

      wget http://username:password@site.com/dir/file
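If it turns out the site uses a form login with a session cookie rather 
than HTTP authentication, wget can cope with that too: log in once, 
save the cookie jar, and replay it on every fetch. A minimal sketch -- 
the login URL and the form field names (user/pass) here are 
assumptions; view the source of the real login page to find the actual 
ones:

```shell
# Hypothetical login URL and field names -- inspect the site's real
# <form> and adjust. --keep-session-cookies also saves cookies that
# would otherwise be discarded when the "browser" closes.
wget --save-cookies cookies.txt --keep-session-cookies \
     --post-data 'user=USERNAME&pass=PASSWORD' \
     -O /dev/null http://site.com/login

# Later fetches present the saved cookie instead of credentials:
wget --load-cookies cookies.txt http://site.com/dir/file
```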

So start at the top of your site with whatever index page it uses, 
extract all the link filenames with the tools of your choice, and loop 
over them with wget:

      wget -O - http://username:password@site.com/dir |\
          tidy5 -i -n -c -asxml - |\
          lxprintf -e a "%s\n" @href | while read -r file; do \
              wget "http://username:password@site.com/dir/$file"; \
          done

Depending on how the link targets are written you may need to do more 
massaging (stripping leading paths, URL-decoding) to get filenames in a 
format you can append to a URI.

tidy5 is the HTML5 version of HTML Tidy, from W3C
lxprintf is part of the LTxml2 toolkit from Edinburgh
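If tidy5 or lxprintf are not installed, a rough fallback for simple 
machine-generated index pages is grep and sed. This is not a real HTML 
parser and will break on less regular markup; the sample variable here 
stands in for the real index page so the loop can be seen working:

```shell
# Sample listing; in practice replace the printf with
#   wget -O - http://username:password@site.com/dir
page='<a href="data1.csv">one</a> <a href="data2.csv">two</a>'

# grep pulls out each href="..." attribute; sed strips the wrapper,
# leaving just the filename, one per line.
printf '%s\n' "$page" |
    grep -o 'href="[^"]*"' |
    sed 's/^href="//; s/"$//' |
    while read -r file; do
        # Dry run: swap the echo for
        #   wget "http://username:password@site.com/dir/$file"
        echo "$file"
    done
```

Run as-is it prints the two filenames, data1.csv and data2.csv, one per 
line.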

///Peter



