Scripting / one-liner help

Wed Aug 10 18:42:36 UTC 2011

On 08/10/2011 09:47 AM, Hal Burgiss wrote:
> On Wed, Aug 10, 2011 at 12:29 PM, Patton Echols <p.echols at comcast.net 
> <mailto:p.echols at comcast.net>> wrote:
>
>     I am looking for thoughts on how I might extract image names from
>     an HTML document.
>
>     The document started as a Word document with nothing but images,
>     one per page, randomly named.  It was saved as HTML using
>     LibreOffice, so I now have the images as separate files.  I have a
>     script that will process them through ImageMagick to clean them up,
>     reduce them from full color to b/w and make them into a PDF.  But
>     the pages are out of order because the images are randomly named.
>
>     What I'd like to do is have something read the HTML file in order
>     and either feed the names of the JPGs to the script in order or
>     just spit them out to a file that I can feed to the script.  The
>     HTML source has all the images listed sequentially without line
>     breaks.  Each tag is the same except for the image name and looks
>     like this:
>     <IMG SRC="source_html_m1463afff.jpg" NAME="graphics3" ALIGN=BOTTOM
>     WIDTH=575 HEIGHT=790 BORDER=0>
>
>
> See if this gets close to extracting the image names ...
>
> grep SRC *html | sed -r 's/SRC="([^"]+)"/\1/ig' | whatever_script.sh
>
>

Thanks Hal,

My script starts with "for i in *jpg" and then works on each file
individually.  So I tried that line without the pipe to
whatever_script.sh, hoping for a list of file names to be printed to the
terminal.  Instead it seemed to print the whole string of tags, just
without the double quotes around the image names.  Is that what it
should have done?

Thanks
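That output is what the one-liner produces. grep prints every whole line that contains SRC, and because the exported HTML keeps all the tags on a single line, that line is the entire string of tags. sed's s command then replaces only each matched SRC="..." with the captured name, so the rest of the line passes through unchanged apart from the removed quotes. Below is a minimal sketch of a variant that prints only the names, one per line, in document order. It substitutes grep -o (supported by GNU and BSD grep, though not required by POSIX) plus cut for the sed step. It also assumes every tag carries its file name in a double-quoted SRC attribute, as in the sample tag above. The function name extract_image_names and the file names in the usage comments are hypothetical.

```shell
#!/bin/sh
# Sketch: print the image file names from an HTML file, one per line,
# in the order the tags appear. Assumes each tag has SRC="name".

extract_image_names() {
    # -o prints only the matched text, one match per line, even when all
    #    the tags sit on one line; -i matches SRC or src; -E enables ERE.
    # cut keeps the text between the first and second double quote,
    # i.e. the file name itself.
    grep -oiE 'src="[^"]+"' "$1" | cut -d'"' -f2
}

# Usage (hypothetical file names):
#   extract_image_names source.html > image_list.txt
#   while IFS= read -r i; do
#       echo "processing $i"    # the existing per-image commands go here
#   done < image_list.txt
```

Reading the list with while IFS= read -r, rather than looping over *jpg, keeps the page order from the HTML file and also preserves file names that contain spaces. The body of the existing "for i in *jpg" loop can go inside it unchanged, since $i still holds one file name per pass.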