Scripting / one liner help

Patton Echols p.echols at comcast.net
Wed Aug 10 16:29:59 UTC 2011


I am looking for thoughts on how I might extract image names from an 
html document.

The document started as a Word document with nothing but images, one per 
page, randomly named.  It was saved as HTML using LibreOffice, so I now 
have the images as separate files.  I have a script that will process 
them through ImageMagick to clean them up, reduce them from full color 
to b/w, and make them into a PDF.  But the pages are out of order because 
the images are randomly named.

What I'd like to do is have something read the html file in order and 
either feed the names of the JPGs to the script in order or just spit 
them out to a file that I can feed to the script.  The html source has 
all the images listed sequentially without line breaks.  Each tag is the 
same except for the image name and looks like this:
<IMG SRC="source_html_m1463afff.jpg" NAME="graphics3" ALIGN=BOTTOM 
WIDTH=575 HEIGHT=790 BORDER=0>
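
To make the goal concrete, here is a minimal sketch of the kind of 
extraction described above, written in Python rather than as a shell 
one-liner.  It assumes every image appears as an IMG tag with a 
double-quoted SRC attribute, as in the sample tag.  The function name 
image_names and the file names in the usage comment are hypothetical.

```python
import re
import sys

# Capture the SRC value of each IMG tag. Matching is case-insensitive
# because the export writes tags in upper case, and [^>] also spans
# newlines, so a tag wrapped across two lines is still matched.
IMG_SRC = re.compile(r'<img\b[^>]*?\bsrc\s*=\s*"([^"]+)"', re.IGNORECASE)

def image_names(html):
    """Return the SRC values of all IMG tags, in document order."""
    return IMG_SRC.findall(html)

# Hypothetical usage: python3 imgnames.py source.html > order.txt
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        for name in image_names(f.read()):
            print(name)
```

The output is one file name per line, in the order the tags occur in the 
HTML, which could then be fed to the existing conversion script.  A 
regular expression is enough here only because the tags are uniform; an 
HTML parser would be the safer choice for less regular markup.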

Thanks for any thoughts.




More information about the ubuntu-users mailing list