Scripting / one liner help
Patton Echols
p.echols at comcast.net
Wed Aug 10 16:29:59 UTC 2011
I am looking for thoughts on how I might extract image names from an
html document.
The document started as a Word document with nothing but images, one per
page, randomly named. It was saved as html using LibreOffice, so I now
have the images as separate files. I have a script that will process them
through ImageMagick to clean them up, reduce them from full color to b/w,
and make them into a PDF. But the pages are out of order because the
images are randomly named.
What I'd like to do is have something read the html file in order and
either feed the names of the JPGs to the script in order or just spit
them out to a file that I can feed to the script. The html source has
all the images listed sequentially without line breaks. Each tag is the
same except for the image name and looks like this:
<IMG SRC="source_html_m1463afff.jpg" NAME="graphics3" ALIGN=BOTTOM
WIDTH=575 HEIGHT=790 BORDER=0>
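To make the goal concrete, here is a rough sketch of the kind of thing I
have in mind, written in Python using only the standard library's
html.parser module. It is one possible approach, not necessarily the best
one. The names list_images.py, source.html and order.txt are made up for
illustration, and the sketch assumes every image appears as an IMG tag
with a SRC attribute like the one shown above.

```python
#!/usr/bin/env python3
"""Print the SRC of every IMG tag in an HTML file, in document order."""
import sys
from html.parser import HTMLParser


class ImgSrcCollector(HTMLParser):
    """Collects the src attribute of each img tag as it is encountered."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <IMG SRC=...>
        # arrives here as tag "img" with an attribute named "src".
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.sources.append(value)


def image_names(html_text):
    """Return the image file names in the order they appear in the HTML."""
    parser = ImgSrcCollector()
    parser.feed(html_text)
    parser.close()
    return parser.sources


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python3 list_images.py source.html > order.txt
    # errors="replace" is an assumption to tolerate an unknown file encoding.
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        for name in image_names(fh.read()):
            print(name)
```

This prints one file name per line, in page order, so the output could
either be redirected to a file or read line by line by the existing
script. Because it parses the tags rather than matching text, it should
not matter that the tags run together without line breaks.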
Thanks for any thoughts.