Scripting / one liner help

Hal Burgiss hal at
Wed Aug 10 16:47:51 UTC 2011

On Wed, Aug 10, 2011 at 12:29 PM, Patton Echols <p.echols at> wrote:

> I am looking for thoughts on how I might extract image names from an html
> document.
> The document started as a Word document with nothing but images, one per
> page, randomly named.  It was saved as html using LibreOffice, so I now
> have the images separate.  I have a script that will process them through
> ImageMagick to clean them up, reduce them from full color to b/w and make them
> into a pdf.  But the pages are out of order because the images are randomly
> named.
> What I'd like to do is have something read the html file in order and
> either feed the names of the JPGs to the script in order or just spit them
> out to a file that I can feed to the script.  The html source has all the
> images listed sequentially without line breaks.  Each tag is the same except
> for the image name and looks like this:
> <IMG SRC="source_html_m1463afff.jpg" NAME="graphics3" ALIGN=BOTTOM

See if this gets close to extracting the image names ...

grep SRC *html | sed -r 's/SRC="([^"]+)"/\1/ig' |
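
The pipeline as archived stops at a dangling pipe, so whatever came after it is unknown. As a separate sketch of the same idea (not a reconstruction of the missing part), something like the following might print one image name per line, in document order. It assumes a grep that supports -o (GNU grep does), double-quoted SRC values as in the sample tag, and that IMG tags are the only tags carrying a SRC attribute. The function name list_images and the file names in the usage comment are made up for illustration.

```shell
#!/bin/sh
# list_images (hypothetical helper): print each SRC value in FILE, one per
# line, in the order the tags appear in the document.
# Assumptions: grep supports -o; SRC values are double-quoted; only IMG
# tags carry a SRC attribute.
list_images() {
    # -o prints each match on its own line, so tags that share one long
    # line are still split apart; -i also matches lowercase src=.
    grep -oiE 'SRC="[^"]+"' "$1" |
        # Drop everything up to the opening quote, then the closing quote.
        sed -E 's/^[^"]*"//; s/"$//'
}

# Usage (file names are placeholders):
#   list_images source.html > image_order.txt
```

Since grep -o emits matches in the order they occur, the resulting list follows the page order of the original document and could be fed to the conversion script line by line.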

More information about the ubuntu-users mailing list