Scripting / one liner help

Hal Burgiss hal at burgiss.net
Wed Aug 10 16:47:51 UTC 2011


On Wed, Aug 10, 2011 at 12:29 PM, Patton Echols <p.echols at comcast.net> wrote:

> I am looking for thoughts on how I might extract image names from an html
> document.
>
> The document started as a Word document with nothing but images, one per
> page, randomly named.  It was saved as html using LibreOffice, so I now
> have the images separate.  I have a script that will process them through
> ImageMagick to clean them up, reduce them from full color to b/w and make
> them into a pdf.  But the pages are out of order because the images are
> randomly named.
>
> What I'd like to do is have something read the html file in order and
> either feed the names of the JPGs to the script in order or just spit them
> out to a file that I can feed to the script.  The html source has all the
> images listed sequentially without line breaks.  Each tag is the same except
> for the image name and looks like this:
> <IMG SRC="source_html_m1463afff.jpg" NAME="graphics3" ALIGN=BOTTOM
> WIDTH=575 HEIGHT=790 BORDER=0>
>
>
See if this gets close to extracting the image names ...

grep -ohi 'SRC="[^"]*"' *html | sed -r 's/SRC="([^"]+)"/\1/i' | whatever_script.sh
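
If you would rather write the names to a file first, something along these
lines might do. It is only a sketch: extract_img_names is a made-up helper
name, and it assumes every SRC="..." in the file belongs to an IMG tag and
that the names contain no double quotes. Because the tags all sit on one
line, it leans on grep's -o option, which prints each match on its own line.

```shell
#!/bin/sh
# Sketch only: extract_img_names is a hypothetical helper, not an existing tool.
# Prints the SRC value of every tag in the given HTML file, one per line,
# in the order the tags appear in the document.
extract_img_names() {
    # -o prints only the matched text, so each SRC="..." gets its own line
    # even when the whole document is a single line; -i ignores case.
    # The sed step strips everything up to the opening quote, then the
    # closing quote, leaving just the file name.
    grep -oi 'SRC="[^"]*"' "$1" | sed -e 's/^[^"]*"//' -e 's/"$//'
}
```

With that function defined, "extract_img_names document.html > image_list.txt"
would give one name per line in page order (document.html and image_list.txt
are placeholder names), and the list can then be fed to the script on stdin
with "whatever_script.sh < image_list.txt".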


-- 
Hal