Scripting / one liner help

Wed Aug 10 19:00:54 UTC 2011

2011/8/10 Hal Burgiss <hal at burgiss.net>:
> On Wed, Aug 10, 2011 at 12:29 PM, Patton Echols <p.echols at comcast.net>
> wrote:
>>
>> I am looking for thoughts on how I might extract image names from an html
>> document.
>>
>> The document started as a Word document with nothing but images, one per
>> page, randomly named.  It was saved as HTML using LibreOffice, so I now
>> have the images separate.  I have a script that will process them through
>> ImageMagick to clean them up, reduce them from full color to b/w and make them
>> into a pdf.  But the pages are out of order because the images are randomly
>> named.
>>
>> What I'd like to do is have something read the html file in order and
>> either feed the names of the JPGs to the script in order or just spit them
>> out to a file that I can feed to the script.  The html source has all the
>> images listed sequentially without line breaks.  Each tag is the same except
>> for the image name and looks like this:
>> <IMG SRC="source_html_m1463afff.jpg" NAME="graphics3" ALIGN=BOTTOM
>> WIDTH=575 HEIGHT=790 BORDER=0>
>>
>
> See if this gets close to extracting the image names ...
> grep SRC *html | sed -r 's/SRC="([^"]+)"/\1/ig' | whatever_script.sh

I didn't create this thread, but can you please explain that sed
statement? I don't get it… (I'm not a beginner with regular
expressions but I'm definitely not an expert either…)

Kind regards

Johnny Rosenberg
ジョニー・ローゼンバーグ
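
For anyone else puzzling over the same expression, here is a rough sketch of
how it breaks down, assuming GNU sed and GNU grep as shipped on Ubuntu. The
sample tag line and the helper name extract_img_names are made up for
illustration and are not from the thread. Note that the substitution only
strips the SRC=" and the closing quote; the rest of each tag stays on the
line, so a grep -o variant is shown as one possible way to get the bare names.

```shell
# s/PATTERN/REPLACEMENT/FLAGS, piece by piece:
#   -r         extended regex, so ( ) and + need no backslashes
#   SRC="      literal text that starts the attribute
#   ([^"]+)    capture group: one or more characters that are not a quote
#   "          the closing quote
#   \1         replacement: just the text captured by the group
#   i          match case-insensitively (src, Src, SRC)
#   g          replace every match on the line, not only the first
sample='<IMG SRC="a.jpg" NAME="g1"><IMG SRC="b.jpg" NAME="g2">'
printf '%s\n' "$sample" | sed -r 's/SRC="([^"]+)"/\1/ig'
# prints: <IMG a.jpg NAME="g1"><IMG b.jpg NAME="g2">

# Hypothetical helper: print only the file names, one per line, in
# document order. Reads the files given as arguments, or stdin if none.
extract_img_names() {
    # -o prints each match on its own line, -h drops file name prefixes
    grep -ohiE 'src="[^"]+"' "$@" | cut -d'"' -f2
}

printf '%s\n' "$sample" | extract_img_names
# prints a.jpg and b.jpg on separate lines
```

Something like `extract_img_names source.html > order.txt` would then give a
list that can be fed to the conversion script (both file names here are
placeholders).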
