Scripting / one liner help
Johnny Rosenberg
gurus.knugum at gmail.com
Wed Aug 10 19:00:54 UTC 2011
2011/8/10 Hal Burgiss <hal at burgiss.net>:
> On Wed, Aug 10, 2011 at 12:29 PM, Patton Echols <p.echols at comcast.net>
> wrote:
>>
>> I am looking for thoughts on how I might extract image names from an html
>> document.
>>
>> The document started as a Word document with nothing but images, one per
>> page, randomly named. It was saved as html using LibreOffice, so I now
>> have the images separate. I have a script that will process them through
>> ImageMagick to clean them up, reduce them from full color to b/w and make them
>> into a pdf. But the pages are out of order because the images are randomly
>> named.
>>
>> What I'd like to do is have something read the html file in order and
>> either feed the names of the JPGs to the script in order or just spit them
>> out to a file that I can feed to the script. The html source has all the
>> images listed sequentially without line breaks. Each tag is the same except
>> for the image name and looks like this:
>> <IMG SRC="source_html_m1463afff.jpg" NAME="graphics3" ALIGN=BOTTOM
>> WIDTH=575 HEIGHT=790 BORDER=0>
>>
>
> See if this gets close to extracting the image names ...
> grep SRC *html | sed -r 's/SRC="([^"]+)"/\1/ig' | whatever_script.sh
I didn't create this thread, but can you please explain that sed
statement? I don't get it… (I'm not a beginner with regular
expressions but I'm definitely not an expert either…)
Kind regards
Johnny Rosenberg
ジョニー・ローゼンバーグ
>
> --
> Hal
>
> --
> ubuntu-users mailing list
> ubuntu-users at lists.ubuntu.com
> Modify settings or unsubscribe at:
> https://lists.ubuntu.com/mailman/listinfo/ubuntu-users
>
>
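A possible reading of the sed statement asked about above, as a minimal
sketch assuming GNU grep and GNU sed as shipped with Ubuntu. With -r the
pattern is an extended regex: SRC="([^"]+)" matches SRC= followed by a
quoted string and captures the text between the quotes, \1 replaces the
whole match with that captured text, i ignores case and g repeats for
every match on the line. Note that the rest of the tag is left in place,
so the second command shows one way to print only the names, in document
order. The function name extract_img_names is made up for illustration
and is not from the thread.

```shell
#!/bin/sh
# extract_img_names is a hypothetical helper, not something from the thread.
# It reads HTML on stdin and prints each IMG SRC value on its own line,
# in the order the tags appear in the document.
extract_img_names() {
    # -o prints only the matching part, one match per line; -i ignores case.
    grep -oi 'src="[^"]*"' |
        # Splitting on double quotes, the value is the second field.
        cut -d'"' -f2
}

# Sample tag taken from the original post, joined onto one line.
sample='<IMG SRC="source_html_m1463afff.jpg" NAME="graphics3" ALIGN=BOTTOM WIDTH=575 HEIGHT=790 BORDER=0>'

# Hal's substitution: SRC="..." becomes just the captured name,
# but everything else on the line is passed through unchanged.
printf '%s\n' "$sample" | sed -r 's/SRC="([^"]+)"/\1/ig'

# Only the image name.
printf '%s\n' "$sample" | extract_img_names
```

Something like extract_img_names < source.html > names.txt would then
give a file of names to feed to the processing script, where source.html
is a placeholder for the real file name.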