Scripting / one liner help
p.echols at comcast.net
Wed Aug 10 23:58:46 UTC 2011
On 08/10/2011 02:52 PM, Hal Burgiss wrote:
> On Wed, Aug 10, 2011 at 3:00 PM, Johnny Rosenberg
> <gurus.knugum at gmail.com> wrote:
> > 2011/8/10 Hal Burgiss <hal at burgiss.net>:
> > > See if this gets close to extracting the image names ...
> > > grep SRC *html | sed -r 's/SRC="([^"]+)"/\1/ig' | whatever_script.sh
> > I didn't create this thread, but can you please explain that sed
> > statement? I don't get it… (I'm not a beginner with regular
> > expressions but I'm definitely not an expert either…)
> It's attempting to capture the string between
> SRC=" and the next double quote ("). The [^"] stops the capture at
> the next double quote: the capture should include only characters
> that are NOT a double quote. If not careful, the expression could get
> "greedy" and start matching other double quotes on the same line.
> This should stop that effect. In sed syntax, \1 is a reference back
> to the capture inside the parentheses, which essentially just
> preserves the captured characters, and ignores the rest. Does
> that make sense?
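A quick way to see the difference Hal describes is to run the [^"]+ capture and a greedy .+ capture over the same line. This is only a sketch, assuming GNU sed (which accepts -r and the i flag); the sample line and the function names bounded and greedy are made up for illustration.

```shell
#!/bin/sh
# Hypothetical sample line with two quoted attributes after SRC=.
line='<IMG SRC="logo.png" ALT="Company logo">'

# [^"]+ cannot cross a double quote, so the capture ends at logo.png.
bounded() {
  printf '%s\n' "$1" | sed -r 's/SRC="([^"]+)"/\1/i'
}

# .+ is greedy: it runs up to the LAST double quote on the line.
greedy() {
  printf '%s\n' "$1" | sed -r 's/SRC="(.+)"/\1/i'
}

bounded "$line"   # prints: <IMG logo.png ALT="Company logo">
greedy "$line"    # prints: <IMG logo.png" ALT="Company logo>
```

Note that in both cases the text outside the match (<IMG and the trailing >) stays on the line: s/// only rewrites the part of the line it matched.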
Thanks for the explanation, Hal. Unfortunately, it is not doing the
"ignores the rest" part. It appears that it finds each occurrence of
SRC="filename", replaces it with the bare file name (without SRC= and
the " marks), and leaves the rest of the line unchanged.
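One way to keep only the captured names is to let grep -o cut out each SRC="..." match on its own line, then strip the wrapper with sed. This is a sketch assuming GNU grep and GNU sed; extract_srcs is a hypothetical helper name and the sample HTML is invented.

```shell
#!/bin/sh
# Print only the SRC values, one per line.
# grep -o prints just the matched text instead of the whole line,
# -h drops the "file.html:" prefix when several files are searched,
# -i matches SRC and src alike, -E enables extended regexes.
# With no file arguments, grep reads standard input.
extract_srcs() {
  grep -ohiE 'src="[^"]+"' "$@" | sed -r 's/^src="([^"]+)"$/\1/i'
}

# Two images on one line; both names come out on separate lines.
printf '%s\n' '<IMG SRC="a.png"><img src="b.jpg" alt="x">' | extract_srcs
# prints:
# a.png
# b.jpg
```

Used in place of the original pipeline, that would be extract_srcs *html | whatever_script.sh. A sed-only variant such as sed -rn 's/.*SRC="([^"]+)".*/\1/ip' also discards the rest of the line, but it prints at most one name per line (the last SRC, because the leading .* is greedy).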
More information about the ubuntu-users