Scripting / one liner help

Wed Aug 10 23:58:46 UTC 2011

On 08/10/2011 02:52 PM, Hal Burgiss wrote:
>
> On Wed, Aug 10, 2011 at 3:00 PM, Johnny Rosenberg 
> <gurus.knugum at gmail.com> wrote:
>
>     2011/8/10 Hal Burgiss <hal at burgiss.net>:
>     >
>     > See if this gets close to extracting the image names ...
>     > grep SRC *html | sed -r 's/SRC="([^"]+)"/\1/ig' | whatever_script.sh
>
>     I didn't create this thread, but can you please explain that sed
>     statement? I don't get it… (I'm not a beginner with regular
>     expressions but I'm definitely not an expert either…)
>
>
> It's attempting to capture the string in between:
>
>  SRC="  and the next double quote: ".  The [^"] stops the capture at 
> the next double quote. The capture should then include any character 
> that is NOT a double quote. Without that, the expression could get 
> "greedy" and start matching other double quotes on the same line. 
> This should stop that effect. The \1 is a reference back to the 
> capture that is in the parentheses, in sed syntax, which essentially 
> just preserves the captured characters, and ignores the rest. Does 
> that make sense?
>
> -- 
> Hal

Thanks for the explanation, Hal. Unfortunately, it is not doing the 
"ignores the rest" part. It appears that it finds each occurrence of a 
file name, then replaces it with the same occurrence, without the " marks.
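
For what it's worth, here is a small sketch of what seems to be happening and one possible way around it. It assumes GNU sed and GNU grep (the -r option and the I flag are GNU extensions), and the sample line and the function name extract_names are made up for illustration. The s command only rewrites the part of the line that matched, so everything outside SRC="..." is left in place. Having grep -o print just the matches first, then stripping the wrapper, might get closer to a bare list of names:

```shell
#!/bin/sh
# Hypothetical sample line with two images (not from the original thread).
line='<p><img SRC="a.png" alt="x"> text <IMG src="b.jpg"></p>'

# The original expression: only the matched SRC="..." text is replaced by
# its capture; the rest of the line passes through unchanged.
printf '%s\n' "$line" | sed -r 's/SRC="([^"]+)"/\1/ig'
# prints: <p><img a.png alt="x"> text <IMG b.jpg></p>

# One possible alternative (assumes GNU grep and GNU sed):
# grep -o emits each match on its own line, then sed strips the
# src=" prefix and the closing quote, leaving only the file name.
extract_names() {
    grep -oiE 'src="[^"]+"' | sed -r 's/^src="([^"]+)"$/\1/i'
}

printf '%s\n' "$line" | extract_names
# prints:
# a.png
# b.jpg
```

In the pipeline from earlier in the thread, this would presumably look like cat *html | extract_names | whatever_script.sh. It will not handle single-quoted or unquoted attribute values.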