Scripting / one liner help [solved]

Thu Aug 11 03:45:45 UTC 2011

On 08/10/2011 06:44 PM, Jordon Bedwell wrote:
> On Wed, August 10, 2011 5:06 pm, Patton Echols wrote:
>> On 08/10/2011 03:43 PM, Jordon Bedwell wrote:
>>> On Wed, August 10, 2011 2:52 pm, Hal Burgiss wrote:
>>>> It's attempting to capture the string in between:
>>>>
>>>> SRC="  and the next doublequote: ".  The [^"] stops the capture at the
>>>> double quote. The capture should then include any character that is NOT
>>>> a double quote. If not careful, the expression could get "greedy" and
>>>> start matching other double quotes on the same line.  This should stop
>>>> that effect. The \1 is a reference back to the capture that is in the
>>>> parentheses, in sed syntax, which essentially just preserves the
>>>> captured characters, and ignores the rest. Does that make sense?
>>> Because it should be:
>>>
>>> grep -iPo "<img[^>]+>" file.html | \
>>> sed -n 's/<img src=['\''"]\([^"'\'']*\).*/\1/pgI'
>>>
>>> [COPY AND PASTE BOTH LINES AT ONCE AND PRESS THE ENTER KEY]
>> Thanks, that works great and solves the immediate problem.  For purposes
>> of my CLE (continuing linux education) I hope you will indulge me in the
>> same question you posed to Hal.  How's it work?  I get the -io grep
>> flags.  The -P enables Perl regex?  What part of the grep string is the
>> Perl part?
> BRE: grep -io "<img[^>]\+>" index.html. I chose Perl syntax by habit, not
> by need. So to answer your question: the unescaped "+" is the Perl part,
> and for this pattern Perl and ERE are the same. It won't be till later,
> when you start doing some hardcore regexps, that you see the differences
> between ERE, Perl and the others.
>
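A small sketch of the point above, assuming GNU grep: the same match written as BRE, ERE and Perl syntax. The sample page is made up for illustration, and the -P line falls back to a message if grep was built without PCRE support.

```shell
# Hypothetical sample page in a temp file; not the poster's real file.
sample=$(mktemp)
printf '%s\n' '<p>intro</p>' \
  '<IMG SRC="a.jpg" alt="x"> text <img src="b.png">' > "$sample"

# BRE (grep default): "+" must be written as \+ (a GNU extension).
bre=$(grep -io '<img[^>]\+>' "$sample")
# ERE (-E): "+" is an operator without the backslash.
ere=$(grep -iEo '<img[^>]+>' "$sample")
# Perl syntax (-P): same pattern as ERE here; needs grep built with PCRE.
pcre=$(grep -iPo '<img[^>]+>' "$sample" 2>/dev/null || echo 'grep -P unavailable')

printf 'BRE:\n%s\nERE:\n%s\nPerl:\n%s\n' "$bre" "$ere" "$pcre"
rm -f "$sample"
```

With -o, each matching tag is printed on its own line, so the BRE and ERE results should be identical for this pattern.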
>> Then I also wonder how the sed statement works.  I am still trying to
>> figure sed (and plain old regex) out.
> '\'' is the shell idiom for putting a ' inside a single-quoted string, so
> read each '\'' as a plain '. It's a BRE, so think of \( as ( in ERE or
> Perl syntax. /g tells it to substitute globally, not only act on the first
> instance it finds on a line, and /I tells it to ignore case. \1 (\n in
> general) is a backreference, which should have been one of the first
> things you learnt about regexps.
>
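To see what sed actually receives once the shell has processed the quoting, a minimal sketch assuming GNU sed (the I flag is a GNU extension). The variable names and the sample tag are made up for illustration.

```shell
# '\'' ends the single-quoted string, adds one literal ', then reopens it.
pattern='s/<img src=['\''"]\([^"'\'']*\).*/\1/pgI'

# What sed sees after the shell has removed the quoting:
printf '%s\n' "$pattern"

# \( ... \) captures everything up to the next quote; \1 puts it back,
# and the trailing .* swallows the rest of the line.
url=$(printf '%s\n' "<IMG SRC='pics/a.jpg' alt=x>" | sed -n "$pattern")
printf '%s\n' "$url"
```

The first printf shows the bracket expression as ['"], which is why the same command accepts either quote style around the src value.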
> Now on to the rest of it:
> sed 's/<img src=['\''"]\([^"'\'']*\).*/\1/gI'
> sed -n 's/<img src=['\''"]\([^"'\'']*\).*/\1/pgI'
>
> At this point, for you, these two are the same and a matter of preference,
> the latter being my own preference and the former for whoever likes it.
> They both do the same thing for your usage right now.  In later
> applications, where more advanced things happen, you will start to notice
> the differences.  To elaborate:
>
> *IF index.html was a FULL HTML page*
> *THEN:* sed -n 's/<img src=['\''"]\([^"'\'']*\).*/\1/pgI' 1.html > 1.txt
> *IS:* image.jpg [Assuming <img /> is on its own line with no wrappers]
> *AND:* sed 's/<img src=['\''"]\([^"'\'']*\).*/\1/Ig' 1.html > 1.txt
> *IS:* the same index.html page with those changes made, written to 1.txt.
>
> Since I'm horrible at teaching, in other words: the first, with -n and /p,
> will only show the backreferences in that example, and the second will
> replace those lines in the output, leaving everything else intact.  Do
> them both on your file with > filename.txt and you will see what I mean
> instantly.
>
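The two behaviours side by side, as a rough sketch assuming GNU sed. The three-line page and the variable names are hypothetical stand-ins for a full HTML file.

```shell
# Hypothetical three-line page standing in for a full HTML file.
page=$(mktemp)
printf '%s\n' '<html>' '<img src="image.jpg" />' '</html>' > "$page"

expr='s/<img src=['\''"]\([^"'\'']*\).*/\1/'

# -n suppresses the default output; the p flag prints only substituted lines.
only=$(sed -n "${expr}pI" "$page")
# Without -n, every line is printed, changed or not.
all=$(sed "${expr}I" "$page")

printf 'with -n and p:\n%s\n\nwithout:\n%s\n' "$only" "$all"

# Neither command edits the input; sed only rewrites the file with -i.
unchanged=$(cat "$page")
rm -f "$page"
```

The first result is just the captured file name, while the second is the whole page with the img line replaced by it.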
> Somebody else might be better at explaining. I am a doer and an outputter,
> not really a teacher. I can show you how to do a lot, but when it comes to
> explaining how I did it you're barking up the wrong tree, because to me it
> comes out as pro English and to you it comes out as gibberish.  To me it
> comes out as "this is how it's done" and to you it comes out as "what the
> hell did he just say? he pretty much just said the command out loud and
> gave no explanation of what it does."  Plenty have said that one to me.
>
>
This is great.  Thanks so much for taking the time.  True, it is a 
little opaque to me right now, but I also know how I learn.  And there 
is enough there that I can figure it out with some work.  That's how I 
really learn best.  So I thank you.

--PE