Scripting / one liner help [solved]
p.echols at comcast.net
Thu Aug 11 03:45:45 UTC 2011
On 08/10/2011 06:44 PM, Jordon Bedwell wrote:
> On Wed, August 10, 2011 5:06 pm, Patton Echols wrote:
>> On 08/10/2011 03:43 PM, Jordon Bedwell wrote:
>>> On Wed, August 10, 2011 2:52 pm, Hal Burgiss wrote:
>>>> It's attempting to capture the string in between:
>>>> SRC=" and the next double quote: ". The [^"] stops the capture at the
>>>> double quote. The capture should then include any character that is NOT
>>>> a double quote. If you are not careful, the expression could get "greedy"
>>>> and match other double quotes on the same line. This should stop that
>>>> effect. The \1 is a reference back to the capture that is in the
>>>> parentheses, in sed syntax, which essentially just preserves the
>>>> captured characters and ignores the rest. Does that make sense?
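A minimal sketch of the "greedy" effect described above. The sample line and both sed patterns are made up for illustration; they are not the original command from the thread.

```shell
# Hypothetical line with several double quotes on it.
line='<IMG SRC="a.jpg" ALT="first" TITLE="second">'

# Greedy: \(.*\) runs as far as it can, so the capture ends at the LAST
# double quote on the line.
greedy=$(printf '%s\n' "$line" | sed 's/.*SRC="\(.*\)".*/\1/')

# Bounded: [^"]* cannot cross a double quote, so the capture stops at the
# first one after SRC=".
bounded=$(printf '%s\n' "$line" | sed 's/.*SRC="\([^"]*\)".*/\1/')

printf 'greedy:  %s\nbounded: %s\n' "$greedy" "$bounded"
```

The bounded form prints only a.jpg, while the greedy form drags the ALT and TITLE attributes along with it.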
>>> Because it should be:
>>> grep -iPo "<img[^>]+>" file.html | \
>>> sed -n 's/<img src=['\''"]\([^"'\'']*\).*/\1/pgI'
>>> [COPY AND PASTE BOTH LINES AT ONCE AND PRESS THE ENTER KEY]
>> Thanks, that works great and solves the immediate problem. For purposes
>> of my CLE (continuing linux education) I hope you will indulge me in the
>> same question you posed to Hal. How does it work? I get the -io grep
>> flags. The -P enables Perl regex? What part of the grep string is the
>> Perl part?
> BRE: grep -io "<img[^>]\+>" index.html. I chose Perl syntax by habit, not
> by need. So to answer your question: the "+". For this pattern, Perl and
> ERE are the same. It won't be until later, when you start doing some
> hardcore regexps, that you see the differences between ERE, Perl, and others.
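To see the point about "+" in practice, here is a small sketch that runs the same pattern in all three syntaxes. It assumes GNU grep built with PCRE support (needed for -P); the sample line is hypothetical.

```shell
# Hypothetical sample line, not taken from the thread.
sample='<p><IMG SRC="a.jpg" alt="x"> text <img src="b.png"></p>'

# BRE (grep's default): the + quantifier must be written \+ (a GNU extension).
bre=$(printf '%s\n' "$sample" | grep -io '<img[^>]\+>')

# ERE (-E): + is a plain quantifier.
ere=$(printf '%s\n' "$sample" | grep -iEo '<img[^>]+>')

# Perl syntax (-P): identical to the ERE form for this pattern.
pcre=$(printf '%s\n' "$sample" | grep -iPo '<img[^>]+>')

# -o prints each match on its own line; -i makes <img match <IMG too.
printf '%s\n' "$bre"
```

All three variables should hold the same two lines, one per img tag.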
>> Then I also wonder how the sed statement works. I am still trying to
>> figure sed (and plain old regex) out.
> '\'' is a bash escape for ' inside a single-quoted string, so you should
> read it as a plain '. It's a BRE, so think of \( as ( in ERE or Perl
> syntax. /g tells it to substitute globally, not only act on the first
> instance it finds on a line, and /I tells it to ignore case. \1 (\n) is a
> backreference, which should have been one of the first things you learnt
> about regexps.
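The pieces described above can be tried on a single line. This is only a sketch: the input line is hypothetical, and the I flag assumes GNU sed.

```shell
# Hypothetical input line for illustration.
line='<img src="photos/cat.jpg" alt="cat">'

# '\'' closes the single-quoted string, adds a literal ', and reopens it,
# so sed actually receives the bracket expressions ['"] and [^"'].
# \( ... \) is the BRE capture group; \1 writes the captured text back.
# I (GNU sed) ignores case, so <IMG SRC=...> would match as well.
src=$(printf '%s\n' "$line" | sed 's/<img src=['\''"]\([^"'\'']*\).*/\1/I')

printf '%s\n' "$src"
```

Because the trailing .* swallows the rest of the line, adding g to this particular expression finds nothing more to replace on that line.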
> Now on to the rest of it:
> sed 's/<img src=['\''"]\([^"'\'']*\).*/\1/gI'
> sed -n 's/<img src=['\''"]\([^"'\'']*\).*/\1/pgI'
> At this point, for you, these two do the same thing, because every line
> coming out of the grep is an <img> tag. Which one to use is a matter of
> preference; the latter is my own. In later applications, where more
> advanced things happen, you will start to notice the differences. To
> elaborate:
> *IF:* index.html was a FULL HTML page
> *THEN:* sed -n 's/<img src=['\''"]\([^"'\'']*\).*/\1/pgI' 1.html > 1.txt
> *IS:* image.jpg [Assuming <img /> is on its own line with no wrappers]
> *AND:* sed 's/<img src=['\''"]\([^"'\'']*\).*/\1/Ig' 1.html > 1.txt
> *IS:* the same index.html page with those changes applied.
> Since I'm horrible at teaching, in other words: the first, with -n and /p,
> will only show the backreferences in that example, and the second will
> replace those lines in the output, leaving everything else intact. Do them
> both on your file with > filename.txt and you will see what I mean instantly.
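A short sketch of that comparison, assuming GNU sed and a hypothetical three-line page written to a temporary file:

```shell
# Hypothetical three-line page, written to a temporary file.
page=$(mktemp)
printf '%s\n' '<html>' '<img src="image.jpg" />' '</html>' > "$page"

# With -n and the p flag: only lines where the substitution happened
# are printed.
only_matches=$(sed -n 's/<img src=['\''"]\([^"'\'']*\).*/\1/pgI' "$page")

# Without -n: every line is printed, and matching lines are replaced.
whole_page=$(sed 's/<img src=['\''"]\([^"'\'']*\).*/\1/gI' "$page")

printf 'with -n and p:\n%s\n\nwithout -n:\n%s\n' "$only_matches" "$whole_page"
rm -f "$page"
```

Neither command changes the input file. Both write to standard output, which is why the thread redirects the result into a text file.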
> Somebody else might be better at explaining. I am a doer and an outputter,
> not really a teacher. I can show you how to do a lot, but when it comes to
> explaining how I did it you're barking up the wrong tree, because to me it
> comes out as pro English and to you it comes out as gibberish. To me it
> comes out as "this is how it's done" and to you it comes out as "what the
> hell did he just say? He pretty much just read the command aloud and
> gave no explanation of what it does." Plenty have said that one to me.
This is great. Thanks so much for taking the time. True, it is a
little opaque to me right now, but I also know how I learn. And there
is enough there that I can figure it out with some work. That's how I
really learn best. So I thank you.