Scripting / one liner help
Patton Echols
p.echols at comcast.net
Thu Aug 11 21:05:25 UTC 2011
On 08/10/2011 06:30 PM, Hal Burgiss wrote:
> On Wed, Aug 10, 2011 at 7:58 PM, Patton Echols <p.echols at comcast.net> wrote:
>
> On 08/10/2011 02:52 PM, Hal Burgiss wrote:
>
>
> On Wed, Aug 10, 2011 at 3:00 PM, Johnny Rosenberg <gurus.knugum at gmail.com> wrote:
>
> 2011/8/10 Hal Burgiss <hal at burgiss.net>:
>
> >
> > See if this gets close to extracting the image names ...
> > grep SRC *html | sed -r 's/SRC="([^"]+)"/\1/ig' | whatever_script.sh
>
> I didn't create this thread, but can you please explain that sed
> statement? I don't get it… (I'm not a beginner with regular
> expressions but I'm definitely not an expert either…)
>
>
> Thanks for the explanation Hal, unfortunately it is not doing the
> "ignores the rest" part. It appears that it finds each occurrence
> of a file name, then replaces it with the same occurrence, without
> the " marks.
>
>
> Sorry something got left out, try ...
>
> grep SRC *html | sed -r 's/.*SRC="([^"]+)".*/\1/ig'
>
>
> --
> Hal
As mentioned in another post to this thread, I have a working solution,
so this is for info only.
The original source document has all the image tags on one line, without
any carriage return or newline. So the grep statement captures the whole
line. Then the modified sed statement outputs only the last image file
name, because the leading .* is greedy and matches everything up to the
final SRC=.
Using the grep statement suggested by Johnny:
grep -io "<img[^>]\+>"
solves it because, with -o, grep is spitting out each match on its own line, not the entire line.
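For anyone following along, here is a minimal sketch of the difference. The sample line and its file names (a.png, b.jpg) are made up for illustration, and it assumes GNU grep and GNU sed, since \+ in grep, -r, and the i flag on the sed substitution are GNU extensions:

```shell
# Hypothetical sample: two image tags on a single line, like the source document.
line='<p><IMG SRC="a.png"><IMG SRC="b.jpg"></p>'

# sed alone: the greedy leading .* runs up to the LAST SRC=, so only one name survives.
last_only=$(printf '%s\n' "$line" | sed -r 's/.*SRC="([^"]+)".*/\1/ig')

# grep -o first: each <img ...> tag lands on its own line, so sed sees one tag per line.
all_names=$(printf '%s\n' "$line" | grep -io '<img[^>]\+>' | sed -r 's/.*SRC="([^"]+)".*/\1/ig')

printf 'sed alone: %s\n' "$last_only"
printf 'grep -o, then sed:\n%s\n' "$all_names"
```

With grep -o in front, each file name comes out on its own line and can be piped on to whatever script needs it.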
As I mentioned to Johnny, even though I don't understand all of this, the discussion is helping me learn, so I greatly appreciate it.
--PE