Bash =~ operator and tab characters

Johnny Rosenberg gurus.knugum at gmail.com
Sun Apr 24 13:08:55 UTC 2016


2016-04-23 1:38 GMT+02:00 MR ZenWiz <mrzenwiz at gmail.com>:

> On Fri, Apr 22, 2016 at 2:55 PM, Johnny Rosenberg
> <gurus.knugum at gmail.com> wrote:
> > Hi. I'm kind of struggling with this one.
> >
> > I need to check a lot of strings for the following:
> > TAB ( <at least 4 and not more than 10 digits or minus signs> ) TAB
> >
> > I tried a lot of variations, I don't even remember them all, but here's one
> > of them:
> > if [[ ${Rows[$i]} =~ \t[0-9\-]{4,}\t ]]; then
> > # Do something
> > fi
> >
> > But so far with no success.
> > It SEEMS like the \t doesn't mean the TAB character, but that's not much
> > more than a guess.
> > What I'm looking for in those strings is a date in parentheses, preceded by
> > a TAB and followed by a TAB. If I now let ↹ represent the TAB character
> > and … represent any characters, here are the possible strings I am trying to
> > look for with that if statement:
> > …↹(2016)↹…
> > …↹(2016-04)↹…
> > …↹(2016-04-22)↹…
> >
> > Any thoughts?
> > I have done some searching but I always end up with too many hits, and so
> > far all of those I looked further into were totally irrelevant, as far as I
> > could see.
> >
>
> Some thoughts...
>
> I'm not clear if the date you're looking for is in parentheses (as in
> your example) or not - your regex doesn't include them, and they'd
> need to be escaped for the shell to recognize them properly.
>

Sorry, my fault. No, the parentheses are stripped out in an earlier
step that I didn't mention here, so there are no parentheses. My bad.
So, possible strings:
…↹2016↹…
…↹2016-04↹…
…↹2016-04-22↹…

Terribly sorry about this.


>
> Could you use a word search? (I'm not sure if that's a =~ option, but
> \< and \> work in many places to mark word boundaries.)
>
> Any special reason not to use [] (test) or grep or egrep? In most
> cases, I have found them easier to use (and maintain) than regexes.
>

Well, nothing more than that I'm used to [[ and that I see people
recommending it instead of [… all the time.
And even if there are other ways to do it, I'm exploring this way right now
and I hate to give up just because I could solve it with a completely
different method, even if I end up using another method after all…
Right now I'm a bit puzzled and I just want to clear up all the question
marks, so let's just pretend that I have a valid reason to do it this
way only… (c;
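For comparison with MR's grep/egrep suggestion: a sketch, assuming GNU grep and Bash, of the same test applied to many lines at once. `grep -E` takes the same POSIX extended regex syntax as `=~`, and the TAB again comes from `$'\t'`. The sample rows are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sample rows, one per line, fields separated by TABs.
rows=$'a\t2016\tb\na\t2016-04-22\tb\na 2016 b'

# -E selects POSIX extended regular expressions, the same flavour =~ uses.
# $'...' turns \t into a real TAB before grep ever sees the pattern.
printf '%s\n' "$rows" | grep -E $'\t[0-9-]{4,10}\t'

# -c prints only the number of matching lines instead of the lines themselves.
count=$(printf '%s\n' "$rows" | grep -cE $'\t[0-9-]{4,10}\t')
echo "$count"
```

This is handy when the rows live in a file anyway; inside a loop that already holds each row in a variable, `[[ =~ ]]` avoids starting a process per row.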


>
> Have you tried using ^v<tab> to insert the actual tab character in the
> pattern?  (You'd probably have to quote the pattern this way...)
>

Actually I didn't. I will, though…
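Quoting does not have to cover the whole pattern. A sketch, assuming Bash 3.2 or later: quoted parts of the right-hand side of `=~` are matched as literal text while unquoted parts remain regex syntax, so the TAB can come from a quoted variable (here a hypothetical `tab`) instead of being typed with Ctrl-V Tab.

```shell
#!/usr/bin/env bash
tab=$'\t'             # the same character Ctrl-V Tab would insert
row=$'x\t2016-04\ty'  # hypothetical sample row

# "$tab" is quoted, so it is matched literally; [0-9-]{4,10} is unquoted,
# so it is still interpreted as an extended regular expression.
if [[ $row =~ "$tab"[0-9-]{4,10}"$tab" ]]; then
    echo "match"
fi
```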


>
> Have you quoted the pattern in any way?
>

Yes, and I tried escaping various characters, such as { and others.
When I use a regex with sed, I can choose whether or not to use the -r
option, so I know what to escape depending on whether -r is there.
With the =~ operator, however, I don't know whether I should write the regex
the way I do for sed with or without -r…
So am I supposed to use extended regex with the =~ operator, or basic
regex? Or something else? '\t' works with sed -r, anyway.
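On the ERE-or-BRE question: the Bash manual states that the right-hand side of `=~` is a POSIX extended regular expression, so it is written the way `sed -r` expects, with `{4,10}`, `( )`, `+` and `?` unescaped. `\t` itself is a GNU sed extension rather than part of ERE, which is why it works with `sed -r` but not after `=~`. A sketch with a hypothetical value and a hypothetical stricter date pattern:

```shell
#!/usr/bin/env bash
s='2016-04-22'   # hypothetical value

# ERE syntax: interval {4}, grouped and repeated (-NN){0,2}, no backslashes.
# In basic regex (sed without -r) this would need \{4\}, \( \) and \{0,2\}.
re='^[0-9]{4}(-[0-9]{2}){0,2}$'
if [[ $s =~ $re ]]; then
    # BASH_REMATCH[0] holds the text matched by the whole expression.
    echo "ERE match: ${BASH_REMATCH[0]}"
fi
```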


By the way, when using sed with several -e options, like sed -r -e
's/something/something else/' -e 's/something/something else/' -e
's/something/something else/', does the -r apply to all three expressions or
just the first one? It seems to apply to the whole line, but I'm not
completely sure… I know, totally off topic; this question just popped into
my head, I'm not sure why…
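On the sed side question: `-r` is an option of the whole `sed` command, not of a single `-e` expression, so it covers every script given with `-e`. A small sketch, assuming GNU sed and hypothetical input, where both expressions rely on ERE syntax:

```shell
#!/usr/bin/env bash
# Both expressions use ERE syntax ({4} and +, unescaped). Without -r, GNU sed
# would treat them as basic regex and neither substitution would happen.
out=$(printf '2016-04-22\n' | sed -r -e 's/^[0-9]{4}/YEAR/' -e 's/-[0-9]+$/-DAY/')
echo "$out"
```

If `-r` only applied to the first expression, the second substitution would fail and the output would still end in `-22`.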


>
> Side issue - if you only want 10, you might want to add that as an
> upper bound on the {} portion of the pattern.
>

Yes, sorry for that too. I'm not sure why I didn't include the 10. I think
I was working on two similar issues at the same time and mixed them up a
little…
This is what I meant:
if [[ ${Rows[$i]} =~ \t[0-9\-]{4,10}\t ]]; then
# Do something
fi
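A quick check of the `{4,10}` bounds, with the TABs supplied as real characters through `$'\t'` (the `\t` in the line above is not a TAB) in a hypothetical variable `re`. Because the run of digits and hyphens must sit between two TABs, `{4,10}` really caps it at 10; without the TAB anchors, an 11-character run would still match on a 10-character substring.

```shell
#!/usr/bin/env bash
re=$'\t''[0-9-]{4,10}'$'\t'

for field in 123 1234 1234567890 12345678901; do
    row="x"$'\t'"$field"$'\t'"y"   # hypothetical TAB-separated row
    if [[ $row =~ $re ]]; then
        echo "$field: match (${#field} chars)"
    else
        echo "$field: no match (${#field} chars)"
    fi
done
```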



OK, now I tried to type the actual TAB characters into the pattern. First I
quoted the pattern with double quotes, but it didn't work. Maybe it looks
for actual quote characters in my string or something, I don't know.
Next I removed the quotes and that was it! It works now!

if [[ ${Rows[$i]} =~ [0-9\-]{4,10} ]]; then
# Do something
fi

Just to be clear, because it doesn't show well above, if I represent space
with _ and tab with ⇥ (↹ looks too busy, I just realised…), the first line
looks like this:
if_[[_${Rows[$i]}_=~_⇥[0-9\-]{4,10}⇥_]]; then
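About the double quotes: since Bash 3.2, a quoted right-hand side of `=~` is matched as a literal string rather than as a regex. The quote characters themselves are not searched for; the text between them is. A sketch with a hypothetical `row`, which also shows that keeping the pattern in a variable avoids the invisible TABs that had to be spelled out with ⇥ above:

```shell
#!/usr/bin/env bash
row=$'x\t2016-04\ty'   # hypothetical sample row

# Quoted: the text [0-9-]{4,10} is searched for literally, so this fails.
if [[ $row =~ "[0-9-]{4,10}" ]]; then echo "quoted: match"; else echo "quoted: no match"; fi

# A string that literally contains that bracket text does match the quoted form.
if [[ 'see [0-9-]{4,10} here' =~ "[0-9-]{4,10}" ]]; then echo "literal text: match"; fi

# Unquoted (here via a variable): interpreted as an ERE, so this succeeds.
re=$'\t''[0-9-]{4,10}'$'\t'
if [[ $row =~ $re ]]; then echo "unquoted: match"; fi
```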

However, I think I will actually try to rewrite my script from scratch in a
completely different way, making the above unnecessary and hopefully the
script much shorter, but that's another story.

At least I learned something from all this, and I like that.

Thanks!


Kind regards

Johnny Rosenberg



> HTH.
> MR
>
> --
> ubuntu-users mailing list
> ubuntu-users at lists.ubuntu.com
> Modify settings or unsubscribe at:
> https://lists.ubuntu.com/mailman/listinfo/ubuntu-users
>