Thoughts on regular expressions in, for example, sed

Paul Smith paul at mad-scientist.net
Sat Jan 12 16:04:50 UTC 2013


On Sat, 2013-01-12 at 15:05 +0100, Johnny Rosenberg wrote:
> Maybe it's time for ”the next version” (total remake) of the whole
> concept of regular expressions and give it a few decades.
> The old ”version” could then be referred to as ”regular exceptions”… 

There already is a de-facto standard advanced regular expression library
available today: PCRE (Perl Compatible Regular Expressions), a separate
library that implements Perl's regular expression syntax and semantics.
These days pretty much every new program that wants to support regular
expressions uses it.  It's far and away the most powerful and
sophisticated RE implementation we have.

But, there are so many hundreds of thousands of scripts out there that
rely on tools like awk, sed, grep, etc. to work the way they do that
there's literally zero chance that their default syntax will be changed
in incompatible ways.  I'm not sure you grasp how devastating these
kinds of changes would be and how widespread the impact would be.
There are sed and awk scripts that were written 20 years ago and haven't
been looked at since, still embedded in critical systems all over the
world.
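To make the compatibility point concrete, here is a minimal C sketch using the POSIX `regcomp`/`regexec` interface from `<regex.h>`. The helper names `posix_match` and `show_bre_vs_ere` are hypothetical, written only for this illustration. It shows that the same pattern, `^a+$`, means different things as a POSIX basic regular expression (BRE, the default syntax for sed and grep, where `+` is an ordinary character) and as an extended one (ERE, selected with `REG_EXTENDED`, where `+` means "one or more"). Existing scripts silently depend on distinctions like this one.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if `text` matches `pattern`, 0 if it
   does not, and -1 if the pattern fails to compile.  Pass REG_EXTENDED
   in `cflags` for ERE syntax, or 0 for POSIX basic (BRE) syntax. */
int posix_match(const char *pattern, const char *text, int cflags)
{
    regex_t re;
    /* REG_NOSUB: only a yes/no answer is needed, not match offsets. */
    if (regcomp(&re, pattern, cflags | REG_NOSUB) != 0)
        return -1;
    int rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}

/* Usage: the same pattern under both syntaxes. */
void show_bre_vs_ere(void)
{
    /* BRE: '+' is a literal plus sign, so "^a+$" matches only "a+". */
    assert(posix_match("^a+$", "a+", 0) == 1);
    assert(posix_match("^a+$", "aaa", 0) == 0);
    /* ERE: '+' means "one or more", so "^a+$" matches "aaa", not "a+". */
    assert(posix_match("^a+$", "aaa", REG_EXTENDED) == 1);
    assert(posix_match("^a+$", "a+", REG_EXTENDED) == 0);
}
```

If a tool's default syntax were switched from BRE to something else, every script relying on the first pair of results would change behavior without any error message.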

Not to mention that GNU/Linux is not the only UNIX/POSIX system out
there, so adherence to the POSIX standards is critically important
for people who need to produce portable software--and many of those
people are exactly the same ones who would be making these changes.

So, no, not gonna happen.  The best you could get would be for everyone
to agree to _add_ PCRE (for example) to the RE-using tools, and have
some extra switch or environment variable that could be set to enable
it for those who wanted it.
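The opt-in approach can be sketched with the same POSIX C interface. This is a minimal illustration under stated assumptions: the environment variable `MYTOOL_REGEX_SYNTAX` and the helpers `syntax_flags`, `tool_match`, and `show_opt_in` are hypothetical, and because PCRE is a separate library rather than part of `<regex.h>`, POSIX extended syntax stands in here for the "new" syntax. The design choice that matters is that an unset or unrecognized setting falls back to basic syntax, so existing scripts behave exactly as before. GNU grep already offers this kind of opt-in through its `-E` and `-P` (`--perl-regexp`) options.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical: map a setting to regcomp() flags.  Only the exact value
   "extended" opts in; NULL (unset) or anything else keeps the
   traditional POSIX basic syntax, so old scripts are unaffected. */
int syntax_flags(const char *setting)
{
    if (setting != NULL && strcmp(setting, "extended") == 0)
        return REG_EXTENDED;
    return 0;
}

/* Hypothetical matcher that reads the opt-in switch from the made-up
   MYTOOL_REGEX_SYNTAX environment variable.  Returns 1 on a match,
   0 on no match, and -1 if the pattern does not compile. */
int tool_match(const char *pattern, const char *text)
{
    regex_t re;
    int cflags = syntax_flags(getenv("MYTOOL_REGEX_SYNTAX")) | REG_NOSUB;
    if (regcomp(&re, pattern, cflags) != 0)
        return -1;
    int rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}

/* Usage: the same call behaves differently only when the user opts in. */
void show_opt_in(void)
{
    unsetenv("MYTOOL_REGEX_SYNTAX");
    assert(tool_match("^a+$", "aaa") == 0);   /* default: BRE, '+' literal */
    setenv("MYTOOL_REGEX_SYNTAX", "extended", 1);
    assert(tool_match("^a+$", "aaa") == 1);   /* opted in: ERE */
    unsetenv("MYTOOL_REGEX_SYNTAX");
}
```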


I'll repeat something I said the other day: you should be learning a
more sophisticated scripting language like Perl.  Perl has all the
capabilities of shell, sed, awk, and just about any other UNIX tool
built in, plus capabilities none of the others can match.  And it has a
regular and consistent syntax, ability to handle unusual filenames
without glitching, etc.  And it is installed, and runs, everywhere.

I've been working on UNIX systems for... a really, really long time
(since before Linus sat down to write his first kernel).  My personal
rule is that if a shell script I'm writing ever gets longer than about
25 lines or so, OR if it ever seems necessary to use awk, I chuck it and
write it in Perl instead.  Sed is fine, but if I need awk I might as
well switch to Perl.

More information about the ubuntu-users mailing list