[ubuntu-florida] MySQL and REGEX

j.e.aneiros jesus.aneiros at gmail.com
Tue Sep 21 15:38:48 BST 2010


"A regular expression is a powerful way of specifying a pattern for a
complex search." (Excerpt from MySQL manual:
http://dev.mysql.com/doc/refman/5.1/en/regexp.html)

The financial institutions uses letters to identify the months of the year
in the following way:

F - January
G - February
H - March
J - April
K - May
M - June
N - July
Q - August
U - September
V - October
X - November
Z - December

So you can have a date in the form F10 which identifies January 2010 or J11
which identifies April 2011.

Let say you have a prices table in a MySQL database and you want to check
that the values in the field contract in that table are correct according to
the following rules:

- The first character in the field should be a letter valid for a month (F -
January, G - February, etc).
- The second and third chars in the field should be a decimal digit (0..9).

You need to specify a query in SQL for MySQL that identifies all the records
that have a wrong contract according to the above rules.

There are several ways of doing this in MySQL, one could be a query like the
following:

SELECT * FROM prices
WHERE substr(contract, 1, 1) NOT IN ('F', 'G', 'H', 'J', 'K', 'M', 'N', 'Q',
'U', 'V', 'X', 'Z')
OR substr(contract, 2, 1) NOT IN (0, 1, 2, 3, 4, 5, 6, 7, 8, 9)
OR substr(contract, 3, 1) NOT IN (0, 1, 2, 3, 4, 5, 6, 7, 8, 9);

Other query could be:

SELECT * FROM prices
WHERE substr(contract, 1, 1) NOT IN ('F', 'G', 'H', 'J', 'K', 'M', 'N', 'Q',
'U', 'V', 'X', 'Z')
OR substr(contract, 2, 1) NOT BETWEEN 0 AND 9
OR substr(contract, 3, 1) NOT BETWEEN 0 AND 9;

As you can see the queries are long so here is where you can use the power
of REGEX (Regular Expressions) that MySQL implements as an extension to the
SQL language.

Using REGEX the query will be:

SELECT * FROM prices
WHERE contract NOT REGEXP '^[FGHJKMNQUVXZ][0-9]{2}';

The REGEX expression is negated (NOT REGEXP) so the query is going to return
values in the field contract of the prices table that don't have at the
beginning (the symbol ^) a letter F or G or H, etc. (the part which is
specified between square brackets, which is called a class) and 2 digits
(the class [0-9] with the number of occurrences, part {2}).

As you can see REGEX provides a concise and flexible way for matching
strings of text against a certain pattern.

Have a good day.

-- 
J. E. Aneiros
GNU/Linux User #190716 en http://counter.li.org
perl -e '$_=pack(c5,0105,0107,0123,0132,(1<<3)+2);y[A-Z][N-ZA-M];print;'
PK fingerprint: 5179 917E 5B34 F073 E11A  AFB3 4CB3 5301 4A80 F674
-------------- next part --------------
An HTML attachment was scrubbed...
URL: https://lists.ubuntu.com/archives/ubuntu-us-fl/attachments/20100921/3a21bc20/attachment.htm 


More information about the Ubuntu-us-fl mailing list