Open Source For You — December 2017

Insight Developers

String processing of regular expressions
In previous articles in this series we mostly dealt with
regular expressions that processed numbers. For a change,
in this article, we will look at some regular expressions to
process strings. Nowadays, computer science professionals
from India face difficulties in deciding whether to use
American English spelling or the British English spelling
while preparing technical documents. I always get
confused by pairs such as colour/color, programme/program,
centre/center and pretence/pretense. Let us look at a few simple
techniques to handle situations like this.
For example, the regular expression /colo(?:u)?r/ will match
both the spellings ‘color’ and ‘colour’. The question mark
symbol (?) is used to denote zero or one occurrence of the
preceding character or group. The notation (?:u) groups u with
the grouping operator ( ), and the prefix ?: makes it a
non-capturing group, so the matched substring is not stored
in memory unnecessarily.
So, here a match is obtained with and without the letter u.
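The behaviour described above can be checked with a minimal sketch like the following. The variable names and the sample sentence are chosen only for illustration and are not part of the original article.

```javascript
// Optional non-capturing group: matches both "color" and "colour".
const colourPattern = /colo(?:u)?r/;

console.log(colourPattern.test("color"));   // true
console.log(colourPattern.test("colour"));  // true
console.log(colourPattern.test("colr"));    // false

// Because (?:u) does not capture, the result array holds only the full match.
const match = "My favourite colour".match(colourPattern);
console.log(match[0]);      // "colour"
console.log(match.length);  // 1
```

Since the group contains a single letter, the shorter pattern /colou?r/ would behave the same way; the grouped form is useful when the optional part is longer than one character.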
What about the spellings ‘programme’ and ‘program’?
The regular expression /program(?:me)?/ will accept both
these spellings. The regular expression /cent(?:re|er)/ will
accept both the spellings, ‘center’ and ‘centre’. Here the pipe
symbol ( | ) is used as an alternation operator.
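A short sketch of these two patterns in use is given below. The whole-word variant at the end is an addition for illustration (it is not from the original text) and assumes that a longer word such as ‘programmer’ should not count as a match.

```javascript
// Optional suffix "me": matches "program" and "programme".
const programPattern = /program(?:me)?/;
// Alternation inside a non-capturing group: matches "centre" and "center".
const centrePattern = /cent(?:re|er)/;

console.log(programPattern.test("program"));    // true
console.log(programPattern.test("programme"));  // true
console.log(centrePattern.test("centre"));      // true
console.log(centrePattern.test("center"));      // true
console.log(centrePattern.test("cent"));        // false

// The patterns match substrings, so "programmer" also matches.
console.log(programPattern.test("programmer")); // true

// Illustrative variant: \b word boundaries restrict the match to whole words.
const wholeWordProgram = /\bprogram(?:me)?\b/;
console.log(wholeWordProgram.test("programme"));  // true
console.log(wholeWordProgram.test("programmer")); // false
```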
What about words like ‘biscuit’ and ‘cookie’? In British
English the word ‘biscuit’ is preferred over the word ‘cookie’
and the reverse is the case in American English. The regular
expression /(?:cookie|biscuit)/ will accept both the words,
‘cookie’ and ‘biscuit’. The regular expression /preten[cs]e/ will
match both the spellings, ‘pretence’ and ‘pretense’. Here the
character class operator [ ] is used in the regular expression
pattern to match either the letter c or the letter s.
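The alternation and character class patterns above can be tried out as follows. This is only a small sketch; the variable names and the non-matching test word are made up for the example.

```javascript
// Alternation between two whole words.
const biscuitPattern = /(?:cookie|biscuit)/;
// Character class: exactly one of the letters c or s at that position.
const pretencePattern = /preten[cs]e/;

console.log(biscuitPattern.test("cookie"));     // true
console.log(biscuitPattern.test("biscuit"));    // true
console.log(biscuitPattern.test("cake"));       // false

console.log(pretencePattern.test("pretence"));  // true
console.log(pretencePattern.test("pretense"));  // true
console.log(pretencePattern.test("pretente"));  // false
```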
I have only discussed specific solutions to the problems
mentioned here so as to keep the regular expressions very
simple. With more elaborate regular expressions, however,
it is possible to solve many of these problems in a more
general way rather than solving individual cases. As
mentioned earlier, C++ also uses ECMAScript style regular
expressions by default; so most regular expression patterns we
developed in the article on regular expressions in C++ can be
used in JavaScript with little or no modification.
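As one possible illustration of the more general approach mentioned above, the sketch below builds a single pattern from a small word list and rewrites British spellings as American ones. The mapping, the function name toAmerican and the sample sentence are all hypothetical; this is not the author's method, only a sketch under the assumption that whole-word, case-sensitive replacement is what is wanted.

```javascript
// Hypothetical mapping from British to American spellings (illustrative only).
const britishToAmerican = {
  colour: "color",
  programme: "program",
  centre: "center",
  pretence: "pretense",
};

// Build one alternation pattern from the keys, anchored on word boundaries.
// The g flag makes replace() process every occurrence in the text.
const britishPattern = new RegExp(
  "\\b(?:" + Object.keys(britishToAmerican).join("|") + ")\\b",
  "g"
);

// Replace each matched British spelling with its American counterpart.
function toAmerican(text) {
  return text.replace(britishPattern, (word) => britishToAmerican[word]);
}

console.log(toAmerican("The colour programme at the centre"));
// "The color program at the center"
```

The sketch is case-sensitive and leaves longer words such as ‘colourful’ untouched because of the word boundaries; handling capitalised or derived forms would need a larger mapping or extra logic.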
As in the previous articles in this
series, after a brief discussion of the specific programming
language, in this case JavaScript, we moved on to the use of
regular expression syntax in that language. This should be
enough for practitioners of JavaScript who are willing to get
their hands dirty by practising with more regular expressions.
In the next part of this series on regular expressions, we will
discuss the very powerful programming language, Java, a
distant cousin of JavaScript.

By: Deepu Benson
The author is a free software enthusiast and his area of
interest is theoretical computer science. He maintains a
technical blog at http://www.computingforbeginners.blogspot.in.
He can be reached at [email protected].
