APPENDIX A Languages and Regular Sets 589
Definition 5. Let A and B be two sets of words:
(a) AB={x.y:xEAandy iB}
(b) A^0 = {A}, andforalln E N,A'+' = A .A'
(c) A* =A^0 U A^1 U A^2 U .. U An U ...
(d) A+ =A'1UA2 U... UAn U...
A • B consists of all words formed by concatenating a word from A and a word from
B. So, A' consists of all words formed by concatenating n words from A together (possibly
all the same word, possibly all different). A* consists of all words formed by concatenating
any number (possibly 0) of words from A together. (The * notation is called the Kleene
star, after the mathematician Stephen Kleene [1909-1994, b. United States].) A+ consists
of all words formed by concatenating one or more words from A together. For any set A,
A+ = AA* (why?), so the operator + is used only to make definitions simpler.
Example 2. Let A ={aa, b}, and let B = {bb, cc, A). Then:
A • B = {aabb, aacc aa, bbb, bcc, b)
A^3 = {aaaaaa, aaaab, aabaa, aabb, baaaa, baab, bbaa, bbb}
A* contains the following strings and infinitely many others:
A(the string in A^0 ),
aa, b (the strings in A^1 )
aaaa aab, baa, bb (the strings in A^2 )
aaaaaa, aaaab, aabaa, aabb, baaaa, baab, bbaa, bbb (the strings in A^3 )
Since the number of tokens that can be generated using Definition 4 is infinite, a pro-
gram must have some way to recognize correctly formed tokens. For this reason, the set
of possible identifiers in most computer languages is required to be a regular set. This
gives the programmer broad latitude in choosing names while still making the scanning
algorithm relatively straightforward. Like the definition of the Fibonacci numbers, the def-
inition of the term regular set is recursive. The pattern is like a proof by induction, with a
Base case replacing the Base step and a Closure rule replacing the Inductive step. (See
Section 2.1.1 for a similar notion with formulas.)
Definition 6. Let E be an alphabet.
(Base case) Any finite subset (including 0) of E* is a regular set.
(Closure rule) If A and B are regular sets, so are A U B, A • B, and A*.
Example 3. An identifier such as a program name or a variable name in a computer
language consists of a letter followed by a string of letters and digits, such as x or ab-
cadabral23testing0. The set of identifiers is a regular set.
Let
Letter = {A, R, C Z, a, b, c ... , z)
Digit = {0, ,,1 4, 5, ,7, 8, 9
Letter and Digit are both finite and, hence, regular. So, Letter U Digit is also regular; it
is the union of two regular sets. So, (Letter U Digit) is regular, since it is the Kleenestar
of a regular set. (Letter U Digit) consists of all finite strings of letters and digits. Finally,
Letter((Letter U Digit)*) is also regular, consisting of all strings formed from a single letter
followed by a finite string of letters and digits-that is, all identifiers.