304 LANGUAGES, AUTOMATA, GRAMMARS [CHAP. 12
Clearly, for any wordsu,v,w, the words(uv)wandu(vw)are identical, they simply consist of the letters
ofu,v,wwritten down one after the other. Also, adjoining the empty word before or after a wordudoes not
change the wordu. That is:
Theorem 12.1: The concatenation operation for words on an alphabetAis associative. The empty wordlis an
identity element for the operation.
(Generally speaking, the operation is not commutative, e.g.,uv=vufor the above wordsuandv.)
Subwords, Initial Segments
Consider any wordu=a 1 a 2 ...anon an alphabetA. Any sequencew=ajaj+ 1 ...akis called asubword
ofu. In particular, the subwordw=a 1 a 2 ...akbeginning with the first letter ofu, is called aninitial segment
ofu. In other words,wis a subword ofuifu=v 1 wv 2 andwis an initial segment ofuifu=wv. Observe that
landuare both subwords ofuvsinceu=lu.
Consider the wordu=abca. The subwords and initial segments ofufollow:
(1) Subwords:l, a, b, c, ab, bc, ca, abc, bca, abca=u
(2) Initial segments:l, a, ab, abc, abca=u.
Observe that the subwordw=aappears in two places inu. The word ac is not a subword ofueven though all
its letters belong tou.
Free Semigroup, Free Monoid
LetFdenote the set of all nonempty words from an alphabetAwith the operation of concatenation. As
noted above, the operation is associative. ThusFis a semigroup; it is called thefree semigroup overAor thefree
semigroup generated byA. One can easily show thatFsatisfies the right and left cancellation laws. However,
Fis not commutative whenAhas more than one element. We will writeFAfor the free semigroup overAwhen
we want to specify the setA.
Now letM=A* be the set of all words fromAincluding the empty wordl. Sincelis an identity element
for the operation of concatenation,Mis a monoid, called thefree monoidoverA.
12.3Languages
AlanguageLover an alphabetAis a collection of words onA. Recall thatA denotes the set of all words
onA. Thus a languageLis simply a subset ofA.
EXAMPLE 12.1 LetA={a, b}. The following are languages overA.
(a)L 1 ={a, ab, ab^2 ,...} (c)L 3 ={ambm|m> 0 }
(b)L 2 ={ambn|m> 0 ,n> 0 } (d)L 4 =bmabn|m≥ 0 ,n≥ 0 }
One may verbally describe these languages as follows.
(a) L 1 consists of all words beginning with anaand followed by zero or moreb’s.
(b)L 2 consists of all words beginning with one or more as followed by one or moreb’s.
(c) L 3 consists of all words beginning with one or moreas and followed by the same number ofb’s.
(d)L 4 consists of all words with exactly onea.