Problem Solving in Automata, Languages, and Complexity

(Dana P.) #1
3.6 Pumping Lemmas for Context-Free Languages 143

means that, in any parse tree under grammar G, every vertex has at most d
children. Thus, a parse tree of depth m has at most d” leaves. (The depth of
a parse tree is the length of the longest path from the root to a leaf.) In other
words, a parse tree having more than dm leaves must have a path longer than
m. and, hence, some nonterminal symbol of G must occur twice in this path.
This simple observation leads to the following pumping lemma:


Lemma 3.43 (Pumping Lemma) Let G be a contest-free grammar. Then,
there exists u constant K > 0, depending on G, such that every string w in
L(G) with IwI > - I< can be decomposed us w = uvx yz satisfying the following
conditions:
(I) v # E or y # tz.
(2) lvxyl < Ii’.
(3) uvnxy’z E L(G) for ull n > - 0.


Proof. Let G = (V, C, R, S), with IV1 = m.. Assume that, for every rule A --+ x
in R, 1x1 < - d. Choose K - - d”+l 9 and let w E L with Iw I > - K. Consider a
shortest derivation for S 5 w. Let T be the parse tree corresponding to this
derivation. In tree T, each internal node is labeled by a nonterminal symbol
and each leaf is labeled by a terminal symbol or E. Clearly, T has Iw I > dm+l
leaves. Thus, T has a path of length at least m. + 1 from the root to aleaf.
Consider the longest path 7r from the root to a leaf in T. Then, the path
7r has at least m + 1 internal nodes, whose labels are nontermial symbols.
Thus, there exists a pair of such nodes having the same label (i.e., the same
nonterminal symbol). We choose such a pair (41, qz) such that the higher node
qr of the pair is the lowest one among the higher node of all such pairs. (A
node tl is higher than a node t2 if tl is closer to the root than t2.) It follows
that the subpath r’ of x from qr to the leaf has only one pair of nodes of the
same label, namely, (ql,qz). Therefore, r’ has length at most m + 1. Let A be
the label of the vertex 41. We now decompose tree T into three parts Tl, T2,
Ts, at the nodes 41 and 42, as shown in Figure 3.19(a). That is, Tl is the tree
T with all the proper descendants of node ql (the higher A) removed; T2 is
the subtree of T rooted at 41, with all the proper descendants of node q2 (the
lower A) removed; and T3 is the subtree of T rooted at q2. Then, the string w
is decomposed into five parts correspondingly as w = uvx yx. More precisely,
u is the string formed by the leaves of the subtree Tl to the left of node ql,
and x is formed by those leaves to the right of 41; v is the string formed by
the leaves of the subtree T2 to the left of node q2, and y is formed by those
leaves to the right of q2; and x is the string of the leaves of the subtree T3.
Now, we verify that this decomposition satisfies our need:
(1) Suppose v = y = E, then w can be obtained by the parse tree T’ which
is the concatenation of Tl and T3 , by identifying the root q2 of T3 with vertex
q1 of Tl (see Figure 3.19(b)). Th is new parse tree corresponds to a shorter
derivation from S to w, contradicting the assumption that T corresponds to
the shortest derivation. Thus, we must have v # E or y # E.
(2) Note that the longest path in T2 U T3 has length < - ~2 + 1 (otherwise,

Free download pdf