Analysis of Algorithms : An Active Learning Approach

(Ron) #1
5.1 STRING MATCHING 125

form of the word “automata”) is a simple machine that has a current state and a
transition function. The transition function examines the state and the next
character of input and then decides on a new state for the automaton. Some
states are labeled as accepting states, and if the automaton is in one of these
when it has finished the input, the input word is said to be accepted.
We can use finite automata to do string matching by having an automata set
up to match just one word, and when we get to the final state, we know we
have found the substring in the text. This technique is very efficient because a
finite automata functions by looking at each input symbol once. This means
that we can do the matching with a finite automaton in no more than T com-
parisons. The problem becomes developing an algorithm to construct a deter-
ministic finite automaton for any possible substring. This is not an easy task,
and although algorithms are available that can do this, they take a lot of time.
Because of this, finite automata are not a good general-purpose solution to
string matching.

■ 5.1.2 Knuth-Morris-Pratt Algorithm
When constructing a finite automaton to look for a substring in a piece of
text, it is easy to build the links that move us from the start state to the final
accepting state, because they are just labeled with the characters of the sub-
string (see Fig. 5.2). The problem occurs when we begin to add additional
links for the other characters that don’t get us to the final state.
The Knuth-Morris-Pratt algorithm is based on finite automata but uses a
simpler method of handling the situation of when the characters don’t match.
In Knuth-Morris-Pratt, we label the states with the symbol that should match
at that point. We then only need two links from each state—one for a success-
ful match and the other for a failure. The success link will take us to the next
node in the chain, and the failure link will take us back to a previous node
based on the word pattern. A sample Knuth-Morris-Pratt automaton for the
substring “ababcb” is given in Fig. 5.3.

The beginning of a■FIGURE 5.2 he l l o
finite automaton
to look for the
substring “hello”

Free download pdf