126 MATCHING ALGORITHMS
Each success link of a Knuth-Morris-Pratt automaton causes the “fetch” of a
new character from the text. Failure links do not get a new character but reuse
the last character fetched. If we reach the final state, we know that we found the
substring. To assure that you understand how this process works, you should try
it with the text string “abababcbab” and the automaton in Fig. 5.3 to see how it
successfully finds the substring. The full algorithm for this process is
subLoc = 1 // current match point in substring
textLoc = 1 // current match point in text
while textLoc ≤ length(text) and subLoc ≤ length(substring) do
if subLoc = 0 or text[ textLoc ] = substring[ subLoc ] then
textLoc = textLoc + 1
subLoc = subLoc + 1
else // no match so follow fail link
subLoc = fail[ subLoc ]
end if
end while
if (subLoc > length(substring)) then
return textLoc - length(substring) + 1 // found a match
else
return 0 // no match
end if
Before we can analyze this process, we need to consider how these fail links
are determined. Notice that we do not need to do anything special for the suc-
cess links because they just move us to the next successive location. The failure
links, however, are calculated by looking at how the substring relates to itself.
For example, if we look at the substring “ababcb,” we see that if we fail when
matching the c, we shouldn’t back up all the way. If we got to character 5 of
ss sssss
f f f f
f f
abacb b
Get
next
character
■FIGURE 5.3
A completed Knuth-
Morris-Pratt
automaton
for the substring
“ababcb”