128 MATCHING ALGORITHMS
be no more than S 2 mismatched character comparisons. This means a total
of 2S 3 matched (fact 1) and mismatched character comparisons. So, the
construction of the links is linear relative to the length of the substring.
Now we consider the matching algorithm. We notice that the while loop
has at most one character comparison per pass. On each pass, either textLoc
andsubLoc are incremented or subLoc is decremented. Because textLoc
starts at 1 and never becomes greater than the length of the text, we know that
there are at most T increments of textLoc and subLoc. The index subLoc
also starts at 1, never becomes negative, and is never increased more than T
times, so it can be decreased at most T times. Putting all of this together tells us
that because there are at most T increments of textLoc and subLoc and T
decreases of subLoc, there are at most 2T comparisons of characters.
So, we see that the Knuth-Morris-Pratt algorithm in total does 2T + 2S 3
character comparisons, which is of order O(T + S). This is an improvement
over the standard algorithm, which is of order O(T*S). Studies of these two
algorithms on natural language text have shown that they operate at roughly
the same level of complexity, although Knuth-Morris-Pratt is slightly better
because it doesn’t back up in the text.
■ 5.1.3 Boyer-Moore Algorithm
The Boyer-Moore algorithm is different from the previous two algorithms in
that it matches the pattern from the right instead of left end. By examining the
pattern we are looking for, we should be able to make better jumps through
the text when a mismatch has occurred.
For example, in Fig. 5.4 (the same match as Fig. 5.1), we first compare the y
with the r and find a mismatch. Because r doesn’t appear in the pattern at all,
we know the pattern can be moved to the right a full four characters (the size
of the pattern). We next compare the y with the h and find a mismatch. This
■FIGURE 5.4
Match of substring
“they” in text “there
they are” (there are
six character
comparisons done
to find the match)
Text: there they are
Pass 1: they
Text: there they are
Pass 2: they
Text: there they are
Pass 3: they