Analysis of Algorithms : An Active Learning Approach

(Ron) #1
5.1 STRING MATCHING 127

the substring, we know that the first four characters matched, and so the “ab”
that matched substring characters 3 and 4 should perhaps match substring
characters 1 and 2 for a successful search. The following algorithm determines
these relationships in the substring:
fail[ 1 ] = 0
for i = 2 to length(substring) do
temp = fail[ i - 1 ]
while (temp > 0) and (substring[ temp ] ≠ substring[ i - 1 ]) do
temp = fail[ temp ]
end while
fail[ i ] = temp + 1
end for


Analysis of Knuth-Morris-Pratt

We first consider the failure link construction phase of the algorithm. If we
look at the while loop, we see that it will run until we find two substring
characters that match. On a cursory look at the process, you see that temp fol-
lows the fail links, and those are in decreasing order. This might lead you to
believe that, for the kth pass of the for loop, the while loop does as many as
k 1 comparisons and this entire process takes about S^2 / 2 comparisons. If
we look closer, we will find a better approximation of the number of compari-
sons. We notice the following facts:


  1. There are at most S 1 times that the ≠ character comparisons will be false.

  2. The fail links are all smaller than their index (i.e., fail[ x] < x for every x)
    because they indicate where to back up to on a failed text character match.

  3. Every time the ≠ comparison is true, temp is decreased because of fact 2.

  4. On the first pass of the for loop, the while is not done because temp =
    fail[1]=0.

  5. The combination of the final statement of the for loop, the increment of i,
    and then the first statement of the next pass of the for loop means that
    temp is incremented by 1 each subsequent pass of the for loop.

  6. Effectively by fact 5, because there are S 2 “subsequent passes” of the for
    loop, temp is incremented S 2 times.

  7. Because fail[1]=0,temp never becomes negative.
    We now know that temp starts at a value of 0 (fact 4), and is incremented no
    more than S 2 times (fact 6). Because each time the characters don’t match
    temp is decreased (fact 3) and temp never becomes negative (fact 7), there can

Free download pdf