Analysis of Algorithms : An Active Learning Approach

(Ron) #1
5.1 STRING MATCHING 135

Analysis

In the following analysis, we will use P to represent the number of characters
in the pattern, T to represent the number of characters in the text, and A to
represent the number of characters in the alphabet.
The calculation of the slide array values does an assignment for each array
location and another for each character of the pattern. This does O(A+P)
assignments.
The calculation of the jump array will at worst compare each character to all
of the ones that appear later in the pattern. From past experience, you should
quickly see that this does O(P^2 ) comparisons. Because there could be an assign-
ment to the jump array for each of these, there are also O(P^2 ) assignments.
The details of an analysis of the number of comparisons done by the match-
ing algorithm are beyond the scope of this book. Studies have shown that for
natural language text with patterns of six or more characters, the match algo-
rithm will compare a character of the pattern with about 40% of the characters
or less. As the length of the pattern increases, the percentage decreases to a
lower value of about 25% of the characters.

5.1.4



  1. Calculate the fail links for the Knuth-Morris-Pratt algorithm for the patterns
    a. ABABBC
    b. ABCABC
    c. CBCBBACA
    d. BBABBCAC

  2. In the text, we indicated that the worst case for the standard algorithm was a
    substring consisting of a string of Xs with one Y at the end and a text con-
    sisting of a string of Xs. If the substring has S characters (S 1 Xs and 1 Y)
    and the text has T characters, we saw that the standard algorithm would take
    S*T character comparisons. What would the fail links look like for the sub-
    string “XXXXY” and how many comparisons would be done to construct
    them? In general, what would the fail links look like for strings of this
    form, and how many comparisons would be done to construct them? How
    many character comparisons would the Knuth-Morris-Pratt algorithm do
    attempting to match the substring and text? (Show all work used to get
    your answer.)


■5.1.4 EXERCISES
Free download pdf