Analysis of Algorithms : An Active Learning Approach

(Ron) #1

136 MATCHING ALGORITHMS



  1. Calculate the slide array values for the Boyer-Moore algorithm for the pat-
    terns below. For simplicity, you should assume an alphabet of {A, B, C, D, E}.
    a. ABBCBB
    b. ACBDCB
    c. CABCDB
    d. BACAACA

  2. Calculate the jump array values for the Boyer-Moore algorithm for the pat-
    terns
    a. ABBCBB
    b. ACBDCB
    c. CABCDB
    d. BACAACA


5.2 Approximate String Matching


It is typical to identify a set of common problems that might have caused mis-
matches between the substring and the text with which it is being matched.
Those differences are that the corresponding characters in the substring and
text are different, the substring has a character the text doesn’t have, or the text
has a character the substring doesn’t have. Typically, typing errors fall into one
of these three types, with a common error of transposed characters being
treated as two character differences of the first type.
We will typically look for a k-approximate match for the substring, where k
represents the maximum number of differences of the kind mentioned in the
previous paragraph. There are a number of possibilities that we will need to
keep track of. For example, what does it mean if the first character of the sub-
string and text do not match? It could mean that there is a mismatch of char-
acters, there is a character missing from the substring, or a character is missing
from the text. If the characters do match, getting a better overall match of the
entire substring may still require that we consider the case of a character miss-
ing from the pattern or the text.
For example, consider the attempt to match the substring “ad” with the text
“read.” The first position has two possible 2-approximate matches. (The a is
changed to an r and the d is changed to an e, or there could be an “re” added
to the front of the string.) There is also a 3-approximate match at the first
Free download pdf