Analysis of Algorithms : An Active Learning Approach

(Ron) #1
5.1 STRING MATCHING 133

to add 7, 8, or 9, respectively. Thinking about this in general shows the amount
we add for a mismatch on the last character of the pattern is the pattern length,
and a mismatch on the first character is twice the length minus 1. This is the
basis for the jump array initialization.
Now, we need to look at a pattern where there is some repetition of charac-
ter sequences that are at the end of the pattern. We need to look at how much
to increase textLoc to move the pattern the correct amount (ignoring the
possibilities handled by the slide array). To do that, imagine we are matching
the pattern with the pattern. Consider the pattern “abcdbc” again. If the last
character doesn’t match, we can just increase textLoc by 1 and start again. If
the last character matched but the second to last didn’t, we can see that because
both c characters are preceded by b characters, we need to move the pattern
completely past where we started. If the last two characters match, but the
third to last doesn’t, we increase textLoc by 5, which will line up the “bc”
that matches the last two pattern characters with the second and third pattern
characters.
The calculation of the jump array is handled by the following algorithm.
The first loop handles the initialization of the jump array. The second loop
updates the array based on end characters that are repeated earlier. The third
(and fourth) loop adjusts the maximum moves of the front (end) of the jump
array based on where the pattern repetitions were found. The complete algo-
rithm is

// initialize jump to the largest possible movement
for i = 1 to length(pattern) do
jump[ i ] = 2 * length(pattern) - i
end for


// match the end of the pattern with characters earlier in the pattern
test = length(pattern)
target = length(pattern) + 1
while test > 0 do
link[test] = target
while target ≤ length(pattern) and pattern[test] ≠ pattern[target] do
jump[target] = MINIMUM( jump[target], length(pattern)-test )
target = link[target]
end while

Free download pdf