5.1 STRING MATCHING 129
time because the h does appear in the pattern, we move the pattern only two
characters to the right so that the h characters line up. We then begin the
match from the right side and find a complete match for the pattern. In the
Boyer-Moore algorithm, we did 6 character comparisons verses 13 for the first
simple algorithm.
There is a problem with this one improvement. If you look at Fig. 5.5, you
see that we will match the k, n, and i, but the t of tinkle doesn’t match the h of
think. With just the above change, we would then slide the pattern to the right
one character so that the t characters line up. The problem is that because we
matched the “ink” part of the pattern, shifting just one character will cause a
quick mismatch that we could predict will occur.
The Boyer-Moore algorithm will process the pattern in two ways. In the
first, we will calculate a slide value that will tell us how much the pattern has to
be shifted to line up the mismatched text character with where it next appears
in the pattern. In the second, we will calculate a jump value that is based on
character sequences at the end of the pattern that also appear earlier. We first
look at the matching process that uses these values before we look at how to
calculate them.
The Match Algorithm
We have described generally how the slide and jump arrays will be used. In the
slide array, we have the amount that the pattern can be shifted in the text to
line up the mismatched character if possible. In the jump array, we have the
amount that the pattern can be shifted to line up previously matched characters
with another matching place in the pattern. There are two possibilities when
the pattern character and the text character don’t match. The slide array could
indicate a larger move than the jump array, or the jump array could indicate a
larger move than the slide array. (The possibility that they are the same is easiest
because they are both indicating the same shift.) What do these two possibili-
ties tell us? If the slide array is larger, it means that the mismatched character
■FIGURE 5.5
A problem with
sliding
Text: the tinkle of a bell
Pattern: think
Text: the tinkle of a bell
Pattern: think