228 10 Transforming with Traditional Programming Languages
while (<>) {
    chomp;
    if (/Motif #[0-9]+:/) {
        print "A motif has been found!\n";
    }
}
Program 10.10 Using pattern matching to find all data of one kind in a file
The next task is to find where every motif begins, not just the first one.
This is done in program 10.10 by modifying the pattern so that it matches
any number rather than just the number 1.
The pattern now has [0-9]+ where it used to have the number 1. Bracketed
expressions in a pattern define character classes. This character class will
match any single digit from 0 to 9. The plus sign after the character class
means that the line must have one or more consecutive characters in this
class at that point in the pattern. Any character or character class can be
followed by a quantifier:
+ One or more (i.e., at least one)
* Zero or more (i.e., any number of them)
? Zero or one (i.e., optional character)
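As a minimal sketch of how these three quantifiers differ, the following tries each one on a few sample lines. The sample lines and the variable names are made up for illustration and do not come from the motif file.

```perl
use strict;
use warnings;

# Hypothetical sample lines, invented for illustration.
my @lines = ("Motif #7:", "Motif #42:", "Motif #:", "Motifs #3:");

my %result;
for my $line (@lines) {
    $result{$line} = {
        plus     => ($line =~ /Motif #[0-9]+:/  ? 1 : 0),  # at least one digit
        star     => ($line =~ /Motif #[0-9]*:/  ? 1 : 0),  # digits may be absent
        optional => ($line =~ /Motifs? #[0-9]+:/ ? 1 : 0), # the "s" is optional
    };
    printf "%-12s plus=%d star=%d optional=%d\n", $line,
        $result{$line}{plus}, $result{$line}{star}, $result{$line}{optional};
}
```

The line "Motif #:" has no digits, so only the pattern with the asterisk accepts it, while "Motifs #3:" is accepted only by the pattern in which the question mark makes the letter s optional.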
Quantifiers can also be used to specify exactly how many times a character
must occur as well as a range of occurrences. This is done by placing the
number of times or the range in braces after the character or character class.
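The brace notation might be used as in the following sketch, which selects motif lines by how many digits the motif number has. The sample lines and array names are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample lines, invented for illustration.
my @lines = ("Motif #5:", "Motif #12:", "Motif #123:", "Motif #12345:");

my (@exactly_two, @two_to_three);
for my $line (@lines) {
    # {2} means exactly two digits between the "#" and the ":".
    push @exactly_two, $line if $line =~ /Motif #[0-9]{2}:/;
    # {2,3} means at least two and at most three digits.
    push @two_to_three, $line if $line =~ /Motif #[0-9]{2,3}:/;
}

print "Exactly two digits:  ", join(", ", @exactly_two), "\n";
print "Two or three digits: ", join(", ", @two_to_three), "\n";
```

Because the pattern requires the # immediately before the digits and the colon immediately after them, a line with more digits than the quantifier allows does not match.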
Character classes and quantifiers are specified using characters (such as
brackets, braces, etc.) just as everything is specified in Perl. However, this
means that these characters are special within a pattern. They are called
metacharacters. When used within a pattern they do not match themselves.
The metacharacters are backward slash, vertical bar, parentheses, brackets,
braces, circumflex, dollar sign, asterisk, plus sign, question mark, and period.
If a pattern should match one of the metacharacters itself, then precede it
with a backward slash. For example, \? means match the question mark
character rather than quantify the preceding character or character class.
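The difference between an escaped and an unescaped metacharacter can be seen in a small sketch such as the following. The sample strings and variable names are invented for illustration.

```perl
use strict;
use warnings;

# \? matches a literal question mark.
my $literal_found   = ("Is this a motif?" =~ /motif\?/) ? 1 : 0;
my $literal_missing = ("moti" =~ /motif\?/) ? 1 : 0;

# Unescaped, ? makes the preceding "f" optional, so "moti" matches.
my $f_optional = ("moti" =~ /motif?/) ? 1 : 0;

# Unescaped, a period matches any character; escaped, only a period.
my $any_char    = ("3x14" =~ /3.14/)  ? 1 : 0;
my $period_only = ("3x14" =~ /3\.14/) ? 1 : 0;

print "literal_found=$literal_found literal_missing=$literal_missing\n";
print "f_optional=$f_optional\n";
print "any_char=$any_char period_only=$period_only\n";
```

A pattern that forgets the backward slash usually still matches the intended text, but it also matches text that was never intended, which is why such mistakes are easy to overlook.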
The next task is to obtain the motif number. In principle, one could get
this number by using the split and substr functions, but there is a much
easier way. When Perl matches a pattern, it keeps track of what succeeded
in matching those parts of the pattern that are in parentheses. In this case,