Java 7 for Absolute Beginners

CHAPTER 15 ■ GENERICS AND REGULAR EXPRESSIONS

Now that we have a testing program and know how to use it, we need to focus on the regular
expression syntax that Java supports. Regular expression syntax is almost a language unto itself, so we'll
focus on the basics and some of the more commonly used advanced bits. The whole thing is worthy of a
book (and such books exist).
Our simple test case uses a string literal. A string literal is just a piece of text. In the example we just
ran, "Sam" is a string literal. "Spade" is another string literal. If we replace "Sam" with "Spade," we get the
following output in the console:


Found a match for Spade beginning at 4 and ending at 9
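
The chapter's testing program is not reproduced in this section, so here is a minimal sketch of how a literal search like this one could be run with java.util.regex. The class name LiteralMatchSketch, the helper findMatches, and the input string "Sam Spade" are assumptions made for illustration; the input was chosen because it yields the offsets reported above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LiteralMatchSketch {

    // Returns one report line for each place the regex matches in the input.
    static List<String> findMatches(String regex, String input) {
        List<String> reports = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            // start() is the index of the first matched character;
            // end() is the index just past the last matched character.
            reports.add("Found a match for " + matcher.group()
                    + " beginning at " + matcher.start()
                    + " and ending at " + matcher.end());
        }
        return reports;
    }

    public static void main(String[] args) {
        // "Sam Spade" is an assumed input string, not taken from the book's listing.
        for (String line : findMatches("Spade", "Sam Spade")) {
            System.out.println(line);
        }
    }
}
```

Because a string literal contains no metacharacters, the pattern "Spade" can only ever match those five characters in that order.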


We won't be able to accomplish much with just string literals. We can find all the instances of a
particular string, but we can't find anything that matches a pattern. To create a pattern, we have to dive
into the key component of regular expressions—metacharacters.
Metacharacters are characters that create patterns. Rather than represent a single literal character, a
metacharacter represents a set of characters. Some metacharacters work by themselves, while other
metacharacters are meaningless in the absence of other metacharacters. Table 15-1 describes the
metacharacters supported by the Java regular expression syntax.


Table 15-1. Java Regular Expression Metacharacters


Metacharacter Description


( Starts a subpattern (a pattern within the larger pattern). For example, compan(y|ies)
lets you match either “company” or “companies”.


Also starts the definition of a group. (Dog) treats those three characters as a single
unit for other regular expression operators.

[ Starts a set of characters. For example, [A-Z] would match any uppercase
letter. A[A-Z]Z would match “AAZ”, “ABZ”, and so on to “AZZ”.


{ Starts a match count specifier. For example, s{3} would match three s characters in
a row: sss. Pas{3} would match “Passs”.


\ Starts an escape sequence, so that you can match a literal instance of a
metacharacter. For example, if you needed to match the periods in a paragraph,
you'd use \. (that is, a backslash and a period). The period character (.) is itself a
regular expression metacharacter, so you must escape it to find the actual periods.
Similarly, to find an actual backslash character, you must escape the escape
character, thus: \\
Inside a Java string literal, each of those backslashes must itself be doubled, so
the two patterns are written "\\." and "\\\\" in source code.


^ Matches the start of the string (or the start of each line when the pattern is
compiled with Pattern.MULTILINE). ^A finds any line that begins with “A”. ^[0-9]
finds any line that begins with a digit. ^[0-9]{2} finds any line that begins with
two digits. ^[0-9]+ finds any line that begins with a number of any size.


Inside of a range, ^ is the negation character. [^abc] matches any character other
than a, b, or c. [^abc]at matches “rat” and “sat” and “eat” (and many others) but not
“bat” or “cat” (or “aat”).


- Used within range expressions, such as [0-9], which would match any digit.
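
The table rows above can be checked with a short program. The following is a sketch only: the class name MetacharacterSketch and the helpers matchesAll and foundIn are hypothetical names introduced here, and the sample input strings other than those quoted in the table are made up for illustration. Note how the escaped patterns need doubled backslashes once they are written as Java string literals.

```java
import java.util.regex.Pattern;

public class MetacharacterSketch {

    // True when the entire input matches the regex.
    static boolean matchesAll(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    // True when the regex matches somewhere inside the input.
    static boolean foundIn(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // ( starts a subpattern or group
        System.out.println(matchesAll("compan(y|ies)", "company"));   // true
        System.out.println(matchesAll("compan(y|ies)", "companies")); // true
        System.out.println(matchesAll("(Dog){2}", "DogDog"));         // true

        // [ starts a set of characters; - marks a range inside the set
        System.out.println(matchesAll("A[A-Z]Z", "ABZ")); // true
        System.out.println(matchesAll("A[A-Z]Z", "AbZ")); // false
        System.out.println(matchesAll("[0-9]", "7"));     // true

        // { starts a match count specifier
        System.out.println(matchesAll("Pas{3}", "Passs")); // true
        System.out.println(matchesAll("Pas{3}", "Pass"));  // false

        // \ starts an escape sequence; "\\." in Java source is the regex \.
        System.out.println(foundIn("\\.", "The end."));  // true
        System.out.println(foundIn("\\.", "No period")); // false
        // "\\\\" in Java source is the regex \\, which matches one backslash
        System.out.println(foundIn("\\\\", "C:\\temp")); // true

        // ^ anchors the match to the start of the input
        System.out.println(foundIn("^[0-9]{2}", "42 is the answer")); // true
        System.out.println(foundIn("^[0-9]{2}", "4 is not enough"));  // false

        // Inside a set, ^ negates the set
        System.out.println(matchesAll("[^abc]at", "rat")); // true
        System.out.println(matchesAll("[^abc]at", "cat")); // false
    }
}
```

Two helpers are used because Pattern.matches requires the whole input to match, while Matcher.find looks for the pattern anywhere in the input; the anchor and escape examples only make sense with the second kind of search.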
