You can use regular expressions to ask if strings match a pattern and pick out parts of strings using a rich
expression language. First you will learn what regular expressions are. Then you will learn how to compile
and use them.
13.3.1. Regular Expressions
A full description of regular expressions is complex and many other works describe them. So we will not
attempt a complete tutorial, but instead will simply give some examples of the most commonly used features.
(A full reference alone would take several pages.) A list of resources for understanding regular expressions is
in "Further Reading" on page 758.
Regular expressions search in character sequences, as defined by java.lang.CharSequence,
implemented by String and StringBuilder. You can implement it yourself if you want to provide new
sources.
A regular expression defines a pattern that can be applied to a character sequence to search for matches. The
simplest form is something that is matched exactly; the pattern xyz matches the string xyzzy but not the
string plugh. Wildcards make the pattern more general. For example, . (dot) matches any single character,
so the pattern .op matches both hop and pop, and * matches zero or more of the thing before it, so xyz*
matches xy, xyz, and xyzzy.
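The literal, dot, and star examples above can be tried with a minimal sketch like the following. It assumes the standard java.util.regex.Pattern class; the class name WildcardDemo and the helper found are hypothetical names chosen for illustration. The helper asks whether the pattern matches anywhere inside the input, which is the sense of "matches" used in the text.

```java
import java.util.regex.Pattern;

public class WildcardDemo {
    // Hypothetical helper: true if the regex matches some part of the input.
    public static boolean found(String regex, CharSequence input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(found("xyz", "xyzzy"));   // true: literal text is present
        System.out.println(found("xyz", "plugh"));   // false
        System.out.println(found(".op", "hop"));     // true: . stands for 'h'
        System.out.println(found(".op", "pop"));     // true: . stands for 'p'
        System.out.println(found("xyz*", "xy"));     // true: zero occurrences of z
        System.out.println(found("xyz*", "xyzzy"));  // true: two occurrences of z
        // Any CharSequence can be searched, not only a String.
        System.out.println(found("xyz", new StringBuilder("xyzzy")));  // true
    }
}
```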
Other useful wildcards include simple sets (p[aeiou]p matches pop and pup but not pgp, while [a-z]
matches any single lowercase letter); negations ([^aeiou] matches any single character that is not a lowercase
vowel); predefined sets (\d matches any digit; \s any whitespace character); and boundaries (^twisty
matches the word "twisty" only at the beginning of a line; \balike matches "alike" only after a word
boundary, that is, at the beginning of a word).
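As a rough illustration of sets, negations, predefined sets, and boundaries, the sketch below checks each example from the paragraph above. The class name CharClassDemo and the helper found are hypothetical. Because the patterns are written as Java string literals, each regex backslash is doubled (for example, the pattern \d is written "\\d").

```java
import java.util.regex.Pattern;

public class CharClassDemo {
    // Hypothetical helper: true if the regex matches some part of the input.
    public static boolean found(String regex, CharSequence input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // Simple sets
        System.out.println(found("p[aeiou]p", "pup"));   // true
        System.out.println(found("p[aeiou]p", "pgp"));   // false
        System.out.println(found("[a-z]", "Q7"));        // false: no lowercase letter
        // Negation: some character other than a lowercase vowel must be present
        System.out.println(found("[^aeiou]", "aei"));    // false
        System.out.println(found("[^aeiou]", "aex"));    // true: 'x'
        // Predefined sets
        System.out.println(found("\\d", "room 7"));      // true
        System.out.println(found("\\s", "no-space"));    // false
        // Boundaries
        System.out.println(found("^twisty", "twisty passages"));  // true
        System.out.println(found("^twisty", "a twisty maze"));    // false
        System.out.println(found("\\balike", "all alike"));       // true
        System.out.println(found("\\balike", "lookalike"));       // false
    }
}
```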
Special symbols for particular characters include \t for tab; \n for newline; \a for the alert (bell) character;
\e for escape; and \\ for backslash itself. Any character that would otherwise have a special meaning can be
preceded by a \ to remove that meaning; in other words, for such a character c, \c always represents the
character c itself. This is how, for example, you would match a * in an expression: by using \*.
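A short sketch of removing a character's special meaning follows. The class name LiteralEscapeDemo and the helper found are hypothetical, and the regex backslash is doubled only because the pattern is written as a Java string literal.

```java
import java.util.regex.Pattern;

public class LiteralEscapeDemo {
    // Hypothetical helper: true if the regex matches some part of the input.
    public static boolean found(String regex, CharSequence input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // The regex \* matches a literal asterisk.
        System.out.println(found("\\*", "2*3"));    // true
        System.out.println(found("\\*", "23"));     // false
        // An unescaped dot is a wildcard; an escaped dot is a literal period.
        System.out.println(found("a.b", "axb"));    // true
        System.out.println(found("a\\.b", "axb"));  // false
        System.out.println(found("a\\.b", "a.b"));  // true
        // The regex \t matches a tab character in the input.
        System.out.println(found("\\t", "a\tb"));   // true
    }
}
```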
Special symbols start with the \ character, which is also the character used to introduce an escape sequence
in a string literal. This means, for example, that in the string expression "\balike", the actual pattern will
consist of a backspace character followed by the word "alike", while "\s" would not be a pattern for
whitespace but would cause a compile-time error because \s is not a valid escape sequence (releases since
Java 15 do accept \s in a string literal, but as a plain space character, so the result is still not the
whitespace pattern). To use the special symbols within a string expression the leading \ must itself be
escaped using \\, so the example strings become "\\balike" and "\\s", respectively. To include an actual
backslash in a pattern it has to be escaped twice, using four backslash characters: "\\\\". Each backslash
pair becomes a single backslash within the string, resulting in a single backslash pair being included in the
pattern, which is then interpreted as a single backslash character.
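The two layers of escaping described above can be made visible with a small sketch such as this one. The class name StringEscapeDemo and the helper found are hypothetical. The string lengths show how many characters actually reach the regular expression compiler after the Java compiler has processed the literal.

```java
import java.util.regex.Pattern;

public class StringEscapeDemo {
    // Hypothetical helper: true if the regex matches some part of the input.
    public static boolean found(String regex, CharSequence input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // "\balike" is a backspace character plus "alike": 6 characters.
        System.out.println("\balike".length());           // 6
        // "\\balike" is a backslash plus "balike": 7 characters, the regex \balike.
        System.out.println("\\balike".length());          // 7

        System.out.println(found("\balike", "all alike"));   // false: no backspace in input
        System.out.println(found("\\balike", "all alike"));  // true: word boundary, then alike

        // Four backslashes in source become two in the string, one in the match.
        System.out.println("\\\\".length());               // 2
        System.out.println(found("\\\\", "C:\\temp"));     // true: input contains one backslash
        System.out.println(found("\\\\", "C:/temp"));      // false
    }
}
```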
Regular expressions can also capture parts of the string for later use, either inside the regular expression itself
or as a means of picking out parts of the string. You capture parts of the expression inside parentheses. For
example, the regular expression (.*)-(.*)-\2-\1 matches x-yup-yup-x or ñ-å-å-ñ or any other
similar string because \1 matches whatever the first group (.*) matched and \2 matches whatever the
second group (.*) matched.[1] Groups are numbered
from one, in order of the appearance of their opening parenthesis.
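Capture groups and back references can be explored with a sketch along these lines. It assumes the standard java.util.regex.Pattern and Matcher classes; the class name BackrefDemo and the method groups are hypothetical. The method requires the whole input to match and returns the text captured by groups 1 and 2, or null when there is no match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BackrefDemo {
    // \2 and \1 refer back to the text captured by the second and first groups.
    private static final Pattern MIRROR = Pattern.compile("(.*)-(.*)-\\2-\\1");

    // Hypothetical helper: captured groups if the whole input matches, else null.
    public static String[] groups(CharSequence input) {
        Matcher m = MIRROR.matcher(input);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] g = groups("x-yup-yup-x");
        System.out.println(g[0]);                          // x
        System.out.println(g[1]);                          // yup
        // The second "yap" differs from group 2, so nothing matches.
        System.out.println(groups("x-yup-yap-x") == null); // true
    }
}
```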
[1] The .* means "zero or more characters," because . means "any character" and * means
"zero or more of the thing I follow," so together they mean "zero or more of any character."