Regular expressions are simply strings that define patterns to be matched against other
strings. Supply a pattern and a string and ask whether the string matches your pattern.
After a match, parts of the string matched by parts of the pattern are made available to
your script. That is, matches not only give a yes/no answer, but can also pick out
substrings.
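To make this concrete, here is a minimal sketch; the pattern and the sample string are made up for illustration only. It shows both halves of the idea: the yes/no answer, and the substrings picked out by parenthesized parts of the pattern.

```python
import re

# Hypothetical sample data: the pattern has two parenthesized groups
pattern = r'Hello (\w+), meet (\w+)'
text = 'Hello Bob, meet Sue'

match = re.match(pattern, text)            # match object on success, None on failure
matched = match is not None                # the yes/no answer
names = match.groups() if match else ()    # substrings matched by the groups

print(matched, names)
```

When the string does not match, re.match returns None, which is why the result is tested before any substrings are fetched.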
Regular expression pattern strings can be complicated (let’s be honest—they can be
downright gross to look at). But once you get the hang of them, they can replace larger
handcoded string search routines—a single pattern string generally does the work of
dozens of lines of manual string scanning code and may run much faster. They are a
concise way to encode the expected structure of text and extract portions of it.
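As a rough illustration of that trade-off, the sketch below pulls runs of digits out of a string twice: once with a hand-coded scanner and once with a single pattern. Both function names and the sample string are hypothetical, and no speed claim is made for this particular case.

```python
import re

def find_numbers_by_hand(text):
    """Manual scan: collect each run of ASCII digit characters."""
    numbers, current = [], ''
    for char in text:
        if char in '0123456789':
            current += char            # still inside a run of digits
        elif current:
            numbers.append(current)    # a run just ended
            current = ''
    if current:
        numbers.append(current)        # run that ends at the end of the string
    return numbers

def find_numbers_by_pattern(text):
    """Same job, with the expected structure encoded in one pattern."""
    return re.findall(r'[0-9]+', text)

sample = 'order 66 shipped 3 boxes on day 128'
print(find_numbers_by_hand(sample))
print(find_numbers_by_pattern(sample))
```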
The re Module
In Python, regular expressions are not part of the syntax of the Python language itself,
but they are supported by the re standard library module that you must import to use.
The module defines functions for running matches immediately, compiling pattern
strings into pattern objects, matching these objects against strings, and fetching
matched substrings after a match. It also provides tools for pattern-based splitting,
replacing, and so on.
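The following short sketch touches each of the tool categories just listed. The sample line and patterns are invented for illustration; the calls themselves (re.search, re.compile, group, re.split, and re.sub) are standard parts of the module.

```python
import re

line = 'spam, ham;eggs'                  # hypothetical sample text

# Run a match immediately with a top-level function
immediate = re.search(r'h\w+', line)

# Compile a pattern string into a pattern object, then match with the object
word = re.compile(r'h\w+')
compiled = word.search(line)

# Fetch the matched substring after a match
found = compiled.group()

# Pattern-based splitting and replacing
parts = re.split(r'[,;]\s*', line)
replaced = re.sub(r'[,;]\s*', ' / ', line)

print(found, parts, replaced)
```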
The re module implements a rich regular expression pattern syntax that tries to be close
to that used to code patterns in the Perl language (regular expressions are a feature of
Perl worth emulating). For instance, re supports the notions of named groups, character
classes, and nongreedy matches: regular expression pattern operators that match as few
characters as possible (other operators always match the longest possible substring).
The re module has also been optimized repeatedly, and in Python 3.X it supports
matching for both bytes byte strings and str Unicode strings. The net effect is that
Python’s pattern support uses Perl-like patterns, but is invoked with a different
top-level module interface.
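Here is a brief sketch of those features in action. The sample strings are hypothetical, but the syntax shown is part of the re module: (?P<name>...) for named groups, [...] for character classes, and a trailing ? to make a repeat operator nongreedy.

```python
import re

# Named groups plus character classes, on a made-up "key=value" string
m = re.match(r'(?P<key>[a-z]+)=(?P<value>[0-9]+)', 'port=8080')
key, value = m.group('key'), m.group('value')

# Greedy versus nongreedy repetition on the same text
tagged = '<b>bold</b>'
greedy = re.match(r'<.*>', tagged).group()       # longest possible substring
nongreedy = re.match(r'<.*?>', tagged).group()   # as few characters as possible

# In Python 3.X, a bytes pattern matches a bytes subject
bytes_hit = re.search(rb'[0-9]+', b'id 42').group()

print(key, value, greedy, nongreedy, bytes_hit)
```

Pattern and subject must be of the same kind; mixing a str pattern with a bytes subject raises a TypeError.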
I need to point out up front that regular expressions are complex tools that cannot be
covered in depth here. If this area sparks your interest, the text Mastering Regular
Expressions, written by Jeffrey E. F. Friedl (O’Reilly), is a good next step to take. We won’t
be able to cover pattern construction itself well enough here to turn you into an expert.
Once you learn how to code patterns, though, the top-level interface for performing
matches is straightforward. In fact, it is so easy to use that we’ll jump right into
some live examples before getting into more details.
First Examples
There are two basic ways to kick off matches: through top-level function calls and via
methods of precompiled pattern objects. The latter precompiled form is quicker if you
will be applying the same pattern more than once—to all lines in a text file, for instance.
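The two approaches can be sketched side by side as follows. The list of lines stands in for the lines of a text file, and the pattern is made up for illustration.

```python
import re

# Hypothetical stand-in for the lines of a text file
lines = ['name: Bob', 'job: dev', 'no colon here']

# Way 1: top-level function call; the pattern string is passed on every call
by_function = [re.match(r'(\w+): (\w+)', line) for line in lines]

# Way 2: compile once, then call methods of the pattern object
pairs = re.compile(r'(\w+): (\w+)')
by_object = [pairs.match(line) for line in lines]

# Both ways give the same answers; a failed match is None
results = [m.groups() if m else None for m in by_object]
print(results)
```

The re module also keeps an internal cache of recently compiled patterns, so the speed gain from precompiling may be modest in some programs; timing your own code is the only way to know for sure.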