Working with Regular Expressions in Your Python Scripts
It helps to see regular expressions in action to get a feel for how to use them in your own Python
scripts. Just looking at the quirky formats doesn’t help much; seeing how regular
expressions match real data can help clear things up!
Try It Yourself: Use a Regular Expression
Follow these steps to implement a simple phone number validator script by using
regular expressions:
- Determine what regular expression pattern would match the data you’re looking
for. For phone numbers in the United States, there are four common ways to
display a phone number:
(123)456-7890
(123) 456-7890
123-456-7890
123.456.7890
This means a customer could enter a phone number in a form in any of these four
ways, and the regular expression must be robust enough to handle all of them.
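To see why a single literal pattern falls short, the sketch below keeps the four formats as test data and tries a naive pattern that only describes the dash-separated form. The sample numbers use the placeholder area code 555 instead of 123, because a valid U.S. area code cannot start with 1; the names `samples` and `naive` are illustrative only.

```python
import re

# The four common U.S. formats, using 555 as a placeholder area code
samples = ["(555)456-7890", "(555) 456-7890", "555-456-7890", "555.456.7890"]

# A naive pattern that only describes the dash-separated format
naive = re.compile(r"^[0-9]{3}-[0-9]{3}-[0-9]{4}$")

for number in samples:
    # match() returns None when the pattern does not fit the string
    print(f"{number:16} {bool(naive.match(number))}")
```

Only 555-456-7890 passes, which is why the pattern has to be built up piece by piece to allow the optional parentheses and the different separators.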
When building a regular expression, it’s best to start on the left side and build the
pattern to match the characters you might run into. In this example, there may or may
not be a left parenthesis in the phone number. You can match this by using the
following pattern:
^\(?
The caret anchors the match at the beginning of the data. Since the left parenthesis is a special
character, you must escape it with a backslash to match it as a literal character. The question
mark indicates that the left parenthesis may or may not appear in the data.
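Here is a minimal sketch of this first piece on its own, assuming the same placeholder numbers. It also shows what happens if the backslash is left out: Python’s `re` module reads `(?` as the start of a special group construct and rejects the pattern with `re.error`.

```python
import re

# Escaped "(" followed by "?": an optional literal left parenthesis at the start
optional_paren = re.compile(r"^\(?")

print(repr(optional_paren.match("(555) 456-7890").group()))  # '('
print(repr(optional_paren.match("555-456-7890").group()))    # ''

# Without the backslash, "(?" begins a group construct and the pattern is invalid
try:
    re.compile(r"^(?")
except re.error as err:
    print("re.error:", err)
```

Because the question mark allows zero occurrences, the match always succeeds; it simply consumes the parenthesis when one is present.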
Next comes the three-digit area code. In the United States, area codes start with a
digit from 2 through 9. (No area codes start with the digits 0 or 1.) To match the area
code, you use this pattern:
[2-9][0-9]{2}
This requires that the first character be a digit between 2 and 9, followed by any two
digits. After the area code, the closing right parenthesis may or may not be there, so
it is also escaped and made optional:
\)?
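A quick way to check these two pieces is to try the area-code pattern on a few candidate codes and then attach the optional closing parenthesis. This is a sketch; the candidate codes are arbitrary examples, and `fullmatch()` is used so that the whole candidate string must fit the pattern.

```python
import re

# First digit 2-9, then exactly two more digits 0-9
area_code = re.compile(r"[2-9][0-9]{2}")

for code in ["555", "212", "123", "099", "55"]:
    # fullmatch() requires the entire string to match the pattern
    print(code, bool(area_code.fullmatch(code)))

# Optional "(", area code, optional ")"
with_parens = re.compile(r"^\(?[2-9][0-9]{2}\)?")
print(with_parens.match("(555)456-7890").group())  # (555)
print(with_parens.match("555-456-7890").group())   # 555
```

The codes 123 and 099 are rejected because they start with 1 or 0, and 55 is rejected because it has only two digits.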
After the area code there can be no separator, a space, a dash, or a dot. You can group
these alternatives by using parentheses along with the pipe symbol, which means “or”:
(| |-|\.)
The first pipe symbol appears immediately after the left parenthesis, so the first
alternative is empty; that empty alternative matches the no-space condition. You must
escape the dot with a backslash; otherwise, it takes on its special meaning and matches
any character.
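The difference the backslash makes is easy to demonstrate. This sketch compares the escaped separator group with an unescaped version on a handful of candidate separators; the letter x stands in for any character that should not be accepted.

```python
import re

escaped = re.compile(r"(| |-|\.)")    # dot escaped: matches only a literal "."
unescaped = re.compile(r"(| |-|.)")   # bare dot: matches any single character

for sep in ["", " ", "-", ".", "x"]:
    # fullmatch() checks whether the whole candidate separator is accepted
    print(repr(sep), bool(escaped.fullmatch(sep)), bool(unescaped.fullmatch(sep)))
```

Both versions accept the empty string, a space, a dash, and a dot, but only the unescaped version also accepts x.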
Next comes the three-digit phone exchange number, which doesn’t require anything
special:
[0-9]{3}
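Putting the pieces so far together gives a pattern that matches the start of a phone number, up to and including the exchange. The sketch below, again assuming placeholder numbers with area code 555, uses `re.match()`, which only tries to match at the start of the string, to show how much of each format is consumed at this stage and which separator was captured. The final four digits are not covered yet.

```python
import re

# Optional "(", area code, optional ")", separator, three-digit exchange
partial = re.compile(r"^\(?[2-9][0-9]{2}\)?(| |-|\.)[0-9]{3}")

samples = ["(555)456-7890", "(555) 456-7890", "555-456-7890", "555.456.7890"]

for number in samples:
    m = partial.match(number)
    # group() is the matched prefix; group(1) is the separator that was used
    print(f"{number:16} matched={m.group()!r:12} separator={m.group(1)!r}")
```

Because each parenthesis is optional on its own, this partial pattern would also accept an unbalanced prefix such as (555-456.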