Python Programming for Raspberry Pi, Sams Teach Yourself in 24 Hours


regular expression pattern that matches multiple contiguous spaces, like this:




>>> re.search('  ', 'This line has too  many spaces')
<_sre.SRE_Match object at 0x015F9988>
>>>

The line with two spaces between words matches the regular expression pattern. This is a great way
to catch spacing problems in text files!
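As a minimal sketch of that idea, the following script checks a few lines for two contiguous spaces. The sample lines and the variable names are made up for illustration; only re.search() comes from the text above.

```python
import re

# Hypothetical sample lines; only the second has two contiguous spaces.
lines = ['This line is fine', 'This line has too  many spaces']

# re.search() returns a match object on success and None otherwise,
# so it can be used directly as a true/false test.
flagged = [line for line in lines if re.search('  ', line)]
print(flagged)
```

The same test could be applied to each line read from a text file.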


Special Characters


As you use text strings in your regular expression patterns, there’s something you need to be aware of:
There are a few exceptions when defining text characters in a regular expression. Regular expression
patterns assign a special meaning to a few characters. If you try to use these characters in your text
pattern, you won’t get the results you were expecting.


Regular expressions recognize these special characters:


. * [ ] ^ $ { } \ + ? | ( )


As you work your way through this hour, you’ll find out what these special characters do in a regular
expression. For now, though, just remember that you can’t use these characters by themselves in your
text pattern.
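To see why, here is a short sketch of what happens when two of these characters are used unescaped. The sample text is borrowed from the dollar sign example in this hour; the variable names are illustrative only.

```python
import re

text = 'The cost is $4.00'

# Unescaped, $ is treated as an anchor rather than a literal dollar sign,
# so this matches the empty string at the very end of the text.
unescaped_dollar = re.search('$', text)
print(unescaped_dollar.span())   # (17, 17)

# Unescaped, . matches any character other than a newline, so this
# finds the first character of the text, not the decimal point.
unescaped_dot = re.search('.', text)
print(unescaped_dot.group())     # T
```

Both searches succeed, but neither finds the character that was typed in the pattern.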


If you want to use one of the special characters as a text character, you need to escape it. To escape a
special character, you add another character in front of it to indicate to the regular expression engine
to interpret the next character as a normal text character. The special character that does this is the
backslash character (\).


In Python, as you’ve learned, backslashes also have special meaning in string values. To get around
this, if you want to use the backslash character with a special character, you can create a raw string
value by putting the r prefix in front of the opening quote:


r'textstring'

For example, if you want to search for a dollar sign in your text, just precede it with a backslash
character, like this:




>>> re.search(r'\$', 'The cost is $4.00')
<_sre.SRE_Match object at 0x015F9918>
>>>

You can use raw text strings for your regular expressions, even if they don’t contain any backslashes.
Some coders just get in the habit of always using the raw text strings.
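The difference between raw and normal strings can be sketched as follows. The variable names are illustrative; the comparison simply shows that a raw string and a normal string with a doubled backslash hold the same pattern.

```python
import re

# A raw string keeps the backslash as is; a normal string needs it doubled.
raw_pattern = r'\$'
plain_pattern = '\\$'
print(raw_pattern == plain_pattern)   # True

# Without the r prefix, Python interprets some backslash sequences itself:
# '\n' is a single newline character, while r'\n' is two characters.
print(len('\n'), len(r'\n'))          # 1 2

# Either form of the pattern finds the literal dollar sign.
match = re.search(raw_pattern, 'The cost is $4.00')
print(match.start())                  # 12
```

Using the raw form everywhere avoids having to remember which backslash sequences Python would otherwise interpret.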


Anchor Characters


As shown in the “Plain Text” section a little earlier this hour, by default when you specify a regular
expression pattern, the pattern can appear anywhere in the data stream and be a match. There are two
special characters you can use to anchor a pattern to either the beginning or the end of lines in the data
stream: ^ and $.
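As a quick preview, here is a minimal sketch of both anchors applied to a single-line string. The sample text and the variable names are chosen for illustration only.

```python
import re

text = 'This line has too many spaces'

# ^ anchors the pattern to the beginning of the text.
starts_with_this = re.search('^This', text) is not None
starts_with_line = re.search('^line', text) is not None
print(starts_with_this, starts_with_line)    # True False

# $ anchors the pattern to the end of the text.
ends_with_spaces = re.search('spaces$', text) is not None
ends_with_many = re.search('many$', text) is not None
print(ends_with_spaces, ends_with_many)      # True False
```

The words line and many both appear in the text, but the anchored patterns fail because neither word sits at the position the anchor requires.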


Starting at the Beginning
