Python Programming for Raspberry Pi, Sams Teach Yourself in 24 Hours


in order for the pattern to match.


Combining Anchors


There are a couple of common situations in which you can combine the start and end anchors in the same
pattern. In the first situation, suppose you want to look for a line of data that contains only a specific text
pattern, as in this example:




>>> re.search('^this is a test$', 'this is a test')
<_sre.SRE_Match object at 0x015F9918>
>>> re.search('^this is a test$', 'I said this is a test')
>>>
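
The same check can be sketched as a small script. This is only an illustration: the helper name is_exact_line and the sample strings are made up here and are not part of the original example.

```python
import re

# Hypothetical helper: True only when the text is exactly the phrase,
# because '^' pins the match to the start and '$' pins it to the end.
def is_exact_line(text):
    return re.search('^this is a test$', text) is not None

# Made-up sample data for illustration.
samples = ['this is a test', 'I said this is a test', 'this is a test today']
results = [is_exact_line(s) for s in samples]
print(results)
```

Keep in mind that the $ anchor also matches just before a single trailing newline, so a line read from a file with its newline still attached passes this check.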

The second situation may seem a little odd at first, but it is extremely useful. By combining both
anchors in a pattern with no text between them, you can detect empty strings. Look at this example:




>>> re.search('^$', 'This is a test string')
>>> re.search('^$', "")
<_sre.SRE_Match object at 0x015F99F8>
>>>

The regular expression pattern looks for text that has nothing between the start and end of the
line. Because blank lines contain no text between the two newline characters, they match the regular
expression pattern. This gives you an effective way to find blank lines so you can remove them from documents.
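
As a minimal sketch of that idea, the following script drops the blank lines from a block of text. The sample text and the variable names are assumptions made up for this illustration.

```python
import re

# Made-up sample text containing two blank lines.
document = "first line\n\nsecond line\n\nthird line\n"

# Split the text into lines, then keep only the lines that
# the empty-line pattern '^$' does NOT match.
kept = [line for line in document.splitlines()
        if not re.search('^$', line)]

cleaned = "\n".join(kept)
print(cleaned)
```

A line that contains only spaces is not empty as far as this pattern is concerned, so it would be kept. Filtering those out as well would require a different pattern.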


The Dot Character


The dot special character is used to match any single character except a newline character. The dot
character must match some character, though; if there’s no character in the place of the dot, the pattern
will fail.


Let’s take a look at a few examples of using the dot character in a regular expression pattern:




>>> re.search('.at', 'The cat is sleeping')
<_sre.SRE_Match object at 0x015F9988>
>>> re.search('.at', 'That is heavy')
<_sre.SRE_Match object at 0x015F99F8>
>>> re.search('.at', 'He is at the store')
<_sre.SRE_Match object at 0x015F9988>
>>> re.search('.at', 'at the top of the hour')
>>>

The third test here is a little tricky. The pattern matches the at, yet there appears to be no character in front of it
to match the dot character. Ah, but there is! In regular expressions, spaces count as characters, so the
space in front of the at matches the dot. The last test confirms this by putting the at at the start of
the line, where nothing precedes it, so the pattern fails to match.
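
To see exactly which characters the dot picked up, you can print the matched text instead of the match object. The following is a small sketch that reuses the four test strings above; the variable names are made up for illustration.

```python
import re

tests = ['The cat is sleeping', 'That is heavy',
         'He is at the store', 'at the top of the hour']

matched = []
for text in tests:
    m = re.search('.at', text)
    # group() returns the text that matched; m is None when nothing matched.
    matched.append(m.group() if m else None)

# repr() makes the leading space in the third result visible.
print([repr(item) for item in matched])

# The dot does not match a newline, so a newline in front of the at
# does not satisfy the pattern.
newline_result = re.search('.at', '\nat')
print(newline_result)
```

The third result is the three-character string made up of a space followed by at, which shows that the space is the character the dot matched.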


Character Classes


The dot special character is great for matching a character position against any character, but what if
you want to limit which characters can match? For that, regular expressions provide the character class.


You can define a class of characters that would match a position in a text pattern. If one of the
