Programming Python, 4th Edition, Mark Lutz (text edition)


in the middle ((.*)), and allows the final word to begin with an upper- or lowercase
letter ([Ww]); as you can see, patterns can handle wide variations in data:


>>> import re
>>> patt = '[ \t]*Hello[ \t]+(.*)[Ww]orld'
>>> line = ' Hello spamworld'
>>> mobj = re.match(patt, line)
>>> mobj.group(1)
'spam'
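
To see how much variation this one pattern tolerates, the following sketch runs it against a few inputs. The sample lines and the helper name `middle` are made up here for illustration and are not part of the original listing:

```python
import re

patt = '[ \t]*Hello[ \t]+(.*)[Ww]orld'

def middle(line):
    """Return the text captured by group 1, or None if the line does not match."""
    mobj = re.match(patt, line)
    return mobj.group(1) if mobj else None

# Leading whitespace is optional, and the final word may start with W or w;
# the last sample fails because the pattern requires a capital H in Hello
samples = ['Hello spamworld', '\t Hello \t eggs World', 'hello spamworld']
for line in samples:
    print(repr(line), '=>', repr(middle(line)))
```

Because `re.match` returns `None` when the pattern does not apply, the helper tests the match object before calling `group`.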

Notice that we matched a str pattern to a str string in the last listing. We can also
match bytes to bytes in order to handle data such as encoded text, but we cannot mix
the two string types. This constraint holds in Python in general: Python wouldn’t
have the encoding information needed to know how to convert between the raw bytes
and the Unicode text:


>>> patt = b'[ \t]*Hello[ \t]+(.*)[Ww]orld' # both as bytes works too
>>> line = b' Hello spamworld' # and returns bytes groups
>>> re.match(patt, line).group(1) # but cannot mix str/bytes
b'spam'

>>> re.match(patt, ' Hello spamworld')
TypeError: can't use a bytes pattern on a string-like object

>>> re.match('[ \t]*Hello[ \t]+(.*)[Ww]orld', line)
TypeError: can't use a string pattern on a bytes-like object
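
When the encoding of the data is known, one way around this restriction is to convert explicitly so that pattern and subject are the same type. The sketch below assumes the bytes hold UTF-8 encoded text, an assumption made only for illustration:

```python
import re

patt = '[ \t]*Hello[ \t]+(.*)[Ww]orld'    # str pattern
data = b' Hello spamworld'                # bytes data, assumed to be UTF-8

# Option 1: decode the bytes to str, then match str to str
text = data.decode('utf-8')
print(re.match(patt, text).group(1))

# Option 2: encode the pattern to bytes, then match bytes to bytes
bpatt = patt.encode('utf-8')
print(re.match(bpatt, data).group(1))
```

The first call returns a str group and the second a bytes group, mirroring the two listings above.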

In addition to the tools these examples demonstrate, there are methods for scanning
ahead to find a match (search), scanning to find all matches (findall), splitting and
replacing on patterns, and so on. All have analogous module and precompiled call
forms. The next section turns to a few examples to demonstrate more of the basics.
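
As a brief sketch of the calls just named, the following runs each of them on one made-up string (the sample text and the variable names are illustrative only), first in module form and then in precompiled form:

```python
import re

text = 'spam=1, eggs=22; ham=333'

# search scans ahead for the first match anywhere in the string
first = re.search('[0-9]+', text)
print(first.group(), first.start())

# findall collects every non-overlapping match
print(re.findall('[0-9]+', text))

# split and sub accept a pattern where string methods need a fixed substring
print(re.split('[,;] *', text))
print(re.sub('[0-9]+', '?', text))

# Precompiled form: compile once, then call methods on the pattern object
pobj = re.compile('[a-z]+=')
print(pobj.findall(text))
```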


String Operations Versus Patterns


Notice how the preceding example skips optional whitespace and allows for uppercase
or lowercase letters. This underscores why you may want to use patterns in the first
place—they support more general kinds of text than string object methods can. Here’s
another case in point: we’ve seen that string methods can split on and replace a
substring, but they don’t suffice if the delimiter may be any one of several alternatives:


>>> 'aaa--bbb--ccc'.split('--')
['aaa', 'bbb', 'ccc']
>>> 'aaa--bbb--ccc'.replace('--', '...') # string methods use fixed strings
'aaa...bbb...ccc'

>>> 'aaa--bbb==ccc'.split(['--', '=='])
TypeError: Can't convert 'list' object to str implicitly
>>> 'aaa--bbb==ccc'.replace(['--', '=='], '...')
TypeError: Can't convert 'list' object to str implicitly
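
With string methods alone, handling more than one delimiter means normalizing the text first. A minimal sketch of that workaround, using the same sample string:

```python
line = 'aaa--bbb==ccc'

# Rewrite every '==' as '--' so that a single fixed delimiter remains
parts = line.replace('==', '--').split('--')
print(parts)

# Replacing either delimiter takes one replace call per alternative
dotted = line.replace('--', '...').replace('==', '...')
print(dotted)
```

This is workable for a couple of known delimiters, but every additional alternative costs another call.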

Patterns can do similar work, but also can handle alternatives directly, by virtue of their
pattern matching syntax. In the following, the syntax --|== matches either string -- or
string ==; the syntax [-=] matches either the character - or = (a character set); and the


1418 | Chapter 19: Text and Language
