Programming Python, 4th Edition, Mark Lutz (text version)


print(matchobj.start())


# selection sets


print(re.search(" *A.C[DE][D-F][^G-ZE]G\t+ ?", "..ABCDEFG\t..").start())


# alternatives: R1|R2 means R1 or R2


print(re.search("(A|X)(B|Y)(C|Z)D", "..AYCD..").start()) # test each char
print(re.search("(?:A|X)(?:B|Y)(?:C|Z)D", "..AYCD..").start()) # same, not saved
print(re.search("A|XB|YC|ZD", "..AYCD..").start()) # matches just A!
print(re.search("(A|XB|YC|ZD)YCD", "..AYCD..").start()) # just first char


# word boundaries


print(re.search(r"\bABCD", "..ABCD ").start()) # \b means word boundary
print(re.search(r"ABCD\b", "..ABCD ").start()) # use r'...' to escape '\'


Notice again that there are different ways to kick off a match with re: by calling module
search functions and by making compiled pattern objects. In either event, you can hang
on to the resulting match object or not. All the print call statements in this script show
a result of 2, the offset where the pattern was found in the string. In the first test, for
example, the pattern A.C. matches the ABCD at offset 2 in the search string (i.e., after the first xx):


C:\...\PP4E\Lang> python re-basic.py
2
...8 more 2s omitted...
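
As a minimal sketch of the two calling styles just described, the following runs the same pattern both ways. The search string "xxABCDxx" is an assumption reconstructed from the description above (ABCD at offset 2, after the first xx), and the variable names are illustrative only:

```python
import re

text = "xxABCDxx"                        # assumed search string: ABCD at offset 2

# Style 1: module-level function, pattern string passed on each call
m1 = re.search("A.C.", text)

# Style 2: compile once, then call methods of the pattern object
pattobj = re.compile("A.C.")
m2 = pattobj.search(text)

# Either way the result is a match object, or None if nothing matched
offsets = [m.start() for m in (m1, m2) if m is not None]
print(offsets)                           # [2, 2]

# The match object need not be saved: chain the call instead
print(re.search("A.C.", text).start())   # 2
```

Chaining is convenient in a test script like this one, but it raises an AttributeError when the search fails and returns None, so code that cannot guarantee a match should probably test the result first.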

Next, in Example 19-4, parts of the pattern strings enclosed in parentheses delimit
groups; the parts of the string they matched are available after the match.


Example 19-4. PP4E\Lang\re-groups.py


"""
groups: extract substrings matched by REs in '()' parts
groups are denoted by position, but (?P<name>R) can also name them
"""


import re


patt = re.compile("A(.)B(.)C(.)") # saves 3 substrings
mobj = patt.match("A0B1C2") # each '()' is a group, 1..n
print(mobj.group(1), mobj.group(2), mobj.group(3)) # group() gives substring


patt = re.compile("A(.*)B(.*)C(.*)") # saves 3 substrings
mobj = patt.match("A000B111C222") # groups() gives all groups
print(mobj.groups())


print(re.search("(A|X)(B|Y)(C|Z)D", "..AYCD..").groups())
print(re.search("(?P<a>A|X)(?P<b>B|Y)(?P<c>C|Z)D", "..AYCD..").groupdict())


patt = re.compile(r"[\t ]*#\s*define\s*([a-z0-9_]*)\s*(.*)")
mobj = patt.search(" # define spam 1 + 2 + 3") # parts of C #define
print(mobj.groups()) # \s is whitespace


In the first test here, for instance, the three (.) groups each match a single character,
but they retain the character matched; calling group pulls out the character matched.
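
A short sketch of how matched groups can be fetched by position or by name follows. The group names first and second and the second test string are hypothetical, chosen only for illustration:

```python
import re

# Positional groups: each '()' is numbered from 1, left to right
mobj = re.match("A(.)B(.)C(.)", "A0B1C2")
first = mobj.group(1)            # '0': the character saved by group 1
all_groups = mobj.groups()       # ('0', '1', '2'): every group at once
whole = mobj.group(0)            # 'A0B1C2': group 0 is the entire match

# Named groups: (?P<name>R) saves the match under a name as well
# ('first' and 'second' are hypothetical names)
named = re.search("(?P<first>A|X)(?P<second>B|Y)D", "..AYD..")
by_name = named.groupdict()      # {'first': 'A', 'second': 'Y'}

print(first, all_groups, whole)
print(by_name, named.group("second"))
```

Named groups remain available by position too, so named.group(2) and named.group("second") fetch the same substring here.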


1426 | Chapter 19: Text and Language
