[Python编程(第4版)].(Programming.Python.4th.Edition).Mark.Lutz.文字版

print(matchobj.start())

# selection sets

print(re.search(" *A.C[DE][D-F][^G-ZE]G\t+ ?", "..ABCDEFG\t..").start())

# alternatives: R1|R2 means R1 or R2

print(re.search("(A|X)(B|Y)(C|Z)D", "..AYCD..").start()) # test each char
print(re.search("(?:A|X)(?:B|Y)(?:C|Z)D", "..AYCD..").start()) # same, not saved
print(re.search("A|XB|YC|ZD", "..AYCD..").start()) # matches just A!
print(re.search("(A|XB|YC|ZD)YCD", "..AYCD..").start()) # just first char

# word boundaries

print(re.search(r"\bABCD", "..ABCD ").start()) # \b means word boundary
print(re.search(r"ABCD\b", "..ABCD ").start()) # use r'...' to escape '\'

Notice again that there are different ways to kick off a match with re: by calling module
search functions and by making compiled pattern objects. In either event, you can hang
on to the resulting match object or not. All the print call statements in this script show
a result of 2, the offset where the pattern was found in the string. In the first test, for
example, A.C. matches the ABCD at offset 2 in the search string (i.e., after the first xx):

C:\...\PP4E\Lang> python re-basic.py
2
...8 more 2s omitted...
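
As a minimal sketch of the two styles just described, the following compares a module-level search with a precompiled pattern object, both with and without holding on to the match object. The search string "xxABCDxx" is an assumption inferred from the description above (ABCD found after the first xx), and the variable names are illustrative only.

```python
import re

pattern = "A.C."
text = "xxABCDxx"        # assumed search string: ABCD begins at offset 2

# Style 1: module-level function, match object used once and discarded
offset_direct = re.search(pattern, text).start()

# Style 2: module-level function, match object kept for later queries
matchobj = re.search(pattern, text)
offset_saved = matchobj.start()

# Style 3: precompiled pattern object, useful when a pattern is reused
pattobj = re.compile(pattern)
offset_compiled = pattobj.search(text).start()

print(offset_direct, offset_saved, offset_compiled)
```

All three forms report the same offset; precompiling mainly pays off when the same pattern is applied many times.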

Next, in Example 19-4, parts of the pattern strings enclosed in parentheses delimit
groups; the parts of the string they matched are available after the match.

Example 19-4. PP4E\Lang\re-groups.py

"""
groups: extract substrings matched by REs in '()' parts
groups are denoted by position, but (?P<name>R) can also name them
"""

import re

patt = re.compile("A(.)B(.)C(.)") # saves 3 substrings
mobj = patt.match("A0B1C2") # each '()' is a group, 1..n
print(mobj.group(1), mobj.group(2), mobj.group(3)) # group() gives substring

patt = re.compile("A(.*)B(.*)C(.*)") # saves 3 substrings
mobj = patt.match("A000B111C222") # groups() gives all groups
print(mobj.groups())

print(re.search("(A|X)(B|Y)(C|Z)D", "..AYCD..").groups())
print(re.search("(?P<a>A|X)(?P<b>B|Y)(?P<c>C|Z)D", "..AYCD..").groupdict())

patt = re.compile(r"[\t ]*#\s*define\s*([a-z0-9_]*)\s*(.*)")
mobj = patt.search(" # define spam 1 + 2 + 3") # parts of C #define
print(mobj.groups()) # \s is whitespace

In the first test here, for instance, the three (.) groups each match a single character,
and each retains the character it matched; calling group pulls out that character.
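
To make the group-fetching calls concrete, here is a small sketch that pulls substrings out of a match object by position and by name. The first pattern and string come from the example above; the second pattern, its string, and its group names (first, second) are hypothetical and chosen only for illustration.

```python
import re

# Positional groups: each '()' is numbered 1..n, left to right
mobj = re.match("A(.)B(.)C(.)", "A0B1C2")
whole = mobj.group(0)         # group 0 is the entire matched text
first = mobj.group(1)         # the character matched by the first '()'
all_groups = mobj.groups()    # tuple of all numbered groups

# Named groups: (?P<name>R) labels a group; it still has a number too
named = re.search("(?P<first>A|X)(?P<second>B|Y)D", "..AYD..")
by_name = named.group("second")     # fetch by name
by_number = named.group(2)          # same group, fetched by position
as_dict = named.groupdict()         # name -> matched substring
where = named.span("second")        # (start, end) offsets of that group

print(whole, first, all_groups)
print(by_name, by_number, as_dict, where)
```

Named groups tend to be easier to maintain when a pattern grows, since adding a new parenthesized part does not renumber the names.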
