print(matchobj.start())
# selection sets
print(re.search(" *A.C[DE][D-F][^G-ZE]G\t+ ?", "..ABCDEFG\t..").start())
# alternatives: R1|R2 means R1 or R2
print(re.search("(A|X)(B|Y)(C|Z)D", "..AYCD..").start()) # test each char
print(re.search("(?:A|X)(?:B|Y)(?:C|Z)D", "..AYCD..").start()) # same, not saved
print(re.search("A|XB|YC|ZD", "..AYCD..").start()) # matches just A!
print(re.search("(A|XB|YC|ZD)YCD", "..AYCD..").start()) # just first char
# word boundaries
print(re.search(r"\bABCD", "..ABCD ").start()) # \b means word boundary
print(re.search(r"ABCD\b", "..ABCD ").start()) # use r'...' to escape '\'
Notice again that there are different ways to kick off a match with re: by calling module
search functions and by making compiled pattern objects. In either event, you can hang
on to the resulting match object or not. All the print call statements in this script show
a result of 2, the offset where the pattern was found in the string. In the first test, for
example, A.C. matches the ABCD at offset 2 in the search string (i.e., after the first xx):
C:\...\PP4E\Lang> python re-basic.py
2
...8 more 2s omitted...
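As a rough recap of the points above, the following sketch runs the same pattern both ways (module function and compiled pattern object) and then looks at what ungrouped alternation and word boundaries actually match. The variable names here (text, m1 through m4) are made up for illustration and are not part of the book's script:

```python
import re

text = "..AYCD.."

# Style 1: module-level function, pattern passed as a string
m1 = re.search("(A|X)(B|Y)(C|Z)D", text)

# Style 2: precompiled pattern object, reusable across many searches
pattobj = re.compile("(A|X)(B|Y)(C|Z)D")
m2 = pattobj.search(text)

print(m1.start(), m2.start())      # same offset either way
print(m1.group(0), m2.group(0))    # group 0 is the full matched text

# Without parentheses, | splits the whole pattern: A, or XB, or YC, or ZD
m3 = re.search("A|XB|YC|ZD", text)
print(m3.start(), m3.group(0))     # only the lone A is matched

# \b requires a raw string; in a normal string "\b" is a backspace
m4 = re.search(r"\bABCD\b", "..ABCD ")
print(m4.span())                   # (start, end) offsets of the match
```

Keeping the match object around, rather than calling start() on it immediately, is what makes the later group and span calls possible.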
Next, in Example 19-4, parts of the pattern strings enclosed in parentheses delimit
groups; the parts of the string they matched are available after the match.
Example 19-4. PP4E\Lang\re-groups.py
"""
groups: extract substrings matched by REs in '()' parts
groups are denoted by position, but (?P<name>R) can also name them
"""
import re
patt = re.compile("A(.)B(.)C(.)") # saves 3 substrings
mobj = patt.match("A0B1C2") # each '()' is a group, 1..n
print(mobj.group(1), mobj.group(2), mobj.group(3)) # group() gives substring
patt = re.compile("A(.)B(.)C(.*)") # saves 3 substrings
mobj = patt.match("A000B111C222") # groups() gives all groups
print(mobj.groups())
print(re.search("(A|X)(B|Y)(C|Z)D", "..AYCD..").groups())
print(re.search("(?P<a>A|X)(?P<b>B|Y)(?P<c>C|Z)D", "..AYCD..").groupdict())
patt = re.compile(r"[\t ]*#\s*define\s*([a-z0-9_]*)\s*(.*)")
mobj = patt.search(" # define spam 1 + 2 + 3") # parts of C #define
print(mobj.groups()) # \s is whitespace
In the first test here, for instance, the three (.) groups each match a single character
and retain it; calling group with a group number then pulls out the character matched.
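To make the group mechanics concrete, here is a small sketch that fetches groups by number, all at once, and by name. The group names used (first, second, third, name, value) are invented for this illustration, and the #define pattern is a named-group variation on the idea, not the book's own code:

```python
import re

# Positional groups are numbered 1..n, left to right by opening parenthesis
mobj = re.match("A(.)B(.)C(.)", "A0B1C2")
print(mobj.group(1), mobj.group(2), mobj.group(3))   # one group at a time
print(mobj.groups())                                 # all groups, as a tuple
print(mobj.group(0))                                 # group 0: entire match

# (?P<name>R) labels a group; it still keeps its positional number
named = re.search("(?P<first>A|X)(?P<second>B|Y)(?P<third>C|Z)D", "..AYCD..")
print(named.group("second"), named.group(2))         # same substring twice
print(named.groupdict())                             # name -> substring

# Named groups can make a longer pattern easier to read afterward
define = re.compile(r"[\t ]*#\s*define\s*(?P<name>[a-z0-9_]*)\s*(?P<value>.*)")
found = define.search(" # define spam 1 + 2 + 3")
print(found.group("name"), "=>", found.group("value"))
```

Numbers are usually enough for short patterns; names tend to pay off once a pattern has several groups or is likely to be edited later, since adding a group shifts the numbers but not the names.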