[Python编程(第4版)].(Programming.Python.4th.Edition).Mark.Lutz.文字版

locate and extract bracketed text anywhere in a string, even pairs with optional text
between:

>>> '<spam>/<ham>/<eggs>'.find('ham')              # find substring offset
8
>>> re.findall('<(.*?)>', '<spam>/<ham>/<eggs>')   # find all matches/groups
['spam', 'ham', 'eggs']
>>> re.findall('<(.*?)>', '<spam> / <ham><eggs>')
['spam', 'ham', 'eggs']

>>> re.findall('<(.*?)>/?<(.*?)>', '<spam>/<ham> ... <eggs><cheese>')
[('spam', 'ham'), ('eggs', 'cheese')]
>>> re.search('<(.*?)>/?<(.*?)>', 'todays menu: <spam>/<ham>...<eggs><s>').groups()
('spam', 'ham')
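The patterns above depend on the nongreedy .*? form to stop at the first closing bracket. As a minimal sketch of why that matters, the following compares it with a greedy .* and with a negated character set; the variable names are illustrative only and are not taken from the book.

```python
import re

text = '<spam>/<ham>/<eggs>'

# Greedy: .* runs on to the last '>' in the string, so only one match is found
greedy = re.findall('<(.*)>', text)

# Nongreedy: .*? stops at the first '>' that follows each '<'
nongreedy = re.findall('<(.*?)>', text)

# A negated character set gets the same result without relying on laziness
negated = re.findall('<([^>]*)>', text)

print(greedy)       # ['spam>/<ham>/<eggs']
print(nongreedy)    # ['spam', 'ham', 'eggs']
print(negated)      # ['spam', 'ham', 'eggs']
```

The negated-set form is a common alternative when the closing delimiter is a single character, since it states directly which characters may appear inside the brackets.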

Especially when using findall, the (?s) operator comes in handy to force . to match
end-of-line characters in multiline text; without it, . matches everything except line
ends. The following searches look for two adjacent bracketed strings with arbitrary text
between, with and without skipping line breaks:

>>> re.findall('<(.*?)>.*<(.*?)>', '<spam> \n <ham>\n<eggs>')          # stop at \n
[]
>>> re.findall('(?s)<(.*?)>.*<(.*?)>', '<spam> \n <ham>\n<eggs>')      # greedy
[('spam', 'eggs')]
>>> re.findall('(?s)<(.*?)>.*?<(.*?)>', '<spam> \n <ham>\n<eggs>')     # nongreedy
[('spam', 'ham')]
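The inline (?s) operator has an equivalent in the re module's flags argument. The sketch below assumes the same test string as above and shows that re.DOTALL (and its short alias re.S) yields the same matches as the inline form; the variable names are illustrative.

```python
import re

text = '<spam> \n <ham>\n<eggs>'
pattern = '<(.*?)>.*?<(.*?)>'

# Without the flag, . does not cross the \n characters, so nothing matches
default = re.findall(pattern, text)

# Inline form, as used in the text above
inline = re.findall('(?s)' + pattern, text)

# The same setting passed as a flags argument
by_flag = re.findall(pattern, text, re.DOTALL)

# re.S is a short alias for re.DOTALL; flags also work with compiled patterns
compiled = re.compile(pattern, re.S).findall(text)

print(default)      # []
print(inline)       # [('spam', 'ham')]
print(by_flag)      # [('spam', 'ham')]
print(compiled)     # [('spam', 'ham')]
```

Passing the flag as an argument keeps the pattern string itself unchanged, which can be convenient when the same pattern is run both with and without line-spanning matches.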

To make larger patterns more mnemonic, we can even associate names with matched
substring groups by using the (?P<name>...) pattern syntax and fetch them by name after
matches, though this is of limited utility for findall. The next tests look for strings of
“word” characters (\w) separated by a /. This isn’t much more than a string split, but
parts are named, and search and findall both scan ahead:

>>> re.search('(?P<part1>\w*)/(?P<part2>\w*)', '...aaa/bbb/ccc]').groups()
('aaa', 'bbb')
>>> re.search('(?P<part1>\w*)/(?P<part2>\w*)', '...aaa/bbb/ccc]').groupdict()
{'part1': 'aaa', 'part2': 'bbb'}

>>> re.search('(?P<part1>\w*)/(?P<part2>\w*)', '...aaa/bbb/ccc]').group(2)
'bbb'
>>> re.search('(?P<part1>\w*)/(?P<part2>\w*)', '...aaa/bbb/ccc]').group('part2')
'bbb'

>>> re.findall('(?P<part1>\w*)/(?P<part2>\w*)', '...aaa/bbb ccc/ddd]')
[('aaa', 'bbb'), ('ccc', 'ddd')]
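Because findall returns plain tuples, the group names are lost in its result. One way to keep them for every match is re.finditer, which yields match objects instead. This is a sketch rather than the book's own code; it uses a raw string for the pattern, an assumption made here because newer Python releases warn about unrecognized escapes such as \w in ordinary string literals.

```python
import re

# Raw string, so the backslash in \w reaches the re module unchanged
pattern = re.compile(r'(?P<part1>\w*)/(?P<part2>\w*)')
text = '...aaa/bbb ccc/ddd]'

# finditer yields one match object per match, so names stay available
pairs = [m.groupdict() for m in pattern.finditer(text)]
print(pairs)
# [{'part1': 'aaa', 'part2': 'bbb'}, {'part1': 'ccc', 'part2': 'ddd'}]

# Named groups can also be fetched one at a time from a single match
first = pattern.search(text)
print(first.group('part1'))     # aaa
print(first.span('part2'))      # (7, 10)
```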

Finally, although basic string operations such as slicing and splits are sometimes
enough, patterns are much more flexible. The following uses [^ ] to match any character
other than those listed after the ^ (here, a space), and escapes a dash within a []
alternative set using \- so it’s not taken to be a character set range separator. It runs
equivalent slices, splits, and matches, along with a more general match that the other
two cannot approach:

>>> line = 'aaa bbb ccc'
>>> line[:3], line[4:7], line[8:11]            # slice data at fixed offsets
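The character-set details described above can be sketched in isolation as follows. The messy string and all variable names are hypothetical, chosen here only to illustrate negated sets and the escaped dash.

```python
import re

line = 'aaa bbb ccc'

# Three routes to the same fields when the layout is fixed and simple
by_slice = (line[:3], line[4:7], line[8:11])
by_split = tuple(line.split(' '))
by_match = re.match(r'([^ ]*) ([^ ]*) ([^ ]*)', line).groups()
print(by_slice == by_split == by_match)     # True

# Unescaped, a dash between two characters in [] denotes a range
print(re.findall(r'[a-c]', 'abc-'))         # ['a', 'b', 'c']

# Escaped, the dash is just another member of the set
print(re.findall(r'[a\-c]', 'abc-'))        # ['a', 'c', '-']

# Hypothetical input with mixed separators: collect runs of characters
# that are not a space, dot, dash, or slash
messy = 'aaa...bbb-ccc / ddd'
fields = re.findall(r'[^ .\-/]+', messy)
print(fields)                               # ['aaa', 'bbb', 'ccc', 'ddd']
```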

1420 | Chapter 19: Text and Language
