[Python编程(第4版)].(Programming.Python.4th.Edition).Mark.Lutz.文字版

Example 19-6. PP4E\Lang\cheader.py

"Scan C header files to extract parts of #define and #include lines"

import sys, re
pattDefine = re.compile(                               # compile to pattobj
    r'^#[\t ]*define[\t ]+(\w+)[\t ]*(.*)')            # "# define xxx yyy..."
                                                       # \w like [a-zA-Z0-9_]
pattInclude = re.compile(                              # raw strings: backslashes reach re intact
    r'^#[\t ]*include[\t ]+[<"]([\w./]+)')             # "# include <xxx>..."

def scan(fileobj):
    count = 0
    for line in fileobj:                               # scan by lines: iterator
        count += 1
        matchobj = pattDefine.match(line)              # None if match fails
        if matchobj:
            name = matchobj.group(1)                   # substrings for (...) parts
            body = matchobj.group(2)
            print(count, 'defined', name, '=', body.strip())
            continue
        matchobj = pattInclude.match(line)
        if matchobj:
            start, stop = matchobj.span(1)             # start/stop indexes of (...)
            filename = line[start:stop]                # slice out of line
            print(count, 'include', filename)          # same as matchobj.group(1)

if len(sys.argv) == 1:
    scan(sys.stdin)                                    # no args: read stdin
else:
    scan(open(sys.argv[1], 'r'))                       # arg: input filename
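
The matching logic can also be tried one line at a time, without a file on disk. The following is a minimal sketch rather than part of the example itself: the helper name classify and the sample lines are hypothetical, and the two patterns are spelled out again so that the snippet runs on its own. It suggests that group(1) and slicing the line with span(1) yield the same substring, and that optional whitespace after # is tolerated.

```python
import re

# Patterns in the spirit of Example 19-6, repeated so this sketch is self-contained
patt_define = re.compile(r'^#[\t ]*define[\t ]+(\w+)[\t ]*(.*)')
patt_include = re.compile(r'^#[\t ]*include[\t ]+[<"]([\w./]+)')

def classify(line):
    """Hypothetical helper: describe one header line as a tuple, or return None."""
    m = patt_define.match(line)
    if m:
        # group(1) is the macro name, group(2) is the rest of the line
        return ('define', m.group(1), m.group(2).strip())
    m = patt_include.match(line)
    if m:
        start, stop = m.span(1)                  # indexes of the (...) group
        assert line[start:stop] == m.group(1)    # slicing and group(1) agree
        return ('include', m.group(1))
    return None                                  # neither pattern matched

# Sample lines are assumptions chosen for illustration
samples = [
    '#define DEBUG\n',
    '#  define SPAM    1234\n',
    '#  include   "Python.h"\n',
    '#include <lib/spam.h>\n',
    '#ifndef TEST_H\n',
]
for line in samples:
    print(repr(line), '=>', classify(line))
```

Returning a tuple instead of printing keeps the helper easy to check interactively; the script above prints directly because it is meant to be run as a command-line tool.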

To test, let’s run cheader.py on the text file in Example 19-7.

Example 19-7. PP4E\Lang\test.h

#ifndef TEST_H
#define TEST_H

#include <stdio.h>
#include <lib/spam.h>
#  include   "Python.h"

#define DEBUG
#define HELLO 'hello regex world'
#  define SPAM    1234

#define EGGS sunny + side + up
#define ADDER(arg) 123 + arg
#endif

Notice the spaces after # in some of these lines; regular expressions are flexible enough
to account for such departures from the norm. Here is the script at work, picking out
#include and #define lines and their parts. For each matched line, it prints the line
number, the line type, and any matched substrings:
