Example 19-6. PP4E\Lang\cheader.py
"Scan C header files to extract parts of #define and #include lines"
import sys, re
pattDefine = re.compile(                          # compile to pattobj
    r'^#[\t ]*define[\t ]+(\w+)[\t ]*(.*)')       # "# define xxx yyy..."
                                                  # \w like [a-zA-Z0-9_]
pattInclude = re.compile(
    r'^#[\t ]*include[\t ]+[<"]([\w./]+)')        # "# include <xxx>..."
def scan(fileobj):
    count = 0
    for line in fileobj:                          # scan by lines: iterator
        count += 1
        matchobj = pattDefine.match(line)         # None if match fails
        if matchobj:
            name = matchobj.group(1)              # substrings for (...) parts
            body = matchobj.group(2)
            print(count, 'defined', name, '=', body.strip())
            continue
        matchobj = pattInclude.match(line)
        if matchobj:
            start, stop = matchobj.span(1)        # start/stop indexes of (...)
            filename = line[start:stop]           # slice out of line
            print(count, 'include', filename)     # same as matchobj.group(1)

if len(sys.argv) == 1:
    scan(sys.stdin)                               # no args: read stdin
else:
    scan(open(sys.argv[1], 'r'))                  # arg: input filename
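The two patterns can also be exercised directly on a few sample lines, without going through a file. The following is a minimal sketch rather than part of the example package: the patterns are rebuilt here as raw strings in the spirit of Example 19-6 (the capture group in the include pattern is an assumption about its intended form), and the classify helper and the sample lines are hypothetical names and data chosen for illustration.

```python
import re

# Assumed patterns, modeled on Example 19-6 and written as raw strings
patt_define = re.compile(r'^#[\t ]*define[\t ]+(\w+)[\t ]*(.*)')
patt_include = re.compile(r'^#[\t ]*include[\t ]+[<"]([\w./]+)')

def classify(line):
    """Hypothetical helper: describe one header line as a tuple, or None."""
    matchobj = patt_define.match(line)
    if matchobj:
        # group(1) is the macro name, group(2) the rest of the line
        return ('defined', matchobj.group(1), matchobj.group(2).strip())
    matchobj = patt_include.match(line)
    if matchobj:
        start, stop = matchobj.span(1)         # indexes of the (...) group
        filename = line[start:stop]            # slicing the line by span...
        assert filename == matchobj.group(1)   # ...yields the same text as group(1)
        return ('include', filename)
    return None                                # neither pattern matched

samples = [
    '#define DEBUG\n',
    '#  define SPAM    1234\n',
    '#define ADDER(arg) 123 + arg\n',
    '#  include   "Python.h"\n',
    '#include <lib/spam.h>\n',
    '#endif\n',
]
results = [classify(line) for line in samples]
for line, result in zip(samples, results):
    print(repr(line), '=>', result)
```

Because \w does not match an open parenthesis, a function-like macro such as ADDER(arg) is split into the name ADDER and a body that begins with (arg). Lines that match neither pattern, such as #endif, are simply skipped.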
To test, let’s run this script on the text file in Example 19-7.
Example 19-7. PP4E\Lang\test.h
#ifndef TEST_H
#define TEST_H
#include <stdio.h>
#include <lib/spam.h>
#  include "Python.h"
#define DEBUG
#define HELLO 'hello regex world'
#  define SPAM 1234
#define EGGS sunny + side + up
#define ADDER(arg) 123 + arg
#endif
Notice the spaces after # in some of these lines; regular expressions are flexible enough
to account for such departures from the norm. Here is the script at work, picking out
#include and #define lines and their parts. For each matched line, it prints the line
number, the line type, and any matched substrings: