Skipping .\__init__.pyc
7 => .\Preview\attachgui.py
8 => .\Preview\bob.pkl
Skipping .\Preview\bob.pkl
...more lines omitted: pauses for Enter key press at matches...
Found in 2 files, visited 184
The script lists each file it checks as it goes, tells you which files it is skipping (names
whose extensions are not listed in the variable textexts and so are assumed to hold binary data), and
pauses for an Enter key press each time it announces a file containing the search string.
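The behavior just described can be sketched in a few lines. This is a hypothetical reconstruction for illustration, not the text of Example 6-17. The textexts contents, the searcher signature, and the pause flag (added so the sketch can run unattended) are all assumptions. It also returns its two counters instead of keeping them in module globals.

```python
import os

# Hypothetical extension list: names ending in anything else are
# assumed to hold binary data and are skipped.
textexts = ['.py', '.pyw', '.txt', '.c', '.h']

def searcher(startdir, searchkey, pause=True):
    """Walk startdir, announce each file, and return (matching files, files visited)."""
    fcount = vcount = 0
    for thisdir, subdirs, files in os.walk(startdir):
        for fname in files:
            fpath = os.path.join(thisdir, fname)
            vcount += 1
            print(vcount, '=>', fpath)                 # every file visited
            if os.path.splitext(fname)[1].lower() not in textexts:
                print('Skipping', fpath)               # presumed binary
                continue
            with open(fpath) as f:                     # platform default encoding
                if searchkey in f.read():              # plain substring test
                    fcount += 1
                    print('%s has %s' % (fpath, searchkey))
                    if pause:
                        input('Press Enter to continue')
    return fcount, vcount
```

Calling `searcher('.', 'mimetypes')` would print one `N => path` line per file visited and a `Skipping` line for each name outside textexts. It would also wait for Enter at each match.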
The search_all script works the same way when it is imported rather than run, but
there is no final statistics output line (fcount and vcount are global variables in the
module, so they must be fetched from the imported module to be inspected here):
C:\...\PP4E\dev\Examples\PP4E> python
>>> from Tools import search_all
>>> search_all.searcher(r'C:\temp\PP3E\Examples', 'mimetypes')
...more lines omitted: 8 pauses for Enter key press along the way...
>>> search_all.fcount, search_all.vcount # files with matches, files visited
(8, 1429)
However launched, this script tracks down all references to a string in an entire directory
tree: a name of a changed book examples file, object, or directory, for instance. It’s
exactly what I was looking for—or at least I thought so, until further deliberation drove
me to seek more complete and better structured solutions, the topic of the next section.
Be sure to also see the coverage of regular expressions in Chapter 19.
The search_all script here searches for a simple string in each file with
the in string membership expression, but it would be trivial to extend
it to search for a regular expression pattern match instead (roughly, just
replace in with a call to a regular expression object’s search method).
Of course, such a mutation will be much more trivial after we’ve learned
how.
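As a rough sketch of that regular expression variant, the following code compiles the pattern once with `re.compile`. It then calls the compiled object's `search` method where the plain version used `in`. The function name regex_searcher and the exts default are hypothetical. The interactive output and pausing are omitted so the one-line change stands out.

```python
import os
import re

def regex_searcher(startdir, pattern, exts=('.py', '.txt')):
    """Return paths of files under startdir whose text matches pattern."""
    regex = re.compile(pattern)               # compile once, reuse for every file
    matches = []
    for thisdir, subdirs, files in os.walk(startdir):
        for fname in files:
            if os.path.splitext(fname)[1].lower() not in exts:
                continue                      # assume other extensions are binary
            fpath = os.path.join(thisdir, fname)
            with open(fpath) as f:
                if regex.search(f.read()):    # was: searchkey in f.read()
                    matches.append(fpath)
    return matches
```

For example, `regex_searcher('.', r'import\s+mimetypes')` would also find imports written with extra spaces, which a fixed search string would miss.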
Also notice the textexts list in Example 6-17, which attempts to list all
the file types to be treated as searchable text (anything else is assumed to be
binary and skipped): it would be more general and robust to use
the mimetypes logic we will meet near the end of this chapter in order to
guess file content type from its name, but the explicit textexts list provides more
control and sufficed for the trees I used this script against.
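For comparison, a name-based test built on the standard library's `mimetypes.guess_type` might look like the following. It is a minimal sketch that treats any guessed type under `text/` as searchable text. It treats names with a compression encoding (such as `.gz`) as not text. The function name looks_like_text and its unknown_is_text parameter are hypothetical. Because `guess_type` also consults platform MIME tables, results for less common extensions can vary between machines. This is one reason an explicit list offers more control.

```python
import mimetypes

def looks_like_text(filename, unknown_is_text=False):
    """Guess from the file's name alone whether it holds plain text."""
    mimetype, encoding = mimetypes.guess_type(filename)
    if encoding is not None:          # e.g. 'notes.txt.gz' -> ('text/plain', 'gzip')
        return False                  # compressed data is not searchable as text
    if mimetype is None:              # extension unknown to mimetypes
        return unknown_is_text
    return mimetype.startswith('text/')
```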
Finally note that for simplicity many of the directory searches in this
chapter assume that text is encoded per the underlying platform’s Unicode
default. They could open text in binary mode to avoid decoding
errors, but searches might then be inaccurate because of encoding
scheme differences in the raw encoded bytes. To see how to do better,
watch for the “grep” utility in Chapter 11’s PyEdit GUI, which will apply
an encoding name to all the files in a searched tree and ignore those text
or binary files that fail to decode.
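Here is a sketch of that better approach. It rests on stated assumptions: every file is decoded with one explicitly named encoding (UTF-8 by default here), and any file that raises `UnicodeDecodeError` is set aside as binary or differently encoded. The function search_tree is hypothetical and is not the PyEdit grep code itself.

```python
import os

def search_tree(root, searchkey, encoding='utf-8'):
    """Search all files under root using one explicit encoding.
    Returns (matching paths, undecodable paths, count of files visited)."""
    matches, skipped, visited = [], [], 0
    for thisdir, subdirs, files in os.walk(root):
        for fname in files:
            fpath = os.path.join(thisdir, fname)
            visited += 1
            try:
                with open(fpath, encoding=encoding) as f:
                    text = f.read()
            except UnicodeDecodeError:
                skipped.append(fpath)     # binary, or text in another encoding
                continue
            if searchkey in text:
                matches.append(fpath)
    return matches, skipped, visited
```

Note that an encoding such as latin-1 can decode any byte sequence, so nothing would ever be skipped. The skip list is meaningful only for encodings that reject invalid bytes, such as UTF-8.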