By comparison, find.find with just “*” for its name pattern is also roughly equivalent
to platform-specific directory tree listing shell commands such as dir /B /S on DOS
and Windows. Since all files match “*”, this just exhaustively generates all the file names
in a tree with a single traversal. Because we can usually run such shell commands in a
Python script with os.popen, the following do the same work, but the first is inherently
nonportable and must start up a separate program along the way:
>>> import os
>>> for line in os.popen('dir /B /S'): print(line, end='')
>>> from PP4E.Tools.find import find
>>> for name in find(pattern='*', startdir='.'): print(name)
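The find function imported here is this book's own utility, and its source is not repeated on this page. As a rough sketch only, a generator with the same call signature could be assembled from os.walk and fnmatch as follows; the name find_sketch is hypothetical, and the decision to match directory names as well as file names is an assumption:

```python
import fnmatch
import os

def find_sketch(pattern, startdir=os.curdir):
    """Yield paths under startdir whose base name matches pattern.

    Hypothetical stand-in for the book's find.find; both subdirectory
    names and file names are tested against the pattern (an assumption).
    """
    for dirpath, dirnames, filenames in os.walk(startdir):
        for name in dirnames + filenames:
            if fnmatch.fnmatch(name, pattern):
                yield os.path.join(dirpath, name)
```

Calling find_sketch('*', '.') would list every name in the tree in one traversal, while find_sketch('*.py', '.') would narrow the result to names ending in .py.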
Watch for this utility to show up in action later in this chapter and book, including an
arguably strong showing in the next section and a cameo appearance in the Grep dialog
of Chapter 11’s PyEdit text editor GUI, where it will serve a central role in a threaded
external files search tool. The standard library’s find module may be gone, but it need
not be forgotten.
In fact, you must pass a bytes pattern string for a bytes filename to
fnmatch (or pass both as str), because the re pattern matching module
it uses does not allow the string types of subject and pattern to be mixed.
This rule is inherited by our find.find for directory and pattern. See
Chapter 19 for more on re.
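To see the rule at work, the following short sketch tries all four type combinations. It calls fnmatch.fnmatchcase, the case-sensitive matcher that fnmatch.fnmatch relies on after normalizing case, so results should not vary by platform; the helper name types_agree is made up for this illustration:

```python
import fnmatch

def types_agree(name, pattern):
    """Return True if fnmatch accepts this name/pattern type pairing."""
    try:
        fnmatch.fnmatchcase(name, pattern)
        return True
    except TypeError:
        # re refuses to apply a bytes pattern to str, and vice versa
        return False

print(types_agree('spam.py', '*.py'))      # str with str
print(types_agree(b'spam.py', b'*.py'))    # bytes with bytes
print(types_agree(b'spam.py', '*.py'))     # mixed: rejected
print(types_agree('spam.py', b'*.py'))     # mixed: rejected
```

The first two calls report True and the last two report False, since the mixed pairings raise TypeError inside re.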
Curiously, the fnmatch module in Python 3.1 also converts a bytes pattern
string to and from Unicode str in order to perform internal text
processing, using the Latin-1 encoding. This suffices for many contexts,
but may not be entirely sound for some encodings which do not map to
Latin-1 cleanly. sys.getfilesystemencoding might be a better encoding
choice in such contexts, as this reflects the underlying file system’s constraints
(as we learned in Chapter 4, sys.getdefaultencoding reflects file
content, not names).
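As one hedged illustration of how the byte-level view can diverge from the text-level view, consider a filename containing a non-ASCII character. The sketch below assumes the name is stored as UTF-8 bytes, which is common but not universal, and uses fnmatch.fnmatchcase to sidestep platform case rules:

```python
import fnmatch
import sys

# The two encodings discussed above; values vary by platform and Python.
print(sys.getfilesystemencoding())
print(sys.getdefaultencoding())

text_name = '\xe4.txt'                    # one non-ASCII character, then .txt
byte_name = text_name.encode('utf-8')     # assumption: name stored as UTF-8

# As str, '?' matches the single character before the dot.
str_result = fnmatch.fnmatchcase(text_name, '?.txt')

# As bytes, that character is two bytes, so a single '?' cannot cover it.
bytes_result = fnmatch.fnmatchcase(byte_name, b'?.txt')

print(str_result, bytes_result)           # True False
```

Whether this difference matters depends on the patterns and filenames a program expects to meet; for plain ASCII names the two modes agree.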
In the absence of bytes, os.walk assumes filenames follow the platform’s
convention and does not ignore decoding errors triggered by os.listdir.
In the “grep” utility of Chapter 11’s PyEdit, this picture is further
clouded by the fact that a str pattern string from a GUI would have to
be encoded to bytes using a potentially inappropriate encoding for some
files present. See fnmatch.py and os.py in Python’s library and the Python
library manual for more details. Unicode can be a very subtle affair.
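The str versus bytes choice for os.walk can be observed directly. This sketch builds a throwaway directory, walks it once with a str path and once with a bytes path, and collects the names each walk reports. It uses os.fsencode to build the bytes path, which assumes Python 3.2 or later; in 3.1, encoding the path by hand with sys.getfilesystemencoding would be the closest equivalent:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Create a single empty file to find.
    open(os.path.join(root, 'spam.txt'), 'w').close()

    # A str start directory yields decoded str names.
    str_names = [name for _, _, files in os.walk(root) for name in files]

    # A bytes start directory yields raw bytes names, with no decoding step.
    bytes_names = [name for _, _, files in os.walk(os.fsencode(root))
                   for name in files]

print(str_names)      # ['spam.txt']
print(bytes_names)    # [b'spam.txt']
```

Passing bytes is thus one way to avoid filename decoding altogether, at the cost of having to supply bytes patterns to any later fnmatch calls.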
Cleaning Up Bytecode Files
The find module of the prior section isn’t quite the general string searcher we’re after,
but it’s an important first step—it collects files that we can then search in an automated
script. In fact, the act of collecting matching files in a tree is enough by itself to support
a wide variety of day-to-day system tasks.