C:\cygwin\bin
C:\cygwin\cygdrive
C:\cygwin\dev
C:\cygwin\dev\mqueue
C:\cygwin\dev\shm
C:\cygwin\etc
...MANY more lines omitted...
By bytes...
[(0, 0, 'C:\\cygwin\\...\\python31\\Python-3.1.1\\Lib\\build_class.py'),
(0, 0, 'C:\\cygwin\\...\\python31\\Python-3.1.1\\Lib\\email\\mime\\__init__.py'),
(0, 0, 'C:\\cygwin\\...\\python31\\Python-3.1.1\\Lib\\email\\test\\__init__.py')]
[(380582, 78, 'C:\\Python31\\Lib\\pydoc_data\\topics.py'),
(398157, 83, 'C:\\...\\Install\\Source\\Python-2.6\\Lib\\pydoc_topics.py'),
(412434, 83, 'C:\\Python26\\Lib\\pydoc_topics.py')]
By lines...
[(0, 0, 'C:\\cygwin\\...\\python31\\Python-3.1.1\\Lib\\build_class.py'),
(0, 0, 'C:\\cygwin\\...\\python31\\Python-3.1.1\\Lib\\email\\mime\\__init__.py'),
(0, 0, 'C:\\cygwin\\...\\python31\\Python-3.1.1\\Lib\\email\\test\\__init__.py')]
[(204107, 5589, 'C:\\...\\Install\\Source\\Python-3.0\\Lib\\decimal.py'),
(205470, 5768, 'C:\\cygwin\\...\\python31\\Python-3.1.1\\Lib\\decimal.py'),
(211238, 5768, 'C:\\Python31\\Lib\\decimal.py')]
The script’s trace logic is preset to let you monitor its progress through directories.
I’ve shortened some directory names to protect the innocent here (and to fit on this page).
This command may take a long time to finish on your computer. On my sadly underpowered
Windows 7 netbook, it took 11 minutes to scan a solid state drive with some 59GB of data,
200K files, and 25K directories when the system was lightly loaded (8 minutes when not
tracing directory names, but half an hour when many other applications were running).
Nevertheless, of all our attempts, it provides the most exhaustive solution to the
original query.
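To get a feel for how much directory tracing costs on your own machine, a walk can be timed with and without it. The following is only a minimal sketch and not the script discussed above: the names `count_tree` and `timed` and the `trace` parameter are hypothetical, and the walker merely counts directories and files rather than measuring them.

```python
import os
import time

def count_tree(root, trace=False):
    """Walk root and return (directory count, file count).

    The trace flag is a hypothetical stand-in for a script's
    directory-tracing switch: when true, each directory is printed.
    """
    ndirs = nfiles = 0
    for dirpath, dirnames, filenames in os.walk(root):
        if trace:
            print(dirpath)            # progress display, one line per directory
        ndirs += 1                    # counts root itself plus every subdirectory
        nfiles += len(filenames)
    return ndirs, nfiles

def timed(root, trace):
    """Return ((ndirs, nfiles), elapsed seconds) for one walk of root."""
    start = time.perf_counter()
    counts = count_tree(root, trace)
    return counts, time.perf_counter() - start
```

Calling `timed(r'C:\temp', False)` and then `timed(r'C:\temp', True)` on a directory of your choice gives a rough comparison. Results vary with system load and file system caching, so a second walk of the same tree is often faster than the first regardless of tracing.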
This is also as complete a solution as we have space for in this book. For more fun,
consider that you may need to scan more than one drive, and some Python source files
may also appear in zip archives, whether on the module path or not (os.walk does not look
inside zip files in Example 6-3). They might also be named in other ways: with .pyw
extensions to suppress shell pop-ups on Windows, and with arbitrary extensions for some
top-level scripts. In fact, top-level scripts might have no filename extension at all, even
though they are Python source files. And while they’re generally not Python files, some
importable modules may also appear in frozen binaries or be statically linked into the
Python executable. In the interest of space, we’ll leave such higher-resolution (and
potentially intractable!) search extensions as suggested exercises.
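As a starting point for two of these exercises, the sketch below accepts both .py and .pyw names and also looks inside zip archives it meets during the walk. It is an illustration under stated assumptions, not part of the book's examples: the `python_sources` function, the `PYEXTS` tuple, and the `archive!member` labeling are all hypothetical choices, and it reports sizes in bytes only. It does not attempt to find extensionless scripts or frozen modules.

```python
import os
import zipfile

PYEXTS = ('.py', '.pyw')              # assumed set of source extensions

def python_sources(root):
    """Yield (size in bytes, label) for Python sources under root.

    Regular files are labeled by their path; zip members are labeled
    'archive!member' (a hypothetical convention for this sketch).
    """
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            lower = name.lower()
            if lower.endswith(PYEXTS):
                try:
                    yield os.path.getsize(path), path
                except OSError:
                    continue          # file vanished or is unreadable: skip it
            elif lower.endswith('.zip') and zipfile.is_zipfile(path):
                try:
                    with zipfile.ZipFile(path) as archive:
                        for info in archive.infolist():
                            if info.filename.lower().endswith(PYEXTS):
                                # file_size is the uncompressed size
                                yield info.file_size, path + '!' + info.filename
                except (zipfile.BadZipFile, OSError):
                    continue          # damaged or unreadable archive: skip it
```

Because the generator yields tuples that start with the size, `sorted(python_sources(root))[-3:]` would list the three largest sources found, much like the by-bytes report shown earlier.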
Printing Unicode Filenames
One fine point before we move on: notice the seemingly superfluous exception handling
in Example 6-4’s tryprint function. When I first tried to scan an entire drive as shown
in the preceding section, this script died on a Unicode encoding error while trying to