Luckily, in Python it’s almost as easy to process a directory tree as it is to inspect a single
directory. We can either write a recursive routine to traverse the tree, or use a tree-
walker utility built into the os module. Such tools can be used to search, copy, compare,
and otherwise process arbitrary directory trees on any platform that Python runs on
(and that’s just about everywhere).
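As a rough illustration of the first option, here is a minimal sketch of a hand-coded recursive
traversal built on os.listdir. The function name walk_recursive is hypothetical, not something
from the os module, and skipping linked directories is an assumption made here to avoid cycles:

```python
import os
import tempfile

def walk_recursive(root):
    """Return paths of all files at and below root (hypothetical helper)."""
    found = []
    for name in sorted(os.listdir(root)):        # sorted for a stable order
        path = os.path.join(root, name)
        if os.path.isdir(path):
            if not os.path.islink(path):         # assumption: skip linked dirs to avoid cycles
                found.extend(walk_recursive(path))   # descend into the subdirectory
        else:
            found.append(path)                   # anything that is not a directory
    return found

if __name__ == '__main__':
    # build a tiny throwaway tree so the demo never touches real files
    with tempfile.TemporaryDirectory() as top:
        os.mkdir(os.path.join(top, 'parts'))
        for rel in ('spam.txt', os.path.join('parts', 'part0001')):
            open(os.path.join(top, rel), 'w').close()
        for path in walk_recursive(top):
            print(path)
```

The recursion mirrors the shape of the tree itself: each call handles one directory and hands
its subdirectories to further calls.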
The os.walk visitor
To make it easy to apply an operation to all files in a complete directory tree, Python
comes with a utility that scans trees for us and lets us run our own code at every directory
along the way: the os.walk function is called with a directory root name and automatically
walks the entire tree at root and below.
Operationally, os.walk is a generator function: at each directory in the tree, it yields a
three-item tuple containing the name of the current directory, a list of the subdirectories
in it, and a list of the files in it, in that order. Because it’s a generator,
its walk is usually run by a for loop (or other iteration tool); on each iteration, the
walker advances to the next directory in the tree, and the loop runs its code for that level
of the tree (for instance, opening and searching all the files at that level).
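To make the generator behavior concrete, the following sketch steps the walker by hand with
next before handing it to a for loop. The helper name summarize_tree and the temporary test
tree are assumptions made for illustration only:

```python
import os
import tempfile

def summarize_tree(root):
    """Map each directory at and below root to (subdir count, file count).

    Hypothetical helper, shown only to illustrate the tuple os.walk yields.
    """
    summary = {}
    for (dirname, subshere, fileshere) in os.walk(root):
        summary[dirname] = (len(subshere), len(fileshere))
    return summary

if __name__ == '__main__':
    # build a tiny throwaway tree so the demo never touches real files
    with tempfile.TemporaryDirectory() as top:
        os.mkdir(os.path.join(top, 'parts'))
        open(os.path.join(top, 'spam.txt'), 'w').close()

        walker = os.walk(top)          # no directory has been read yet
        first = next(walker)           # tuple for the root directory itself
        print(first[0])                # the directory's name
        print(first[1], first[2])      # subdirectory names, then file names
        print(summarize_tree(top))     # a fresh walk, run by a for loop
```

Calling next directly is rarely needed in practice; it is used here only to show that each
step of the walk produces one tuple for one directory.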
That description might sound complex the first time you hear it, but os.walk is fairly
straightforward once you get the hang of it. In the following, for example, the loop
body’s code is run for each directory in the tree rooted at the current working directory
(.). Along the way, the loop simply prints the directory name and all the files at the
current level after prepending the directory name. It’s simpler in Python than in English
(I removed the PP3E subdirectory for this test to keep the output short):
>>> import os
>>> for (dirname, subshere, fileshere) in os.walk('.'):
...     print('[' + dirname + ']')
...     for fname in fileshere:
...         print(os.path.join(dirname, fname))    # handle one file
...
[.]
.\random.bin
.\spam.txt
.\temp.bin
.\temp.txt
[.\parts]
.\parts\part0001
.\parts\part0002
.\parts\part0003
.\parts\part0004
In other words, we’ve coded our own custom, easily changed recursive directory
listing tool in Python. Because this may be something we would like to tweak and reuse
elsewhere, let’s make it permanently available in a module file, as shown in Example 4-4,
now that we’ve worked out the details interactively.