pprint.pprint(allsizes[:3])
pprint.pprint(allsizes[-3:])
When run, this script marches down the module import path and, for each valid
directory it contains, attempts to search the entire tree rooted there. In fact, it nests
loops three deep: for items on the path, directories in each item’s tree, and files in each
directory. Because the module path may contain directories named in arbitrary ways,
along the way this script must take care to:
- Normalize directory paths: fixing up slashes and dots to map directories to a
  common form.
- Normalize directory name case: converting to lowercase on case-insensitive
  Windows, so that same names match by string equality, but leaving case unchanged
  on Unix, where it matters.
- Detect repeats to avoid visiting the same directory twice (the same directory might
  be reached from more than one entry on sys.path).
- Skip any file-like item in the tree for which os.path.getsize fails (by default
  os.walk itself silently ignores things it cannot treat as directories, both at the top
  of and within the tree).
- Avoid potential Unicode decoding errors in file content by opening files in binary
  mode in order to count their lines. Text mode requires decodable content, and
  some files in Python 3.1’s library tree cannot be decoded properly on Windows.
  Catching Unicode exceptions with a try statement would avoid program exits, too,
  but might skip candidate files.
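The precautions in this list can be sketched roughly as follows. This is a minimal illustration rather than the script's actual code: the function names normalized, count_lines, and scan_path are hypothetical, and pruning a repeated directory's subtree is one possible design choice among several.

```python
import os

def normalized(dirpath):
    # normpath fixes up slashes and dots; normcase lowercases names on
    # Windows only and leaves case unchanged on Unix
    return os.path.normcase(os.path.normpath(dirpath))

def count_lines(filepath):
    # Binary mode reads raw bytes, so undecodable content cannot raise
    # a Unicode error while counting lines
    with open(filepath, 'rb') as f:
        return sum(1 for _ in f)

def scan_path(path_entries, ext='.py'):
    # Hypothetical helper: collect (size, lines, path) for files under
    # every directory tree named in path_entries
    visited = set()
    results = []
    for entry in path_entries:
        # os.walk silently yields nothing for nondirectories
        for dirpath, dirnames, filenames in os.walk(entry):
            key = normalized(dirpath)
            if key in visited:
                dirnames[:] = []        # repeat: do not descend again
                continue
            visited.add(key)
            for name in filenames:
                if not name.endswith(ext):
                    continue
                fullname = os.path.join(dirpath, name)
                try:
                    size = os.path.getsize(fullname)
                    lines = count_lines(fullname)
                except OSError:
                    continue            # skip items that cannot be sized or read
                results.append((size, lines, fullname))
    return results
```

Under these assumptions, a call such as scan_path(sys.path) would visit each distinct directory once, even when the same directory is reachable from more than one path entry or is spelled in different ways.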
This version also adds line counts. Counting lines might add significant run time to
this script, but it’s a useful metric to report. In fact, this version also uses the line
count as a sort key to report the three largest and smallest files by line count, which
may differ from results based upon raw file size. Here’s the script in action in Python 3.1
on my Windows 7 machine; since these results depend on platform, installed extensions,
and path settings, your sys.path and largest and smallest files may vary:
C:\...\PP4E\System\Filetools> bigpy-path.py
By size...
[(0, 0, 'C:\\Python31\\lib\\build_class.py'),
(0, 0, 'C:\\Python31\\lib\\email\\mime\\__init__.py'),
(0, 0, 'C:\\Python31\\lib\\email\\test\\__init__.py')]
[(161613, 3754, 'C:\\Python31\\lib\\tkinter\\__init__.py'),
(211238, 5768, 'C:\\Python31\\lib\\decimal.py'),
(380582, 78, 'C:\\Python31\\lib\\pydoc_data\\topics.py')]
By lines...
[(0, 0, 'C:\\Python31\\lib\\build_class.py'),
(0, 0, 'C:\\Python31\\lib\\email\\mime\\__init__.py'),
(0, 0, 'C:\\Python31\\lib\\email\\test\\__init__.py')]
[(147086, 4132, 'C:\\Python31\\lib\\turtle.py'),
(150069, 4268, 'C:\\Python31\\lib\\test\\test_descr.py'),
(211238, 5768, 'C:\\Python31\\lib\\decimal.py')]
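To see why the two reports differ, here is a small sketch that sorts a few of the (size, lines, path) tuples from the listing above in both ways. The key function shown is an assumption about how such a sort might be written, not necessarily the script's own code.

```python
# A handful of (size, lines, path) records copied from the output above
allsizes = [
    (380582, 78,   r'C:\Python31\lib\pydoc_data\topics.py'),
    (211238, 5768, r'C:\Python31\lib\decimal.py'),
    (161613, 3754, r'C:\Python31\lib\tkinter\__init__.py'),
    (150069, 4268, r'C:\Python31\lib\test\test_descr.py'),
    (147086, 4132, r'C:\Python31\lib\turtle.py'),
]

# Tuples compare item by item, so a plain sort orders by size first
by_size = sorted(allsizes)

# An explicit key sorts by the second item, the line count
by_lines = sorted(allsizes, key=lambda rec: rec[1])

print(by_size[-1][2])     # largest by bytes
print(by_lines[-1][2])    # largest by lines
```

In this sample, topics.py is the largest file by bytes but the smallest by lines, because its 380,582 bytes are spread over only 78 lines; decimal.py is the largest by line count.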