... Exits
... Media
... moreplus.py
Files unique to ..
... more.pyc
... spam.txt
... Tester
... __init__.pyc
The unique function is the heart of this script: it performs a simple list difference
operation. When applied to directories, unique items represent tree differences, and
common items are names of files or subdirectories that merit further comparisons or
traversals. In fact, in Python 2.4 and later, we could also use the built-in set object type
if we don’t care about the order in the results—because sets are not sequences, they
would not maintain any original and possibly platform-specific left-to-right order of
the directory listings provided by os.listdir. For that reason (and to avoid requiring
users to upgrade), we’ll keep using our own comprehension-based function instead
of sets.
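To make the ordering point concrete, here is a minimal sketch contrasting the two approaches. It assumes unique is a comprehension along the lines described above; the function body and the sample names are illustrative, not a copy of the dirdiff listing.

```python
def unique(seq1, seq2):
    """Items of seq1 that are not in seq2, kept in seq1's original order."""
    return [item for item in seq1 if item not in seq2]

# Hypothetical directory listings, in the order os.listdir might return them
names1 = ['spam.txt', 'more.pyc', 'Tester', 'README']
names2 = ['README', 'Tester', 'moreplus.py']

ordered = unique(names1, names2)         # list: listing order is preserved
unordered = set(names1) - set(names2)    # set: same members, no defined order

print(ordered)              # names appear in names1 order
print(sorted(unordered))    # sort explicitly if a stable order is needed
```

Both forms find the same members; only the list version keeps the left-to-right order of the original listing. The trade-off is speed: the membership test on a list is linear, so the comprehension is slower than the set difference for very large directories.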
Finding Tree Differences
We’ve just coded a directory comparison tool that picks out unique files and directories.
Now all we need is a tree walker that applies dirdiff at each level to report unique
items, explicitly compares the contents of files in common, and descends through
directories in common. Example 6-12 fits the bill.
Example 6-12. PP4E\System\Filetools\diffall.py
"""
################################################################################
Usage: "python diffall.py dir1 dir2".
Recursive directory tree comparison: report unique files that exist in only
dir1 or dir2, report files of the same name in dir1 and dir2 with differing
contents, report instances of same name but different type in dir1 and dir2,
and do the same for all subdirectories of the same names in and below dir1
and dir2. A summary of diffs appears at end of output, but search redirected
output for "DIFF" and "unique" strings for further details. New: (3E) limit
reads to 1M for large files, (3E) catch same name=file/dir, (4E) avoid extra
os.listdir() calls in dirdiff.comparedirs() by passing results here along.
################################################################################
"""
import os, dirdiff
blocksize = 1024 * 1024 # up to 1M per read
def intersect(seq1, seq2):
    """
    Return all items in both seq1 and seq2;
    a set(seq1) & set(seq2) would work too, but sets are randomly
    ordered, so any platform-dependent directory order would be lost
    """
    return [item for item in seq1 if item in seq2]