Programming Python, 4th Edition, Mark Lutz (text edition)


comparetrees(dir1, dir2, diffs, True)     # changes diffs in-place
print('=' * 40)                           # walk, report diffs list
if not diffs:
    print('No diffs found.')
else:
    print('Diffs found:', len(diffs))
    for diff in diffs: print('-', diff)


At each directory in the tree, this script simply runs the dirdiff tool to detect unique
names, and then compares names in common by intersecting directory lists. It uses
recursive function calls to traverse the tree and visits subdirectories only after
comparing all the files at each level so that the output is more coherent to read (the
trace output for subdirectories appears after that for files; it is not intermixed).
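
The files-first, subdirectories-later ordering can be illustrated with a minimal sketch. This is not the script's own comparetrees function; the name walk_common and its visit callback are hypothetical, and names that are a file on one side and a directory on the other are simply skipped here.

```python
import os

def walk_common(dir1, dir2, visit):
    """Call visit(path1, path2) for files common to both trees,
    handling all files at a level before descending into subdirectories."""
    names1 = os.listdir(dir1)
    names2 = os.listdir(dir2)
    common = sorted(set(names1) & set(names2))    # names present in both

    subdirs = []
    for name in common:
        path1 = os.path.join(dir1, name)
        path2 = os.path.join(dir2, name)
        if os.path.isfile(path1) and os.path.isfile(path2):
            visit(path1, path2)                   # files at this level first
        elif os.path.isdir(path1) and os.path.isdir(path2):
            subdirs.append((path1, path2))        # defer the recursion

    for path1, path2 in subdirs:                  # then recur into subdirectories
        walk_common(path1, path2, visit)
```

Because the recursion is deferred until the loop over files has finished, anything visit prints for one directory level stays together in the output.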


Notice the misses list, added in the third edition of this book; it’s very unlikely, but not
impossible, that the same name might be a file in one directory and a subdirectory in
the other. Also notice the blocksize variable; much like the tree copy script we saw
earlier, instead of blindly reading entire files into memory all at once, we limit each read
to grab up to 1 MB at a time, just in case any files in the directories are too big to be
loaded into available memory. Without this limit, I ran into MemoryError exceptions on
some machines with a prior version of this script that read both files all at once, like this:


bytes1 = open(path1, 'rb').read()
bytes2 = open(path2, 'rb').read()
if bytes1 == bytes2: ...

This code was simpler, but is less practical for very large files that can’t fit into your
available memory space (consider CD and DVD image files, for example). In the new
version’s loop, the file reads return whatever is left when less than 1 MB is present or
remaining, and return empty byte strings (b'') at end-of-file, because the files are
opened in binary mode. Files match if all blocks read are the same and both files reach
end-of-file at the same time.
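
A block-by-block comparison of this kind might be sketched as follows. This is an illustration under assumptions rather than the script's actual loop: the function name same_content and its blocksize default are hypothetical.

```python
def same_content(path1, path2, blocksize=1024 * 1024):
    """Return True if two files hold identical bytes, reading in blocks."""
    with open(path1, 'rb') as file1, open(path2, 'rb') as file2:
        while True:
            bytes1 = file1.read(blocksize)    # up to blocksize bytes, b'' at EOF
            bytes2 = file2.read(blocksize)
            if bytes1 != bytes2:
                return False                  # blocks differ, or one file ended first
            if not bytes1:
                return True                   # both reads returned b'': same EOF
```

Only two blocks are held in memory at any moment, so the size of the files being compared no longer matters.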


We’re also dealing in binary files and byte strings again to suppress Unicode decoding
and end-line translations for file content, because trees may contain arbitrary binary
and text files. The usual note about changing this to pass byte strings to os.listdir on
platforms where filenames may generate Unicode decoding errors applies here as well
(e.g. pass dir1.encode()). On some platforms, you may also want to detect and skip
certain kinds of special files in order to be fully general, but these were not in my trees,
so they are not in my script.
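
Both ideas can be sketched briefly. The helper names list_names_as_bytes and kind_of are hypothetical and are not part of the script; the sketch assumes that anything which is neither a regular file nor a directory counts as special and would be skipped by the caller.

```python
import os
import stat

def list_names_as_bytes(dirpath):
    """List a directory using a bytes path, so names come back undecoded."""
    return os.listdir(dirpath.encode())       # bytes in, list of bytes out

def kind_of(path):
    """Classify a path as 'file', 'dir', or 'special' without following links."""
    mode = os.lstat(path).st_mode
    if stat.S_ISREG(mode):
        return 'file'
    if stat.S_ISDIR(mode):
        return 'dir'
    return 'special'                          # symlink, FIFO, socket, device, ...
```

Names returned as bytes must be joined with bytes directory paths as well, since os.path.join does not mix str and bytes arguments.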


One minor change for the fourth edition of this book: os.listdir results are now
gathered just once per subdirectory and passed along, to avoid extra calls in dirdiff.
This is not a huge win, but every cycle counts on the pitifully underpowered netbook I
used when writing this edition.
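
The idea of listing each directory once and reusing the result can be sketched like this. The functions report_unique and compare_level are hypothetical stand-ins, not the book's dirdiff or comparetrees code.

```python
import os

def report_unique(names1, names2):
    """Return the names found only in the first list and only in the second."""
    set1, set2 = set(names1), set(names2)
    return sorted(set1 - set2), sorted(set2 - set1)

def compare_level(dir1, dir2):
    """Compare one directory level using a single listing per directory."""
    names1 = os.listdir(dir1)                        # one listing per directory...
    names2 = os.listdir(dir2)
    only1, only2 = report_unique(names1, names2)     # ...reused for unique names
    common = sorted(set(names1) & set(names2))       # ...and for the intersection
    return only1, only2, common
```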


Comparing Directory Trees | 313