[Python编程(第4版)].(Programming.Python.4th.Edition).Mark.Lutz.文字版

(yzsuai) #1
C:\temp> python C:\...\PP4E\System\Filetools\join.py pysplit mypy31.msi
Joining C:\temp\pysplit to make C:\temp\mypy31.msi
Join complete: see C:\temp\mypy31.msi

C:\temp> dir *.msi
...more...
02/21/2010 11:21 AM 13,814,272 mypy31.msi
06/27/2009 04:53 PM 13,814,272 python-3.1.msi
2 File(s) 27,628,544 bytes
0 Dir(s) 188,798,611,456 bytes free

C:\temp> fc /b mypy31.msi python-3.1.msi
Comparing files mypy31.msi and PYTHON-3.1.MSI
FC: no differences encountered

The join script simply uses os.listdir to collect all the part files in a directory created
by split, and sorts the filename list to put the parts back together in the correct order.
We get back an exact byte-for-byte copy of the original file (proved by the DOS fc
command in the code; use cmp on Unix).


Some of this process is still manual, of course (I never did figure out how to script the
“walk the floppies to your bedroom” step), but the split and join scripts make it both
quick and simple to move big files around. Because this script is also portable Python
code, it runs on any platform to which we cared to move split files. For instance, my
home computers ran both Windows and Linux at the time; since this script runs on
either platform, the gamers were covered. Before we move on, here are a couple of
implementation details worth underscoring in the join script’s code:


Reading by blocks or files
First of all, notice that this script deals with files in binary mode but also reads each
part file in blocks of 1 KB each. In fact, the readsize setting here (the size of each
block read from an input part file) has no relation to chunksize in split.py (the total
size of each output part file). As we learned in Chapter 4, this script could instead
read each part file all at once: output.write(open(filepath, 'rb').read()). The
downside to this scheme is that it really does load all of a file into memory at once.
For example, reading a 1.4 MB part file into memory all at once with the file object
read method generates a 1.4 MB string in memory to hold the file’s bytes. Since
split allows users to specify even larger chunk sizes, the join script plans for the
worst and reads in terms of limited-size blocks. To be completely robust, the
split script could read its input data in smaller chunks too, but this hasn’t become
a concern in practice (recall that as your program runs, Python automatically re-
claims strings that are no longer referenced, so this isn’t as wasteful as it might
seem).


Sorting filenames
If you study this script’s code closely, you may also notice that the join scheme it
uses relies completely on the sort order of filenames in the parts directory. Because
it simply calls the list sort method on the filenames list returned by os.listdir, it
implicitly requires that filenames have the same length and format when created


288 | Chapter 6: Complete System Programs

Free download pdf