Programming Python, 4th Edition, by Mark Lutz (text edition)


translations listed previously could very well corrupt data as it is input or output—a
random \r in data might be dropped on input, or added for a \n in the data on output.
The net effect is that your binary data would be trashed when read and written—
probably not quite what you want for your audio files and images!


This issue has become almost secondary in Python 3.X, because we generally cannot
use binary data with text-mode files anyhow: text-mode files automatically apply
Unicode encodings to content, so transfers will generally fail when the data cannot
be decoded on input or encoded on output. Using binary mode avoids Unicode errors
and automatically disables line-end translations as well (Unicode errors can also be
caught in try statements). Still, the fact that binary mode prevents end-of-line
translations to protect file content is best noted as a separate feature, especially if
you work in an ASCII-only world where Unicode encoding issues are irrelevant.
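As a minimal sketch of the Unicode failure just described, the following script reads the same bytes in text mode and in binary mode. The file name, the sample bytes, and the explicit encoding='utf-8' argument are assumptions chosen for illustration so that the outcome does not depend on the platform's default encoding:

```python
import os
import tempfile

# Sample bytes that are not valid UTF-8 (0xff can never start a UTF-8 sequence)
raw = b'\xff\xfe\x00\x01 spam\r\n'

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'temp.bin')      # hypothetical scratch file

    with open(path, 'wb') as f:               # binary mode: bytes stored as is
        f.write(raw)

    try:
        # Text mode must decode the bytes; UTF-8 is forced here for repeatability
        with open(path, 'r', encoding='utf-8') as f:
            text = f.read()
    except UnicodeDecodeError as exc:         # the error is catchable in a try
        text = None
        print('text mode failed:', exc.reason)

    with open(path, 'rb') as f:               # binary mode: no decoding at all
        same = f.read()

print(same == raw)                            # binary round trip is exact
```

The text-mode read fails before any line-end translation is even attempted, while the binary-mode read returns the bytes unchanged.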


Here’s the end-of-line translation at work in Python 3.1 on Windows—text mode
translates to and from the platform-specific line-end sequence so our scripts are
portable:


>>> open('temp.txt', 'w').write('shrubbery\n') # text output mode: \n -> \r\n
10
>>> open('temp.txt', 'rb').read() # binary input: actual file bytes
b'shrubbery\r\n'
>>> open('temp.txt', 'r').read() # text input mode: \r\n -> \n
'shrubbery\n'
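The session above depends on running under Windows. As a rough sketch, the same translation can be reproduced on any platform with the newline argument of the built-in open; the scratch file location is an assumption made for illustration:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'temp.txt')     # hypothetical scratch file

    # newline='\r\n' maps each \n to \r\n on output, whatever the platform
    with open(path, 'w', newline='\r\n') as f:
        f.write('shrubbery\n')

    with open(path, 'rb') as f:              # binary input: actual file bytes
        stored = f.read()

    with open(path, 'r') as f:               # default text input: \r\n -> \n
        text = f.read()

    with open(path, 'r', newline='') as f:   # newline='': no input translation
        untranslated = f.read()

print(stored)                # b'shrubbery\r\n'
print(repr(text))            # 'shrubbery\n'
print(repr(untranslated))    # 'shrubbery\r\n'
```

Passing newline='' on input is also a way to see the stored line ends from text mode without switching to bytes.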

By contrast, writing data in binary mode prevents all translations as expected, even if
the data happens to contain bytes that are part of line-ends in text mode (byte strings
print their characters as ASCII if printable, else as hexadecimal escapes):


>>> data = b'a\0b\rc\r\nd' # 4 escape code bytes, 4 normal
>>> len(data)
8
>>> open('temp.bin', 'wb').write(data) # write binary data to file as is
8
>>> open('temp.bin', 'rb').read() # read as binary: no translation
b'a\x00b\rc\r\nd'

But reading binary data in text mode, whether accidental or not, can corrupt the data
when transferred because of line-end translations (assuming it passes as decodable at
all; ASCII bytes like these do on this Windows platform):


>>> open('temp.bin', 'r').read() # text mode read: botches \r!
'a\x00b\nc\nd'
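To make the damage concrete, here is a small sketch that writes the same eight bytes in binary mode, reads them back in text mode, and compares the result with the original. The scratch file and the explicit encoding='latin-1' argument are assumptions for illustration; Latin-1 is used only because it decodes every byte value, which isolates the line-end effect:

```python
import os
import tempfile

data = b'a\0b\rc\r\nd'                       # same bytes as in the session above

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'temp.bin')     # hypothetical scratch file

    with open(path, 'wb') as f:              # stored exactly as given
        f.write(data)

    # Text-mode read: a lone \r and the \r\n pair are both mapped to \n
    with open(path, 'r', encoding='latin-1') as f:
        text = f.read()

roundtrip = text.encode('latin-1')           # back to bytes for comparison

print(repr(text))                            # 'a\x00b\nc\nd'
print(len(data), len(roundtrip))             # 8 7
print(roundtrip == data)                     # False: the original \r bytes are gone
```

One byte is lost and another is changed, and nothing in the text-mode result records where the original \r bytes were.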

Similarly, writing binary data in text mode can have the same effect: line-end bytes
may be changed or inserted (again, assuming the data is encodable per the platform’s
default):


>>> open('temp.bin', 'w').write(data) # must pass str for text mode
TypeError: must be str, not bytes # use bytes.decode() for to-str
>>> data.decode()
'a\x00b\rc\r\nd'

150 | Chapter 4: File and Directory Tools
