Programming Python, 4th Edition, by Mark Lutz (text edition)

to the encoding name passed in (or to a default for the underlying platform; open
takes this default from locale.getpreferredencoding, not sys.getdefaultencoding).
Continuing our interactive session:


>>> open('data.txt', 'w', encoding='latin1').write(data)
4
>>> open('data.txt', 'r', encoding='latin1').read()
'späm'
>>> open('data.txt', 'rb').read()
b'sp\xe4m'

If we open in binary mode, though, no encoding translation occurs: the last command
in the preceding example shows us what’s actually stored in the file. To see how file
content differs for other encodings, let’s save the same string again:


>>> open('data.txt', 'w', encoding='utf8').write(data) # encode data per utf8
4
>>> open('data.txt', 'r', encoding='utf8').read() # decode: undo encoding
'späm'
>>> open('data.txt', 'rb').read() # no data translations
b'sp\xc3\xa4m'

This time, raw file content is different, but text mode’s auto-decoding makes the string
the same by the time it’s read back by our script. Really, encodings pertain only to
strings while they are in files; once they are loaded into memory, strings are simply
sequences of Unicode characters (“code points”). This translation step is what we want
for text files, but not for binary. Because binary modes skip the translation, you’ll want
to use them for truly binary data. In fact, you usually must: trying to write unencodable
data or to read undecodable data in text mode raises an error:


>>> open('data.txt', 'w', encoding='ascii').write(data)
UnicodeEncodeError: 'ascii' codec can't encode character '\xe4' in position 2:
ordinal not in range(128)

>>> open(r'C:\Python31\python.exe', 'r').read()
UnicodeDecodeError: 'charmap' codec can't decode byte 0x90 in position 2:
character maps to <undefined>
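The two failures above can also be reproduced in a script. The following is a minimal sketch, not part of the original session: the helper names write_text and read_text are hypothetical, the four bytes in raw are only a stand-in for the start of an executable file, and cp1252 is assumed as the platform default that the 'charmap' error message suggests.

```python
import os
import tempfile

data = 'sp\xe4m'        # 'späm': contains one non-ASCII character
raw = b'MZ\x90\x00'     # hypothetical stand-in for the first bytes of an executable

def write_text(path, text, encoding):
    """Write in text mode; return the encoding error, or None if the write worked."""
    try:
        with open(path, 'w', encoding=encoding) as f:
            f.write(text)
    except UnicodeEncodeError as exc:
        return exc
    return None

def read_text(path, encoding):
    """Read in text mode; return the decoded str, or the decoding error."""
    try:
        with open(path, 'r', encoding=encoding) as f:
            return f.read()
    except UnicodeDecodeError as exc:
        return exc

with tempfile.TemporaryDirectory() as tmp:
    text_path = os.path.join(tmp, 'data.txt')
    bin_path = os.path.join(tmp, 'data.bin')

    # 'ä' has no ASCII code, so the ascii write fails; latin1 can encode it
    encode_error = write_text(text_path, data, 'ascii')
    latin1_error = write_text(text_path, data, 'latin1')

    # Binary mode writes and reads the bytes untouched
    with open(bin_path, 'wb') as f:
        f.write(raw)
    decode_result = read_text(bin_path, 'cp1252')   # byte 0x90 is undefined in cp1252
    with open(bin_path, 'rb') as f:
        binary_result = f.read()

print(type(encode_error).__name__)
print(type(decode_result).__name__)
print(binary_result == raw)
```

Returning the exception instead of letting it propagate is only a convenience for the demonstration; in a real script you would normally pick the right mode up front rather than catch these errors.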

Binary mode is also a last resort for reading text files that cannot be decoded per the
underlying platform’s default when their encoding type is unknown. The following
recreates the original strings if the encoding type is known, but fails if it is not known
unless binary mode is used (the failure may occur either on inputting the data or on
printing it, but it fails nevertheless):


>>> open('data.txt', 'w', encoding='cp500').writelines(['spam\n', 'ham\n'])
>>> open('data.txt', 'r', encoding='cp500').readlines()
['spam\n', 'ham\n']

>>> open('data.txt', 'r').readlines()
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 2:
character maps to <undefined>

>>> open('data.txt', 'rb').readlines()
[b'\xa2\x97\x81\x94\r%\x88\x81\x94\r%']
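As a rough sketch of the last-resort approach, a script can read the raw bytes in binary mode and decode them by hand once the encoding is learned. The temporary file path and the variable names here are assumptions for illustration, and newline='\n' is passed on the write so that the stored bytes are the same on every platform (the session above was run on Windows, which is why it shows \r% line ends).

```python
import os
import tempfile

lines = ['spam\n', 'ham\n']

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'data.txt')

    # Store the text as cp500 (an EBCDIC encoding); newline='\n' disables
    # platform-specific line-end translation on the write
    with open(path, 'w', encoding='cp500', newline='\n') as f:
        f.writelines(lines)

    # Binary mode never decodes, so this read cannot fail on odd bytes
    with open(path, 'rb') as f:
        raw = f.read()

# A wrong guess at the encoding fails here, on the bytes already in memory
try:
    raw.decode('utf8')
    wrong_guess_failed = False
except UnicodeDecodeError:
    wrong_guess_failed = True

# Once the real encoding is known, the bytes decode to the original text
text = raw.decode('cp500')

print(raw)
print(wrong_guess_failed)
print(repr(text))
```

Keep in mind that a decode that succeeds does not prove the guess was right: some encodings, such as latin1, accept every byte value and will quietly produce the wrong characters.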

148 | Chapter 4: File and Directory Tools
