b'Spam\n'
>>> open('data.bin', 'wb').write('spam\n')
TypeError: must be bytes or buffer, not str
But notice that this file’s line ends with just \n, instead of the Windows \r\n that showed
up in the preceding example for the text file in binary mode. Strictly speaking, binary
mode does more than disable Unicode encoding translation: it also prevents the automatic
end-of-line character translation that text-mode files perform by default. Before we can
understand this fully, though, we need to study the two main ways in which text files
differ from binary files.
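Here is a minimal sketch of the end-of-line difference. It assumes a writable temporary directory and uses hypothetical file names. It also passes newline='\r\n' to open so the Windows-style translation happens on any platform. On Windows itself, the default newline=None already writes \r\n.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:      # throwaway directory
    textpath = os.path.join(tmpdir, 'text.txt')    # hypothetical names
    binpath = os.path.join(tmpdir, 'data.bin')

    # Text mode: newline='\r\n' mimics the Windows default, so each
    # '\n' written is translated to '\r\n' on its way to the file.
    with open(textpath, 'w', newline='\r\n') as f:
        f.write('spam\n')

    # Binary mode: bytes are written exactly as given, with no translation.
    with open(binpath, 'wb') as f:
        f.write(b'spam\n')

    # Reading both files in binary mode shows the raw bytes on disk.
    with open(textpath, 'rb') as f:
        text_raw = f.read()
    with open(binpath, 'rb') as f:
        bin_raw = f.read()

    # Text-mode reads (default newline=None) map '\r\n' back to '\n'.
    with open(textpath, 'r') as f:
        text_back = f.read()

print(text_raw)          # b'spam\r\n'
print(bin_raw)           # b'spam\n'
print(repr(text_back))   # 'spam\n'
```

Only the text-mode file gains a \r on disk, and only text mode hides it again on input.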
Unicode encodings for text files
As mentioned earlier, text-mode file objects always translate data according to a default
or explicitly provided Unicode encoding type when the data is transferred to and from the
external file. Their content is stored encoded in files but decoded in memory. Binary-mode
files don’t perform any such translation, which is what we want for truly binary data.
For instance, consider the following string, which embeds a Unicode character whose
binary value is outside the normal 7-bit range of the ASCII encoding standard:
>>> data = 'sp\xe4m'
>>> data
'späm'
>>> 0xe4, bin(0xe4), chr(0xe4)
(228, '0b11100100', 'ä')
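The following sketch shows a text-mode file doing this translation for the string just introduced. It writes data through a text-mode file with an explicit encoding, then reads the same file back in binary mode and in text mode. The temporary file name is hypothetical, and utf8 is chosen only for illustration. The final read uses latin1 on purpose to show what a mismatched decoding produces.

```python
import os
import tempfile

data = 'sp\xe4m'                      # 'späm', as in the session above

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'unidata.txt')    # hypothetical name

    # Text mode with an explicit encoding: the str is encoded on write.
    with open(path, 'w', encoding='utf8') as f:
        f.write(data)

    # Binary mode: no decoding, so we see the encoded bytes in the file.
    with open(path, 'rb') as f:
        raw = f.read()

    # Text mode with the same encoding decodes back to the original str.
    with open(path, 'r', encoding='utf8') as f:
        same = f.read()

    # Text mode with the wrong encoding does not fail here, but it
    # decodes each of the 5 bytes as a separate character.
    with open(path, 'r', encoding='latin1') as f:
        wrong = f.read()

print(raw)            # b'sp\xc3\xa4m'
print(same == data)   # True
print(ascii(wrong))   # 'sp\xc3\xa4m'
```

Because the file holds only bytes, reading it correctly depends on knowing which encoding was used to write it.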
It’s possible to manually encode this string according to a variety of Unicode encoding
types—its raw binary byte string form is different under some encodings:
>>> data.encode('latin1') # 8-bit characters: ascii + extras
b'sp\xe4m'
>>> data.encode('utf8') # 2 bytes for special characters only
b'sp\xc3\xa4m'
>>> data.encode('ascii') # does not encode per ascii
UnicodeEncodeError: 'ascii' codec can't encode character '\xe4' in position 2:
ordinal not in range(128)
Python displays printable characters in these strings normally, but nonprintable bytes
show as \xNN hexadecimal escapes, which become more prevalent under more sophisticated
encoding schemes (cp500 in the following is an EBCDIC encoding):
>>> data.encode('utf16') # 2 bytes per character plus preamble
b'\xff\xfes\x00p\x00\xe4\x00m\x00'
>>> data.encode('cp500') # an ebcdic encoding: very different
b'\xa2\x97C\x94'
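The sketch below checks that each encoding used above is reversible and compares how many bytes each one produces. It encodes and then decodes the same string under each codec. In the utf16 result, 2 of the 10 bytes are the byte-order-mark preamble.

```python
data = 'sp\xe4m'

# Encode under each codec used above, then decode with the same codec.
sizes = {}
for codec in ('latin1', 'utf8', 'utf16', 'cp500'):
    raw = data.encode(codec)             # str -> bytes
    assert raw.decode(codec) == data     # bytes -> str restores the original
    sizes[codec] = len(raw)

print(sizes)   # {'latin1': 4, 'utf8': 5, 'utf16': 10, 'cp500': 4}

# ASCII cannot represent '\xe4', so encoding fails before any bytes exist.
try:
    data.encode('ascii')
except UnicodeEncodeError as exc:
    print('ascii failed at position', exc.start)   # ascii failed at position 2
```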
The encoded results here reflect the string’s raw binary form when stored in files. Manual
encoding is usually unnecessary, though, because text files handle encodings
automatically on data transfers: reads decode and writes encode, according