Programming Python, 4th Edition, by Mark Lutz (text edition)


Binary and Text Files


All of the preceding examples process simple text files, but Python scripts can also open
and process files containing binary data: JPEG images, audio clips, packed binary data
produced by FORTRAN and C programs, encoded text, and anything else that can be
stored in files as bytes. The primary difference in terms of your code is the mode
argument passed to the built-in open function:


>>> file = open('data.bin', 'wb') # open binary output file (creates or empties it)
>>> file = open('data.txt', 'rb') # open binary input file
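Here is a minimal sketch of handling the packed binary data mentioned above. It uses the standard library's struct module, which this section does not otherwise cover, to write one record in 'wb' mode and read it back in 'rb' mode. Several details are hypothetical choices made only for illustration: the record layout '>i5sd' (a big-endian int, 5 raw bytes, and a double), the helper names write_record and read_record, and the file name packed.bin.

```python
import os
import struct
import tempfile

# Hypothetical record layout: big-endian 4-byte int, 5 raw bytes, 8-byte double
RECORD = struct.Struct('>i5sd')

def write_record(path, number, label, value):
    """Pack one record into raw bytes and write it in binary mode ('wb')."""
    with open(path, 'wb') as f:
        f.write(RECORD.pack(number, label, value))

def read_record(path):
    """Read the raw bytes back in binary mode ('rb') and unpack them."""
    with open(path, 'rb') as f:
        raw = f.read()                 # a bytes object, exactly as stored
    return RECORD.unpack(raw)          # tuple of Python objects

if __name__ == '__main__':
    path = os.path.join(tempfile.gettempdir(), 'packed.bin')  # hypothetical file
    write_record(path, 7, b'spam!', 3.5)
    print(read_record(path))
```

The '>' prefix disables alignment padding, so each record occupies exactly 4 + 5 + 8 = 17 bytes. To match the output of a particular C or FORTRAN program, the layout would need to use that program's own field sizes and byte order.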

Once you’ve opened binary files in this way, you may read and write their contents
using the same methods just illustrated: read, write, and so on. The readline and
readlines methods, as well as the file’s line iterator, still work for text files opened
in binary mode (lines are split at b'\n' bytes), but they don’t make sense for truly binary
data that isn’t line oriented (end-of-line bytes are meaningless there, if they appear at all).
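For truly binary data, a common alternative to line iteration is to read fixed-size blocks until read returns an empty bytes object at end-of-file. The sketch below uses two hypothetical names, a helper called read_chunks and a file called blocks.bin. The block size of 4 is chosen only to keep the demo small.

```python
import os
import tempfile

def read_chunks(path, size=4):
    """Yield successive fixed-size blocks of bytes from a file opened in 'rb' mode.

    Splitting on newline bytes is meaningless for most binary data,
    so this reads blocks of `size` bytes instead.
    """
    with open(path, 'rb') as f:
        # iter() with a sentinel keeps calling f.read(size) until it returns b'' (EOF)
        for block in iter(lambda: f.read(size), b''):
            yield block

if __name__ == '__main__':
    path = os.path.join(tempfile.gettempdir(), 'blocks.bin')  # hypothetical file
    with open(path, 'wb') as f:
        f.write(bytes(range(10)))      # ten raw byte values: 0 through 9
    for block in read_chunks(path):
        print(block)
```

For the ten-byte file written here, the loop yields three blocks holding 4, 4, and 2 bytes. The last block is simply shorter.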


In all cases, data transferred between files and your programs is represented as Python
string objects within scripts, even if it is binary data. Text-mode files give you normal str
strings, while binary-mode files give you bytes strings. Continuing with our text file from
preceding examples (its lines end with \r\n on disk, as is usual for text files written on Windows):


>>> open('data.txt').read() # text mode: str
'Hello file world!\nBye file world.\nThe Life of Brian'

>>> open('data.txt', 'rb').read() # binary mode: bytes
b'Hello file world!\r\nBye file world.\r\nThe Life of Brian'

>>> file = open('data.txt', 'rb')
>>> for line in file: print(line)
...
b'Hello file world!\r\n'
b'Bye file world.\r\n'
b'The Life of Brian'
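The \r\n bytes appear only in the binary-mode results because a text-mode read both decodes the content and translates its line ends. The following sketch reproduces that contrast on any platform by writing Windows-style line ends explicitly. The file name crlf.txt is hypothetical. 'ascii' is passed as the encoding only because the sample content is plain ASCII.

```python
import os
import tempfile

path = os.path.join(tempfile.gettempdir(), 'crlf.txt')  # hypothetical file

# Write Windows-style line ends as raw bytes, so the file holds exactly these bytes
with open(path, 'wb') as f:
    f.write(b'Hello file world!\r\nBye file world.\r\n')

with open(path, 'rb') as f:
    raw = f.read()                      # bytes, untranslated: \r\n is kept

with open(path, 'r', encoding='ascii') as f:
    text = f.read()                     # str, decoded; \r\n is translated to \n

print(repr(raw))
print(repr(text))
```

With open's default newline=None, text-mode input maps \r\n (and a lone \r) to \n on every platform. That is why the two results differ by exactly those \r characters.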

The str and bytes results differ because Python 3.X treats text-mode files as Unicode: it
automatically decodes content on input and encodes it on output, and by default it also
translates \r\n line ends to \n when reading. Binary-mode files instead give us access to
file content as raw byte strings, with no translation of content; they reflect exactly what
is stored in the file. Because str strings are always Unicode text in 3.X, the special bytes
string is required to represent binary data as a sequence of byte-size integers, each of
which may hold any 8-bit value. Because normal and byte strings have almost identical
operation sets, many programs can largely take this on faith; but keep in mind that you
really must open truly binary data in binary mode for input, because it will not generally
be decodable as Unicode text.
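Here is a small sketch that makes the byte-size-integer view and the decoding problem concrete. The six sample bytes are arbitrary. is_utf8 is a hypothetical helper that attempts the kind of decode a text-mode file would perform. The sketch assumes UTF-8, but a real text-mode file uses whatever encoding is passed to open, or a platform-dependent default.

```python
def is_utf8(raw):
    """Return True if the raw bytes decode cleanly as UTF-8 text (hypothetical helper)."""
    try:
        raw.decode('utf-8')    # text mode performs a similar decode with its own encoding
    except UnicodeDecodeError:
        return False
    return True

# Arbitrary binary payload; 0x89 cannot start a UTF-8 character
data = bytes([0x89, 0x50, 0x4E, 0x47, 0xFF, 0x00])

print(data[0])             # indexing a bytes object yields an int (137 here)
print(list(data))          # the whole object is a sequence of 8-bit integers
print(is_utf8(data))       # binary data: not decodable as UTF-8
print(is_utf8(b'Spam\n'))  # plain ASCII bytes: decodable
```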


Similarly, you must supply byte strings for binary-mode output. Normal strings are not
raw binary data; they are decoded Unicode characters (a.k.a. code points), which are
encoded to binary when written to text-mode files:


>>> open('data.bin', 'wb').write(b'Spam\n')
5
>>> open('data.bin', 'rb').read()

146 | Chapter 4: File and Directory Tools
