Programming Python, 4th Edition, by Mark Lutz (text version)


SQL databases (sqlite3 and third-party add-ons). The last two of these categories are
related to database topics, addressed in Chapter 17.


In this section, we’ll take a brief tutorial look at the built-in file object and explore a
handful of more advanced file-related topics. As usual, you should consult either
Python’s library manual or reference books such as Python Pocket Reference for further
details and methods we don’t have space to cover here. Remember, for quick interactive
help, you can also run dir(file) on an open file object to see an attributes list that
includes methods; help(file) for general help; and help(file.read) for help on a
specific method such as read, though the file object implementation in 3.1 provides less
information for help than the library manual and other resources.
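
As a rough illustration of the interactive help just described, the following sketch opens a scratch file and inspects the resulting object. The temporary path and the variable names (path, file, names) are assumptions made up for this demonstration; in an interactive session you would normally type dir(file) and help(file.read) directly.

```python
import os
import tempfile

# Hypothetical scratch file, created only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), 'demo.txt')

with open(path, 'w') as file:
    # dir() lists attribute names, methods included; skip the underscore names.
    names = [name for name in dir(file) if not name.startswith('_')]
    print(names)

    # help(file.read) displays this same docstring through an interactive pager.
    print(file.read.__doc__)
```

The exact attribute list and docstring text vary with the Python release, so treat the printed output as a quick reminder rather than a substitute for the library manual.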


The File Object Model in Python 3.X


Just like the string types we noted in Chapter 2, file support in Python 3.X is a bit richer
than it was in the past. As we noted earlier, in Python 3.X str strings always represent
Unicode text (ASCII or wider), and bytes and bytearray strings represent raw binary
data. Python 3.X draws a similar and related distinction between files containing text
and binary data:



  • Text files contain Unicode text. In your script, text file content is always a str
    string—a sequence of characters (technically, Unicode “code points”). Text files
    perform the automatic line-end translations described in this chapter by default
    and automatically apply Unicode encodings to file content: they encode to and
    decode from raw binary bytes on transfers to and from the file, according to a
    provided or default encoding name. Encoding is trivial for ASCII text, but may be
    sophisticated in other cases.

  • Binary files contain raw 8-bit bytes. In your script, binary file content is always a
    byte string, usually a bytes object—a sequence of small integers, which supports
    most str operations and displays as ASCII characters whenever possible. Binary
    files perform no translations of data when it is transferred to and from files: no line-
    end translations or Unicode encodings are performed.
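
The two file flavors can be compared with a short sketch like the one below. It writes one non-ASCII line in text mode and then reads the same file back in both modes. The scratch path, the sample string, and the choice of UTF-8 are assumptions for illustration only; any encoding name that can represent your text would serve.

```python
import os
import tempfile

# Hypothetical scratch file, created only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), 'demo.txt')

text = 'sp\xe4m\n'    # five characters: one is non-ASCII, the last is a line end

# Text mode: the str is encoded to bytes, and '\n' becomes the platform line end.
with open(path, 'w', encoding='utf-8') as f:
    f.write(text)

# Text mode: bytes are decoded back to str, and line ends map back to '\n'.
with open(path, 'r', encoding='utf-8') as f:
    as_text = f.read()

# Binary mode: the stored bytes come back untouched, with no translations.
with open(path, 'rb') as f:
    as_bytes = f.read()

print(repr(as_text))          # a str, equal to the text written
print(as_bytes)               # a bytes object; the non-ASCII character takes two bytes
print(list(as_bytes[:2]))     # indexing and slicing bytes yields small integers
```

On Windows the binary read shows \r\n at the end of the data, because the text-mode write translated the line end; on Unix-like systems it shows a lone \n.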


In practice, text files are used for all truly text-related data, and binary files store items
like packed binary data, images, audio files, executables, and so on. As a programmer
you distinguish between the two file types in the mode string argument you pass to
open: adding a “b” (e.g., 'rb', 'wb') means the file contains binary data. For coding new
file content, use normal strings for text (e.g., 'spam' or bytes.decode()) and byte strings
for binary (e.g., b'spam' or str.encode()).
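
A minimal sketch of the mode-string choice follows, assuming a throwaway file and ASCII-only sample data (the names path, text, data, raw, and mixed_ok are all hypothetical). It converts between the two string types with encode and decode, stores the byte string through 'wb' and 'rb', and shows what happens when a str is handed to a binary-mode file.

```python
import os
import tempfile

# Hypothetical scratch file, created only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), 'demo.bin')

text = 'spam'                    # normal string, for text files
data = text.encode('ascii')      # str -> bytes, for binary files
print(data)                      # b'spam'
print(data.decode('ascii'))      # bytes -> str: spam

# The "b" in the mode string selects binary mode, which deals in byte strings.
with open(path, 'wb') as f:
    f.write(data)

with open(path, 'rb') as f:
    raw = f.read()
print(raw == data)               # True: the bytes are stored unaltered

# A binary-mode file rejects str: the two string types do not mix.
try:
    with open(path, 'wb') as f:
        f.write(text)
    mixed_ok = True
except TypeError:
    mixed_ok = False
print(mixed_ok)                  # False
```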


Unless your file scope is limited to ASCII text, the 3.X text/binary distinction can
sometimes impact your code. Text files create and require str strings, and binary files
use byte strings; because you cannot freely mix the two string types in expressions, you
must choose file mode carefully. Many built-in tools we’ll use in this book make the
choice for us; the struct and pickle modules, for instance, deal in byte strings in 3.X,


136 | Chapter 4: File and Directory Tools
