Really, though, 3.X has two additional string types that support most str string oper-
ations: bytes—a sequence of short integers for representing 8-bit binary data, and
bytearray—a mutable variant of bytes. You generally know you are dealing with
bytes if strings display or are coded with a leading “b” character before the opening
quote (e.g., b'abc', b'\xc4\xe8'). As we’ll see in Chapter 4, files in 3.X follow a similar
dichotomy, using str in text mode (which also handles Unicode encodings and line-
end conversions) and bytes in binary mode (which transfers bytes to and from files
unchanged). And in Chapter 5, we’ll see the same distinction for tools like sockets,
which deal in byte strings today.
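As a minimal sketch of the str/bytes split just described, the following creates a str, encodes it to bytes, changes a bytearray copy in place, and then reads the same file back in both text and binary modes. The filename demo.txt, the sample text, and the choice of UTF-8 are assumptions made only for illustration:

```python
import os
import tempfile

text = 'sp\xc4m'                   # str: a sequence of Unicode characters
data = text.encode('utf-8')        # bytes: the encoded 8-bit form
print(len(text), len(data))        # 4 characters, but 5 bytes once encoded
print(data)                        # displays with a leading b: b'sp\xc3\x84m'
print(data[0])                     # indexing bytes yields a small integer: 115

buf = bytearray(data)              # bytearray: a mutable variant of bytes
buf[0] = ord('S')                  # in-place change, not allowed on bytes or str
print(buf.decode('utf-8'))

# Hypothetical scratch file, used only for this demonstration
path = os.path.join(tempfile.mkdtemp(), 'demo.txt')
with open(path, 'w', encoding='utf-8') as f:   # text mode: str in, encoded on write
    f.write(text)
with open(path, 'rb') as f:                    # binary mode: raw bytes, unchanged
    raw = f.read()
with open(path, 'r', encoding='utf-8') as f:   # text mode: decoded back to str
    decoded = f.read()
print(raw == data, decoded == text)
```

Passing an explicit encoding is a choice made here for repeatability; without it, text mode falls back to a platform-dependent default.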
Unicode text is used in internationalized applications, and many of Python’s
binary-oriented tools deal in byte strings today. This includes some file tools we’ll meet along
the way, such as the open call, and the os.listdir and os.walk tools we’ll study in
upcoming chapters. As we’ll see, even simple directory tools sometimes have to be
aware of Unicode in file content and names. Moreover, tools such as object pickling
and binary data parsing are byte-oriented today.
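To make the byte-oriented nature of these tools concrete, here is a small sketch, not a complete treatment. It assumes a hypothetical record dictionary and a scratch directory holding one file named data.txt, and it shows that pickle and struct produce bytes, while os.listdir returns str or bytes names depending on the type of its argument:

```python
import os
import pickle
import struct
import tempfile

record = {'name': 'Bob', 'age': 40}     # hypothetical object to serialize
pickled = pickle.dumps(record)          # pickling yields a bytes object, not str
print(type(pickled).__name__)
print(pickle.loads(pickled) == record)  # round trip restores the original

packed = struct.pack('>ih', 7, 2)       # big-endian 4-byte int plus 2-byte short
print(packed)                           # b'\x00\x00\x00\x07\x00\x02'

# Hypothetical scratch directory with a single file in it
dirpath = tempfile.mkdtemp()
with open(os.path.join(dirpath, 'data.txt'), 'w') as f:
    f.write('spam\n')

str_names = os.listdir(dirpath)                 # str argument: decoded str names
bytes_names = os.listdir(os.fsencode(dirpath))  # bytes argument: raw bytes names
print(str_names, bytes_names)
```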
Later in the book, we’ll find that Unicode also pops up today in the text displayed
in GUIs; the bytes shipped over networks; Internet standards such as email; and even
some persistence topics such as DBM files and shelves. Any interface that deals in text
necessarily deals in Unicode today, because str is Unicode, whether ASCII or wider.
Once we reach the realm of the applications programming presented in this book,
Unicode is no longer an optional topic for most Python 3.X programmers.
In this book, we’ll defer further coverage of Unicode until we can see it in the context
of application topics and practical programs. For more fundamental details on how
3.X’s Unicode text and binary data support impacts both string and file usage in some
roles, please see Learning Python, Fourth Edition; since this is officially a core language
topic, it enjoys in-depth coverage and a full 45-page dedicated chapter in that book.
File Operation Basics
Besides processing strings, the more.py script also uses files—it opens the external file
whose name is listed on the command line using the built-in open function, and it reads
that file’s text into memory all at once with the file object read method. Since file objects
returned by open are part of the core Python language itself, I assume that you have at
least a passing familiarity with them at this point in the text. But just in case you’ve
flipped to this chapter early on in your Pythonhood, the following calls load a file’s
contents into a string, load a fixed-size set of bytes into a string, load a file’s contents
into a list of line strings, and load the next line in the file into a string, respectively:
open('file').read()         # read entire file into string
open('file').read(N)        # read next N bytes into string
open('file').readlines()    # read entire file into line strings list
open('file').readline()     # read next line, through '\n'
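For a runnable variation on these four calls, the sketch below first writes a small three-line file and then applies each call to it. The filename sample.txt and its contents are assumptions made only for the demonstration, and the with statement is used here so that each file is closed promptly:

```python
import os
import tempfile

# Hypothetical three-line text file, created just for this demonstration
path = os.path.join(tempfile.mkdtemp(), 'sample.txt')
with open(path, 'w') as f:
    f.write('spam\neggs\nham\n')

with open(path) as f:
    whole = f.read()             # entire file as one string
with open(path) as f:
    first3 = f.read(3)           # next 3 characters only
    rest = f.readline()          # remainder of the current line, through '\n'
with open(path) as f:
    lines = f.readlines()        # list of line strings, each ending in '\n'
with open(path) as f:
    first_line = f.readline()    # just the first line

print(repr(whole))
print(repr(first3), repr(rest))
print(lines)
print(repr(first_line))
```

Note that read(N) and readline share the file's current position, which is why readline here returns only what is left of the first line.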