Let’s run these method calls to read files, lines, and characters from a text file—the
seek(0) call is used here before each test to rewind the file to its beginning (more on
this call in a moment):
>>> file.seek(0) # go back to the front of file (returns new offset)
0
>>> file.read() # read entire file into string
'Hello file world!\nBye file world.\n'
>>> file.seek(0)
0
>>> file.readlines() # read entire file into lines list
['Hello file world!\n', 'Bye file world.\n']
>>> file.seek(0)
0
>>> file.readline() # read one line at a time
'Hello file world!\n'
>>> file.readline()
'Bye file world.\n'
>>> file.readline() # empty string at end-of-file
''
>>> file.seek(0)
0
>>> file.read(1), file.read(8) # read N (or remaining) chars/bytes
('H', 'ello fil')
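The session above assumes `file` is already open on a two-line text file. Here is a self-contained sketch that first creates such a file and then replays the same calls as a script. The file name `data.txt` and the use of a temporary directory are assumptions made only to keep the example runnable anywhere.

```python
import os
import tempfile

# Hypothetical test file, created in a fresh temporary directory
path = os.path.join(tempfile.mkdtemp(), 'data.txt')
with open(path, 'w') as out:
    out.write('Hello file world!\nBye file world.\n')

with open(path) as file:                 # text mode, read access
    file.seek(0)
    whole = file.read()                  # entire file as one string

    file.seek(0)
    lines = file.readlines()             # list of lines, '\n' retained

    file.seek(0)
    first = file.readline()              # one line per call
    second = file.readline()
    eof = file.readline()                # '' signals end-of-file

    file.seek(0)
    chunks = (file.read(1), file.read(8))  # N characters per call

print(repr(whole), lines, repr(first), repr(second), repr(eof), chunks)
```

Using `with` closes the file automatically when the block exits, even if an exception occurs, so no explicit `close()` call is needed.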
All of these input methods let us be specific about how much to fetch. Here are a few
rules of thumb about which to choose:
- read() and readlines() load the entire file into memory all at once. That makes
them handy for grabbing a file’s contents with as little code as possible. It also
makes them generally fast, but costly in terms of memory for huge files: loading
a multigigabyte file into memory is not generally a good thing to do (and might not
be possible at all on a given computer).
- On the other hand, because the readline() and read(N) calls fetch just part of the
file (the next line or N-character-or-byte block), they are safer for potentially big
files but a bit less convenient and sometimes slower. Both return an empty string
when they reach end-of-file. If speed matters and your files aren’t huge, read() or
readlines() may generally be the better choice.
- See also the discussion of the newer file iterators in the next section. As we’ll see,
iterators combine the convenience of readlines() with the space efficiency of
readline() and are the preferred way to read text files by lines today.
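To make these trade-offs concrete, the following sketch processes a file without ever holding all of it in memory, once with read(N) in a loop, once with readline() in a loop, and once with the file's line iterator. The helper names and the sample file `big.txt` are hypothetical. Note that a blank line comes back as '\n', so only the empty string '' means end-of-file.

```python
import os
import tempfile

def count_chars_in_chunks(path, size=4096):
    """Total characters in a text file, read at most `size` chars per call."""
    total = 0
    with open(path) as f:
        while True:
            chunk = f.read(size)          # next block of up to `size` chars
            if not chunk:                 # '' means end-of-file
                break
            total += len(chunk)
    return total

def count_lines_with_readline(path):
    """Count lines by calling readline() until it returns ''."""
    count = 0
    with open(path) as f:
        while True:
            line = f.readline()           # a blank line is '\n', still truthy
            if not line:
                break
            count += 1
    return count

def count_lines_with_iterator(path):
    """Same count via the file's line iterator: one line in memory at a time."""
    with open(path) as f:
        return sum(1 for _ in f)

# Hypothetical sample file with 1,000 short lines
path = os.path.join(tempfile.mkdtemp(), 'big.txt')
with open(path, 'w') as out:
    for i in range(1000):
        out.write('line %d\n' % i)
```

The chunk size is a tuning knob: larger blocks mean fewer calls, smaller ones cap memory use more tightly. The result is the same either way, since every character is counted exactly once.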
The seek(0) call used repeatedly here means “go back to the start of the file.” In our
example, it is an alternative to reopening the file each time. In files, all read and write
operations take place at the current position; files normally start at offset 0 when opened
and advance as data is transferred. The seek call simply lets us move to a new position
for the next transfer operation. More on this method later when we explore random
access files.
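A small sketch of how the current position moves: tell() reports it, each read advances it, and seek() jumps to a saved position or back to the start. The file name is hypothetical. One caveat for text mode: the value tell() returns is opaque, and seek() is only reliable with zero or with a value tell() previously returned; arbitrary byte offsets call for binary mode, the subject of the random-access discussion.

```python
import os
import tempfile

# Hypothetical two-line file in a temporary directory
path = os.path.join(tempfile.mkdtemp(), 'data.txt')
with open(path, 'w') as out:
    out.write('Hello file world!\nBye file world.\n')

with open(path) as file:
    start = file.tell()          # position right after opening
    first = file.readline()      # position advances past line 1
    mark = file.tell()           # remember where line 2 begins
    second = file.readline()
    file.seek(mark)              # jump back to the saved position
    again = file.readline()      # rereads line 2
    file.seek(0)                 # rewind instead of reopening the file
    rewound = file.read()
```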
142 | Chapter 4: File and Directory Tools