displayed. The following avoids the extra end-of-line spaces—we open for input in
binary mode for undecoded bytes, but drop \r:
C:\...\PP4E\Gui\Tour> python
>>> from tkinter import *
>>> T = Text()
>>> data = open('jack.txt', 'rb').read()            # use undecoded bytes
>>> data = data.replace(b'\r\n', b'\n')              # strip \r if any
>>> T.insert('1.0', data)
>>> T.pack()
>>> T.get('1.0', 'end')[:75]
'000) All work and no play makes Jack a dull boy.\n001) All work and no pla'
To save content later, we can either add the \r characters back on Windows only,
manually encode to bytes, and save in binary mode; or we can open in text mode so
that the file object restores the \r if needed and encodes for us, and write the str
content directly. The second option is probably simpler, as we don't need to care about
platform differences.
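The two save strategies just described might be sketched as follows. This is a minimal illustration, not code from the book or PyEdit: the function names save_binary and save_text are hypothetical, and UTF-8 is assumed as the default encoding only for the sake of the example.

```python
import os

# Hypothetical helpers contrasting the two ways to save Text widget content.

def save_binary(path, content, encoding='utf-8'):
    """Option 1: restore the platform end-of-line, encode manually, write bytes."""
    content = content.replace('\n', os.linesep)   # '\r\n' on Windows, '\n' elsewhere
    with open(path, 'wb') as f:
        f.write(content.encode(encoding))

def save_text(path, content, encoding='utf-8'):
    """Option 2: let the text-mode file object translate \n and encode for us."""
    with open(path, 'w', encoding=encoding) as f:
        f.write(content)
```

Both calls produce byte-for-byte identical files on any platform; the text-mode version simply leaves the end-of-line translation to Python instead of to our own code.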
Either way, though, we still face an encoding step: we can either rely on the platform
default encoding or obtain an encoding name from user interfaces. In the following,
for example, the text-mode file converts end-lines and encodes to bytes internally using
the platform default. If we care about supporting arbitrary Unicode encodings, or run on
a platform whose default cannot accommodate the characters displayed, we need
to pass in an explicit encoding argument (the Python slice operation here has the same
effect as fetching through Tk's "end-1c" position specification):
...continuing prior listing...
>>> content = T.get('1.0', 'end')[:-1] # drop added \n at end
>>> open('copyjack.txt', 'w').write(content) # use platform default
12500 # text mode adds \r on Win
>>> ^Z
C:\...\PP4E\Gui\Tour> fc jack.txt copyjack.txt
Comparing files jack.txt and COPYJACK.TXT
FC: no differences encountered
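To make the explicit-encoding option concrete, here is a small helper that saves a string fetched with T.get('1.0', 'end'). It is a sketch under stated assumptions rather than the book's code: the name save_widget_text and its encoding parameter are hypothetical, and it runs without a GUI because it takes the fetched string as input instead of the widget itself.

```python
def save_widget_text(path, widget_text, encoding=None):
    """Hypothetical helper: save text fetched via T.get('1.0', 'end')."""
    content = widget_text[:-1]            # drop Tk's trailing \n, like 'end-1c'
    # encoding=None uses the platform default, as in the listing above; pass
    # a name such as 'utf-8' when the text may hold characters the default
    # cannot encode
    with open(path, 'w', encoding=encoding) as f:   # text mode translates \n
        return f.write(content)           # count of str characters, not bytes
```

For instance, saving 'caf\u00e9\n\n' with encoding='utf-8' writes five characters and reads back as 'caf\u00e9\n'; the same call with encoding='ascii' raises UnicodeEncodeError, which is exactly the failure an explicit, suitable encoding guards against.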
Supporting Unicode in PyEdit (ahead)
We’ll see a use case of accommodating the Text widget’s Unicode behavior in the larger
PyEdit example of Chapter 11. Really, supporting Unicode just means supporting
arbitrary Unicode encodings in text files on opens and saves; once in memory, text
processing can always be performed in terms of str, since that’s how tkinter returns
content. To support Unicode, PyEdit will open both input and output files in text mode
with explicit encodings whenever possible, and fall back on opening input files in binary
mode only as a last resort. This avoids relying on the limited Unicode support Tk
provides for display of raw byte strings.
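The open policy just described might be sketched like this. It is a minimal illustration, not PyEdit's actual implementation: the name load_text and the default candidate list ('ascii', 'utf-8') are hypothetical stand-ins for the encoding sources PyEdit consults.

```python
def load_text(path, encodings=('ascii', 'utf-8')):
    """Hypothetical loader: try text mode with each candidate encoding in turn,
    and fall back on binary mode only if none of them can decode the file."""
    for enc in encodings:
        try:
            with open(path, 'r', encoding=enc) as f:
                return f.read(), enc      # str content plus the encoding that worked
        except (UnicodeDecodeError, LookupError):
            continue                      # undecodable data or unknown encoding name
    with open(path, 'rb') as f:
        return f.read(), None             # raw bytes: the last resort
```

Catching LookupError matters when encoding names come from user dialogs or configuration files, since a mistyped name fails at open time. Note also that a catch-all codec such as latin-1 decodes any byte sequence, so listing it as a candidate would make the binary fallback unreachable.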
To make this policy work, PyEdit will accept encoding names from a wide variety of
sources and allow the user to configure which to attempt. Encodings may be obtained
from user dialog inputs, configuration file settings, the platform default, the prior