Programming Python, 4th Edition, by Mark Lutz (text version)


files in a variety of Unicode encoding format schemes; strings are decoded from these
formats when read and encoded to them when written. Unless text is always stored in
files using the platform’s default encoding, we need to know which encoding to use,
both to load and to save.
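
As a minimal illustration of the point above (not PyEdit's own code), the following sketch writes a string with an explicitly named encoding, reads it back with the same name, and shows that decoding with an incompatible name fails. The file name demo.txt and the sample text are assumptions made up for the example.

```python
import locale
import os
import tempfile

text = 'caf\u00e9'  # sample text with one non-ASCII character

# Hypothetical scratch file, used only for this demonstration
path = os.path.join(tempfile.mkdtemp(), 'demo.txt')

# Encoding happens on write: str -> bytes, per the named encoding
with open(path, 'w', encoding='utf-16') as f:
    f.write(text)

# Decoding happens on read: bytes -> str, and the name must be compatible
with open(path, 'r', encoding='utf-16') as f:
    loaded = f.read()

# The encoding that open() falls back on when no name is passed
default_encoding = locale.getpreferredencoding(False)

# Decoding with a name that does not match the stored bytes raises an error
try:
    with open(path, 'r', encoding='ascii') as f:
        f.read()
    mismatch_failed = False
except UnicodeDecodeError:
    mismatch_failed = True
```

If the file had been written with the platform default and read on the same machine, neither call would need an encoding name; the explicit name matters as soon as those two conditions no longer hold.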


To make this work, PyEdit uses the approaches described in detail in Chapter 9, which
we won’t repeat in full here. In brief, though, tkinter’s Text widget accepts content as
either str or bytes and always returns it as str. PyEdit maps this interface to and from
Python file objects as follows:


Input files (Open)
Decoding from file bytes to strings in general requires the name of an encoding
type that is compatible with data in the file, and fails if the two do not agree (e.g.,
decoding 8-bit data as ASCII). In some cases, the Unicode encoding of the text file to
be opened may be unknown.
To load, PyEdit first tries to open input files in text mode to read str strings, using
an encoding obtained from a variety of sources—a method argument for a known
type (e.g., from headers of email attachments or source files opened by demos), a
user dialog reply, a configuration module setting, and the platform default. Whenever
PyEdit prompts users for an open encoding, the dialog is prefilled with the first
choice implied by the configuration file, as a default and suggestion.
If all these encoding sources fail to decode, the file is opened in binary mode to
read text as bytes without an encoding name, effectively delegating encoding issues
to the Tk GUI library; in this case, any \r\n end-lines are manually converted to
\n on Windows so they correctly display and save later. Binary mode is used only
as a last resort, to avoid relying on Tk’s policies and limited character set support
for raw bytes.
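
The open policy just described can be sketched roughly as follows. This is an illustration under stated assumptions, not PyEdit's actual code: the function name load_text, its parameters, and the ask_user callback (standing in for the encoding dialog) are all hypothetical, and the \r\n conversion is applied unconditionally here rather than only on Windows, to keep the sketch short.

```python
import locale


def load_text(path, known=None, ask_user=None, configured=None,
              platform_default=None):
    """Return (content, encoding); encoding is None after the bytes fallback.

    All parameter names are hypothetical: 'known' is a caller-supplied
    encoding, 'ask_user' a callable standing in for a dialog, and
    'configured' a setting read from a configuration module.
    """
    if platform_default is None:
        platform_default = locale.getpreferredencoding(False)

    def candidates():
        # A generator, so the user is asked only if the known encoding fails
        yield known
        if ask_user is not None:
            yield ask_user()
        yield configured
        yield platform_default

    for encoding in candidates():
        if not encoding:
            continue  # this source supplied nothing; try the next one
        try:
            with open(path, 'r', encoding=encoding) as f:
                return f.read(), encoding  # text mode: decoded str
        except (UnicodeError, LookupError):
            continue  # undecodable data or unknown encoding name

    # Last resort: raw bytes, with end-lines normalized by hand
    with open(path, 'rb') as f:
        data = f.read()
    return data.replace(b'\r\n', b'\n'), None
```

A caller can tell the two outcomes apart by the second item of the result: an encoding name means the content is a decoded str, while None means the content is undecoded bytes to be handed to the GUI as is.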


Text processing
The tkinter Text widget returns its content on request as str strings, regardless of
whether str or bytes were inserted. Because of that, all text processing of content
fetched from the GUI is conducted in terms of str Unicode strings here.


Output files (Save, Save As)
Encoding from strings to file bytes is generally more flexible than decoding and
need not use the same encoding from which the string was decoded, but can also
fail if the chosen scheme is too narrow to handle the string’s content (e.g., encoding
non-ASCII text as ASCII).
To save, PyEdit opens output files in text mode to perform end-line mappings and
Unicode encoding of str content. An encoding name is again fetched from one of
a variety of sources—the same encoding used when the file was first opened or
saved (if any), a user dialog reply, a configuration module setting, and the platform
default. Unlike opens, save dialogs that prompt for encodings are prefilled with
the known encoding, if there is one, as a suggestion; otherwise, the dialog is prefilled
with the next configured choice as a default, as for opens.
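
A rough sketch of the save policy follows, again under assumptions rather than as PyEdit's actual code: save_text, its parameters, and the ask_user callback are hypothetical names. The sketch test-encodes the text before opening the file so that a failed attempt does not truncate an existing file; that ordering is a design choice of this example.

```python
import locale


def save_text(path, text, known=None, ask_user=None, configured=None,
              platform_default=None):
    """Write text using the first candidate encoding that can encode it.

    Returns the encoding name used. All parameter names are hypothetical:
    'known' is the encoding remembered from an earlier open or save,
    'ask_user' a callable standing in for a dialog, and 'configured'
    a setting read from a configuration module.
    """
    if platform_default is None:
        platform_default = locale.getpreferredencoding(False)

    def candidates():
        # A generator, so the user is asked only if the known encoding fails
        yield known
        if ask_user is not None:
            yield ask_user()
        yield configured
        yield platform_default

    for encoding in candidates():
        if not encoding:
            continue  # this source supplied nothing; try the next one
        try:
            text.encode(encoding)  # trial run, before touching the file
        except (UnicodeError, LookupError):
            continue  # scheme too narrow, or unknown encoding name
        # Text mode applies both the encoding and platform end-line mapping
        with open(path, 'w', encoding=encoding) as f:
            f.write(text)
        return encoding

    raise UnicodeError('no candidate encoding could encode the text')
```

Returning the encoding that succeeded lets the caller remember it, so that a later save can offer it first.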

