Programming Python, 4th Edition, Mark Lutz (text version)


exceptions or try alternative schemes; this is especially true on platforms where ASCII
may be the default platform encoding.


The problem with treating text as bytes


The prior sections’ rules may seem complex, but they boil down to the following:



  • Unless text always uses the platform default encoding, we need to know the
    encoding name to read or write in text mode, or to decode and encode manually
    in binary mode.

  • We can use almost any encoding to write new files as long as it can handle the
    string’s characters, but must provide one that is compatible with the existing data’s
    binary format on reads.

  • We don’t need to know the encoding to read text as bytes in binary mode
    for display, but the str content returned by the Text widget must still be
    encoded before it can be written on saves.
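
The three rules above can be sketched in a few lines. This is only a minimal illustration, not code from the editor discussed in this chapter; the temporary file name, the sample string, and the choice of Latin-1 and UTF-8 are assumptions made for the example.

```python
import os
import tempfile

text = 'sp\xc4m'                    # a str with one non-ASCII character (Ä)
path = os.path.join(tempfile.mkdtemp(), 'demo.txt')   # hypothetical file

# Writes: any encoding that can represent the characters will do
with open(path, 'w', encoding='latin-1') as f:
    f.write(text)

# Text-mode reads: must name an encoding compatible with the stored bytes
with open(path, 'r', encoding='latin-1') as f:
    as_text = f.read()

# Binary-mode reads: no encoding needed, but the result is bytes, not str
with open(path, 'rb') as f:
    as_bytes = f.read()

# Decoding manually in binary mode gives the same str as the text-mode read
assert as_bytes.decode('latin-1') == as_text == text

# An incompatible encoding fails on reads (here, UTF-8 on Latin-1 data)
try:
    as_bytes.decode('utf-8')
    wrong_ok = True
except UnicodeDecodeError:
    wrong_ok = False

# Saves: a str (such as Text widget content) must be encoded again to write
saved = as_text.encode('utf-8')
```

Note that the save step is free to pick a different encoding than the one used on input, as long as it can represent every character in the string.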


So why not always load text files in binary mode to display them in a tkinter Text widget?
While binary mode input files seem to side-step encoding issues for display, passing
text to tkinter as bytes instead of str really just delegates the encoding issue to the Tk
library, which imposes constraints of its own.


More specifically, opening input files in binary mode to read bytes may seem to support
viewing arbitrary types of text, but it has two potential downsides:



  • It shifts the burden of deciding encoding type from our script to the Tk GUI library.
    The library must still determine how to render those bytes and may not support
    all possible encodings.

  • It allows opening and viewing data that is not text in nature, thereby defeating
    some of the purpose of the validity checks performed by text decoding.
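
The validity check mentioned in the second point can be illustrated with a short sketch. The helper name `looks_like_text` is hypothetical, and the PNG signature merely stands in for arbitrary non-text data; the only library call relied on is the standard `bytes.decode`.

```python
def looks_like_text(data, encoding='utf-8'):
    """Return True if the bytes decode cleanly under the given encoding."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False

text_bytes = 'spam and eggs'.encode('utf-8')
png_header = b'\x89PNG\r\n\x1a\n'       # leading bytes of a PNG image file

print(looks_like_text(text_bytes))      # decodes cleanly as UTF-8
print(looks_like_text(png_header))      # 0x89 cannot start a UTF-8 sequence

# Caveat: Latin-1 maps every byte value to a character, so it never fails
print(looks_like_text(png_header, 'latin-1'))
```

As the last call suggests, the check is only as strong as the encoding allows: decoding catches some non-text data, but a permissive encoding such as Latin-1 accepts any byte sequence.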


The first point is probably the most crucial here. In experiments I’ve run on Windows,
Tk seems to correctly handle raw bytes strings encoded in ASCII, UTF-8 and Latin-1
format, but not UTF-16 or others such as CP500. By contrast, these all render correctly
if decoded in Python to str before being passed on to Tk. In programs intended for the
world at large, this wider support is crucial today. If you’re able to know or ask for
encodings, you’re better off using str both for display and saves.
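
As a rough sketch of the difference among these encodings, the following encodes one sample string five ways and compares the raw bytes. The sample string and the `decode_for_display` helper are assumptions for illustration; no GUI is created, and the commented-out line at the end shows only where a decoded string would be handed to a hypothetical Text widget.

```python
sample = 'spam'
same_as_ascii = {}

for encoding in ('ascii', 'utf-8', 'latin-1', 'utf-16', 'cp500'):
    raw = sample.encode(encoding)
    # For ASCII-range text, only some encodings yield the plain ASCII bytes
    same_as_ascii[encoding] = (raw == b'spam')
    # Decoding in Python recovers the same str in every case
    assert raw.decode(encoding) == sample
    print(encoding, raw)

def decode_for_display(raw, encoding):
    """Decode file bytes to str, the form to pass on to the GUI."""
    return raw.decode(encoding)

content = decode_for_display(sample.encode('cp500'), 'cp500')
# text_widget.insert('1.0', content)    # hypothetical tkinter Text widget
```

For this sample, the ASCII, UTF-8, and Latin-1 bytes are identical, while the UTF-16 and CP500 bytes are not. Once decoded to str, the original encoding no longer matters to the code that displays the text.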


To some degree, regardless of whether you pass in str or bytes, tkinter GUIs are subject
to the constraints imposed by the underlying Tk library and the Tcl language it uses
internally, as well as any imposed by the techniques Python’s tkinter uses to interface
with Tk. For example:



  • Tcl, the internal implementation language of the Tk library, stores strings internally
    in UTF-8 format, and decrees that strings passed in to and returned from its C API
    be in this format.

