Programming Python, 4th Edition, Mark Lutz (text version)

>>>
>>> print(T.get('1.0', 'end'))
Bytesfileline1
Textfileline1
Textfileline2
Bytesfileline2

This makes it easy to perform text processing on content after it is fetched: we may
conduct it in terms of str, regardless of which type of string was inserted. However,
this also makes it difficult to treat text data generically from a Unicode perspective: we
cannot save the returned str content to a binary mode file as is, because binary mode
files expect bytes. We must either encode to bytes manually first or open the file in text
mode and rely on it to encode the str. In either case we must know the Unicode
encoding name to apply, assume the platform default suffices, fall back on guesses and
hope one works, or ask the user.
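
As a minimal sketch of the two save options just described, the following writes the same str both ways and checks that the files hold identical bytes. The `content` string merely stands in for what `T.get('1.0', 'end')` would return, and the `latin-1` encoding name and the temporary file names are assumptions chosen for illustration, not something tkinter supplies.

```python
import os
import tempfile

# Stand-in for the str returned by T.get('1.0', 'end'); no GUI is needed here
content = 'A\xc4B\xe4C\n'
encoding = 'latin-1'    # assumed: the caller knows or has chosen this name

tmpdir = tempfile.mkdtemp()
path1 = os.path.join(tmpdir, 'manual.txt')      # hypothetical file names
path2 = os.path.join(tmpdir, 'textmode.txt')

# Option 1: encode to bytes manually, then write to a binary mode file
with open(path1, 'wb') as f:
    f.write(content.encode(encoding))

# Option 2: open in text mode and let the file object encode the str;
# newline='' turns off newline translation so both files get the same bytes
with open(path2, 'w', encoding=encoding, newline='') as f:
    f.write(content)

with open(path1, 'rb') as f:
    raw1 = f.read()
with open(path2, 'rb') as f:
    raw2 = f.read()
print(raw1 == raw2)

# Writing the str to a binary mode file as is raises TypeError
try:
    with open(path1, 'wb') as f:
        f.write(content)
    str_write_failed = False
except TypeError:
    str_write_failed = True
print(str_write_failed)
```

Either route produces the same file content; the choice is mostly about whether the program prefers to manage bytes itself or delegate encoding to the file object.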


In other words, although tkinter allows us to insert and view some text of unknown
encoding as bytes, the fact that it’s returned as str strings means we generally need to
know how to encode it anyhow on saves, to satisfy Python 3.X file interfaces. Moreover,
because bytes inserted into Text widgets must also be decodable according to the
limited Unicode policies of the underlying Tk library, we’re generally better off decoding
text to str ourselves if we wish to support Unicode broadly. To truly understand why
that’s true, we need to take a brief excursion through the Land of Unicode.
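
One way to decode text ourselves before handing it to a Text widget is to try a short list of candidate encodings in turn. This is only a sketch of the "fall back on guesses" idea mentioned earlier: the helper name `decode_with_fallbacks` and the candidate list are hypothetical, and a real program might instead ask the user or consult a configuration setting.

```python
def decode_with_fallbacks(data, encodings=('ascii', 'utf-8', 'latin-1')):
    """Hypothetical helper: return (text, name) for the first encoding
    in the candidate list that decodes data without error."""
    for name in encodings:
        try:
            return data.decode(name), name
        except UnicodeDecodeError:
            continue    # this guess did not fit the bytes; try the next one
    raise ValueError('none of the candidate encodings could decode the data')

raw = b'A\xc4B\xe4C'    # bytes as they might be read from a binary mode file
text, used = decode_with_fallbacks(raw)
print(repr(text), used)

# text is now a str, suitable for passing to a Text widget's insert method
```

Note that `latin-1` maps every possible byte value to a character, so it never raises an error; it belongs last in the list, and a successful decode with it does not prove the guess was right.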


Unicode text in strings


The reason for all this extra complexity, of course, is that in a world with Unicode, we
cannot really think of “text” anymore without also asking “which kind.” Text in general
can be encoded in a wide variety of Unicode encoding schemes. In Python, this is always
a factor for str and pertains to bytes when it contains encoded text. Python’s str
Unicode strings are simply strings once they are created, but you have to take encodings
into consideration when transferring them to and from files and when passing them to
libraries that impose constraints on text encodings.


We won’t cover Unicode encodings in depth here (see Learning Python for
background details, as well as the brief look at implications for files in Chapter 4), but a
quick review is in order to illustrate how this relates to Text widgets. First of all, keep
in mind that ASCII text data normally just works in most contexts, because it is a subset
of most Unicode encoding schemes. Data outside the ASCII 7-bit range, though, may
be represented differently as bytes in different encoding schemes.
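
To make the ASCII point concrete, the short sketch below encodes one ASCII string and one non-ASCII string under a few encodings. The sample strings and the choice of encodings are illustrative assumptions; encodings such as UTF-16 are deliberately left out because they do not store ASCII characters as single bytes.

```python
ascii_text = 'spam'
non_ascii = 'A\xc4B\xe4C'    # the same characters as 'AÄBäC'

# ASCII text produces identical bytes under these encodings
ascii_results = [ascii_text.encode(name)
                 for name in ('ascii', 'latin-1', 'utf-8')]
print(ascii_results)

# Characters outside the 7-bit ASCII range encode differently
latin1_bytes = non_ascii.encode('latin-1')    # one byte per character
utf8_bytes = non_ascii.encode('utf-8')        # two bytes each for Ä and ä
print(latin1_bytes, len(latin1_bytes))
print(utf8_bytes, len(utf8_bytes))

# The ASCII encoding cannot represent these characters at all
try:
    non_ascii.encode('ascii')
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True
print(ascii_failed)
```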


For instance, the following must decode a Latin-1 bytes string using the Latin-1
encoding; using the platform default or an explicitly named encoding that doesn’t match
the bytes will fail:


>>> b = b'A\xc4B\xe4C' # these bytes are latin-1 format text
>>> b
b'A\xc4B\xe4C'

