[Python编程(第4版)].(Programming.Python.4th.Edition).Mark.Lutz.文字版

Short of writing our own email parser, or pursuing other similarly complex approaches,
the best bet today for fetched messages seems to be decoding per user preferences and
defaults, and that’s how we’ll proceed in this edition. The PyMailGUI client of Chapter 14,
for instance, will allow Unicode encodings for full mail text to be set on a per-session
basis.
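
The following is a minimal sketch of what "decoding per user preferences and defaults" might look like before the text is handed to the parser. The function name `decode_full_text`, its `user_encoding` parameter, and the order of fallback encodings are assumptions made for illustration; they are not taken from PyMailGUI or the book's examples.

```python
import email

def decode_full_text(raw_bytes, user_encoding=None):
    """Decode raw fetched message bytes to str for the email parser.

    Hypothetical helper: tries the user's preferred encoding first,
    then a few common defaults, and finally Latin-1.
    """
    for enc in (user_encoding, 'ascii', 'utf-8'):
        if enc is None:
            continue
        try:
            return raw_bytes.decode(enc)
        except (UnicodeError, LookupError):
            # Bad bytes for this encoding, or an unknown encoding name
            continue
    # Latin-1 maps every byte value to a character, so this cannot fail
    return raw_bytes.decode('latin-1')

# A message whose body holds one non-ASCII byte (0xE9) that is not valid UTF-8
raw = b'From: a@example.com\nSubject: test\n\ncaf\xe9\n'
text = decode_full_text(raw, user_encoding='utf-8')
msg = email.message_from_string(text)
print(msg['Subject'], repr(msg.get_payload()))
```

Here the user's preference (UTF-8) fails on the stray byte, so the sketch falls through to Latin-1 rather than aborting the fetch. Whether a silent fallback is the right policy depends on the client.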

The real issue, of course, is that email in general is inherently complicated by the presence
of arbitrary text encodings. Besides full mail text, we also must consider Unicode
encoding issues for the text components of a message once it’s parsed: both its text
parts and its message headers. To see why, let’s move on.

Related Issue for CGI scripts: I should also note that the full text decoding issue may not be as large a factor for email as it is for some other clients of the email package. Because the original email standards call for ASCII text and require binary data to be MIME encoded, most emails are likely to decode properly according to a 7- or 8-bit encoding such as Latin-1. As we’ll see in Chapter 15, though, a related and far less tractable issue looms for server-side scripts that support CGI file uploads on the Web. Python’s cgi module also uses the email package to parse multipart form data; this package requires data to be decoded to str for parsing; and such data might mix text and binary data (including raw binary data that is not MIME-encoded, text of any encoding, and even arbitrary combinations of these). As a result, these uploads fail in Python 3.1 if any binary or incompatible text files are included. The cgi module triggers Unicode decoding or type errors internally, before the Python script has a chance to intervene. CGI uploads worked in Python 2.X, because the str type represented both possibly encoded text and binary data. Saving this type’s content to a binary mode file as a string of bytes in 2.X sufficed for both arbitrary text and binary data such as images. Email parsing worked in 2.X for the same reason. For better or worse, the 3.X str/bytes dichotomy makes this generality impossible. In other words, although we can generally work around the email parser’s str requirement for fetched emails by decoding per an 8-bit encoding, the same requirement is much more harmful for web scripting today. Watch for more details on this in Chapter 15, and stay tuned for a future fix, which may have materialized by the time you read these words.
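
To illustrate why an 8-bit encoding such as Latin-1 serves as a workaround for the parser's str requirement, the short sketch below shows that Latin-1 decoding accepts every possible byte value and round-trips without loss, whereas a stricter encoding such as UTF-8 rejects arbitrary binary data. The variable names are illustrative only.

```python
# Every byte value 0..255 corresponds to exactly one Latin-1 character,
# so decoding arbitrary bytes this way never raises an error.
data = bytes(range(256))
as_text = data.decode('latin-1')
roundtrip = as_text.encode('latin-1')
print(roundtrip == data)

# UTF-8 is stricter: raw binary data is often not a valid byte sequence.
try:
    b'\xff\xfe raw binary'.decode('utf-8')
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False
print(utf8_ok)
```

Note that a successful Latin-1 decode only guarantees that no bytes are lost; if the text was really in another encoding, the resulting characters may still be wrong for display.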

Text payload encodings: Handling mixed type results

Our next email Unicode issue seems to fly in the face of Python’s generic programming
model: the data types of message payload objects may differ, depending on how they
are fetched. Especially for programs that walk and process payloads of mail parts
generically, this complicates code.
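
As a rough illustration of the mixed-type issue, the sketch below uses the standard library's `email` package in Python 3.X: fetching a text part's payload without decoding yields a `str`, while passing `decode=True` yields `bytes`. The helper `payload_text` and its `fallback` parameter are hypothetical names, shown as one possible way for generic code to normalize the result to `str`.

```python
from email.mime.text import MIMEText

# Build a simple text part with a non-ASCII character
part = MIMEText('caf\xe9', _charset='latin-1')

as_stored = part.get_payload()               # str: still in transfer-encoded form
as_decoded = part.get_payload(decode=True)   # bytes: transfer encoding undone
print(type(as_stored).__name__, type(as_decoded).__name__)

def payload_text(part, fallback='latin-1'):
    """Hypothetical helper: always return a text part's payload as str."""
    data = part.get_payload(decode=True)
    if data is None:                         # multipart containers have no payload
        return None
    charset = part.get_content_charset() or fallback
    try:
        return data.decode(charset)
    except (UnicodeError, LookupError):
        # Unknown or mismatched charset: fall back to an 8-bit decoding
        return data.decode(fallback)

print(payload_text(part))
```

Code that walks message parts generically has to decide on one of these forms up front; converting to `bytes` first and then decoding per the part's declared charset is one assumption-laden option among several.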


