Programming Python, 4th Edition, by Mark Lutz (text edition)


MailParser Class


Example 13-25 implements the last major class in the mailtools package. Given the
(already decoded) text of an email message, its tools parse the mail’s content into a
message object, with headers and decoded parts. This module is largely just a wrapper
around the standard library’s email package, but it adds convenience tools: finding
the main text part of a message, generating filenames for message parts, saving attached
parts to files, decoding headers, splitting address lists, and so on. See the code for more
information. Also notice the parts walker here: by coding its search logic in one place
as a generator function, we guarantee that all three of its clients here, as well as any
others elsewhere, implement the same traversal.


Unicode decoding for text part payloads and message headers


This module also provides support for decoding message headers per email standards
(both full headers and names in address headers), and decodes text parts per their
encodings. Headers are decoded according to their content, using tools in the email
package; the headers themselves give their MIME and Unicode encodings, so no user
intervention is required. For client convenience, we also perform Unicode decoding for
main text parts to convert them from bytes to str here if needed.
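
Header decoding of both kinds can be approximated with the email package's own tools. The sketch below assumes RFC 2047 encoded words in the raw headers; decode_full_header and decode_addr_names are hypothetical helper names, while decode_header, make_header, getaddresses, and formataddr are standard library functions. For address headers, the list is split first and only the name part of each address is decoded.

```python
from email.header import Header, decode_header, make_header
from email.utils import formataddr, getaddresses

# Hypothetical helper names; both rely only on standard email package tools.
def decode_full_header(raw):
    """Decode RFC 2047 encoded words in a header; plain ASCII passes through."""
    return str(make_header(decode_header(raw)))

def decode_addr_names(raw):
    """Split an address header, then decode just the name part of each entry."""
    return [(decode_full_header(name), addr)
            for (name, addr) in getaddresses([raw])]

# Usage: build encoded headers the way a sending client might, then decode them
subject = Header('Grüße aus Köln', 'utf-8').encode()   # an '=?utf-8?b?...?=' word
print(decode_full_header(subject))                     # Grüße aus Köln

to = formataddr(('Jörg Müller', 'jorg@example.com')) + ', bob@example.com'
print(decode_addr_names(to))
# [('Jörg Müller', 'jorg@example.com'), ('', 'bob@example.com')]
```

Splitting before decoding matters: a decoded name could contain a comma, which would confuse an address-list split done afterward.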


The latter main-text decoding merits elaboration. As discussed earlier in this chapter,
Message objects (main or attached) may return their payloads as bytes if we fetch with
a decode=1 argument, or if they are bytes to begin with; in other cases, payloads may
be returned as str. We generally need to decode bytes in order to treat payloads as text.
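
The difference between the two fetch modes is easy to see on a single quoted-printable part. In this minimal sketch the message text is made up for illustration: without decode, the payload comes back as str in its transfer-encoded form, while decode=1 undoes the transfer encoding and yields bytes, which must then be decoded to str with a Unicode encoding name such as the one in the part's header.

```python
import email

# A minimal sketch: one quoted-printable text part, parsed from a str.
raw = ('Content-Type: text/plain; charset="utf-8"\n'
       'Content-Transfer-Encoding: quoted-printable\n'
       '\n'
       'Gr=C3=BC=C3=9Fe\n')
msg = email.message_from_string(raw)

as_str = msg.get_payload()            # str, still in its transfer-encoded form
as_bytes = msg.get_payload(decode=1)  # bytes, with quoted-printable undone
text = as_bytes.decode(msg.get_content_charset())  # str, via the header's charset

print(type(as_str).__name__, type(as_bytes).__name__)   # str bytes
print(text)                                             # Grüße
```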


In mailtools itself, str text part payloads are automatically encoded to bytes by
decode=1 and then saved to binary-mode files to finesse encoding issues, but main-text
payloads are decoded to str if they are bytes. This main-text decoding is performed
per the encoding name in the part’s message header (if present and correct), the platform
default, or a guess. As we learned in Chapter 9, while GUIs may allow bytes for
display, str text generally provides broader Unicode support; furthermore, str is
sometimes needed for later processing such as line wrapping and webpage generation.
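
The three-step fallback for main text (header encoding, platform default, guess) might look like the following. This is a sketch under assumptions, not mailtools’ code: decode_main_text is a hypothetical name, locale.getpreferredencoding(False) stands in for "the platform default," and the guess list of UTF-8 then Latin-1 is an arbitrary choice.

```python
import email
import locale

# Hypothetical helper; the fallback order follows the prose above:
# declared charset, then platform default, then a list of guesses.
def decode_main_text(part, guesses=('utf-8', 'latin-1')):
    payload = part.get_payload(decode=1)      # bytes for a non-multipart part
    if payload is None:                       # multipart containers: nothing to decode
        return None
    candidates = []
    declared = part.get_content_charset()     # lowercased charset parameter, or None
    if declared:
        candidates.append(declared)
    candidates.append(locale.getpreferredencoding(False))  # platform default
    candidates.extend(guesses)
    for enc in candidates:
        try:
            return payload.decode(enc)
        except (LookupError, UnicodeDecodeError):  # unknown codec name or bad bytes
            continue
    return payload.decode('utf-8', errors='replace')  # last resort: never fails

good = email.message_from_string(
    'Content-Type: text/plain; charset="utf-8"\n'
    'Content-Transfer-Encoding: quoted-printable\n\n'
    'Gr=C3=BC=C3=9Fe\n')
bogus = email.message_from_string(
    'Content-Type: text/plain; charset="x-no-such-codec"\n\nHello\n')
print(repr(decode_main_text(good)))    # 'Grüße\n'
print(repr(decode_main_text(bogus)))   # 'Hello\n', via the platform default
```

Because Latin-1 maps every byte to a character, it never fails, so including it among the guesses trades possible mojibake for a guaranteed str result.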


Since this package can’t predict the role of part payloads other than the main text,
clients are responsible for decoding and encoding as necessary. For instance, other text
parts, which are saved in binary mode here, may require that message headers be
consulted later to extract Unicode encoding names for better display. Chapter 14’s
PyMailGUI will proceed this way to open text parts on demand, passing message header
encoding information on to PyEdit for decoding as text is loaded.
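
A client following this save-now, decode-later pattern might look roughly like this. It is only a stand-in for what PyMailGUI and PyEdit do: save_part and load_text are hypothetical names, and falling back to UTF-8 when the header declares no charset is an assumption. The part’s bytes go to a binary-mode file unchanged, and the charset from its header is kept so that the text can be decoded correctly when it is opened.

```python
import email
import os
import tempfile

# Hypothetical helpers: save raw bytes now, decode later with the header's charset.
def save_part(part, dirname, filename):
    """Write a part's decoded payload in binary mode; return (path, charset)."""
    path = os.path.join(dirname, filename)
    with open(path, 'wb') as f:
        f.write(part.get_payload(decode=1))
    return path, part.get_content_charset()   # charset may be None if undeclared

def load_text(path, charset, default='utf-8'):
    """Reopen a saved text part, decoding with the recorded encoding name."""
    with open(path, 'r', encoding=charset or default, newline='') as f:
        return f.read()

part = email.message_from_string(
    'Content-Type: text/plain; charset="iso-8859-1"\n'
    'Content-Transfer-Encoding: quoted-printable\n\n'
    'Caf=E9\n')
with tempfile.TemporaryDirectory() as tmp:
    path, charset = save_part(part, tmp, 'note.txt')
    print(charset)                          # iso-8859-1
    print(repr(load_text(path, charset)))   # 'Café\n'
```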


Some of the to-text conversions performed here are potentially partial solutions (some
parts may lack the required headers and fail per the platform defaults) and may need
to be improved; since this seems likely to be addressed in a future release of Python’s
email package, we’ll settle for our assumptions here.


976 | Chapter 13: Client-Side Scripting
