Example 13-25. PP4E\Internet\Email\mailtools\mailParser.py
"""
###############################################################################
parsing and attachment extract, analyse, save (see init for docs, test)
###############################################################################
"""
import os, mimetypes, sys # mime: map type to name
import email.parser # parse text to Message object
import email.header # 4E: headers decode/encode
import email.utils # 4E: addr header parse/decode
from email.message import Message # Message may be traversed
from .mailTool import MailTool # 4E: package-relative
class MailParser(MailTool):
    """
    methods for parsing message text, attachments
    subtle thing: Message object payloads are either a simple
    string for non-multipart messages, or a list of Message
    objects if multipart (possibly nested); we don't need to
    distinguish between the two cases here, because the Message
    walk generator always returns self first, and so works fine
    on non-multipart messages too (a single object is walked);
    for simple messages, the message body is always considered
    here to be the sole part of the mail; for multipart messages,
    the parts list includes the main message text, as well as all
    attachments; this allows simple messages not of type text to
    be handled like attachments in a UI (e.g., saved, opened);
    Message payload may also be None for some oddball part types;
    4E note: in Py 3.1, text part payloads are returned as bytes
    for decode=1, and might be str otherwise; in mailtools, text
    is stored as bytes for file saves, but main-text bytes payloads
    are decoded to Unicode str per mail header info or platform
    default+guess; clients may need to convert other payloads:
    PyMailGUI uses headers to decode parts saved to binary files;
    4E supports fetched message header auto-decoding per its own
    content, both for general headers such as Subject, as well as
    for names in address headers such as From and To; client must
    request this after parse, before display: parser doesn't decode;
    """
    def walkNamedParts(self, message):
        """
        generator to avoid repeating part naming logic;
        skips multipart headers, makes part filenames;
        message is already parsed email.message.Message object;
        doesn't skip oddball types: payload may be None, must
        handle in part saves; some others may warrant skips too;
        """
        for (ix, part) in enumerate(message.walk()):    # walk includes message
            fulltype = part.get_content_type()          # ix includes parts skipped