Programming Python, 4th Edition, Mark Lutz


support a wide variety of text types, we need to use the character-set information saved
by the parser and attached to the Message object. This is especially important if we need
to save the data to a file: we either have to store it as bytes in binary-mode files, or specify
the correct (or at least a compatible) Unicode encoding in order to write such strings to
text-mode files. Decoding manually works the same way:


>>> q.get_payload(decode=1).decode()
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 1-2: unexpected

>>> q.get_content_charset()
'latin1'
>>> q.get_payload(decode=1).decode('latin1') # known type
'AäB'
>>> q.get_payload(decode=1).decode(q.get_content_charset()) # allow any type
'AäB'
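
The two file-saving options mentioned above can be sketched as follows. This is a minimal illustration, not code from the book's clients: the message text is built by hand to resemble the q object in the session, and the file names and temporary directory are assumptions made for the example.

```python
import base64
import email
import os
import tempfile

# Build a small message similar to q above (an assumption for illustration):
# a Latin-1 text part whose body is base64-encoded.
body = base64.b64encode('A\xe4B'.encode('latin1')).decode('ascii')   # 'A\xe4B' is 'AäB'
raw = ('Content-Type: text/plain; charset="latin1"\n'
       'MIME-Version: 1.0\n'
       'Content-Transfer-Encoding: base64\n'
       '\n' + body + '\n')
q = email.message_from_string(raw)

data = q.get_payload(decode=True)      # bytes, after undoing base64
charset = q.get_content_charset()      # 'latin1', from the Content-Type header

tmpdir = tempfile.mkdtemp()            # hypothetical save location
binpath = os.path.join(tmpdir, 'part.bin')
txtpath = os.path.join(tmpdir, 'part.txt')

# Option 1: store the raw bytes in a binary-mode file; no encoding is needed.
with open(binpath, 'wb') as f:
    f.write(data)

# Option 2: decode to str, then write to a text-mode file with a matching encoding.
with open(txtpath, 'w', encoding=charset) as f:
    f.write(data.decode(charset))
```

In this case both files end up holding the same bytes, because the text-mode file re-encodes the string with the same character set that was used to decode it.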

In fact, all the header details are available on Message objects, if we know where to look.
The character set can also be absent entirely, in which case it’s returned as None; clients
need to define policies for such ambiguous text (they might try common types, guess,
or treat the data as a raw byte string):


>>> q['content-type'] # mapping interface
'text/plain; charset="latin1"'
>>> q.items()
[('Content-Type', 'text/plain; charset="latin1"'), ('MIME-Version', '1.0'),
('Content-Transfer-Encoding', 'base64')]

>>> q.get_params(header='Content-Type') # param interface
[('text/plain', ''), ('charset', 'latin1')]
>>> q.get_param('charset', header='Content-Type')
'latin1'

>>> charset = q.get_content_charset() # might be missing
>>> if charset:
...     print(q.get_payload(decode=1).decode(charset))
...
AäB
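
One possible policy for text whose character set is missing or unusable might look like the following sketch. The helper name decode_text_part and its fallbacks argument are hypothetical, and the list of encodings to try is an assumption; a real client would choose its own.

```python
import email

def decode_text_part(part, fallbacks=('utf-8', 'latin-1')):
    """
    Hypothetical helper: decode a text part's payload to str.
    Tries the declared charset first, then each fallback in turn;
    returns the raw bytes if nothing works, and None for multiparts.
    """
    data = part.get_payload(decode=True)     # bytes, or None for a multipart
    if data is None:
        return None
    charset = part.get_content_charset()     # None if the header gives no charset
    candidates = ([charset] if charset else []) + list(fallbacks)
    for name in candidates:
        try:
            return data.decode(name)
        except (UnicodeDecodeError, LookupError):
            continue                          # bad bytes or unknown encoding name
    return data                               # last resort: treat as a raw byte string

# A part with no charset parameter at all; 'QeRC' is base64 for b'A\xe4B'.
nocharset = email.message_from_string(
    'Content-Type: text/plain\n'
    'Content-Transfer-Encoding: base64\n'
    '\n'
    'QeRC\n')

guessed = decode_text_part(nocharset)                      # utf-8 fails, latin-1 works
rawdata = decode_text_part(nocharset, fallbacks=('utf-8',))  # nothing works: bytes
```

Because latin-1 can decode any byte sequence, placing it last in the fallbacks means the raw-bytes case arises only when the caller leaves it out. The decoded text may still be wrong if the guess is wrong, which is why this remains a policy decision.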

This handles encodings for message text parts in parsed emails. For composing new
emails, we still must apply session-wide user settings or allow the user to specify an
encoding for each part interactively. In some of this book’s email clients, payload
conversions are performed as needed, using encoding information taken from message
headers after parsing and provided by users during mail composition.
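
On the composition side, a user-selected encoding might be applied as in the sketch below. The compose_text function and its user_charset default are hypothetical stand-ins for a session-wide setting or an interactive prompt; MIMEText and its _charset argument come from the standard email package.

```python
import email
from email.mime.text import MIMEText

def compose_text(text, user_charset='utf-8'):
    """
    Hypothetical helper: build a text part using an encoding chosen by the
    user (a session-wide setting, or asked for interactively per part).
    """
    return MIMEText(text, _charset=user_charset)

msg = compose_text('A\xe4B', 'latin1')       # 'A\xe4B' is 'AäB'
wire = msg.as_string()                       # full text of the message, as sent

# Parse the generated text again and decode with the charset in its headers.
parsed = email.message_from_string(wire)
roundtrip = parsed.get_payload(decode=True).decode(parsed.get_content_charset())
```

The round trip recovers the original string because the chosen encoding is recorded in the Content-Type header, where the receiving side can find it with get_content_charset.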


Message header encodings: email package support


On a related note, the email package also provides support for encoding and decoding
message headers themselves (e.g., From, Subject) per email standards when they are
not simple text. Such headers are often called Internationalized (or i18n) headers,
because they support inclusion of non-ASCII character set text in emails. This term is also
sometimes used to refer to encoded text of message payloads; unlike message headers,

