[Python编程(第4版)].(Programming.Python.4th.Edition).Mark.Lutz.文字版


Python 3.X’s sharp string types differentiation. For example, the first decode in the
following refers to MIME, and the second to Unicode:


m.get_payload(decode=True).decode() # to bytes via MIME, then to str via Unicode

Even without the MIME decode argument, the payload type may also differ if it is stored
in different forms:


>>> m = Message(); m.set_payload('spam'); m.get_payload() # fetched as stored
'spam'
>>> m = Message(); m.set_payload(b'spam'); m.get_payload()
b'spam'

Moreover, the same holds true for the text-specific MIME subclass (though as we’ll see
later in this section, we cannot pass a bytes object to its constructor to force a binary payload):


>>> from email.mime.text import MIMEText
>>> m = MIMEText('Line...?')
>>> m['From'] = 'Lancelot'
>>> m['From']
'Lancelot'
>>> m.get_payload()
'Line...?'
>>> m.get_payload(decode=1)
b'Line...?'
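To make the two decoding steps easier to tell apart, here is a minimal sketch that builds a text part from non-ASCII text and then undoes each layer in turn. It assumes a recent Python 3 release, where MIMEText accepts non-ASCII str text together with a charset name. The sample text and the variable names are ours, not part of the package, and the exact transfer encoding chosen (typically Base64 for UTF-8) is an implementation detail that may differ between releases.

```python
from email.mime.text import MIMEText

# Assumption: recent Python 3; sample text is illustrative only
m = MIMEText('r\u00e9sum\u00e9', _charset='utf-8')   # 'résumé'

cte = m['Content-Transfer-Encoding']   # MIME transfer encoding picked by the package
raw = m.get_payload()                  # str: payload still in its MIME-encoded form
data = m.get_payload(decode=True)      # bytes: MIME transfer encoding undone
text = data.decode('utf-8')            # str: Unicode decoding applied to the bytes

print(cte)
print(repr(raw))
print(data)
print(text)
```

The first call hands back the stored, still-encoded string; only the combination of decode=True and a Unicode decode recovers the original text.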

Unfortunately, the fact that payloads might be either str or bytes today not only flies
in the face of Python’s type-neutral mindset, it can complicate your code—scripts may
need to convert in contexts that require one or the other type. For instance, GUI libraries
might allow both, but file saves and web page content generation may be less flexible.
In our example programs, we’ll process payloads as bytes whenever possible, but decode
to str text in cases where required, using the encoding information available in
the header API described in the next section.
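One way to contain this str/bytes split is to funnel payloads through a pair of small conversion functions at the points where a specific type is required. The sketch below is only an illustration: to_bytes and to_text are hypothetical helper names, not part of the email package, the default encoding is an assumption, and the helpers expect a simple (non-multipart) payload.

```python
def to_bytes(payload, encoding='utf-8'):
    """Return payload as bytes, encoding it first if it is str (hypothetical helper)."""
    if isinstance(payload, str):
        return payload.encode(encoding)
    return bytes(payload)


def to_text(payload, encoding='utf-8'):
    """Return payload as str, decoding it first if it is bytes (hypothetical helper)."""
    if isinstance(payload, (bytes, bytearray)):
        return payload.decode(encoding)
    return payload


# Either form of the same payload normalizes to the same result
print(to_bytes('spam'), to_bytes(b'spam'))
print(to_text('spam'), to_text(b'spam'))
```

A binary-mode file save would call to_bytes, while code that generates web page text would call to_text with whatever encoding name the message supplies.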


Text payload encodings: Using header information to decode


More profoundly, text in email can be even richer than implied so far—in principle,
text payloads of a single message may be encoded in a variety of different Unicode
schemes (e.g., three HTML web page file attachments, all in different Unicode encodings,
and possibly different from the full message text’s encoding). Although treating
such text as binary byte strings can sometimes finesse encoding issues, saving such parts
in text-mode files for opening must respect the original encoding types. Further, any
text processing performed on such parts will be similarly type-specific.
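The stakes are easy to demonstrate with plain strings, independent of the email package. In the following sketch (the sample text, encoding names, and file name are arbitrary choices for illustration), the same text yields different bytes under two encodings, decoding with the wrong one either garbles the text or fails, and a text-mode save reproduces the original bytes only when it is given the part's own encoding.

```python
import os
import tempfile

text = 'caf\u00e9'                        # 'café': one non-ASCII character

utf8_bytes = text.encode('utf-8')         # two bytes for the accented letter
latin1_bytes = text.encode('latin-1')     # one byte for the accented letter

# Decoding under the wrong scheme silently garbles the text...
garbled = utf8_bytes.decode('latin-1')

# ...or fails outright, depending on the bytes and the scheme
try:
    latin1_bytes.decode('utf-8')
    wrong_decode_failed = False
except UnicodeDecodeError:
    wrong_decode_failed = True

# Text-mode save: name the part's encoding so the bytes on disk match the original
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'part.txt')
    with open(path, 'w', encoding='latin-1') as f:
        f.write(text)
    with open(path, 'rb') as f:
        saved = f.read()

print(utf8_bytes, latin1_bytes, repr(garbled), wrong_decode_failed, saved)
```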


Luckily, the email package both adds character-set headers when generating message
text and retains character-set information for parts if it is present when parsing message
text. For instance, adding non-ASCII text attachments simply requires passing in an
encoding name—the appropriate message headers are added automatically on text
generation, and the character set is available directly via the get_content_charset
method:

