though, message payload encoding is used for both international Unicode text and truly
binary data such as images (as we’ll see in the next section).
Like mail payload parts, i18n headers are encoded specially for email, and may also be
encoded per Unicode. For instance, here’s how to decode an encoded subject line from
an arguably spammish email that just showed up in my inbox. Its =?UTF-8?Q? preamble
declares that the data following it is UTF-8 encoded Unicode text, which is also
MIME-encoded per quoted-printable for transmission in email. In short, unlike the prior
section’s part payloads, which declare their encodings in separate header lines, headers
themselves may declare their Unicode and MIME encodings by embedding them in
their own content this way:
>>> rawheader = '=?UTF-8?Q?Introducing=20Top=20Values=3A=20A=20Special=20Selection=20of=20Great=20Money=20Savers?='
>>> from email.header import decode_header # decode per email+MIME
>>> decode_header(rawheader)
[(b'Introducing Top Values: A Special Selection of Great Money Savers', 'utf-8')]
>>> bin, enc = decode_header(rawheader)[0] # and decode per Unicode
>>> bin, enc
(b'Introducing Top Values: A Special Selection of Great Money Savers', 'utf-8')
>>> bin.decode(enc)
'Introducing Top Values: A Special Selection of Great Money Savers'
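To make the embedded declarations concrete, the following is a minimal sketch of the same two-step decoding done by hand. The function name decode_encoded_word is hypothetical, and the code assumes its input is exactly one well-formed encoded word of the form =?charset?encoding?text?=; it is an illustration of the format, not how the email package is implemented.

```python
import base64
import quopri

def decode_encoded_word(word):
    """Hypothetical helper: decode a single =?charset?encoding?text?= word."""
    # Assumption: the input is exactly one well-formed encoded word
    if not (word.startswith('=?') and word.endswith('?=')):
        raise ValueError('not an encoded word: %r' % word)
    # Strip the =? and ?= delimiters, then split off the two declarations
    charset, encoding, payload = word[2:-2].split('?', 2)
    rawbytes = payload.encode('ascii')
    if encoding.upper() == 'Q':
        # Step 1 (MIME): quoted-printable, header flavor ("_" means space)
        data = quopri.decodestring(rawbytes, header=True)
    elif encoding.upper() == 'B':
        # Step 1 (MIME): base64
        data = base64.b64decode(rawbytes)
    else:
        raise ValueError('unknown MIME encoding: %r' % encoding)
    # Step 2 (Unicode): decode the bytes per the declared charset
    return data.decode(charset)

rawheader = ('=?UTF-8?Q?Introducing=20Top=20Values=3A=20A=20Special=20'
             'Selection=20of=20Great=20Money=20Savers?=')
print(decode_encoded_word(rawheader))
```

In practice, decode_header is the better choice, because it also copes with headers that mix encoded and unencoded text.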
Subtly, the email package can return multiple parts if there are encoded substrings in
the header, and each must be decoded individually and joined to produce decoded
header text. Even more subtly, in Python 3.1, this package returns all parts as bytes
when any substring (or the entire header) is encoded, but returns str for a fully
unencoded header. Unencoded substrings returned as bytes are encoded per
“raw-unicode-escape” in the package, an encoding scheme useful for converting str to
bytes when no encoding type applies:
>>> from email.header import decode_header
>>> S1 = 'Man where did you get that assistant?'
>>> S2 = '=?utf-8?q?Man_where_did_you_get_that_assistant=3F?='
>>> S3 = 'Man where did you get that =?UTF-8?Q?assistant=3F?='
>>> decode_header(S1)                  # str: don't decode()
[('Man where did you get that assistant?', None)]
>>> decode_header(S2)                  # bytes: do decode()
[(b'Man where did you get that assistant?', 'utf-8')]
>>> decode_header(S3)                  # bytes: do decode() using raw-unicode-escape applied in package
[(b'Man where did you get that', None), (b'assistant?', 'utf-8')]
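The three cases above can be folded into one routine. The following is a minimal sketch under stated assumptions: the name decode_header_text is hypothetical, the fallback to raw-unicode-escape for bytes parts without a charset follows the description above, and the final whitespace normalization is an assumption added so that the result does not depend on whether the package keeps the blanks around each part.

```python
from email.header import decode_header

def decode_header_text(rawheader):
    """Hypothetical helper: decode every part of a header and join the results."""
    decoded = []
    for text, charset in decode_header(rawheader):
        if isinstance(text, bytes):
            # Encoded parts name their charset; unencoded bytes parts carry
            # None, so fall back on raw-unicode-escape (assumption per the text)
            text = text.decode(charset or 'raw-unicode-escape')
        # str parts (fully unencoded headers) need no decoding at all
        decoded.append(text)
    # Assumption: collapse runs of whitespace after joining the parts
    return ' '.join(' '.join(decoded).split())

S1 = 'Man where did you get that assistant?'
S2 = '=?utf-8?q?Man_where_did_you_get_that_assistant=3F?='
S3 = 'Man where did you get that =?UTF-8?Q?assistant=3F?='

for header in (S1, S2, S3):
    print(decode_header_text(header))
```

All three headers should come out as the same str, regardless of how much of each was encoded.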
# join decoded parts if more than one