>>> parts = decode_header(S3)
>>> ' '.join(abytes.decode('raw-unicode-escape' if enc is None else enc)
...          for (abytes, enc) in parts)
'Man where did you get that assistant?'
We’ll use logic similar to the last step here in the mailtools package ahead, but also
retain str substrings intact without attempting to decode.
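As a minimal sketch of that idea (not the mailtools code itself), the hypothetical helper below decodes bytes parts and passes str parts through unchanged. The name decode_header_text is an assumption made for illustration, and joining parts on a space simply mirrors the session above; how whitespace around unencoded parts is reported has varied across Python releases, so spacing in the result may differ by version.

```python
from email.header import decode_header

def decode_header_text(raw):
    """Decode an RFC 2047 header value to str (hypothetical helper)."""
    pieces = []
    for part, enc in decode_header(raw):
        if isinstance(part, bytes):
            # no declared encoding: same fallback as in the session above
            pieces.append(part.decode('raw-unicode-escape' if enc is None else enc))
        else:
            # already a str: keep it intact, no decoding attempted
            pieces.append(part)
    return ' '.join(pieces)

print(decode_header_text('=?utf-8?q?caf=C3=A9?='))
print(decode_header_text('Plain subject line'))
```

A production version would also need to decide what to do when a declared encoding is unknown or the bytes do not match it; this sketch lets those exceptions propagate.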
Late-breaking news: As I write this in mid-2010, it seems possible that
this mixed-type, nonpolymorphic, and frankly, non-Pythonic API behavior
may be addressed in a future Python release. In response to a rant
posted on the Python developers list by a book author whose work you
might be familiar with, there is presently a vigorous discussion of the
topic there. Among other ideas is a proposal for a bytes-like type which
carries with it an explicit Unicode encoding; this may make it possible
to treat some text cases in a more generic fashion. While it’s impossible
to foresee the outcome of such proposals, it’s good to see that the issues
are being actively explored. Stay tuned to this book’s website for further
developments in the Python 3.X library API and Unicode stories.
Message address header encodings and parsing, and header creation
One wrinkle pertaining to the prior section: for message headers that contain email
addresses (e.g., From), the name component of the name/address pair might be encoded
this way as well. Because the email package’s header parser expects encoded substrings
to be followed by whitespace or the end of string, we cannot ask it to decode a complete
address-related header: an encoded name enclosed in quotes is followed by a quote
character instead, so decoding it fails.
To support such internationalized address headers, we must also parse out the name
part of the email address and then decode it. First of all, we need to extract the name and
address parts of an email address using email package tools:
>>> from email.utils import parseaddr, formataddr
>>> p = parseaddr('"Smith, Bob" <[email protected]>') # split into name/addr pair
>>> p # unencoded addr
('Smith, Bob', '[email protected]')
>>> formataddr(p)
'"Smith, Bob" <[email protected]>'
>>> parseaddr('Bob Smith <[email protected]>') # unquoted name part
('Bob Smith', '[email protected]')
>>> formataddr(parseaddr('Bob Smith <[email protected]>'))
'Bob Smith <[email protected]>'
>>> parseaddr('[email protected]') # simple, no name
('', '[email protected]')
>>> formataddr(parseaddr('[email protected]'))
'[email protected]'
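Putting the two steps together, the following is a rough sketch of parsing first and decoding second. The function name decode_address and the address bob@example.com are made up for illustration; only parseaddr and decode_header come from the email package. The sketch returns a (name, address) pair rather than calling formataddr, on the assumption that the decoded name is wanted for display.

```python
from email.header import decode_header
from email.utils import parseaddr

def decode_address(raw):
    """Split one address, then decode just its name part (hypothetical helper)."""
    name, addr = parseaddr(raw)            # parseaddr drops quotes around the name
    pieces = []
    for part, enc in decode_header(name):
        if isinstance(part, bytes):
            # undeclared encoding: same fallback used for other headers
            pieces.append(part.decode('raw-unicode-escape' if enc is None else enc))
        else:
            pieces.append(part)            # str parts are kept as they are
    return ''.join(pieces), addr

# an encoded, quoted name: "Smith, Bob" in UTF-8 quoted-printable form
print(decode_address('"=?UTF-8?Q?Smith=2C_Bob?=" <bob@example.com>'))
print(decode_address('Bob Smith <bob@example.com>'))
```

Because the quotes are gone by the time the name reaches decode_header, the encoded substring ends at the end of the string, which is what the header parser expects.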
Fields with multiple addresses (e.g., To) separate individual addresses by commas.
Since email names might embed commas, too, blindly splitting on commas to run each