though parsing won’t always work. Instead, another utility can be used to parse each
address individually: getaddresses ignores commas in names when splitting apart
separate addresses, and parseaddr does, too, because it simply returns the first pair in the
getaddresses result (some line breaks were added to the following for legibility):
>>> from email.utils import getaddresses
>>> multi = '"Smith, Bob" <[email protected]>, Bob Smith <[email protected]>, [email protected],
"Bob" <[email protected]>'
>>> getaddresses([multi])
[('Smith, Bob', '[email protected]'), ('Bob Smith', '[email protected]'), ('', '[email protected]'),
('Bob', '[email protected]')]
>>> [formataddr(pair) for pair in getaddresses([multi])]
['"Smith, Bob" <[email protected]>', 'Bob Smith <[email protected]>', '[email protected]',
'Bob <[email protected]>']
>>> ', '.join([formataddr(pair) for pair in getaddresses([multi])])
'"Smith, Bob" <[email protected]>, Bob Smith <[email protected]>, [email protected],
Bob <[email protected]>'
>>> getaddresses(['[email protected]']) # handles single address cases too
[('', '[email protected]')]
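To make the comma clash concrete, here is a minimal sketch that contrasts a naive split on commas with getaddresses. The addresses in it are hypothetical placeholders chosen for illustration, not the ones used in the session above:

```python
from email.utils import getaddresses, formataddr

# Hypothetical addresses, used for illustration only
header = '"Smith, Bob" <bob@example.com>, Bob Smith <bsmith@example.org>'

# Naive approach: splitting on commas also cuts the quoted name in two
naive = [part.strip() for part in header.split(',')]

# getaddresses treats the comma inside the quoted name as part of the name
pairs = getaddresses([header])

# Formatting each (name, address) pair and rejoining rebuilds the header
rejoined = ', '.join(formataddr(pair) for pair in pairs)

print(naive)
print(pairs)
print(rejoined)
```

The naive split yields three fragments for two addresses, while getaddresses yields one (name, address) pair per address.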
Now, decoding email addresses is really just an extra step before and after the normal
header decoding logic we saw earlier:
>>> rawfromheader = '"=?UTF-8?Q?Walmart?=" <[email protected]>'
>>> from email.utils import parseaddr, formataddr
>>> from email.header import decode_header
>>> name, addr = parseaddr(rawfromheader) # split into name/addr parts
>>> name, addr
('=?UTF-8?Q?Walmart?=', '[email protected]')
>>> abytes, aenc = decode_header(name)[0] # do email+MIME decoding
>>> abytes, aenc
(b'Walmart', 'utf-8')
>>> name = abytes.decode(aenc) # do Unicode decoding
>>> name
'Walmart'
>>> formataddr((name, addr)) # put parts back together
'Walmart <[email protected]>'
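The steps in this session can be bundled into a single function. The following is a sketch only: the name decode_address and the example addresses are hypothetical, and the handling of str versus bytes parts assumes Python 3 behavior, where decode_header returns text that contains no encoded words as a plain str with an encoding of None:

```python
from email.header import decode_header
from email.utils import parseaddr, formataddr

def decode_address(rawheader):
    """Decode the name part of a single-address header (hypothetical helper)."""
    name, addr = parseaddr(rawheader)          # split into name/addr parts
    parts = []
    for text, enc in decode_header(name):      # do email+MIME decoding
        if isinstance(text, bytes):            # encoded words come back as bytes
            text = text.decode(enc or 'ascii', errors='replace')
        parts.append(text)                     # unencoded text is already str
    return formataddr((''.join(parts), addr))  # put parts back together

# Hypothetical addresses, used for illustration only
print(decode_address('"=?UTF-8?Q?Walmart?=" <news@example.com>'))
print(decode_address('Bob Smith <bob@example.com>'))
```

One caveat: in Python 3.3 and later, formataddr accepts a charset argument and re-encodes names that contain non-ASCII characters, so code that formats decoded names for display may prefer to join the name and address itself.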
Although From headers will typically have just one address, to be fully robust we need
to apply this to every address in headers, such as To, Cc, and Bcc. Again, the multiaddress
getaddresses utility avoids comma clashes between names and address separators;
since it also handles the single address case, it suffices for From headers as well:
>>> rawfromheader = '"=?UTF-8?Q?Walmart?=" <[email protected]>'
>>> rawtoheader = rawfromheader + ', ' + rawfromheader
>>> rawtoheader