>>> s = b.decode('utf8')
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 1-2: invalid dat...
>>> s = b.decode()
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 1-2: invalid dat...
>>> s = b.decode('latin1')
>>> s
'AÄBäC'
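The session above continues from earlier context; as its latin1 decode reveals, the variable b must hold the Latin-1 encoded bytes b'A\xc4B\xe4C'. A self-contained recreation, under that assumption:

```python
# Assumed from context: b holds the Latin-1 encoding of 'AÄBäC'
b = b'A\xc4B\xe4C'

try:
    b.decode('utf8')           # \xc4 starts a UTF-8 sequence, but 'B' can't continue it
except UnicodeDecodeError as exc:
    print('utf8 failed:', exc)

print(b.decode('latin1'))      # 'AÄBäC': every byte is a valid Latin-1 character
```

The UTF-8 decode fails because 0xC4 announces a two-byte sequence whose next byte must be a continuation byte, and ASCII 'B' is not; Latin-1 succeeds because it maps every possible byte to a character.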
Once you’ve decoded to a Unicode string, you can “convert” it to a variety of different
encoding schemes. Really, this simply translates the string to alternative binary encoding
formats, from which you can decode again later; a decoded Unicode string has no
encoding type per se, but encoded binary data does:
>>> s.encode('latin-1')
b'A\xc4B\xe4C'
>>> s.encode('utf-8')
b'A\xc3\x84B\xc3\xa4C'
>>> s.encode('utf-16')
b'\xff\xfeA\x00\xc4\x00B\x00\xe4\x00C\x00'
>>> s.encode('ascii')
UnicodeEncodeError: 'ascii' codec can't encode character '\xc4' in position 1: o...
Notice the last test here: the string you encode must be representable in the scheme
you choose, or you’ll get an exception; here, ASCII is too narrow to represent characters
decoded from Latin-1 bytes. And even though you can convert a string to the bytes of
any compatible encoding, you must generally know which encoding was used in order
to decode the bytes back to a string:
>>> s.encode('utf-16').decode('utf-16')
'AÄBäC'
>>> s.encode('latin-1').decode('latin-1')
'AÄBäC'
>>> s.encode('latin-1').decode('utf-8')
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 1-2: invalid dat...
>>> s.encode('utf-8').decode('latin-1')
UnicodeEncodeError: 'charmap' codec can't encode character '\xc3' in position 2:...
Note the last test here again. Technically, encoding Unicode code points (characters)
to UTF-8 bytes and then decoding them again per the Latin-1 format does not raise an
error, but trying to print the result does (the UnicodeEncodeError above comes from
the interactive shell’s attempt to display the mismatched result): it’s scrambled garbage.
To maintain fidelity, you must generally know what format encoded bytes are in:
>>> s
'AÄBäC'
>>> x = s.encode('utf-8').decode('utf-8') # OK if encoding matches data
>>> x
'AÄBäC'
>>> x = s.encode('latin-1').decode('latin-1') # any compatible encoding works
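Because the “scrambled garbage” effect is subtle, a self-contained sketch may help (using the same sample string s as above): a mismatched decode succeeds silently but loses fidelity, and matching codecs undo the damage. The final lines also show the standard errors argument of encode, one common way to cope when a target scheme such as ASCII is too narrow:

```python
s = 'AÄBäC'                                    # same sample string as above

# Mismatched decode: raises no exception, but yields scrambled text
garbled = s.encode('utf-8').decode('latin-1')
print(garbled == s)                            # False: 'AÃ\x84BÃ¤C', not 'AÄBäC'

# Fidelity requires matching codecs on both ends
print(garbled.encode('latin-1').decode('utf-8') == s)   # True: undoes the mix-up

# When a scheme is too narrow, error handlers trade fidelity for safety
print(s.encode('ascii', errors='replace'))              # b'A?B?C'
print(s.encode('ascii', errors='xmlcharrefreplace'))    # b'A&#196;B&#228;C'
```

The replace handler substitutes ? for unencodable characters, while xmlcharrefreplace emits XML character references, which is handy when the output will be embedded in HTML.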