Chapter 5 ■ Network Data and Network Errors
JSON is among the best choices available today for sending data between different computer languages. Since
Python 2.6, it has been included in the Standard Library as a module named json. It offers a universal technique for
serializing simple data structures.
>>> import json
>>> json.dumps([51, 'Namárië!'])
'[51, "Nam\u00e1ri\u00eb!"]'
>>> json.dumps([51, 'Namárië!'], ensure_ascii=False)
'[51, "Namárië!"]'
>>> json.loads('{"name": "Lancelot", "quest": "Grail"}')
{'name': 'Lancelot', 'quest': 'Grail'}
Note from this example that JSON not only allows Unicode characters in its strings but can even include Unicode
characters literally inline in its payload if you tell the Python json module that it need not restrict its output to ASCII
characters. Also note that the JSON representation is defined as producing a string, which is why full strings and not
simply Python bytes objects are being used here as input and output from the json module. Per the JSON standard,
you will want to encode its strings as UTF-8 for transmission on the wire.
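As a minimal sketch of that last step, the following shows one way to turn JSON text into UTF-8 bytes for the wire and back again. The helper names to_wire and from_wire are hypothetical, introduced here only for illustration; they are not part of the json module.

```python
import json

def to_wire(obj):
    """Serialize obj to JSON text, then encode that text as UTF-8 bytes."""
    text = json.dumps(obj, ensure_ascii=False)
    return text.encode('utf-8')

def from_wire(data):
    """Decode UTF-8 bytes back into text, then parse the JSON."""
    return json.loads(data.decode('utf-8'))

payload = to_wire([51, 'Namárië!'])
print(payload)             # a bytes object, suitable for a socket send
print(from_wire(payload))  # the original list, recovered on the other side
```

Keeping the encode and decode steps explicit makes it clear which side of the boundary holds text and which holds bytes.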
The XML format is better for documents since its basic structure is to take strings and mark them up by wrapping
them in angle-bracketed elements. In Chapter 10, you will take an extensive look at the various options available
in Python for processing documents written in XML and related formats. For now, however, simply keep in mind
that you do not have to limit your use of XML to when you are actually using the HTTP protocol. There might be a
circumstance when you need markup in text and you find XML useful in conjunction with some other protocol.
Among the many other formats that developers might want to consider are binary formats like Thrift and Google
Protocol Buffers, which are a bit different from the formats just described because both the client and the server need
to have available a code definition of what each message will contain. However, these systems contain
provisions for different protocol versions so that new servers can be brought into production still talking to other
machines with an older protocol version until they can all be updated to the new one. They are efficient, and they pass
binary data with no problem.
Compression
Since the time necessary to transmit data over the network is often more significant than the time your CPU spends
preparing the data for transmission, it is often worthwhile to compress data before sending it. The popular HTTP
protocol, as you will see in Chapter 9, lets a client and server figure out whether they can both support compression.
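A rough sketch of the trade-off, using the Standard Library zlib module on a made-up, highly repetitive payload (the message below is an assumption chosen for illustration, not data from a real protocol):

```python
import zlib

# Hypothetical payload: repetitive text, which compresses well.
message = b'GET /index.html HTTP/1.1\r\n' * 100
compressed = zlib.compress(message)

# Compare the number of bytes that would cross the network.
print(len(message), len(compressed))

# Decompression recovers the original bytes exactly.
assert zlib.decompress(compressed) == message
```

Very short or already-compressed inputs may not shrink at all, since the zlib format adds a few bytes of header and checksum, so it is worth measuring on data that resembles your real traffic.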
An interesting fact about the zlib facility, which is available through the Python Standard Library and is
one of the most ubiquitous forms of compression on the Internet today, is that it is self-framing. If you start feeding
it a compressed stream of data, then it can tell you when the compressed data has ended and give you access to any
uncompressed payload that might follow.
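The self-framing behavior can be sketched with a zlib decompression object. The input below is a hypothetical stream made up for illustration: one compressed block followed directly by some uncompressed bytes.

```python
import zlib

# Hypothetical input: a compressed stream followed by uncompressed bytes.
stream = zlib.compress(b'Python') + b'trailing bytes'

d = zlib.decompressobj()
payload = d.decompress(stream)

print(payload)        # the decompressed content of the stream
print(d.eof)          # True once the end of the compressed stream is reached
print(d.unused_data)  # the bytes that followed the compressed stream
```

The decompression object stops at the end of the compressed stream on its own, without being told the stream's length in advance, and sets aside whatever came after it in its unused_data attribute.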
Most protocols choose to do their own framing and then, if desired, pass the resulting block to zlib for
decompression. However, you could conceivably promise yourself that you will always tack a bit of uncompressed
data onto the end of each zlib compressed string (here, I will use a single b'.' byte) and watch for your decompression
object to spit out that “extra data” as the signal that you are done.
Consider this combination of two compressed data streams:
>>> import zlib
>>> data = zlib.compress(b'Python') + b'.' + zlib.compress(b'zlib') + b'.'
>>> data
b'x\x9c\x0b\xa8,\xc9\xc8\xcf\x03\x00\x08\x97\x02\x83.x\x9c\xab\xca\xc9L\x02\x00\x04d\x01\xb2.'
>>> len(data)
28