Chapter 5
Network Data and Network Errors
The first four chapters of this book showed how hosts are named on an IP network and how to set up and tear down
both TCP streams and UDP datagram connections between hosts. But how should you prepare data for transmission?
How should it be encoded and formatted? And for what kinds of errors will Python programs need to be prepared?
These questions are relevant whether you are using streams or datagrams, and this chapter provides basic answers to all of them.
Bytes and Strings
Computer memory chips and network cards both support the byte as their common currency. This tiny 8-bit package
of information has become our global unit of information storage. There is a difference between memory chips and
network cards, however. Python is able to completely conceal from you the choices that it makes about how to represent
numbers, strings, lists, and dictionaries in memory as your program runs. Unless you use special debugging tools, you
cannot even see the bytes with which these data structures are stored, only how they behave from the outside.
Network communication is different because the socket interface exposes raw bytes, making them visible to the programs on both ends of the conversation. When doing network programming, you generally cannot avoid thinking about
how data will be represented on the wire, which raises questions that a high-level language like Python otherwise lets
you avoid.
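To see the distinction at the Python 3 prompt, compare the str and bytes types; the text and encoding below are arbitrary examples, but only a bytes object can actually be handed to a socket for transmission.

>>> text = 'Hello, network'          # str: a sequence of abstract characters
>>> data = text.encode('utf-8')      # bytes: what actually crosses the wire
>>> data
b'Hello, network'
>>> data.decode('utf-8')             # the receiver turns bytes back into text
'Hello, network'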
So now let’s consider the properties of bytes.
• A bit is the smallest unit of information. It is a digit that can be either zero or one. In
electronics, a bit is often implemented as a wire whose voltage is either hot or tied to ground.
• Eight bits together make a byte.
The bits need to be ordered so that you can tell which is which. When you write a binary number like 01100001,
you order the digits in the same direction as you do when writing base-ten numbers, with the most significant bit first
(just as in the decimal number 234, the 2 is the most significant and the 4 is the least significant, because the hundreds
place makes a bigger difference to the number’s magnitude than the tens or ones places).
One way to interpret a lone byte is as a number between 00000000 and 11111111. If you do the math, these are
the values 0 and 255 in decimal.
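Python’s binary notation makes it easy to confirm both the ordering and the range just described; here is a quick sketch using the 0b literal prefix and the bin() built-in.

>>> 0b01100001               # the example byte, most significant bit first
97
>>> 0b11111111               # all eight bits set gives the largest byte value
255
>>> bin(97)                  # back to binary notation (leading zeros dropped)
'0b1100001'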
You can also interpret the highest byte values in the 0 through 255 range as negative numbers, since you can
reach them by wrapping around backward from 0. A common choice is to interpret 10000000 through 11111111,
which would normally be 128 through 255, as -128 through -1 instead, because then the most significant bit tells
you whether the number is negative. (This is called two’s-complement arithmetic.) Or you can interpret a byte using a
variety of more complicated rules that either assign some symbol or meaning to the byte by means of a table
or build even larger numbers by putting the byte together with other bytes.
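The standard struct module, for example, can show both interpretations of the same byte; in the sketch below, the format codes 'B' and 'b' ask for an unsigned and a signed (two’s-complement) 8-bit integer, respectively.

>>> import struct
>>> struct.unpack('B', bytes([0b11111111]))   # unsigned interpretation
(255,)
>>> struct.unpack('b', bytes([0b11111111]))   # signed two's complement
(-1,)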
Network standards use the term octet for the 8-bit byte, since in the old days bytes came in a variety of
lengths on different computers.