Foundations of Python Network Programming


Chapter 5 ■ Network Data and Network Errors




In any case, the string '4253' is not how your computer represents this number as an integer variable in Python.
Instead, it will store it as a binary number, using the bits of several successive bytes to represent the ones place, twos
place, fours place, and so forth of a single large number. You can glimpse the way that the integer is stored by using the
hex() built-in function at the Python prompt.





>>> hex(4253)
'0x109d'





Each hex digit corresponds to four bits, so each pair of hex digits represents a byte of data. Instead of being stored
as four decimal digits (4, 2, 5, and 3), with the first 4 being the “most significant” digit (since tweaking its value would
throw the number off by a thousand) and 3 being its least significant digit, the number is stored as a most significant
byte 0x10 and a least significant byte 0x9d, adjacent to one another in memory.
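You can examine those two bytes directly with the int.to_bytes() method, asking for each byte order in turn; this is a quick sketch of what we just described:

```python
n = 4253

# Most significant byte first: 0x10, then 0x9d.
big = n.to_bytes(2, 'big')
print(big)       # b'\x10\x9d'

# Least significant byte first: 0x9d, then 0x10.
little = n.to_bytes(2, 'little')
print(little)    # b'\x9d\x10'
```

The companion classmethod int.from_bytes() reverses the operation, turning either byte string back into the integer 4253.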
But in which order should these two bytes appear? Here we reach a point of great difference among the
architectures of different brands of computer processors. While they will all agree that the bytes in memory have an
order and they will all store a string like Content-Length: 4253 in exactly that order starting with C and ending with 3,
they do not share a single idea about the order in which the bytes of a binary number should be stored.
We describe the difference this way: some computers are “big-endian” (for example, older SPARC processors)
and put the most significant byte first, just like we do when writing decimal digits; other computers (like the nearly
ubiquitous x86 architecture) are “little-endian” and put the least significant byte first (where “first” means “at the byte
with the lower memory address”).
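If you are curious which convention your own processor follows, Python will tell you through the sys module’s byteorder attribute:

```python
import sys

# Reports 'little' on the x86 family, 'big' on big-endian architectures.
print(sys.byteorder)
```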
For an entertaining historical perspective on this issue, be sure to read Danny Cohen’s paper IEN-137, “On Holy
Wars and a Plea for Peace,” which introduced the words big-endian and little-endian in a parody of Jonathan Swift:
http://www.ietf.org/rfc/ien/ien137.txt.
Python makes it easy to see the difference between the two endians. Simply use the struct module, which
provides a variety of operations for converting data to and from popular binary formats. Here is the number 4253
represented first in a little-endian format and then in a big-endian order:





>>> import struct
>>> struct.pack('<i', 4253)
b'\x9d\x10\x00\x00'
>>> struct.pack('>i', 4253)
b'\x00\x00\x10\x9d'





Here I used the struct formatting code 'i', which uses four bytes to store an integer, and this leaves the two upper
bytes zero for a small number like 4253. You can think of the struct endianness codes '<' and '>' for these two
orders as little arrows pointing toward the least significant end of a string of bytes, if that helps you to remember which
one to use. See the struct module documentation in the Standard Library for the full array of data formats that it
supports. It also supports an unpack() operation, which converts the binary data back to Python numbers.





>>> struct.unpack('>i', b'\x00\x00\x10\x9d')
(4253,)





If the big-endian format makes more sense to you intuitively, then you may be pleased to learn that it “won” the
contest of which endianness would become the standard for network data. Therefore, the struct module provides
another symbol, '!', which means the same thing as '>' in pack() and unpack() but says to other programmers
(and, of course, to yourself as you read the code later), “I am packing this data so that I can send it over the network.”
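As a quick check, a round trip through '!' produces exactly the same bytes as '>' and restores the original integer:

```python
import struct

# Pack 4253 in network byte order; '!' is equivalent to big-endian '>'.
wire = struct.pack('!i', 4253)
print(wire)                             # b'\x00\x00\x10\x9d'
print(wire == struct.pack('>i', 4253))  # True

# Unpacking recovers the original value (unpack always returns a tuple).
(value,) = struct.unpack('!i', wire)
print(value)                            # 4253
```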
