Foundations of Python Network Programming


Chapter 5 ■ Network Data and Network Errors




In Python, you will normally represent bytes in one of two ways: either as an integer whose value happens to
be between 0 and 255 or as a length-1 byte string where the byte is the single value that it contains. You can type a
byte-valued number using any of the typical bases supported in Python source code—binary, octal, decimal, and
hexadecimal.





>>> 0b1100010
98
>>> 0b1100010 == 0o142 == 98 == 0x62
True
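
The same equivalence can be checked in the other direction. The following is a minimal sketch, not part of the original listing, that uses the built-in functions bin(), oct(), hex(), and int() to render a byte value in each base and parse it back. The variable names are illustrative only.

```python
# Render one byte value in each non-decimal base that Python source accepts.
value = 98

as_binary = bin(value)  # '0b1100010'
as_octal = oct(value)   # '0o142'
as_hex = hex(value)     # '0x62'

# int() with an explicit base parses each rendering back to the same integer.
round_trip = [int(as_binary, 2), int(as_octal, 8), int(as_hex, 16)]

print(as_binary, as_octal, as_hex, round_trip)
```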





You can convert a list of such numbers to a byte string by passing the list (or any other sequence) to the bytes() type,
and you can convert back by iterating across the byte string.





>>> b = bytes([0, 1, 98, 99, 100])
>>> len(b)
5
>>> type(b)
<class 'bytes'>
>>> list(b)
[0, 1, 98, 99, 100]
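
The two representations of a single byte described earlier, an integer and a length-1 byte string, both fall out of ordinary sequence operations on a bytes object. The sketch below is an illustration with made-up variable names: indexing yields an integer, slicing yields a byte string, and bytes() refuses any number outside the range 0 through 255.

```python
data = bytes([0, 1, 98, 99, 100])

# Indexing a byte string returns the byte as a plain integer.
index_result = data[2]

# Slicing returns a byte string, here one of length 1.
slice_result = data[2:3]

# A number that does not fit in 8 bits cannot become a byte.
out_of_range_rejected = False
try:
    bytes([256])
except ValueError:
    out_of_range_rejected = True

print(index_result, slice_result, out_of_range_rejected)
```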





What can be a bit confusing is that the repr() of a byte string object uses ASCII characters as a shorthand for
the array elements whose byte values happen to correspond to printable character codes, and it uses the explicit
hexadecimal format \xNN only for bytes that do not correspond to a printable ASCII character.





>>> b
b'\x00\x01bcd'





Do not be fooled, however: byte strings are in no way inherently ASCII in their semantics, and they are intended
to represent mere sequences of 8-bit bytes.
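
To make that concrete, here is a small sketch (the payload is an arbitrary example, not data from the text) of a byte string that mixes values that happen to be printable character codes with values above 127, which no ASCII character uses. Its repr() shows letters for the former and \xNN escapes for the latter, yet both are simply integers in the sequence.

```python
# Two bytes that happen to match ASCII letters, then two that match nothing in ASCII.
payload = bytes([0x62, 0x63, 0xff, 0x80])

# The repr mixes the letter shorthand with explicit escapes: b'bc\xff\x80'
payload_repr = repr(payload)

# Underneath, every element is just an integer from 0 to 255.
values = list(payload)
values_above_ascii = [v for v in values if v > 127]

print(payload_repr, values, values_above_ascii)
```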


Character Strings

If you really do want to transmit a string of symbols over a socket, you need an encoding that assigns each symbol to a
valid byte value. The most popular such encoding is ASCII, which stands for American Standard Code for Information
Interchange, and it defines character codes 0 through 127, which can fit into 7 bits. Therefore, when ASCII is stored in
bytes, the most significant bit is always zero. Codes 0 through 31 represent control commands for an output display, not
actual glyphs such as letters, numbers, and punctuation, so they cannot be displayed in a quick chart like the one that
follows. The three subsequent 32-character tiers of ASCII characters that do represent glyphs are, as you can see, a first
tier of punctuation and digits, then a tier that includes the uppercase letters, and finally a tier of the lowercase letters
(whose last code, 127, is the nonprinting DEL control character):





>>> for i in range(32, 128, 32):
...     print(' '.join(chr(j) for j in range(i, i+32)))
...
  ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = > ?
@ A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ] ^ _
` a b c d e f g h i j k l m n o p q r s t u v w x y z { | } ~
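
As a rough illustration of how such an encoding is applied before transmission, the sketch below uses the standard str.encode() and bytes.decode() methods with the 'ascii' codec. The sample strings and variable names are assumptions chosen for the example. It also checks the claim that the most significant bit of every ASCII byte is zero, and shows that a symbol outside the 0 through 127 range cannot be encoded as ASCII.

```python
text = 'Hello, world'

# Encoding maps each symbol to its ASCII code and packs the codes into bytes.
encoded = text.encode('ascii')

# Every ASCII code fits in 7 bits, so the top bit (0x80) of each byte is clear.
high_bit_clear = all((byte & 0x80) == 0 for byte in encoded)

# Decoding reverses the mapping.
decoded = encoded.decode('ascii')

# A character with no ASCII code, such as e-acute, is rejected by the codec.
non_ascii_rejected = False
try:
    'caf\u00e9'.encode('ascii')
except UnicodeEncodeError:
    non_ascii_rejected = True

print(encoded, high_bit_clear, decoded, non_ascii_rejected)
```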



