Programming Python, 4th Edition, by Mark Lutz (text edition)

at the contents of binary data in a structured way, though, as well as to construct its
contents, the standard library struct module is a more powerful alternative.


The struct module provides calls to pack and unpack binary data, as though the data
were laid out in a C-language struct declaration. It can also compose and decompose
data using any endianness you desire (endianness determines whether the most
significant byte of a multibyte number is stored first or last). Building a binary
datafile, for instance, is straightforward: pack Python values into a byte string and
write it to a file. The format string in the pack call here means big-endian (>), with
a 4-byte integer, a four-character byte string, a 2-byte short integer, and a 4-byte
floating-point number:


>>> import struct
>>> data = struct.pack('>i4shf', 2, b'spam', 3, 1.234)    # 4s takes a bytes object
>>> data
b'\x00\x00\x00\x02spam\x00\x03?\x9d\xf3\xb6'
>>> file = open('data.bin', 'wb')
>>> file.write(data)
14
>>> file.close()
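
As a minimal sketch of what the byte-order prefix changes, the following packs the same integer both ways and checks the total size implied by the format string used above. The variable names (big, little, record_size) are illustrative only and do not come from the text:

```python
import struct

# Pack the same 4-byte integer with explicit byte-order prefixes.
big = struct.pack('>i', 2)       # big-endian: most significant byte first
little = struct.pack('<i', 2)    # little-endian: least significant byte first

print(big)                       # b'\x00\x00\x00\x02'
print(little)                    # b'\x02\x00\x00\x00'

# With an explicit prefix, standard sizes are used and no padding is added,
# so the format from the text describes 4 + 4 + 2 + 4 bytes.
record_size = struct.calcsize('>i4shf')
print(record_size)               # 14
```

The size of 14 matches the byte count reported by file.write in the session above.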

Notice how the struct module returns a bytes string: we’re in the realm of binary data
here, not text, and must use binary mode files to store it. As usual, Python displays most
of the packed binary data’s bytes here with \xNN hexadecimal escape sequences, because
the bytes are not printable characters. To parse data like that which we just produced,
read it off the file and pass it to the struct module with the same format string; you
get back a tuple containing the values parsed out of the byte string and converted to
Python objects:


>>> import struct
>>> file = open('data.bin', 'rb')
>>> bytes = file.read()
>>> values = struct.unpack('>i4shf', bytes)    # unpack the bytes just read from the file
>>> values
(2, b'spam', 3, 1.2339999675750732)
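
The float in this result comes back as 1.2339999675750732 rather than 1.234 because the f code stores a 4-byte single-precision value. The following sketch illustrates the difference by round-tripping the same number through f and through the 8-byte d code; the variable names are illustrative only:

```python
import struct

value = 1.234

# 'f' stores a 4-byte single-precision float, so some precision is lost.
single = struct.unpack('>f', struct.pack('>f', value))[0]

# 'd' stores an 8-byte double, the same precision as a Python float.
double = struct.unpack('>d', struct.pack('>d', value))[0]

print(single == value)    # False
print(double == value)    # True
print(round(single, 3))   # 1.234
```

If exact round trips of Python floats matter, d is probably the safer code, at the cost of four more bytes per value.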

Parsed-out strings are byte strings again, and we can apply string and bitwise operations
to probe deeper:


>>> bin(values[0] | 0b1) # accessing bits and bytes
'0b11'
>>> values[1], list(values[1]), values[1][0]
(b'spam', [115, 112, 97, 109], 115)
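
A few more operations of the same kind can be sketched as follows. This rebuilds the values tuple so that it runs on its own; the mask and shift amounts are arbitrary choices for illustration, not taken from the text:

```python
import struct

# Recreate the unpacked tuple from the session above.
data = struct.pack('>i4shf', 2, b'spam', 3, 1.234)
values = struct.unpack('>i4shf', data)

number = values[0]                   # 2, which is 0b10 in binary
bit1_set = bool(number & 0b10)       # test bit 1 with a mask
shifted = number << 2                # shift left two bits
text = values[1].decode('ascii')     # convert the byte string to str
first_char = chr(values[1][0])       # indexing bytes yields an int

print(bit1_set, bin(shifted))        # True 0b1000
print(text, first_char)              # spam s
```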

Also note that slicing comes in handy in this domain; to grab just the four-character
string in the middle of the packed binary data we just read, we can simply slice it out.
Numeric values could similarly be sliced out and then passed to struct.unpack for
conversion:


>>> bytes
b'\x00\x00\x00\x02spam\x00\x03?\x9d\xf3\xb6'
>>> bytes[4:8]
b'spam'
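
Slicing out the numeric fields might look like the following sketch. The offsets assume the '>i4shf' layout used above (integer at bytes 0-3, string at 4-7, short at 8-9), and the name raw is used here instead of bytes only to avoid hiding the built-in type:

```python
import struct

raw = struct.pack('>i4shf', 2, b'spam', 3, 1.234)

# Slice out each numeric field, then convert it with a one-item format.
(count,) = struct.unpack('>i', raw[0:4])
(short,) = struct.unpack('>h', raw[8:10])

# unpack_from reads at a byte offset directly, without making a slice.
(short_again,) = struct.unpack_from('>h', raw, 8)

print(count, short, short_again)     # 2 3 3
```

Because struct.unpack always returns a tuple, the single-item unpacking on the left side extracts the bare value.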

152 | Chapter 4: File and Directory Tools
