Foundations of Python Network Programming

(WallPaper) #1
Chapter 8 ■ CaChes and Message Queues

141

The solution is that the clients all implement a single, stable algorithm that can turn a key into an integer n that
selects one of the servers from their list. They do this by using a “hash” algorithm, which mixes the bits of a string
when forming a number so that any pattern in the string is, ideally, obliterated.
To see why patterns in key values must be obliterated, consider Listing 8-2. It loads a dictionary of English words
(you might have to download a dictionary of your own or adjust the path to make the script run on your own machine)
and explores how those words would be distributed across four servers if they were used as keys. The first algorithm
tries to divide the alphabet into four roughly equal sections and distributes the keys using their first letter; the other
two algorithms use hash functions.


Listing 8-2. Two Schemes for Assigning Data to Servers: Patterns in the Data and Bits from a Hash


#!/usr/bin/env python3


Foundations of Python Network Programming, Third Edition


https://github.com/brandon-rhodes/fopnp/blob/m/py3/chapter08/hashing.py


Hashes are a great way to divide work.


import hashlib


def alpha_shard(word):
"""Do a poor job of assigning data to servers by using first letters."""
if word[0] < 'g': # abcdef
return 'server0'
elif word[0] < 'n': # ghijklm
return 'server1'
elif word[0] < 't': # nopqrs
return 'server2'
else: # tuvwxyz
return 'server3'


def hash_shard(word):
"""Assign data to servers using Python's built-in hash() function."""
return 'server%d' % (hash(word) % 4)


def md5_shard(word):
"""Assign data to servers using a public hash algorithm."""
data = word.encode('utf-8')
return 'server%d' % (hashlib.md5(data).digest()[-1] % 4)


if name == 'main':
words = open('/usr/share/dict/words').read().split()
for function in alpha_shard, hash_shard, md5_shard:
d = {'server0': 0, 'server1': 0, 'server2': 0, 'server3': 0}
for word in words:
d[function(word.lower())] += 1
print(function.name[:-6])
for key, value in sorted(d.items()):
print(' {} {} {:.2}'.format(key, value, value / len(words)))
print()

Free download pdf