Foundations of Python Network Programming

(WallPaper) #1

Chapter 8 ■ CaChes and Message Queues


14 0


even be preemptively cleared of stale items related to the changed value. But sometimes there can be great value in
caching intermediate results at higher levels of the application such as data structures, snippets of HTML, or even
entire web pages. That way, a cache hit prevents not only a database access but also the cost of turning the result into
a data structure and then into rendered HTML.
There are many good introductions and in-depth guides that are linked to from the Memcached site, as well as a
surprisingly extensive FAQ; it’s as though the Memcached developers have discovered that catechism is the best way
to teach people about their service. I will just make some general points here.
First, keys have to be unique, and consequently developers tend to use prefixes and encodings to keep distinct
the various classes of objects they are storing. You often see things like user:19, mypage:/node/14, or even the entire
text of a SQL query used as a key. Keys can be only 250 characters long, but by using a strong hash function, you can
get away with lookups that support longer strings. The values stored in Memcached, by the way, can be longer than
keys but are limited to 1MB in length.
Second, you must always remember that Memcached is a cache. It is ephemeral, it uses RAM for storage, and
if restarted, it remembers nothing that you have ever stored! Your application should always be able to recover and
rebuild all of its data if the cache should disappear.
Third, make sure that your cache does not return data that is too old to be accurately presented to your users.
“Too old” depends entirely upon your problem domain. A bank balance probably needs to be absolutely up-to-date,
while “today’s top headline” can probably be a few minutes old on a news site’s front page.
There are three approaches to solving the problem of stale data and making sure that it gets cleaned up and is not
returned forever far past its useful shelf life.


•    Memcached will let you set an expiration date and time on each item that you place in the
cache, and it will take care of dropping these items silently when the time comes.

•    You can reach in and actively invalidate particular cache entries the moment that they become
invalid—if you have a way to map from the identity of a piece of information to all of the keys
in the cache that could possibly have included it.

•    You can rewrite and replace entries that are invalid instead of simply removing them, which
works well for entries that might be hit dozens of times per second. Instead of all of those
clients finding the missing entry and all trying to recompute it simultaneously, they find
the rewritten entry there instead. For the same reason, prepopulating the cache when an
application first comes up can be a crucial survival skill for large sites.

As you might guess, decorators are a popular way to add caching in Python since they wrap function calls without
changing their names or signatures. If you look at the Python Package Index, you will find several decorator cache
libraries that can take advantage of Memcached.


Hashing and Sharding


The design of Memcached illustrates an important principle that is used in several other kinds of databases and that
you might want to employ in architectures of your own. When faced with several Memcached instances in a list, a
Memcached client will shard the database by hashing each key’s string value and letting the hash determine which
server in the Memcached cluster is used to store that particular key.
To understand why this is effective, consider a particular key-value pair—such as the key sq:42 and the
value 1764 that might be stored by Listing 8-1. To make the best use of the RAM it has available, the Memcached
cluster wants to store this key and value exactly once. But to make the service fast, it wants to avoid duplication
without requiring any coordination between the different servers or communication between all of the clients.
This means that all of the clients, without any other information to go on than (a) the key and (b) the list
of Memcached servers with which they are configured, need some scheme for working out where that piece of
information belongs. If they fail to make the same decision, then not only might the key and value be copied to several
servers and reduce the overall memory available but also a client’s attempt to remove an invalid entry could leave
other invalid copies elsewhere.

Free download pdf