Cassandra
Cassandra was developed by Facebook for its inbox searching feature. It was
released as an open source project when Facebook turned it over to Apache.
Cassandra is a key/value store that runs on a flexible cluster of nodes
and is also a wide column store, like HBase, discussed in the “Wide Column
Store” section, later in this chapter. Nodes may be added and removed from
the cluster. Data is replicated across multiple nodes of the cluster. There is no
central node, and data can be accessed through any node; if the node receiving
the request does not house the specific data requested, it still services the
request by retrieving the data from a node that does and sending it on. The main
goal of Cassandra is fast retrieval of data, with fault tolerance handled through
replication across nodes and speed adjusted by adding nodes to create more
access points.
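The masterless design described above can be sketched with a small in-memory model. This is only an illustration, not Cassandra's actual implementation or API: the ToyCluster class, its method names, and the simple hash-based placement of keys are all assumptions made for the example.

```python
import hashlib


class ToyCluster:
    """Hypothetical in-memory model of a cluster with no central node."""

    def __init__(self, node_names, replication_factor=2):
        # A fixed, ordered list of nodes stands in for the cluster layout.
        self.ring = sorted(node_names)
        self.replication_factor = min(replication_factor, len(self.ring))
        # Each node keeps its own local key/value data.
        self.data = {name: {} for name in self.ring}
        # Nodes that are currently unreachable.
        self.down = set()

    def replicas(self, key):
        """Return the nodes that hold copies of a key (assumed placement rule)."""
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        start = int(digest, 16) % len(self.ring)
        return [self.ring[(start + i) % len(self.ring)]
                for i in range(self.replication_factor)]

    def write(self, coordinator, key, value):
        """Any live node may accept a write and forward it to the replicas."""
        if coordinator in self.down:
            raise RuntimeError("coordinator is down")
        for name in self.replicas(key):
            if name not in self.down:
                self.data[name][key] = value

    def read(self, coordinator, key):
        """Any live node may accept a read, even if it holds no copy itself."""
        if coordinator in self.down:
            raise RuntimeError("coordinator is down")
        for name in self.replicas(key):
            if name not in self.down and key in self.data[name]:
                return self.data[name][key]
        return None
```

With four nodes and a replication factor of 2, a read sent to a node that holds no copy of the key still succeeds, and it keeps succeeding if one of the two replicas is added to the down set. A real cluster also rebalances data as nodes join and leave, which this sketch leaves out.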
One interesting feature is that Cassandra may be tuned to adjust the trade-off
between speed of transactions and consistency of data. When data is stored, it
is initially held in memory and gets written to disk only when specific criteria
are met. This makes interaction very quick. In fact, not all data stored in
Cassandra is designed to persist over time, and some data might not get written
to disk at all. This means that a reader looking for a specific piece of data
might not find it, but in cases like Facebook’s need to store inbox search data
that has only limited time value (such as search results that could be different
tomorrow or even 10 minutes from now), this might not matter at all. In these
cases, access speed and convenience are more important than guaranteed
persistence and consistency.
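One way Cassandra exposes the speed-versus-consistency trade-off is through per-request consistency levels such as ONE, QUORUM, and ALL, which set how many replicas must respond before a request counts as complete. The following sketch is arithmetic only, not driver code; the function names are hypothetical, and it assumes the usual rule that a read is guaranteed to see the latest acknowledged write when the replicas read plus the replicas written exceed the replication factor.

```python
def replicas_required(level, replication_factor):
    """Number of replicas that must respond for a given consistency level."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        # A quorum is a strict majority of the replicas.
        return replication_factor // 2 + 1
    if level == "ALL":
        return replication_factor
    raise ValueError("unsupported consistency level: " + level)


def read_sees_latest_write(write_level, read_level, replication_factor):
    """True when the read and write replica sets are guaranteed to overlap."""
    written = replicas_required(write_level, replication_factor)
    read = replicas_required(read_level, replication_factor)
    return written + read > replication_factor
```

With a replication factor of 3, writing and reading at QUORUM touches two replicas each, so the sets must overlap. Writing and reading at ONE is faster because only one replica has to answer, but a read may land on a replica that has not yet received the newest value.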
Cassandra is used by Facebook, Twitter, Reddit, and many others.
etcd
etcd was developed by the CoreOS open source project, which works in the
container world. (See Chapter 32, “Containers and Ubuntu,” for more about
containers.) This key/value store is designed specifically for containerized
deployment across a cluster of machines. It is written in Go and is in
production use by many large companies, including Cloud Foundry and
anyone using Kubernetes.
The focus of etcd is fourfold: simplicity, security, speed, and reliability. It
includes a user-facing API, and its complete source code is available on
GitHub.
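As a rough illustration of the user-facing API, the sketch below prepares requests for the JSON gateway that etcd v3 offers alongside its gRPC interface, in which keys and values travel base64-encoded. The helper names are hypothetical. The /v3 path prefix assumes etcd 3.4 or later, and the address passed to send() is up to you; a default local install typically listens for clients on http://127.0.0.1:2379.

```python
import base64
import json
import urllib.request


def b64(text):
    """Base64-encode text the way the JSON gateway expects keys and values."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def put_request(key, value):
    """Build the path and JSON body for storing one key/value pair."""
    return "/v3/kv/put", {"key": b64(key), "value": b64(value)}


def range_request(key):
    """Build the path and JSON body for reading a single key."""
    return "/v3/kv/range", {"key": b64(key)}


def decode_kvs(response):
    """Turn a range response into a plain dict of decoded keys and values."""
    result = {}
    for kv in response.get("kvs", []):
        key = base64.b64decode(kv["key"]).decode("utf-8")
        value = base64.b64decode(kv.get("value", "")).decode("utf-8")
        result[key] = value
    return result


def send(base_url, path, body):
    """POST a prepared request to a running etcd (assumed reachable at base_url)."""
    request = urllib.request.Request(
        base_url + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=5) as reply:
        return json.load(reply)
```

Against a running server, send(base_url, *put_request("greeting", "hello")) stores a value, and decode_kvs(send(base_url, *range_request("greeting"))) reads it back. The request builders and decode_kvs() work without a server, which makes them easy to check on their own.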