designed to work with Google’s MapReduce framework, which was created
to process huge data sets across large clusters of computing nodes.
BigTable stores the massive sets of data used by many Google programs, such
as Google Reader, My Search History, Google Earth, YouTube, and Gmail.
BigTable is not available for use outside Google.
The papers that describe Google’s design for both BigTable and MapReduce
are listed in the “References” section at the end of this chapter.
HBase
HBase is the database used by Hadoop, the Apache Software Foundation’s free
software framework for processing huge amounts of data across large clusters
of compute nodes. Hadoop is modeled in part on the designs described in
Google’s MapReduce and Google File System papers. HBase is to BigTable
what Hadoop is to MapReduce: an open source implementation modeled on
Google’s published design.
The main feature of HBase is its ability to host very large tables, on the
scale of billions of rows across millions of columns, on commodity hardware.
HBase is optimized for real-time queries and provides a RESTful web service
interface that supports many formats and encodings.
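A table with millions of possible columns is practical largely because each
row stores only the columns it actually uses. The following is a minimal
Python sketch of that idea, assuming a sparse nested-map layout (row key to
column name to value). The class SparseTable, its methods, and the sample row
keys and column names are hypothetical and chosen for illustration; this is
not the HBase API.

```python
from collections import defaultdict


class SparseTable:
    """Toy wide table (hypothetical): rows store only the cells they have."""

    def __init__(self):
        # row key -> {column name -> value}; absent cells take no space
        self._rows = defaultdict(dict)

    def put(self, row_key, column, value):
        self._rows[row_key][column] = value

    def get(self, row_key, column, default=None):
        # dict.get on the outer map, so a read never creates an empty row
        return self._rows.get(row_key, {}).get(column, default)

    def cell_count(self):
        # Number of cells actually stored across all rows
        return sum(len(columns) for columns in self._rows.values())


table = SparseTable()
table.put("user:1001", "profile:name", "Ada")
table.put("user:1001", "profile:email", "ada@example.com")
table.put("user:1002", "profile:name", "Grace")
```

Here the second row never received an email column, so looking it up returns
the default and the table holds three cells rather than a full two-by-two
grid.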
Numerous companies are using Hadoop, including some very big names like
Amazon, eBay, Facebook, IBM, LinkedIn, Yahoo!, and Rackspace.
Graph Stores
Graph stores, or graph databases, literally store data as a graph: the data
is represented as a set of nodes together with the relationships that connect
them. In the simplest case, a graph with only one node, only the record and
its properties need to be stored. The properties list can be as short as one
entry or as long as a few million (perhaps more).
Rather than let a single node’s property list grow that unwieldy, most graph
databases split the data into additional nodes early, each node having its
own properties and also explicit relationships that tie it to other nodes. It
is the relationships that organize the nodes, and the structure is therefore
flexible. A graph can look like a list or a map or a tree or something else
entirely.
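The node-and-relationship model described above can be sketched in a few lines
of Python. This is a minimal illustration, not how any particular graph
database stores its data; the Node class, its relate and related methods, and
the relationship names NEXT and CHILD are all hypothetical.

```python
class Node:
    """A record with its own properties and explicit outgoing relationships."""

    def __init__(self, name, **properties):
        self.name = name
        self.properties = properties
        self.relationships = []  # list of (relationship type, target Node)

    def relate(self, rel_type, other):
        # Record an explicit, typed relationship from this node to another
        self.relationships.append((rel_type, other))

    def related(self, rel_type):
        # Nodes reachable from this one over relationships of the given type
        return [node for kind, node in self.relationships if kind == rel_type]


# A list-shaped graph: each node points to the next one.
a, b, c = Node("a"), Node("b"), Node("c")
a.relate("NEXT", b)
b.relate("NEXT", c)

# A tree-shaped graph built from the same Node type.
root = Node("root", kind="folder")
left = Node("left", kind="file")
right = Node("right", kind="file")
root.relate("CHILD", left)
root.relate("CHILD", right)
```

The same Node type yields a list or a tree purely through the relationships
added to it, which is the flexibility the text refers to.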
Graph databases are queried using traversals. A traversal begins at a defined
starting node and follows through related nodes to answer questions such as
“What classes are my friends taking that I am not enrolled in?” or “If server X
has a network connection problem, what web services will be disrupted?” In a