Advanced Rails - Building Industrial-Strength Web Apps in Record Time

Load Balancing and High Availability

MySQL


Replication


MySQL has built-in support for master-slave replication. The master logs all
transactions to a binlog (binary log). During replication, the binlog is replayed
on the slaves, which apply the transactions to themselves. The slaves can use
different storage engines, which makes this facility useful for ancillary purposes
such as backup or full-text indexing. Master-slave replication works well for load
balancing in applications where reads outnumber writes, since all writes must be
applied to the master.
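The read/write split described above can be sketched in Ruby. This is a minimal illustration, not a real Rails plugin API; the `SplitConnection` class, its `query` interface, and the round-robin policy are all assumptions for the example:

```ruby
# Hypothetical read/write splitter: writes go to the master (which logs
# them to its binlog), reads are spread round-robin across the slaves.
class SplitConnection
  def initialize(master, slaves)
    @master = master
    @slaves = slaves
  end

  # All writes must be applied to the master.
  def write(sql)
    @master.query(sql)
  end

  # Take the next slave in line and move it to the back of the queue.
  def read(sql)
    slave = @slaves.shift
    @slaves.push(slave)
    slave.query(sql)
  end
end
```

The more read-heavy the workload, the more slaves can be added behind a splitter like this; the master remains the bottleneck for writes.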


However, master-slave replication as described does not provide high availability;
there is a single master that is a single point of failure. A slave can be promoted
to be the master during failover, but the commands to do this must be executed
manually or by a custom monitoring script; there is currently no built-in facility
for automatically promoting a slave. Additionally, all clients must be able to
determine which member is currently the master. The MySQL documentation suggests
setting up a dynamic DNS entry pointing to the current master; however, this
introduces another potential failure point.
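A custom monitoring script of the kind mentioned above might look something like this. The class, the health-check method, and the failure threshold are all hypothetical; the promotion statements in the trailing comment are standard MySQL replication commands:

```ruby
require 'socket'
require 'timeout'

# Hypothetical failover monitor sketch. The host name, port, and
# threshold below are illustrative, not from the book.
class FailoverMonitor
  def initialize(host, port = 3306, max_failures = 3)
    @host = host
    @port = port
    @max_failures = max_failures
    @failures = 0
  end

  # Returns true if a TCP connection to the master succeeds within 2s.
  def master_alive?
    Timeout.timeout(2) { TCPSocket.new(@host, @port).close }
    true
  rescue StandardError
    false
  end

  # Record one health-check result; returns true once the consecutive
  # failure threshold is reached and a slave should be promoted.
  def record_check(alive)
    @failures = alive ? 0 : @failures + 1
    @failures >= @max_failures
  end
end

# Promotion itself comes down to SQL run on the chosen slave:
#   STOP SLAVE;
#   RESET MASTER;   -- the slave now accepts writes as the new master
# ...after which clients must be redirected to the new master.
```

Requiring several consecutive failed checks before promoting guards against a transient network blip triggering an unnecessary (and disruptive) failover.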


MySQL Cluster


The primary high-availability solution for MySQL is the MySQL Cluster technology,
available since version 4.1. Cluster is primarily an in-memory database, though as
of version 5, disk storage is supported. The Cluster product is based on the NDB
storage engine, backed by data nodes.


MySQL Cluster is designed for localized clusters; distributed clusters are not
supported, as the protocol used between nodes is neither encrypted nor optimized
for bandwidth usage. The interconnect can use Ethernet (100 Mbps or greater) or
SCI (Scalable Coherent Interconnect, a high-speed cluster interconnect protocol).
It is most effective for clusters with medium to large datasets; the recommended
configuration is 1–8 nodes with 16 GB of RAM each.


Because the majority of the data is stored in memory, the cluster must have enough
memory to store as many redundant copies of the full working set as the application
dictates. This number is called the replication factor. With a replication factor
of 2, each piece of data is stored on two separate servers, and you can lose only
one server out of the cluster without losing data.
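The arithmetic behind the replication factor can be made explicit. This is a back-of-the-envelope sizing sketch; the function names and figures are illustrative, not from the book:

```ruby
# Minimum total data-node memory: every piece of data is stored
# replication_factor times across the cluster.
def min_cluster_memory_gb(working_set_gb, replication_factor)
  working_set_gb * replication_factor
end

# Number of servers whose loss the cluster can absorb without
# losing data (one copy must always survive).
def tolerable_failures(replication_factor)
  replication_factor - 1
end
```

For example, a 16 GB working set with a replication factor of 2 needs at least 32 GB of data-node memory in total, and the cluster survives the loss of one server.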


For high availability, at least three physical servers must be used: two data nodes
and a management node. The management node is needed to arbitrate between the two
data nodes if they become disconnected and out of synchronization with each other.
A replication factor of 2 is used, so the two data nodes must each have enough
memory to hold the working set, unless disk storage is used.
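The three-server layout described above might be expressed in an NDB `config.ini` along these lines. The hostnames and memory size are placeholders, not values from the book:

```ini
[ndbd default]
NoOfReplicas=2              # replication factor: each fragment stored twice
DataMemory=16G              # per-node memory for the working set

[ndb_mgmd]                  # management node: arbitrates on disconnection
HostName=mgmt.example.com

[ndbd]                      # first data node
HostName=data1.example.com

[ndbd]                      # second data node
HostName=data2.example.com

[mysqld]                    # SQL node the application connects to
HostName=app.example.com
```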
