Thursday, August 8, 2013

Cassandra vs MongoDB: Replication

What is replication?

Most databases offer some way to recover data in the event of hardware failure. In Cassandra and MongoDB this is achieved via their respective replication strategies, where the same data is stored on multiple hosts. Replication usually goes hand-in-hand with sharding, so I've also mentioned some details on sharding. For the impatient, just skip to the Q&As :)

Cassandra replication

Cassandra was never designed to be run as a single node - the only time this is likely is in a development environment. This becomes clear as soon as you create a keyspace, as you must provide a replication factor. For example:

CREATE KEYSPACE People
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};


The keyspace 'People' will replicate every row of every column family to three nodes in the cluster. A few key points about Cassandra replication:
  • You can create a keyspace with a replication factor greater than your current number of nodes and add nodes later. For instance, I ran the above CREATE KEYSPACE command on a single node Cassandra cluster running on my laptop.
  • You don't need all the replicas to be available to do an insert. This depends on the consistency level of the write (a separate post will come later comparing Cassandra and MongoDB consistency).
  • You get sharding for free if your number of nodes is greater than your replication factor.
Cassandra is designed to be very fault tolerant - when replicating data the aim is to survive failures such as a node failure, a rack failure and even a datacentre failure. For this reason anything but the simplest Cassandra setup will use a replication strategy that is rack and datacentre aware. Cassandra gets this information from a snitch. A snitch is responsible for identifying each node's rack and datacentre. Examples of the types of snitch are:
  • PropertyFileSnitch - you define each node's rack and datacentre in a file.
  • EC2Snitch - the rack and datacentre of each node are inferred from the AWS region and availability zone.
  • RackInferringSnitch - the rack and datacentre of each node are inferred from its IP address.
Cassandra uses this information to avoid placing replicas in the same rack and to keep a set number of replicas in each datacentre. Once you have set up your snitch, all of this just happens.
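If you are using a rack- and datacentre-aware snitch you would typically pair it with NetworkTopologyStrategy rather than SimpleStrategy. Here is a minimal sketch using the DataStax Python driver; the keyspace name and the datacentre names 'DC1' and 'DC2' are placeholders and must match whatever your snitch reports.

# Sketch only: create a datacentre-aware keyspace via the Python driver.
# Contact point, keyspace name and datacentre names are assumptions.
from cassandra.cluster import Cluster

cluster = Cluster(['127.0.0.1'])
session = cluster.connect()

# Three replicas in DC1, two in DC2; the snitch decides rack placement.
session.execute("""
    CREATE KEYSPACE people_multi_dc
    WITH replication = {
        'class': 'NetworkTopologyStrategy',
        'DC1': 3,
        'DC2': 2
    }
""")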
An important feature of Cassandra is that all replicas are equal. There is no master for a particular piece of data, so if a node goes down there is no period where a slave replica needs to be promoted to master. This makes a single node failure (load permitting) nearly transparent to the client - nearly, because a query in progress on the failed node will fail. Most clients can be configured to retry transparently.
What's an ideal replication factor? That depends on how many node failures you want to survive while still writing at your chosen consistency level. With a replication factor of 3 and writes at QUORUM (a majority of replicas, in Cassandra terms), you can continue writing with one node down. To tolerate two nodes down you need a replication factor of 5.
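To make this concrete, here is a minimal sketch of a QUORUM write against the three-replica 'People' keyspace using the DataStax Python driver; the users table and its columns are hypothetical.

# Sketch only: a write at QUORUM succeeds while a majority of the
# three replicas for this row are reachable.
from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

session = Cluster(['127.0.0.1']).connect('people')

insert = SimpleStatement(
    "INSERT INTO users (user_id, name) VALUES (%s, %s)",
    consistency_level=ConsistencyLevel.QUORUM)

session.execute(insert, ("u42", "Alice"))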


MongoDB replication

Fault tolerance and replication are not as apparent in MongoDB from an application developer's point of view. Replication is usually discussed together with sharding, but the two are independent - you can shard without replication and replicate without sharding.

How does this work? Multiple MongoD processes are grouped into a replica set. One of them is automatically elected as the master (the primary, in MongoDB terminology). By default every read and write goes to the master, and writes are asynchronously replicated to the rest of the nodes in the replica set. If the master goes down, a new master is automatically elected by the remaining nodes. This means that replication by itself does not give you horizontal scaling, as by default everything goes through the master.
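A minimal PyMongo sketch of this: the host names, replica set name 'rs0', and the database/collection are placeholders. The point is that the client connects to the set as a whole rather than to one particular master.

# Sketch only: connect to a replica set; the driver discovers which
# member is the master and sends writes there. If the master fails and
# a new one is elected, the driver re-discovers it automatically.
from pymongo import MongoClient

client = MongoClient(
    'mongodb://host1:27017,host2:27017,host3:27017/?replicaSet=rs0')

client.people.users.insert_one({'user_id': 'u42', 'name': 'Alice'})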
Sharding in Mongo is achieved separately by partitioning collections across multiple shards. A shard is either an individual MongoD instance or a replica set. Clients send queries to query routers (MongoS processes), which route each request to the correct shard. The metadata for the cluster (i.e. which shard contains which data) is kept on separate Config Server processes.
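As an illustration, here is a hedged PyMongo sketch of enabling sharding through a MongoS; the router host, database, collection and shard key names are all assumptions.

# Sketch only: these admin commands must be sent to a MongoS query
# router, not directly to a shard.
from pymongo import MongoClient

mongos = MongoClient('mongodb://mongos-host:27017')

mongos.admin.command('enableSharding', 'people')
mongos.admin.command('shardCollection', 'people.users',
                     key={'user_id': 1})  # range-based shard key on user_id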
Ensuring that replicas are unlikely to fail together, e.g. by sitting on the same rack, is down to the cluster setup: the nodes in a replica set must be manually placed on different racks, in different datacentres and so on.


Q&A - Question (Q), Cassandra (C) and MongoDB (M)


Q. Are any additional processes required?
C. No - all nodes are equal.
M. No - however, if you want sharding, a separate MongoS (query router) process and three Config Server processes are required. An arbiter - a node that stores no data - may also be needed to vote on which node should become the new master in the event of a failure.

Q. Is there a master for each piece of data?
C. No - all replicas are equal. All can process inserts and reads.
M. Yes - a particular MongoD instance is the master in a replica set. Writes are asynchronously replicated from it to the rest.

Q. How does a client know which replica to send reads and writes to?
C. It doesn't have to know. Writes and reads can be sent to any node and will be routed to the correct replicas. There are, however, token-aware clients that work out the right node from the hash of the row key to aid performance (see the sketch below).
M. If sharding is enabled, the client sends queries to a separate MongoS process, which routes them to the correct shard. When there is no sharding, the driver discovers which member of the replica set is the master and sends reads and writes there.
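A minimal sketch of a token-aware Cassandra client using the DataStax Python driver; the contact point and keyspace are assumptions.

# Sketch only: TokenAwarePolicy routes each request to a replica that
# owns the row's token where possible, falling back to the wrapped
# datacentre-aware round-robin policy otherwise.
from cassandra.cluster import Cluster
from cassandra.policies import DCAwareRoundRobinPolicy, TokenAwarePolicy

cluster = Cluster(
    ['127.0.0.1'],
    load_balancing_policy=TokenAwarePolicy(DCAwareRoundRobinPolicy()))
session = cluster.connect('people')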

Q. Does the application programmer care about how the data is sharded?
C. The row key is hashed, so the programmer needs to ensure that the key has reasonably high cardinality. When using CQL3 the row key is the first part of the primary key, so put low cardinality fields later in a compound primary key (see the sketch below).
M. Yes - a field in the document is the designated shard key. This field is therefore mandatory and its values are split into ranges. Monotonically increasing shard keys such as dates should be avoided, as eventually all new keys fall into the last range and the cluster will constantly need re-balancing.
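To illustrate the Cassandra point, here is a minimal sketch of a compound primary key via the Python driver; the users_by_country table and its columns are hypothetical.

# Sketch only: user_id (high cardinality) is the row/partition key that
# gets hashed across the cluster; country (low cardinality) is only a
# clustering column, so it comes later in the compound primary key.
from cassandra.cluster import Cluster

session = Cluster(['127.0.0.1']).connect('people')

session.execute("""
    CREATE TABLE users_by_country (
        user_id text,
        country text,
        name text,
        PRIMARY KEY (user_id, country)
    )
""")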
