Presenter: Johnny Miller
Who is he? Datastax Solutions Architect
Where? Skills matter exchange in London
I went to an "introductory" talk even though I have a lot of experience with Cassandra for a few reasons:
- Meet other people in London that are using Cassandra
- To discover what I don't know about Cassandra
Here are my notes that in roughly the same order as the talk.
What's Cassandra? The headlines
- Been around for ~5 years - originally developed by Facebook for their inbox search
- Distributed key store - column orientated data model
- Tuneable consistency - per request decide how consistent you want the response to be
- Datacenter aware with asynchronous replication
- Designed for use as a cluster - not much value in a single node Cassandra deployment
Gossip - how nodes in a cluster learn about other nodes
- P2P protocol for how nodes discover location and state of other nodes
- New nodes are given seed nodes for bootstrapping - but these aren't single points of failure as they aren't used again
Data distribution and replication
- Replication factor: How many nodes each piece of data is stored on
- Each node is given a range of primary keys to look after
Partitioners - How to decide which node gets what data
- Row keys are hashed to decide node then a replication strategy defines how to pick the other replicas
Replicas - how to select where else the data lives
- All replicas are equally important. No difference between the node the key hashed to and the other replicas that were selected
- Two ways to pick the other replicas:
- Simple: Only single DC. Specify just a replication factor. Hashes the key and then walks the cluster and picks the replicas. Not very clever - all replicas could end up on the same rack
- Network: Configure with a RF per DC. Walk the ring for each DC until it reaches a node in another rack
Snitch - how to define a data centre and a rack
- Informs Cassandra about node topology, designates DC and Rack for every node
- Example: Rack inferring snitch designates DC and Rack based on the IP of the node
- Example: Property file snitch where every node has the DC and Rack of every other node
- Example: GossipingPropertyFileSnitch: Every node knows its own DC and Rack and tells other nodes via Gossip
- Dynamic snitching: monitors performance of reads, this snitch wraps the other snitches to respond to network latency
Client requests
- Connect to any client in the node - becomes the coordinator. This node knows which nodes to talk to for the request
- Multi DC - picks a coordinator in the other data centre to replicate data there or to get data for a read
Consistency
- Quorum = (Replication Factor/2) + 1 i.e. more than half
- E.g R = 3, Q = 2, tolerate 1 replica going down to continue reading and writing at Quorum
- Per request consistency - can decide certain writes are more important and require higher consistency than others
- Example consistency levels: ANY, ONE, TWO, THREE, QUORUM, EACH_QUORUM, LOCAL_QUORUM
- SERIAL: New in cassandra 2.0
Write requests - what happens?
- The coordinator (node the client connects to) forwards the write to all the replicas in the local DC and designates a coordinator in the other DCs to do the same there
- The coordinator may be a replica but does not need to be
- For a single node writes first go to commit log (disk), then writes to meltable (memory)
- When does the write succeed? Depends on consistency e.g a write consistency of ONE means that the data needs to be in the commit log and memtable of at least one replica
Hinted handoff - how Cassandra deals with nodes being down on write
- Coordinator node keeps hints if one of the replicas down
- When the node comes back up the hints are then sent to the node so it can catch up
- Hints are kept for a finite amount of time - default is three hours
Read requests - what happens?
- Coordinator contacts a number of nodes depending on the consistency - once enough have responded the read can be successful
- Will send requests to node responding the fastest
- If not consistent - compare timestamps + do a read repair
- Possible other background read repair
What was missing?
Overall it was a great talk however here is some possible improvements:
- A glossary/overview at the start? Perhaps a mapping from relational terminology to Cassandra terminology. For example the term keyspace was used a number of times before describing what it is
- Overview of consistency when talking about eventual consistency - however this did come later? A few scenarios for when read/writes at different consistency levels would fail/succeed would have been very helpful
- Compaction required for an intro talk? I thought talking about compaction was a bit too much for an introductory talk as you need to understand memtables and sstables before it makes sense
- The downsides of Cassandra: for example some forms of schema migration/change is a nightmare when you are using CQL3 + have data you need to migrate