Thursday, July 10, 2014

Installing Cassandra and Spark with Ansible

I am currently doing a proof of concept with Spark and Cassandra. I quickly need to be able to create and start Cassandra and Spark clusters. Ansible to the rescue!

I split my Ansible playbook into three roles:
  • Cassandra
  • Ops Center
  • Spark
My main playbook is very simple:

I have some hosts defined in a separate hosts file called m25-cassandra. I've decided to install htop as well, although I could have put this in a general server role.

I also define a few variables. These could, of course, be defined elsewhere, per role:
  • cluster_name - this will replace the cluster name in each host's cassandra.yaml
  • seeds - likewise, this will replace the seed list in each host's cassandra.yaml
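
As a rough sketch, a playbook along these lines could look like the following. The file name, role directory names, variable values and the use of become are assumptions made for illustration, not copied from the real playbook:

```yaml
# site.yml (hypothetical file name): a sketch under the assumptions stated above
- hosts: m25_cassandra            # group name assumed to match the hosts file
  become: true                    # assumption: package installs need root
  vars:
    cluster_name: m25_cluster     # hypothetical value
    seeds: "192.0.2.11"           # hypothetical seed node address
  pre_tasks:
    # pre_tasks run before the roles listed below
    - name: Install htop
      apt:
        name: htop
        state: present
  roles:
    - cassandra                   # role directory names are assumptions
    - opscenter
    - spark
```

With a layout like this, the whole cluster would be provisioned with something like `ansible-playbook -i m25-cassandra site.yml`.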
So let's take a look at each role.


Cassandra

Here are the tasks:

This is doing the following:
  • Installing a JRE
  • Adding the Apache Cassandra Debian repository
  • Adding the keys for the Debian repository
  • Installing the latest version of Cassandra
  • Replacing the cassandra.yaml (details later)
  • Ensuring Cassandra is started
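
The steps above map fairly directly onto standard Ansible modules. The following is only a sketch: the JRE package name, the template file name and the two repository variables (cassandra_repo_key_url and cassandra_repo) are hypothetical and would need real values. The key is added before the repository here so that the package cache refresh does not complain about an unsigned source.

```yaml
# roles/cassandra/tasks/main.yml: a sketch, not the original tasks file
- name: Install a JRE
  apt:
    name: default-jre-headless            # assumption: any supported JRE package would do
    state: present
    update_cache: true

- name: Add the key for the Apache Cassandra repository
  apt_key:
    url: "{{ cassandra_repo_key_url }}"   # hypothetical variable
    state: present

- name: Add the Apache Cassandra Debian repository
  apt_repository:
    repo: "{{ cassandra_repo }}"          # hypothetical variable holding a full "deb ..." line
    state: present

- name: Install the latest version of Cassandra
  apt:
    name: cassandra
    state: latest
    update_cache: true

- name: Replace cassandra.yaml with the templated version
  template:
    src: cassandra.yaml.j2                # hypothetical template file name
    dest: /etc/cassandra/cassandra.yaml

- name: Ensure Cassandra is started
  service:
    name: cassandra
    state: started
```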
The template cassandra.yaml uses the following variables:
  • cluster_name: '{{ cluster_name }}' - So we can rename the cluster
  • - seeds: "{{ seeds }}" - So when we add a new node it connects to the cluster
  • listen_address: {{ inventory_hostname }} - Listen on the node's external IP so other nodes can communicate with it
  • rpc_address: {{ inventory_hostname }} - So we can connect Ops Center and cqlsh to the nodes
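
Put together, the templated parts of the file would look roughly like the fragment below. The template file name is an assumption, and everything not shown is assumed to be left as in the stock cassandra.yaml. Note that seeds sits under seed_provider, which is why it appears as a list item.

```yaml
# templates/cassandra.yaml.j2 (fragment): only the templated keys are shown
cluster_name: '{{ cluster_name }}'

seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "{{ seeds }}"

# inventory_hostname is the name the host has in the hosts file, so the
# hosts need to be listed there by an address the other nodes can reach
listen_address: {{ inventory_hostname }}
rpc_address: {{ inventory_hostname }}
```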
Magic! Now adding new hosts to my hosts file under the m25_cassandra group will get Cassandra installed, connected to the cluster and started.
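
For illustration, a hosts file in Ansible's YAML inventory format might look like the sketch below. The addresses are placeholders from the documentation range, and the .yml file name is an assumption (the YAML inventory plugin expects a YAML file extension; the real hosts file may well use the INI format instead). Hosts are listed by IP address because the template uses inventory_hostname as the listen address.

```yaml
# m25-cassandra.yml (hypothetical YAML-format inventory with placeholder addresses)
all:
  children:
    m25_cassandra:
      hosts:
        192.0.2.11:
        192.0.2.12:
        192.0.2.13:   # a newly added node only needs a line like this
```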

Ops Center

The tasks file for Ops Center:

This is doing the following:
  • Adding the DataStax community Debian repository
  • Adding the key for the repo
  • Installing Ops Center
  • Starting Ops Center
No templates here as all the default configuration is fine.
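
A sketch of a tasks file covering those four steps follows. The repository variables (datastax_repo_key_url and datastax_repo) are hypothetical, and the package and service names are assumptions that should be checked against the DataStax documentation.

```yaml
# roles/opscenter/tasks/main.yml: a sketch, not the original tasks file
- name: Add the key for the DataStax community repository
  apt_key:
    url: "{{ datastax_repo_key_url }}"    # hypothetical variable
    state: present

- name: Add the DataStax community Debian repository
  apt_repository:
    repo: "{{ datastax_repo }}"           # hypothetical variable holding a full "deb ..." line
    state: present

- name: Install Ops Center
  apt:
    name: opscenter                       # assumed package name
    state: present
    update_cache: true

- name: Ensure Ops Center is started
  service:
    name: opscenterd                      # assumed service name
    state: started
```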


Spark

The Spark Maven build can build a Debian package, but I didn't find a public Debian repo containing it, so the following just downloads and unzips the Spark package:
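
A download-and-unpack role can be sketched with the get_url and unarchive modules. The three variables (spark_download_url, spark_archive_name and spark_install_dir) are hypothetical placeholders, and remote_src assumes a reasonably recent Ansible version.

```yaml
# roles/spark/tasks/main.yml: a sketch, not the original tasks file
- name: Ensure the install directory exists
  file:
    path: "{{ spark_install_dir }}"          # hypothetical variable
    state: directory

- name: Download the Spark package
  get_url:
    url: "{{ spark_download_url }}"          # hypothetical variable
    dest: "/tmp/{{ spark_archive_name }}"    # hypothetical variable

- name: Unpack the Spark package
  unarchive:
    src: "/tmp/{{ spark_archive_name }}"
    dest: "{{ spark_install_dir }}"
    remote_src: true                         # the archive is already on the node
```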

I start the workers using the script from my local master, so I don't need to start anything on the nodes that have Cassandra on them.


Ansible makes it very easy to install distributed systems like Cassandra. The thought of doing it manually fills me with pain! This is just for a PoC; I don't suggest downloading Spark from the public internet or always installing the latest version of Cassandra for your production systems. The full source including templates and directory structure is here.
