Thursday, July 10, 2014

Installing Cassandra and Spark with Ansible

I am currently doing a proof of concept with Spark and Cassandra, and I need to be able to quickly create and start Cassandra and Spark clusters. Ansible to the rescue!

I split my Ansible playbook into three roles:
  • Cassandra
  • Ops center
  • Spark
My main playbook is very simple:

---
- hosts: m25-cassandra
  remote_user: ubuntu
  sudo: yes
  vars:
    cluster_name: Test Cluster
    seeds: 10.79.134.173
  tasks:
    - name: install htop
      apt: name=htop state=present
  roles:
    - cassandra
    - opscenter
    - spark

I have some hosts defined under the m25-cassandra group in a separate hosts file. I've decided to install htop; I could have put this in a general server role.
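A minimal sketch of what that hosts file could look like (only the seed IP appears in this post; the second address is a hypothetical extra node):

```ini
[m25-cassandra]
10.79.134.173
10.79.134.174    # hypothetical second node
```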

I also define a few variables; these of course could be defined elsewhere per role:
  • cluster_name - this will replace the cluster name in each of the hosts' cassandra.yaml
  • seeds - as above
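Moving them out of the playbook is straightforward; a sketch of a hypothetical group_vars file (the filename follows Ansible's group_vars convention, values copied from the playbook above):

```yaml
# group_vars/m25-cassandra - applied to every host in the m25-cassandra group
cluster_name: Test Cluster
seeds: 10.79.134.173
```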
So let's take a look at each role.

Cassandra


Here are the tasks:

- name: install java
  apt: name=openjdk-7-jre state=present update_cache=yes
- name: add cassandra debian repository
  apt_repository: repo='deb http://www.apache.org/dist/cassandra/debian 20x main' state=present
- name: add the key for the cassandra debian repo
  apt_key: keyserver=pgp.mit.edu id=F758CE318D77295D
- name: add the other key for cassandra
  apt_key: keyserver=pgp.mit.edu id=2B5C1B00
- name: install cassandra
  apt: name=cassandra state=present update_cache=yes
- name: override cassandra.yaml file
  template: src=cassandra.yaml dest=/etc/cassandra/
- name: make sure cassandra is started
  service: name=cassandra state=restarted

This is doing the following:
  • Installing a JRE
  • Adding the Apache Cassandra debian repository
  • Adding the keys for the debian repository
  • Installing the latest version of Cassandra
  • Replacing the cassandra.yaml (details later)
  • Ensuring Cassandra is started
The template cassandra.yaml uses the following variables:
  • cluster_name: '{{ cluster_name }}' - So we can rename the cluster
  • - seeds: "{{ seeds }}" - So when we add a new node it connects to the cluster
  • listen_address: {{ inventory_hostname }} - Listen on the node's external IP so other nodes can communicate with it
  • rpc_address: {{ inventory_hostname }} - So we can connect ops center and cqlsh to the nodes
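Putting those together, the relevant lines of the cassandra.yaml template look roughly like this (a sketch of the fragment only, not the full file; the seed_provider structure around the seeds variable follows Cassandra's stock cassandra.yaml):

```yaml
cluster_name: '{{ cluster_name }}'
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "{{ seeds }}"
listen_address: {{ inventory_hostname }}
rpc_address: {{ inventory_hostname }}
```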
Magic! Now adding new hosts to my hosts file under the m25-cassandra group will get Cassandra installed, connected to the cluster, and started.

Ops Center


The tasks file for ops center:

- name: add the debian repository for ops center
  apt_repository: repo='deb http://debian.datastax.com/community stable main' state=present
- name: add the key for the ops center repository
  apt_key: url=http://debian.datastax.com/debian/repo_key state=present
- name: install ops center
  apt: name=opscenter state=present update_cache=yes
- name: start opscenter
  service: name=opscenterd state=started

This is doing the following:
  • Adding the Datastax community debian repository
  • Adding the key for the repo
  • Installing Ops Center
  • Starting Ops Center
No templates here as all the default configuration is fine.

Spark


The Spark Maven build can produce a Debian package, but I didn't find a public Debian repo that hosts it, so the following tasks just download and unpack the Spark binary package:

- name: download spark
  get_url: url=http://d3kbcqa49mib13.cloudfront.net/spark-1.0.0-bin-hadoop2.tgz dest=/opt/
- name: unzip spark
  unarchive: copy=no src=/opt/spark-1.0.0-bin-hadoop2.tgz dest=/opt

I start the workers using the start-slaves.sh script from my local master, so I don't need to start anything on the nodes that have Cassandra on them.
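For completeness, a sketch of what that looks like on the master (paths assume the tarball layout unpacked above; the worker address is an example, it would be whatever is in the Ansible hosts file):

```
# conf/slaves lists one worker host per line
echo "10.79.134.173" >> /opt/spark-1.0.0-bin-hadoop2/conf/slaves

# start a worker on every host in conf/slaves over ssh
/opt/spark-1.0.0-bin-hadoop2/sbin/start-slaves.sh
```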

Conclusion


Ansible makes it very easy to install distributed systems like Cassandra. The thought of doing it manually fills me with pain! This is just for a PoC; I don't suggest downloading Spark from the public internet or always installing the latest version of Cassandra for your production systems. The full source, including templates and directory structure, is here.
