I split my Ansible playbook into three roles:
- Cassandra
- Ops Center
- Spark
My main playbook is very simple:
```yaml
---
- hosts: m25-cassandra
  remote_user: ubuntu
  sudo: yes
  vars:
    cluster_name: Test Cluster
    seeds: 10.79.134.173
  tasks:
    - name: install htop
      apt: name=htop state=present
  roles:
    - cassandra
    - opscenter
    - spark
```
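Running the whole thing is then a single command (a sketch; the file names `site.yml` for the playbook and `hosts` for the inventory are assumptions):

```shell
# Hypothetical file names: site.yml is the playbook above, hosts is the inventory
ansible-playbook -i hosts site.yml
```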
I also define a few variables; these could of course be defined elsewhere, per role:
- cluster_name - this will replace the cluster name in each host's cassandra.yaml
- seeds - as above
So let's take a look at each role.
Cassandra
Here are the tasks:
```yaml
- name: install java
  apt: name=openjdk-7-jre state=present update_cache=yes
- name: add cassandra debian repository
  apt_repository: repo='deb http://www.apache.org/dist/cassandra/debian 20x main' state=present
- name: add the key for the cassandra debian repo
  apt_key: keyserver=pgp.mit.edu id=F758CE318D77295D
- name: add the other key for cassandra
  apt_key: keyserver=pgp.mit.edu id=2B5C1B00
- name: install cassandra
  apt: name=cassandra state=present update_cache=yes
- name: override cassandra.yaml file
  template: src=cassandra.yaml dest=/etc/cassandra/
- name: make sure cassandra is started
  service: name=cassandra state=restarted
```
- Installing a JRE
- Adding the Apache Cassandra debian repository
- Adding the keys for the debian repository
- Installing the latest version of Cassandra
- Replacing the cassandra.yaml (details later)
- Ensuring Cassandra is started
The template cassandra.yaml uses the following variables:
- `cluster_name: '{{ cluster_name }}'` - so we can rename the cluster
- `- seeds: "{{ seeds }}"` - so when we add a new node it connects to the cluster
- `listen_address: {{ inventory_hostname }}` - listen on the node's external IP so other nodes can communicate with it
- `rpc_address: {{ inventory_hostname }}` - so we can connect Ops Center and cqlsh to the nodes
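Put together, the templated part of cassandra.yaml looks roughly like this (a sketch; the real template carries over all the other default settings, and `SimpleSeedProvider` is the stock seed provider):

```yaml
cluster_name: '{{ cluster_name }}'
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      - seeds: "{{ seeds }}"
listen_address: {{ inventory_hostname }}
rpc_address: {{ inventory_hostname }}
```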
Magic! Now adding new hosts to my hosts file under the m25-cassandra group will get Cassandra installed, connected to the cluster and started.
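For example, the hosts file might look like this (the group name matches the `hosts:` line in the playbook; the first IP is the seed from the vars, the second is a hypothetical new node):

```ini
[m25-cassandra]
10.79.134.173
10.79.134.174
```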
Ops Center
The tasks file for Ops Center:
```yaml
- name: add the debian package for ops center
  apt_repository: repo='deb http://debian.datastax.com/community stable main' state=present
- name: add the key for the ops center repository
  apt_key: url=http://debian.datastax.com/debian/repo_key state=present
- name: install ops center
  apt: name=opscenter state=present update_cache=yes
- name: start opscenter
  service: name=opscenterd state=started
```
- Adding the DataStax community Debian repository
- Adding the key for the repo
- Installing Ops Center
- Starting Ops Center
No templates here as all the default configuration is fine.
Spark
The Spark Maven build can produce a Debian package, but I didn't find a public Debian repo containing it, so the following tasks just download and extract the Spark package:
```yaml
- name: download spark
  get_url: url=http://d3kbcqa49mib13.cloudfront.net/spark-1.0.0-bin-hadoop2.tgz dest=/opt/
- name: unzip spark
  unarchive: copy=no src=/opt/spark-1.0.0-bin-hadoop2.tgz dest=/opt
```
I start the workers using the start-slaves.sh script from my local master, so I don't need to start anything on the nodes that have Cassandra on them.
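For reference, with the /opt layout from the tasks above, starting the workers from the master looks something like this (it assumes conf/slaves on the master already lists the worker hosts):

```shell
# Run on the master; start-slaves.sh reads conf/slaves for the worker host list
/opt/spark-1.0.0-bin-hadoop2/sbin/start-slaves.sh
```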
Conclusion
Ansible makes it very easy to install distributed systems like Cassandra. The thought of doing it manually fills me with pain! This is just a PoC; I don't suggest downloading Spark from the public internet or always installing the latest version of Cassandra for your production systems. The full source including templates and directory structure is here.