Monday, December 22, 2014

Getting started: Cassandra + Spark with Vagrant

I play with a lot of different technologies and I like to keep my workstations clean. I do this by having a lot of Vagrant VMs. My latest is Apache Spark with Apache Cassandra. We're going to install a working setup of Cassandra/Spark using Vagrant and Ansible. The Vagrant/Ansible setup is on GitHub here.

To get going you'll need:
If you haven't used Ansible before, ignore the paid-for Ansible Tower product and install Ansible with your favourite package manager, e.g. Homebrew or apt.
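
If you want a quick check that the tooling is in place before going further, something like the sketch below should do. The command names are my assumption (Vagrant, Ansible's ansible-playbook, and VirtualBox's VBoxManage as the Vagrant provider), and missing_tools is just a helper name I made up; adjust the list if you use a different provider.

```shell
# Print the name of every given command that is not on PATH.
# missing_tools is a hypothetical helper, not part of Vagrant or Ansible.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Assumed prerequisites: Vagrant, Ansible, VirtualBox.
missing_tools vagrant ansible-playbook VBoxManage
```

If this prints nothing, everything it looked for was found; any name it prints still needs installing.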

Once that's installed, check out the Vagrantfile.

Then launch the VM with vagrant up. This can take some time as it actually installs:
  • Java
  • Cassandra
  • Spark
  • Spark Cassandra connector
I could have baked a VirtualBox image with all this in, but the Ansible also documents how you install all of these (for you, and for me once I've forgotten). As well as being slow, this approach has the disadvantage that it downloads Cassandra and Spark during provisioning, so if their repositories are down it won't work.

The VM runs on port Your Spark master should be up and running on

You'll also have OpsCenter installed at:

To add the cluster simply click "Add existing cluster.." then enter the IP

If you want to use cqlsh then simply "vagrant ssh" in and then run "cqlsh"
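
The same check can be done in one go from the host. This is only a sketch: check_vm is a name I made up, and it assumes the cqlsh installed on the VM supports the -e flag (execute a statement and exit), which older versions may not.

```shell
# Bring the VM up, then run a single CQL statement inside it as a smoke test.
# check_vm is a hypothetical helper; it assumes cqlsh on the VM supports -e.
check_vm() {
  vagrant up &&
  vagrant ssh -c 'cqlsh -e "DESCRIBE KEYSPACES"'
}
```

Call check_vm from the directory containing the Vagrantfile. If Cassandra is up you should see at least the system keyspaces listed.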

To get the Spark shell up and running, just "vagrant ssh" in and then run the spark-shell command:

The Spark shell has been aliased to include the Spark Cassandra connector, so you can start using Cassandra-backed RDDs right away!

Any questions or problems, just ping me on Twitter: @chbatey


Luis said...

Thank you for all your work!

I was wondering if this method was still available. When I try to run the vagrant up command I get "host not found".

michael said...