Wednesday, June 4, 2014

Unit testing Java Cassandra applications

I often get asked what is the best method for unit and and integration testing Java code that communicates with Cassandra.

Here are what I think the options are:
  1. Mocking libraries: Use mocking libraries such as Mockito to mock out the driver you are using
  2. Cassandra Unit
  3. Integration test the class in question with against a real Cassandra instance running locally
  4. Stubbed Cassandra (disclaimer: I made this)
Lets take these one by one and look at their respective advantages and disadvantages. The things to consider are:
  • Speed of the tests
  • Able to run the tests concurrently and how easy this is to achieve
  • Able to test everything? Even failures?
  • Are the tests brittle?
  • Readability of the tests
  • Requirements on environment e.g do you need a Cassandra instance installed?
  • Confidence it will work against a real Cassandra
  • Subjective nonsense by writer of this blog

1 Mock the library

You can use a combination of factories and mocking to stop the driver actually interacting with Cassandra. Then verify your code's interactions with these mocks.

  • Fast - no I/O
  • Execute tests concurrently
  • No Cassandra instance required on machine your code compile runs
  • Can test everything including the various failures as you can mock the library to throw ReadTimeout exceptions etc
  • You may mock the driver to behave in a different way to how it will behave 
  • Very hard to understand tests due to large amount of boiler plate mocking code
  • Very brittle tests. Change your driver, change your tests! Change from a query to a prepared statement, change your tests!
  • A lot of boiler plate. Take the Datstax driver for example, it returns a ResultSet, which is iterable, fancy writing the code that mocks it returning many results?
  • Don't do it if you wish to remain sane

2 Cassandra Unit

Cassandra Unit is a tool for starting an embedded Cassandra in the JVM your tests are running. It also has a great API for ingesting data into Cassandra for your tests.

  • Pretty fast. It is all in process.
  • Can run tests concurrently if they use different keyspaces and none of the tests turn it off etc
  • Can use CQL to load data as it is a real Cassandra. I think this leads to readable tests
  • No Cassandra required on machine. Can use the it via a Maven dependency
  • High confidence your code will work against a real Cassandra 

  • Unable to test failure scenarios. What is a read time out when there is only one node running in the same JVM as your test?
  • No way to verify consistency of queries

  • Very useful tool for in process happy path tests

3 Integration style tests using a real Cassandra

This is something I've done a lot of recently. Every dev machine at my current work place has a Cassandra instance running (using the awesome tool ccm). Then we test our DAOs by assuming it is there and doing testing in a dynamic keyspace.

  • Can run tests concurrently if they use different keyspaces and none of the tests turn it off etc
  • Tests aren't brittle, can change from queries to prepared statements, or the queries involved without changing tests
  • Readable tests - all data setup is done in CQL
  • Very high confidence it will work against a real Cassandra as the test is against a  real Cassandra :)
  • Probably the slowest option. But it is still millisecond quick.
  • Hard to test scenarios other than turning the node off. This then makes the tests slow.
  • Cassandra required to build and run tests
  • Slightly slower but very useful for testing happy paths
  • Very similar to using Cassandra unit

4 Stubbed Cassandra

Stubbed Cassandra is a new open source tool that pretends to be Cassandra and can be primed to returns rows, read timeouts, write timeouts and unavailable errors. It can be used via a maven dependency or as a standalone server.

  • Very fast
  • Can test all types of errors and be confident in what the driver does as the driver thinks it is a real Cassandra
  • Can run many instances inside the same JVM listening on different binary ports. So can run tests concurrently with no extra effort e.g no requirement to use different keyspaces
  • Tests less brittle than mocking the driver. Can change driver without changing test but if you change queries you need to update your priming
  • No requirement to have a real Cassandra. Just brought in by a maven dependency
  • Slightly more brittle than a real Cassandra/Cassandra unit. Requires priming on the query, priming for each prepared statement
  • Slightly less confidence it will work against a real Cassandra as it isn't a real Cassandra. But more confidence than mocking
  • It is new and does not support all of Cassandra's features, so if you use a feature that Scassandra doesn't support you are stuck!

  • Best solution for all error case testing
  • Best solution if you need to execute tests concurrently


rgknp said...

Very informative post. Any suggestions on testing Scala-Cassandra apps? Particularly which use Spark and Spark Streaming.


Unknown said...

The Article on Unit testing Java Cassandra applications is very informative. It give detail information about it .Thanks for Sharing the information on Unit Testing in Java. Software Testing Company