Christopher Batey's Blog has moved to www.batey.info: August 2013

Saturday, August 31, 2013

Book Review: Driving Technical Change: Why People on Your Team Don't Act on Good Ideas, and How to Convince Them They Should

http://www.amazon.co.uk/Driving-Technical-Change-People-Convince/dp/1934356603

Author: Terrence Ryan

Before reading a technical book I tend to look at what the author is about. Terrence works for Adobe, has an active github account where it looks like he works primarily with front end technologies: CSS, JavaScript.

Review

I categorise books as follows:

Reference - useful for a particular technology when actively using it.
Tutorial/teach - useful for learning about a technology, easy to read even if not currently using it. For example beginners guides for technologies.
General technical books - describing methodologies, practices and general feel good technical books. Books like Clean Code by Uncle Bob fall into this category.

Driving technical change definitely falls into category number three. I read it while commuting and got through it in a couple of hours. It is split into the following sections:

Describing the type of people that typically oppose change. These are:

The unaware: don't want the change because they are ignorant of the new technology/process. These people are easy to get on your side if you help them.
The burned: don't want change because they have previously been burned by the technology/process in question.
The cynic: prefer to look clever by rejecting change. These people tend to have a superficial knowledge that if you are well prepared you can get them on your side.
The irrational: people who don't want change for an irrational reason e.g they have a personal problem with you. Best to avoid trying to convince these people.

Techniques for driving technical change. This section gives various techniques and indicates which type of person they are effective against. I found this section of the book eventually become quite repetitive.
Finally strategies, how to use the techniques and in which order to actual drive the technical change you want.

Overall I think this book is worth a read as it is very quick to get through and can make you think about whether your colleagues fit into any of the stereotypes. Or perhaps more importantly do you yourself fall into one of them when other people are trying to drive change?

After reading the book it made me realise that if I think a team needs to adopt a new technology or process then it is up to me to put in the work to prove why. Too often people in the work place are trying to introduce new technologies without being able to answer the simple question: What problem is it solving? Or how is it going to improve the software we're producing?

Sunday, August 18, 2013

Groovy MockFor: mocking multiple invocations of the same method with different arguments

Coming at Groovy from a Java background it wasn't long before I wanted to see what mocking was like in Groovy. I'm fond of Mockito's flavour of mocking where I setup my mocks to behave appropriately, poke the class under test, then verify the correct invocations have been made with the class under test's dependencies. Saying that I have nothing against the setup expectations and then fail at the end if they haven't been met.

So Groovy MockFor puzzled me for a while. Lets look at an example of testing a very simple Service class that is required to process entries in a map and store each of them:

And the Datastore looks like this:

Now lets say we want to test that when the Service is initialised that we "open" the Datastore. This can be achieved with MockFor quite easily:

Lets take see what is going on here.

On line 9 we create a new MockFor object to mock the DataStore.
On line 10 we demand that the open method is called and we return true.
On line 11 we get a proxy instance that we can pass into the Service class.
On line 14 we create an instance of the Service class passing in our mocked Datastore.
Finally on line 17 we verify that all our demands have been met.

It is worth pointing out that the MockFor isn't strongly typed, we could have mocked any method even if it doesn't exist. If we run this test we'll get the following failure:

junit.framework.AssertionFailedError: verify[0]: expected 1..1 call(s) to 'open' but was called 0 time(s)

Brilliant. Now lets move onto a move complicated example. Lets say we want to test the processEntry method takes each of the entries and stores them in the Datastore. This is when it became apparent to me that what is happening on line 10 is a closure that is executed when the mocked method is called. It just happened to return true as that was the last statement in the closure and Groovy doesn't always require the return statement. My first attempt to test the above scenario led me to:

Okay so the test fails now:

junit.framework.AssertionFailedError: verify[1]: expected 1..1 call(s) to 'storeField' but was called 0 time(s)

However we can make it pass without writing any meaningful code:

But changing line 6 to the following:

Means we'll get the following failure:

Assertion failed:
assert v == "someValue"
| |
| false
this is a made up value

And then we can actually write some sensible code to make it pass as now we're actually asserting that the correct values have been passed in.

Now finally onto the problem from the title. How do we mock multiple invocations to the same method with different arguments? We may want to do this when we verify all the values in the map passed to processEntry are passed to the Datastore rather than just the first one. This is where if you are coming from a Java/Mockito background you need to think differently. The solution I used in a similar situation looked like this:

Here we've written more complicated code in the closure that will be executed when the storeField method is called. Depending on what the key is that has been passed in a different assertion is executed. Of course you need to add some more code to the closure if we wanted to verify that no other values have been passed in.

This same style can be used to return different values depending on input parameters e.g.

I've also included the equivalent Java Mockito code for comparison.

Friday, August 16, 2013

Gradle, Groovy and Java

I've been moving my projects from Maven to Gradle and it has given me my first exposure to Groovy as it's Gradle's language of choice. Gradle is the perfect build tool for Java projects, Groovy projects and polyglot projects containing any combination of JVM languages. To get going with Groovy with Gradle you need to add the Groovy plugin to a Gradle script - this extends the Java plugin:

apply plugin: 'groovy'

Then the Gradle build will look for both Groovy and Java when you build your project:

Christophers-MacBook-Pro:testing chbatey$ gradle build
:compileJava UP-TO-DATE
:compileGroovy UP-TO-DATE
:processResources UP-TO-DATE
:classes UP-TO-DATE
:jar UP-TO-DATE
:assemble UP-TO-DATE
:compileTestJava UP-TO-DATE
:compileTestGroovy UP-TO-DATE
:processTestResources UP-TO-DATE
:testClasses UP-TO-DATE
:test UP-TO-DATE
:check UP-TO-DATE
:build UP-TO-DATE

BUILD SUCCESSFUL

IntelliJ's support for Gradle has improved vastly as of late but if you experience problems revert to the Gradle idea plugin instead - I tend to use it if the Gralde project is more than a simple Groovy/Java project. Importing a project with just the above in your Gradle script and having created source folders for Java and Groovy IntelliJ should recognise everything:

Now lets create some Groovy and Java! To allow us to interchange between Java and Groovy and be able to add some unit tests, add the following to the build.gradle:

repositories {

mavenCentral()

}

dependencies {

compile 'org.codehaus.groovy:groovy:2.1.6'

testCompile 'junit:junit:4.+'

}

Now lets see a Groovy class testing a Java class:

On the left we can see a Java class in the main source tree and on the left a Groovy class testing it. This is a nice way to learn some Groovy and write tests with less boilerplate. You can say goodbye to types and semicolons. It is an easy transition for a Java developer as all Java is valid Groovy so you can write Java and Groovyify it as much or as little as you like. There is nothing stopping you from adding production Groovy code and testing it with Java as well.

If you want to stay on the command line then just run gradle test from the command line. If you want the source it's on GitHub: https://github.com/chbatey/GroovyTestingJava

Thursday, August 15, 2013

I don't fix bugs any more

One part of being a software engineer that I used to enjoy was the satisfaction of cracking a hard bug. Trawling through logs, using profilers and heap dumps and theorising on different ways different threads could have interleved etc.

This week it suddenly dawned on me that I don't have to do this any more!

I've spent the last 6+ months working with some great people and we follow a number of great XP practices:

All our acceptance tests are automated, many of them before development has started
All code is done TDD
All production code is paired on
We swap pairs every day, so unless a feature is completed in under a day most of our team will see the code

In addition we also have a great definition of done; for a particular feature it means:

Development complete (implicitly reviewed by the fact it was paired on)
All edge cases functionally tested and signed off
Deployed to our UAT environment

We always go the extra mile for functional testing. If a feature states we should store something persistently we don't only check that it has been but test all the edge cases around the datastore being down, being slow, traffic being dropped by a firewall etc. If is is a multi node cluster datastore (e.g Cassandra) we'll test that we still function when nodes go down etc. And when I say test I don't mean a quick manual test but automated tests running against every commit.

This is the first team I've worked in where I haven't had to fight for software to be developed like this. Regardless of how hard it is to create automated tests for a particular scenario - we'll do it.

So if you like fixing bugs or being woken up in the middle of the night with production problems - don't develop software like this!

I can go on holiday whenever I like: how pairing with regular swaps make my life easier

I'm extremely keen and enthusiastic. When I'm working, if there is someone who knows more about something than I do then I am compelled to study/play with it until I am up to speed. If there is an area of a system I don't understand I'll study it over lunch - I just like knowing things!

The problem with people who have this characteristic is that they tend to become "the expert" or at least the go-to person for different things and this can cause some problems (or at least I see them as problems):

these people get distracted a lot with other people them asking them questions
they become a bottleneck because their input is needed to make decisions
they eventually become "architects" or "tech leads" and don't get to develop any more - this makes them sad :(

In my current project we are fairly strict about pair swapping: every day after our standup we swap people around. One person always stays on a feature so there is continuity and so everyone typically spends two days on a particular task. It's enough time to get stuck in but not bogged down.

After doing this for a few months I've noticed the following:

We have no code ownership at all, if a single pair stays on a feature or one person is always part of the pair that works in a particular area this tends still to happen. Not so for us and I think that is fantastic
Due to number one, we refactor and delete code without anyone getting offended
People don't get disheartened with boring tasks or features as their time working on them is reasonably short
You constantly get fresh views on how we are doing things. Even if you've been on the feature before having four days off means that you still generally have fresh ideas
We have no fake dependencies between features - if there are free developers then the highest priority item will be worked on. No waiting for someone to finish a different feature.

After a while I noticed that we would not swap if something was "nearly done". Why? I guess we thought it would be disruptive and the task would take longer. I began not to like this. I brought it up in a retro and we went back to strict swapping. Now I know what you are thinking - processes shouldn't be rigid, relax man!

However...

When a software engineer says something is nearly done alarm bells should go off! A lot of the time it can still take hours or days so a swap is still a good idea. Or let's give software developers some credit - let's say the feature is actually nearly done. I think that is the perfect time to get fresh eyes on the code. Make sure the entire team is happy with how it's been implemented before we call it done! A new set of eyes may see some new edge cases etc too.

So what am I saying? Daily pair swaps means I'm confident the entire team knows our code base and feature set and can answer questions on both - no more "I don't know - let's check with Ted when he gets back from holiday". This means I can go on holiday whenever I like, and I like holidays so this is great!

Thursday, August 8, 2013

Cassandra vs MongoDB: Replication

What is replication?

Most databases offer some way to recover data in the event of hardware failure. In Cassandra and MongoDB this is achieved via their respective replication strategies where the same data is stored on multiple hosts. In addition replication usually goes hand-in-hand with sharding so I've mentioned some details on sharding. To the impatient people, just read the Q&As :)

Cassandra replication

Cassandra was never designed to be run as a single node - the only time this is likely is in a development environment. This becomes clear as soon as you create a keyspace as you must provide a replication factor. For example:

CREATE KEYSPACE People
           WITH replication = {'class': 'SimpleStrategy', 'replication_factor' : 3}

The keyspace 'People' will replicate every row of every column family to three nodes in the cluster. A few key points about Cassandra replication:

You can create a keyspace with a replication factor greater than your number of nodes and add them later. For instance I ran the above create keyspace command on a single node Cassandra cluster running on my laptop.
You don't need all the replicas available to do an insert. This is based on the consistency of the write (a separate post will come later comparing Cassandra and MongoDB consistency).
You get sharding for free if your number of nodes is greater than your replication factor.

Cassandra is designed to be very fault tolerant - when replicating data the aim is to survive things like a node failure, a rack failure and even a datacentre failure. For this reason anything but the simplest Cassandra setup will use a replication strategy that is rack and datacentre aware. Cassandra gets this information from a snitch. A snitch is responsible for identifying a node's rack and datacentre. Examples of the types of snitch are: PropertyFileSnitch where you define each node's rack and datacentre in a file; EC2Snitch where the rack and datacentre of each node is inferred from the AWS region and availability zone and a RackInferringSnitch where the rack and datacenter of each node is inferred from its IP address. Cassandra uses this information to avoid placing replicas in the same rack and keeping a set number of replicas in each datacenter. Once you have setup your snitch then all of this just happens.

An important feature of Cassandra is that all replicas are equal. There is no master for a particular piece of data and if a node goes down there is no period where a slave replica needs to become the master replica. This makes single node failure (load permitting) nearly transparent to the client (nearly because if there is a query in progress on the node during the failure then the query will fail). Most clients can be configured to retry transparently.

What's an ideal replication factor? That depends on the number of node failures you want to be able to handle and continue working at the consistency you want to be able to write at. If you have 3 replicas and you want to always write to a majority (QUORUM in Cassandra) then you can continue to write with 1 node down. If you want to handle 2 nodes down then you need a replication factor of 5.

MongoDB replication

Fault tolerance and replication are not as apparent in MongoDB from an application developer's point of view. Replication strategy is usually tightly coupled with sharding but this doesn't have to be the case - you can shard without replication and replicate without sharding.

How does this work? Multiple MongoD processes can be added to a replica set. One of these MongoD processes will be automatically elected as the master. Every read and write must then go to the master and the writes are asynchronously replicated to the rest of the nodes in the replica set. If the master node goes down then a new master is automatically elected by the remaining nodes. This means that replication does not result in horizontal scaling as by default everything goes through the master.

Sharding in Mongo is achieved separately by separating collections across multiple shards. Shards are either individual MongoD instances or replica sets. Clients then send queries to query routers (MongoS processes) that route client requests to the correct shard. The metadata for the cluster (i.e. which shard contains what data) is kept on another process called a Config Server.

Ensuring that replicas are unlikely to fail together e.g on the same rack, is down to the cluster setup. The nodes in a replica set must be manually put on different racks etc.

Q&A - Question, Cassandra and Mongo

Q. Are any additional processes required?

C. No - All nodes are equal.

M. No - However if you want sharding a separate MongoS (query router) process and three Config servers are required. Additional nodes that don't store the data may also be required to vote on which node should become the new master in the event of a failure.

Q. Is there a master for each piece of data?

C. No - all replicas are equal. All can process inserts and reads.

M. Yes - a particular MongoD instance is the master in a replica set. The rest are asynchronously replicated to.

Q. How does a client know which replica to send reads and writes to?

C. It doesn't. Writes and reads can be sent to any node and they'll be routed to the correct node. There are however token aware clients that can work this out based on hashing of they row key to aid performance.

M. If sharding is also enabled a separate process runs called MongoS that routes queries to the correct replica set. When there is no sharding then it will discover which replica in the replica set is the master.

Q. Does the application programmer care about how the data is sharded?

C. The row key is hashed so the programmer needs to ensure that the key has reasonably high cardinality. When using CQL3 the row key is the first part of the primary key so put low cardinality fields later in a compound primary key.

M. Yes - a field in the document is the designated shard key. This field is therefore mandatory and split into ranges. Shard keys such as a dates should be avoided as eventually all keys will be in the last range and it will cause the database to require re-balancing.

Christopher Batey's Blog has moved to www.batey.info