Friday, March 27, 2015

Cassandra anti-patterns webinar: Video and Q&A

Last week I gave a webinar on avoiding anti-patterns in Cassandra. It was good fun to do and prepare and if you look through my blog most of the sections have a dedicated post.

Here is the recording:

We got a lot of questions and didn't get to them in the recording so catching up now. If I have missed yours or you think of more then ping me on twitter: @chbatey

Q: When is DSE going to support UDTs?

DSE 4.7 will include a certified version of Cassandra 2.1, sometime in the next few months.

Q: Can you alter a UDT?

Yes see here:

Q: with denormalized data, how do you handle a store name change or staff name change?

First make sure you need the update, when modelling data immutably that is not often the case. If you need to change a small number of rows I'd do it with a small script/program, large number of rows Apache Spark.

Q: I had the idea that C* 2.x has vector clock, am I wrong? 

No Vector clocks in Cassandra, see
Q: Using the event source model with frequent rollups, would that not generate a 'queueing' style anti-pattern if data from previous rollup period then gets deleted?

If you used the same partition and did range queries, yes. But I would use a partition say per day (or what ever the period is that you didn't have rolled up), thus avoiding ever reading over the deleted data.

Q: How would you do the "roll ups" in the account balance calculation example?

Most cases I'd do it in application for the first query that required it. It doesn't matter if two threads get to it first as they can both calculate it and the write to the roll up table would be idempotent. If the rollup calculation takes too long and you don't want to slow down a user request with it then you can schedule it in your app or by a different process.

Q: Why would you not use counters for balance?

Cassandra counters are more for things like statistics, page views etc. You can't update them atomically and they are slower to update then a pure write.

Q: C = Quorum?


Q: How might you go about modeling the "versioning" of time series data so as to avoid updates? I mean where you write a measurement for a particular timestamp and then later on you need to write a new measurement for the same timestamp.

Use a TimeUUID rather than a Timestamp. Then you can have millions per millisecond.

Q: If I perform an "if not exist" write and it fails to reach enough replicas, what state can I expect the data to be in? In other words, can I expect the data to not be written to the cluster?

So assuming it for past the if not exists part (for that you'll get applied = false in the response. Then it is like any write. Cassandra will return how many replicas acked the write. You can't be sure that the rest didn't get it as they may have just not have responded.

Q: I'm wondering if Cassandra could be used to implement distributed locks (Like Redis, Zookeeper)?

You can with LWTs, here are the details:

Q: In order to emulate a queue without falling on this anti-pattern, can I use the new Date Time Compaction Strategy and TTL?

Answered at the end of the recording

Q: And we have 24 table per date. After day we create one table on date and drop table per hour. Is it anti patern.

Moving table is like moving partition, it does avoid the anti-pattern but it is a lot of work.

Q: Why not change the tombstome grace period to delete quickly?

You can, but then you need to keep up with repairs which may not be possible.

Q: What would the use case for using Cassandra in a queueing pattern vs. a traditional message oriented middleware?

People typically try and use Cassandra as as queue when they already have it in their infrastructure and they need to get messages from one DC to another. This is when they fall into the anti-pattern.

Q: For the Queue anti-pattern, the > timeuid clause will help on fetch, what about compaction/jvm issues; any recommendations or comments?

Nothing specifically, the best discussion of Cassandra JVM tuning for GC that I have read is here:

Q: There are times where data simply cannot be written simultaneously and therefore must be joined at a later time. What do you recommend for joining needs? An external tool such as Spark SQL or ?

Answered at the end of the recording.

Q: Probably one of the best Webinars. Example, were really great. Appreciate DataStax arranging for this. Thanks.

Okay okay this wasn't a question :)

Q: Will quorum reads of a partially-successful counter update get the latest info?

Depends on the number of replicas the write for to and at what consistency. You'll get back in the WriteTimeoutException how many acked the write. If it is a QURORUM (e.g 2 if RF = 3) then it will read it, otherwise you don't know.

Q: Can you point to a good read for retry, no rollback?

On failure modes:

Q: How would I go about solving limit offset queries, without having to skip rows programmatically, for example taking a simple page 2 customer table?

Just make sure you have a clustering column and start the next limit query from the last result from the previous query.

Q: You said Cassandra does not do a rollback. Is that true for all cases -- are there any instance where Cassandra would do a rollback?

Not as far as I know.

Q: I missed the beginning. Are UNLOGGED batches OK to use to speed up writes? See:

Q: Great presentation. Regarding the secondary index question, the second one should be much more faster, as it hits the primary key, yes?

Yes, so it only needs to go to a small section of the secondary index table as it knows which node the partition is on.

Q: which is the best pattern for timeseries

This depends on the type of time series, quantity/frequency. What you basically want is partitions that don't grow too large, so in the millions, not hundreds of millions and the use of a TimeUUID as the clustering column.

Q: Are the batch execution started in separate threads when using the the batch optimization?

They will be sent off in parallel, I don't know the threading model here but I imagine they are split on one thread and sent aync. A good question for the cassandra devs who hangout in #cassandra on freenode.

Q: What approach can be taken with dse, which is C* 2.0 and doesn't have UDT's?

You can just have a lot of columns! The next DSE version will be 2.1

Q: Using a time bucket is a way to also prevent the rows from growing too wide (I.e. many millions of columns). Any guidance for the recommended tradeoffs between wide rows with slice queries and more narrow rows and some multi-partition queries?

There is rarely a general rule for Cassandra, it is all about your data set and read/write frequency. However in general I do my best to keep all reads from a single partition and go out of my way to keep it at most 2. If you have a very high ingest rate and you read for long periods this can get hard and you may need to go to more partitions.

Q: Do the same rules apply to batch loading when using SSTableLoader and/or the BulkOutputFormat with Hadoop?

I've never used the BulkOutputFormat with Hadoop. For the SSTableLoader. For the sstableloader command, once you have generated the SS tables then it handles the importing.

Q: is BatchType.LOGGED the default for a BatchStatement?


Q: do we have any ORM framworks for datastax cassandra

The DataStax Java/C# driver now have it built in, there is also the less popular SpringData

Q: What if you have constraint to write data in table only if it is different (by different I meant different by all properties which can be 5-10)?

If you want to write this at a high throughput then I would resolve it at read time as otherwise you'll be doing a read then write which has a lot of race conditions and it a lot slower. IF you include a TimeUUID and write all updates you can then work it out at read time.

Q: Do tombstones get created with data inserted with a TTL and automatically deleted when expired?

Yes it generated a tombstone. For immutable timeseries data the new DateTieredCompaction strategy makes deleting this data a lot more efficient.

Q: Can you go explain a bit more about the de-normalization solution to secondary indexes.

Write the same data but with a partition key as staff ID and the time as the clustering column. This means you can go to a single partition and do a range query. Even a secondary index with a partition key in the query is worse than this as it has to go to the secondary index table and then do a multi partition query in the original table keyed by customer id.

Q: Does the removal of a secondary index cause a performance hit during the delete? Assuming you aren't using the index for any queries

Don't know about this one, I've asked around and will update once i get an answer.

Q: Question about secondary indexes vs inverted inverted superior to secondary? Will global indexes replace inverted indexes?

By inverted I am assuming you mean manually inserting data twice with a different primary key. This will always out perform secondary index as you're storing all the customer events for a staff member on one node and sequentially on disk. For global indexes we'll have to wait and see but that is the idea. The only concern I have is you can specialise the double write to exactly what you want (e.g bucket up staff members or not) where as global indexes will have to be a more general solution.

Q: Using the default token split on adding a node in 1.2.x, what issues/symptoms will I experience if I continue to use this method with low numbers of nodes?

I assume you're talking about vnodes as without them you pick the token split. The allocation of tokens with vnodes is well discussed here: