Friday, March 27, 2015

Cassandra anti-patterns webinar: Video and Q&A

Last week I gave a webinar on avoiding anti-patterns in Cassandra. It was good fun to prepare and present, and if you look through my blog, most of the sections have a dedicated post.

Here is the recording:

We got a lot of questions and didn't get to them all in the recording, so I'm catching up now. If I have missed yours, or you think of more, then ping me on Twitter: @chbatey

Q: When is DSE going to support UDTs?

DSE 4.7 will include a certified version of Cassandra 2.1, sometime in the next few months.

Q: Can you alter a UDT?

Yes see here:

Q: with denormalized data, how do you handle a store name change or staff name change?

First make sure you need the update; when modelling data immutably that is not often the case. If you need to change a small number of rows I'd do it with a small script/program; for a large number of rows I'd use Apache Spark.

Q: I had the idea that C* 2.x has vector clock, am I wrong? 

No vector clocks in Cassandra, see:

Q: Using the event source model with frequent rollups, would that not generate a 'queueing' style anti-pattern if data from previous rollup period then gets deleted?

If you used the same partition and did range queries, yes. But I would use a partition per day, say (or whatever period you haven't yet rolled up), thus avoiding ever reading over the deleted data.
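A minimal sketch of the per-day partition idea, in plain Python with no Cassandra driver. The names `partition_key` and `partitions_to_read`, and the `(account_id, day)` key shape, are assumptions made for illustration, not something from the webinar.

```python
from datetime import datetime, timedelta, timezone


def partition_key(account_id, event_time):
    """Hypothetical partition key: one partition per account per UTC day."""
    day_bucket = event_time.astimezone(timezone.utc).date().isoformat()
    return (account_id, day_bucket)


def partitions_to_read(account_id, first_unrolled, now):
    """List only the day partitions that have not been rolled up yet.

    Partitions for earlier days (already rolled up, possibly deleted)
    are never touched, so no tombstones are read.
    """
    day = first_unrolled.astimezone(timezone.utc).date()
    last = now.astimezone(timezone.utc).date()
    keys = []
    while day <= last:
        keys.append((account_id, day.isoformat()))
        day += timedelta(days=1)
    return keys
```

Each event is written to the partition for its own day, and a read only visits the days returned by `partitions_to_read`.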

Q: How would you do the "roll ups" in the account balance calculation example?

In most cases I'd do it in the application, on the first query that requires it. It doesn't matter if two threads get to it at the same time, as they can both calculate it and the write to the rollup table would be idempotent. If the rollup calculation takes too long and you don't want to slow down a user request with it, then you can schedule it in your app or run it in a different process.
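To illustrate why the race between two threads is harmless, here is a small in-memory sketch. The dictionaries stand in for an events table and a rollup table; their layout and the function names are hypothetical.

```python
# In-memory stand-ins for the two tables (hypothetical layout).
events = {("acc1", "2015-03-26"): [100, -30, 45]}
rollups = {}


def compute_rollup(account_id, day):
    """Sum the immutable events for one account and day."""
    return sum(events.get((account_id, day), []))


def write_rollup(account_id, day, total):
    """Upsert keyed by (account_id, day): repeating it changes nothing."""
    rollups[(account_id, day)] = total


def rollup_on_first_read(account_id, day):
    """Compute and store the rollup the first time a query needs it."""
    key = (account_id, day)
    if key not in rollups:
        write_rollup(account_id, day, compute_rollup(account_id, day))
    return rollups[key]
```

Because the events for a closed period never change, every caller computes the same total, and writing it twice leaves the rollup table in the same state.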

Q: Why would you not use counters for balance?

Cassandra counters are more for things like statistics, page views etc. You can't update them atomically and they are slower to update than a pure write.

Q: C = Quorum?


Q: How might you go about modeling the "versioning" of time series data so as to avoid updates? I mean where you write a measurement for a particular timestamp and then later on you need to write a new measurement for the same timestamp.

Use a TimeUUID rather than a Timestamp. Then you can have millions per millisecond.
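As a rough illustration, Python's `uuid.uuid1()` produces version 1 (time-based) UUIDs, the same UUID version that Cassandra's timeuuid type holds. The sketch below uses it as a stand-in clustering value; the `measurements` dictionary and `record` function are hypothetical.

```python
import uuid

# Number of 100 ns intervals between 1582-10-15 (UUID epoch) and 1970-01-01.
GREGORIAN_OFFSET = 0x01B21DD213814000

# Stand-in for a table keyed by (sensor_id, timeuuid).
measurements = {}


def unix_millis(time_uuid):
    """Recover the Unix time in milliseconds embedded in a version 1 UUID."""
    return (time_uuid.time - GREGORIAN_OFFSET) // 10_000


def record(sensor_id, value):
    """Store a measurement under a fresh time-based UUID, never overwriting."""
    key = (sensor_id, uuid.uuid1())
    measurements[key] = value
    return key
```

Two measurements recorded within the same millisecond get different keys, so the second is a new row rather than an update of the first.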

Q: If I perform an "if not exist" write and it fails to reach enough replicas, what state can I expect the data to be in? In other words, can I expect the data to not be written to the cluster?

Assuming it got past the IF NOT EXISTS check (if it didn't, you'll get applied = false in the response), it is then like any other write. Cassandra will return how many replicas acked the write. You can't be sure that the rest didn't get it, as they may just not have responded.

Q: I'm wondering if Cassandra could be used to implement distributed locks (Like Redis, Zookeeper)?

You can with LWTs, here are the details:

Q: In order to emulate a queue without falling on this anti-pattern, can I use the new Date Time Compaction Strategy and TTL?

Answered at the end of the recording

Q: We have 24 tables per day, one per hour. After each day we create one table for the date and drop the hourly tables. Is that an anti-pattern?

Moving to a new table is like moving to a new partition: it does avoid the anti-pattern, but it is a lot of work.

Q: Why not change the tombstone grace period to delete quickly?

You can, but then you need to keep up with repairs which may not be possible.

Q: What would the use case for using Cassandra in a queueing pattern vs. a traditional message oriented middleware?

People typically try to use Cassandra as a queue when they already have it in their infrastructure and they need to get messages from one DC to another. This is when they fall into the anti-pattern.

Q: For the Queue anti-pattern, the > timeuuid clause will help on fetch, what about compaction/JVM issues; any recommendations or comments?

Nothing specifically, the best discussion of Cassandra JVM tuning for GC that I have read is here:

Q: There are times where data simply cannot be written simultaneously and therefore must be joined at a later time. What do you recommend for joining needs? An external tool such as Spark SQL or ?

Answered at the end of the recording.

Q: Probably one of the best webinars. The examples were really great. Appreciate DataStax arranging for this. Thanks.

Okay okay this wasn't a question :)

Q: Will quorum reads of a partially-successful counter update get the latest info?

It depends on the number of replicas the write got to and at what consistency you read. The WriteTimeoutException tells you how many replicas acked the write. If that is a QUORUM (e.g. 2 if RF = 3) then a quorum read will see it; otherwise you don't know.
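The arithmetic behind this can be sketched as follows. A quorum is a majority of replicas, and a read is guaranteed to touch at least one replica that acked the write when the two counts together exceed the replication factor. The function names are made up for illustration.

```python
def quorum(replication_factor):
    """Majority of replicas, e.g. 2 when RF = 3."""
    return replication_factor // 2 + 1


def read_guaranteed_to_see_write(acked, read_replicas, replication_factor):
    """True when the read set must overlap the replicas that acked the write.

    False does not mean the write is invisible: replicas that never
    acked may still have applied it, so the outcome is simply unknown.
    """
    return acked + read_replicas > replication_factor
```

With RF = 3, a write acked by 2 replicas is always seen by a quorum read of 2, while a write acked by only 1 may or may not be.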

Q: Can you point to a good read for retry, no rollback?

On failure modes:

Q: How would I go about solving limit offset queries, without having to skip rows programmatically, for example taking a simple page 2 customer table?

Just make sure you have a clustering column and start the next limit query from the last result from the previous query.
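A small in-memory sketch of this style of paging: the sorted list stands in for the rows of one partition ordered by a clustering column, and `fetch_page` is a hypothetical helper that plays the role of a limit query with a "greater than the last value seen" condition.

```python
import bisect


def fetch_page(rows, page_size, after=None):
    """Return up to page_size rows that sort strictly after `after`.

    `rows` must already be sorted, as rows within a partition are
    ordered by their clustering column.
    """
    start = 0 if after is None else bisect.bisect_right(rows, after)
    return rows[start:start + page_size]


# Customer names acting as the clustering column values.
customers = sorted(["dave", "alice", "erin", "bob", "carol"])

page_1 = fetch_page(customers, 2)
# Start page 2 from the last result of page 1 instead of skipping rows.
page_2 = fetch_page(customers, 2, after=page_1[-1])
```

The client keeps only the last clustering value it saw, so no rows are read and thrown away to reach a later page.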

Q: You said Cassandra does not do a rollback. Is that true for all cases -- are there any instance where Cassandra would do a rollback?

Not as far as I know.

Q: I missed the beginning. Are UNLOGGED batches OK to use to speed up writes?

See:

Q: Great presentation. Regarding the secondary index question, the second one should be much faster, as it hits the primary key, yes?

Yes, so it only needs to go to a small section of the secondary index table as it knows which node the partition is on.

Q: which is the best pattern for timeseries

This depends on the type of time series, quantity/frequency. What you basically want is partitions that don't grow too large, so in the millions, not hundreds of millions and the use of a TimeUUID as the clustering column.

Q: Are the batch executions started in separate threads when using the batch optimization?

They will be sent off in parallel. I don't know the threading model here, but I imagine they are split on one thread and sent async. A good question for the Cassandra devs who hang out in #cassandra on freenode.

Q: What approach can be taken with DSE, which is C* 2.0 and doesn't have UDTs?

You can just have a lot of columns! The next DSE version will be based on Cassandra 2.1.

Q: Using a time bucket is a way to also prevent the rows from growing too wide (i.e. many millions of columns). Any guidance for the recommended tradeoffs between wide rows with slice queries and more narrow rows and some multi-partition queries?

There is rarely a general rule for Cassandra, it is all about your data set and read/write frequency. However in general I do my best to keep all reads from a single partition and go out of my way to keep it at most 2. If you have a very high ingest rate and you read for long periods this can get hard and you may need to go to more partitions.

Q: Do the same rules apply to batch loading when using SSTableLoader and/or the BulkOutputFormat with Hadoop?

I've never used the BulkOutputFormat with Hadoop. For the sstableloader command, once you have generated the SSTables it handles the importing.

Q: is BatchType.LOGGED the default for a BatchStatement?


Q: Do we have any ORM frameworks for DataStax Cassandra?

The DataStax Java and C# drivers now have object mapping built in; there is also the less popular Spring Data.

Q: What if you have constraint to write data in table only if it is different (by different I meant different by all properties which can be 5-10)?

If you want to write this at a high throughput then I would resolve it at read time, as otherwise you'll be doing a read then a write, which has a lot of race conditions and is a lot slower. If you include a TimeUUID and write all updates, you can then work it out at read time.
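One way to picture resolving this at read time: every update is written blindly with an ordering value, and the reader collapses consecutive updates whose properties are identical. In this sketch plain integers stand in for the TimeUUID ordering, and `distinct_versions` is a hypothetical helper.

```python
def distinct_versions(updates):
    """Collapse consecutive updates that carry identical properties.

    `updates` is a list of (order, properties) pairs, where `order`
    stands in for the TimeUUID written with each update.
    """
    result = []
    for order, props in sorted(updates, key=lambda update: update[0]):
        if not result or result[-1][1] != props:
            result.append((order, props))
    return result


# Every write goes in unconditionally; duplicates are dropped on read.
written = [
    (1, {"name": "Store A", "city": "London"}),
    (2, {"name": "Store A", "city": "London"}),
    (3, {"name": "Store A", "city": "Leeds"}),
]
history = distinct_versions(written)
```

The write path never reads, so there is no read-then-write race; the cost is a little extra work and storage on the read side.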

Q: Do tombstones get created with data inserted with a TTL and automatically deleted when expired?

Yes, it generates a tombstone. For immutable time series data the new DateTieredCompactionStrategy makes deleting this data a lot more efficient.

Q: Can you explain a bit more about the de-normalization solution to secondary indexes?

Write the same data again, but with the staff ID as the partition key and the time as the clustering column. This means you can go to a single partition and do a range query. Even a secondary index with a partition key in the query is worse than this, as it has to go to the secondary index table and then do a multi-partition query in the original table keyed by customer ID.
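The double write can be sketched in memory like this. The two dictionaries stand in for two tables, one partitioned by customer ID and one by staff ID, each kept sorted by time; all names and the row layout are assumptions for illustration.

```python
import bisect
from collections import defaultdict

# Two hypothetical tables holding the same events under different keys.
events_by_customer = defaultdict(list)
events_by_staff = defaultdict(list)


def record_event(customer_id, staff_id, time, detail):
    """Write the event twice, once per query pattern."""
    bisect.insort(events_by_customer[customer_id], (time, staff_id, detail))
    bisect.insort(events_by_staff[staff_id], (time, customer_id, detail))


def events_for_staff(staff_id, start, end):
    """Range query over a single staff partition: start <= time < end."""
    rows = events_by_staff.get(staff_id, [])
    low = bisect.bisect_left(rows, (start,))
    high = bisect.bisect_left(rows, (end,))
    return rows[low:high]
```

The staff query touches one sorted partition and slices it, which is the in-memory analogue of a single-partition range query.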

Q: Does the removal of a secondary index cause a performance hit during the delete? Assuming you aren't using the index for any queries

Don't know about this one. I've asked around and will update once I get an answer.

Q: Question about secondary indexes vs inverted indexes: are inverted indexes superior to secondary indexes? Will global indexes replace inverted indexes?

By inverted I am assuming you mean manually inserting data twice with a different primary key. This will always outperform a secondary index, as you're storing all the customer events for a staff member on one node and sequentially on disk. For global indexes we'll have to wait and see, but that is the idea. The only concern I have is that you can specialise the double write to exactly what you want (e.g. bucket up staff members or not), whereas global indexes will have to be a more general solution.

Q: Using the default token split on adding a node in 1.2.x, what issues/symptoms will I experience if I continue to use this method with low numbers of nodes?

I assume you're talking about vnodes as without them you pick the token split. The allocation of tokens with vnodes is well discussed here:

