Thursday, May 28, 2015

Cassandra Aggregates - min, max, avg, group by

This blog has moved to batey.info and won't be updated here.

Disclaimer: all this was against 2.2-beta so the syntax may have changed.

Cassandra 2.2 introduced user defined functions and user defined aggregates. We can now do things like min, max and average on the server rather than having to bring all the data back to your application. Max and min are built in but we'll see how you could have implemented them your self.

Max/Min


Here's an example table for us to try and implement max/min against.


User defined aggregates work by calling your user defined function on every row returned from your query, they differ from a function because the first value to the function is state that is passed between rows, much like a fold.

Creating an aggregate is a two or three step process:
  1. Create a function that takes in state (any Cassandra type including collections) as the first parameter and any number of additional parameters
  2. (Optionally) Create a final function that is called after the state function has been called on every row
  3. Refer to these in an aggregate
For max we don't need a final function but we will for average later.


Here we're using Java for the language (you can also use JavaScript) and just using Math.max. For our aggregate definition we start with (INITCOND) null (so it will return null for an empty table) and then set the state to be the max of the current state and the value passed in. We can our new aggregate like:


GroupBy


So there's no group by keyword in Cassandra but you can get similar behaviour with a custom user defined aggregate. Imagine you had a table that kept track of everything your customers did e.g



We can write a UDA to get a count of a particular column:


And we keep track of the counts in a map. Example use for counting both the event_type and the origin of the event:


More often than not when you use group by in other databases you are totalling another field. For example imagine we were keeping track of customer purchases and wanted a total amount each customer has spent:



We can create a generate aggregate for that called group_and_total:


And an example usage:


As you can see Haddad spends way too much.

Average


The Cassandra docs have an example of how to use a user defined aggregate to calculate aggregates: http://cassandra.apache.org/doc/cql3/CQL-2.2.html#udas

Small print


If you've ever heard me rant about distributed databases you've probably heard me talk about scalable queries, ones that work on a 3 node cluster as well as a 1000 node cluster. User defined functions and aggregates are executed on the coordinator. So if you don't include a partition key in your query all the results are brought back to the coordinator for your function to be executed, if you do a full table scan for your UDF/A don't expect it to be fast if your table is huge.

This functionality is in beta for a very good reason, it is user defined code running in your database! Proper sandboxing e.g with a SecurityManager will be added before this goes mainstream.




28 comments:

vignesh m said...
This comment has been removed by a blog administrator.
karmadip dodiya said...

Great job,
Can you know when approximate Cassandra 2.2 production version will release?

Christopher Batey said...

2.2 is on release candidate 2 so it will be released soon. As to when it is "production ready" that is normally after a few point releases. A general rule of thumb is to wait for DataStax to roll it into DataStax Enterprise :)

keypoint said...

Hi Christopher,

2.2 is now released and I read the documentation, but still no "group by" function. Is it still needed to write a separate function like "state_group_and_count" to achieve group by in SQL? I'm new and not quite sure. Thanks a lot.

Arbi Akhina said...

I'm having a OperationTimedOut: errors={}, last_host=127.0.0.1 when I run a similar aggregate query from CQLSH http://stackoverflow.com/questions/32567571/failed-to-run-custom-aggregation-operationtimedout-errors-last-host-127-0-0

Martin said...

Hi Christoper,

Great examples. Unfortunately the grouping example does not seem to work anymore. In Cassandra 3 beta 2 the incoming state map is a java.util.collections$UnmodifiableMap and it fails with a java.lang.UnsupportedOperationException when using the function.

Anuj said...

Hi Christoper,

I am new to spark/cassandra. I'm using cassandra 2.2 & spark 1.5.1 for internal POC work.
I am successfully able to execute aggregate functions through cqlsh.

I wanted to know is there any way to call the same function using datastax spark cassandra java API ? or I can create/register user defined function in spark itself and perform the same operations on cassandra rows ?

I have already spent couple of days without any success.

Thanks
Anuj Jain

Muhammad Rafif Murazza said...

Hi Christopher,

I just want to ask you one question, can we use this UDA to aggregates values from collections data type?
for examples, I have a field set .
Id | setfield |
1 | {a,b} |
2 | {a,c} |
3 | {a,b} |

is it possible to count the value so that I get the result:
{a : 3, b :2, c:1} ?

Thank you.
Raz

Pratik Shekhar said...

I really appreciate information shared above. It’s of great help. If someone want to learn Online (Virtual) instructor lead live training in TECHNOLOGY , kindly contact us http://www.maxmunus.com/contact
MaxMunus Offer World Class Virtual Instructor-led training on TECHNOLOGY. We have industry expert trainer. We provide Training Material and Software Support. MaxMunus has successfully conducted 100000+ pieces of training in India, USA, UK, Australia, Switzerland, Qatar, Saudi Arabia, Bangladesh, Bahrain and UAE etc.
For Demo Contact us.
Pratik Shekhar
MaxMunus
E-mail: pratik@maxmunus.com
Ph:(0) +91 9066268701
http://www.maxmunus.com/

Dattatray Bhosale said...

I really appreciate information shared above. It’s of great help. If someone want to learn Online (Virtual) instructor lead live training in Apache Cassandra, kindly contact us http://www.maxmunus.com/contact
MaxMunus Offer World Class Virtual Instructor led training on TECHNOLOGY. We have industry expert trainer. We provide Training Material and Software Support. MaxMunus has successfully conducted 100000+ trainings in India, USA, UK, Australlia, Switzerland, Qatar, Saudi Arabia, Bangladesh, Bahrain and UAE etc.
For Demo Contact us.
Sangita Mohanty
MaxMunus
E-mail: sangita@maxmunus.com
Skype id: training_maxmunus
Ph:(0) 9738075708 / 080 - 41103383
http://www.maxmunus.com/

rock Miller said...

Obtaining personalized antivirus solutions for your computer system is now possible. With the help of AVG Support Number USA

Printer Support 0800-046-0571 All Printer Solutions said...

Absolutely great post here. Thank you for Blog posting.. I am waiting your next post..
Avg Support Number UK
Avg Help Number UK
Avg Phone Number UK

Customer Help Number said...

However, AVG is designed in such a way that whenever a new update is available it automatically gets updated but sometimes for technical reasons it misses the update. In that case, you can call our technical experts on AVG Help Number UK and let us help you in manually updating the software.
AVG Help Number UK

Kim Leong said...

Nicely written and explained. If you are an Avast user then you can consult our techies on Avast Help Number UK.

Avg Support Number 08000418254 UK said...


This blog is a very valuable piece of information
Avg Phone Number UK
Avg Support Number UK
McAfee Help Number UK

Customer Help Number said...

The user, first of all, needs to interact with a specialist of AVG Antivirus. For this dialing AVG, Antivirus Contact Number UK by the user is the primary and most authentic way of receiving accurate answers. Professionals of AVG are the best in giving out workable solutions.
AVG Helpline Number UK

Oliver Brown said...

I'm certainly very happy to read this blog site posts which carries plenty of helpful data
Avg Help Number UK
Avg Helpline Number UK
Avg Support Number UK
Avg Contact Number UK
Avg Technical Help Number UK

Technical Support All Antivirus Product said...

Thank you for sharing this blog. I am waiting for your next blog.
AVG is antivirus software that is a bit complicated for the users from non-technical background to understand when it comes to its technical aspects. If in case you are unable to understand and fix any technical issue related to your antivirus on your own then ask support at Avg Support Number UK | Avg Customer Care Number UK | Avg Phone Number UK

Unknown said...

in php I wrote

execute(new Cassandra\SimpleStatement($sql));
foreach ($result as $row) { var_dump($row);}
?>


$result ​​return :


array (size=1)
'stat.group_and_count(date)' =>
object(Cassandra\Map)[6]
public 'type' =>
object(Cassandra\Type\Map)[9]
public 'keyType' =>
object(Cassandra\Type\Scalar)[7]
...
public 'valueType' =>
object(Cassandra\Type\Scalar)[8]
...
public 'keys' =>
array (size=3)
0 => string '2018-10-04' (length=10)
1 => string '2018-10-06' (length=10)
2 => string '2018-10-28' (length=10)
public 'values' =>
array (size=3)
0 => int 1
1 => int 1
2 => int 3

but how to retrieve result values ​​in separate array ?

ex:

$key[0]= '2018-10-04';
$value[0] = 1;
$key[1]= '2018-10-06';
$value[1] = 1;
$key[1]= '2018-10-28';
$value[1] = 3;

thanks in advance

Teju Teju said...

It is nice blog Thank you provide important information and I am searching for the same information to save my time Big Data Hadoop Online Training

kasperskyhelpnumber said...

Kaspersky is the antivirus that is very efficiently facilitating the security of the systems from all types of malware spyware Trojans and also other unwanted harmful elements. For solution to any type of technical errors or troubles ask the expert at kaspersky Support UK. The team of technicians is well versed with the handling of technical faults and errors related to the software. More help click here kaspersky Helpline UK

Technical Support All Antivirus Product said...

Avast is an antivirus that is serving as security software from more than 30 years, for any technical issue that you get to face while using this software then connect help and support from the experts at Avast Phone Number UK. The team of experts will provide an easy guide for whatever complex issue you will be facing. More help click here Avast Support UK

Albert Smith said...

Avg is an antivirus known for providing the users with all the best possible solutions that can keep the system secure from all types of unwanted harmful elements and also from all unwanted online suspicious activities. For solution to any technical glitch that you get to face while using this software then connect to the team at Avg Customer Care Number UK & Avg Phone Number UK

Antivirus Support said...

AVG is software that provides security to all the systems from all types of malware, spyware and other harmful unwanted elements, but it often happens that the user finds himself or herself stuck in the web of complicated technologies therefore for solution to these connect at Avg Help Number UK. The lines are kept open all the time you can connect from anywhere. More help click here Avg Helpline Number UK

Mcafee Support Number UK said...

I am very much impressed with the simple understandable language written in this piece as I am a person who is not a pro at English therefore for me this blog was still understandable.
How to stop Bullguard from quarantining false positives?
How to fix problems related to McAfee installation on windows 7?
What to do my recent Norton update as disabled the windows firewall?
How to run a smart scan in Avast antivirus?
How to fix a failed AVG antivirus repair?


Technical Support All Antivirus Product said...

I searched so much all over the internet but could not find anything better than this , the writer has done a great job.
If Avast antivirus is not performing as per the set standards, you need to get some errors and issues fixed get connected to the team of trained technicians at Avast Antivirus Help UK. The lines are open all the time there is no such issue that cannot be fixed by the technicians sitting at the help desk. The technical team is actually the best option available for the users who are not very sound with the techniques.

Dr Purva Pius said...

Hello Everybody,
My name is Mrs Sharon Sim. I live in Singapore and i am a happy woman today? and i told my self that any lender that rescue my family from our poor situation, i will refer any person that is looking for loan to him, he gave me happiness to me and my family, i was in need of a loan of $250,000.00 to start my life all over as i am a single mother with 3 kids I met this honest and GOD fearing man loan lender that help me with a loan of $250,000.00 SG. Dollar, he is a GOD fearing man, if you are in need of loan and you will pay back the loan please contact him tell him that is Mrs Sharon, that refer you to him. contact Dr Purva Pius,via email:(urgentloan22@gmail.com) Thank you.

Albert Smith said...

Thanks a lot to the writer, this information given here resolved my issue very easily.
Avg Customer Service & Avg Phone Number