Apache Cassandra Tutorial
What is Apache Cassandra?
Apache Cassandra is a distributed NoSQL database designed for scalability and high availability without compromising performance.
Cassandra stores data by distributing it evenly across the nodes of its cluster.
Apache Cassandra 1
Every Cassandra table (column family) must have a primary key, made up of a partition key and, optionally, one or more clustering keys (clustering columns).
The partition key determines which node stores the data, so it is responsible for distributing data across the nodes.
Install on macOS
$ brew install cassandra
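In CAP terms, Cassandra favors availability and partition tolerance (it is distributed) over consistency.
However, consistency is tunable: by choosing the replication factor and the consistency level of each read and write, Cassandra can also provide strong consistency (the C in CAP).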
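The arithmetic behind this tuning is commonly stated as: with replication factor RF, a read is guaranteed to see the most recent write when the number of replicas read (R) plus the number of replicas written (W) exceeds RF, because the two replica sets must then overlap. Below is a minimal Python sketch of that rule; the function names are hypothetical and not part of any Cassandra driver API.

```python
def quorum(replication_factor):
    """Number of replicas that form a majority (QUORUM) for a given RF."""
    return replication_factor // 2 + 1


def is_strongly_consistent(rf, read_replicas, write_replicas):
    """Reads see the latest write when R + W > RF (the replica sets overlap)."""
    return read_replicas + write_replicas > rf


rf = 3
q = quorum(rf)
print(q, is_strongly_consistent(rf, q, q))  # prints: 2 True  (QUORUM reads + QUORUM writes)
print(is_strongly_consistent(rf, 1, 1))     # prints: False   (ONE reads + ONE writes)
```

Using QUORUM for both reads and writes is the usual way to get this overlap; ONE/ONE gives lower latency but no such guarantee.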
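The first element of the primary key is the partition key (if the primary key has a single column, that column is the partition key). This is critical in a distributed system because the partition key also determines data locality: each row is stored on the node that owns its partition and on that node's replicas in the cluster.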
Rows are spread around the cluster based on a hash of the partition key, which is the
first element of the primary key.
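To illustrate hash-based placement, here is a simplified Python sketch of a token ring. Assumptions: MD5 from `hashlib` stands in for the partitioner's hash (Cassandra's default Murmur3Partitioner uses a different function and token range), and the three node names and tokens are hypothetical; real clusters usually assign many virtual nodes (vnodes) per machine.

```python
import bisect
import hashlib

# Hypothetical 3-node ring (tokens sorted ascending). Each node owns the
# tokens after the previous node's token, up to and including its own;
# tokens above the last node wrap around to the first node.
NODE_TOKENS = [2**126, 2**127, 3 * 2**126]
NODE_NAMES = ["node-A", "node-B", "node-C"]


def token_for(partition_key):
    """Hash a partition key to a 128-bit token (MD5 stand-in, not Murmur3)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def owner_of_token(token):
    """Find the first node whose token is >= the given token, wrapping around."""
    index = bisect.bisect_left(NODE_TOKENS, token)
    return NODE_NAMES[index % len(NODE_NAMES)]


def owner(partition_key):
    """The node that stores a partition depends only on its key's hash."""
    return owner_of_token(token_for(partition_key))


for key in ["marcuslucas", "alice", "bob"]:
    print(key, "->", owner(key))
```

Because the owner depends only on the hash, any node can compute where a partition lives without a central lookup table.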
A given partition key always maps to the same node, so all of that partition's data is found on that node (and on its replicas).
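Replication extends the same idea: with a SimpleStrategy-style placement, the node that owns the token holds the first replica and the next nodes clockwise around the ring hold the others, so a partition key always resolves to the same set of nodes. A minimal Python sketch, assuming MD5 as a stand-in for the real partitioner hash and a hypothetical three-node ring:

```python
import bisect
import hashlib

# Hypothetical ring: sorted node tokens and the node that owns each one.
NODE_TOKENS = [2**126, 2**127, 3 * 2**126]
NODE_NAMES = ["node-A", "node-B", "node-C"]


def token_for(partition_key):
    """128-bit token from MD5 (a stand-in for the real partitioner hash)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def replicas(partition_key, replication_factor):
    """SimpleStrategy-style placement: the owning node first, then the
    next replication_factor - 1 nodes clockwise around the ring."""
    start = bisect.bisect_left(NODE_TOKENS, token_for(partition_key))
    count = min(replication_factor, len(NODE_NAMES))
    return [NODE_NAMES[(start + i) % len(NODE_NAMES)] for i in range(count)]


# The same key always maps to the same replica set.
print(replicas("marcuslucas", 2))
```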
Clustering Key
Clustering keys are the primary-key columns that follow the partition key. They sort the rows within a partition (cluster them) in ASC or DESC order, which lets queries select ranges of rows efficiently.
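In CQL the sort direction is declared with `WITH CLUSTERING ORDER BY (... DESC)` when the table is created. The Python sketch below only models the effect: the rows of one partition kept sorted by their clustering key. The table shape (`user_id` as partition key, `event_time` as clustering key) and the rows are hypothetical.

```python
from datetime import datetime

# Hypothetical rows from one partition (user_id = 'marcuslucas'),
# listed here in arbitrary insertion order.
partition_rows = [
    {"event_time": datetime(2012, 9, 24, 12, 31), "event": "product checkout"},
    {"event_time": datetime(2012, 9, 24, 12, 0), "event": "log in"},
    {"event_time": datetime(2012, 9, 24, 12, 15), "event": "search"},
]


def in_clustering_order(rows, clustering_key, descending=False):
    """Return a partition's rows sorted by the clustering key,
    ascending by default, descending if the table declares DESC."""
    return sorted(rows, key=lambda row: row[clustering_key], reverse=descending)


for row in in_clustering_order(partition_rows, "event_time", descending=True):
    print(row["event_time"], row["event"])
```

With DESC order the newest event comes first, which suits "latest N events" queries.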
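Simple SELECT query format:
SELECT <columns> FROM <table> WHERE <partition_key> = <value> [AND <clustering_key> <operator> <value>] [ORDER BY <clustering_key> ASC|DESC] [LIMIT <n>];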
Partitions are groups of rows that share the same partition key. When we issue a read
query, we want to read rows from as few partitions as possible.
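A rough Python model of why partition-key lookups are cheap: rows are grouped by partition key, so a query that restricts the partition key reads one partition, while filtering on any other column has to scan every partition. The helper names and sample rows are hypothetical, and the model ignores secondary indexes and the network.

```python
from collections import defaultdict


def group_into_partitions(rows, partition_key):
    """Group rows that share the same partition key value."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[partition_key]].append(row)
    return dict(partitions)


def query_by_partition_key(partitions, key):
    """WHERE partition_key = key: a single partition is read.
    Returns (matching rows, number of partitions read)."""
    return partitions.get(key, []), 1


def query_by_other_column(partitions, column, value):
    """Filtering on a non-key, non-indexed column (CQL requires
    ALLOW FILTERING for this): every partition must be scanned."""
    matches = [row for rows in partitions.values() for row in rows if row.get(column) == value]
    return matches, len(partitions)


rows = [
    {"user_id": "marcuslucas", "event": "log in"},
    {"user_id": "marcuslucas", "event": "product checkout"},
    {"user_id": "alice", "event": "log in"},
    {"user_id": "bob", "event": "log in"},
]
partitions = group_into_partitions(rows, "user_id")
print(query_by_partition_key(partitions, "marcuslucas")[1])    # prints: 1
print(query_by_other_column(partitions, "event", "log in")[1])  # prints: 3
```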
Data Model
Collection Types
In Cassandra, we can have Set, List or Map columns.
UPDATE users
SET time_stamps =
{ '2012-9-24 12:00' : 'log in',
'2012-9-24 12:31' : 'product checkout' }
WHERE user_id = 'marcuslucas';
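The `UPDATE` above uses `SET time_stamps = {...}`, which replaces the whole map; CQL also supports `SET time_stamps = time_stamps + {...}` to add entries to the existing map. A small Python sketch of the difference, using plain strings as map keys instead of CQL timestamps; the function names are hypothetical.

```python
def set_map(current, new_entries):
    """Models `SET time_stamps = {...}`: the old map (current) is discarded."""
    return dict(new_entries)


def add_to_map(current, new_entries):
    """Models `SET time_stamps = time_stamps + {...}`: new keys are added
    and keys that already exist get the new value."""
    merged = dict(current)
    merged.update(new_entries)
    return merged


# Map keys are plain strings here; in CQL they would be timestamps.
time_stamps = {"2012-9-24 12:00": "log in"}
update = {"2012-9-24 12:31": "product checkout"}
print(set_map(time_stamps, update))     # only the 12:31 entry remains
print(add_to_map(time_stamps, update))  # both entries are kept
```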
Static Column
A column whose value is shared by all rows in the same partition (not the whole table), similar to how a static variable in Java is shared by all instances of a class.
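A minimal Python model of a static column, assuming a hypothetical table with `user_id` as partition key, `expense_id` as clustering key and `balance` as a static column: writing the static value once changes what every row of that partition returns.

```python
class Partition:
    """Hypothetical in-memory model of one partition of a table with
    user_id (partition key), expense_id (clustering key) and a static
    balance column."""

    def __init__(self, user_id):
        self.user_id = user_id
        self.static = {}  # static columns: one value for the whole partition
        self.rows = {}    # regular columns, keyed by clustering key

    def upsert(self, expense_id, amount, **static_values):
        self.rows[expense_id] = {"amount": amount}
        self.static.update(static_values)  # overwrites the shared value

    def select_all(self):
        """Every returned row carries the partition's current static values."""
        return [
            {"user_id": self.user_id, "expense_id": eid, **cols, **self.static}
            for eid, cols in sorted(self.rows.items())
        ]


bills = Partition("marcuslucas")
bills.upsert(1, 10, balance=100)
bills.upsert(2, 5, balance=95)
for row in bills.select_all():
    print(row)  # both rows show balance 95
```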
Data Aggregation
Cassandra provides the built-in aggregate functions min, max, avg (average), sum and count.
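In CQL these are written like `SELECT min(amount), max(amount), avg(amount), sum(amount), count(*) FROM expenses WHERE user_id = 'marcuslucas';` (table and column names hypothetical) and are cheapest when the query is restricted to a single partition. The Python sketch below only shows what each function computes over one column's values; this version always returns a float average.

```python
def aggregate(values):
    """Compute min, max, avg, sum and count over one column's values."""
    values = list(values)
    count = len(values)
    total = sum(values)
    return {
        "min": min(values) if values else None,
        "max": max(values) if values else None,
        "avg": total / count if count else None,  # float average in this sketch
        "sum": total,
        "count": count,
    }


# Hypothetical `amount` values from one partition.
print(aggregate([10, 5, 15]))
```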