
Apache Cassandra Tutorial

Apache Cassandra is a distributed database system that provides scalability and high availability without compromising performance. It uses a shared-nothing architecture where data is divided evenly across the cluster nodes, with each node responsible for a partition of data and no single point of failure. Cassandra offers replication across multiple nodes for fault tolerance and can replicate data across multiple data centers for high availability.

Uploaded by

marcuslucas

Apache Cassandra

What is Cassandra
Scalability and high availability without compromising performance.

Distributed database system using a shared-nothing architecture.

A single logical database is spread across a cluster of nodes.

Cassandra stores data by dividing it evenly across its cluster of nodes.

Each node is responsible for part of the data.

The act of distributing data across nodes is referred to as data partitioning.

Not a master-slave architecture: it is decentralized, and all nodes are the same.

Every node in the cluster is identical.

Used by Apple, Netflix, etc.

Data is automatically replicated to multiple nodes for fault tolerance.

It is possible to replicate data across multiple data centers.

Failed nodes can be replaced with no downtime.

Every Cassandra table (column family) needs a primary key. The first element of the
primary key is the partition key; any columns that follow it are optional clustering keys.
The partition key determines which node stores the data. It is responsible for data
distribution across the nodes.

Install on macOS
$ brew install cassandra

Cqlsh - Command Line Client


Used to query the database: insert, select, etc.

Cassandra Favors AP on CAP Theorem

Cassandra favors availability and partition tolerance (it is distributed) over consistency.

However, the replication factor and the consistency level can be tuned so that Cassandra
also meets the Consistency guarantee of CAP.
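
The tuning mentioned above is usually explained with a simple rule: if the number of replicas that must acknowledge a write (W) plus the number that must answer a read (R) is greater than the replication factor (RF), every read overlaps at least one replica holding the latest write. The sketch below only illustrates that arithmetic. The function names are hypothetical helpers, not part of any Cassandra driver; ONE and QUORUM are the names of Cassandra consistency levels.

```python
def quorum(replication_factor: int) -> int:
    """Number of replicas in a quorum: a strict majority of the replicas."""
    return replication_factor // 2 + 1


def reads_see_latest_write(replication_factor: int,
                           write_replicas: int,
                           read_replicas: int) -> bool:
    """True when every read set must overlap every write set (R + W > RF)."""
    return read_replicas + write_replicas > replication_factor


rf = 3
q = quorum(rf)  # 2 of the 3 replicas

# QUORUM writes combined with QUORUM reads overlap on at least one replica.
print(reads_see_latest_write(rf, q, q))  # True

# ONE for both writes and reads favors availability; a read may miss a write.
print(reads_see_latest_write(rf, 1, 1))  # False
```

Lower consistency levels keep the cluster available when replicas are down, which is the AP side of the trade-off described above.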

Primary and Partition Key


CREATE TABLE videos (videoid uuid, name varchar, description var

Here the primary key is also the partition key, which is critical in a distributed system
because it also determines data locality: each row is stored on the node that owns its
partition and on that node's replicas.

Rows are spread around the cluster based on a hash of the partition key, which is the
first element of the primary key.

A given partition key value always maps to the same node (and its replicas), so that
partition's data will always be found there.
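
A minimal sketch of hash-based placement, under simplifying assumptions: each node gets a single token derived from its name, and MD5 from the Python standard library stands in for Cassandra's default Murmur3 partitioner. The `TokenRing` class and the node names are hypothetical and exist only for illustration. Replica placement here simply walks the ring clockwise.

```python
import bisect
import hashlib


class TokenRing:
    """Toy token ring: a node owns the token range that ends at its own token."""

    def __init__(self, nodes):
        # Assumption: one token per node, derived from the node name.
        self._ring = sorted((self._token(node), node) for node in nodes)
        self._tokens = [token for token, _ in self._ring]

    @staticmethod
    def _token(value: str) -> int:
        # MD5 is a stand-in hash here, not Cassandra's Murmur3.
        digest = hashlib.md5(value.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def replicas(self, partition_key: str, replication_factor: int = 1):
        """Return the owning node followed by the next nodes on the ring."""
        key_token = self._token(partition_key)
        # First node whose token is >= the key's token, wrapping around.
        start = bisect.bisect_left(self._tokens, key_token) % len(self._ring)
        count = min(replication_factor, len(self._ring))
        return [self._ring[(start + i) % len(self._ring)][1] for i in range(count)]

    def owner(self, partition_key: str) -> str:
        return self.replicas(partition_key, 1)[0]


ring = TokenRing(["node-a", "node-b", "node-c", "node-d"])
print(ring.owner("marcuslucas"))                          # same node every time
print(ring.replicas("marcuslucas", replication_factor=3))  # owner plus two more
```

Because the token depends only on the partition key, repeated lookups for the same key always land on the same nodes.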

Clustering Key
The columns that follow the first element of the primary key. Clustering keys are used to
sort the rows within a partition (cluster them) in ASC or DESC order.
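
The effect of a clustering key can be modelled in a few lines: rows are grouped by partition key, and each group is kept sorted by the clustering key. This is only an in-memory illustration. The function and the column names `user` and `uploaded_on` are hypothetical and are not taken from the schema in these notes.

```python
from collections import defaultdict
from datetime import date


def build_partitions(rows, partition_key, clustering_key, descending=False):
    """Group rows by partition key and sort each group by the clustering key."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[partition_key]].append(row)
    for partition_rows in partitions.values():
        partition_rows.sort(key=lambda r: r[clustering_key], reverse=descending)
    return dict(partitions)


rows = [
    {"user": "u1", "uploaded_on": date(2024, 1, 5), "name": "second"},
    {"user": "u2", "uploaded_on": date(2024, 2, 1), "name": "other"},
    {"user": "u1", "uploaded_on": date(2024, 3, 9), "name": "third"},
    {"user": "u1", "uploaded_on": date(2023, 12, 1), "name": "first"},
]

# Newest first within each partition, like a DESC clustering order.
partitions = build_partitions(rows, "user", "uploaded_on", descending=True)
print([r["name"] for r in partitions["u1"]])  # ['third', 'second', 'first']
```

Reading one user's rows touches a single group that is already in the requested order.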

Simple SELECT query format:

SELECT * FROM user_videos WHERE userid = xxxx ORDER BY addded_da

*****

Partitions are groups of rows that share the same partition key. When we issue a read
query, we want to read rows from as few partitions as possible.

Data Model
Collection Types
In Cassandra, we can have Set, List or Map columns.

CREATE TABLE users (
user_id text PRIMARY KEY,
first_name text,
last_name text,
emails set<text>,
top_places list<text>,
time_stamps map<timestamp, text>
);

Example of updating map:

UPDATE users
SET time_stamps =
{ '2012-09-24 12:00' : 'log in',
'2012-09-24 12:31' : 'product checkout' }
WHERE user_id = 'marcuslucas';

Static Column
Column whose value is shared among all rows of the same partition. Similar to static variables in Java.

Data Aggregation
Min, max, average, sum and count.
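
CQL exposes these as the aggregate functions `min`, `max`, `avg`, `sum` and `count`. As a rough illustration of what each one computes over a single column, here is a small sketch. The `aggregate` helper and the sample values are hypothetical, and the helper assumes a non-empty list of numbers.

```python
from statistics import mean


def aggregate(values):
    """Compute the five aggregates over a non-empty list of numbers."""
    values = list(values)
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "sum": sum(values),
        "avg": mean(values),
    }


view_counts = [30, 10, 20]  # hypothetical column values
print(aggregate(view_counts))
```

Aggregating within a single partition keeps the work on one set of replicas, which fits the earlier advice to read from as few partitions as possible.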
