Cassandra PPT Final

The document provides information about Apache Cassandra including: 1) It was initially developed at Facebook to meet their scalability and reliability needs for their inbox search feature. 2) Notable points about Cassandra include that it is a column-oriented, distributed database that is scalable, fault-tolerant and consistent. 3) Companies like Facebook, Twitter, Uber, Spotify and Instagram use Cassandra for applications that require high availability, scalability and low latency for large amounts of data.

Uploaded by

Aryan dharmadhikari

ARYAN DHARMADHIKARI-45

PRUTHA DESHPANDE-42
VAISHNAVI DESHMUKH-41
AMEYA DATE-33
Introduction
• Apache Cassandra is an open source distributed database management
system designed to handle large amounts of data across many commodity
servers, providing high availability with no single point of failure.
• Cassandra was originally developed at Facebook to meet the reliability and
scalability needs of its inbox search feature, which lets users search through
their Facebook inbox.

• Initial release: 2008

• Stable release: 3.4 / March 8, 2016
• Written in: Java
• Type: Database / NoSQL
NOTABLE POINTS
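• It is scalable, fault-tolerant, and tunably consistent (consistency can be traded against latency per operation).
• It is a wide-column (column-family) store, often loosely described as column-oriented.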
• Its distribution design is based on Amazon’s Dynamo and its data model on
Google’s Bigtable.
• Created at Facebook, it differs sharply from relational database management
systems.
• Cassandra implements a Dynamo-style replication model with no single point of
failure, but adds a more powerful “column family” data model.
• Cassandra is used by some of the biggest companies, such as Facebook,
Twitter, Cisco, Rackspace, eBay, Netflix, and more.
FEATURES
• Elastic scalability − Cassandra is highly scalable; you can add more hardware to accommodate
more customers and more data as required.
• Always on architecture − Cassandra has no single point of failure and it is continuously available for
business-critical applications that cannot afford a failure.
• Fast linear-scale performance − Cassandra is linearly scalable, i.e., it increases your throughput as
you increase the number of nodes in the cluster. Therefore it maintains a quick response time.
• Flexible data storage − Cassandra accommodates all possible data formats including: structured,
semi-structured, and unstructured. It can dynamically accommodate changes to your data
structures according to your need.
• Easy data distribution − Cassandra provides the flexibility to distribute data where you need by
replicating data across multiple data centers.
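• Transaction support − Cassandra does not provide full ACID transactions in the relational sense.
Writes are atomic and isolated within a single partition and durable through the commit log,
consistency is tunable per operation, and lightweight transactions offer compare-and-set updates.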
• Fast writes − Cassandra was designed to run on cheap commodity hardware. It performs blazingly
fast writes and can store hundreds of terabytes of data, without sacrificing the read efficiency.
DATA MODEL

[Figure: MySQL and Cassandra data models, side by side]
KEY POINTS
• Many NoSQL databases are built on a key-value model.
• NoSQL databases can partition data across servers, so capacity grows as more servers are added.
• NoSQL databases are typically schemaless or schema-flexible.
• NoSQL databases replicate data, which protects against data loss.
• A table in Cassandra is a distributed multi-dimensional map indexed by a key. The value is an object that is
highly structured.
• The row key in a table is a string with no size restrictions, although it is typically 16 to 36 bytes long. Every operation
under a single row key is atomic per replica, no matter how many columns are being read or written.
• Columns are grouped together into sets called column families.
• Cassandra exposes two kinds of column families: simple and super column families. A super column family
can be visualized as a column family within a column family.
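
The "map indexed by a key" description can be pictured with plain Python dictionaries. This is only a sketch of the logical shape (row key, then column family, then column); the row keys, family names, and values are made up for illustration, and it says nothing about how Cassandra stores data on disk.

```python
from typing import Dict, Optional

# Logical shape only: row key -> column family -> column name -> value.
Row = Dict[str, Dict[str, str]]
Table = Dict[str, Row]


def put(table: Table, row_key: str, family: str, column: str, value: str) -> None:
    """Set one column value, creating the row and column family if needed."""
    table.setdefault(row_key, {}).setdefault(family, {})[column] = value


def get(table: Table, row_key: str, family: str, column: str) -> Optional[str]:
    """Return one column value, or None if any level of the map is missing."""
    return table.get(row_key, {}).get(family, {}).get(column)


# Hypothetical data: one row key with two column families.
users: Table = {}
put(users, "user:42", "profile", "name", "Asha")
put(users, "user:42", "profile", "city", "Pune")
put(users, "user:42", "inbox", "msg:001", "hello")
```

Rows need not share the same columns: a second row key could carry a different set of columns inside the same column family.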

[Figure: physical view and logical view]

How is Primary Key generated?
ARCHITECTURE

The design goal of Cassandra is to handle big data workloads across multiple nodes
without any single point of failure.
• All the nodes in a cluster play the same role. Each node is independent and at the
same time interconnected to other nodes.
• Each node in a cluster can accept read and write requests, regardless of where the
data is actually located in the cluster.
• When a node goes down, read/write requests can be served from other nodes in
the network.
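
A toy model of the points above, assuming for simplicity that every node keeps a copy of every key (a real cluster places replicas according to its replication settings). The class and method names are hypothetical; the sketch only shows that any live node can take a request and that a read still succeeds after one node fails.

```python
class Node:
    """One peer in the toy cluster; all nodes have the same role."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}


class Cluster:
    def __init__(self, names):
        self.nodes = {name: Node(name) for name in names}

    def _coordinator(self, contact):
        # The client may contact any node, but that node must be alive.
        node = self.nodes[contact]
        if not node.up:
            raise ConnectionError(f"{contact} is down; contact another node")
        return node

    def write(self, contact, key, value):
        """Send a write to any live node; it is applied on every live node."""
        self._coordinator(contact)
        for node in self.nodes.values():
            if node.up:
                node.data[key] = value

    def read(self, contact, key):
        """Send a read to any live node; it is answered from any live copy."""
        self._coordinator(contact)
        for node in self.nodes.values():
            if node.up and key in node.data:
                return node.data[key]
        raise KeyError(key)


cluster = Cluster(["n1", "n2", "n3"])
cluster.write("n1", "ride:7", "completed")
cluster.nodes["n1"].up = False          # the node that took the write fails
value = cluster.read("n3", "ride:7")    # a different node serves the read
```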
ARCHITECTURE
• Partitioning: One of Cassandra's key design features is the ability to scale incrementally. This
requires the ability to dynamically partition the data over the set of nodes (i.e., storage hosts) in the
cluster.
• Replication: Cassandra stores replicas on multiple nodes to ensure
reliability and fault tolerance. A replication strategy determines the nodes
where replicas are placed. The total number of replicas across the cluster is
referred to as the replication factor.
• A replication factor of 1 means there is only one copy of each row in the cluster.
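
Partitioning and replication can be sketched together with a small hash ring. This is an illustrative model, not Cassandra's implementation: the `Ring` class, the node names, and the use of SHA-256 as the hash are all assumptions. Each row key is hashed to a token, the first node at or after that token owns it, and the following nodes on the ring hold the remaining replicas.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string to a position on the ring (SHA-256 is a stand-in hash)."""
    return int.from_bytes(hashlib.sha256(value.encode("utf-8")).digest()[:8], "big")


class Ring:
    def __init__(self, nodes, replication_factor):
        if not 1 <= replication_factor <= len(nodes):
            raise ValueError("replication factor must be between 1 and the node count")
        self.replication_factor = replication_factor
        # One token per node, sorted clockwise around the ring.
        self.ring = sorted((token(name), name) for name in nodes)
        self.tokens = [t for t, _ in self.ring]

    def replicas(self, row_key):
        """Return the nodes that hold copies of the row with this key."""
        # First node whose token is >= the key's token, wrapping past the end.
        start = bisect.bisect_left(self.tokens, token(row_key)) % len(self.ring)
        return [
            self.ring[(start + i) % len(self.ring)][1]
            for i in range(self.replication_factor)
        ]


ring = Ring(["node-a", "node-b", "node-c", "node-d"], replication_factor=3)
owners = ring.replicas("user:42")   # three distinct nodes out of four
```

Adding a node inserts one more token into the ring, so only the keys that fall between that token and its predecessor move, which is what makes incremental scaling practical.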
TABLE OPERATIONS
CREATING A TABLE
CREATE (TABLE | COLUMNFAMILY) <tablename> (<column-definition>, <column-definition>, ...)
TABLE OPERATIONS
ALTERING A TABLE
ALTER (TABLE | COLUMNFAMILY) <tablename> <instruction>

• Adding a column
ALTER TABLE <tablename>
ADD <new_column> <datatype>;
• Dropping a column
ALTER TABLE <tablename>
DROP <column_name>;
DROPPING A TABLE
DROP TABLE <tablename>;
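
To make the placeholders concrete, here is a small Python sketch that fills in the statement templates above and returns the CQL text. The helper names and the `emp` table are hypothetical, and the code only builds strings; it does not connect to a cluster. It accepts plain unquoted identifiers and simple type names only, so collection types such as `list<text>` are rejected.

```python
import re

# Plain unquoted identifier: a letter followed by letters, digits, or underscores.
_IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def _check(name: str) -> str:
    """Reject anything that is not a plain identifier before it reaches a statement."""
    if not _IDENTIFIER.fullmatch(name):
        raise ValueError(f"not a plain identifier: {name!r}")
    return name


def add_column(table: str, column: str, datatype: str) -> str:
    return f"ALTER TABLE {_check(table)} ADD {_check(column)} {_check(datatype)};"


def drop_column(table: str, column: str) -> str:
    return f"ALTER TABLE {_check(table)} DROP {_check(column)};"


def drop_table(table: str) -> str:
    return f"DROP TABLE {_check(table)};"


statement = add_column("emp", "emp_email", "text")
```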
APPLICATIONS

• Messaging - Cassandra can handle very large amounts of data, so it is a common choice for
companies that provide mobile phone and messaging services, which generate huge
volumes of data.
• High-speed applications - Cassandra handles high write rates well, so it suits
applications where data arrives at very high speed from many devices or sensors.
• Product catalogs and retail apps - Many retailers use Cassandra for durable
shopping cart protection and fast product catalog reads and writes.
• Social media analytics and recommendation engines - Many online companies and
social media providers use Cassandra to analyze data and make recommendations
to their customers.
CASE STUDY:
HOW UBER MANAGES A MILLION WRITES PER SECOND USING MESOS AND CASSANDRA ACROSS MULTIPLE
DATACENTERS

• Uber Technologies, Inc. (commonly referred to as Uber) provides ride-hailing services, food delivery, and freight transport.

• It is headquartered in San Francisco and operates in approximately 70 countries and 10,500 cities worldwide.

• Since 2010, Uber has served over 14 billion rides, generating and processing a large amount of
data every day.

• Uber built its own system that runs Cassandra on top of Mesos.
MESOS
• Mesos is a data center OS that lets you program against your datacenter as if it were a single
pool of resources.

• At the time, Mesos was proven to run on tens of thousands of machines, which was one of
Uber's requirements, so they chose Mesos. Today Kubernetes could probably work
too.

• Uber has built its own sharded database on top of MySQL, called Schemaless.

• The idea is that Cassandra and Schemaless will be the two data storage options at Uber.

• Uber has about 20 Cassandra clusters now and plans on having 100 in the future.
WHY ARE MESOS AND CASSANDRA USED?

• Uber found hardly any difference (5-10% overhead) between
running Cassandra on bare metal and running Cassandra in a container
managed by Mesos.
• Performance is good: mean read latency is 13 ms and write latency is 25 ms.
Their largest clusters support more than a million
writes/sec and ~100k reads/sec.
• It's very easy to create and run workloads across clusters.
SPECIFIC USAGE OF CASSANDRA.

• Geospatial Data
• Real Time Analytics
• Caching and Quick Data Retrieval
• Data Sharding
• Fault Tolerance
• Consistency and Reliability
• Scalability and High Availability
PERSONALIZATION AT SPOTIFY USING CASSANDRA

• Spotify uses Cassandra for two major purposes:
1. Entity metadata store
2. User profile store
• Why is Cassandra a good fit?
1. Horizontal scaling
2. Cross-site replication
3. Low-latency operations and tunable consistency
4. Bulk data transfer
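
"Tunable consistency" comes down to a little arithmetic on the replication factor. The sketch below uses hypothetical function names and example numbers; the source does not say which settings Spotify uses. A quorum is a majority of replicas, and a read is guaranteed to see the latest acknowledged write when the read and write replica counts together exceed the replication factor.

```python
def quorum(replication_factor: int) -> int:
    """Smallest majority of replicas: floor(RF / 2) + 1."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return replication_factor // 2 + 1


def read_sees_latest_write(replication_factor: int, write_acks: int, read_acks: int) -> bool:
    """True when every read set must overlap every write set (R + W > RF)."""
    return write_acks + read_acks > replication_factor


rf = 3
q = quorum(rf)                                   # 2 of 3 replicas
strong = read_sees_latest_write(rf, q, q)        # quorum writes + quorum reads
fast = read_sees_latest_write(rf, 1, 1)          # one replica each: lower latency
```

Here `strong` is true and `fast` is false: waiting for a single replica on each side lowers latency but allows a read to miss a recent write.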
PERSONALIZATION AT SPOTIFY USING CASSANDRA

Cassandra data model

CREATE TABLE entitymetadata (
entityid text,
featurename text,
featurevalue text,
PRIMARY KEY (entityid, featurename)
);
CREATE TABLE userprofilelatest (
userid text,
featurename text,
featurevalue text,
PRIMARY KEY (userid, featurename)
);
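
In a two-part primary key like the ones above, the first column is the partition key and the second is a clustering column: all features of one entity live in one partition, sorted by feature name, and writing the same key pair again overwrites the earlier value. A rough in-memory Python model of that behavior, with a hypothetical `EntityMetadata` class and made-up sample values:

```python
from collections import defaultdict


class EntityMetadata:
    """In-memory stand-in for a table keyed by (entityid, featurename)."""

    def __init__(self):
        # Partition key (entityid) -> {clustering column (featurename): value}
        self._partitions = defaultdict(dict)

    def upsert(self, entityid, featurename, featurevalue):
        """Insert or overwrite: the same key pair always maps to one row."""
        self._partitions[entityid][featurename] = featurevalue

    def features(self, entityid):
        """Return one partition's rows ordered by the clustering column."""
        partition = self._partitions.get(entityid, {})
        return sorted(partition.items())


store = EntityMetadata()
store.upsert("track:1", "tempo", "120")
store.upsert("track:1", "genre", "jazz")
store.upsert("track:1", "tempo", "124")   # same key pair: replaces "120"
features = store.features("track:1")
```

Fetching every feature for one entity touches a single partition, which is why this layout suits per-entity and per-user lookups.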
USE OF CASSANDRA AT INSTAGRAM

• Instagram runs one of the world's largest deployments
of the Apache Cassandra database.
• Cassandra is used for fraud detection, feed, and
direct inbox.
• The team has had good experience with Cassandra's reliability and
availability.
• Read latency, however, needed improvement.
• Instagram's Cassandra team started a project, called Rocksandra, to
reduce Cassandra's read latency.
USE OF CASSANDRA AT INSTAGRAM

After development and testing, they rolled it out to several production
Cassandra clusters at Instagram, where latency became much lower and more consistent.
