Cassandra PPT Final

The document provides information about Apache Cassandra including: 1) It was initially developed at Facebook to meet their scalability and reliability needs for their inbox search feature. 2) Notable points about Cassandra include that it is a column-oriented, distributed database that is scalable, fault-tolerant and consistent. 3) Companies like Facebook, Twitter, Uber, Spotify and Instagram use Cassandra for applications that require high availability, scalability and low latency for large amounts of data.

ARYAN DHARMADHIKARI-45

PRUTHA DESHPANDE-42
VAISHNAVI DESHMUKH-41
AMEYA DATE-33
Introduction
• Apache Cassandra is an open source distributed database management
system designed to handle large amounts of data across many commodity
servers, providing high availability with no single point of failure.
• Facebook developed Cassandra to meet its reliability and scalability needs.
• It was designed to meet the storage needs of the Inbox Search problem
(Inbox Search lets users search through their Facebook inbox).

• Initial release: 2008
• Stable release: 3.4 / March 8, 2016
• Written in: Java
• Type: Database / NoSQL
NOTABLE POINTS
• It is scalable and fault-tolerant, and it offers tunable consistency.
• It is a wide-column store, often described as a column-oriented database.
• Its distribution design is based on Amazon’s Dynamo and its data model on
Google’s Bigtable.
• Created at Facebook, it differs sharply from relational database management
systems.
• Cassandra implements a Dynamo-style replication model with no single point of
failure, but adds a more powerful “column family” data model.
• Cassandra is used by some of the biggest companies, such as Facebook,
Twitter, Cisco, Rackspace, eBay, Netflix, and more.
FEATURES
• Elastic scalability − Cassandra is highly scalable; you can add more hardware to accommodate
more customers and more data as required.
• Always on architecture − Cassandra has no single point of failure and it is continuously available for
business-critical applications that cannot afford a failure.
• Fast linear-scale performance − Cassandra is linearly scalable, i.e., it increases your throughput as
you increase the number of nodes in the cluster. Therefore it maintains a quick response time.
• Flexible data storage − Cassandra accommodates all possible data formats including: structured,
semi-structured, and unstructured. It can dynamically accommodate changes to your data
structures according to your need.
• Easy data distribution − Cassandra provides the flexibility to distribute data where you need by
replicating data across multiple data centers.
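• Transaction support − Cassandra does not provide full ACID transactions in the relational sense.
Writes are atomic and isolated at the row (partition) level and are durable, while consistency is
tunable per operation.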
• Fast writes − Cassandra was designed to run on cheap commodity hardware. It performs blazingly
fast writes and can store hundreds of terabytes of data, without sacrificing the read efficiency.
DATA MODEL

Figure: data model comparison, MySQL vs. Cassandra
KEY POINTS
• NoSQL databases include key-value stores.
• NoSQL databases can partition data across servers and scale out as more servers are added.
• NoSQL databases are typically schemaless.
• NoSQL databases support replication, which protects against data loss.
• A table in Cassandra is a distributed multi-dimensional map indexed by a key. The value is a highly
structured object.
• The row key in a table is a string with no size restrictions, although it is typically 16 to 36 bytes long. Every
operation under a single row key is atomic per replica, no matter how many columns are read or written.
• Columns are grouped together into sets called column families.
• Cassandra exposes two kinds of column families: Simple and Super column families. A Super column family
can be visualized as a column family within a column family.
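
The "multi-dimensional map indexed by a key" idea can be sketched with nested dictionaries. This is only an illustration of the logical model, not of how Cassandra stores data; the helper names (`put`, `get`) and the sample keys and values are hypothetical.

```python
# Hypothetical in-memory model of the logical data model:
# row key -> column family -> column name -> value.
table = {}

def put(row_key, column_family, column, value):
    """Store one column value under a row key and column family."""
    table.setdefault(row_key, {}).setdefault(column_family, {})[column] = value

def get(row_key, column_family, column):
    """Return the stored value, or None if any level of the map is missing."""
    return table.get(row_key, {}).get(column_family, {}).get(column)

# Sample data (made up for illustration).
put("user:42", "profile", "name", "Alice")
put("user:42", "profile", "city", "Pune")
put("user:42", "inbox", "msg:1", "hello")

print(get("user:42", "profile", "city"))
print(sorted(table["user:42"]))
```

Each row key leads to its own set of column families, and different rows do not need to hold the same columns.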

Figure: physical view vs. logical view of a table


How is Primary Key generated?
ARCHITECTURE

The design goal of Cassandra is to handle big data workloads across multiple nodes
without any single point of failure.
• All the nodes in a cluster play the same role. Each node is independent and at the
same time interconnected to other nodes.
• Each node in a cluster can accept read and write requests, regardless of where the
data is actually located in the cluster.
• When a node goes down, read/write requests can be served from other nodes in
the network.
ARCHITECTURE
• Partitioning: One of Cassandra's key design features is the ability to scale incrementally. This
requires the ability to dynamically partition the data over the set of nodes (i.e., storage hosts) in the
cluster.
• Replication: Cassandra stores replicas on multiple nodes to ensure
reliability and fault tolerance. A replication strategy determines the nodes
where replicas are placed. The total number of replicas across the cluster is
referred to as the replication factor.
• A replication factor of 1 means there is only one copy of each row in the cluster.
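
Partitioning and replication can be sketched as a token ring. This is a simplified illustration under stated assumptions: one token per node (no virtual nodes), MD5 as a stand-in hash (Cassandra's default partitioner uses Murmur3), and replicas placed on the next nodes clockwise around the ring. The `Ring` class and the node names are hypothetical.

```python
import bisect
import hashlib

def token(value: str) -> int:
    # Stand-in hash function; chosen only because it is in the standard library.
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, nodes, replication_factor):
        self.replication_factor = replication_factor
        # Assumption: one token per node, derived from the node name.
        self.entries = sorted((token(n), n) for n in nodes)
        self.tokens = [t for t, _ in self.entries]

    def replicas(self, row_key):
        # The first node whose token is >= the key's token owns the key;
        # the remaining replicas are the next distinct nodes clockwise.
        start = bisect.bisect_left(self.tokens, token(row_key)) % len(self.entries)
        count = min(self.replication_factor, len(self.entries))
        return [self.entries[(start + i) % len(self.entries)][1] for i in range(count)]

    def available_replicas(self, row_key, down=()):
        # Any replica that is still up can serve the request.
        return [n for n in self.replicas(row_key) if n not in down]

ring = Ring(["node-a", "node-b", "node-c", "node-d"], replication_factor=3)
owners = ring.replicas("user:42")
print(owners)
print(ring.available_replicas("user:42", down={owners[0]}))
```

With a replication factor of 3, the row stays readable from two other nodes when one replica is down; with a replication factor of 1, losing that single node makes the row unavailable.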
TABLE OPERATIONS
➢ CREATING A TABLE
TABLE OPERATIONS
➢ ALTERING A TABLE
ALTER (TABLE | COLUMNFAMILY) <tablename> <instruction>

• Adding a column
ALTER TABLE <tablename>
ADD <new_column> <datatype>;
• Dropping a column
ALTER TABLE <tablename>
DROP <column_name>;
➢ DROPPING A TABLE
DROP TABLE <tablename>;
APPLICATIONS

• Messaging - Cassandra can handle very large amounts of data, so it is a common
choice for companies that provide mobile phone and messaging services, which
generate huge volumes of data.
• High-speed applications - Cassandra handles high write rates well, so it suits
applications where data arrives at very high speed from many devices or
sensors.
• Product catalogs and retail apps - Many retailers use Cassandra for durable
shopping cart protection and fast product catalog input and output.
• Social media analytics and recommendation engines - Many online companies
and social media providers use Cassandra for analysis and for making
recommendations to their customers.
CASE STUDY-
HOW UBER MANAGES A MILLION WRITES PER SECOND USING MESOS AND CASSANDRA ACROSS MULTIPLE
DATACENTERS

• Uber Technologies, Inc. (commonly referred to as Uber) provides
ride-hailing services, food delivery, and freight transport.

• It is headquartered in San Francisco and operates in approximately 70 countries
and 10,500 cities worldwide.

• Since 2010, Uber has served over 14 billion rides, generating and processing a
large amount of data every single day.

• They built their own system that runs Cassandra on top of Mesos.
MESOS
• Mesos is a data center OS that lets you program against your datacenter as if it were a single
pool of resources.

• At the time, Mesos was proven to run on tens of thousands of machines, which was one of
Uber's requirements, so they chose Mesos. Today Kubernetes could probably work
too.

• Uber has built its own sharded database on top of MySQL, called Schemaless.

• The idea is that Cassandra and Schemaless will be the two data storage options at Uber.

• Uber has about 20 Cassandra clusters now and plans on having 100 in the future.
WHY ARE MESOS AND CASSANDRA USED?

• Uber found hardly any difference (5-10% overhead) between
running Cassandra on bare metal and running Cassandra in a container
managed by Mesos.
• Performance is good: mean read latency is 13 ms and mean write latency is 25 ms.
Their largest clusters support more than a million
writes/sec and ~100k reads/sec.
• It’s very easy to create and run workloads across clusters.
SPECIFIC USAGE OF CASSANDRA.

• Geospatial Data
• Real Time Analytics
• Caching and Quick Data Retrieval
• Data Sharding
• Fault Tolerance
• Consistency and Reliability
• Scalability and High Availability
PERSONALIZATION AT SPOTIFY USING CASSANDRA

• Spotify uses Cassandra for two major purposes:
1. Entity metadata store
2. User profile store
• Why is Cassandra a good fit?
1. Horizontal scaling
2. Cross-site replication
3. Low-latency operations and tunable consistency
4. Bulk data transfer
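
Tunable consistency comes down to simple arithmetic: a read is guaranteed to see the latest acknowledged write when the number of replicas read (R) plus the number of replicas written (W) exceeds the replication factor (RF). The sketch below assumes a single datacenter; the function names are hypothetical, while ONE and QUORUM are Cassandra consistency levels.

```python
def quorum(replication_factor: int) -> int:
    """Replica count for a QUORUM operation: a majority of the replicas."""
    return replication_factor // 2 + 1

def read_sees_latest_write(read_replicas: int, write_replicas: int,
                           replication_factor: int) -> bool:
    """True when every read set must overlap every write set (R + W > RF)."""
    return read_replicas + write_replicas > replication_factor

rf = 3
print(quorum(rf))                                          # 2
print(read_sees_latest_write(quorum(rf), quorum(rf), rf))  # True: QUORUM + QUORUM
print(read_sees_latest_write(1, 1, rf))                    # False: ONE + ONE
```

Reading and writing at ONE gives the lowest latency but allows stale reads, while QUORUM on both sides trades some latency for overlapping read and write sets.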
PERSONALIZATION AT SPOTIFY USING CASSANDRA

Cassandra data model

CREATE TABLE entitymetadata (
    entityid text,
    featurename text,
    featurevalue text,
    PRIMARY KEY (entityid, featurename)
);
CREATE TABLE userprofilelatest (
    userid text,
    featurename text,
    featurevalue text,
    PRIMARY KEY (userid, featurename)
);
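
In a compound primary key such as the ones above, the first column is the partition key and the second is a clustering column. A rough sketch of what that means for reads and writes, using an in-memory stand-in rather than a real cluster (the helper names and the sample entity and feature values are hypothetical):

```python
from collections import defaultdict

# partition key (entityid) -> {clustering column (featurename): featurevalue}
entitymetadata = defaultdict(dict)

def upsert(entityid, featurename, featurevalue):
    # Writing the same (entityid, featurename) again overwrites the value,
    # because the primary key identifies exactly one row.
    entitymetadata[entityid][featurename] = featurevalue

def select_entity(entityid):
    # All features of one entity live in one partition and are returned
    # ordered by the clustering column.
    return sorted(entitymetadata.get(entityid, {}).items())

upsert("track:1", "tempo", "120")
upsert("track:1", "genre", "jazz")
upsert("track:1", "tempo", "128")  # overwrites the earlier tempo value

print(select_entity("track:1"))
```

Fetching every feature for one entity is therefore a single-partition read, which is the kind of lookup such a layout makes cheap.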
USE OF CASSANDRA AT INSTAGRAM

• Instagram runs one of the world's largest deployments
of the Apache Cassandra database.
• Cassandra is used for fraud detection, feed, and the
direct inbox.
• The team has had a very good experience with Cassandra's reliability and
availability.
• However, read latency needed improvement.
• Instagram's Cassandra team started a project to
reduce Cassandra's read latency, called Rocksandra.
USE OF CASSANDRA AT INSTAGRAM

After development and testing, they rolled it out to several production
Cassandra clusters at Instagram, and read latency became much lower and more consistent.
