Architecture of Apache Cassandra
Last Updated :
27 Sep, 2021
Avinash Lakshman and Prashant Malik initially developed Cassandra at Facebook to power the Facebook inbox search feature. Facebook released Cassandra as an open source project on google code in July 2008. It became an Apache incubator project in March 2009. It became one of the top level project in 17 Feb 2010. Fueled by the internet revolution, mobile devices, and e-commerce, modern applications have outgrown relational databases. Out of necessity, a new generation of databases has emerged to address large-scale, globally distributed data management challenges.
Cassandra powers online services and mobile backend for some of the world’s most recognizable brands, including Apple, Netflix, and Facebook.
Architecture of Apache Cassandra:
In this section we will describe the following component of Apache Cassandra.
Basic Terminology:
Node
Data center
Cluster
Operations:
Read Operation
Write Operation
Storage Engine:
CommitLog
Memtables
SSTables
Data Replication Strategies
let's discuss one by one.
Basic Terminology:
1. Node:
Node is the basic component in Apache Cassandra. It is the place where actually data is stored. For Example:As shown in diagram node which has IP address 10.0.0.7 contain data (keyspace which contain one or more tables).

Figure - Node
2. Data Center:
Data Center is a collection of nodes.
For example:
DC – N1 + N2 + N3 ….
DC: Data Center
N1: Node 1
N2: Node 2
N3: Node 3
3. Cluster:
It is the collection of many data centers.
For example:
C = DC1 + DC2 + DC3….
C: Cluster
DC1: Data Center 1
DC2: Data Center 2
DC3: Data Center 3

Figure - Node, Data center, Cluster
Operations:
1. Read Operation:
In Read Operation there are three types of read requests that a coordinator can send to a replica. The node that accepts the write requests called coordinator for that particular operation.
- Step-1: Direct Request:
In this operation coordinator node sends the read request to one of the replicas. - Step-2: Digest Request:
In this operation coordinator will contact to replicas specified by the consistency level. For Example: CONSISTENCY TWO; It simply means that Any two nodes in data center will acknowledge. - Step-3: Read Repair Request:
If there is any case in which data is not consistent across the node then background Read Repair Request initiated that makes sure that the most recent data is available across the nodes.
2. Write Operation:
- Step-1:
In Write Operation as soon as we receives request then it is first dumped into commit log to make sure that data is saved. - Step-2:
Insertion of data into table that is also written in MemTable that holds the data till it’s get full. - Step-3:
If MemTable reaches its threshold then data is flushed to SS Table.

Figure - Write Operation in Cassandra
Storage Engine:
- Commit log:
Commit log is the first entry point while writing to disk or memTable. The purpose of commit log in apache Cassandra is to server sync issues if a data node is down. - Mem-table:
After data written in Commit log then after that data is written in Mem-table. Data is written in Mem-table temporarily. - SSTable:
Once Mem-table will reach a certain threshold then data will flushed to the SSTable disk file.
Data Replication Strategy:
Basically it is used for backup to ensure no single point of failure. In this strategy Cassandra uses replication to achieve high availability and durability. Each data item is replicated at N hosts, where N is the replication factor configured \per-instance".
There are two type of replication Strategy: Simple Strategy, and Network Topology Strategy. These are explained as following below.
1. Simple Strategy:
In this Strategy it allows a single integer RF (replication_factor) to be defined. It determines the number of nodes that should contain a copy of each row. For example, if replication_factor is 2, then two different nodes should store a copy of each row. It treats all nodes identically, ignoring any configured datacenters or racks.
CQL(Cassandra Query language) query for Simple Strategy. A keyspace is created using a CREATE KEYSPACE statement:
create_keyspace_statement ::=
CREATE KEYSPACE [ IF NOT EXISTS ] keyspace_name
WITH options
For instance:
CREATE KEYSPACE User_data
WITH replication = {'class': 'SimpleStrategy',
'replication_factor' : 2};
To check keyspace Schema used the following CQl query.
DESCRIBE KEYSPACE User_data
Pictorial Representation of Simple Strategy.

Figure - Simple Strategy
2. Network Topology Strategy:
In this strategy it allows a replication factor to be specified for each datacenter in the cluster. Even if your cluster only uses a single datacenter. This Strategy should be preferred over SimpleStrategy to make it easier to add new physical or virtual datacenters to the cluster later.
CQL(Cassandra Query language) query for Network Topology Strategy.
CREATE KEYSPACE User_data
WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1' : 2, 'DC2' : 3}
AND durable_writes = false;
To check keyspace Schema used the following CQl query.
DESCRIBE KEYSPACE User_data
Pictorial Representation of Network Topology Strategy.

Figure - Network Topology Strategy
Table Structure in Cassandra:
USE User_data;
CREATE TABLE User_table (
User_id int,
User_name text,
User_add text,
User_phone text,
PRIMARY KEY (User_id)
);
Insert into User_data (User_id, User_name, User_add, User_phone )
VALUES(1000, ‘Ashish’, ‘Noida’, ‘8077178539’);
Insert into User_data (User_id, User_name, User_add, User_phone )
VALUES(1001, ‘Ashish Gupta’, ‘Bangalore’);
Insert into User_data (User_id, User_name, User_add, User_phone )
VALUES(1002, ‘Abi’);
Output:

Figure - Table Structure
Application of Apache Cassandra:
Some of the application use cases that Cassandra excels in include:
- Real-time, big data workloads
- Time series data management
- High-velocity device data consumption and analysis
- Media streaming management (e.g., music, movies)
- Social media (i.e., unstructured data) input and analysis
- Online web retail (e.g., shopping carts, user transactions)
- Real-time data analytics
- Online gaming (e.g., real-time messaging)
- Software as a Service (SaaS) applications that utilize web services
- Online portals (e.g., healthcare provider/patient interactions)
- Most write-intensive systems
Similar Reads
Architecture of HBase
Prerequisites - Introduction to Hadoop, Apache HBase HBase architecture has 3 main components: HMaster, Region Server, Zookeeper. Figure - Architecture of HBase All the 3 components are described below: HMaster - The implementation of Master Server in HBase is HMaster. It is a process in which regio
3 min read
Apache Cassandra tools
Prerequisites - Introduction to Apache Cassandra Apache Cassandra (NOSQL database) Architecture of Apache Cassandra In this article, we are going to discuss the tools of Apache Cassandra which help to perform in various aspects of tasks such that the status of the node, the status of the ring, back
3 min read
Node in Apache Cassandra
In this article, we are going to discuss what is a node in Cassandra, information of node, how we can access the information about the node, and by using Nodetool utility we will also discuss some nodetool commands. let's discuss one by one. Node : A node in Cassandra contains the actual data and it
2 min read
Five main benefits of Apache Cassandra
In this article, we will discuss the 5 main benefits of Apache Cassandra in which scalability, High Availability, High Fault Tolerance, High Performance, Multi-Data Center, and Hybrid Cloud Support are the main factors. Prerequisite - Introduction to Apache Cassandra Scalability : In Cassandra, If a
3 min read
Oracle Architecture
Pre-requisites: Database Oracle Database is a relational database management system (RDBMS) that Oracle Corporation created and marketed. It is one of the most popular RDBMSs on the market and is used to store and retrieve data for a wide range of applications. Oracle Database is well-known for its
5 min read
SSTable in Apache Cassandra
In this article, we are going to discuss SSTable which is one of the storage engines in Cassandra and SSTable components and also, we will cover what type of information kept in different database file in SSTable. Letâs discuss one by one. SSTable : It is one of the storage engines in Apache Cassand
3 min read
Apache Cassandra (NOSQL database)
In this article, we will learn the basics of Apache Cassandra and the basics of CQL (Cassandra Query Language) operations like Create, insert, delete, select, etc. Apache CassandraApache Cassandra is an open-source NoSQL database that is used for handling big data. Apache Cassandra has the capabilit
2 min read
Concept of indexing in Apache Cassandra
Indexing in Apache CassandraIn Apache Cassandra, data can be accessed using attributes that are part of the partition key. For example, if Emp_id is a column in an Employee table and it serves as the partition key, you can filter or search data using this key. In this case, the WHERE clause can be u
4 min read
The Three-Level ANSI-SPARC Architecture
In 1971, DBTG(DataBase Task Group) realized the requirement for a two-level approach having views and schema and afterward, in 1975, ANSI-SPARC realized the need for a three-level approach with the three levels of abstraction comprises of an external, a conceptual, and an internal level. The three-l
2 min read
NoSQL Data Architecture Patterns
Architecture Pattern is a logical way of categorizing data that will be stored on the Database. NoSQL is a type of database which helps to perform operations on big data and store it in a valid format. It is widely used because of its flexibility and a wide variety of services. Architecture Patterns
4 min read