Apache Kafka
Ans:- Apache Kafka is an open-source, distributed event streaming platform designed for
high-throughput, fault-tolerant, real-time data streaming. It enables applications to
publish, subscribe to, store and process streams of events (or records) in a scalable and
reliable manner.
It was originally developed at LinkedIn and later open-sourced and donated to the Apache
Software Foundation.
#2.Features of Kafka:-
a)Distributed Architecture:- Kafka is designed to run across a cluster of servers, providing
high scalability and fault tolerance.
b)Real-time Event Processing:- Kafka processes events (records) in real time, enabling low-
latency communication between systems.
c)Durable and Reliable:- Kafka stores records on disk with replication, ensuring data
durability and fault tolerance.
d)High Throughput:- Kafka is capable of handling millions of messages per second, which
makes it well suited to high-traffic systems.
e)Horizontal Scalability:- Kafka can scale out by adding more brokers (nodes) to the cluster
without downtime.
f)Loose Coupling:- Kafka allows producers and consumers to operate independently, enabling
loose coupling between systems.
4. What is a Partition?
-->A topic is split into several parts, known as the partitions of the topic. Partitions are
numbered sequentially starting from 0, and each partition is an ordered, append-only
sequence of messages. The data of a topic is stored in its partitions. Therefore, while
creating a topic, we need to specify the number of partitions (the number is arbitrary and
can be increased later, but not decreased). Each message stored in a partition is assigned an
incremental id known as its offset. Offsets are unique only within a partition.
-->Example of Partition:- Suppose the orders topic has 3 partitions:
a) Partition 0 stores orders with IDs 1,4,7.
b) Partition 1 stores orders with IDs 2,5,8.
c) Partition 2 stores orders with IDs 3,6,9.
This allows multiple consumers to process different partitions in parallel.
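The distribution in the example above can be reproduced with a minimal in-memory sketch. This is not Kafka's actual partitioner (by default Kafka chooses a partition from a hash of the message key, or spreads keyless messages across partitions). A simple round-robin rule, (order_id - 1) % 3, is assumed here only because it yields the IDs listed above. The names PartitionedTopic and append are hypothetical.

```python
class PartitionedTopic:
    """Toy model of a topic: one append-only list per partition."""

    def __init__(self, name, num_partitions):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, order_id):
        # Assumed round-robin rule, chosen to match the example in the text.
        partition = (order_id - 1) % len(self.partitions)
        log = self.partitions[partition]
        offset = len(log)  # offsets start at 0 and grow by 1 within a partition
        log.append(order_id)
        return partition, offset


orders = PartitionedTopic("orders", num_partitions=3)
# Remember where each order landed: order_id -> (partition, offset)
placements = {order_id: orders.append(order_id) for order_id in range(1, 10)}

for number, log in enumerate(orders.partitions):
    print(f"Partition {number}: {log}")
```

Note that order 7 ends up at offset 2 of Partition 0, while order 2 sits at offset 0 of Partition 1: offsets restart in every partition, so an offset identifies a message only together with its partition number.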
-->Kafka brokers are also known as bootstrap brokers because connecting to any one
broker means connecting to the entire cluster. Each broker in the cluster knows about all
the brokers, topics and partitions.
The figure below shows what brokers look like when holding a topic with n partitions.
Each broker holds part of a topic, Topic-x, which has three partitions: 0, 1 and 2. Remember
that the partitions of a topic do not all belong to one broker. They are distributed among
the brokers (depending on how many brokers there are). Broker 1 and Broker 2 also contain
another topic, Topic-y, which has two partitions, 0 and 1. Thus, Broker 3 does not hold any
data from Topic-y. Note also that there is no relationship between the broker number and
the partition number.
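The layout described above can be written down as a small lookup table. The text does not say which partition sits on which broker, so the exact placement below is an assumption made for illustration. Only the facts that Topic-x spans all three brokers and Topic-y lives on Broker 1 and Broker 2 come from the description. The helper brokers_holding is hypothetical.

```python
# Hypothetical placement: (topic, partition number) -> broker number.
placement = {
    ("Topic-x", 0): 1,
    ("Topic-x", 1): 2,
    ("Topic-x", 2): 3,
    ("Topic-y", 0): 2,  # partition 0 on Broker 2: the numbers are unrelated
    ("Topic-y", 1): 1,
}


def brokers_holding(topic):
    """Return the sorted broker numbers that store any partition of the topic."""
    return sorted({broker for (name, _), broker in placement.items() if name == topic})


print(brokers_holding("Topic-x"))  # [1, 2, 3]
print(brokers_holding("Topic-y"))  # [1, 2]
```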
6) What is a Cluster in Kafka?
--> In Kafka, a cluster is a group of one or more brokers (servers) working together to
provide a scalable, fault-tolerant and high-performance distributed messaging system. A
cluster is the backbone of Kafka's architecture, enabling it to handle large volumes of data
efficiently.
-->We can describe Kafka as a distributed cluster that consists of key components like
brokers, topics and partitions.