
Apache Kafka

Apache Kafka is an open-source, distributed event streaming platform designed for high-throughput, fault-tolerant, real-time data streaming. It features a distributed architecture and real-time event processing, and it provides durable and reliable message storage, with topics and partitions enabling efficient data organization and processing. Kafka clusters consist of multiple brokers that manage topics and their partitions, facilitating scalable and efficient data handling.

Uploaded by

prasanna.mallari
Copyright
© All Rights Reserved

1. What is Apache Kafka?

Ans:- Apache Kafka is an open-source, distributed event streaming platform designed for
handling high-throughput, fault-tolerant, real-time data streams. It enables applications
to publish, subscribe to, store and process streams of events (or records) in a scalable and
reliable manner.
It was originally developed at LinkedIn and later open-sourced; it is now a project of the
Apache Software Foundation.

2. Features of Kafka:-
a)Distributed Architecture:- Kafka is designed to run across a cluster of servers, providing
high scalability and fault tolerance.

b)Real-time Event Processing:- Kafka processes events (records) in real time, enabling low-
latency communication between systems.

c)Durable and Reliable:- Kafka stores records on disk with replication, ensuring data
durability and fault tolerance.

d)High Throughput:- Kafka is capable of handling millions of messages per second, which makes
it well suited to high-traffic systems.

e)Horizontal Scalability:- Kafka can scale out by adding more brokers (nodes) to the cluster
without downtime.

f)Decoupling:- Kafka allows producers and consumers to operate independently, enabling loose
coupling between systems.

g)Replayability:- Consumers can reprocess messages by replaying events from Kafka's log, as
long as those events are still retained.


3. What is a Topic in Kafka?
-->A topic in Kafka is a category or feed name to which records are published.
-->In other words, a topic is a common name used to store and publish a particular stream
of data.
-->We can create as many topics as we want. Each topic is identified by its name, which the
user chooses. A producer publishes data to a topic, and a consumer reads that data by
subscribing to the topic.

-->Example:- Imagine an e-commerce app. You can create topics like:

a) orders :- For storing order events
b) payments :- For payment-related events
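
To make the publish/subscribe idea concrete, here is a minimal in-memory sketch of the two example topics. It does not use a real Kafka client: the class name TopicStore and its publish and read methods are made up for illustration, and the record fields are assumed.

```python
from collections import defaultdict


class TopicStore:
    """Toy stand-in for Kafka topics (hypothetical, not a Kafka API)."""

    def __init__(self):
        # topic name -> list of records, in the order they were published
        self.topics = defaultdict(list)

    def publish(self, topic, record):
        """Producer side: append a record to the named topic."""
        self.topics[topic].append(record)

    def read(self, topic):
        """Consumer side: return a copy of every record in the topic so far."""
        return list(self.topics[topic])


store = TopicStore()
store.publish("orders", {"order_id": 1, "item": "book"})
store.publish("payments", {"order_id": 1, "amount": 12.5})
store.publish("orders", {"order_id": 2, "item": "pen"})

orders = store.read("orders")      # only order events
payments = store.read("payments")  # only payment events
print(orders)
print(payments)
```

Each topic keeps its own stream, so a consumer of orders never sees payment events.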

4. What is a Partition?
-->A topic is split into several parts, known as the partitions of the topic. Partitions are
numbered in order, starting from 0, and the records of the topic are stored inside them.
Therefore, while creating a topic, we need to specify the number of partitions (the number
is arbitrary and can be increased later, but not decreased). Each message stored in a
partition gets an incremental id known as its offset.
-->Example of Partition:- Suppose the orders topic has 3 partitions:
a) Partition 0 stores orders with IDs 1,4,7.
b) Partition 1 stores orders with IDs 2,5,8.
c) Partition 2 stores orders with IDs 3,6,9.
This allows multiple consumers to process different partitions in parallel.
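
The split in the example above can be reproduced with a simple round-robin rule. This is only an illustrative sketch: the function partition_for is hypothetical, and a real Kafka producer usually picks the partition differently (for example by hashing the record key).

```python
NUM_PARTITIONS = 3


def partition_for(order_id, num_partitions=NUM_PARTITIONS):
    """Illustrative round-robin rule, chosen only to match the example above."""
    return (order_id - 1) % num_partitions


# partition number -> order IDs stored in that partition
partitions = {p: [] for p in range(NUM_PARTITIONS)}
for order_id in range(1, 10):
    partitions[partition_for(order_id)].append(order_id)

for p, ids in partitions.items():
    print(f"Partition {p}: {ids}")
```

With order IDs 1 to 9, this places 1,4,7 in partition 0, 2,5,8 in partition 1 and 3,6,9 in partition 2.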

How do the offsets described above work?


-->Offsets are partition-specific. If a topic has multiple partitions, each partition maintains
its own sequence of offsets starting from 0. Kafka assigns offsets incrementally as records
are added to a partition.
-->Also, once assigned, an offset cannot be changed. This immutability ensures the integrity
of the data.

Example of Offset:-
Partition 0:
Offset 0 : Record A
Offset 1 : Record B
Offset 2 : Record C
Partition 1:
Offset 0 : Record D
Offset 1 : Record E
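
The same example can be written as a small append-only log, one per partition. This is a sketch of the idea, not Kafka's storage code: the class PartitionLog and its methods append and read_from are hypothetical names.

```python
class PartitionLog:
    """Toy append-only log for one partition (hypothetical, not a Kafka API)."""

    def __init__(self):
        self._records = []

    def append(self, record):
        """Store the record and return the offset assigned to it."""
        offset = len(self._records)  # next offset is the current log length
        self._records.append(record)
        return offset

    def read_from(self, offset):
        """Return (offset, record) pairs starting at the given offset."""
        return [(i, self._records[i]) for i in range(offset, len(self._records))]


partition_0 = PartitionLog()
partition_1 = PartitionLog()

offsets_p0 = [partition_0.append(r) for r in ("Record A", "Record B", "Record C")]
offsets_p1 = [partition_1.append(r) for r in ("Record D", "Record E")]

# Each partition numbers its own records from 0.
print(offsets_p0, offsets_p1)

# A consumer can replay partition 0 starting at offset 1.
replayed = partition_0.read_from(1)
print(replayed)
```

Reading never changes the log, which is why a consumer can go back to an earlier offset and reprocess the same records.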

5. What are Brokers?


-->A Kafka cluster is made up of one or more servers, which are known as brokers or Kafka
brokers. A broker is a container that holds several topics with their partitions.

-->Each broker in the cluster is identified by a unique integer id.

-->Kafka brokers are also known as bootstrap brokers, because connecting to any one broker
gives access to the entire cluster: each broker in the cluster knows about all the other
brokers, topics and partitions.
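
A rough way to picture the bootstrap idea: every broker can hand out the same description of the cluster, so it does not matter which one a client contacts first. The sketch below is a simulation only; the Broker class, fetch_metadata, connect and the metadata layout are all assumptions, not the real Kafka protocol.

```python
class Broker:
    """Toy broker that knows the metadata of the whole cluster (hypothetical)."""

    def __init__(self, broker_id, cluster_metadata):
        self.broker_id = broker_id
        self._cluster_metadata = cluster_metadata  # view shared by all brokers

    def fetch_metadata(self):
        """Return what this broker knows about the entire cluster."""
        return self._cluster_metadata


# Assumed metadata: broker ids, plus the partitions of each topic.
cluster_metadata = {"brokers": [1, 2, 3], "topics": {"orders": [0, 1, 2]}}
brokers = {i: Broker(i, cluster_metadata) for i in cluster_metadata["brokers"]}


def connect(bootstrap_id):
    """Contact a single broker and learn about the whole cluster from it."""
    return brokers[bootstrap_id].fetch_metadata()


discovered = connect(2)
print(discovered["brokers"])
print(discovered["topics"])
```

Whichever broker id is passed to connect, the client ends up with the same list of brokers and topics.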

The figure below shows a broker containing a topic with n partitions.

Example of Brokers and Topics:-


Suppose a Kafka cluster consists of three brokers, namely Broker 1, Broker 2 and Broker 3.

Each broker holds part of a topic named Topic-x, which has three partitions: 0, 1 and 2.
Remember, the partitions of a topic do not all belong to one broker; Kafka spreads them
across the available brokers (how many each broker holds depends on the number of
partitions and brokers). Broker 1 and Broker 2 also contain another topic, Topic-y, which
has two partitions: 0 and 1. Thus, Broker 3 does not hold any data from Topic-y. Note also
that there is no relationship between the broker number and the partition number.
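
The example can also be written down as a small lookup table. The exact placement of each partition below is an assumption made for illustration (the notes only say which brokers hold which topic), and the helper names brokers_holding and partitions_of are hypothetical.

```python
# broker id -> list of (topic, partition) pairs stored on that broker.
# Assumed placement: only "which broker holds which topic" comes from the notes.
layout = {
    1: [("Topic-x", 0), ("Topic-y", 1)],
    2: [("Topic-x", 2), ("Topic-y", 0)],
    3: [("Topic-x", 1)],
}


def brokers_holding(topic):
    """Return the ids of brokers that store at least one partition of the topic."""
    return sorted(
        broker_id
        for broker_id, parts in layout.items()
        if any(t == topic for t, _ in parts)
    )


def partitions_of(topic):
    """Return every partition number of the topic across the cluster."""
    return sorted(p for parts in layout.values() for t, p in parts if t == topic)


print(brokers_holding("Topic-x"))
print(brokers_holding("Topic-y"))
print(partitions_of("Topic-y"))
```

Here Broker 3 holds partition 1 of Topic-x and nothing from Topic-y, which also shows that broker numbers and partition numbers are unrelated.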

6. What is a Cluster in Kafka?
-->In Kafka, a cluster is a group of one or more brokers (servers) working together to provide
a scalable, fault-tolerant and high-performance distributed messaging system. The cluster is
the backbone of Kafka's architecture, enabling it to handle large volumes of data efficiently.
-->We can describe Kafka as a distributed cluster that consists of key components
like brokers, topics and partitions.
