INTERNET OF THINGS
TOPIC: APACHE KAFKA
Group members:
Taniya Souza [1DA21CS150]
Srinivasan R [1DA21CS143]
Yashwanth B K [1DA21CS171]
Yashwanth Gowda B [1DA21CS172]
Guide: Prof. Lavanya Santosh, CSE Dept
What is Apache Kafka?
•Apache Kafka is an open-source distributed event-streaming platform.
•Originally developed at LinkedIn; open-sourced through the Apache Incubator
in 2011 and graduated to a top-level Apache project in 2012.
•Designed to handle high-throughput, low-latency, real-time data streams.
Key Features of Kafka
Distributed System: Runs as a cluster of brokers for scalability and fault tolerance.
Durable Storage: Data is stored on disk and replicated across brokers.
High Throughput: Can handle millions of messages per second.
Low Latency: Ensures quick delivery of messages.
Decoupling Systems: Allows independent development and scaling of producers and consumers.
Why Use Kafka?
•Ideal for modern data-driven applications.
•Helps in building real-time analytics systems.
•Serves as a backbone for microservices communication.
•Ensures scalability to handle large datasets.
•Integrates with popular big data frameworks like Spark, Flink, and Hadoop.
Core Functions:
•Publish and Subscribe: Enables real-time messaging between producers and consumers through
topics.
•Durable Storage: Persistently stores data streams on disk, allowing replay and recovery.
•Scalable Partitioning: Divides topics into partitions for parallel and distributed data processing.
•Fault Tolerance: Ensures data availability and reliability through replication across brokers.
•Real-Time Stream Processing: Processes and analyzes data streams in real time using Kafka
Streams or external tools.
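The publish/subscribe and durable-storage ideas above can be illustrated with a tiny in-memory stand-in. This is a sketch, not real Kafka: the class and method names (`MiniTopic`, `publish`, `read`) are invented for illustration, but they mirror the core idea that a topic is a set of append-only partition logs that consumers can replay.

```python
class MiniTopic:
    """Toy stand-in for a Kafka topic: one append-only log per partition."""
    def __init__(self, name, num_partitions=3):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def publish(self, message, partition=0):
        """Producer side: append the message and return its offset."""
        self.partitions[partition].append(message)
        return len(self.partitions[partition]) - 1

    def read(self, partition, offset):
        """Consumer side: data is retained after consumption,
        so any past offset can be re-read (replay/recovery)."""
        return self.partitions[partition][offset:]

topic = MiniTopic("sales")
topic.publish("order-1", partition=0)
topic.publish("order-2", partition=0)
print(topic.read(partition=0, offset=0))  # both messages, still replayable
```

Note how reading never deletes anything: retention is a property of the log, independent of which consumers have already seen the data.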
Kafka Architecture
Overview
• Kafka is a publish-subscribe messaging
system with the following components:
• Producers: Publish messages to topics.
• Consumers: Subscribe to topics to consume
messages.
• Brokers: Manage the storage and retrieval
of messages.
• Topics: Categories to which messages are
published.
• Partitions: Break down topics for scalability.
Kafka Topics
A topic is a logical channel for data streams.
Each topic is divided into partitions for parallel processing.
Data in topics is retained for a configurable period, even after
consumption.
Topics can have configurations for replication and data retention.
Example: A “Sales Data” topic could have partitions based on
regions.
Producers and Consumers
•Producers: Send data to Kafka topics.
•Push messages to specific partitions.
•Can define custom partitioning logic (e.g., based on keys).
•Consumers: Read data from topics.
•Join consumer groups for parallel processing.
•Kafka ensures that each partition is read by one consumer in a group.
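The "one partition per consumer within a group" rule can be sketched with a simple round-robin assignment. This is an illustration only, not Kafka's actual partition assignor (the real broker-side protocol supports pluggable range, round-robin, and sticky strategies):

```python
def assign_partitions(partitions, consumers):
    """Round-robin sketch: every partition goes to exactly one consumer
    in the group, so no two group members ever read the same partition."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

# 6 partitions shared by a group of 2 consumers
print(assign_partitions(list(range(6)), ["c1", "c2"]))
```

With more consumers than partitions, the extra consumers simply receive nothing, which is why partition count caps a group's parallelism.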
Brokers and Clusters
• A Kafka cluster consists of multiple brokers.
• Brokers: Handle storage and management of data
streams.
• Each broker handles a subset of partitions.
• Collaborate to provide fault tolerance and scalability.
• Clusters use ZooKeeper (or KRaft, in newer versions)
for managing configurations and leader election.
Kafka Partitions
Topics are divided into partitions to distribute data
and allow parallelism.
Data Placement: Messages in partitions are stored
in the order they arrive.
Key-Based Partitioning: Ensures that messages
with the same key go to the same partition.
Example: A “User Activity” topic could have partitions
for different user IDs.
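Key-based partitioning boils down to hashing the key and taking it modulo the partition count. A minimal sketch (real Kafka clients use murmur2 for this; SHA-256 is used here only as a stable, readily available stand-in):

```python
import hashlib

def partition_for(key, num_partitions):
    """Deterministic hash of the key, so the same key always lands
    in the same partition and keeps its per-key ordering."""
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

p1 = partition_for("user-42", 6)
p2 = partition_for("user-42", 6)
print(p1 == p2)  # True: all of user-42's events share one partition
```

One caveat worth noting: because the modulus is the partition count, changing the number of partitions changes where keys map, which is why partition counts are rarely changed on keyed topics.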
Offset and Message Ordering
• Offset: A unique identifier for each message in a partition.
• Used to keep track of consumed messages.
• Kafka guarantees message order within a partition but not
across partitions.
• Consumers can reset offsets for reprocessing messages.
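Offset tracking and replay can be shown with a few lines. The `poll` helper below is an invented name, but the mechanics match the slide: the consumer, not the log, owns its position, so resetting the offset replays old messages without touching the data.

```python
log = ["m0", "m1", "m2", "m3"]   # one partition's append-only log
committed = 0                     # this consumer's committed offset

def poll(log, offset, max_records=2):
    """Read from the given offset onward; the log itself is never mutated."""
    batch = log[offset:offset + max_records]
    return batch, offset + len(batch)

batch, committed = poll(log, committed)  # first two messages, offset -> 2
committed = 0                            # "seek" back to the beginning
batch, committed = poll(log, committed)  # replays the same messages
print(batch)
```

Since each consumer group keeps its own offsets, two groups can read the same topic at entirely different positions without interfering.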
Durability and Replication
•Kafka ensures durability by replicating data across brokers.
•Leader Replica: Handles all read and write requests for a partition.
•Follower Replicas: Maintain copies and take over if the leader fails.
•Acknowledgments: Producers can configure how many replicas must confirm
a message before it's considered successful.
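The acknowledgment trade-off can be modeled in a few lines. This is a toy model of the producer's `acks` setting, not the real client API: `produce`, and the replica lists, are invented for illustration, and all replicas are assumed alive and in sync.

```python
def produce(message, leader, followers, acks):
    """Toy model of the Kafka producer `acks` setting:
    acks=1     -> success once the leader replica has the message;
    acks="all" -> success only after every in-sync replica has it too."""
    leader.append(message)            # leader handles the write
    for follower in followers:
        follower.append(message)      # followers copy from the leader
    needed = 1 if acks == 1 else 1 + len(followers)
    confirmed = 1 + len(followers)    # every replica is alive in this sketch
    return confirmed >= needed

leader, f1, f2 = [], [], []
ok = produce("evt-1", leader, [f1, f2], acks="all")
print(ok, leader == f1 == f2)  # True True: confirmed and fully replicated
```

The trade-off: `acks=1` is faster but can lose a message if the leader dies before replication; `acks="all"` waits for the in-sync replicas and survives a leader failure.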
Use Cases of Kafka
•Real-Time Analytics:
Monitor and analyze social media feeds or website activities.
•Log Aggregation:
Centralized logging from distributed systems.
•Event Sourcing:
Capture application changes as a sequence of events.
•Data Integration:
Sync databases and applications.
•Stream Processing:
Process and analyze data in real-time with Kafka Streams or other tools.
Advantages of Kafka
•Scalability: Can scale horizontally by adding brokers.
•Flexibility: Works with multiple programming languages.
•Resilience: Fault-tolerant with replication and partitioning.
•Performance: Handles millions of events per second with low latency.
•Integration: Seamlessly integrates with popular tools like Spark and Flink.
Challenges with Kafka
Complex Setup: Requires expertise to configure and maintain.
Resource-Intensive: High memory usage for durability and performance.
Message Duplication: Can occur without proper configuration.
Operational Overhead: ZooKeeper dependency in older versions.
SUMMARY
Apache Kafka is a distributed platform for real-time data streaming and
processing, designed for high-throughput, low-latency, and fault-tolerant
communication.
Kafka uses topics for organizing data, partitions for scalability, and
replication for reliability, enabling efficient handling of massive data
streams.
Common applications include real-time analytics, event-driven
architectures, log aggregation, and data integration between diverse
systems.
THANK YOU