Apache Kafka

Apache Kafka is a distributed streaming platform that allows for publishing and subscribing to streams of records. It is horizontally scalable and fault tolerant. Key components include brokers that make up the Kafka cluster, producers that publish events to topics, and consumers that subscribe to topics. Topics are partitioned for parallel processing: producers write data to topics, where it is stored immutably in partitions, and consumers read from topics in parallel. Kafka Streams and KSQL provide stream processing capabilities.


Table of Contents
1. What? Why?
2. Advantages
3. Key components of Kafka
4. Kafka architecture
5. Summary
What is Kafka?
Kafka is a distributed streaming platform. It lets you:
– Publish and subscribe to streams of records (a publish-subscribe messaging system; a messaging system lets you send messages between processes, applications, and servers).
– Store streams of records in a fault-tolerant, durable way.
– Process streams of records as they occur.
Kafka is used for building real-time data pipelines and streaming applications. It is horizontally scalable, fault tolerant, fast, and runs in production at thousands of companies.
Why Apache Kafka?
• Distributed, resilient, fault-tolerant architecture
• Horizontal scalability:
◦ Can scale to hundreds of brokers
◦ Can scale to millions of messages per second
• High performance (latency of less than 10 ms), i.e., real time
Key components of Kafka
● Broker - Apache Kafka runs as a cluster on one or more servers, possibly spanning multiple data centers. A broker is a single server instance in a Kafka cluster.
● Producers - client applications that publish (write) events to the Kafka brokers.
● Consumers - client applications that subscribe to (read and process) these events.
● Kafka Topic - a Topic is a category/feed name to which messages are stored and published. Producer applications write data to topics and consumer applications read from topics.
Broker
● Servers: Kafka runs as a cluster of one or more servers that can span multiple data centers or cloud regions.
Topics and Partitions
Topics are partitioned, meaning a topic is spread over a number of “buckets” located on different Kafka brokers.
Topics
A Topic is a category/feed name to which records/messages are stored and published. To send a record/message you send it to a specific topic, and to read a record/message you read it from a specific topic.

Why topics? In a Kafka cluster, data comes from many different sources at the same time, e.g. logs, web activity, metrics, etc. Topics are useful for identifying where this data is stored.
Producers write data to specific topics and consumers read data from specific topics.
Partitions
● Topics are divided into partitions, which contain records/messages in an unchangeable (immutable) sequence.
● Each record/message in a partition is assigned and identified by a unique offset.
● One topic can have multiple partition logs, which allows multiple consumers to read from the topic in parallel.
● Partitions allow clients to parallelize a topic by splitting its data across multiple brokers.
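The partition/offset model above can be sketched in plain Python. This is a toy in-memory model, not the real Kafka API; the `Topic` class and its methods are invented for illustration:

```python
# Toy in-memory model of a Kafka topic: one append-only log per partition,
# where each record is identified by its offset within that partition.
class Topic:
    def __init__(self, name, num_partitions):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]  # append-only logs

    def append(self, partition, record):
        """Append a record to a partition; its offset is its position in the log."""
        log = self.partitions[partition]
        log.append(record)
        return len(log) - 1  # the record's offset

    def read(self, partition, offset):
        """Records are never mutated; any consumer can re-read by offset."""
        return self.partitions[partition][offset]

t = Topic("web-activity", num_partitions=3)
o0 = t.append(0, "page_view:/home")
o1 = t.append(0, "page_view:/cart")
o2 = t.append(1, "metric:cpu=0.42")

print(o0, o1, o2)        # offsets are per partition: 0 1 0
print(t.read(0, 1))      # page_view:/cart
```

Note that offsets restart at 0 in every partition: ordering in Kafka is guaranteed only within a partition, not across a whole topic.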
Consumers and producers
● Producer: writes data to the brokers.
● Consumer: consumes data from the brokers.
Producers
● If the key of a record is NULL, the data is sent without a key. It is then distributed in a round-robin manner (i.e., distributed across the partitions in turn).
● If the key is not NULL, the key is attached to the data, and all messages with the same key are always delivered to the same partition.
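This routing rule can be sketched as follows. It is a simplified illustration only: the real Kafka producer hashes the serialized key with murmur2 (and newer clients use a "sticky" strategy rather than strict round-robin for unkeyed records), and the function name here is made up for the example:

```python
import itertools

NUM_PARTITIONS = 3
_round_robin = itertools.cycle(range(NUM_PARTITIONS))

def choose_partition(key):
    """Pick a partition the way the text describes:
    no key  -> round-robin over the partitions;
    key set -> a deterministic hash of the key, so equal keys
               always land on the same partition.
    (Real Kafka uses murmur2 on the serialized key, not this toy hash.)"""
    if key is None:
        return next(_round_robin)
    return sum(key.encode()) % NUM_PARTITIONS  # deterministic toy hash

# Keyed messages: same key, same partition, every time.
assert choose_partition("user-42") == choose_partition("user-42")

# Unkeyed messages: spread across partitions in turn.
print([choose_partition(None) for _ in range(4)])  # [0, 1, 2, 0]
```

Keying matters because ordering is only guaranteed within a partition: sending all of one user's events with the same key keeps them in order relative to each other.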
Kafka architecture
(Kafka architecture diagram at LinkedIn)
Kafka Streams & KSQL

Kafka Streams
Advantages:
● Kafka Streams allows you to write complex topologies, though it requires substantial programming knowledge and the result can be harder to read.
● Available for Java and Scala; it lets you write either a high-level DSL (resembling a functional-programming / Apache Spark style of program) or the low-level API (resembling Apache Storm more closely).
Disadvantages:
● You will have to write some code, and it can become quite messy and complicated.

KSQL
Advantages:
● KSQL abstracts that complexity away by providing you with a SQL semantic.
Disadvantages:
● If you want complex transformations, need to explode arrays, or need a feature that is not yet available, you will sometimes have to revert to Kafka Streams.
KSQL in action
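To give a flavor of the SQL semantic, a KSQL statement of the following shape declares a stream over a Kafka topic and continuously derives a filtered stream from it (the stream, topic, and column names are invented for the example):

```sql
-- Declare a stream over an existing Kafka topic (names are illustrative).
CREATE STREAM pageviews (user_id VARCHAR, page VARCHAR)
  WITH (KAFKA_TOPIC = 'pageviews', VALUE_FORMAT = 'JSON');

-- Continuously derive a new stream, backed by its own topic.
CREATE STREAM home_views AS
  SELECT user_id, page
  FROM pageviews
  WHERE page = '/home';
```

The second statement is a persistent query: it keeps running and writes every matching record to the new stream's topic as data arrives.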
Kafka Connect
● Kafka Connect is used to perform streaming integration between Kafka and other systems such as databases, cloud services, search indexes, file systems, and key-value stores. It makes it easy to stream data from different sources into Kafka and from Kafka out to targets.

Features of Kafka Connect

● It simplifies the development, deployment, and management of connectors that link external systems with Kafka.
● It can run as a distributed cluster or standalone; distributed mode supports large deployments, while standalone mode provides a simple setup for development and testing.
● Connectors can be managed using a REST API.
● Kafka Connect handles the offset commit process for us.
● By default, Kafka Connect is distributed and scalable.
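For instance, the REST API accepts a JSON payload naming a connector and its configuration. A minimal sketch in Python, using the FileStreamSource connector that ships with Kafka; the helper function, connector name, and file paths are illustrative, and actually creating the connector assumes a Connect worker listening on its default port 8083:

```python
import json

def connector_payload(name, connector_class, config):
    """Build the JSON body that Kafka Connect's REST API expects
    for POST /connectors: {"name": ..., "config": {...}}."""
    body = {"name": name,
            "config": {"connector.class": connector_class, **config}}
    return json.dumps(body)

payload = connector_payload(
    "demo-file-source",
    "org.apache.kafka.connect.file.FileStreamSourceConnector",
    {"tasks.max": "1", "file": "/tmp/input.txt", "topic": "file-lines"},
)
print(payload)

# To create the connector, POST this payload to the Connect worker, e.g.:
#   curl -X POST -H "Content-Type: application/json" \
#        --data @payload.json http://localhost:8083/connectors
```

The same REST API can then list, pause, restart, or delete connectors, which is what makes managed, code-free integration possible.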
Thank you
