CloudApps2 Kafka

CloudApps2_Kafka

Uploaded by

Moustapha SY
Copyright
© All Rights Reserved
Kafka
Thanks to public domain slides by Jiangjie (Becket) Qin
Contents
• What is Kafka
• Key concepts
• Kafka clients

Kafka: a distributed, partitioned, replicated publish-subscribe system that provides a commit log service

Description
• Kafka maintains feeds of messages in categories called topics.
• Processes that publish messages to a Kafka topic are producers.
• Processes that subscribe to topics and process the feed of published messages are consumers.
• Kafka runs as a cluster of one or more servers, each of which is called a broker.
• Clients and brokers communicate over TCP; clients include Java.
Characteristics
• Scalability (Kafka is backed by file system)
  • Hundreds of MB/sec/server throughput
  • Many TB per server
• Strong guarantees about messages
  • Strictly ordered (within partitions)
  • All data persistent
• Distributed
  • Replication
  • Partitioning model
Topics
● A Topic has several Partitions
● Partitions of a Topic are distributed across Brokers
Topics and Logs
• Kafka stores the messages of a topic in partitions; each partition is kept as an append-only log.
• Each partition is an ordered, numbered, immutable, append-only sequence of messages, like a commit log.
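To make the append-only log concrete, here is a minimal in-memory sketch. The `PartitionLog` class and its method names are hypothetical and are not part of any Kafka client; the sketch only illustrates that each appended message receives the next number in sequence and that existing entries are never rewritten.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class PartitionLog:
    """Toy in-memory stand-in for one partition's append-only log (hypothetical)."""

    _records: List[bytes] = field(default_factory=list)

    def append(self, message: bytes) -> int:
        """Append a message at the end and return the offset assigned to it."""
        self._records.append(message)
        return len(self._records) - 1

    def read(self, offset: int, max_messages: int = 10) -> List[Tuple[int, bytes]]:
        """Return up to max_messages (offset, message) pairs starting at offset."""
        end = min(offset + max_messages, len(self._records))
        return [(i, self._records[i]) for i in range(offset, end)]

    @property
    def end_offset(self) -> int:
        """Offset that the next appended message will receive."""
        return len(self._records)


log = PartitionLog()
first = log.append(b"a")
second = log.append(b"b")
third = log.append(b"c")
```

Reading from offset 1 with `log.read(1)` returns the second and third messages in order; there is deliberately no method to update or delete an entry.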
Kafka Server Cluster Implementation
• Each partition is replicated across a configurable number of servers.
• Each partition has one “leader” server and zero or more followers.
• The leader handles read and write requests for the partition.
• A follower replicates the leader and acts as a backup.
• Each server is a leader for some of its partitions and a follower for others, which balances the load.
• ZooKeeper is used to keep the servers consistent.
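The leader/follower layout can be sketched with a simple round-robin placement. This is an illustrative assumption, not Kafka's actual replica assignment algorithm; `assign_replicas` is a hypothetical helper, and the first replica in each list is treated as the leader.

```python
from collections import Counter
from typing import Dict, List


def assign_replicas(
    num_partitions: int, brokers: List[int], replication_factor: int
) -> Dict[int, List[int]]:
    """Place each partition on replication_factor brokers, rotating the start broker.

    The first broker in each list is treated as the leader, the rest as followers.
    """
    if not 1 <= replication_factor <= len(brokers):
        raise ValueError("replication_factor must be between 1 and the number of brokers")
    n = len(brokers)
    return {
        p: [brokers[(p + i) % n] for i in range(replication_factor)]
        for p in range(num_partitions)
    }


assignment = assign_replicas(num_partitions=6, brokers=[0, 1, 2], replication_factor=2)
leaders = {p: replicas[0] for p, replicas in assignment.items()}
followers = {p: replicas[1:] for p, replicas in assignment.items()}
leader_counts = Counter(leaders.values())
```

With six partitions on three brokers, every broker ends up leading two partitions and following two others, which is the load-balancing effect described in the list above.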


Kafka in a big picture (LinkedIn)
[Figure: LinkedIn data infrastructure around Kafka (messaging). Recoverable labels: online applications; tracking events; app-to-app messages; metrics; change log; user data updates; ETL results; media upload/download (images/docs/videos); media processing; change capture; ETL / data deployment; Samza (stream processing); Ambry (blob store); Databus streams / Datastreams; Vector; Voldemort/Venice (K-V store); Espresso (NoSQL DB); Nuage (our AWS); Distributed Data Systems (DDS); Data Analytics Infra (DAI). Column groups: Stream, Media, Storage, Nuage.]


Producer in Kafka
• Sends messages to Kafka Brokers
• Messages are sent to a Topic
• Messages with the same key go to the same partition (so they stay in order)
• Messages without a key go to a random partition (no ordering guarantee)
• If the number of partitions changes, the same key might go to a different partition
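The key-to-partition rule can be sketched as a hash of the key modulo the partition count. The `choose_partition` helper is hypothetical, and `zlib.crc32` is used only as a convenient stand-in hash; it is not the hash function used by Kafka's own clients.

```python
import random
import zlib
from typing import Optional


def choose_partition(key: Optional[bytes], num_partitions: int) -> int:
    """Pick a partition: hash-based for keyed messages, random for unkeyed ones."""
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    if key is None:
        # No key: any partition will do, so there is no ordering guarantee.
        return random.randrange(num_partitions)
    # Same key and same partition count always give the same partition.
    return zlib.crc32(key) % num_partitions


keys = [f"key-{i}".encode() for i in range(100)]

# Keys whose partition changes when the topic grows from 4 to 5 partitions.
moved = [k for k in keys if choose_partition(k, 4) != choose_partition(k, 5)]
```

Because the result depends on `num_partitions`, the `moved` list is non-empty: after a partition count change, new messages for those keys land in a different partition than earlier messages with the same key.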
Consumer in Kafka
- A consumer can belong to a Consumer Group (CG)
- Consumers in the same CG
  - Coordinate with each other to determine which consumer will consume from which partition
  - Share the Consumer Offsets
Offset
From Brokers’ View
- The Index of a message in a log
- Message Offset does not change
From Consumers’ View
- Consumer Offset
- The position from where I am consuming
- Consumer Offset can change
More about Consumer Offsets
• Consumer Offsets are kept per Topic/Partition/ConsumerGroup (for a given group, look up the last consumed position in a topic/partition)
• Consumer Offsets can be committed as a checkpoint of consumption, so they can be used when
  • Another consumer in the same CG takes over the partition
  • Consumption resumes later from the committed offsets
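A committed offset behaves like a small lookup table keyed by group, topic, and partition. The sketch below uses a hypothetical in-memory `OffsetStore`, not a Kafka API, and assumes the committed value is the offset of the next message to consume.

```python
from typing import Dict, Tuple

OffsetKey = Tuple[str, str, int]  # (consumer group, topic, partition)


class OffsetStore:
    """Toy checkpoint store for Consumer Offsets (hypothetical, in-memory only)."""

    def __init__(self) -> None:
        self._committed: Dict[OffsetKey, int] = {}

    def commit(self, group: str, topic: str, partition: int, next_offset: int) -> None:
        """Record the offset of the next message this group should consume."""
        self._committed[(group, topic, partition)] = next_offset

    def committed(self, group: str, topic: str, partition: int, default: int = 0) -> int:
        """Return the committed offset, or default if the group never committed."""
        return self._committed.get((group, topic, partition), default)


# Messages of one partition; the list index plays the role of the message offset.
log = ["m0", "m1", "m2", "m3", "m4"]
store = OffsetStore()

# Consumer A in group "billing" processes three messages, then checkpoints.
position = store.committed("billing", "orders", 0)
processed_by_a = log[position:position + 3]
store.commit("billing", "orders", 0, position + len(processed_by_a))

# Consumer B in the same group takes over the partition and resumes from the checkpoint.
resume_at = store.committed("billing", "orders", 0)
processed_by_b = log[resume_at:]

# A different group has its own, independent position in the same partition.
audit_position = store.committed("audit", "orders", 0)
```

The message offsets in `log` never change; only the per-group consumer offset moves, which is why consumer B can continue exactly where consumer A checkpointed.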
Consumer Rebalance
[Figure: Topic_1 with partitions P0 to P3 and Topic_2 with partitions P0 and P1, consumed by Consumer 1 to Consumer 4 (threads C1 to C4).]
● Each consumer can have several consumer threads (essentially one queue per thread)
● Each consumer thread can consume from multiple partitions
● Each partition will be consumed by exactly one consumer in the entire group
Consumer Rebalance
[Figure: Topic_1 with partitions P0 to P3 and Topic_2 with partitions P0 and P1, consumed by Consumer 1 to Consumer 4 (threads C1 to C4).]
● A consumer rebalance occurs when Consumer 4 goes down
● Consumers 1, 2, and 3 take over Consumer 4's partitions and resume from the last committed Consumer Offsets of the CG
● Transparent to the user
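The rebalance on this slide can be sketched as recomputing the partition assignment over the surviving group members. The `assign` function below is a simplified round-robin sketch under that assumption; the function name and the consumer names are hypothetical, and it recomputes the whole assignment rather than moving only the failed consumer's partitions.

```python
from typing import Dict, List, Tuple

TopicPartition = Tuple[str, int]


def assign(
    partitions: List[TopicPartition], consumers: List[str]
) -> Dict[str, List[TopicPartition]]:
    """Give every partition to exactly one consumer, round-robin over sorted members."""
    if not consumers:
        raise ValueError("a consumer group needs at least one member")
    members = sorted(consumers)
    assignment: Dict[str, List[TopicPartition]] = {c: [] for c in members}
    for i, tp in enumerate(sorted(partitions)):
        assignment[members[i % len(members)]].append(tp)
    return assignment


partitions = [("Topic_1", p) for p in range(4)] + [("Topic_2", p) for p in range(2)]

# Four live consumers in the group.
before = assign(partitions, ["consumer-1", "consumer-2", "consumer-3", "consumer-4"])

# Consumer 4 goes down: rebalance over the three remaining members.
after = assign(partitions, ["consumer-1", "consumer-2", "consumer-3"])
```

In both `before` and `after`, each of the six partitions has exactly one owner; after the rebalance, the new owner of a partition would start reading from the group's last committed offset for it.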
