
What is a Kafka Broker?

Last Updated : 19 Apr, 2025

Kafka brokers are core components of Apache Kafka, a system that handles and shares large amounts of data quickly. Kafka brokers store data messages, and they manage and deliver those messages to the other parts of the system that need them.

This article explains what Kafka brokers are and how they work.

What is Apache Kafka?

Apache Kafka is like a big, fast room where lots of information comes in from many places. It makes sure all the information is kept and processed in the right order. This allows us to look at and understand what is happening right now. Kafka is great for dealing with huge amounts of information that keep coming all the time.

For example, imagine a big river where thousands of different colored balls are thrown in regularly. Kafka is like a special machine that catches each ball, sorts them by color, and puts them in separate containers. We can then find and look at the balls based on their colors.

What is a Kafka Broker?

A Kafka broker is like a helper that lets information flow between those who send information (producers) and those who receive it (consumers). The broker handles all requests to write new information and to read existing information. A Kafka cluster is a group of one or more Kafka brokers working together. Each broker in the cluster has its own unique numeric ID. For example, in a cluster of 3 Kafka brokers, each of the 3 brokers has its own ID that is different from the others.

Kafka Broker Architecture

Kafka Broker

A Kafka broker is like a single worker or machine in the Kafka system. Its main jobs are to receive incoming messages, safely store those messages, and provide the stored messages to any consumers that need them. The broker acts as the middle person between producers sending messages and consumers receiving messages.

Cluster

A Kafka cluster is a group of multiple Kafka brokers all working together. Having a cluster allows Kafka to handle very large amounts of data. If more data needs to be processed, new brokers can easily be added to make the cluster bigger; if less data needs processing, brokers can be removed to make it smaller.

Topic

A topic is like a labelled box or category that related messages go into in Kafka. Producers publish their messages into a specific topic box. Consumers subscribe to one or more topic boxes to receive all the messages placed into those boxes. Using topics helps organize messages and allows parallel processing of different message categories.

Partitions

Each topic is further divided into partitions. A partition is like a sub-box inside the main topic box. Partitions allow a topic's messages to be spread across multiple brokers, enabling parallel processing. Different partitions can be stored on different Kafka brokers in the cluster, which prevents any single broker from being overloaded with data.
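To make the idea concrete, here is a minimal sketch of how a keyed message is mapped to a partition. Kafka's default partitioner hashes the message key (murmur2 in the real Java client); CRC32 stands in for it here purely for illustration:

```python
import zlib

def select_partition(key: bytes, num_partitions: int) -> int:
    """Pick a partition for a keyed message, mimicking Kafka's
    default partitioner (which hashes the key; CRC32 used here)."""
    return zlib.crc32(key) % num_partitions

# Messages with the same key always land in the same partition,
# which is what gives Kafka its per-key ordering guarantee.
p1 = select_partition(b"user-42", num_partitions=3)
p2 = select_partition(b"user-42", num_partitions=3)
assert p1 == p2
```

Because the hash of a given key never changes, all of one customer's events, for example, stay in one partition and are read back in order.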

Working of Kafka Broker

Producers send messages

Producers are programs or applications that create and send data messages to Kafka brokers. These messages can contain any type of data like logs, events, records or other information from the producer. Producers are responsible for pushing their data into the Kafka system.

Message storage

When producers send messages, the Kafka brokers receive and safely store those messages. The brokers act like secure storage spaces that hold onto the messages until they are needed. The messages are kept in an organized way that allows fast reading and writing, so they can be easily accessed later.
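A rough sketch of this storage model: each partition behaves like an append-only log, where every message gets a sequential offset that readers use to fetch messages in order. This toy class only illustrates the idea; real brokers persist log segments on disk.

```python
class PartitionLog:
    """Minimal sketch of a broker's per-partition storage:
    an append-only log with sequential offsets."""

    def __init__(self):
        self._messages = []

    def append(self, message: str) -> int:
        """Store a message and return its offset."""
        self._messages.append(message)
        return len(self._messages) - 1

    def read(self, offset: int, max_count: int = 10) -> list:
        """Read up to max_count messages starting at an offset,
        the way a consumer fetches sequentially."""
        return self._messages[offset:offset + max_count]

log = PartitionLog()
log.append("order-created")
log.append("order-paid")
print(log.read(0))  # → ['order-created', 'order-paid']
```

Appending to the end and reading sequentially by offset is what makes both writes and reads fast, and it is why consumers can later re-read older data simply by rewinding their offset.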

Topics and partitions

Inside Kafka, related messages are grouped into categories called topics. A topic is like a big labeled box that holds all messages of the same type or category. Each topic is further divided into smaller partitions, which are like sub-boxes inside the main topic box. These partitions allow different parts of a big topic to be processed in parallel by multiple brokers at the same time. Partitions also make it easy to increase processing power by simply adding more partitions as the amount of data grows.

Replication for reliability

To ensure no data is lost if a broker fails, Kafka makes multiple copies, or replicas, of each partition across different brokers in the cluster. If one broker goes down, the replicas on other brokers can still serve the messages, providing reliability and preventing data loss.
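A small illustrative sketch of replica placement follows. Real Kafka placement is rack-aware and more involved; this simple round-robin layout just shows why each partition survives a single broker failure:

```python
def assign_replicas(num_partitions: int, brokers: list, replication_factor: int) -> dict:
    """Spread each partition's replicas across distinct brokers,
    round-robin style (a simplification of Kafka's placement)."""
    assignment = {}
    for p in range(num_partitions):
        assignment[p] = [brokers[(p + r) % len(brokers)]
                         for r in range(replication_factor)]
    return assignment

layout = assign_replicas(num_partitions=3, brokers=[0, 1, 2], replication_factor=2)
# Every partition now lives on two different brokers, so losing
# any one broker never loses a partition outright.
print(layout)  # → {0: [0, 1], 1: [1, 2], 2: [2, 0]}
```

With a replication factor of 2, the cluster tolerates one broker failure per partition; production clusters commonly use a factor of 3.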

Leaders and followers

For each partition, one broker acts as the leader and handles all read and write requests for that partition's messages. The other brokers that hold replicas of that partition are called followers. The followers constantly copy new data from the leader to stay up to date. If the leader broker fails, one of the followers is automatically elected as the new leader to take over.
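A toy sketch of this failover behavior: real Kafka elects leaders from the in-sync replica set via the cluster controller, while this simplified version just promotes the first replica whose broker is still alive:

```python
def elect_leader(replicas: list, live_brokers: set) -> int:
    """Pick the partition leader: the first replica on a live broker
    (a simplification of Kafka's ISR-based leader election)."""
    for broker in replicas:
        if broker in live_brokers:
            return broker
    raise RuntimeError("no live replica; partition is offline")

replicas = [0, 1, 2]  # broker 0 is the preferred leader
assert elect_leader(replicas, live_brokers={0, 1, 2}) == 0
# Broker 0 fails; the next live follower takes over as leader.
assert elect_leader(replicas, live_brokers={1, 2}) == 1
```

The key property illustrated here is that leadership moves automatically, so producers and consumers keep working without manual intervention when a broker dies.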

Consumer consumption

Consumers are applications that subscribe to one or more topics in order to receive and process the messages from those topics. As producers publish new messages to a topic, the Kafka brokers deliver those messages to all consumers subscribed to that topic. Importantly, within each partition, consumers receive messages in the exact order the producers sent them, allowing proper sequential and real-time processing.

How Kafka Brokers Connect with Producers and Consumers

Apache Kafka brokers are the intermediary between producers (who write data) and consumers (who read data). This is how they interact on both sides:

Producers Interaction

Producers send each message to the broker that is the leader for the target partition of a specific Kafka topic.

  • The broker writes the message and sends an acknowledgment (ACK) back to the producer after successfully writing the message.
  • Producers can be configured for high reliability with features like acks=all and idempotent producers to avoid duplicate writes.
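These reliability settings live in the producer's configuration. A minimal example, using property names from the standard Java client (the values shown are one reasonable choice, not the only one):

```properties
# Wait until all in-sync replicas have acknowledged the write
acks=all
# De-duplicate producer retries so a message is never written twice
enable.idempotence=true
```

With these two properties, a retried send after a network hiccup cannot create a duplicate record in the partition.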

Consumers Interaction

Consumers pull messages directly from the topic partitions of the broker.

  • Kafka uses consumer groups to manage message delivery across multiple consumers.
  • Each consumer in a group is assigned its own subset of the topic's partitions, enabling parallel processing, load balancing, and efficient consumption of real-time data.
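A simplified sketch of how partitions might be divided among a group's consumers (Kafka's real assignors, such as range, round-robin, and sticky, are configurable; this simple round-robin version just shows the load-balancing idea):

```python
def assign_partitions(partitions: list, consumers: list) -> dict:
    """Deal partitions out to consumers round-robin style,
    so each partition has exactly one owner in the group."""
    assignment = {c: [] for c in consumers}
    for i, partition in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(partition)
    return assignment

groups = assign_partitions(partitions=[0, 1, 2, 3], consumers=["c1", "c2"])
# Each consumer owns distinct partitions, so the group processes
# the topic in parallel without delivering any message twice.
print(groups)  # → {'c1': [0, 2], 'c2': [1, 3]}
```

Because each partition has exactly one owner within a group, adding consumers (up to the partition count) increases parallelism without duplicating work.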

Kafka Broker Performance Optimization

To make your Kafka cluster process high-volume data efficiently, you should tune your Kafka broker configuration.

Tuning num.io.threads (for I/O operations)

This parameter determines the number of threads used by Kafka to read and write data from/to disk and the network. num.io.threads can be raised to enhance throughput in active clusters with heavy workloads or huge traffic.

Adjusting log.flush.interval.messages (for disk writes)

This property specifies the number of messages that Kafka writes prior to requiring a flush to disk. Lower settings increase durability (safe in case of failures), but decrease performance. Higher settings raise throughput since Kafka writes less often, but risk losing messages in case the broker crashes ahead of flushing.

Optimizing replica.fetch.max.bytes (for replication)

This setting determines the largest data size that can be read by a follower replica from the leader in a single operation. Boosting replica.fetch.max.bytes will accelerate replication, particularly if you have high-bandwidth networks or large messages. It is beneficial for consistency and failover recovery improvement.
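Putting the three settings together, a server.properties fragment might look like this (the values are illustrative only; tune them against your own workload and hardware):

```properties
# More I/O threads for disk and network work on a busy broker
num.io.threads=16
# Flush to disk after this many messages (higher = faster, less durable)
log.flush.interval.messages=10000
# Allow followers to fetch up to 2 MB from the leader per request
replica.fetch.max.bytes=2097152
```

After changing these properties, the broker must be restarted for them to take effect.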

Features of Kafka Broker

Scalability

A Kafka cluster can grow by adding more broker machines. This allows Kafka to handle increasing amounts of data and heavier workloads without slowing down.

Fault Tolerance

Kafka provides fault tolerance by keeping multiple copies (replicas) of the data. Each partition's data is copied across different brokers. If one broker fails, another broker holding a replica can take over as the leader, keeping operations running and the data available.

Durability

Kafka brokers store messages on disk, ensuring the data remains safe even if there is a failure. Messages are kept for a configured retention period, allowing you to look at historical data whenever it is needed.

Parallel Processing

In Kafka, messages can be processed in parallel using partitions. Multiple consumers can independently process different partitions at the same time, allowing efficient and scalable data processing.

Conclusion

Kafka brokers play a vital role in the Kafka system, handling and processing large amounts of data efficiently. They act like post offices that receive, store, and deliver messages between senders (producers) and receivers (consumers). By working together in clusters and using partitions, Kafka brokers enable scalable parallel processing while ensuring the data remains durable and available even when failures occur. With their robust architecture and useful features, Kafka brokers make it possible to manage and analyze huge volumes of real-time data streams reliably.

