
DOCUMENTATION ON KAFKA CLUSTERING

CONTENT
1. INTRODUCTION

2. LIFECYCLE

3. CLUSTERING

4. INSTALLATION & CONFIGURATION

5. REFERENCES

INTRODUCTION

What is Apache Kafka?

Apache Kafka is a distributed, open-source system designed for publishing and subscribing to streams of data. It employs a broker-based architecture to ensure fault-tolerant message replication and persistence, while categorizing data into topics.

Kafka combines three critical event streaming capabilities:

1. Publish and Subscribe:

Producers: These are the entities that generate and send messages (events) to Kafka
topics. Topics act as channels where events are categorized.

Consumers: They subscribe to topics and receive the messages. Kafka allows multiple
consumers to subscribe to a topic, providing a scalable and distributed system for
handling streams of data.
2. Storage:
Kafka is designed for durability and fault tolerance. Events are stored on disk, and Kafka
brokers (nodes in the Kafka cluster) are responsible for managing and storing these
events.

Data is distributed across multiple brokers, ensuring reliability and availability even if
some nodes fail. This distributed storage model contributes to the scalability and fault
tolerance of Kafka.
3. Stream Processing:

Kafka facilitates real-time stream processing, allowing developers to build applications that can analyze and react to events in real time or process historical data.

Developers can implement various operations on the event stream, such as filtering,
transformation, and aggregation. This enables the creation of reactive applications,
event-driven architectures, and real-time analytics.

Overview of some key features of Apache Kafka:

1. Low Latency:

Apache Kafka offers remarkably low end-to-end latency, making it well-suited for
real-time data streaming.

With latencies as low as 10 milliseconds, Kafka enables quick retrieval of data records, allowing consumers to access messages shortly after they are produced.

2. Seamless Messaging Functionality:

Kafka's ability to decouple producers from consumers and to store messages efficiently facilitates seamless messaging.

It supports publish-subscribe and real-time data processing, making it easy to handle large volumes of data and providing a significant advantage over traditional communication options.

3. High Scalability:

Kafka's distributed design enables high scalability, allowing the system to handle
varying volumes and speeds of incoming messages.
The ability to scale Kafka up or down without causing downtime ensures flexibility
in adapting to changing application and processing demands.
4. High Fault Tolerance:

Kafka is designed to be highly fault-tolerant and reliable.

Data replication and distribution across servers or brokers ensure that if one server goes down, data remains available on others, providing continuous access to information.

5. Multiple Integrations:

Kafka supports seamless integration with various data-processing frameworks and services, including Apache Spark, Apache Storm, Hadoop, and cloud platforms like Amazon Web Services (AWS).

This flexibility allows organizations to easily incorporate Kafka into their real-time
data pipelines, connecting it with different applications and leveraging its benefits
in diverse ecosystems.

Comparison between Apache Kafka and a few other notable Event Streaming
technologies:

1. Apache Kafka vs. RabbitMQ:

Kafka:

Kafka is designed for high-throughput, low-latency event streaming at scale.

It excels in scenarios where durability, fault tolerance, and real-time processing of large data volumes are crucial.

Kafka's publish-subscribe model allows decoupling of producers and consumers.

RabbitMQ:

RabbitMQ is a message broker that follows the AMQP protocol.

It is known for simplicity, ease of use, and support for various messaging
patterns.

RabbitMQ is often chosen for scenarios where a lightweight and straightforward messaging system is preferred.

2. Apache Kafka vs. Apache Pulsar:

Kafka:

Kafka is widely adopted, has a mature ecosystem, and is known for its
scalability and fault tolerance.

It uses a partitioned log design for storing and distributing events.

Kafka's architecture allows it to handle massive amounts of data and high traffic.

Pulsar:

Pulsar is an open-source distributed messaging and event streaming platform.

It supports multi-tenancy and provides native support for serving multiple organizations.

Pulsar's architecture differs from Kafka's by separating serving from storage, which allows for easier scaling.

LIFECYCLE

1. Kafka Producer:

A producer is an application that acts as a source of the data stream. It generates tokens or messages and publishes them to one or more topics in the Kafka cluster. The Producer API from Kafka helps pack the message or token and deliver it to the Kafka server.
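Below is a minimal sketch of such a producer using Kafka's Java Producer API. The topic name (myTopic) and broker address (localhost:9092) are illustrative assumptions, not fixed values:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class SimpleProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed bootstrap broker
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        // try-with-resources closes the producer, flushing any buffered messages
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Publish one message (a key-value pair) to the topic "myTopic"
            producer.send(new ProducerRecord<>("myTopic", "key-1", "hello kafka"));
        }
    }
}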

2. Kafka Cluster (Brokers):

Brokers manage topics, store partitions, and handle message replication.

Kafka brokers are individual Kafka server instances within a Kafka cluster.

Brokers handle the storage, retrieval, and replication of data. They are responsible
for receiving and serving messages from producers and consumers.

Each broker in a Kafka cluster is identified by a unique integer ID.

When a Kafka client (producer or consumer) wants to interact with the Kafka
cluster, it needs to connect to one of the brokers. The broker it initially connects to
is called the bootstrap broker.

The bootstrap broker provides the client with metadata about the Kafka cluster,
such as the list of brokers, partitions, and their leaders. Once a client has this
information, it can communicate directly with the appropriate broker for producing
or consuming messages.

3. Kafka Topic:

A topic is a logical channel for organizing and categorizing messages.

A topic refers to a category or a common name used to store and publish a particular stream of data. Basically, topics in Kafka are similar to tables in a database, but without all the constraints.

In Kafka, we can create as many topics as we want. A topic is identified by its name, which is chosen by the user. A producer publishes data to a topic, and a consumer reads that data from the topic by subscribing to it.
4. Kafka Partition:

Topics are divided into partitions, which are the unit of parallelism and distribution.

The partitions that make up a topic are dispersed among the servers of the Kafka cluster. Each server in the cluster is in charge of its own data and partition requests. When a broker receives a message, it may also receive a key.

The key can be used to indicate which partition a message should be sent to. Messages with the same key are sent to the same partition. This allows several consumers to read from the same topic in parallel.
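Kafka's default partitioner actually hashes the serialized key with murmur2; the simplified sketch below only illustrates the hash-modulo idea behind key-based routing, not Kafka's exact algorithm:

// Simplified illustration of key-based partition selection (not Kafka's
// exact algorithm, which uses a murmur2 hash of the serialized key bytes).
static int choosePartition(String key, int numPartitions) {
    // Mask the sign bit so the result is non-negative, then take the modulo.
    return (key.hashCode() & 0x7fffffff) % numPartitions;
}

Because the hash of a given key is stable, every message with that key lands in the same partition, preserving per-key ordering.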

5. Kafka Broker:

Brokers store partitions, serve reads, and handle communication between producers and consumers.

A Kafka cluster typically consists of multiple brokers to maintain load balance. Kafka brokers are stateless, so they use ZooKeeper for maintaining their cluster state. One Kafka broker instance can handle hundreds of thousands of reads and writes per second, and each broker can handle terabytes of messages without performance impact. Kafka broker leader election is done via ZooKeeper.

6. Kafka Connector:

Kafka Connect is a component of Kafka that provides data integration between databases, key-value stores, search indexes, file systems, and Kafka brokers.

Kafka Connect provides a common framework for defining connectors that integrate Kafka with external systems, which do the work of moving data in and out of Kafka.

There are two different types of connectors:

Source connectors that act as producers for Kafka

Sink connectors that act as consumers for Kafka
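As a sketch, a source connector configuration for the FileStreamSource connector that ships with Kafka might look like the following; the file path and topic name are assumptions for this example:

# file-source.properties (illustrative)
name=local-file-source
connector.class=org.apache.kafka.connect.file.FileStreamSourceConnector
tasks.max=1
file=C:/kafka/test.txt
topic=myTopic

Such a file can be run with the standalone worker, e.g. .\bin\windows\connect-standalone.bat .\config\connect-standalone.properties file-source.properties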

7. Kafka Consumer:

Consumers subscribe to Kafka topics, process messages, and can be part of a consumer group for load balancing.

Kafka brokers are stateless, which means that the consumer has to track how many messages have been consumed by using the partition offset. If the consumer acknowledges a particular message offset, it implies that the consumer has consumed all prior messages. The consumer issues an asynchronous pull request to the broker to have a buffer of bytes ready to consume. Consumers can rewind or skip to any point in a partition simply by supplying an offset value. The consumer offset value is tracked with the help of ZooKeeper.
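A minimal sketch of such a consumer in Java is shown below; broker address, group ID, and topic name are illustrative assumptions. Rewinding to an arbitrary offset corresponds to the seek() call on the consumer:

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class SimpleConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("group.id", "my-consumer-group");       // assumed consumer group
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("myTopic"));
            while (true) {
                // Pull request: fetch whatever messages the broker has buffered for us
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("offset=%d key=%s value=%s%n",
                            record.offset(), record.key(), record.value());
                }
            }
        }
    }
}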

8. Zookeeper:

ZooKeeper is used for cluster coordination, managing metadata, leader election, and other distributed coordination tasks.

The ZooKeeper service is mainly used to notify producers and consumers about the presence of any new broker in the Kafka system, or about the failure of a broker. Based on these notifications, producers and consumers decide how to proceed and start coordinating their tasks with another broker.

+-------------------+     +-------------------------+     +-------------------+
|  Kafka Producer   | --> | Kafka Cluster (Brokers) | --> |    Kafka Topic    |
|                   |     |                         |     |                   |
| Produces Messages |     | Manages Topic Metadata  |     |    Logical Log    |
+-------------------+     +-------------------------+     +-------------------+
         |                            |                             |
         v                            v                             v
+-------------------+     +-------------------------+     +-------------------+
|  Kafka Partition  | --> |      Kafka Broker       | --> |  Kafka Connector  |
|                   |     |                         |     |                   |
|  Part of a Topic  |     |    Stores Partitions    |     |  Integrates with  |
|   Physical Log    |     |    and Serves Reads     |     |  External Systems |
+-------------------+     +-------------------------+     +-------------------+
         |                            |
         v                            v
+-------------------+     +-------------------------+
|  Kafka Consumer   | --> |        Zookeeper        |
|                   |     |                         |
|  Subscribes to    |     |     Manages Cluster     |
|  Kafka Topics,    |     |     Metadata, Leader    |
| Processes Messages|     |     Election, etc.      |
+-------------------+     +-------------------------+

              Kafka Cluster Management and Coordination

Here's a simplified overview of the Kafka workflow:

1. Producer Produces Messages:

Producers are applications or systems that generate and send messages to Kafka
topics.

Messages are key-value pairs and are produced to specific topics.

2. Topic and Partitions:

Topics are logical channels or feeds to which messages are published.

Topics are divided into partitions, allowing parallel processing and scalability.
3. Partitioning and Replication:

Partitions are assigned to brokers. Each partition has a leader and zero or more
followers (replicas).

Replication ensures data durability and availability. Replicas are distributed across
brokers.

4. Leader Election:

The leader for each partition is dynamically elected from the in-sync replicas (ISRs)
by the Kafka controller.

The leader handles all reads and writes for its partition.

5. Message Ingestion:

Producers send messages to Kafka brokers. The producer may choose a specific
partition or let Kafka handle partitioning.

6. Storage in Logs:

Messages are appended to logs within each partition. Logs represent an immutable, ordered sequence of messages.

7. Consumer Groups:

Consumers are applications or systems that subscribe to topics and process messages.

Consumers can be part of a consumer group, which allows parallel consumption and load balancing.

8. Consumer Polling:

Consumers pull messages from partitions they are subscribed to.

The offset, which represents the position in the log, is managed by Kafka and helps
track the progress of consumption.

9. Acknowledgment and Commit:

Consumers process messages and acknowledge them back to Kafka once processed.

Consumers commit their offsets to Kafka, indicating the point up to which they have consumed messages (see the manual-commit sketch after this list).

10. Scaling and Fault Tolerance:

Kafka provides horizontal scalability by distributing partitions across multiple brokers.

Replication and leader election ensure fault tolerance and high availability.

11. ZooKeeper Coordination:

Kafka uses ZooKeeper for cluster coordination, managing metadata, and leader
election.

12. Dynamic Cluster Changes:

Kafka can adapt to dynamic changes, such as broker additions or removals, and
reassigning leaders as needed.
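For step 9, here is a minimal sketch of manual offset committing with the Java consumer API. It assumes a KafkaConsumer configured with enable.auto.commit=false and an active subscription; handle() stands in for hypothetical application-level processing:

// Sketch: one poll-process-commit cycle, assuming a KafkaConsumer<String, String>
// configured with enable.auto.commit=false and an existing subscription.
static void pollProcessCommit(KafkaConsumer<String, String> consumer) {
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
    for (ConsumerRecord<String, String> record : records) {
        handle(record); // hypothetical application-level processing
    }
    // Synchronously commit the offsets returned by the last poll(), marking
    // the point up to which this consumer group has consumed.
    consumer.commitSync();
}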
CLUSTERING
A Kafka cluster typically consists of multiple brokers to maintain load balance.

Below are the key types of clustering in Apache Kafka:

1. Single-Node Cluster:

This is the simplest form of Kafka setup, where all components (ZooKeeper and Kafka
broker) run on a single machine. While suitable for development and testing, it lacks the
benefits of fault tolerance and high availability.

2. Multi-Node Cluster:

In a production environment, Kafka is often configured as a multi-node cluster. Multiple Kafka broker nodes are deployed across different machines, providing fault tolerance, scalability, and load balancing. This type of clustering is crucial for handling large volumes of data and ensuring system reliability.
The primary differences between a single-node cluster and a multi-node cluster in Apache
Kafka lie in their architecture, scalability, fault tolerance, and overall performance. Let's
explore these differences:

1. Architecture:
Single-Node Cluster:

Consists of a single machine running a Kafka broker and potentially ZooKeeper.

Suitable for development, testing, and scenarios with relatively low data volume.

Lacks the distribution benefits of a multi-node cluster.

Multi-Node Cluster:

Involves multiple machines, each running a Kafka broker, forming a distributed system.

Enables horizontal scaling, distributing data across multiple nodes for better performance and fault tolerance.

Common in production environments with higher data volumes and processing requirements.

2. Scalability:
Single-Node Cluster:

Limited in terms of scalability, as it can only utilize the resources of a single machine.

Difficult to scale horizontally to handle increased workloads.

Multi-Node Cluster:
Horizontal scalability allows for the addition of more machines (nodes) to the
cluster.

Distributes workloads across nodes, improving overall scalability and performance.

3. Fault Tolerance:
Single-Node Cluster:

Vulnerable to a single point of failure. If the machine or Kafka broker fails, the
entire system becomes unavailable.

Limited options for data recovery in case of hardware failure.

Multi-Node Cluster:

Enhanced fault tolerance through data replication. Each partition has multiple
replicas distributed across different nodes.

Even if one or more nodes fail, the system remains operational, as replicas on other
nodes can take over.

4. High Availability:
Single-Node Cluster:

Limited availability due to the reliance on a single machine.

Not suitable for scenarios requiring high availability.

Multi-Node Cluster:

Improved availability by distributing replicas across multiple nodes.

Ensures continuous data availability even if some nodes or brokers go down.

5. Parallel Processing:
Single-Node Cluster:

Limited parallelism in data processing due to the constraints of a single machine.

Multi-Node Cluster:

Enables parallel processing of data by distributing partitions across multiple nodes.

Improves throughput and responsiveness.

6. Load Balancing:
Single-Node Cluster:

No load balancing since there is only one machine.

Potential for performance bottlenecks.

Multi-Node Cluster:

Distributes the load of producing and consuming messages across multiple nodes.

Prevents any single node from becoming a bottleneck.


7. Use Cases:
Single-Node Cluster:

Suitable for development, testing, and learning Kafka.

Small-scale applications with low data volume.

Multi-Node Cluster:

Production environments with larger data volumes and higher processing requirements.

Mission-critical applications requiring high availability and fault tolerance.

INSTALLATION & CONFIGURATION


Installing and configuring Apache Kafka on Windows involves several steps:

Step 1: Prerequisites

Before you start, make sure you have the following installed on your Windows machine:

Java: Kafka requires Java to run. Download and install the latest version of Java from the official website: https://www.oracle.com/java/technologies/javase-downloads.html

Step 2: Download and Extract Kafka

1. Go to the Kafka official website: https://kafka.apache.org/downloads

2. Download the latest stable binary release (the same archive contains both the Linux .sh and Windows .bat scripts).

Step 3: Extract Kafka Archive

1. Extract the downloaded Kafka archive to a directory of your choice. For example, you can
extract it to C:\kafka .

Step 4: Configure Kafka

1. Navigate to the Kafka directory (e.g., C:\kafka ).

2. Open the config directory and edit the server.properties file using a text editor.

Find the line listeners=PLAINTEXT://:9092 and make sure it is uncommented.

Change the log.dirs property to a directory where Kafka will store its data. For
example, log.dirs=C:/kafka/data .

Step 5: Start Zookeeper

Kafka uses Zookeeper, so you need to start it before starting Kafka.

1. Navigate to the Kafka directory (e.g., C:\kafka ).

2. Run the following command in a new command prompt window:

.\bin\windows\zookeeper-server-start.bat .\config\zookeeper.properties
Step 6: Start Kafka

1. Open a new command prompt window.

2. Navigate to the Kafka directory (e.g., C:\kafka ).

3. Run the following command:

.\bin\windows\kafka-server-start.bat .\config\server.properties

Step 7: Create a Kafka Topic

1. Open a new command prompt window.

2. Navigate to the Kafka directory (e.g., C:\kafka ).

3. Create a topic with the following command:

.\bin\windows\kafka-topics.bat --create --topic myTopic --bootstrap-server localhost:9092 --partitions 1 --replication-factor 1
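To confirm the topic was created as expected, you can describe it:

.\bin\windows\kafka-topics.bat --describe --topic myTopic --bootstrap-server localhost:9092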

If you want newly created topics to default to a desired number of partitions:

Step 1: Edit the server.properties File

Open the server.properties file in a text editor. You can use Notepad or any other text editor of
your choice.

Step 2: Configure Topic Defaults

Add or modify the following lines in the server.properties file. Note that these are broker-wide defaults; per-topic partition counts are specified when each topic is created (as in Step 5 below), not in server.properties:

# Default number of partitions for newly created topics
num.partitions=2

# Default replication factor for automatically created topics
default.replication.factor=1

# Whether topics may be created automatically on first use
auto.create.topics.enable=true

These defaults apply only to newly created topics. If you want to change existing topics as well, you need to update each topic's configuration separately.

Step 3: Save and Close the File

Save the changes to the server.properties file and close the text editor.

Step 4: Restart Kafka

If Kafka is already running, restart it to apply the new configuration. Stop the Kafka server, and
then start it again:

1. Stop Kafka by closing the Kafka server command prompt.

2. Start Kafka again:


.\bin\windows\kafka-server-start.bat .\config\server.properties

Step 5: Create Topics

Now that the configuration is in place, you can create the two topics with the specified number of partitions:

Create Topic 1 (2 partitions):

.\bin\windows\kafka-topics.bat --create --topic topic1 --bootstrap-server localhost:9092 --partitions 2 --replication-factor 1

Create Topic 2 (1 partition):

.\bin\windows\kafka-topics.bat --create --topic topic2 --bootstrap-server localhost:9092 --partitions 1 --replication-factor 1

Step 8: Produce and Consume Messages

1. Open two new command prompt windows.

2. In one window, run the following command to produce messages:

.\bin\windows\kafka-console-producer.bat --topic myTopic --bootstrap-server localhost:9092

3. In the other window, run the following command to consume messages:

.\bin\windows\kafka-console-consumer.bat --topic myTopic --bootstrap-server localhost:9092

Now, you have Apache Kafka installed and running on your Windows machine. You can start
producing and consuming messages on the myTopic topic.

CONFIGURATION PROPERTIES

ZooKeeper Configuration ( zookeeper.properties ):

Create a zookeeper.properties file with the following content:

# zookeeper.properties

dataDir=C:\\zookeeper-3.6.3\\data
clientPort=2181

Explanation:
dataDir : Specifies the directory where ZooKeeper will store its data.

clientPort : Defines the port on which ZooKeeper clients (including Kafka) will connect.

Kafka Broker Configuration ( server.properties ):

Create a server.properties file for each Kafka broker with the following content:

Broker 1 ( server-1.properties ):

# server-1.properties

broker.id=0
listeners=PLAINTEXT://localhost:9092
log.dirs=C:\\kafka_2.13-2.8.1\\kafka-logs-1
zookeeper.connect=localhost:2181

Broker 2 ( server-2.properties ):

# server-2.properties

broker.id=1
listeners=PLAINTEXT://localhost:9093
log.dirs=C:\\kafka_2.13-2.8.1\\kafka-logs-2
zookeeper.connect=localhost:2181

Explanation:

broker.id : Unique identifier for each Kafka broker.

listeners : The address and port where the broker listens for connections.

log.dirs : The directory where Kafka stores its logs.

zookeeper.connect : The connection string for ZooKeeper, specifying the address and port.

Replication Configuration:

num.replica.fetchers : Number of fetcher threads used to replicate messages between brokers.

default.replication.factor : Default replication factor for automatically created topics.

num.replica.fetchers=2
default.replication.factor=2
ZooKeeper Connection:

zookeeper.connect : Connection string for ZooKeeper. A list of comma-separated host:port pairs.

zookeeper.connect=localhost:2181

Kafka Producer Configuration:

1. Bootstrap Servers:

bootstrap.servers : A list of host and port pairs to use for establishing the initial
connection to the Kafka cluster.

bootstrap.servers=<broker1>:<port1>,<broker2>:<port2>

2. Acknowledgment Settings:

acks : The number of acknowledgments the producer requires before considering a message as sent; acks=all waits until all in-sync replicas have received the message.

acks=all

3. Retries and Timeout:

retries : The number of times the producer should retry sending a message in case of
failures.

delivery.timeout.ms : The maximum time in milliseconds the producer will wait for an
acknowledgment.

retries=3
delivery.timeout.ms=30000
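As a sketch of how these settings are applied in code, the same properties can be passed to a Java producer (the broker address is an assumption for this example):

// Applying the reliability settings above to a Java producer (sketch).
Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
props.put("acks", "all");                  // wait for all in-sync replicas
props.put("retries", "3");                 // retry transient send failures
props.put("delivery.timeout.ms", "30000"); // upper bound on time to deliver
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
KafkaProducer<String, String> producer = new KafkaProducer<>(props);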

Kafka Consumer Configuration:

1. Bootstrap Servers:

bootstrap.servers : A list of host and port pairs used for establishing the initial
connection to the Kafka cluster.

bootstrap.servers=<broker1>:<port1>,<broker2>:<port2>

2. Consumer Group and ID:

group.id : The ID of the consumer group to which a consumer belongs.

client.id : A user-specified string used to identify the consumer.


group.id=my-consumer-group
client.id=my-consumer

3. Auto Commit and Commit Interval:

enable.auto.commit : If true , consumer offsets are committed automatically at regular intervals.

auto.commit.interval.ms : The frequency at which the consumer's offsets are committed.

enable.auto.commit=true
auto.commit.interval.ms=1000

4. Fetch Settings:

fetch.min.bytes : The minimum amount of data the server should return for a fetch
request.

fetch.max.wait.ms : The maximum amount of time the server should wait for more data to
arrive before sending a response.

fetch.min.bytes=1
fetch.max.wait.ms=500
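To make the trade-off concrete, the following illustrative values batch fetches for throughput at the cost of a little latency; the broker responds as soon as roughly 1 MB is available, or after 500 ms, whichever comes first:

# Illustrative throughput-oriented fetch tuning
fetch.min.bytes=1048576
fetch.max.wait.ms=500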

SINGLE NODE CONFIGURATION

Here's a basic configuration for a single-node Kafka setup.

Note:

1. You need to adjust some settings based on your specific requirements and environment.

2. .sh scripts are used on Linux; .bat scripts (run from the command prompt) are used on Windows.

1. Broker Configuration ( server.properties ):

# The id of the broker. This must be set to a unique integer for each broker.
broker.id=1

# Port the socket server listens on
listeners=PLAINTEXT://localhost:9092

# A comma-separated list of directories under which to store log files
log.dirs=/tmp/kafka-logs

# Number of log partitions per topic
num.partitions=1

# Zookeeper connection string (host:port)
zookeeper.connect=localhost:2181

# Default replication factor for automatically created topics
default.replication.factor=1

broker.id : A unique integer for each broker in the Kafka cluster.

listeners : The address the socket server listens on. Adjust the hostname and port
accordingly.

log.dirs : The directory where Kafka stores its data.

num.partitions : Number of partitions for each topic.

zookeeper.connect : Connection string for the Zookeeper ensemble.

2. Zookeeper Configuration ( zookeeper.properties ):

Kafka relies on Zookeeper, so you need to have a Zookeeper ensemble running. Here is a
basic configuration:

# The number of milliseconds of each tick
tickTime=2000

# The number of ticks that the initial synchronization phase can take
initLimit=10

# The number of ticks that can pass between sending a request and getting an acknowledgement
syncLimit=5

# Zookeeper data directory
dataDir=/tmp/zookeeper-data

# Zookeeper client port
clientPort=2181

tickTime : The length of a ZooKeeper tick in milliseconds.

initLimit : The number of ticks that the ZooKeeper servers in quorum can take to
connect to a leader.

syncLimit : The number of ticks that can pass between sending a request and getting
an acknowledgment.

dataDir : The directory where ZooKeeper will store its snapshots and transaction logs.

clientPort : The port at which clients will connect.

3. Start Kafka and Zookeeper:

After configuring, start Zookeeper and then Kafka. You can use the following commands:
# Start Zookeeper
bin/zookeeper-server-start.sh config/zookeeper.properties

# Start Kafka
bin/kafka-server-start.sh config/server.properties

4. Create a Topic:

Open a new command prompt, navigate to the Kafka directory, and run the following command to
create a topic:

.\bin\windows\kafka-topics.bat --create --topic myTopic --bootstrap-server localhost:9092 --partitions 1 --replication-factor 1

5. Produce Messages to a Topic:

Open a new command prompt, navigate to the Kafka directory, and run the following command to
start a producer:

.\bin\windows\kafka-console-producer.bat --topic myTopic --bootstrap-server localhost:9092

6. Consume Messages from a Topic:

Open a new command prompt, navigate to the Kafka directory, and run the following command to
start a consumer:

.\bin\windows\kafka-console-consumer.bat --topic myTopic --bootstrap-server localhost:9092

Replace myTopic with the name of your desired topic.

This is a basic configuration for a single-node Kafka setup. Depending on your use case, you may
need to tweak additional settings or enable specific features. Always refer to the official Kafka
documentation for the most up-to-date and detailed information.

MULTI-NODE CONFIGURATION

Setting up a multi-node Kafka cluster involves configuring both Kafka and ZooKeeper on each node; the rest of the configuration is the same as for a single node. Below are the steps to set up a Kafka multi-node cluster:

1. Zookeeper Configuration (zoo.cfg):

Edit the Zookeeper configuration file ( zoo.cfg ) on each node:


# Zookeeper tick time
tickTime=3000

# The number of ticks that the initial synchronization phase can take
initLimit=10

# The number of ticks that can pass between sending a request and getting an acknowledgement
syncLimit=5

# Zookeeper data directory (set this directory for your environment)
dataDir=/temp/hadoop/zookeeper1

# Purge interval (hours) for deleting old snapshots and transaction logs
autopurge.purgeInterval=24

# Number of snapshots to retain in the data directory
autopurge.snapRetainCount=30

# Zookeeper server configurations
server.1=tdslave01:2888:3888
server.2=tdslave02:2888:3888
server.3=tdslave03:2888:3888
server.4=tdmaster01:2888:3888
server.5=tdmaster02:2888:3888

Update the server.x configurations for each node with the appropriate hostnames and
ports.

Ensure that the dataDir directory exists and is writable.
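Each node in the ensemble also needs a myid file in its dataDir whose content matches that node's server.x number. For example, on the node listed as server.1 (using the dataDir path above):

echo 1 > /temp/hadoop/zookeeper1/myid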

2. Kafka Configuration (server.properties):

Edit the Kafka server configuration file ( server.properties ) on each node:

# Broker ID: a unique integer for each Kafka broker. Set exactly one
# broker.id line per node (inline comments after property values are not
# supported in properties files), for example:
#   tdslave01 -> broker.id=1, tdslave02 -> broker.id=2, tdslave03 -> broker.id=3,
#   tdmaster01 -> broker.id=4, tdmaster02 -> broker.id=5
broker.id=1

# Port the socket server listens on
listeners=PLAINTEXT://<hostname>:9092

# A comma-separated list of directories under which to store log files
log.dirs=/tmp/kafka-logs

# Number of log partitions per topic
num.partitions=1

# Zookeeper connection string
zookeeper.connect=tdslave01:2181,tdslave02:2181,tdslave03:2181,tdmaster01:2181,tdmaster02:2181
Set a unique broker.id for each Kafka broker.

Update the listeners configuration with the appropriate hostname.

Update the zookeeper.connect configuration with the Zookeeper connection string.

3. Start Zookeeper and Kafka:

Start Zookeeper on each node:

bin/zookeeper-server-start.sh config/zookeeper.properties

Start Kafka on each node:

bin/kafka-server-start.sh config/server.properties
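Once all brokers are running, one way to sanity-check the cluster is to create a topic whose replicas span several brokers and then describe it; the topic name and counts here are illustrative:

bin/kafka-topics.sh --create --topic replicatedTopic --bootstrap-server tdslave01:9092 --partitions 3 --replication-factor 3

bin/kafka-topics.sh --describe --topic replicatedTopic --bootstrap-server tdslave01:9092

The describe output should show leaders and replicas distributed across different broker IDs.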

REFERENCES
https://kafka.apache.org/documentation/#configuration

https://sharebigdata.wordpress.com/2015/09/09/kafka-installation-single-node-setup/

https://www.bogotobogo.com/Hadoop/BigData_hadoop_Zookeeper_Kafka_single_node_Multiple_broker_cluster.php
