
ETL - KAFKA/TALEND

LIST OF EXPERIMENTS:
1. Install Apache Kafka on a single node.
2. Demonstrate setting up a single-node, single-broker Kafka cluster and show basic
operations such as creating topics and producing/consuming messages.
3. Extend the cluster to multiple brokers on a single node.
4. Write a simple Java program to create a Kafka producer and produce messages to a topic.
5. Implement sending messages both synchronously and asynchronously in the producer.
6. Develop a Java program to create a Kafka consumer and subscribe to a topic and
consume messages.
7. Write a script to create a topic with specific partition and replication factor settings.
8. Simulate fault tolerance by shutting down one broker and observing the cluster behavior.
9. Implement operations such as listing topics, modifying configurations, and deleting topics.
10. Introduce Kafka Connect and demonstrate how to use connectors to integrate with
external systems.
11. Implement a simple word count stream processing application using Kafka Streams.
12. Implement Kafka integration with the Hadoop ecosystem.

1. Install Apache Kafka on a single node.

Aim: Installing Apache Kafka on an Ubuntu system.


Program:

Step 1 — Installing Java

Apache Kafka can run on all platforms supported by Java. To set up Kafka on an Ubuntu system, you need to install Java first. Since Oracle Java is now commercially licensed, we use its open-source version, OpenJDK.

sudo apt update
sudo apt install default-jdk
java --version

openjdk version "11.0.9.1" 2020-11-04
OpenJDK Runtime Environment (build 11.0.9.1+1-Ubuntu-0ubuntu1.20.04)
OpenJDK 64-Bit Server VM (build 11.0.9.1+1-Ubuntu-0ubuntu1.20.04, mixed mode, sharing)

Step 2 — Download Latest Apache Kafka

Download the Apache Kafka binary files from its official download website. You can
also select any nearby mirror to download.

wget https://downloads.apache.org/kafka/3.4.0/kafka_2.12-3.4.0.tgz

Then extract the archive file

tar xzf kafka_2.12-3.4.0.tgz
sudo mv kafka_2.12-3.4.0 /usr/local/kafka

Step 3 — Creating System Unit Files

Now, you need to create systemd unit files for the Zookeeper and Kafka services. These will help you to start and stop the Kafka service in an easy way.

nano /etc/systemd/system/zookeeper.service

And add the following content:

[Unit]
Description=Apache Zookeeper server
Documentation=http://zookeeper.apache.org
Requires=network.target remote-fs.target
After=network.target remote-fs.target

[Service]
Type=simple
ExecStart=/usr/local/kafka/bin/zookeeper-server-start.sh /usr/local/kafka/config/zookeeper.properties
ExecStop=/usr/local/kafka/bin/zookeeper-server-stop.sh
Restart=on-abnormal

[Install]
WantedBy=multi-user.target

Save the file and close it.

Next, to create a system unit file for the Kafka service:

nano /etc/systemd/system/kafka.service

[Unit]
Description=Apache Kafka Server
Documentation=http://kafka.apache.org/documentation.html
Requires=zookeeper.service

[Service]
Type=simple
Environment="JAVA_HOME=/usr/lib/jvm/java-1.11.0-openjdk-amd64"
ExecStart=/usr/local/kafka/bin/kafka-server-start.sh /usr/local/kafka/config/server.properties
ExecStop=/usr/local/kafka/bin/kafka-server-stop.sh

[Install]
WantedBy=multi-user.target

Reload the systemd daemon to apply the new changes:

systemctl daemon-reload

Step 4 — Start Kafka and Zookeeper Service

First, you need to start the ZooKeeper service and then start Kafka. Use the systemctl command to start a single-node ZooKeeper instance.

sudo systemctl start zookeeper

Now start the Kafka server and view the running status:

sudo systemctl start kafka
sudo systemctl status kafka

All done. The Kafka installation has been successfully completed. The next part of this tutorial will help you to work with the Kafka server.

Step 5 — Create a Topic in Kafka

Kafka provides multiple pre-built shell scripts to work with it. First, create a topic named "myTopic" with a single partition and a single replica:

cd /usr/local/kafka
bin/kafka-topics.sh --create --bootstrap-server localhost:9092 --replication-factor 1 --partitions 1 --topic myTopic

Created topic myTopic.

The replication factor describes how many copies of the data will be created. As we are running a single instance, keep this value at 1. Set the partitions option to the number of brokers you want your data to be split between. As we are running a single broker, keep this value at 1 as well. You can create multiple topics by running the same command as above.

After that, you can see the created topics on Kafka by running the below command:

bin/kafka-topics.sh --list --bootstrap-server localhost:9092

2. Demonstrate setting up a single-node, single-broker Kafka cluster and show basic operations such as creating topics and producing/consuming messages.

Program:

To set up a Kafka cluster, you will need to follow these general steps:

1. Install Kafka on all nodes of the cluster. You can download Kafka from the Apache
Kafka website.
2. Configure the server.properties file on each node to specify the broker ID, the
ZooKeeper connection string, and other properties.
3. Start the ZooKeeper service on each node. This is required for Kafka to function.
4. Start the Kafka brokers on each node by running the kafka-server-start command
and specifying the location of the server.properties file.
5. Test the cluster by creating a topic, producing and consuming messages, and verifying
that they are replicated across all nodes.

Here is a more detailed guide to follow:

1. Install Kafka on all nodes of the cluster. You can download Kafka from the Apache
Kafka website.
2. Configure the server.properties file on each node to specify the broker ID, the
ZooKeeper connection string, and other properties. For example, here is a
configuration for a simple Kafka cluster with three brokers:

broker.id=1
listeners=PLAINTEXT://localhost:9092
num.partitions=3
log.dirs=/tmp/kafka-logs-1
zookeeper.connect=localhost:2181

broker.id=2
listeners=PLAINTEXT://localhost:9093
num.partitions=3
log.dirs=/tmp/kafka-logs-2
zookeeper.connect=localhost:2181

broker.id=3
listeners=PLAINTEXT://localhost:9094
num.partitions=3
log.dirs=/tmp/kafka-logs-3
zookeeper.connect=localhost:2181

In this example, each broker has a unique broker.id and listens on a different port for client
connections. The num.partitions property specifies the default number of partitions for new
topics, and log.dirs specifies the directory where Kafka should store its data on disk.
zookeeper.connect specifies the ZooKeeper connection string, which should point to the
ZooKeeper ensemble.

3. Start the ZooKeeper service on each node. This is required for Kafka to function. You can start ZooKeeper by running the following command:

bin/zookeeper-server-start.sh config/zookeeper.properties

This will start a single-node ZooKeeper instance using the default configuration.

4. Start the Kafka brokers on each node by running the kafka-server-start command and specifying the location of the server.properties file. For example:

bin/kafka-server-start.sh config/server.properties

This will start the Kafka broker on the default port (9092) using the configuration in
config/server.properties.

5. Test the cluster by creating a topic, producing and consuming messages, and verifying that they are replicated across all nodes. You can use the kafka-topics, kafka-console-producer, and kafka-console-consumer command-line tools to perform these tasks. For example:

bin/kafka-topics.sh --create --bootstrap-server localhost:9092 --replication-factor 3 --partitions 3 --topic my-topic

bin/kafka-console-producer.sh --broker-list localhost:9092,localhost:9093,localhost:9094 --topic my-topic

bin/kafka-console-consumer.sh --bootstrap-server localhost:9092,localhost:9093,localhost:9094 --topic my-topic --from-beginning

These commands will create a topic with three partitions and three replicas, produce
messages to the topic, and consume them from all three brokers. You can verify that the
messages are replicated across all nodes by stopping one of the brokers and observing that the
other brokers continue to serve messages.

3. Extend the cluster to multiple brokers on a single node.

Multiple Brokers in Kafka
To start multiple brokers, the following needs to be done:
Create a new server.properties file for every new broker.
Example: the previous port number was 9092, the broker id was 0, and the Kafka log directory was kafka-logs.
Setting up a cluster (configuration)
Create new server.properties files with the new broker details.

Example: server-1.properties
broker.id=1
listeners=PLAINTEXT://localhost:9093
log.dirs=c:/kafka/kafka-logs-1
auto.create.topics.enable=false (optional)
Creating new Broker-1
Follow these steps to add a new broker. Make the following changes in the file:

1. Change broker.id to 1.

2. Change the port number to 9093 and set auto.create.topics.enable to false.

3. Change the log directory to kafka-logs-1.

Creating new Broker-2

Follow the same steps to set up broker-2.
Edit server-2.properties:
broker.id=2
listeners=PLAINTEXT://localhost:9094
log.dirs=c:/kafka/kafka-logs-2
auto.create.topics.enable=false
Starting up these two Kafka brokers
Note: Keep your existing Kafka broker and Zookeeper running.

1. Start the first broker:

.\bin\windows\kafka-server-start.bat .\config\server-1.properties

2. Start the second broker:

.\bin\windows\kafka-server-start.bat .\config\server-2.properties

Kafka Cluster
We have now started 3 Kafka brokers, so we have a Kafka cluster up and running on our machine with 3 brokers.

Running 3 brokers simultaneously.


Creating a new Topic
It's time to create a new topic; then we will produce and consume messages with our new cluster setup.

.\bin\windows\kafka-topics.bat --create --topic test-topic-replicated --zookeeper localhost:2181 --replication-factor 3 --partitions 3

A new topic is created.

--replication-factor 3 is used here; a replication factor is normally recommended in a cluster setup, and its value must be equal to or less than the number of brokers in the Kafka cluster. Since we have 3 brokers, the replication factor is set to 3.
The topic name is changed from test-topic to test-topic-replicated.
The partition count is kept at 3, simply to stay in sync with the number of brokers we have. It does not have to match the broker count; any number of partitions could be used, but the value 3 is used here.
Produce the messages using the console producer.
.\bin\windows\kafka-console-producer.bat --broker-list localhost:9092 --topic test-topic-replicated

message sent: Hi
Instantiate a new consumer to receive the messages.

.\bin\windows\kafka-console-consumer.bat --bootstrap-server localhost:9092 --topic test-topic-replicated --from-beginning

message received: Hi
Whatever message we have sent is now received by the console consumer. The interesting part is that we now have 3 Kafka log folders; let's go ahead and check what is in them.
Log directories
Close the producer console and note that the kafka-logs-1 and kafka-logs-2 directories have been created.

Each broker now has its own folder, which is where it persists all the messages produced to that broker. So we have three separate directories, one for each broker.
Conclusion: we have successfully set up a Kafka cluster with 3 brokers, created a topic in the cluster, and produced and consumed messages through the Kafka cluster.

4. Write a simple Java program to create a Kafka producer and produce messages to a topic.

Pre-requisites for Kafka Programming with Java

Installing Kafka (including the part about installing the Java 11 JDK)
Preferred: install IntelliJ Community Edition

Kafka Programming Activities


In this section, we'll use the Java programming language to programmatically replicate what we were able to achieve with the Kafka CLI.

The following tutorials are recommended:


Creating a Kafka Project base with (whichever you prefer):
o Maven
o Gradle
Complete Kafka Producer
Complete Kafka Consumer

Maven is a popular choice for Kafka projects in Java.


Before developing Kafka producers and consumers in Java, we'll have to
set up a simple Kafka Java project that includes common dependencies
that we'll need, namely:

Kafka dependencies
Logging dependencies

Follow these steps to create a Java project with the above dependencies.

Creating a Maven project with pom.xml and setting up dependencies

In IntelliJ IDEA, create a new Java Maven project (File > New > Project).

Then add your Maven project attributes.

The build tool Maven uses a pom.xml file. The pom.xml is a default XML file that carries all the information regarding the GroupID, ArtifactID, as well as the Version values. The user needs to define all the necessary project dependencies in the pom.xml file. Go to the pom.xml file.

pom.xml

Define the Kafka dependencies. Create a <dependencies>...</dependencies> block within which we will define the required dependencies.
Add a dependency for the Kafka client as shown below:

<project>
...
  <dependencies>
    <!-- https://mvnrepository.com/artifact/org.apache.kafka/kafka-clients -->
    <dependency>
      <groupId>org.apache.kafka</groupId>
      <artifactId>kafka-clients</artifactId>
      <version>3.0.0</version>
    </dependency>
  </dependencies>
</project>

If the version number appears red, it means the 'Auto-Import' option has not been enabled. If so, go to View > Tool Windows > Maven. A Maven Projects window will appear on the right side of the screen. Click the 'Refresh' button there; this will enable Auto-Import for Maven projects. If the color changes to black, it means the missing dependency has been downloaded.
Add another dependency for logging. This will enable us to print diagnostic logs while our application runs.

<!-- https://mvnrepository.com/artifact/org.slf4j/slf4j-api -->
<dependency>
  <groupId>org.slf4j</groupId>
  <artifactId>slf4j-api</artifactId>
  <version>1.7.32</version>
</dependency>

<!-- https://mvnrepository.com/artifact/org.slf4j/slf4j-simple -->
<dependency>
  <groupId>org.slf4j</groupId>
  <artifactId>slf4j-simple</artifactId>
  <version>1.7.32</version>
</dependency>

Now, we have set all the required dependencies. Let's try the simple Hello World example.

Creating your first class

Create a Java class, say io.conduktor.demos.kafka.HelloWorld.

While creating the Java package, follow the package naming conventions.
Finally, create the sample application program as shown below.
package io.conduktor.demos.kafka;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class HelloWorld {
    private static final Logger log = LoggerFactory.getLogger(HelloWorld.class);

    public static void main(String[] args) {
        log.info("Hello World");
    }
}

Run the application (the green play button next to the main method) and verify that it runs, prints the message, and exits with code 0. This means that your Java application has run successfully.
Expand 'External Libraries' in the Project panel and verify that it displays the dependencies that we added for the project in pom.xml.

We have created a sample Java project that includes all the needed
dependencies. This will form the basis for creating Java producers and
consumers next.
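As a minimal sketch of the producer this project will grow into (the complete version appears in the next experiment), the following class sends a single message. The topic name demo-topic and the broker address localhost:9092 are assumptions for illustration; the topic should already exist or topic auto-creation must be enabled.

package io.conduktor.demos.kafka;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;

public class SimpleProducer {
    public static void main(String[] args) {
        // Broker address is an assumption; point it at your own cluster
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        KafkaProducer<String, String> producer = new KafkaProducer<>(props);

        // Send one record, then flush so it is transmitted before the program exits
        producer.send(new ProducerRecord<>("demo-topic", "hello from Java"));
        producer.flush();
        producer.close();
    }
}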

5. Implement sending messages both synchronously and asynchronously in the producer.

To implement sending messages both synchronously and asynchronously in a Kafka producer


using IntelliJ IDEA, you would first need to set up a Maven project with the Kafka
dependency. Then, you can create a Java class for the producer and implement both
synchronous and asynchronous message sending. Below is an example with step-by-step
instructions:
Step 1: Set up a Maven project in IntelliJ IDEA.
Open IntelliJ IDEA and create a new project.
Choose "Maven" as the project type.
Configure the project settings and click "Finish".
Step 2: Add Kafka dependency to your Maven pom.xml file.
Add the following dependency for Kafka in the pom.xml file:

<dependency>
  <groupId>org.apache.kafka</groupId>
  <artifactId>kafka-clients</artifactId>
  <version>3.1.0</version>
</dependency>

Step 3: Create a Java class for the Kafka producer.


Right-click on the src/main/java directory in IntelliJ IDEA.
Select "New" > "Java Class" and name it KafkaProducerExample.
Step 4: Implement synchronous and asynchronous message sending in the
KafkaProducerExample class
PROGRAM
import org.apache.kafka.clients.producer.*;

import java.util.Properties;
import java.util.concurrent.ExecutionException;

public class KafkaProducerExample {

private static final String TOPIC_NAME = "test-topic";


private static final String BOOTSTRAP_SERVERS = "localhost:9092";

public static void main(String[] args) {
Properties props = new Properties();
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, BOOTSTRAP_SERVERS);

props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
"org.apache.kafka.common.serialization.StringSerializer");
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
"org.apache.kafka.common.serialization.StringSerializer");

// Create KafkaProducer instance


KafkaProducer<String, String> producer = new KafkaProducer<>(props);

String message = "Hello, Kafka!";

// Synchronous message sending


try {
ProducerRecord<String, String> record = new
ProducerRecord<>(TOPIC_NAME, message);
producer.send(record).get(); // Wait for acknowledgment
System.out.println("Message sent synchronously successfully.");
} catch (InterruptedException | ExecutionException e) {
System.err.println("Error sending message synchronously: " + e.getMessage());
}

// Asynchronous message sending


ProducerRecord<String, String> record = new
ProducerRecord<>(TOPIC_NAME, message);
producer.send(record, new Callback() {
@Override
public void onCompletion(RecordMetadata metadata, Exception exception) {
if (exception == null) {
System.out.println("Message sent asynchronously successfully.");
} else {
System.err.println("Error sending message asynchronously: " +
exception.getMessage());
}
}
});

// Flush and close the producer


producer.flush();
producer.close();
}
}

OUTPUT

6. Develop a Java program to create a Kafka consumer, subscribe to a topic, and consume messages.

DEPENDENCIES TO ADD IN MAVEN


<dependency>
<groupId>org.apache.kafka</groupId>
<artifactId>kafka-clients</artifactId>
<version>3.0.0</version>
</dependency>

PROGRAM
import org.apache.kafka.clients.consumer.*;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class KafkaConsumerExample {

public static void main(String[] args) {

Properties props = new Properties();
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(ConsumerConfig.GROUP_ID_CONFIG, "test-group");
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
StringDeserializer.class.getName());
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
StringDeserializer.class.getName());

KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);


consumer.subscribe(Collections.singletonList("test-topic"));
try {
while (true) {
ConsumerRecords<String, String> records =
consumer.poll(Duration.ofMillis(100));
records.forEach(record -> {
System.out.printf("Consumed message: key=%s, value=%s%n", record.key(), record.value());
});
}
} finally {
consumer.close();
}
}
}

OUTPUT

7. Write a script to create a topic with specific partition and replication factor settings.

Below is a script written in Scala for creating a Kafka topic with specific partition and replication factor settings. This script can be executed in IntelliJ IDEA with the Kafka dependencies included in the project.

Program

import java.util.Properties
import org.apache.kafka.clients.admin.{AdminClient, NewTopic}
import scala.collection.JavaConverters._

object KafkaTopicCreator {
def main(args: Array[String]): Unit = {
// Kafka broker properties
val props = new Properties()
props.put("bootstrap.servers", "localhost:9092")

// Create AdminClient
val adminClient = AdminClient.create(props)

// Define topic configurations


val topicName = "test-topic"
val partitions = 3
val replicationFactor = 2

// Create NewTopic instance


val newTopic = new NewTopic(topicName, partitions, replicationFactor.toShort)

// Create topic
val results = adminClient.createTopics(List(newTopic).asJava)

results.values().asScala.foreach { case (topic, future) =>
try {
future.get()
println(s"Topic $topic created successfully.")
} catch {
case e: Exception =>
println(s"Failed to create topic $topic: ${e.getMessage}")
}
}

// Close AdminClient
adminClient.close()
}
}

To run this script in IntelliJ IDEA, follow these steps:


1. Open IntelliJ IDEA and create a new Scala project.
2. Add Kafka dependencies to your project. You can do this by adding the following
lines to your build.sbt file:

libraryDependencies += "org.apache.kafka" % "kafka-clients" % "3.0.0"


libraryDependencies += "org.apache.kafka" % "kafka-streams" % "3.0.0"

3. Create a new Scala file (e.g., KafkaTopicCreator.scala) and paste the script into it.
4. Make sure your Kafka broker is running on localhost:9092.
5. Run the KafkaTopicCreator object in IntelliJ IDEA.

OUTPUT

We should see the output indicating whether the topic creation was successful or not.

8. Simulate fault tolerance by shutting down one broker and observing the cluster
behavior.

To simulate fault tolerance by shutting down one broker and observing the cluster behavior in
IntelliJ, you'll need to set up a Kafka cluster and create a sample producer and consumer
application. Then, you'll shut down one of the brokers to observe the behavior. Here's a step-
by-step example:
1. Set Up Kafka Cluster:
Ensure you have Kafka installed and configured with multiple brokers. You
can refer to the Kafka documentation for detailed instructions.
2. Create a Topic:
Let's assume we have a topic named "test_topic" with a replication factor of 3
and 3 partitions. Run the following command in your Kafka installation
directory:
bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 3 --partitions 3 --topic test_topic

3. Create IntelliJ Project:


Create a new Maven or Gradle project in IntelliJ.
Add Kafka dependencies to your pom.xml or build.gradle.
4. Producer Application:
Create a Java class for the producer application. This application will send
messages to the Kafka topic.

Producer program

import org.apache.kafka.clients.producer.*;

import java.util.Properties;
public class KafkaProducerExample {
public static void main(String[] args) {
Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("key.serializer",
"org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer",
"org.apache.kafka.common.serialization.StringSerializer");

Producer<String, String> producer = new KafkaProducer<>(props);

String topic = "test_topic";

try {
for (int i = 0; i < 10; i++) {
String message = "Message " + i;
producer.send(new ProducerRecord<>(topic, Integer.toString(i), message));
System.out.println("Sent message: " + message);
Thread.sleep(1000);
}
} catch (Exception e) {
e.printStackTrace();
} finally {
producer.close();
}
}
}

5. Consumer Application:

Create a Java class for the consumer application. This application will consume
messages from the Kafka topic.

Consumer code

import org.apache.kafka.clients.consumer.*;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
public class KafkaConsumerExample {
public static void main(String[] args) {
Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("key.deserializer",
"org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer",
"org.apache.kafka.common.serialization.StringDeserializer");
props.put("group.id", "test_group");

KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);


String topic = "test_topic";

consumer.subscribe(Collections.singletonList(topic));
try {
while (true) {
ConsumerRecords<String, String> records =
consumer.poll(Duration.ofMillis(100));
for (ConsumerRecord<String, String> record : records) {
System.out.printf("Received message: offset = %d, key = %s, value = %s%n",
record.offset(), record.key(), record.value());
}
}
} finally {
consumer.close();
}
}
}
6. Run Applications:
Run the producer application and then the consumer application in IntelliJ.
7. Observe Behavior:

While both producer and consumer are running, shut down one of the Kafka
brokers in your Kafka cluster. You can do this by stopping the Kafka process
associated with that broker.
Output:

Observe how the consumer continues to receive messages without interruption despite the
broker shutdown. Kafka automatically handles the fault tolerance by reassigning partitions to
the remaining brokers.
We can monitor the logs in IntelliJ to see how Kafka handles the failure and reassignment of
partitions.
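One concrete way to observe the reassignment is to query the partition metadata before and after stopping the broker. The sketch below is an illustration using the Kafka AdminClient: the bootstrap addresses and the topic name test_topic mirror the example above, and the class name is hypothetical.

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.TopicDescription;
import org.apache.kafka.common.TopicPartitionInfo;

import java.util.Collections;
import java.util.Properties;

public class ClusterStateInspector {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Listing more than one broker lets the client connect even if one of them is down
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092,localhost:9093");

        try (AdminClient admin = AdminClient.create(props)) {
            TopicDescription description = admin.describeTopics(Collections.singletonList("test_topic"))
                    .all().get().get("test_topic");

            // Print the current leader and in-sync replicas of every partition
            for (TopicPartitionInfo partition : description.partitions()) {
                System.out.printf("partition=%d leader=%s isr=%s%n",
                        partition.partition(), partition.leader(), partition.isr());
            }
        }
    }
}

Running this once before and once after the shutdown should show the leadership of the affected partitions moving to the surviving brokers and the stopped broker dropping out of the ISR lists.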

9. Implement operations such as listing topics, modifying configurations, and deleting topics.

To implement operations such as listing topics, modifying configurations, and deleting topics
in IntelliJ IDEA, you would typically interact with Apache Kafka, a distributed streaming
platform. Here's a step-by-step guide on how to perform these operations using the Kafka
command line tools (kafka-topics.sh) within IntelliJ IDEA:
1. Setting up Kafka in IntelliJ IDEA:
First, make sure you have Apache Kafka installed and running on your local
machine or on a server accessible from IntelliJ IDEA.
Open your IntelliJ IDEA project.
2. Create a new Kotlin/Java file:
Right-click on your project folder in the project explorer.
Select "New" -> "Kotlin File/Java Class" to create a new Kotlin/Java file.
3. List Topics:
To list topics, you can use the Kafka command-line tool kafka-topics.sh.
Execute the following Kotlin/Java code to list topics:

Listing Topics

import java.io.BufferedReader
import java.io.InputStreamReader

fun main() {
val runtime = Runtime.getRuntime()
val process = runtime.exec("/path/to/kafka/bin/kafka-topics.sh --list --bootstrap-server localhost:9092")

val reader = BufferedReader(InputStreamReader(process.inputStream))


var line: String?

while (reader.readLine().also { line = it } != null) {


println(line)
}
}

Replace /path/to/kafka/bin/kafka-topics.sh with the actual path to kafka-topics.sh script in


your Kafka installation directory.
4. Modify Configurations:
To modify configurations, you can use the Kafka command-line tool kafka-
configs.sh.
Execute the following Kotlin/Java code to modify configurations:

Modify Configuration:

import java.io.BufferedReader
import java.io.InputStreamReader

fun main() {
val topicName = "your_topic_name"
val configKey = "compression.type"
val configValue = "gzip"

val runtime = Runtime.getRuntime()


val process = runtime.exec("/path/to/kafka/bin/kafka-configs.sh --bootstrap-server localhost:9092 --entity-type topics --entity-name $topicName --alter --add-config $configKey=$configValue")

val reader = BufferedReader(InputStreamReader(process.inputStream))


var line: String?

while (reader.readLine().also { line = it } != null) {


println(line)
}
}

Replace your_topic_name with the name of the topic you want to modify, and
/path/to/kafka/bin/kafka-configs.sh with the actual path to kafka-configs.sh script.
5. Delete Topics:
To delete topics, you can use the Kafka command-line tool kafka-topics.sh.
Execute the following Kotlin/Java code to delete topics:

Delete Topics

import java.io.BufferedReader
import java.io.InputStreamReader

fun main() {
val topicName = "your_topic_name"

val runtime = Runtime.getRuntime()


val process = runtime.exec("/path/to/kafka/bin/kafka-topics.sh --bootstrap-server localhost:9092 --delete --topic $topicName")

val reader = BufferedReader(InputStreamReader(process.inputStream))


var line: String?

while (reader.readLine().also { line = it } != null) {


println(line)
}
}

Replace your_topic_name with the name of the topic you want to delete, and
/path/to/kafka/bin/kafka-topics.sh with the actual path to kafka-topics.sh script.
6. Run the code:
Run the Kotlin/Java file in IntelliJ IDEA.
You should see the output in the console showing the list of topics,
configuration modification status, or topic deletion status.

Make sure we have appropriate permissions and the Kafka server is running when executing
these commands. Additionally, replace placeholders such as /path/to/kafka/bin/ and
localhost:9092 with actual paths and addresses relevant to your Kafka setup.

10. Introduce Kafka Connect and demonstrate how to use connectors to integrate with
external systems.

Kafka Connect is a framework that provides scalable and reliable streaming data integration
between Apache Kafka and other systems. It simplifies the process of building and managing
connectors to move data in and out of Kafka.
To demonstrate how to use Kafka Connect and connectors to integrate with external systems,
let's walk through an example of setting up a simple connector to move data from a CSV file
to a Kafka topic. We'll use IntelliJ IDEA as our IDE.
Step 1: Setup Kafka and Kafka Connect
Ensure you have Apache Kafka installed and running on your local machine. Additionally,
you'll need to have Kafka Connect installed. You can find installation instructions in the
Apache Kafka documentation.
Step 2: Create a Kafka Connector Configuration File
Create a JSON configuration file for your Kafka connector. For this example, let's call it csv-source-connector.json:

Program

{
"name": "csv-source-connector",
"config": {
"connector.class": "FileStreamSource",
"tasks.max": "1",
"file": "<path_to_your_csv_file>",
"topic": "csv-topic"
}
}

Replace <path_to_your_csv_file> with the path to your CSV file.

Step 3: Start Kafka Connect


Start Kafka Connect with the following command:

./bin/connect-standalone.sh config/connect-standalone.properties csv-source-connector.json

This command assumes you're using the standalone mode of Kafka Connect. Adjust the paths
as necessary for your setup.

Step 4: Verify Connector Status


Once Kafka Connect is running, you can verify the status of your connector using the
following command:
curl localhost:8083/connectors/csv-source-connector/status

Step 5: Produce and Consume Data
Now, let's produce some data to the CSV file and consume it from the Kafka topic.
Example CSV File (data.csv):

OUTPUT
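To verify the pipeline end to end, the records that the connector publishes can be read back with the console consumer from experiment 2; the topic name below matches the connector configuration above, while the broker address localhost:9092 is an assumption about your setup.

bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic csv-topic --from-beginning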

11. Implement a simple word count stream processing application using Kafka Streams.

To implement a simple word count stream processing application using Kafka Streams in IntelliJ IDEA, you'll first need to set up a Kafka cluster. You can use Docker to set up a local Kafka cluster quickly. Then, you'll create a Maven project in IntelliJ IDEA and add the necessary dependencies for Kafka Streams.
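For the Maven project, the Kafka Streams dependency goes into pom.xml next to kafka-clients. The snippet below reuses the kafka-streams 3.0.0 coordinate already listed in the build.sbt example of experiment 7; the version is an assumption and should match your cluster and client version.

<dependency>
  <groupId>org.apache.kafka</groupId>
  <artifactId>kafka-streams</artifactId>
  <version>3.0.0</version>
</dependency>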

Program:

import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Produced;
import java.util.Arrays;
import java.util.Properties;

public class WordCountApp {


public static void main(String[] args) {
Properties config = new Properties();
config.put(StreamsConfig.APPLICATION_ID_CONFIG, "word-count-
application");
config.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

StreamsBuilder builder = new StreamsBuilder();


KStream<String, String> textLines = builder.stream("word-count-topic",
Consumed.with(
org.apache.kafka.common.serialization.Serdes.String(),
org.apache.kafka.common.serialization.Serdes.String()
));
KStream<String, Long> wordCounts = textLines
.flatMapValues(value -> Arrays.asList(value.toLowerCase().split("\\W+")))
.groupBy((key, word) -> word,
Grouped.with(org.apache.kafka.common.serialization.Serdes.String(),
org.apache.kafka.common.serialization.Serdes.String()))
.count()
.toStream();

wordCounts.to("word-count-output",
Produced.with(org.apache.kafka.common.serialization.Serdes.String(),
org.apache.kafka.common.serialization.Serdes.Long()));

KafkaStreams streams = new KafkaStreams(builder.build(), config);


streams.start();

Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
}
}

Output

12. Implement Kafka integration with the Hadoop ecosystem.

Kafka Hadoop Integration


We use Kafka to build a pipeline that is available for real-time processing or monitoring, and to load data into Hadoop, NoSQL, or data warehousing systems for offline processing and reporting, especially for real-time publish-subscribe use cases.
a. Hadoop producer
A Hadoop producer offers a bridge for publishing data from a Hadoop cluster to Kafka.
For a Hadoop producer, Kafka topics are treated as URIs; to connect to a specific Kafka broker, the URI is specified as:
kafka://<kafka-broker>/<kafka-topic>
For getting the data out of Hadoop, the Hadoop producer code suggests two possible approaches:
 Using the Pig script and writing messages in Avro format
In this approach, Kafka producers use Pig scripts to write data in a binary Avro format, where each row refers to a single message. The AvroKafkaStorage class takes the Avro schema as its first argument and then connects to the Kafka URI in order to push the data into the Kafka cluster. Moreover, we can easily write to multiple topics and brokers in the same Pig script-based job by using the AvroKafkaStorage producer.
 Using the Kafka OutputFormat class for jobs
In the second method, the Kafka OutputFormat class (which extends Hadoop's OutputFormat class) is used for publishing data to the Kafka cluster. By using low-level methods of publishing, it publishes messages as bytes and also offers control over the output. For writing a record (message) to the Kafka cluster, the Kafka OutputFormat class uses the KafkaRecordWriter class. In addition, we can also configure Kafka producer parameters and Kafka broker information under a job's configuration.
b. Hadoop Consumer
A Hadoop consumer is a Hadoop job that pulls data from the Kafka broker and pushes it into HDFS.
A Hadoop job performs parallel loading from Kafka to HDFS, with the number of mappers loading the data depending on the number of files in the input directory. The data coming from Kafka and the updated topic offsets are written to the output directory.
At the end of the map task, individual mappers write the offset of the last consumed message to HDFS. If a job fails and is restarted, each mapper simply restarts from the offsets stored in HDFS.
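As an illustrative sketch only, not the MapReduce-based approach described above, the following standalone Java program consumes records from a Kafka topic and appends them to a file in HDFS. The NameNode URI hdfs://localhost:9000, the topic test_topic, the output path, and the class name are all assumptions, and the project needs both the kafka-clients and hadoop-client dependencies.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class KafkaToHdfsExample {
    public static void main(String[] args) throws Exception {
        // Kafka consumer configuration; broker address and group id are assumptions
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "hdfs-loader");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        // HDFS file system handle; the NameNode URI is an assumption
        FileSystem fs = FileSystem.get(URI.create("hdfs://localhost:9000"), new Configuration());
        Path output = new Path("/tmp/kafka/test_topic.txt");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
             FSDataOutputStream out = fs.create(output, true)) {
            consumer.subscribe(Collections.singletonList("test_topic"));

            // Poll a few batches and write each record's offset and value to HDFS
            for (int i = 0; i < 10; i++) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    String line = record.offset() + "\t" + record.value() + "\n";
                    out.write(line.getBytes(StandardCharsets.UTF_8));
                }
                out.hflush();
            }
        }
        fs.close();
    }
}

Unlike the mapper-based approach, this sketch does not store the last consumed offsets in HDFS; Kafka's consumer-group offsets provide the restart point instead.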
