Kafka
Step 1 — Installing Java
Apache Kafka can run on any platform supported by Java. To set up Kafka on an Ubuntu system, you need to install Java first. Since Oracle Java is now commercially licensed, we use its open-source version, OpenJDK.
sudo apt update
sudo apt install default-jdk
java --version
Download the Apache Kafka binary files from its official download website. You can
also select any nearby mirror to download.
wget https://downloads.apache.org/kafka/3.4.0/kafka_2.12-3.4.0.tgz
tar xzf kafka_2.12-3.4.0.tgz
sudo mv kafka_2.12-3.4.0 /usr/local/kafka
Now, create systemd unit files for the ZooKeeper and Kafka services. These will let you start and stop the Kafka services easily.
nano /etc/systemd/system/zookeeper.service
[Unit]
Description=Apache Zookeeper server
Documentation=http://zookeeper.apache.org
Requires=network.target remote-fs.target
After=network.target remote-fs.target

[Service]
Type=simple
ExecStart=/usr/local/kafka/bin/zookeeper-server-start.sh /usr/local/kafka/config/zookeeper.properties
ExecStop=/usr/local/kafka/bin/zookeeper-server-stop.sh
Restart=on-abnormal

[Install]
WantedBy=multi-user.target
nano /etc/systemd/system/kafka.service
[Unit]
Description=Apache Kafka Server
Documentation=http://kafka.apache.org/documentation.html
Requires=zookeeper.service

[Service]
Type=simple
Environment="JAVA_HOME=/usr/lib/jvm/java-1.11.0-openjdk-amd64"
ExecStart=/usr/local/kafka/bin/kafka-server-start.sh /usr/local/kafka/config/server.properties
ExecStop=/usr/local/kafka/bin/kafka-server-stop.sh

[Install]
WantedBy=multi-user.target
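After creating or changing unit files, systemd generally needs to reload them before the new services can be started:

sudo systemctl daemon-reload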
First, you need to start the ZooKeeper service and then start Kafka. Use the
systemctl command to start a single-node ZooKeeper instance.
sudo systemctl start zookeeper
Now start the Kafka server and view the running status:
sudo systemctl start kafka
sudo systemctl status kafka
All done. The Kafka installation has been completed successfully. The next part of this tutorial will help you work with the Kafka server.
Kafka provides multiple pre-built shell scripts to work with it. First, create a topic named "myTopic":
cd /usr/local/kafka
bin/kafka-topics.sh --create --bootstrap-server localhost:9092 --replication-factor 1 --partitions 1 --topic myTopic
The replication factor describes how many copies of the data will be created. Since we are running a single instance, keep this value at 1. Set the partitions option to the number of brokers you want your data to be split between. Since we are running a single broker, keep this value at 1 as well. You can create more topics by repeating the command above.
After that, you can see the created topics on Kafka by running the command below:
bin/kafka-topics.sh --list --bootstrap-server localhost:9092
To set up a Kafka cluster, you will need to follow these general steps:
1. Install Kafka on all nodes of the cluster. You can download Kafka from the Apache
Kafka website.
2. Configure the server.properties file on each node to specify the broker ID, the
ZooKeeper connection string, and other properties.
3. Start the ZooKeeper service on each node. This is required for Kafka to function.
4. Start the Kafka brokers on each node by running the kafka-server-start command
and specifying the location of the server.properties file.
5. Test the cluster by creating a topic, producing and consuming messages, and verifying
that they are replicated across all nodes.
1. Install Kafka on all nodes of the cluster. You can download Kafka from the Apache
Kafka website.
2. Configure the server.properties file on each node to specify the broker ID, the
ZooKeeper connection string, and other properties. For example, here is a
configuration for a simple Kafka cluster with three brokers:
# Broker 1
broker.id=1
listeners=PLAINTEXT://localhost:9092
num.partitions=3
log.dirs=/tmp/kafka-logs-1
zookeeper.connect=localhost:2181

# Broker 2
broker.id=2
listeners=PLAINTEXT://localhost:9093
num.partitions=3
log.dirs=/tmp/kafka-logs-2
zookeeper.connect=localhost:2181

# Broker 3
broker.id=3
listeners=PLAINTEXT://localhost:9094
num.partitions=3
log.dirs=/tmp/kafka-logs-3
zookeeper.connect=localhost:2181
In this example, each broker has a unique broker.id and listens on a different port for client
connections. The num.partitions property specifies the default number of partitions for new
topics, and log.dirs specifies the directory where Kafka should store its data on disk.
zookeeper.connect specifies the ZooKeeper connection string, which should point to the
ZooKeeper ensemble.
3. Start the ZooKeeper service on each node. This is required for Kafka to function. You can start ZooKeeper by running the following command:
bin/zookeeper-server-start.sh config/zookeeper.properties
This will start a single-node ZooKeeper instance using the default configuration.
4. Start the Kafka brokers on each node by running the kafka-server-start command and specifying the location of the server.properties file. For example:
bin/kafka-server-start.sh config/server.properties
This will start the Kafka broker on the default port (9092) using the configuration in
config/server.properties.
5. Test the cluster by creating a topic, producing and consuming messages, and verifying that they are replicated across all nodes. You can use the kafka-topics, kafka-console-producer, and kafka-console-consumer command-line tools to perform these tasks. For example:
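As a minimal sketch (the topic name test_topic and the broker ports 9092-9094 from the configuration above are assumptions):

# Create a topic with three partitions and three replicas
bin/kafka-topics.sh --create --bootstrap-server localhost:9092 --replication-factor 3 --partitions 3 --topic test_topic

# Produce a few messages to the topic
bin/kafka-console-producer.sh --bootstrap-server localhost:9092 --topic test_topic

# Consume the messages, listing all three brokers as bootstrap servers
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092,localhost:9093,localhost:9094 --topic test_topic --from-beginning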
These commands will create a topic with three partitions and three replicas, produce
messages to the topic, and consume them from all three brokers. You can verify that the
messages are replicated across all nodes by stopping one of the brokers and observing that the
other brokers continue to serve messages.
Creating new Broker-1
Follow these steps to add a new broker (a sketch of the steps follows below). In the new broker's server.properties, set, for example:
log.dirs=c:/kafka/kafka-logs-1
auto.create.topics.enable=false (optional)
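A minimal sketch of those steps on Windows, assuming the installation lives in c:\kafka and using the test-topic-replicated topic that the consumer below reads from:

rem Copy the existing broker config and give the new broker a unique id, port, and log directory
copy c:\kafka\config\server.properties c:\kafka\config\server-1.properties
rem (edit server-1.properties: broker.id=1, listeners=PLAINTEXT://localhost:9093, log.dirs=c:/kafka/kafka-logs-1)

rem Start the new broker with its own properties file
.\bin\windows\kafka-server-start.bat .\config\server-1.properties

rem Send a test message from a console producer
.\bin\windows\kafka-console-producer.bat --bootstrap-server localhost:9092 --topic test-topic-replicated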
message sent: Hi
Instantiate a new Consumer to receive the messages.
.\bin\windows\kafka-console-consumer.bat --bootstrap-server localhost:9092 --topic test-topic-replicated --from-beginning
message received: Hi
The message we sent is now received by the console consumer. The interesting part is that we now have new Kafka log folders; let's check what is inside them.
Log directories
Close the producer console now; you will see that the kafka-logs-1 and kafka-logs-2 directories have been created.
Each broker now has its own folder, and that is where it persists all the messages produced to that broker. So there is a separate log directory for each broker.
Conclusion: we have successfully set up a Kafka cluster with 3 brokers, created a topic in the cluster, and produced and consumed messages in the Kafka cluster.
Kafka dependencies
Logging dependencies
Follow these steps to create a Java project with the above dependencies.
The build tool Maven contains a pom.xml file. The pom.xml is a default XML file that carries all the information regarding the GroupId, ArtifactId, and Version values. The user needs to define all the necessary project dependencies in the pom.xml file. Go to the pom.xml file.
pom.xml
<project>
  ...
  <dependencies>
    ...
  </dependencies>
  ...
</project>
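A sketch of what those dependencies might look like, using the kafka-clients 3.1.0 artifact referenced later in this document together with slf4j for the logging in the Hello World example (the slf4j artifacts and versions are assumptions):

<dependencies>
    <!-- Kafka client library (version matches the dependency shown later in this document) -->
    <dependency>
        <groupId>org.apache.kafka</groupId>
        <artifactId>kafka-clients</artifactId>
        <version>3.1.0</version>
    </dependency>
    <!-- Logging facade and a simple binding for the slf4j Logger used below (assumed versions) -->
    <dependency>
        <groupId>org.slf4j</groupId>
        <artifactId>slf4j-api</artifactId>
        <version>1.7.36</version>
    </dependency>
    <dependency>
        <groupId>org.slf4j</groupId>
        <artifactId>slf4j-simple</artifactId>
        <version>1.7.36</version>
    </dependency>
</dependencies>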
If the version number appears in red, it means the 'Auto-Import' option was not enabled. If so, go to View > Tool Windows > Maven. A Maven Projects window will appear on the right side of the screen. Click the 'Refresh' button there. This enables the missed Auto-Import of Maven projects. Once the color changes to black, the missing dependencies have been downloaded.
Now, we have set all the required dependencies. Let's try the Simple Hello
World example.
While creating the java package, follow the package naming conventions.
Finally, create the sample application program as shown below.
package io.conduktor.demos.kafka;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class HelloWorld {
    private static final Logger log = LoggerFactory.getLogger(HelloWorld.class);

    public static void main(String[] args) {
        log.info("Hello World");
    }
}
Run the application (the green play button in IntelliJ) and verify that it runs, prints the message, and exits with code 0. This means that your Java application has run successfully.
Expand the 'External Libraries' on the Project panel and verify that it
displays the dependencies that we added for the project in pom.xml.
We have created a sample Java project that includes all the needed
dependencies. This will form the basis for creating Java producers and
consumers next.
<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka-clients</artifactId>
    <version>3.1.0</version>
</dependency>
import java.util.Properties;
import java.util.concurrent.ExecutionException;
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
"org.apache.kafka.common.serialization.StringSerializer");
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
"org.apache.kafka.common.serialization.StringSerializer");
OUTPUT
PROGRAM
import org.apache.kafka.clients.consumer.*;
import org.apache.kafka.common.serialization.StringDeserializer;
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
OUTPUT
7. Write a script to create a topic with specific partition and replication factor
settings.
Below is a script written in Scala for creating a Kafka topic with specific partition and replication factor settings. This script can be executed in IntelliJ IDEA with the Kafka dependencies included in the project.
Program
import java.util.Properties
import org.apache.kafka.clients.admin.{AdminClient, NewTopic}
import scala.collection.JavaConverters._

object KafkaTopicCreator {
  def main(args: Array[String]): Unit = {
    // Kafka broker properties
    val props = new Properties()
    props.put("bootstrap.servers", "localhost:9092")

    // Create AdminClient
    val adminClient = AdminClient.create(props)

    // Create topic (name, partition count, and replication factor are illustrative)
    val newTopic = new NewTopic("my-topic", 3, 1.toShort)
    adminClient.createTopics(Seq(newTopic).asJava).all().get()
    println(s"Topic ${newTopic.name()} created")

    // Close AdminClient
    adminClient.close()
  }
}
3. Create a new Scala file (e.g., KafkaTopicCreator.scala) and paste the script into it.
4. Make sure your Kafka broker is running on localhost:9092.
5. Run the KafkaTopicCreator object in IntelliJ IDEA.
OUTPUT
We should see the output indicating whether the topic creation was successful or not.
8. Simulate fault tolerance by shutting down one broker and observing the cluster
behavior.
To simulate fault tolerance by shutting down one broker and observing the cluster behavior in
IntelliJ, you'll need to set up a Kafka cluster and create a sample producer and consumer
application. Then, you'll shut down one of the brokers to observe the behavior. Here's a step-
by-step example:
1. Set Up Kafka Cluster:
Ensure you have Kafka installed and configured with multiple brokers. You
can refer to the Kafka documentation for detailed instructions.
2. Create a Topic:
Let's assume we have a topic named "test_topic" with a replication factor of 3
and 3 partitions. Run the following command in your Kafka installation
directory:
bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 3 --partitions 3 --topic test_topic
Producer program
import org.apache.kafka.clients.producer.*;
import java.util.Properties;
public class KafkaProducerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        String topic = "test_topic";
        Producer<String, String> producer = new KafkaProducer<>(props);

        try {
            for (int i = 0; i < 10; i++) {
                String message = "Message " + i;
                producer.send(new ProducerRecord<>(topic, Integer.toString(i), message));
                System.out.println("Sent message: " + message);
                Thread.sleep(1000);
            }
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            producer.close();
        }
    }
}
5. Consumer Application:
Create a Java class for the consumer application. This application will consume
messages from the Kafka topic.
Consumer code
import org.apache.kafka.clients.consumer.*;
import org.apache.kafka.common.serialization.StringDeserializer;
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
public class KafkaConsumerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("group.id", "test_group");

        String topic = "test_topic";
        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Collections.singletonList(topic));

        try {
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("Received message: offset = %d, key = %s, value = %s%n",
                            record.offset(), record.key(), record.value());
                }
            }
        } finally {
            consumer.close();
        }
    }
}
6. Run Applications:
Run the producer application and then the consumer application in IntelliJ.
7. Observe Behavior:
While both producer and consumer are running, shut down one of the Kafka
brokers in your Kafka cluster. You can do this by stopping the Kafka process
associated with that broker.
Output:
Observe how the consumer continues to receive messages without interruption despite the
broker shutdown. Kafka automatically handles the fault tolerance by reassigning partitions to
the remaining brokers.
We can monitor the logs in IntelliJ to see how Kafka handles the failure and reassignment of
partitions.
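One concrete way to watch this happen, assuming the test_topic created above, is to describe the topic before and after stopping a broker and compare the Leader and Isr columns:

bin/kafka-topics.sh --describe --bootstrap-server localhost:9092 --topic test_topic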
9. Implement operations such as listing topics, modifying configurations, and deleting topics.
To implement operations such as listing topics, modifying configurations, and deleting topics in IntelliJ IDEA, you typically interact with Apache Kafka, a distributed streaming platform. Here's a step-by-step guide on how to perform these operations using the Kafka command-line tools (kafka-topics.sh and kafka-configs.sh) within IntelliJ IDEA:
1. Setting up Kafka in IntelliJ IDEA:
First, make sure you have Apache Kafka installed and running on your local
machine or on a server accessible from IntelliJ IDEA.
Open your IntelliJ IDEA project.
2. Create a new Kotlin/Java file:
Right-click on your project folder in the project explorer.
Select "New" -> "Kotlin File/Java Class" to create a new Kotlin/Java file.
3. List Topics:
To list topics, you can use the Kafka command-line tool kafka-topics.sh.
Execute the following Kotlin/Java code to list topics:
Listing Topics
import java.io.BufferedReader
import java.io.InputStreamReader

fun main() {
    val runtime = Runtime.getRuntime()
    val process = runtime.exec("/path/to/kafka/bin/kafka-topics.sh --list --bootstrap-server localhost:9092")
    // Print each topic name returned by the command
    BufferedReader(InputStreamReader(process.inputStream)).forEachLine { println(it) }
}
4. Modify Configuration:
To modify a topic configuration, you can use the Kafka command-line tool kafka-configs.sh. Execute the following Kotlin code to alter a topic configuration:
import java.io.BufferedReader
import java.io.InputStreamReader

fun main() {
    val topicName = "your_topic_name"
    val configKey = "compression.type"
    val configValue = "gzip"
    // Alter the topic configuration using the kafka-configs.sh tool
    val process = Runtime.getRuntime().exec(
        "/path/to/kafka/bin/kafka-configs.sh --bootstrap-server localhost:9092 " +
            "--entity-type topics --entity-name $topicName --alter --add-config $configKey=$configValue")
    BufferedReader(InputStreamReader(process.inputStream)).forEachLine { println(it) }
}
Replace your_topic_name with the name of the topic you want to modify, and
/path/to/kafka/bin/kafka-configs.sh with the actual path to kafka-configs.sh script.
5. Delete Topics:
To delete topics, you can use the Kafka command-line tool kafka-topics.sh.
Execute the following Kotlin/Java code to delete topics:
Delete Topics
import java.io.BufferedReader
import java.io.InputStreamReader

fun main() {
    val topicName = "your_topic_name"
    // Delete the topic using the kafka-topics.sh tool
    val process = Runtime.getRuntime()
        .exec("/path/to/kafka/bin/kafka-topics.sh --delete --bootstrap-server localhost:9092 --topic $topicName")
    BufferedReader(InputStreamReader(process.inputStream)).forEachLine { println(it) }
}
Replace your_topic_name with the name of the topic you want to delete, and
/path/to/kafka/bin/kafka-topics.sh with the actual path to kafka-topics.sh script.
6. Run the code:
Run the Kotlin/Java file in IntelliJ IDEA.
You should see the output in the console showing the list of topics,
configuration modification status, or topic deletion status.
Make sure you have the appropriate permissions and that the Kafka server is running when executing these commands. Additionally, replace placeholders such as /path/to/kafka/bin/ and localhost:9092 with the actual paths and addresses for your Kafka setup.
10. Introduce Kafka Connect and demonstrate how to use connectors to integrate with
external systems.
Kafka Connect is a framework that provides scalable and reliable streaming data integration
between Apache Kafka and other systems. It simplifies the process of building and managing
connectors to move data in and out of Kafka.
To demonstrate how to use Kafka Connect and connectors to integrate with external systems,
let's walk through an example of setting up a simple connector to move data from a CSV file
to a Kafka topic. We'll use IntelliJ IDEA as our IDE.
Step 1: Setup Kafka and Kafka Connect
Ensure you have Apache Kafka installed and running on your local machine. Additionally,
you'll need to have Kafka Connect installed. You can find installation instructions in the
Apache Kafka documentation.
Step 2: Create a Kafka Connector Configuration File
Create a JSON configuration file for your Kafka connector. For this example, let's call it csv-source-connector.json:
Program
{
"name": "csv-source-connector",
"config": {
"connector.class": "FileStreamSource",
"tasks.max": "1",
"file": "<path_to_your_csv_file>",
"topic": "csv-topic"
}
}
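Before producing data, the connector has to be loaded into a Connect worker. A minimal sketch, assuming standalone mode (which reads connector settings from a Java properties file rather than JSON) and the default worker config shipped with Kafka:

# csv-source-connector.properties (the same settings as the JSON above, in properties form)
name=csv-source-connector
connector.class=FileStreamSource
tasks.max=1
file=<path_to_your_csv_file>
topic=csv-topic

# Start a standalone Connect worker with this connector
bin/connect-standalone.sh config/connect-standalone.properties csv-source-connector.properties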
This command assumes you're using the standalone mode of Kafka Connect. Adjust the paths
as necessary for your setup.
Step 5: Produce and Consume Data
Now, let's produce some data to the CSV file and consume it from the Kafka topic.
Example CSV File (data.csv):
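For illustration (the actual file contents are up to you), a couple of lines in data.csv and a console consumer on the connector's target topic are enough to see the pipeline working:

# data.csv (illustrative contents)
hello kafka connect
each line becomes one message in csv-topic

# Read what the connector wrote to csv-topic
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic csv-topic --from-beginning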
OUTPUT
11. Implement a simple word count stream processing application using Kafka Streams
To use Kafka Streams in IntelliJ IDEA, you'll first need to set up a Kafka cluster. You can use Docker to set up a local Kafka cluster quickly. Then, create a Maven project in IntelliJ IDEA and add the necessary dependencies for Kafka Streams.
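A minimal Maven dependency entry for Kafka Streams might look like this (the version is an assumption; align it with the Kafka client version you use elsewhere):

<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka-streams</artifactId>
    <version>3.1.0</version>
</dependency>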
Program:
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Produced;
import java.util.Arrays;
import java.util.Properties;
wordCounts.to("word-count-output",
Produced.with(org.apache.kafka.common.serialization.Serdes.String(),
org.apache.kafka.common.serialization.Serdes.Long()));
Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
}
}
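The listing above shows only part of the topology. A self-contained sketch of the word-count application it describes, assuming an input topic named word-count-input and the application id wordcount-app (the output topic word-count-output is taken from the listing):

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Produced;

import java.util.Arrays;
import java.util.Locale;
import java.util.Properties;

public class WordCountApp {
    public static void main(String[] args) {
        // Streams configuration (application id and broker address are assumptions)
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "wordcount-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        StreamsBuilder builder = new StreamsBuilder();

        // Read lines of text from the input topic
        KStream<String, String> textLines =
                builder.stream("word-count-input", Consumed.with(Serdes.String(), Serdes.String()));

        // Split each line into words, group by word, and count occurrences
        KStream<String, Long> wordCounts = textLines
                .flatMapValues(line -> Arrays.asList(line.toLowerCase(Locale.ROOT).split("\\W+")))
                .groupBy((key, word) -> word, Grouped.with(Serdes.String(), Serdes.String()))
                .count()
                .toStream();

        // Write the running counts to the output topic
        wordCounts.to("word-count-output",
                Produced.with(Serdes.String(), Serdes.Long()));

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();

        // Close the streams application cleanly on shutdown
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}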
Output