Apache Kafka Key Concepts
Apache Kafka
Let’s discuss a few Kafka concepts before digging deeper into it.
Producer: Application that sends the messages using Kafka Producer API.
Consumer: Application that receives the messages.
Message: Information that is sent from the producer to a consumer through Apache Kafka.
Broker: A single Kafka server. A Kafka cluster consists of one or more brokers.
Topic: A Topic is a category/feed name to which messages are stored and published.
Topic partition: Kafka topics are divided into a number of partitions, which allows you to
split data across multiple brokers.
Replicas: A replica of a partition is a "backup" of that partition. Follower replicas do not serve client reads or writes; they exist to prevent data loss.
Consumer Group: A consumer group is the set of consumer processes that subscribe to a specific topic.
Offset: The offset is a unique identifier of a record within a partition. It denotes the position
of the consumer in the partition.
Node: A node is a single computer in the Apache Kafka cluster.
Cluster: A cluster is a group of nodes i.e., a group of computers.
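The relationships among these terms can be sketched with a small illustrative model in plain Scala (this is not the Kafka API, just a toy log showing that a record is identified by topic, partition, and offset, and that offsets are assigned sequentially within a partition):

```scala
// Illustrative model only: a record's identity is (topic, partition, offset).
case class Record(topic: String, partition: Int, offset: Long, value: String)

class PartitionLog(val topic: String, val partition: Int) {
  private var log = Vector.empty[Record]

  // Appending assigns the next offset in this partition, like an append-only log.
  def append(value: String): Record = {
    val rec = Record(topic, partition, log.size.toLong, value)
    log = log :+ rec
    rec
  }

  // A consumer's position is an offset; it reads from that offset onward.
  def readFrom(offset: Long): Seq[Record] = log.drop(offset.toInt)
}
```

A consumer that has committed offset 1 would resume with the second record, which is exactly what the committed offset mechanism described below provides.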
Let’s discuss the Kafka producer and consumer properties to set in your application code for better and more efficient message transmission and consumption.
Producer properties:
acks= The number of acknowledgments the producer requires the leader to have received
before considering a request complete.
Eg: acks = 1
security.protocol= Protocol used to communicate with brokers
Eg:security.protocol=SASL_SSL
sasl.mechanism= SASL mechanism used for client connections
Eg: sasl.mechanism=PLAIN
ssl.enabled.protocols= The list of protocols enabled for SSL connections
Eg: ssl.enabled.protocols=TLSv1.2
request.timeout.ms= The configuration controls the maximum amount of time the client
will wait for the response of a request.
Eg: request.timeout.ms=3000
retries= Setting a value greater than zero will cause the client to resend any request that
fails with a potentially transient error.
Eg: retries=3
batch.size= The producer batches records destined for the same partition together until the batch reaches this size in bytes.
Eg: batch.size=65536
linger.ms= The producer waits up to this many milliseconds for additional records to fill a batch before sending it.
Eg:linger.ms=90
max.in.flight.requests.per.connection=Max number of unacknowledged requests the client
will send on a single connection before blocking
Eg: max.in.flight.requests.per.connection=2
key.serializer – Serializer class for the key when sending a message to a Kafka topic.
Eg: ByteArraySerializer, JsonSerializer, StringSerializer
value.serializer – Serializer class for the value when sending a message to a Kafka topic.
Eg: ByteArraySerializer, JsonSerializer, StringSerializer
bootstrap.servers - A list of host/port pairs to use for establishing the initial connection to
the Kafka cluster.
Eg: localhost:9092
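The producer settings listed above can be collected into a single `Properties` object. A minimal sketch, using the example values from this section (the host/port and every value here are illustrative and should be adjusted for your cluster):

```scala
import java.util.Properties

// Assembles the producer settings from this section; values are the examples above.
def producerProps(): Properties = {
  val props = new Properties()
  props.put("bootstrap.servers", "localhost:9092")
  props.put("acks", "1")
  props.put("retries", "3")
  props.put("batch.size", "65536")
  props.put("linger.ms", "90")
  props.put("max.in.flight.requests.per.connection", "2")
  props.put("request.timeout.ms", "3000")
  props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
  props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
  props
}
```

This object is what gets passed to the `KafkaProducer` constructor in the producer example later in this document.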
Consumer Properties:
key.deserializer – Deserializer class for the key when reading messages from a Kafka topic.
Eg: ByteArrayDeserializer, JsonDeserializer, StringDeserializer
value.deserializer – Deserializer class for the value when reading messages from a Kafka topic.
Eg: ByteArrayDeserializer, JsonDeserializer, StringDeserializer
group.id - A unique string that identifies the consumer group this consumer belongs to.
Eg: group.id = something.driver
enable.auto.commit - If true, the consumer's offset will be periodically committed in the background.
Eg: enable.auto.commit = false
max.poll.records - The maximum number of records returned in a single call to poll().
Eg: max.poll.records=1
auto.offset.reset - What to do when there is no initial offset in Kafka or if the current offset does not exist any more on the server.
Eg: auto.offset.reset = earliest
bootstrap.servers - A list of host/port pairs to use for establishing the initial connection to
the Kafka cluster to receive the messages
Eg: localhost:9092
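As with the producer, the consumer settings above can be gathered into one `Properties` object. A minimal sketch with the example values from this section (group id, host/port, and all values are illustrative):

```scala
import java.util.Properties

// Assembles the consumer settings from this section; values are the examples above.
def consumerProps(): Properties = {
  val props = new Properties()
  val props = new Properties()
  props.put("bootstrap.servers", "localhost:9092")
  props.put("group.id", "something.driver")
  props.put("enable.auto.commit", "false")
  props.put("max.poll.records", "1")
  props.put("auto.offset.reset", "earliest")
  props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
  props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
  props
}
```

With `enable.auto.commit=false`, the application is responsible for committing offsets itself (e.g. via `commitSync()`), which gives finer control over at-least-once delivery.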
If the Kafka cluster is secured, please discuss with the admin on your team before setting these properties in the application code.
Zookeeper:
Zookeeper is a top-level Apache project that acts as a centralized service, used to maintain naming and configuration data and to provide flexible and robust synchronization within distributed systems. Zookeeper keeps track of the status of the Kafka cluster nodes, and it also keeps track of Kafka topics, partitions, etc.
Always start Zookeeper first, and then start the Kafka server. Otherwise you will see a connection refused error.
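The start-up order above can be sketched with the scripts that ship with Kafka (script names and config paths vary by distribution; some installs append a .sh suffix, and the broker address below is illustrative):

```shell
# Start Zookeeper first (config path is illustrative)
zookeeper-server-start config/zookeeper.properties

# Then, in another terminal, start the Kafka broker
kafka-server-start config/server.properties

# Optional: exercise the setup with the bundled console clients
kafka-console-producer --broker-list localhost:9092 --topic test
kafka-console-consumer --bootstrap-server localhost:9092 --topic test --from-beginning
```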
You will see the messages in the consumer console after starting the kafka consumer.
Stop the zookeeper and Kafka server:
zookeeper-server-stop
kafka-server-stop
import java.util.Properties
import org.apache.kafka.clients.producer._

val props = new Properties()
props.put("bootstrap.servers", "localhost:9092")
// Uncomment these when the cluster is secured:
// props.put("security.protocol", "SASL_SSL")
// props.put("sasl.mechanism", "PLAIN")
// props.put("ssl.enabled.protocols", "TLSv1.2")
// props.put("sasl.jaas.config",
//   s"org.apache.kafka.common.security.plain.PlainLoginModule required username=$username password=$password;")
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
props.put("acks", "1")

val TOPIC = "test"
val producer = new KafkaProducer[String, String](props)
for (i <- 1 to 50) {
  val record = new ProducerRecord[String, String](TOPIC, "key", s"hello $i")
  producer.send(record)
}
producer.close()
This code will produce a total of 50 messages to the Kafka topic. Some of the properties are commented out in the code because my local Kafka server is not enabled with any security mechanisms. They give you an idea of how to use those properties when you produce messages to a Kafka cluster that has security enabled.
import java.util
import java.util.Properties
import scala.collection.JavaConverters._
import org.apache.kafka.clients.consumer.KafkaConsumer

val props = new Properties()
props.put("bootstrap.servers", "localhost:9092")
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
props.put("group.id", "something.driver")
props.put("auto.offset.reset", "latest")
props.put("max.poll.records", "1")
props.put("enable.auto.commit", "false")
// Uncomment these when the cluster is secured:
// props.put("security.protocol", "SASL_SSL")
// props.put("sasl.mechanism", "PLAIN")
// props.put("ssl.enabled.protocols", "TLSv1.2")

val TOPIC = "test"
val consumer = new KafkaConsumer[String, String](props)
consumer.subscribe(util.Collections.singletonList(TOPIC))
while (true) {
  val records = consumer.poll(100)
  for (record <- records.asScala) {
    println(record.toString)
  }
}
References:
https://kafka.apache.org/