
Apache Kafka Basics for Beginners

1. What is Apache Kafka?


• Apache Kafka is an open-source distributed event-streaming platform.
• It's primarily used for real-time data pipelines and stream processing.
• Kafka helps apps send and receive large volumes of data efficiently.

2. Key Concepts
1. Event (Message):
o A unit of data written to Kafka (like a log entry or JSON record).
o Example: "User A purchased item X."
2. Producer:
o Sends events (messages) into Kafka topics.
3. Consumer:
o Reads events from Kafka topics.
4. Topics:
o A named category in Kafka where messages are stored.
o Example: A topic called purchases stores all purchase events.
5. Partitions:
o Each topic is split into partitions for scalability.
o Messages in partitions are ordered.
6. Offset:
o A unique number identifying the position of a message in a partition.
7. Broker:
o A Kafka server that stores data and serves client requests.
o Kafka is a cluster of multiple brokers.
8. Zookeeper:
o Coordinates the Kafka cluster by managing metadata, leader elections, etc.
o Not required in newer Kafka versions; from 2.8 onward Kafka can run in KRaft mode without ZooKeeper.
9. Consumer Group:
o A group of consumers that work together to consume messages from a topic.
10. Kafka Connect:
o A tool for moving data between Kafka and external systems.

3. Kafka Architecture
1. Producers:
o Write messages to a topic.
2. Topics and Partitions:
o Topics are split into partitions to handle large-scale data.
o Messages in a partition are immutable and ordered.
3. Consumers:
o Read messages from topics in a pull-based model.
4. Brokers:
o Kafka servers that store topic data and respond to client requests.
5. Replication:
o Kafka replicates topic partitions across brokers to ensure fault tolerance.
6. ZooKeeper:
o Maintains cluster state and handles leader election.
o (Optional in Kafka 2.8+)

4. Key Functionalities
1. Publish-Subscribe:
o Kafka uses a topic-based publish-subscribe model.
2. Fault Tolerance:
o Data is replicated across brokers to ensure reliability.
3. Durability:
o Kafka stores data on disk, making it highly durable.
4. Scalability:
o Kafka handles large-scale data by scaling brokers and partitions.
5. Real-Time Processing:
o Kafka processes messages in near real-time.

Commands for Apache Kafka


1. Setup Kafka and Zookeeper
• Start Zookeeper:
bin/zookeeper-server-start.sh config/zookeeper.properties
• Start Kafka Broker:
bin/kafka-server-start.sh config/server.properties
2. Basic Kafka CLI Commands
a) Create a Topic:
bin/kafka-topics.sh --create --topic my-topic --bootstrap-server localhost:9092 --partitions 3 --replication-factor 1
b) List All Topics:
bin/kafka-topics.sh --list --bootstrap-server localhost:9092
c) Describe a Topic:
bin/kafka-topics.sh --describe --topic my-topic --bootstrap-server localhost:9092
d) Delete a Topic:
bin/kafka-topics.sh --delete --topic my-topic --bootstrap-server localhost:9092

3. Producer and Consumer CLI


a) Start a Producer:
bin/kafka-console-producer.sh --topic my-topic --bootstrap-server localhost:9092
• Enter messages manually.
b) Start a Consumer:
bin/kafka-console-consumer.sh --topic my-topic --from-beginning --bootstrap-server localhost:9092

4. Kafka Connect Commands


a) Start Kafka Connect:
bin/connect-distributed.sh config/connect-distributed.properties
b) Check Active Connectors:
curl -X GET http://localhost:8083/connectors
c) Create a Connector:
curl -X POST -H "Content-Type: application/json" \
-d '{
"name": "my-source-connector",
"config": {
"connector.class": "org.apache.kafka.connect.file.FileStreamSourceConnector",
"tasks.max": "1",
"file": "/path/to/input/file.txt",
"topic": "my-topic"
}
}' http://localhost:8083/connectors
d) Delete a Connector:
curl -X DELETE http://localhost:8083/connectors/my-source-connector

5. Kafka Service Management


a) Kafka Service File:
• Example systemd file for Kafka:
[Unit]
Description=Apache Kafka
After=network.target

[Service]
User=kafka
Group=kafka
ExecStart=/path/to/kafka/bin/kafka-server-start.sh /path/to/kafka/config/server.properties
ExecStop=/path/to/kafka/bin/kafka-server-stop.sh
Restart=on-failure

[Install]
WantedBy=multi-user.target
b) Enable Kafka Service:
sudo systemctl enable kafka
c) Start Kafka Service:
sudo systemctl start kafka
d) Check Kafka Service Status:
sudo systemctl status kafka

6. Zookeeper Service Management


a) Zookeeper Service File:
• Example systemd file for Zookeeper:
[Unit]
Description=Apache Zookeeper
After=network.target

[Service]
User=zookeeper
Group=zookeeper
ExecStart=/path/to/zookeeper/bin/zkServer.sh start
ExecStop=/path/to/zookeeper/bin/zkServer.sh stop
Restart=on-failure

[Install]
WantedBy=multi-user.target
b) Enable Zookeeper Service:
sudo systemctl enable zookeeper
c) Start Zookeeper Service:
sudo systemctl start zookeeper
d) Check Zookeeper Service Status:
sudo systemctl status zookeeper

7. Advanced Kafka CLI Commands


a) View Consumer Groups:
bin/kafka-consumer-groups.sh --list --bootstrap-server localhost:9092
b) Describe a Consumer Group:
bin/kafka-consumer-groups.sh --describe --group my-group --bootstrap-server localhost:9092
c) Reset Offsets for a Consumer Group:
bin/kafka-consumer-groups.sh --reset-offsets --group my-group --topic my-topic --to-earliest --execute --bootstrap-server localhost:9092
1. Event (Message)
Definition: A single unit of data sent to Kafka.
• Structure:
o Key: Used for partitioning or filtering (optional).
o Value: The actual data (payload).
o Headers: Metadata (optional).
o Timestamp: When the event was created.
Example: Imagine a shopping app:
• Event Key: user_123
• Event Value: { "item": "laptop", "price": 1200 }
• Timestamp: 2025-01-07T12:30:00Z
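A minimal sketch of this event structure using the Kafka Java client (the topic name purchases, the header, and the payload values are illustrative, not from the original notes):
java

import java.util.List;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.header.Header;
import org.apache.kafka.common.header.internals.RecordHeader;

public class EventSketch {
    public static void main(String[] args) {
        // Optional header carrying metadata about the event.
        List<Header> headers = List.of(new RecordHeader("source", "shopping-app".getBytes()));

        // One event: topic, partition (null = let Kafka choose), timestamp, key, value, headers.
        ProducerRecord<String, String> event = new ProducerRecord<>(
                "purchases", null, System.currentTimeMillis(),
                "user_123", "{ \"item\": \"laptop\", \"price\": 1200 }", headers);

        System.out.println(event);
    }
}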

2. Topics
Definition: A topic is like a folder where Kafka stores messages. Producers write to topics, and consumers
read from topics.
• Example:
o Topic name: orders
o Messages:
▪ { "order_id": 1, "user": "John", "amount": 250 }
▪ { "order_id": 2, "user": "Alice", "amount": 500 }

3. Partitions
Definition: Each topic is divided into partitions for scalability and parallelism.
• Key Points:
o Messages in a partition are ordered.
o Partitions are distributed across Kafka brokers.
o A topic can have multiple partitions.
Example:
• Topic: orders with 3 partitions.
• Partition 0: { "order_id": 1 }
• Partition 1: { "order_id": 2 }
• Partition 2: { "order_id": 3 }
Messages are assigned to partitions based on (sketched below):
1. Key hashing: hash(key) % number_of_partitions, so records with the same key always land on the same partition.
2. Round-robin (if no key is provided).
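A rough sketch of the key-based idea (the real producer hashes the key with murmur2 internally; this illustrative class only shows the modulo step):
java

public class PartitionSketch {
    public static void main(String[] args) {
        int numPartitions = 3;
        String key = "order_42";

        // Mask the sign bit so the hash is non-negative, then map it to a partition.
        int partition = (key.hashCode() & 0x7fffffff) % numPartitions;
        System.out.println("Key " + key + " -> partition " + partition);
    }
}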

4. Producers
Definition: Producers send events (messages) to Kafka topics.
• Example: A payment gateway produces messages:
o Topic: transactions
o Messages:
▪ { "transaction_id": 101, "status": "success" }
▪ { "transaction_id": 102, "status": "failed" }
How Producers Work:
• Use Kafka’s Producer API.
• Specify:
o Topic: Where the message should go.
o Key (optional): Determines the partition.
o Value: The actual message.
Example Code (Java):
java

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

KafkaProducer<String, String> producer = new KafkaProducer<>(props);

// Key "order_id_1" determines the partition; the value is the JSON payload.
producer.send(new ProducerRecord<>("orders", "order_id_1", "{ \"item\": \"phone\", \"price\": 700 }"));
producer.close();

5. Consumers
Definition: Consumers read messages from Kafka topics.
• Example:
o Consumer reads from the orders topic:
▪ { "order_id": 1 }
▪ { "order_id": 2 }
How Consumers Work:
• Use Kafka’s Consumer API.
• Specify:
o Topic: Which topic to read from.
o Group ID: Groups consumers for parallel processing.
o Offset: Controls where to start reading:
▪ earliest (start from the beginning).
▪ latest (start from new messages).
Example Code (Java):
java

import java.time.Duration;
import java.util.Arrays;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("group.id", "order-processing-group");
props.put("auto.offset.reset", "earliest"); // where to start when the group has no committed offset
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(Arrays.asList("orders"));

while (true) {
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
    for (ConsumerRecord<String, String> record : records) {
        System.out.println("Consumed record: " + record.value());
    }
}

6. Offset
Definition: The position of a message in a partition.
• Example:
o Partition 0 contains:
▪ Offset 0: { "order_id": 1 }
▪ Offset 1: { "order_id": 2 }
Consumers use offsets to track what they’ve read:
• Offset 0 → Read.
• Offset 1 → Next to be read.
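A sketch of how a consumer sees and commits offsets (group and topic names are illustrative; manual commit is enabled only to make the offset handling explicit):
java

import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class OffsetSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "offset-demo-group");
        props.put("enable.auto.commit", "false"); // commit offsets manually below
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("orders"));
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
            for (ConsumerRecord<String, String> record : records) {
                // Each record carries the partition and offset it was read from.
                System.out.printf("partition=%d offset=%d value=%s%n",
                        record.partition(), record.offset(), record.value());
            }
            consumer.commitSync(); // record the last consumed offsets for this group
        }
    }
}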

7. Consumer Groups
Definition: A set of consumers sharing the workload.
• Example:
o Topic: orders with 3 partitions.
o Consumer Group: order-group with 3 consumers.
▪ Consumer 1 → Reads Partition 0.
▪ Consumer 2 → Reads Partition 1.
▪ Consumer 3 → Reads Partition 2.
If a consumer fails, its partition is reassigned to another consumer in the group.
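A sketch of how a group member can observe partition assignment and reassignment during a rebalance (group and topic names are illustrative):
java

import java.time.Duration;
import java.util.Collection;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class GroupSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "order-group"); // consumers sharing this group.id split the partitions
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(List.of("orders"), new ConsumerRebalanceListener() {
            @Override
            public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
                System.out.println("Assigned: " + partitions); // e.g. orders-0, orders-2
            }

            @Override
            public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
                System.out.println("Revoked: " + partitions); // happens when the group rebalances
            }
        });

        while (true) {
            consumer.poll(Duration.ofMillis(500)); // polling keeps this member alive in the group
        }
    }
}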

8. Kafka Brokers
Definition: Kafka servers that store and manage messages.
• Cluster Example:
o Broker 1: Stores Partition 0 of orders.
o Broker 2: Stores Partition 1 of orders.
o Broker 3: Stores Partition 2 of orders.
If one broker goes down, replicas (copies) on other brokers ensure data availability.

9. Replication
Definition: Each partition is copied (replicated) to other brokers for fault tolerance.
• Example:
o Topic: orders with 2 partitions.
o Replication Factor: 2.
o Partition 0:
▪ Leader: Broker 1.
▪ Replica: Broker 2.
o Partition 1:
▪ Leader: Broker 2.
▪ Replica: Broker 3.
The leader handles all reads and writes; follower replicas stay in sync and take over if the leader fails.
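A sketch of setting the replication factor when creating a topic with the Java AdminClient (partition and replica counts mirror the example above and are illustrative):
java

import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

public class ReplicationSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // 2 partitions, replication factor 2: each partition gets a leader plus one follower replica.
            NewTopic orders = new NewTopic("orders", 2, (short) 2);
            admin.createTopics(List.of(orders)).all().get();
        }
    }
}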

10. ZooKeeper
Definition: Manages Kafka cluster metadata, leader election, and configurations.
• Tasks:
o Keeps track of brokers.
o Handles partition leader election.

11. Kafka Connect


Definition: A tool to integrate Kafka with external systems (e.g., databases, file systems).
• Example Use Case:
o Source Connector: Read records from a database and push to a Kafka topic.
o Sink Connector: Read records from a Kafka topic and write to a database.
Source Connector Example (File):
json

{
"name": "file-source-connector",
"config": {
"connector.class": "FileStreamSource",
"tasks.max": "1",
"file": "/tmp/input.txt",
"topic": "file-topic"
}
}
Sink Connector Example (Database):
json

{
"name": "db-sink-connector",
"config": {
"connector.class": "JDBC",
"connection.url": "jdbc:mysql://localhost:3306/mydb",
"connection.user": "root",
"connection.password": "password",
"topics": "db-topic"
}
}

12. Real-Life Analogy


Think of Kafka like a post office system:
1. Producer: You (sending letters).
2. Topic: Mailbox (different topics for different letters, e.g., bills, ads).
3. Partition: Sections inside the mailbox.
4. Consumer: Postman reading and delivering the letters.
5. Broker: A specific post office branch.
6. Replication: Backups of letters in case one branch loses them.
Confluent Kafka Concepts and Features
1. Kafka Ecosystem Overview
• Kafka Core: Topics, producers, consumers, brokers, partitions, offsets.
• ZooKeeper (legacy): Manages metadata and leader election (replaced by KRaft in Kafka 2.8+).
• Kafka Connect: Integrates Kafka with external systems.
• Kafka Streams: A library for building real-time stream processing applications.
• KSQL (ksqlDB):
o A SQL-like interface for real-time stream processing.
o Queries Kafka topics for analytics, filtering, and transformations.
• Confluent Control Center: A GUI to monitor and manage Kafka clusters.

2. Confluent Platform-Specific Features


1. Schema Registry:
o Stores and manages Avro, JSON Schema, or Protobuf schemas.
o Enforces schema compatibility (backward, forward, full).
o Enables efficient serialization/deserialization (see the producer configuration sketch after this list).
o Schema Registry APIs:
▪ Register a schema.
▪ Fetch schema by ID or subject.
2. Rest Proxy:
o Allows Kafka operations (produce/consume messages) via REST APIs.
o Ideal for lightweight clients or non-Java applications.
3. Tiered Storage:
o Moves older Kafka log segments to cheaper storage (e.g., S3, GCS).
o Reduces broker disk usage.
o Enables querying of older data without consuming broker resources.
4. RBAC (Role-Based Access Control):
o Granular permissions for producers, consumers, and admins.
o Roles: ClusterAdmin, ResourceOwner, etc.
5. Multi-Region Replication:
o Uses Confluent Replicator or MirrorMaker 2 for geo-replication.
o Ensures high availability and disaster recovery across regions.
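For the Schema Registry item above, a minimal Avro producer configuration sketch (assumes the Confluent kafka-avro-serializer dependency and a Schema Registry at localhost:8081; all names and URLs are illustrative):
java

import java.util.Properties;

public class SchemaRegistrySketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        // The Avro serializer registers/looks up the schema in Schema Registry before producing.
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer");
        props.put("schema.registry.url", "http://localhost:8081");
        // With this config, a KafkaProducer<String, GenericRecord> sends Avro records whose
        // schema compatibility (backward/forward/full) is enforced by the registry.
        System.out.println(props);
    }
}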

3. Kafka Data Model and Formats


1. Serialization/Deserialization:
o Common formats: Avro, JSON, Protobuf, and raw strings.
o Use Confluent Schema Registry for schema evolution and compatibility.
2. Message Structure:
o Headers: Metadata (e.g., tracing info, content type).
o Key: Determines partitioning.
o Value: The payload of the message.
3. Partitioning Strategies:
o Key-based hashing (default).
o Custom partitioners (e.g., round-robin, sticky partitioning).

4. Kafka Connect Advanced Concepts


1. Source and Sink Connectors:
o Examples:
▪ Source: JDBC Source Connector, FileStreamSource.
▪ Sink: JDBC Sink Connector, Elasticsearch Sink.
o Understand how to configure connectors with properties:
▪ tasks.max: Number of parallel tasks.
▪ batch.size: Optimize performance.
▪ Error handling: errors.tolerance, errors.deadletterqueue.topic.name.
2. Distributed Mode:
o Shares workloads across multiple workers for scalability and fault tolerance.
3. Transformations:
o Single Message Transforms (SMTs):
▪ Modify messages (e.g., mask fields, flatten structures).
▪ Examples:
▪ ReplaceField: Include/exclude fields.
▪ ExtractField: Extract nested fields.
4. Debezium:
o A popular connector for Change Data Capture (CDC) from databases (MySQL,
PostgreSQL, MongoDB).

5. Kafka Streams and KSQL (ksqlDB)


1. Kafka Streams:
o A Java library for real-time stream processing.
o Key concepts:
▪ KStream: Represents an unbounded stream of records.
▪ KTable: Represents a changelog or table view of a stream.
▪ Stateless and stateful operations (e.g., filtering, aggregations); a minimal aggregation sketch follows this section.
2. ksqlDB:
o SQL-like syntax for Kafka streams.
o Key operations:
▪ Persistent Queries:
sql

CREATE STREAM filtered_stream AS
SELECT *
FROM original_stream
WHERE order_amount > 100;
▪ Windowed Aggregations:
sql

SELECT COUNT(*) AS total_orders
FROM orders
WINDOW TUMBLING (SIZE 1 HOUR)
GROUP BY customer_id;
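For the Kafka Streams concepts above, a minimal aggregation sketch that counts orders per customer into a KTable (application id, topic names, and serdes are illustrative):
java

import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Produced;

public class OrderCountSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "order-count");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        // KStream: the unbounded stream of order events, keyed by customer id.
        KStream<String, String> orders = builder.stream("orders");
        // KTable: stateful view holding the latest count per customer (backed by a state store).
        KTable<String, Long> ordersPerCustomer = orders.groupByKey().count();
        // Write the changelog of counts to an output topic; values are Longs, so pass a Long serde.
        ordersPerCustomer.toStream().to("orders-per-customer", Produced.with(Serdes.String(), Serdes.Long()));

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
    }
}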

6. Monitoring and Metrics


1. Kafka Metrics:
o Understand key broker metrics:
▪ UnderReplicatedPartitions: Indicates replication issues.
▪ ConsumerLag: The gap between the latest offset in a partition and the consumer's last committed offset.
▪ Disk and network I/O metrics.
2. Confluent Control Center:
o GUI for monitoring:
▪ Consumer lag.
▪ Broker health.
▪ Throughput.
3. JMX Monitoring:
o Exposes metrics via JMX (Java Management Extensions).
o Integrate with tools like Prometheus or Grafana.

7. Security in Kafka
1. Encryption:
o TLS (SSL): Secures communication between Kafka clients and brokers.
o Key configurations:
▪ ssl.keystore.location
▪ ssl.truststore.location
2. Authentication:
o SASL (Simple Authentication and Security Layer):
▪ Mechanisms: PLAIN, SCRAM, GSSAPI (Kerberos).
▪ Configurations for SASL/PLAIN:
properties

sasl.mechanism=PLAIN
security.protocol=SASL_SSL
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="user" password="pass";
3. Authorization:
o ACLs (Access Control Lists):
▪ Define permissions for topics, consumer groups, etc.
▪ Example:
bash

kafka-acls --bootstrap-server localhost:9092 --add --allow-principal User:alice --operation Read --topic my-topic


4. Role-Based Access Control (RBAC):
o Confluent-specific.
o Assign roles to users or services (e.g., ResourceOwner for topic management).

8. Advanced Kafka Topics


1. Rebalancing:
o When partitions are reassigned among consumers in a group.
o Avoid excessive rebalancing with sticky assignors or tuning:
▪ session.timeout.ms
▪ max.poll.interval.ms
2. Compaction:
o Retains the latest record for each key in a topic.
o Use case: Maintaining stateful logs or changelogs (a topic-creation sketch follows this list).
3. Dead Letter Queues (DLQs):
o Stores failed messages for debugging and retries.
o Example configuration:
properties

errors.deadletterqueue.topic.name=my-dlq
errors.deadletterqueue.context.headers.enable=true
4. Idempotence and Exactly-Once Semantics (EOS):
o Ensure messages are not duplicated during retries.
o Idempotent producer:
properties

enable.idempotence=true
o Transactions:
java

producer.initTransactions();
producer.beginTransaction();
producer.send(record);
producer.commitTransaction();
5. Cluster Balancing:
o Tools: Cruise Control for automatic rebalancing and monitoring.
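For the compaction item above, a sketch of creating a compacted topic with the Java AdminClient (topic name and counts are illustrative):
java

import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

public class CompactedTopicSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // cleanup.policy=compact keeps only the latest record per key (a changelog-style topic).
            NewTopic userState = new NewTopic("user-state", 3, (short) 1)
                    .configs(Map.of("cleanup.policy", "compact"));
            admin.createTopics(List.of(userState)).all().get();
        }
    }
}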

9. Multi-Tenancy
• Namespace Management:
o Use prefixes or separate clusters for tenants.
o Example: Tenant-specific topics (tenantA.orders, tenantB.orders).
• Quota Management:
o Limit producer and consumer throughput:
properties
quota.producer.default=1000000
quota.consumer.default=2000000

10. Disaster Recovery and Backup


1. Multi-Region Clusters:
o Use Confluent Replicator or MirrorMaker 2.
o Synchronous or asynchronous replication.
2. Backup Strategies:
o Backup Kafka logs to S3 or HDFS.
o Archive schemas in the Schema Registry.

11. Kafka Optimization


1. Producer Performance:
o Use batching:
properties
linger.ms=10
batch.size=16384
o Compression:
properties
compression.type=snappy
2. Consumer Performance:
o Increase fetch.min.bytes for fewer but larger fetches.
3. Broker Tuning:
o Log segment size:
properties

log.segment.bytes=1073741824
o Retention settings:
properties

log.retention.hours=168
Practice Questions with Explanations
1. Kafka Topics and Partitions
Question 1:
You have a topic user-logins with 5 partitions and a replication factor of 3. What happens if one broker goes
offline?
• A) All partitions become unavailable.
• B) All partitions will have reduced replication.
• C) Only some partitions will become unavailable.
• D) The cluster will stop accepting writes.
Answer:
• B) All partitions will have reduced replication.
Explanation: If one broker goes offline, the partitions hosted on that broker will still be available if
their replicas are on other brokers. However, the replication factor will temporarily decrease until the
broker is restored or replicas are reassigned.

2. Schema Registry
Question 2:
You register a schema for a topic using Schema Registry. What happens if a producer sends a message that
does not match the schema?
• A) The message is rejected.
• B) The message is accepted but logged as a warning.
• C) The consumer fails to deserialize the message.
• D) The message is dropped silently.
Answer:
• A) The message is rejected.
Explanation: Schema Registry enforces schema validation during message production. If the data
does not conform to the schema, it is rejected before being written to the topic.

Question 3:
Which compatibility modes does Confluent Schema Registry support?
• A) Backward, Forward, Full.
• B) Strict, Relaxed, None.
• C) Additive, Non-Additive, Full.
• D) None of the above.
Answer:
• A) Backward, Forward, Full.
Explanation: Schema Registry supports compatibility modes to ensure smooth schema evolution.
Examples:
o Backward Compatibility: Consumers using older schemas can read new data.
o Forward Compatibility: Consumers using newer schemas can read older data.
o Full Compatibility: Both backward and forward compatibility are maintained.

3. Kafka Connect
Question 4:
You configure a Kafka Connect source connector for a database. However, you notice duplicate messages in
the Kafka topic. What is the likely cause?
• A) The connector tasks are set too high.
• B) The source database has duplicate records.
• C) Offset management is not properly configured.
• D) The topic has too many partitions.
Answer:
• C) Offset management is not properly configured.
Explanation: Kafka Connect tracks offsets to ensure exactly-once delivery. If offset management fails
(e.g., connector restarts or misconfiguration), duplicate messages can occur.

Question 5:
What property determines the maximum number of parallel tasks in Kafka Connect?
• A) tasks.parallel
• B) connect.tasks
• C) tasks.max
• D) max.tasks
Answer:
• C) tasks.max.
Explanation: tasks.max defines the number of parallel tasks that a connector can spawn, enabling
parallelism and scalability.

4. Kafka Streams
Question 6:
You are using Kafka Streams to aggregate order totals per user. Which Kafka Streams concept should you
use to store the state of the aggregation?
• A) KStream
• B) GlobalKTable
• C) KTable
• D) Processor API
Answer:
• C) KTable.
Explanation: A KTable is used for aggregations and maintains the latest state of a keyed record. In
this case, it stores the total order amounts per user.

Question 7:
Write a Kafka Streams application to filter out transactions below $100 and write valid transactions to a new
topic.
Answer (Java):
java

import java.util.Properties;
import com.google.gson.JsonObject;
import com.google.gson.JsonParser;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "transaction-filter");
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

StreamsBuilder builder = new StreamsBuilder();

KStream<String, String> transactions = builder.stream("transactions");
// Keep only transactions of $100 or more (values are JSON strings parsed with Gson).
KStream<String, String> validTransactions = transactions.filter((key, value) -> {
    JsonObject transaction = JsonParser.parseString(value).getAsJsonObject();
    return transaction.get("amount").getAsDouble() >= 100;
});
validTransactions.to("valid-transactions");

KafkaStreams streams = new KafkaStreams(builder.build(), props);
streams.start();

5. Monitoring and Troubleshooting


Question 8:
What metric indicates that a consumer is not keeping up with the producer?
• A) UnderReplicatedPartitions
• B) BytesOutPerSec
• C) ConsumerLag
• D) ActiveConsumerGroups
Answer:
• C) ConsumerLag.
Explanation: Consumer lag measures the difference between the latest offset in the partition and the last
committed offset by the consumer.
Question 9:
You observe high UnderReplicatedPartitions. What should you check first?
• A) Network bandwidth between brokers.
• B) Consumer group lag.
• C) Number of partitions in the topic.
• D) Kafka topic configuration.
Answer:
• A) Network bandwidth between brokers.
Explanation: UnderReplicatedPartitions occurs when followers cannot keep up with the leader.
This is often caused by network bottlenecks or disk I/O issues.

6. Security
Question 10:
You are setting up TLS encryption for Kafka. Which configuration is required on the broker?
• A) ssl.enabled=true
• B) security.protocol=SSL
• C) ssl.keystore.location
• D) Both B and C.
Answer:
• D) Both B and C.
Explanation: For TLS encryption, you must specify the security protocol as SSL and provide the keystore
and truststore locations.

7. Advanced Kafka Features


Question 11:
Explain how exactly-once semantics (EOS) works in Kafka.
Answer:
• Kafka ensures EOS by combining:
1. Idempotent Producers: Prevent duplicate messages during retries.
▪ Config: enable.idempotence=true.
2. Transactional Producers: Group messages into atomic transactions.
▪ Config: transactional.id.
3. Consumer Offsets in Transactions: Commit offsets atomically with the messages.
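A minimal sketch of an idempotent, transactional producer (the transactional.id, topic, and payload are illustrative):
java

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class EosSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("enable.idempotence", "true");        // no duplicates on producer retries
        props.put("transactional.id", "payments-tx-1"); // enables transactions for this producer

        KafkaProducer<String, String> producer = new KafkaProducer<>(props);
        producer.initTransactions();
        try {
            producer.beginTransaction();
            producer.send(new ProducerRecord<>("transactions", "tx-101", "{ \"status\": \"success\" }"));
            producer.commitTransaction(); // read_committed consumers see the batch atomically
        } catch (Exception e) {
            producer.abortTransaction();  // roll back on failure
        } finally {
            producer.close();
        }
    }
}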

Question 12:
You need to replicate data between two Kafka clusters in different regions. Which tool should you use?
• A) Kafka Streams
• B) Confluent Replicator
• C) MirrorMaker 2
• D) Both B and C.
Answer:
• D) Both B and C.
Explanation:
• Confluent Replicator: Ideal for Confluent Platform users with schema registry support.
• MirrorMaker 2: Open-source solution for geo-replication.

Hands-On Lab Questions


1. Kafka Connect:
o Create a Kafka source connector to read from a PostgreSQL database.
o Ensure exactly-once semantics and configure a dead-letter queue.
2. Kafka Streams:
o Build a stream processing application that reads from a topic, aggregates sales per product,
and writes the result to a new topic.
3. Monitoring:
o Use JMX to monitor Kafka broker metrics and identify bottlenecks.
4. Security:
o Configure a Kafka broker to use SASL/PLAIN authentication and authorize a user to read a
specific topic.

Mock Scenario
Scenario:
You are managing a Kafka cluster with the following requirements:
1. Messages must be replicated across 3 brokers with fault tolerance.
2. Consumer lag should be minimized.
3. Integrate with a database for CDC (Change Data Capture).
Questions:
1. What replication factor should you configure for the topic?
o Answer: 3.
2. How do you monitor and reduce consumer lag?
o Answer: Use the Consumer Lag metric, and ensure consumers are properly distributed across
partitions.
3. Which Kafka Connect plugin is ideal for CDC?
o Answer: Debezium.
