
Lab: Qlik Replicate for MySQL Database to Kafka

TABLE OF CONTENTS
Introduction
Prerequisites
Accessing Replicate Environment
MySQL Source Configuration
Kafka Target Configuration
Configure Replication Task
Run Task
Summary
Tips & Tricks
    Kafka JSON Format
    Attunity Kafka Messages
        Metadata Message
        Data Message
Transaction Processing
    How to configure the Attunity Kafka endpoint and what needs to be done in the consuming application
    How does it work then?
    What does it mean to the consumer, if transaction consistency is important?
Introduction
The objective of this lab is to give participants hands-on experience with Qlik Replicate's ability to migrate
data from a MySQL database to the Kafka messaging platform in real time. By the end of this lab,
participants should be able to perform a data migration from a database to Kafka.

Prerequisites
• A stable network connection is required while executing the lab.



Accessing Replicate Environment
Steps
1. Ensure that the URL for the Replicate Server is available, and access granted.
• This will be provided by your Systems Administrator.

2. Open a browser and enter the URL of the Replication Server.

The format is: https://replicate.attunitydemo.com:3552/attunityreplicate/7.0.0.267/#!/tasks

• This will prompt you to log into Replicate.

3. Enter your Username and Password.


Credentials will be provided by your System Administrator.
• This will take you to the Replicate Console.

MySQL Source Configuration
The first thing we need to do is create a source endpoint. We do this by clicking the Manage Endpoint
Connections button at the top of the screen.

4. Select Manage Endpoint Connections.


The following window will appear:

We will now create the MySQL Endpoint.

5. Select + New Endpoint Connection.

6. Enter a meaningful Endpoint Name and Description for the Endpoint Connector.

7. With the Source radio button selected, click the dropdown arrow and select MySQL.

You will notice as we proceed that the content of the configuration window is context sensitive.

8. Fill in the MySQL server details and database credentials as provided
by your Systems Administrator.

Server:
Port:
User:
Password:
Security/SSL Mode:

9. Select Test Connection.
This verifies that your configuration is correct.

Look for the “Test Connection succeeded” message. Any other message means something may be
incorrect with your Server/Database definitions, or the Server/Database is unavailable.

This is how the configuration will appear:

10. Select Save.


The configuration of your MySQL source endpoint is complete.

11. Click Close.

Kafka Target Configuration
Next, we need to configure our Kafka target endpoint. The process is much the same as for the MySQL source
endpoint, and once again you will note that the configuration is context sensitive as we move along.
As before, the first step is to tell Replicate that we want to create a new endpoint.
1. In the Replicate Console, Select Manage Endpoint Connections.

2. Select + New Endpoint Connection.

3. Enter a meaningful Endpoint Name and Description for the Endpoint Connector.
Replace the text “New Endpoint Connection 1” with something more descriptive like Confluent Kafka
JSON Target, make sure the Target radio button is selected, and then select Kafka from the dropdown
selection box.

4. Ensure the Target radio button is selected under Role.

5. Select Data Message Publishing.

6. Select Metadata Message Publishing.

The settings should be as indicated in the images above:

• Broker servers: kafka.attunitydemo.com:9092

• Security/Use SSL: NOT checked

• Security/Authentication: None

• Message Properties/Format: JSON

• Message Properties/Compression: None

• Data Message Publishing/Separate Topics for each table

• Data Message Publishing/Partition strategy: By message key

• Data Message Publishing/Message key: Primary key columns

• Metadata Message Publishing/Publish: Do not publish metadata messages

• Metadata Message Publishing/Wrap data…: NOT checked
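Independently of Replicate's own Test Connection, you can sanity-check that the broker is reachable. Below is a minimal sketch in Python, assuming the kafka-python package is installed on a machine that can reach the broker; the address matches the lab settings above.

# Quick connectivity check: list the topics visible on the lab broker.
from kafka import KafkaConsumer

consumer = KafkaConsumer(bootstrap_servers="kafka.attunitydemo.com:9092")
print(consumer.topics())  # returns the set of topic names on the broker
consumer.close()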

7. Select Test Connection.

Again, look for the “Test Connection succeeded” message. Any other message means something may be
incorrect with your Server/Database definitions, or the Server/Database is unavailable.

Your screen should look like the following, indicating that your connection succeeded:

8. Select Save.

9. Select Close.

Configure Replication Task
Now that we have configured our MySQL source and Kafka target endpoints, we need to tie them together in what we
call a Replicate task. In short, a task defines the following:

• A source endpoint

• A target endpoint

• The list of tables that we want to capture.

• Any transformations we want to make on the data.

We start by doing the following:

1. In the Replicate Console, Select + New Task to create a New Task.


The following window will appear:

2. Enter a meaningful Task Name.

Give this task a meaningful name like MySQL to Kafka. For this task we will take the defaults:
Name: MySQL to Kafka

3. Select the Unidirectional radio button.


• This indicates data flows from Source to Target – with no writebacks to source system.

4. Ensure that Full Load and Apply Changes are highlighted (enabled).

You should have the following filled in:

• Unidirectional
• Full Load: enabled (Blue highlight is enabled; Select to enable / disable.)
• Apply Changes: enabled (Blue highlight is enabled; Select to enable / disable.)
• Store Changes: disabled (Blue highlight is enabled; Select to enable / disable.)

5. Select OK.
• This closes the New Task dialog box.

Once completed, the following window will appear:

Qlik Replicate is all about ease of use. The interface is point-and-click, drag-and-drop. To configure our task,
we need to select a source endpoint (MySQL) and a target endpoint (Kafka). You can either drag the MySQL
Source endpoint from the box on the left of the screen and drop it into the circle that says Drop source
endpoint here, or you can click on the arrow that appears just to the right of the endpoint when you highlight
it.

1. On the left of the Replicate Console panel, Select Source.

2. Locate the Source Endpoint created above or one which meets your Source definitions.

3. Drag and drop to Source Endpoint on the right, as indicated in diagram.

4. On the left of the Replicate Console panel, Select Target.

5. Locate the Target Endpoint created above or one which meets your Target definitions.

6. Drag and drop to Target Endpoint on the right, as indicated in diagram.

7. Our next step is to select the tables we want to replicate from MySQL into Kafka. Click on the Table
Selection… button in the top center of your browser.

8. Select Northwind Schema.

9. Enter % in Table.
10. Press the Search button.
This will retrieve a list of all the tables in the Northwind schema.
Note: entering % is not strictly required. By default, Qlik Replicate will search for all tables (%) if you do
not limit the search.

11. Select a few tables:
• Northwind.customers
• Northwind.Employees

Select each table from the Results list and press the > button to move them into the Selected
Tables list. Note that multi-select is enabled. You can select the tables all at once or move them
individually.

At this point we could define transformations on the selected tables if we wanted, but we will keep it simple
for this part of the lab and move the data as is.

12. Select OK .

That completes configuration of the task. We are now ready to save our task and run it.

13. Press Save.

14. Select Run.

Run Task
When you press Run, Replicate will automatically switch from Designer mode to Monitor mode. You will be
able to watch the status of the full load as it occurs, and then switch to monitoring change data capture as
well.

1. Select the down arrow beside Run.

2. Select Start Processing.
• If this is not the first time this task has been used to extract data, Reload Target must be used
instead – Start Processing will not be an option.

3. Select Yes to confirm reloading the target.

After the Full Load is complete, click the Completed bar to display the tables. DML activity is running in
the background.

4. Select the Change Processing tab to see it in action.
Note: Changes to the tables occur somewhat randomly in the background. You may need to wait a few
minutes before you will see changes appear in the tables that we selected.

If you would like to see some of the messages we are delivering to Kafka, click on the following link:

http://kafka.attunitydemo.com:8000/#/

This link will open another window that displays the messages we deliver to the
Northwind.customers and Northwind.Employees topics using the Kafka “Console Consumer”. As you
look at the messages, you will notice that they are in JSON format, just as we configured in the
Replicate Kafka target endpoint.
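If you prefer to inspect the topics programmatically rather than through the console consumer page, here is a minimal sketch, assuming Python with the kafka-python package installed; the topic name follows the per-table topics configured earlier.

import json
from kafka import KafkaConsumer

# Consume the lab's JSON change messages for the customers table.
consumer = KafkaConsumer(
    "Northwind.customers",
    bootstrap_servers="kafka.attunitydemo.com:9092",
    auto_offset_reset="earliest",  # start from the beginning of the topic
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)
for msg in consumer:
    record = msg.value
    print(record.get("op"), record.get("data"))  # operation type and row data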

Data in JSON format in the Northwind.customers topic.

Data in tabular format.

Data in the Raw Format.

When you have seen enough, you can declare Victory! for this part of the lab.

5. Press Stop in the top left corner of the Replicate console to end the task.
6. Click Yes in the confirmation dialog.
7. Close the MySQL to Kafka tab or click the TASKS tab to return to the main window.

Summary
In this lab, we have:

• Defined access and authentication into a MySQL source and a Kafka target.

• Defined the source tables from which you want to create Kafka messages.

• Configured the MySQL to Kafka task.

• Captured initial data from the source while maintaining business continuity (DML activity was
going on in the background to simulate users working on the source database).

• Automatically created Kafka messages from the initial table state.

• Captured all new transactions which were happening while the initial load was running.

• Turned all net new data into Kafka messages.

• Observed change data being recorded as it is sent to and applied at the target.

Tips & Tricks
Kafka JSON Format
Replicate produces UTF-8 encoded JSON messages to Kafka.
Each message contains header fields (e.g. task name, source table/schema, timestamp) and the record data (column names and
values).
In case of an UPDATE operation, the previous values of the record are also included.

Below is the JSON structure:

{
  "task_name": "task-name",
  "table": "table-name",
  "schema": "schema-name",
  "op": "operation-type",
  "stream_position": "position",
  "transaction_id": "transaction-id",
  "change_seq": "change-seq",
  "ts": "change-timestamp",
  "data": [{"col1": "val1"}, {"col2": "val2"}, ..., {"colN": "valN"}],
  "bu_data": [{"col1": "val1"}, {"col2": "val2"}, ..., {"colN": "valN"}]
}

Operation types:
F = Full Load
I = Insert
U = Update
D = Delete
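
A consumer typically dispatches on this operation type. Here is a minimal sketch, assuming each message has already been parsed into a Python dict with the structure shown above; apply_row, update_row, and delete_row are hypothetical placeholders for your own logic.

def handle_change(record):
    op = record["op"]
    if op in ("F", "I"):   # Full Load or Insert: apply the new row
        apply_row(record["data"])
    elif op == "U":        # Update: the before-image arrives in bu_data
        update_row(before=record.get("bu_data"), after=record["data"])
    elif op == "D":        # Delete: data identifies the deleted row
        delete_row(record["data"])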

For example, here is a JSON message Replicate produced for a Full Load record:

{
  "task_name": "SQL_to_Kafka",
  "table": "DataTypes",
  "schema": "dbo",
  "op": "F",
  "data": [
    {"id": "2"},
    {"StrCol": "aaa"},
    {"IntCol": "111"},
    {"RealCol": "+1.23450005e+00"},
    {"BoolCol": "1"},
    {"DateTimeCol": "2016-07-25 16:13:39.843"}
  ]
}

Attunity Kafka Messages

Message Envelope

magic (Fixed String): Attunity message identifier. The value is "atMSG". Used to distinguish Attunity messages from other messages.

type (String): The type of the embedded message: "MD" for metadata, "DT" for data.

headers (Map): Reserved for future use. The current value is null.

messageSchemaId (String): The unique identifier of the message schema. In Metadata messages, this is the ID of the message. In Data messages, this is the ID of the related Metadata message.

messageSchema (String): The Avro schema for deserializing an embedded Avro message. Used in Metadata (MD) messages.

message (Bytes): The embedded message, either Metadata or Data.
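
A consumer of enveloped messages might route on these fields as in the sketch below, assuming the envelope has already been parsed into a Python dict; handle_metadata and handle_data are hypothetical handlers.

def route_envelope(envelope):
    # Only process genuine Attunity messages.
    if envelope["magic"] != "atMSG":
        return
    if envelope["type"] == "MD":
        handle_metadata(envelope["message"])  # metadata payload (see next table)
    elif envelope["type"] == "DT":
        handle_data(envelope["message"],
                    schema_id=envelope["messageSchemaId"])  # ties data to its metadata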

Metadata Message

schemaId (String): The unique identifier of the Avro schema.

lineage (Structure): Information about the origin of the data (Replicate server, task, table, and so on).

server (String): The name of the Replicate server.

task (String): The name of the task.

schema (String): The name of the database schema.

table (String): The name of the table.

tableVersion (Integer): Replicate maintains a version number for the structure of each source table. Upon a DDL change on the source, the version is increased and a new metadata message is produced.

timestamp (String): The date and time of the metadata message.

tableStructure (Structure): Describes the structure of the table.

tableColumns (Structure): Contains the list of columns and their properties.

{columns} (Structure): For each column, a record with the properties below.

ordinal (Integer): The position of the column in the record.

type (String): The column data type. See the Replicate documentation for the list of data types.

length (Integer): The maximum size of the data (in bytes) permitted for the column.

precision (Integer): For the NUMERIC data type, the maximum number of digits required to represent the value.

scale (Integer): For the NUMERIC data type, the maximum number of digits to the right of the decimal point permitted for a number.

primaryKeyPosition (Integer): The position of the column in the table's Primary Key or Unique Index. The value is zero if the column is not part of the table's Primary Key.

dataSchema (String): The Avro schema for deserializing the Data messages.
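
For illustration, a consumer could walk a decoded metadata message to print the table structure. This is a sketch only: it assumes the message has been deserialized into a plain Python dict, and the exact nesting of the fields above depends on the Avro schema in your Replicate version.

def describe_table(md):
    lineage = md["lineage"]
    print(f"{lineage['schema']}.{lineage['table']} version {lineage['tableVersion']}")
    for name, col in md["tableStructure"]["tableColumns"].items():
        pk = " (PK)" if col["primaryKeyPosition"] > 0 else ""
        print(f"  {col['ordinal']:>3} {name}: {col['type']}{pk}")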

Data Message

headers (Structure): Information about the current record.

operation (Enum): The operation type.

Full Load (Replicate transfers the existing records from the source table):
- REFRESH: insert of a record during the Full Load stage

CDC (Replicate transfers the changes from the source table):
- INSERT: insertion of a new record
- UPDATE: update of an existing record
- DELETE: deletion of a record
changeSequence (String): A monotonically increasing change sequencer that is common to all change tables of a task. Use this field to order the records in chronological order. Applicable to CDC operations.

timestamp (String): The original change UTC timestamp. Applicable to CDC operations.

streamPosition (String): The source CDC stream position. Applicable to CDC operations.

transactionId (String): The ID of the transaction that the change record belongs to. Use this field to gather all changes of a specific transaction. Applicable to CDC operations.

data (Structure): The data of the table record.

{columns}: The column names and values in the current record.

beforeData (Structure): The data of the table record, before the change.

{columns}: The column names and values, before the change. Applicable to the UPDATE operation.

Transaction Processing
How to configure the Attunity Kafka endpoint and what needs to be done in the
consuming application
When configuring the Attunity Replicate Kafka endpoint, the user can configure various settings that affect where
messages are published within the Kafka infrastructure (topics/partitions).

During the CDC stage, whenever committed changes are detected by the source endpoint, these changes are grouped
by transaction, sorted internally by the time (thus the order) they happened and then propagated to the target
endpoint which can handle them in various ways like applying the changes or storing the changes in dedicated
change tables.

Each CDC message has both a transaction ID and a change sequence. The change sequence is a monotonically
growing number, so sorting events by change sequence always achieves chronological order, while grouping the
sorted events by transaction ID yields transactions containing their matching, chronologically sorted changes.

Since Kafka is a messaging infrastructure in which applying changes is not feasible, and the concept of storing
change tables is not meaningful either, the Replicate Kafka endpoint takes a different approach: it reports all
transactional events as messages.

How does it work then?
Each change in the source system is translated to a data message containing the details of the change including the
transaction ID and change sequence in the source. The data message also includes the changed columns before and
after the change. As explained above, the order in which the Kafka target writes the messages is the order of the
changes within each transaction.

Once a data message is ready to be sent to Kafka, the topic and partition it should go to are determined by analyzing
the endpoint settings as well as potentially transformation settings. For example, the user might decide to configure
the endpoint in a way that every table is sent to a different topic and might calculate the partition to be based on a
random strategy meaning each message (within the same table, to follow the example) will go to a different partition.

What does it mean to the consumer, if transaction consistency is important?


If maintaining transaction consistency is important for the consumer implementation, it means that although the
transaction ID does exist in all data messages, the challenge is gathering them in a way that would make identifying a
whole transaction possible. An additional challenge is getting the transactions in the original order in which they
were committed. This could be a challenge if transactions are spread across multiple topics and partitions.

The simplest way of achieving the above goal is to direct Replicate to a specific topic and a specific partition. This
means all data messages will go through one partition, thus guaranteeing ordered delivery both of transactions and
of changes within a transaction. The consuming application can consume messages, accumulating a transaction in
some intermediate memory buffer and when a new transaction ID is spotted, the previous transaction could be
processed as complete.
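
Here is a minimal sketch of that buffering logic, assuming messages arrive in commit order from the single partition and have been parsed into dicts; process_transaction is a hypothetical handler.

def consume_transactions(messages):
    current_id, buffer = None, []
    for record in messages:                        # parsed JSON dicts, in partition order
        tx_id = record["transaction_id"]
        if current_id is not None and tx_id != current_id:
            process_transaction(current_id, buffer)  # previous transaction is complete
            buffer = []
        current_id = tx_id
        buffer.append(record)
    if buffer:
        process_transaction(current_id, buffer)    # flush the last open transaction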

Although the simple way may work, it is not highly efficient at the task level, because all messages go to the same
topic and partition and do not necessarily utilize the full parallelism of the Kafka cluster. One might consider this a
non-issue if there are multiple tasks, each taking advantage of a different topic/partition, and the collection of those
tasks may well utilize the cluster optimally.

The more generic way, where data may be spread over multiple topics and partitions, means that some intermediate
buffer - such as memory, a table in a relational database, or even other Kafka topics - can be used to collect
information about transactions. At a given time interval (every few minutes/hours), the events collected from
Replicate's Kafka output would be sorted by change sequence and then grouped by transaction ID. This way,
transactions can be rebuilt.
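
Below is a sketch of that sort-and-group step, assuming the buffered events are parsed JSON dicts and that change_seq values compare correctly as strings (e.g. they are fixed-width).

from collections import defaultdict

def rebuild_transactions(buffered_events):
    # Sort all buffered events into overall chronological order first.
    ordered = sorted(buffered_events, key=lambda r: r["change_seq"])
    transactions = defaultdict(list)
    for record in ordered:
        transactions[record["transaction_id"]].append(record)
    return transactions  # each value holds one transaction's changes, in order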

About Qlik
Qlik’s vision is a data-literate world, where everyone can use data and analytics to improve decision-making and solve their most challenging
problems. Qlik provides an end-to-end, real-time data integration and analytics cloud platform to close the gaps between data, insights and action.
By transforming data into active intelligence, businesses can drive better decisions, improve revenue and profitability, and optimize customer
relationships. Qlik does business in more than 100 countries and serves over 50,000 customers around the world.
qlik.com
© 2021 QlikTech International AB. All rights reserved. All company and/or product names may be trade names, trademarks and/or registered trademarks of the respective owners with which they are associated.
