Lab - Qlik Replicate Database With Kafka
Prerequisites
• A reliable network connection is required while executing the lab.
MySQL Source Configuration
The first thing we need to do is create a source endpoint. We do this by clicking the Manage Endpoint
Connections button at the top of the screen.
5. Select + New Endpoint Connection.
6. Enter a meaningful Endpoint Name and Description for the Endpoint Connector.
You will notice as we proceed that the content of the configuration window is context-sensitive.
8. Fill in the MySQL Server and Database credentials as provided by your Systems Administrator (an optional connectivity check follows the list below):
Server:
Port:
User:
Password:
Security/SSL Mode:
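If you want to sanity-check these credentials outside of Replicate before continuing, a minimal Python sketch follows. It assumes the mysql-connector-python package; the host, port, user, and password values are placeholders for those provided by your Systems Administrator.

import mysql.connector

conn = mysql.connector.connect(
    host="mysql.example.com",   # placeholder Server value
    port=3306,                  # placeholder Port value
    user="replicate_user",      # placeholder User value
    password="changeme",        # placeholder Password value
)
cur = conn.cursor()
cur.execute("SELECT VERSION()")
print("Connected to MySQL", cur.fetchone()[0])
conn.close()

If this script prints a version string, the credentials are valid and Test Connection in the next step should succeed as well.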
9. Select Test Connection.
This verifies that your configuration is correct.
Look for the “Test Connection succeeded” message. Any other message means something may be
incorrect with your Server/Database definitions, or the Server/Database is unavailable.
Kafka Target Configuration
Next, we need to configure our Kafka target endpoint. The process is much the same as it was for the MySQL source endpoint, and once again you will note that the configuration process is context-sensitive as we move along.
As before, the first step in the configuration process is to tell Replicate that we want to create a new endpoint.
1. In the Replicate Console, select Manage Endpoint Connections.
2. Select + New Endpoint Connection.
3. Enter a meaningful Endpoint Name and Description for the Endpoint Connector.
Replace the text “New Endpoint Connection 1” with something more descriptive like Confluent Kafka
JSON Target, make sure the Target radio button is selected, and then select Kafka from the dropdown
selection box.
5. Select Data Message Publishing.
6. Select Metadata Message Publishing.
• Security/Authentication: None
7. Select Test Connection.
Again, look for the “Test Connection succeeded” message. Any other message means something may be
incorrect with your Server/Database definitions, or the Server/Database is unavailable.
Your screen should look like the following, indicating that your connection succeeded:
8. Select Save.
9. Select Close.
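As an optional check outside of Replicate, you can confirm the broker is reachable and see which topics it exposes with a short Python sketch. It assumes the kafka-python package; the bootstrap server address is a placeholder for your environment.

from kafka import KafkaConsumer

# Connect to the broker and list the topics visible to this client.
consumer = KafkaConsumer(bootstrap_servers="kafka.example.com:9092")
print(sorted(consumer.topics()))
consumer.close()

After the replication task runs, the topics created for the replicated tables should appear in this list.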
Configure Replication Task
Now that we have configured our MySQL source and Kafka target endpoints, we need to tie them together in what we
call a Replicate task. In short, a task defines the following:
• A source endpoint
• A target endpoint
Give this task a meaningful name like MySQL to Kafka. For this task we will take the defaults:
Name: MySQL to Kafka
• Unidirectional
• Full Load: enabled (Blue highlight is enabled; Select to enable / disable.)
• Apply Changes: enabled (Blue highlight is enabled; Select to enable / disable.)
• Store Changes: disabled (Blue highlight is enabled; Select to enable / disable.)
5. Select OK.
• This closes the New Task dialog box.
Qlik Replicate is all about ease of use. The interface is point-and-click, drag-and-drop. To configure our task,
we need to select a source endpoint (MySQL) and a target endpoint (Kafka). You can either drag the MySQL
Source endpoint from the box on the left of the screen and drop it into the circle that says Drop source
endpoint here, or you can click on the arrow that appears just to the right of the endpoint when you highlight
it.
1. On the left of the Replicate Console panel, Select Source.
2. Locate the Source Endpoint created above or one which meets your Source definitions.
5. Locate the Target Endpoint created above or one which meets your Target definitions.
6. Drag and drop the Target Endpoint on the right, as indicated in the diagram.
7. Our next step is to select the tables we want to replicate from MySQL into Kafka. Click on the Table
Selection… button in the top center of your browser.
8. Select Northwind Schema.
9. Enter % in Table.
10. Press the Search button.
This will retrieve a list of all the tables in the Northwind schema.
Note: entering % is not strictly required. By default, Qlik Replicate will search for all tables (%) if you do
not limit the search.
11. Select a few tables:
• Northwind.customers
• Northwind.Employees
Select each table from the Results list and press the > button to move them into the Selected
Tables list. Note that multi-select is enabled. You can select the tables all at once or move them
individually.
At this point we could define transformations on the selected tables if we wanted, but we will keep it simple
for this part of the lab and move the data as is instead.
12. Select OK.
That completes configuration of the task. We are now ready to save our task and run it.
13. Press Save.
Run Task
When you press Run, Replicate will automatically switch from Designer mode to Monitor mode. You will be
able to watch the status of the full load as it occurs, and then switch to monitoring change data capture as
well.
2. Select Start Processing.
• If this is not the first time this Task has been used to extract data, Reload Target must be used – Start Processing will not be an option.
After Full Load is complete, select the Completed bar to display the tables. There is DML activity running in
the background.
4. Select the Change Processing tab to see it in action.
Note: Changes to the tables occur somewhat randomly in the background. You may need to wait a few
minutes before you will see changes appear in the tables that we selected.
If you would like to see some of the messages we are delivering to Kafka, click on the following link:
http://kafka.attunitydemo.com:8000/#/
This link will open another window that displays the messages we deliver to the
Northwind.customers and Northwind.Employees topics using the Kafka “Console Consumer”. As you
look at the messages, you will notice that they are in JSON format, just as we configured in the
Replicate Kafka target endpoint.
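If you prefer to inspect the messages yourself rather than through the demo page, a minimal consumer sketch in the spirit of the Console Consumer looks like this. It assumes the kafka-python package; the broker address is a placeholder, and the topic name follows the lab's table selection.

import json
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "Northwind.customers",
    bootstrap_servers="kafka.example.com:9092",
    auto_offset_reset="earliest",   # read from the beginning of the topic
)
for msg in consumer:
    record = json.loads(msg.value)  # each Replicate message is a JSON document
    print(record.get("op"), record.get("table"), record.get("data"))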
Data in the Raw Format.
When you have seen enough, you can declare Victory! for this part of the lab.
5. Press Stop in the top left corner of the Replicate console to end the task.
6. Click Yes in the confirmation dialog.
7. Close the MySQL to Kafka tab or click on the TASKS tab to return to the main window.
Summary
In this lab, we have:
• Defined access and authentication into a MySQL source and a Kafka target.
• Defined the source tables from which you want to create Kafka messages.
• Captured initial data from the source while maintaining business continuity (DML activity was going on in the background to simulate users working on the source database).
• Captured all new transactions which were happening while the initial load was running.
• Observed change data being recorded as it is sent to and applied at the target.
TIPS & Tricks
Kafka JSON Format
Replicate produces UTF-8 encoded JSON messages to Kafka.
Each message contains header fields (e.g. task name, source table/schema, timestamp) and the record data (column names and
values).
In the case of an UPDATE operation, the previous values of the record are also included.
{
    "task_name": "task-name",
    "table": "table-name",
    "schema": "schema-name",
    "op": "operation-type",
    "stream_position": "position",
    "transaction_id": "transaction-id",
    "change_seq": "change-seq",
    "ts": "change-timestamp",
    "data": [{"col1": "val1"}, {"col2": "val2"}, …, {"colN": "valN"}],
    "bu_data": [{"col1": "val1"}, {"col2": "val2"}, …, {"colN": "valN"}]
}
Operation types:
F = Full Load
I = Insert
U = Update
D = Delete
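To make the operation types concrete, here is a hedged Python sketch of a consumer routine that routes a parsed message by its "op" field; the upsert, apply_update, and delete functions are hypothetical sink callbacks you would supply.

import json

def handle(raw_message, upsert, apply_update, delete):
    msg = json.loads(raw_message)
    op = msg["op"]
    if op in ("F", "I"):      # Full Load and Insert both carry new row data
        upsert(msg["data"])
    elif op == "U":           # Update: "bu_data" holds the before-image
        apply_update(before=msg["bu_data"], after=msg["data"])
    elif op == "D":           # Delete: "data" identifies the removed row
        delete(msg["data"])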
Example of a full load data message:
{
    "task_name": "SQL_to_Kafka",
    "table": "DataTypes",
    "schema": "dbo",
    "op": "F",
    "data": [
        {"id": "2"},
        {"StrCol": "aaa"},
        {"IntCol": "111"},
        {"RealCol": "+1.23450005e+00"},
        {"BoolCol": "1"},
        {"DateTimeCol": "2016-07-25 16:13:39.843"}
    ]
}
Attunity Kafka Messages
Message Envelope
Field           Type     Description
messageSchema   String   The Avro schema for deserializing an embedded Avro message.
Metadata Message
Field                Type       Description
timestamp            String     The date and time of the metadata message.
{columns}            Structure  For each column, a record with the properties below.
type                 String     The column data type. See the Replicate documentation for the list of data types.
length               Integer    The maximum size of the data (in bytes) permitted for the column.
precision            Integer    For the NUMERIC data type, the maximum number of digits required to represent the value.
scale                Integer    For the NUMERIC data type, the maximum number of digits permitted to the right of the decimal point.
primaryKeyPosition   Integer    The position of the column in the table's Primary Key or Unique Index. The value is zero if the column is not part of the table's Primary Key.
dataSchema           String     The Avro schema for deserializing the Data messages.
Data Message
Field            Type       Description
changeSequence   String     A monotonically increasing change sequencer that is common to all change tables of a task.
transactionId    String     The ID of the transaction that the change record belongs to.
beforeData       Structure  The data of the table record, before the change.
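Putting the envelope fields together: a metadata message carries the dataSchema used to deserialize the Avro-encoded data messages that follow it. Below is a hedged sketch of that pairing, assuming the fastavro package and that the metadata message has already been decoded into a dict and the data payload extracted from the envelope; see the Replicate documentation for the exact envelope framing.

import io
import json
import fastavro

def decode_data_message(metadata, data_payload):
    # 'metadata' is the already-decoded metadata message (a dict); its
    # dataSchema field holds the Avro schema for the data messages.
    schema = json.loads(metadata["dataSchema"])
    return fastavro.schemaless_reader(io.BytesIO(data_payload), schema)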
Transaction processing – how to configure the Attunity Kafka endpoint and what needs to be done in the consuming application
When configuring the Attunity Replicate Kafka endpoint, the user can configure various settings that affect where
messages are published within the Kafka infrastructure (topics/partitions).
During the CDC stage, whenever committed changes are detected by the source endpoint, these changes are grouped
by transaction, sorted internally by the time (thus the order) they happened and then propagated to the target
endpoint which can handle them in various ways like applying the changes or storing the changes in dedicated
change tables.
Each CDC message has both a transaction ID and a change sequence. The change sequence is a monotonically
increasing number, so sorting events by change sequence always achieves chronological order, while grouping the
sorted events by transaction ID yields transactions containing their changes in chronological order.
Since Kafka is a messaging infrastructure in which applying changes is not feasible and storing change tables is
not meaningful, the Replicate Kafka endpoint takes a different approach: it reports all transactional events as
messages.
How does it work then?
Each change in the source system is translated to a data message containing the details of the change including the
transaction ID and change sequence in the source. The data message also includes the changed columns before and
after the change. As explained above, the order in which the Kafka target writes the messages is the order of the
changes within each transaction.
Once a data message is ready to be sent to Kafka, the topic and partition it should go to are determined by analyzing
the endpoint settings as well as potentially transformation settings. For example, the user might decide to configure
the endpoint in a way that every table is sent to a different topic and might calculate the partition to be based on a
random strategy meaning each message (within the same table, to follow the example) will go to a different partition.
The simplest way of achieving the above goal is to direct Replicate to a specific topic and a specific partition. This
means all data messages will go through one partition, thus guaranteeing ordered delivery both of transactions and
of changes within a transaction. The consuming application can consume messages, accumulating a transaction in
some intermediate memory buffer and when a new transaction ID is spotted, the previous transaction could be
processed as complete.
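A minimal sketch of that consuming pattern, assuming kafka-python, JSON messages, and the field names from the format shown earlier:

import json

def transactions(consumer):
    # Yield (transaction_id, [messages]) as complete transactions are seen.
    current_id, buffer = None, []
    for msg in consumer:
        record = json.loads(msg.value)
        tx_id = record["transaction_id"]
        if buffer and tx_id != current_id:
            # A new transaction ID means the buffered one is complete.
            yield current_id, buffer
            buffer = []
        current_id = tx_id
        buffer.append(record)

Note that the last transaction is only emitted when the next one begins; a production consumer would also flush on a timeout.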
Although the simple way may work, it is not highly efficient at the task level because all messages go to the same
topic and partition and do not necessarily utilize the full parallelism of the Kafka cluster. One might consider this
a non-issue if there are multiple tasks, each taking advantage of a different topic/partition, and the collection of
those tasks may very well utilize the cluster optimally.
The more generic way, where data may be spread over multiple topics and partitions, means that some intermediate
buffer such as memory, a table in a relational database, or even another Kafka topic can be used to collect
information about transactions. At a given time interval (every few minutes/hours), the events collected from
Replicate's Kafka output are sorted by change sequence and then grouped by transaction ID. This way,
transactions could be rebuilt.
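A sketch of that rebuild step, assuming the buffered events have already been collected into a list of parsed messages using the field names from the JSON format above:

def rebuild_transactions(events):
    # Sort by change sequence for chronological order, then group by
    # transaction ID to reconstruct each transaction's ordered changes.
    ordered = sorted(events, key=lambda e: e["change_seq"])
    txns = {}
    for e in ordered:
        txns.setdefault(e["transaction_id"], []).append(e)
    return txns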
About Qlik
Qlik’s vision is a data-literate world, where everyone can use data and analytics to improve decision-making and solve their most challenging
problems. Qlik provides an end-to-end, real-time data integration and analytics cloud platform to close the gaps between data, insights and action.
By transforming data into active intelligence, businesses can drive better decisions, improve revenue and profitability, and optimize customer
relationships. Qlik does business in more than 100 countries and serves over 50,000 customers around the world.
qlik.com
© 2021 QlikTech International AB. All rights reserved. All company and/or product names may be trade names, trademarks and/or registered trademarks of the respective owners with which they are
associated.