Implementing Change Data Capture with DynamoDB Streams

Last Updated : 23 Jul, 2025

In the modern world, businesses need to process data in real time so that they can make decisions faster. Change Data Capture (CDC) is a technique for tracking and capturing any change that occurs in a database.

This helps companies and large organizations process the data and react to those changes as soon as they occur.

Introduction

Change data capture is a popular process for identifying and capturing the changes that are made to the data in a database, including operations such as inserts, updates, and deletes.

CDC is useful whenever a business wants to synchronize data across systems or run real-time analytics. It ensures that applications stay up to date with the most current data and helps improve a company's decision-making.

Steps for Change Data Capture with Amazon DynamoDB

To implement change data capture with DynamoDB, we will perform the following steps:

Step 1: Open DynamoDB in AWS

The first step is to open DynamoDB so that we can create the table before setting up change data capture. Simply log in to AWS, search for the DynamoDB service, and open it.

Open DynamoDB in AWS.

Step 2: Create Database Table

The next step is to create the database table. Once DynamoDB opens, click on the Create table button to create a table in the DynamoDB database.

Create Database Table.

Step 3: Add Table Details

The next step is to add the other required information for the table, such as the table name, the partition key, and other optional details. After adding these details, scroll down, click on the Create table button, and move on to the next step.

Add Table details.
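The same table can also be created programmatically. Below is a sketch using boto3, assuming a table named orders with a numeric roll_no partition key (matching the PartiQL statements used later in this article); the actual AWS call is commented out so you can review the parameters first:

```python
def build_table_spec(table_name="orders", partition_key="roll_no"):
    """Build the kwargs for DynamoDB's create_table call.

    The table name "orders" and numeric partition key "roll_no" match the
    PartiQL statements used later in this article; adjust them to your needs.
    """
    return {
        "TableName": table_name,
        "KeySchema": [{"AttributeName": partition_key, "KeyType": "HASH"}],
        "AttributeDefinitions": [
            {"AttributeName": partition_key, "AttributeType": "N"}
        ],
        "BillingMode": "PAY_PER_REQUEST",
    }

# Uncomment to actually create the table (requires boto3 and AWS credentials):
# import boto3
# boto3.client("dynamodb").create_table(**build_table_spec())
```

On-demand (PAY_PER_REQUEST) billing is chosen here so no capacity planning is needed for a demo table.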

Step 4: Open the Kinesis Service

Just as we opened the DynamoDB service, we now need to open the Kinesis service. Kinesis is a real-time data streaming service that can be integrated with DynamoDB streams. Open this service to configure it.

Open Kinesis service.

Step 5: Create a Data Stream

Once the Kinesis service opens, we create a data stream with settings similar to those shown below. You can also change these settings after creating the stream, so there is no need to worry. After selecting the settings, simply click on the Create data stream button.

Create data stream.
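If you prefer to script this step, the stream can be created with the Kinesis create_stream API. This is a sketch under assumptions (the stream name dynamodb-cdc-stream and single shard are placeholders); the AWS call is commented out:

```python
def build_stream_spec(stream_name="dynamodb-cdc-stream", shard_count=1):
    """Build the kwargs for Kinesis' create_stream call.

    The stream name and shard count here are assumed placeholders;
    pick values that fit your throughput needs.
    """
    return {"StreamName": stream_name, "ShardCount": shard_count}

# Uncomment to actually create the stream (requires boto3 and AWS credentials):
# import boto3
# boto3.client("kinesis").create_stream(**build_stream_spec())
```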

Step 6: Add Source and Destination

Once you click on the Create data stream button, you will be redirected to a page where you define the source and the destination for the stream. Here we choose Amazon Kinesis Data Streams as the source and an Amazon S3 bucket as the destination (this configures a Kinesis Data Firehose delivery stream that writes the stream records to S3).

Add source and destination.

Step 7: Create a Lambda Function

The next step is to create a Lambda function. Lambda is a serverless compute service that lets users run code without having to manage servers. We will create the function from scratch and add the required basic information, such as the function name and runtime, according to our specific needs.

Create Lambda Function.

Step 8: Add the Function Code

In this step, we add the code for the function. The following is an example of the transformation code we are going to use; you can add similar code to your own Lambda function:

Python
import base64


def lambda_handler(event, context):
    # Firehose passes a batch of records; each record's data is base64-encoded.
    output = []  # kept local so state does not leak between warm invocations
    for record in event['records']:
        payload = base64.b64decode(record['data']).decode('utf-8')
        print('payload:', payload)

        # Append a newline so each change record lands on its own line in S3,
        # then base64-encode the result again, as Firehose expects.
        row_w_newline = payload + "\n"
        data = base64.b64encode(row_w_newline.encode('utf-8')).decode('utf-8')

        output.append({
            'recordId': record['recordId'],
            'result': 'Ok',
            'data': data
        })

    print('Processed {} records.'.format(len(event['records'])))

    return {'records': output}

After adding the code in the code source editor, click on the Deploy button to deploy the Lambda function.

Add function code.
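Before deploying, the transformation logic can be sanity-checked locally by building a minimal Firehose-style event (base64-encoded records) and running the same newline-appending transform over it. The event shape below is a simplified assumption of what Firehose sends:

```python
import base64


def transform(event):
    """Apply the same newline-appending transform as the Lambda, for local testing."""
    output = []
    for record in event["records"]:
        payload = base64.b64decode(record["data"]).decode("utf-8")
        data = base64.b64encode((payload + "\n").encode("utf-8")).decode("utf-8")
        output.append({"recordId": record["recordId"], "result": "Ok", "data": data})
    return {"records": output}


# A minimal, hand-built stand-in for a Firehose transformation event.
sample = {
    "records": [
        {"recordId": "1",
         "data": base64.b64encode(b'{"roll_no": 5}').decode("utf-8")}
    ]
}
result = transform(sample)
print(base64.b64decode(result["records"][0]["data"]).decode("utf-8"))
```

Decoding the transformed record should show the original JSON payload followed by a trailing newline.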

Step 9: Create a Bucket

Since we are using an S3 bucket as the destination for the stream data, we need to create one. If you already have a bucket, move to the next step; otherwise, go to Amazon S3 > Buckets > Create bucket and create the bucket by adding the bucket name, AWS Region, and other details.

Create Bucket.
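This step can also be scripted. A sketch under assumptions (the bucket name below is a placeholder, since bucket names are globally unique; note that outside us-east-1, S3 requires a LocationConstraint):

```python
def build_bucket_spec(bucket_name="my-cdc-output-bucket", region="ap-south-1"):
    """Build the kwargs for S3's create_bucket call.

    The bucket name and region here are assumed placeholders. Outside
    us-east-1, S3 requires a CreateBucketConfiguration with the region.
    """
    spec = {"Bucket": bucket_name}
    if region != "us-east-1":
        spec["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return spec

# Uncomment to actually create the bucket (requires boto3 and AWS credentials):
# import boto3
# boto3.client("s3").create_bucket(**build_bucket_spec())
```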

Step 10: Connect DynamoDB with Amazon Kinesis data stream:

So far we have the dynamoDB set up and ready as well as the amazon kinesis data stream ready. next step is to enable the amazon kinesis data stream for the dynamoDB for this open the dynamoDB > Tables > Update Settings and go to the exports and streams option and click on the enable button for amazon kinesis data stream:

Connect DynamoDB with Amazon Kinesis data stream.
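The console toggle in this step corresponds to DynamoDB's EnableKinesisStreamingDestination API. Below is a sketch; the stream ARN default is a placeholder (use the ARN of the data stream created in Step 5), and the AWS call is commented out:

```python
def build_streaming_destination_spec(table_name="orders", stream_arn=None):
    """Build the kwargs for enable_kinesis_streaming_destination.

    The ARN below is a placeholder with a dummy account ID; replace it with
    the ARN of your own Kinesis data stream.
    """
    if stream_arn is None:
        stream_arn = ("arn:aws:kinesis:us-east-1:123456789012:"
                      "stream/dynamodb-cdc-stream")
    return {"TableName": table_name, "StreamArn": stream_arn}

# Uncomment to enable the destination (requires boto3 and AWS credentials):
# import boto3
# boto3.client("dynamodb").enable_kinesis_streaming_destination(
#     **build_streaming_destination_spec())
```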

Step 11: Run Queries to Check the Output

Once we have successfully connected the Kinesis data stream to DynamoDB, we can run any query against the table and see whether the changes show up in real time. For this, go to the PartiQL editor inside DynamoDB and run the following statements:

SQL
-- DynamoDB PartiQL statements

INSERT INTO "orders" VALUE {'roll_no': 5, 'price': 5, 'quantity': 235}
INSERT INTO "orders" VALUE {'roll_no': 2, 'price': 10, 'quantity': 120}
INSERT INTO "orders" VALUE {'roll_no': 3, 'price': 20, 'quantity': 100}
INSERT INTO "orders" VALUE {'roll_no': 4, 'price': 30, 'quantity': 15}

UPDATE "orders" SET price = 90 WHERE roll_no = 5

DELETE FROM "orders" WHERE roll_no = 3

After entering each statement, simply click on the Run button to execute it and check whether the change is captured in real time.

Run Query for Output.
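The same PartiQL statements can also be submitted programmatically through DynamoDB's execute_statement API instead of the console editor; each statement is executed individually:

```python
# The same PartiQL statements as in the console editor above.
STATEMENTS = [
    "INSERT INTO \"orders\" VALUE {'roll_no': 5, 'price': 5, 'quantity': 235}",
    "UPDATE \"orders\" SET price = 90 WHERE roll_no = 5",
    "DELETE FROM \"orders\" WHERE roll_no = 3",
]

# Uncomment to run the statements (requires boto3 and AWS credentials):
# import boto3
# dynamodb = boto3.client("dynamodb")
# for stmt in STATEMENTS:
#     dynamodb.execute_statement(Statement=stmt)
```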

Step 12: Open CloudWatch to View the CDC

Once the queries run successfully, go to CloudWatch > Metrics and open the graphed metrics to check whether the change data capture activity is visible.

As you can see in the image below, CloudWatch has captured the change data for the DynamoDB stream.

Open CloudWatch to view CDC.

This is how change data capture can be set up and observed for DynamoDB streams in AWS.

Conclusion

Change data capture is important because it lets us observe analytics in real time and understand which data users are changing, so that businesses can take the necessary actions in a timely manner. Real-time analysis of the data gathered through CDC helps a business make more accurate and efficient decisions and ensures that the company always works with up-to-date data. Change data capture for DynamoDB streams can be set up by following the steps described in this article.
