07 Automated Data Migration

This document discusses automated data migration into production environments as part of DevOps deployments. It focuses on migrating transactional data, master data, and database objects across environments. Some key challenges discussed include cleansing and transforming data during extract-transform-load processes, and handling complexities when database objects change. The document provides recommendations to decompose large migrations into smaller incremental pieces and leverage integration patterns and queues to migrate data in a flexible way.


Deploy  Module 7

Welcome to IBM Agile Academy's DevOps Deployment Video Education Series. This is Module 7, automated data migration into production. In this module, we want to talk about some often forgotten aspects of deployment and how to address them in DevOps.

We are going to talk about data and database objects, and how migrating or deploying those objects can be more difficult than deploying artifacts such as code, even though they are every bit as important to a production application.

Often, new applications are meant to replace existing systems, in whole or in part. When that happens, migrating data from those existing systems can be a critical success factor for the new system.

In this module, we will talk about different approaches and patterns which can be applied as you think about data migration. We will also discuss some tools which can be used in this migration space.

We will also touch on some of the complexities associated with migrating database manager objects, how they differ from code, and some tools you might consider to assist you in this area.

There are many different types of objects which support an application, and those need to be considered as we talk about DevOps and deployment.

In other modules, we have touched on things like the design of the infrastructure, the actual code that supports the application, the test cases, and the configuration, both system and application, all of which are needed.

In this module, we will focus on the objects toward the top of the chart. Transactional data is the data which represents the business objects that the application supports.

Those objects might be contracts, inventory, billing records, and so on. They are typically stored as rows in tables within a database manager such as DB2. In newer systems, they may instead be stored as larger objects in NoSQL-type databases such as Cloudant.

Master data represents the objects which help to drive application behavior. These might be objects like a list of country codes, ledger accounts, et cetera. These objects tend to be less volatile than transactional data and lower in volume. Nevertheless, they are critical to application execution.

And finally, there are database objects. These are the actual tables, indexes, columns, et cetera that are defined within the database manager. As these objects are changed, those modifications must be migrated as part of any deployment activities.

Often, migrating these objects involves other tasks which must be done to support the migration. Data that needs to be migrated is rarely ready to be loaded into a new system as-is.

This data may need to be cleansed and transformed first so that it can be successfully migrated. As an introduction and overview, this process is typically known as ETL, or extract, transform, and load.

A simplified view is shown here. An extract process is run against a legacy system, that data is transformed in some way, and then it is loaded into the new system. In a waterfall-style implementation, ETL processes are typically done in bulk, because they usually accompany one large installation of a brand-new application.
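As a rough sketch of this flow, here is a minimal ETL example in Python. The file name, column names, and target table are invented for illustration; they are not part of this module.

    import csv
    import sqlite3

    def extract(path):
        # Extract: read rows exported from the legacy system (hypothetical CSV layout).
        with open(path, newline="") as f:
            yield from csv.DictReader(f)

    def transform(row):
        # Transform: cleanse and reshape each record before it is loaded.
        return (row["contract_id"].strip(), row["country"].strip().upper(), float(row["amount"]))

    def load(rows, conn):
        # Load: insert the transformed rows into the new system's table.
        conn.executemany("INSERT INTO contracts (id, country, amount) VALUES (?, ?, ?)", rows)
        conn.commit()

    conn = sqlite3.connect("new_system.db")
    conn.execute("CREATE TABLE IF NOT EXISTS contracts (id TEXT, country TEXT, amount REAL)")
    load((transform(r) for r in extract("legacy_extract.csv")), conn)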

In contrast, our mindset should now shift toward Agile techniques and smaller increments of work. Applied to data migration, this means using an incremental and evolutionary set of techniques.

One key technique in this evolution is to begin thinking about migrations as something more like transactional, or discrete, integration between systems. We will go into more detail about this concept soon, but as an introduction, we will call your attention to this slide, where we have placed a great website reference on enterprise integration patterns.

These patterns may well support your migration needs. For now, we will focus on three key patterns. First is the remote procedure call, which applies the principle of encapsulation to integrating applications.

If an application needs some information that is owned by another application, it asks that application directly. Each application can maintain the integrity of the data that it owns.
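As a minimal sketch of the remote procedure call idea, the consuming application asks the owning application for the record rather than reading its tables directly. The endpoint URL and fields here are hypothetical.

    import json
    import urllib.request

    def get_customer(customer_id):
        # Ask the owning application for the data it owns; it remains
        # responsible for the integrity of that data.
        url = f"https://billing.example.com/api/customers/{customer_id}"
        with urllib.request.urlopen(url) as resp:
            return json.load(resp)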

Next is messaging, particularly asynchronous messaging, which allows systems to communicate whether or not they are both online. Using a queue manager such as MQ Series allows messages to be handled by the receiving system in whatever time frame it wishes.
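A minimal sketch of the asynchronous idea, using Python's standard-library queue and a thread as a stand-in for a real queue manager such as MQ Series:

    import queue
    import threading

    migration_queue = queue.Queue()   # stand-in for a managed queue such as MQ Series

    def receiver():
        # The receiving system drains messages at its own pace, in the
        # time frame that it wishes.
        while True:
            record = migration_queue.get()
            if record is None:
                break
            print("loading", record)
            migration_queue.task_done()

    threading.Thread(target=receiver, daemon=True).start()

    # The sending (legacy) side simply puts messages on the queue and moves on.
    for record in [{"id": 1}, {"id": 2}]:
        migration_queue.put(record)

    migration_queue.join()            # wait until everything has been processed
    migration_queue.put(None)         # signal the receiver to stop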

Finally, APIs allow systems to communicate with each other and can be used to expose data from a provider, such as a legacy system, to a consumer that can then load the data into its own data store.
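A minimal sketch of consuming such an API follows; the paginated endpoint, response shape, and target table are assumptions made only for illustration.

    import json
    import sqlite3
    import urllib.request

    def fetch_page(page):
        # Hypothetical paginated API exposed by the legacy provider.
        url = f"https://legacy.example.com/api/contracts?page={page}"
        with urllib.request.urlopen(url) as resp:
            return json.load(resp)    # assumed shape: {"items": [...], "more": true/false}

    conn = sqlite3.connect("consumer.db")
    conn.execute("CREATE TABLE IF NOT EXISTS contracts (id TEXT, payload TEXT)")

    page = 1
    while True:
        body = fetch_page(page)
        conn.executemany(
            "INSERT INTO contracts (id, payload) VALUES (?, ?)",
            [(item["id"], json.dumps(item)) for item in body["items"]],
        )
        conn.commit()
        if not body.get("more"):
            break
        page += 1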
Here are some patterns and techniques to consider when dealing with the requirement to migrate data from one environment to another in a DevOps implementation.

Remember, the key is to think about how to do the migration in small portions. Also, think about applying Agile techniques similar to those used in user story decomposition, for example.

As we stated earlier, treat migration like integration. Incremental integration will provide much more flexibility than a bulk migration of data. Merge your ETL (extract, transform, load) capabilities with enterprise integration patterns to provide smaller, transactional data migration capabilities.

Putting data into non-binary formats will allow for more flexibility in processing the data. Standard formats like XML and JSON (JavaScript Object Notation) allow the data to be represented as objects with attributes, which in turn allows the use of open source tools for easier ETL processing.
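For example, a master-data record can be represented as an object with attributes using Python's standard json module (the field names here are made up):

    import json

    # A hypothetical master-data record represented as an object with attributes.
    country = {"code": "CA", "name": "Canada", "currency": "CAD"}

    text = json.dumps(country, indent=2)   # non-binary, human-readable representation
    restored = json.loads(text)            # easily parsed again by open source ETL tools
    print(text)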

Be sure to design into your process the ability to throttle the flow of data. This will allow for oversight and management of how much data is coming into the system.

One way to do this would be to use a queue manager and limit the number of process instances that are allowed to load. This is similar to the actor model pattern, which was described in the born-on-the-cloud module of our DevOps Development course.
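A minimal sketch of this kind of throttling, again using the standard-library queue as a stand-in for a queue manager; the bounded queue and the worker limit are the two throttles, and the sizes are arbitrary:

    import queue
    import threading

    inbound = queue.Queue(maxsize=100)   # a bounded queue applies back-pressure to the sender
    MAX_LOADERS = 3                      # limit how many load instances run at once

    def loader():
        while True:
            record = inbound.get()
            if record is None:
                break
            # ... insert the record into the target system here ...
            inbound.task_done()

    for _ in range(MAX_LOADERS):
        threading.Thread(target=loader, daemon=True).start()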

There are always different ways that large migration tasks can be decomposed into smaller tasks. You might try categorizing types of data and prioritizing that data with the customer.

Maybe certain information, though required, can be migrated over a longer stretch of time. You may consider categorizing by how volatile the data is. Static information could be migrated early, while very dynamic data might need to be migrated closer to real time, perhaps by performing incremental updates as new activity hits the legacy system.
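One common way to do those incremental updates is a high-water-mark query against the legacy system. This is only a sketch; the tables, columns, and timestamp format are invented, and in practice the mark would be persisted between runs.

    import sqlite3

    legacy = sqlite3.connect("legacy.db")
    target = sqlite3.connect("new_system.db")
    last_migrated = "2024-01-01T00:00:00"    # high-water mark saved from the previous run

    # Pull only the activity that has hit the legacy system since the last increment.
    rows = legacy.execute(
        "SELECT id, status, updated_at FROM orders WHERE updated_at > ? ORDER BY updated_at",
        (last_migrated,),
    ).fetchall()

    target.executemany(
        "INSERT OR REPLACE INTO orders (id, status, updated_at) VALUES (?, ?, ?)", rows
    )
    target.commit()

    if rows:
        last_migrated = rows[-1][2]          # becomes the mark for the next increment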

Finally, always make sure the team addresses data reconciliation requirements. When you process data in bulk, it is easy to keep record counts and dollar totals so that business analysts can confirm the migration happened as expected. When you use incremental DevOps techniques instead, those incremental techniques also need to include reconciliation so that the same business expectations can be met.
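A minimal sketch of such a reconciliation check, comparing record counts and dollar totals between the legacy source and the new system; the table and column names are invented:

    import sqlite3

    legacy = sqlite3.connect("legacy.db")
    target = sqlite3.connect("new_system.db")

    def totals(conn, table):
        # Record count and dollar total that a business analyst can compare.
        count, amount = conn.execute(
            f"SELECT COUNT(*), COALESCE(SUM(amount), 0) FROM {table}"
        ).fetchone()
        return count, round(amount, 2)

    source = totals(legacy, "billing_records")
    migrated = totals(target, "billing_records")

    if source != migrated:
        print("Reconciliation mismatch:", source, "vs", migrated)
    else:
        print("Increment reconciled:", migrated)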

One potential way to do this would be to apply the event sourcing pattern to manage overall application state, which was also described in the born-on-the-cloud module of our DevOps Development course.

Shown here are some tools that you may want to consider for supporting your data migration needs. IBM's own InfoSphere DataStage supports high-performance parallel transformation, and it supports various data sources such as DB2 and Hadoop. There are also a number of open source tools that support ETL operations and provide additional capabilities which might be useful to your project.

We have talked quite a bit about data migration, both for transactional data and master data. The final area that we should touch on in this module is the migration of database objects across environments.

Database objects include, but are not limited to, tables, indexes, columns, stored procedures, triggers, views, table spaces, and more.

These objects are stored within the system catalogs of database managers such as DB2. They are created and modified using DDL, or data definition language. DDL is basically a scripting language which is used to manage these objects within the database manager.

DDL scripts can be used to migrate database objects across instances of DB2. The complexity depends on the type of change actually being made. Some changes to database objects may require that application data be unloaded and then reloaded into the modified table.

Some changes also require that database objects be dropped and recreated, which may mean that certain attributes, like table or view authorizations, might need to be saved and then reapplied.
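As an illustration of that unload, drop and recreate, reload sequence, here is a sketch using Python's built-in sqlite3 rather than DB2, with an invented column change; in DB2 the saved attributes would also include authorizations such as GRANT statements.

    import sqlite3

    conn = sqlite3.connect("app.db")
    conn.execute("CREATE TABLE IF NOT EXISTS contracts (id TEXT, amount REAL)")

    # 1. Unload: save the existing application data before touching the object.
    saved_rows = conn.execute("SELECT id, amount FROM contracts").fetchall()

    # 2. Drop and recreate the table with the changed definition
    #    (here, a new NOT NULL column with a default value).
    conn.execute("DROP TABLE contracts")
    conn.execute(
        "CREATE TABLE contracts (id TEXT, amount REAL, currency TEXT NOT NULL DEFAULT 'USD')"
    )

    # 3. Reload: put the saved data back into the modified table. This is also
    #    where saved attributes, such as table or view authorizations, would be reapplied.
    conn.executemany("INSERT INTO contracts (id, amount) VALUES (?, ?)", saved_rows)
    conn.commit()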

These are just some simple examples of how managing database objects can become more complex and require more orchestration than other types of object management.

As with the ETL tools we discussed earlier, there are tools which can help with the visualization and management of database objects within a database manager. DbVisualizer and SQuirreL SQL provide that visibility.

Liquibase is an open source tool which can help with the migration of database objects between systems. Datical is a tool based on Liquibase that we have been investigating and trialing to support complex DB2 object migrations.

These tools integrate with UrbanCode Deploy via plug-ins and may be useful to you and your project. Listed here are some additional educational references to help you further your reading and study of this topic.
[END OF SEGMENT]
