Deploy
Module 7
Welcome to IBM Agile Academy's DevOps Deployment Video
Education Series. This is Module 7, Automated Data
Migration into Production. In this module, we want to talk
about some often forgotten aspects of deployment and how to
address them in DevOps.
We are going to talk about data and database objects, and
how the migration or deployment of those objects can be
more difficult than that of artifacts such as code, even
though they are every bit as important to a production
application as code.
Often, new applications that are developed are meant to
either in whole or in part replace existing systems. When
that happens, migration of data from existing systems can
be a critical success factor for a new system.
In this module, we will talk about different approaches and
patterns which can be applied as you think about data
migration. We will also discuss some tools which can be
used in this migration space.
We will also touch on some of the complexities associated
with the migration of database manager objects, how they
are different from code and some tools you might consider
to assist you in this area.
There are many different types of objects which support an
application, and those need to be considered as we talk
about DevOps and deployment.
In other modules, we have touched on things like the design
of the infrastructure, the actual code that supports the
application, the test cases and the configuration, both
system and application, all of which are needed.
In this module, we will focus on those particular objects
toward the top of the chart. Transactional data is that
data which represents the business objects that the
application supports.
Those objects might be contracts, inventory, billing
records and so on. Those same objects are typically stored
as rows in tables within a database manager such as DB2. In
newer systems being developed, they may actually be stored
as larger objects in NoSQL type databases such as Cloudant.
Master data represents those objects which help to drive
application behavior. These might be objects like a list
of country codes, ledger accounts, et cetera. These
objects tend to be less volatile than transactional data,
and lower in volume. Nevertheless, they are all critical
to application execution.
And finally, there are database objects. These are the
actual tables, indexes, columns, et cetera that are defined
within the database manager. As these objects are changed,
those modifications must be migrated as part of any
deployment activities.
Often, migrating these objects involves other tasks which
must be done to support the migration. Data that needs to
be migrated is rarely ready to be loaded into a new system.
This data may need to be cleansed and transformed first so
that it can be successfully migrated. As an introduction
and overview, this process is typically known as ETL or
extract, transform and load.
A simplified view is shown here. An extract process is run
from a legacy system. That data is transformed in some way
and then it is loaded into the new system. In a waterfall
style implementation, typically ETL processes are done in
bulk because there is probably a large installation of a
brand new application.
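As a sketch of the idea, the three ETL steps can be expressed as plain functions. This is a minimal illustration, not a real tool: the record layout and field names are invented, and the load target is just a list standing in for the new system's data store.

```python
import json

# Hypothetical extract step: rows as a legacy system might export them.
def extract():
    return [
        {"cust_id": "001", "country": "us", "balance": "1500.00"},
        {"cust_id": "002", "country": "DE", "balance": "250.50"},
    ]

# Transform step: cleanse and normalize the raw values before loading.
def transform(rows):
    return [
        {
            "customer_id": int(r["cust_id"]),
            "country_code": r["country"].upper(),
            "balance": float(r["balance"]),
        }
        for r in rows
    ]

# Load step: here the "new system" is just a list we append to.
def load(rows, target):
    target.extend(rows)

target_db = []
load(transform(extract()), target_db)
print(json.dumps(target_db[0]))
```

A real pipeline would replace each function with connectors to the legacy source and the new data store, but the shape of the process is the same.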
In contrast, our mindset should now shift toward Agile
techniques in general, working in smaller increments. We
apply this mindset by using an incremental and evolutionary
set of techniques for data migration.
One of these key evolutionary techniques is to begin thinking
about migrations as something more like transactional or
discrete integration between systems. We will go into
detail more about this concept soon, but as a way of
introduction, we'll call your attention to this slide,
where we've placed a great website reference available on
enterprise integration patterns.
These may well support your migration needs. For now,
we'll focus on three key patterns. First, a remote
procedure call. This applies the principle of
encapsulation to integrating applications.
If an application needs some information that is owned by
another application, it asks that application directly.
Each application can maintain the integrity of the data
that it owns.
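The encapsulation principle behind a remote procedure call can be shown with a small sketch. The `CountryCodeService` class and its data are hypothetical; the point is only that the consumer asks the owning application through its interface rather than reading its data store directly.

```python
class CountryCodeService:
    """Hypothetical application that owns the master list of country codes."""

    def __init__(self):
        # Private data store; no other application touches this directly.
        self._codes = {"US": "United States", "DE": "Germany"}

    def lookup(self, code):
        # The "remote procedure" consumers call to ask for information.
        return self._codes.get(code.upper())

# A consuming application asks the owner instead of querying its tables.
service = CountryCodeService()
print(service.lookup("de"))
```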
Next is messaging, particularly asynchronous messaging,
which allows systems to communicate whether or not they are
both online. Using a queue manager such as MQ Series
allows messages to be handled by the receiving system in a
time frame that it chooses.
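The asynchronous hand-off can be approximated in a few lines with Python's standard `queue` and `threading` modules. This is only an illustration of the pattern, not MQ Series itself; the message shape and the `None` sentinel convention are assumptions of the example.

```python
import queue
import threading

# Stand-in for a queue manager: the sender enqueues messages and the
# receiver drains them on its own schedule.
q = queue.Queue()

# Sending system: fire and forget five messages.
for record_id in range(5):
    q.put({"record_id": record_id})

received = []

def receiver():
    # Receiving system: process messages in its own time frame.
    while True:
        msg = q.get()
        if msg is None:  # sentinel meaning "no more messages"
            break
        received.append(msg)

t = threading.Thread(target=receiver)
t.start()
q.put(None)  # signal the receiver to stop after draining the queue
t.join()
print(len(received))
```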
Finally, APIs allow systems to communicate with each other
and can be used to expose data from a provider, such as a
legacy system, to a consumer, who will then potentially
load that data into their own data store. Here are some
patterns
and techniques to consider when dealing with the
requirement to migrate data from one environment to another
in a DevOps implementation.
Remember, the key is to think about how to do the migration
in small portions. Also, think about the application of
similar techniques from Agile like those used in user story
decomposition, for example.
As we stated earlier, treat migration like integration.
Incremental integration will provide much more flexibility
than a bulk migration of data. Merge your ETL or extract,
transform, load capabilities with enterprise integration
patterns to provide smaller transactional data migration
capabilities.
Putting data into non-binary formats will allow for more
flexibility in processing the data. Standard formats like
XML and JSON (JavaScript Object Notation) allow for the
data to be represented as objects with attributes which
then allows for the use of open source tools for easier ETL
processing.
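As a small illustration, here is a hypothetical fixed-width legacy record converted into JSON so each field becomes a named attribute. The field layout (id, country code, balance in cents) is invented for the example.

```python
import json

# Hypothetical fixed-width legacy record:
# positions 0-3 = id, 4-5 = country, 6-14 = balance in cents.
raw = "0042US000150000"

record = {
    "contract_id": int(raw[0:4]),
    "country_code": raw[4:6],
    "balance": int(raw[6:15]) / 100,  # convert cents to dollars
}

# Once it is JSON, generic open source tooling can process it.
print(json.dumps(record))
```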
Be sure to design into your process the ability to throttle
the flow of data. This will allow for the oversight and
management of how much data is coming into the system.
One way to do this would be to use a queue manager and
limit the number of process instances that are allowed to
load. This is similar to the actor model pattern, which
was described in the born on the cloud module from our
DevOps Development course.
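A minimal sketch of that throttling idea: cap the number of concurrent loader instances draining a queue. The limit and the work items here are arbitrary, and a thread stands in for a loader process.

```python
import queue
import threading

MAX_LOADERS = 2  # throttle: at most this many loader instances run at once
inbound = queue.Queue()
for i in range(10):
    inbound.put(i)

loaded = []
lock = threading.Lock()

def loader():
    # Each loader instance drains work until the queue is empty.
    while True:
        try:
            item = inbound.get_nowait()
        except queue.Empty:
            return
        with lock:  # protect the shared "database" from concurrent writes
            loaded.append(item)

threads = [threading.Thread(target=loader) for _ in range(MAX_LOADERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(loaded))
```

Raising or lowering `MAX_LOADERS` is the throttle: all ten items still arrive, but the system controls how much is in flight at once.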
There are always different ways that large migration tasks
can be decomposed into smaller tasks. You might try
categorizing types of data and prioritizing that data with
the customer.
Maybe certain information, though required, can be migrated
over a longer stretch of time. You may consider
categorizing by how volatile the data is. Static
information could be migrated early, and very dynamic data
might need to be migrated more synchronously, perhaps by
performing incremental updates as new activity hits the
legacy system.
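One way to sketch this decomposition: tag each data set with a volatility category during analysis and assign it to a migration wave. The table names and tags below are hypothetical.

```python
# Hypothetical analysis output: each data set tagged by volatility.
records = [
    {"table": "country_codes", "volatility": "static"},
    {"table": "open_orders", "volatility": "dynamic"},
    {"table": "ledger_accounts", "volatility": "static"},
]

# Static data goes in an early bulk wave; dynamic data is migrated
# incrementally, closer to cutover.
waves = {"early": [], "incremental": []}
for r in records:
    key = "early" if r["volatility"] == "static" else "incremental"
    waves[key].append(r["table"])

print(waves)
```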
Finally, always make sure the team addresses data
reconciliation requirements. True, when you process data
in bulk, it is easier to keep record counts and dollar
totals so that business analysts can ensure migration
happened as expected.
So, when you use incremental DevOps techniques, those
incremental techniques also need to ensure that business
reconciliation expectations are still met.
One potential way to do this would be to apply the event
sourcing pattern to manage overall application state, which
was also described in our born on the cloud module in the
DevOps Development course.
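A batch-level reconciliation check might look like the following sketch, comparing record counts and dollar totals between a source batch and its migrated copy after each small increment. The records are invented for the example.

```python
# Hypothetical source batch and its migrated copy.
source_batch = [{"id": 1, "amount": 100.0}, {"id": 2, "amount": 250.0}]
migrated = [{"id": 1, "amount": 100.0}, {"id": 2, "amount": 250.0}]

def totals(rows):
    # Record count and dollar total: the figures analysts compare.
    return len(rows), round(sum(r["amount"] for r in rows), 2)

# Run after every incremental batch, not just once at the end.
assert totals(source_batch) == totals(migrated), "batch does not reconcile"
print(totals(migrated))
```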
Shown here are some tools that you may want to consider for
supporting your data migration needs. IBM's own InfoSphere
Datastage supports high performance parallel
transformation. It also supports various databases such as
DB2 and Hadoop. There are also a number of open source
tools that support ETL operations and provide additional
capabilities which might be useful to your project.
We have talked quite a bit about data migration, both for
transactional data and master data. The final area that we
should touch on in this module is the migration of database
objects across environments.
Database objects include but are not limited to tables,
indexes, columns, stored procedures, triggers, views, table
spaces and so much more.
These objects are stored within system catalogs for
database managers such as DB2. They are created and
modified using something called DDL or data definition
language. This DDL is basically a scripting language which
is used to manage these objects within the database
manager.
These DDL scripts can be used to migrate database objects
across instances of DB2. The complexities arise depending
on the type of change that is actually being done. Some
changes to database objects may require that application
data be unloaded and reloaded into the modified table.
Some changes also require that database objects be dropped
and recreated, which may mean that certain attributes like
table or view authorizations might need to be saved and
then reapplied.
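The unload, drop, recreate, and reload sequence can be sketched with SQLite standing in for DB2 (SQLite, similarly, cannot change a column's type in place). The table and column names are invented for the example.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE contracts (id INTEGER, amount TEXT)")
con.execute("INSERT INTO contracts VALUES (1, '100.50')")

# The desired DDL change (amount TEXT -> REAL) requires unload/reload.
rows = con.execute("SELECT id, amount FROM contracts").fetchall()  # unload
con.execute("DROP TABLE contracts")                                # drop
con.execute("CREATE TABLE contracts (id INTEGER, amount REAL)")    # recreate
con.executemany(
    "INSERT INTO contracts VALUES (?, ?)",
    [(i, float(a)) for i, a in rows],                              # reload
)

print(con.execute("SELECT amount FROM contracts").fetchone()[0])
```

In a real DB2 migration, authorizations, indexes, and other dependent objects would also need to be captured before the drop and reapplied afterward, which is where orchestration tools earn their keep.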
These are just some simple examples of how management of
database objects can become more complex and require more
orchestration than other types of object management.
As with the ETL tools we discussed earlier, there are some
tools which can help with the visualization and management
of database objects within a database manager. DbVisualizer
and SQuirreL SQL provide this kind of visibility.
Liquibase is an open source tool which can help with the
migration of database objects between systems. Datical is
a tool based on Liquibase that we have been investigating
and trialing to support complex DB2 object migrations.
These tools integrate with UrbanCode Deploy via plug-ins
and may be useful to you and your project. Listed here are
some additional educational references to help you further
your reading and study of this topic.
[END OF SEGMENT]