Performance Tuning With InfoSphere CDC
Agenda
• Definitions – Latency vs. Throughput
• Bottleneck identification
• Resolving bottlenecks
• Scalability Options
• Advanced Analysis
Information Management Software
What is Latency?
• Sometimes referred to as “replication lag”
• The amount of time delay between an update applied in the
source database and the same update applied in the backup
target
• The shorter the duration, the lower the latency
• High latency means the target is not up-to-date, but is not
an indication that there are any synchronization issues. The
target may still be accurately synchronized with what the
source was at some point in the (recent) past
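
As an illustration of the definition above, latency for a single update can be computed as the difference between two timestamps. This is a minimal sketch, not part of InfoSphere CDC: the function name and the sample timestamps are hypothetical, and it assumes both clocks are synchronized.

```python
from datetime import datetime, timedelta


def replication_latency(source_commit: datetime, target_apply: datetime) -> timedelta:
    """Delay between an update on the source and the same update on the target."""
    if target_apply < source_commit:
        # Usually a sign of clock skew between the two servers.
        raise ValueError("target apply time precedes source commit time")
    return target_apply - source_commit


# Hypothetical timestamps for one replicated update.
committed = datetime(2024, 1, 1, 12, 0, 0)
applied = datetime(2024, 1, 1, 12, 0, 4)
latency = replication_latency(committed, applied)
print(f"latency: {latency.total_seconds()} s")
```

A target that is 4 seconds behind is still an accurate copy of the source as it was 4 seconds ago.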
What is Throughput?
• Throughput is the quantity of data processed within a given
period of time
• High volumes of data changes may require high throughput
• If throughput is insufficient, latency may increase because
more changes arrive than replication can process
• Latency ≠ Throughput
• Issues with the throughput in high volume environments
typically cause latency to increase, hence the relation
• Note: Tuning for high throughput may be very different from
tuning for low latency
• This document focuses on measuring throughput of the
replication process and the steps to take to improve it
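
The relation between throughput and latency can be sketched with a simple backlog model. This is an illustrative simplification under stated assumptions (constant rates, rows as the unit of work); the function names and numbers are hypothetical and do not come from InfoSphere CDC.

```python
def backlog_over_time(change_rate: int, throughput: int, seconds: int) -> list[int]:
    """Return the unreplicated backlog (in rows) at the end of each second."""
    backlog = 0
    history = []
    for _ in range(seconds):
        backlog += change_rate               # changes generated on the source
        backlog -= min(backlog, throughput)  # changes replication manages to process
        history.append(backlog)
    return history


def estimated_latency(backlog: int, throughput: int) -> float:
    """Rough time (seconds) needed to drain the current backlog."""
    return backlog / throughput


# Source produces 1200 rows/s, replication sustains only 1000 rows/s.
falling_behind = backlog_over_time(change_rate=1200, throughput=1000, seconds=5)
# Replication sustains more than the source produces.
keeping_up = backlog_over_time(change_rate=1000, throughput=1500, seconds=3)

print(falling_behind, estimated_latency(falling_behind[-1], 1000))
print(keeping_up)
```

In the first case the backlog grows every second, so latency keeps rising until either the change volume drops or throughput is improved. In the second case the backlog stays at zero.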
Bottleneck Identification
Datastore Memory
• After checking the Time check missed count, proceed to
investigating the instance memory
• Collect the following statistics for the source datastore:
• Datastore - <datastore name> - Free memory – bytes
• Datastore - <datastore name> - Maximum memory – bytes
• Datastore - <datastore name> - Total memory – bytes
• Log Parser – Disk writes – bytes (per sec)
• Single scrape – Disk writes – bytes (per sec)
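
Once the three memory statistics are collected, they can be combined into a quick utilization summary. The sketch below assumes the statistics follow the usual JVM convention (total = memory currently allocated, free = unused part of the total, maximum = upper limit the instance may grow to); that interpretation, the function name, and the sample values are assumptions for illustration only.

```python
def memory_summary(free_bytes: int, total_bytes: int, max_bytes: int) -> dict:
    """Summarize datastore memory usage from the three collected statistics."""
    used = total_bytes - free_bytes  # assumed: free is the unused part of total
    return {
        "used_bytes": used,
        "used_pct_of_max": round(100.0 * used / max_bytes, 1),
        "headroom_bytes": max_bytes - used,
    }


MIB = 1024 ** 2
# Hypothetical sample: 256 MiB free, 1024 MiB total, 2048 MiB maximum.
sample = memory_summary(free_bytes=256 * MIB, total_bytes=1024 * MIB, max_bytes=2048 * MIB)
print(sample)
```

A used percentage that stays close to 100 would suggest the instance is short on memory, which is worth comparing against the disk-write statistics listed above.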
Bottlenecks
• The ICDC engine uses a pipeline architecture. The
bottleneck pane displays the component that is not keeping
up with the other components in the pipeline
• The 5 components that can be a bottleneck include:
• Log Reader
• Source Engine
• Communications
• Target Engine
• Target Database
• The actions taken to address a performance issue depend
on which component is the predominant bottleneck
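
The pipeline principle behind the list above can be sketched as follows: the slowest stage caps the throughput of the whole pipeline. This is only an illustration of that principle, not how the product computes its bottleneck display; the function name and the per-component rates are hypothetical.

```python
PIPELINE = [
    "Log Reader",
    "Source Engine",
    "Communications",
    "Target Engine",
    "Target Database",
]


def find_bottleneck(rates: dict) -> tuple:
    """Return (component, rate) for the slowest stage of the pipeline.

    rates maps each component to the rows/sec it can sustain
    (hypothetical measurements).
    """
    missing = [c for c in PIPELINE if c not in rates]
    if missing:
        raise ValueError(f"no rate supplied for: {missing}")
    slowest = min(PIPELINE, key=lambda component: rates[component])
    return slowest, rates[slowest]


measured = {
    "Log Reader": 20000,
    "Source Engine": 15000,
    "Communications": 12000,
    "Target Engine": 9000,
    "Target Database": 4000,
}
component, rate = find_bottleneck(measured)
print(f"bottleneck: {component} at {rate} rows/s")
```

With these sample numbers, tuning the log reader would not help: the pipeline as a whole cannot exceed what the target database applies.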
Standard Architecture
[Diagram: Source, redo/archive logs, source CDC instance (log reader,
subscription process), 1 subscription, target CDC instance, Target]
Multiple subscriptions
[Diagram: Source, redo/archive logs, source CDC instance, target CDC
instance, Target]
Single subscription per instance
[Diagram: Source, redo/archive logs, two source CDC instances, target CDC
instance, Target]
• Adding an instance ensures that the source subscriptions from the two
instances read the logs in parallel
• Each source instance reads the entire log, but discards out-of-scope entries
prior to parsing
• Each subscription sends data through a separate network connection and
applies it with a dedicated database session on the target
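
The behaviour described above (every instance scans the whole log, but only parses entries for its own tables) can be sketched as below. This is a toy model under stated assumptions: the log format, table names, and `read_log` function are hypothetical, and threads merely stand in for separate CDC instances.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical log: (table name, raw change payload).
LOG = [
    ("ORDERS", "insert 1"),
    ("CUSTOMERS", "update 7"),
    ("ORDERS", "update 1"),
    ("AUDIT", "insert 9"),
    ("CUSTOMERS", "delete 3"),
]


def read_log(log, in_scope_tables):
    """Scan every entry, but parse only those for in-scope tables."""
    scanned = 0
    parsed = []
    for table, payload in log:
        scanned += 1
        if table not in in_scope_tables:
            continue  # discarded before the (more expensive) parse step
        parsed.append((table, payload.split()))
    return scanned, parsed


# Two instances, each with its own scope, reading the same log in parallel.
with ThreadPoolExecutor(max_workers=2) as pool:
    future_a = pool.submit(read_log, LOG, {"ORDERS"})
    future_b = pool.submit(read_log, LOG, {"CUSTOMERS"})
    scanned_a, parsed_a = future_a.result()
    scanned_b, parsed_b = future_b.result()

print(scanned_a, len(parsed_a), scanned_b, len(parsed_b))
```

Both instances scan all five entries, so the read cost is duplicated, while the parse work is split between them.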
[Diagram: Source, redo/archive logs, source CDC instance, target CDC
instance, Target]
• After tuning for the source bottleneck and then for the target bottleneck, or
vice versa, the eventual configuration may have multiple instances with many
subscriptions each
• Subscriptions from the multiple instances each read the entire log and parse
only in-scope data, in parallel
• Multiple subscriptions send to the target and apply data with dedicated
database sessions
Growth Strategy
Increasing number of sessions on the target database (employ database
scalability)
Questions?