
InfoSphere CDC

Latency & Performance Tuning

© 2011 IBM Corporation



Agenda
• Definitions – Latency vs. Throughput
• Bottleneck identification
• Resolving bottlenecks
• Scalability Options
• Advanced Analysis

Latency & Throughput - Definitions



What is Latency?
• Sometimes referred to as “replication lag”
• The amount of time delay between an update applied in the
source database and the same update applied in the backup
target
• The shorter the duration, the lower the latency
• High latency means the target is not up-to-date, but it is not
an indication of any synchronization issues. The
target may still accurately reflect what the
source was at some point in the (recent) past

What is Throughput?
• Throughput is the quantity of data processed within a given
period of time
• High volumes of data changes may require high throughput
• If throughput is insufficient, latency increases whenever more
changes arrive than replication can process
• Latency ≠ Throughput
• Issues with throughput in high-volume environments
typically cause latency to increase, hence the relationship between the two
• Note: Tuning for high throughput may be very different than
tuning for low latency
• This document focuses on measuring throughput of the
replication process and the steps to take to improve it

Performance Tuning Basics


• Performance tuning is an iterative process!
• If one process is optimized and the bottleneck is removed,
another bottleneck can appear further down (or up) the line
• Tuning can seldom be completed with a single change
• Performance tuning should be aimed at meeting business
requirements, and those requirements should be specific
• Example: Someone may want one-minute latency even during a batch
run, while the business may not require the batch data until 8 a.m.
This means latency during the midnight batch run is
acceptable as long as replication has caught up before start of business.

Bottleneck Identification

The first step – Identify your Bottleneck(s)


• You may be tempted to immediately work with the system tools to detect system
bottlenecks
• CPU utilization
• Memory utilization & paging
• Disk IO
This is rarely a good idea; system tools seldom identify the bottleneck (unless it is extremely obvious)
• First use the CDC Management Console Performance View to identify potential
bottlenecks
• This prevents drawing conclusions based on system activity generated by other applications on
the system
• The internal monitor provides details on the pipeline components of the CDC processes
• Your goal is to improve CDC performance, and the only way to do this is to remove CDC bottlenecks
• Alternatively, you can identify bottlenecks by running tests that isolate CDC components
• You can disable apply to rule out a target database bottleneck
• In the Java engines you can bypass CDC operations to rule out both CDC and target database
bottlenecks
• You can set up intra-system replication to rule out communications bottleneck
• The above bypass methods are very useful to quickly determine the area in which
further investigation must occur!!!

Sanity Check Datastore


• The first step is to determine the
overall health of the datastore
and whether the performance
metrics are being collected at
regular intervals
• Select the source and target
datastores (Datastore - <Datastore
Name>), and monitor the Time
check missed count. If this
number continuously increases,
it indicates CDC is not obtaining
enough resources (CPU or
memory) to perform collection.
This must be addressed prior to
further tuning.

Datastore Memory
• After checking the Time check missed count, proceed to
investigating the instance memory
• Collect the following statistics for the source datastore:
• Datastore - <datastore name> - Free memory – bytes
• Datastore - <datastore name> - Maximum memory – bytes
• Datastore - <datastore name> - Total memory – bytes
• Log Parser – Disk writes – bytes (per sec)
• Single scrape – Disk writes – bytes (per sec)

When to Adjust Datastore Memory


• If there is disk writing and resources are available, increase
the instance memory until writing to disk is no longer necessary
or is minimized. After restarting, wait some time before
deciding to increase it again, as it will take a while to work
through what has already been written to disk.
• If free memory is low, and total memory is constantly close
to maximum memory, increase the instance memory
• Generally, free memory should range between 20%-30% of
the maximum memory. If the free memory is consistently
higher, even during the peak loads, you may consider
reducing the instance memory.
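
The thresholds above can be expressed as a simple decision rule. Below is a minimal, hypothetical Java sketch of that rule; the class and method names are invented for illustration, and the input values are assumed to have been read manually from the Management Console statistics listed on the previous slide.

```java
/**
 * Minimal sketch of the instance-memory sizing heuristic described above.
 * Inputs are the datastore memory statistics from the Management Console;
 * the names here are hypothetical and exist only for illustration.
 */
public final class MemoryHeuristic {

    /** Suggest an action based on the datastore memory statistics. */
    public static String suggest(long freeBytes, long maxBytes,
                                 double stagingStoreDiskWritesPerSec) {
        double freeRatio = (double) freeBytes / maxBytes;

        if (stagingStoreDiskWritesPerSec > 0) {
            // Staging store is spilling to disk: add memory if resources allow.
            return "Increase instance memory (disk writes observed)";
        }
        if (freeRatio < 0.20) {
            // Less than 20% headroom: the instance is running close to its limit.
            return "Increase instance memory (free memory below 20% of maximum)";
        }
        if (freeRatio > 0.30) {
            // Consistently more than 30% free, even at peak: memory may be oversized.
            return "Consider reducing instance memory (free memory above 30% of maximum)";
        }
        return "Memory allocation looks healthy (20-30% free, no disk writes)";
    }

    public static void main(String[] args) {
        // Example: 2 GB free of an 8 GB maximum, with no staging-store disk writes.
        System.out.println(suggest(2_000_000_000L, 8_000_000_000L, 0.0));
    }
}
```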

Healthy Instance Memory


• Zero disk writes and 20-30% free memory

CDC Management Performance View - Bottlenecks



Bottlenecks
• The InfoSphere CDC engine uses a pipeline architecture; the
bottleneck pane displays the component that is not keeping
pace with the rest of the pipeline
• The five components that can be a bottleneck are:
• Log Reader
• Source Engine
• Communications
• Target Engine
• Target Database
• The actions taken to address a performance issue depend
on which component is the predominant bottleneck

Steady State Operation - Performance Tuning


• Use the identified bottleneck to help select the statistics that are best
suited to investigating a performance issue

Identifying Busy Tables


• The performance view can display the tables with the
highest activity, which can aid in determining which tables
deserve particular attention, or possibly which should be
moved into a separate subscription

 The view displays the 10 busiest tables by data volume

Table Activity Breakdown by Table


• You can select one of the busy
tables and drill down by
selecting specific metrics to
track, such as types of DML, to
potentially tune the database for
the specific type of data pattern

Performance Tuning – Addressing Source Side Bottlenecks


• The 3 main areas to focus on are reading the log, parsing, and derived columns. Leaving
tracing enabled will decrease performance drastically; ensure it is off before addressing
performance.
• Log reading is usually constrained by disk I/O speed and the time it takes to read the logs.
NFS mounts have been observed to hinder performance. If the StagingStore directory is
growing, consider increasing the amount of memory available to the instance.
• Log parsing is usually CPU constrained, especially when there are wide tables. Row
selection may also be CPU intensive; consider moving filtering to the target with a user exit
or simplifying the row selection expression.
• Derived columns may be moved to the target in some cases or made more efficient.
• An increase in the number of subscriptions can increase throughput but will require
additional resources.
• LOB-type objects are retrieved from the database and can slow down source processing
and increase the impact on the source; determine whether they can be removed from scope
• Oracle online redo logs can be a source of physical contention when both Oracle and CDC
are simultaneously accessing the files
• File system caching allows the log reader to process logs without incurring physical I/O

Performance Tuning – Addressing Communications Bottlenecks

• A communications bottleneck may be perceived as a CDC
target issue
• No or few transactions applied per second
• When setting up intra-system replication, the throughput is acceptable
and subscriptions are keeping up, yet the same subscription setup to
another system is slow
• One cause of transaction latency is the lack of sufficient
bandwidth in the network
• Often transaction latency due to network limitations is
perceived as CDC not utilizing the network effectively
• This is not true most of the time
• Almost all communication issues can be traced back to
configuration of the environment
• CDC is limited by the amount of traffic generated at the source
(transactions) versus the bandwidth available for replication

Analyzing Communications Bottlenecks – 1


• Definitions:
• MTU (Maximum Transmission Unit) is the size of the largest packet that can be sent over a
network
• If a message is larger than the MTU, it must be fragmented into multiple smaller packets
• In TCP/IP network status (netstat), you can see the number of bytes flowing
in and out of your system
• # netstat -i gives you information about network traffic across network interfaces
• Specifically, watch the incoming packets and outgoing packets
• By taking snapshots at intervals, you’ll be able to estimate the number of bytes sent across per
second (see the sketch after this list):
• bytes per second ≈ (number of packets in the interval * MTU) / interval in seconds
• Multiply by 8 for bits per second; this is an upper-bound estimate, since not every packet is MTU-sized
• Also, be on the lookout for incoming errors and outgoing errors
• If there are errors in the transmission, re-transmission of TCP packets will occur; this can slow
down traffic significantly
• We have seen cases where relatively few re-transmissions of TCP packets have a large impact
on the actual bandwidth available
• The larger the packet size, the more “costly” the retransmissions
• Re-transmissions are most often caused by network devices not operating in sync
(a combination of full- and half-duplex devices across the route, different TCP/IP buffer sizes)
• There are also other more sophisticated network administrator tools
(sniffers, network statistics) which may help you analyze communications
throughput in a more granular way
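
As a rough illustration of the estimate above, here is a minimal, hypothetical Java sketch. It assumes you have recorded an interface’s packet counters from two `netstat -i` snapshots yourself; the counter values and MTU in the example are made up for illustration.

```java
/**
 * Rough bandwidth estimate from two netstat -i snapshots, as described above.
 * Packet counters and the MTU must be read from the netstat output manually;
 * the numbers used in main() are illustrative only.
 */
public final class NetstatEstimate {

    /** Upper-bound estimate of bits per second between two snapshots. */
    public static double bitsPerSecond(long packetsBefore, long packetsAfter,
                                       int mtuBytes, double intervalSeconds) {
        long packets = packetsAfter - packetsBefore;
        double bytesPerSecond = (packets * (double) mtuBytes) / intervalSeconds;
        return bytesPerSecond * 8; // convert bytes/s to bits/s
    }

    public static void main(String[] args) {
        // Example: 450,000 outgoing packets in 60 seconds on a 1500-byte MTU interface.
        double bps = bitsPerSecond(10_000_000L, 10_450_000L, 1500, 60.0);
        System.out.printf("~%.1f Mbps (upper bound)%n", bps / 1_000_000);
    }
}
```

The result is an upper bound because the calculation assumes every packet is MTU-sized; compare it against the bandwidth the link is supposed to provide.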

Analyzing Communications Bottlenecks – 2


• A straightforward method to further investigate CDC
communications limitations is to use the same route to FTP a large
file
• A simple calculation can determine the available bandwidth in your
communication network
• Formula to calculate transmission speed in bits per second:
(file_size_in_bytes*8)/(number_of_seconds_transmission_took)
• Bandwidth in Mbps is the outcome of this formula divided by (1,000,000)
• Example: If it took 25 minutes to transfer a file of 10,237,194,382 bytes, the bandwidth is:
(10,237,194,382*8)/(25*60) = 54,598,370 bits per second (~54.6 Mbps)
• FTP is generally considered a trusted protocol for proving the
throughput of the network
• If FTP cannot utilize the expected available bandwidth, neither can CDC
• Do not be tempted to use “ping” to measure throughput; it measures only
response times and round-trip times for small packets,
not high-volume throughput

Performance Tuning – Addressing Target Side Bottlenecks


• Determine whether a specific table, particularly one of the busy ones, has slow
operations by viewing the table-specific performance statistics (Target
Apply slow DB operations – count (per sec))

Performance Tuning – Addressing Target Side Bottlenecks

• If a table has been identified as slow by the performance
monitor, check the database tuning of the target table. Check whether
the table has a unique index, whether the table or index is fragmented,
when statistics were last collected, and whether the table has a large
number of indexes.
• Remove as many indexes as possible on CDC destination tables
• Splitting tables into more subscriptions will increase the number
of applies and can increase the throughput when there are many
busy tables.
• When there are only a few tables, and many operations of the
same type, increasing global_max_batch_size can result in CDC
grouping a larger number of operations and increasing
throughput.

Interpreting Oracle Database Traces (raw trace)


• In order to enable database tracing of CDC database sessions enable the following
parameters:
• Source Scrape: mirror_scrape_db_trace = true (to enable, false by default)
• Target Apply: mirror_apply_db_trace = true (to enable, false by default)
• Oracle monitoring capabilities can be used to determine if there is database tuning that
could improve the performance of the apply
• Demonstrates how Oracle handles the operations
• First parse (PARSE), then execute (EXEC)
• Identifies how long an individual operation/transaction takes
• Look for “e=nnnnnn” data in the trace (see the sketch below)
• This gives you an idea how long an operation took (in microseconds)
• Is this expected?
• Does the operation take the same amount of time at the source?
• CDC cannot apply faster than the database can process
• Identifies wait times
• Logged when Oracle waits for:
• More information from the client (SQL*Net message from client)
• In this case, CDC is the client
• Updates of an index
• Latch free
• Disk
• Data spread unevenly, or across too few disks, is a common problem
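
To avoid scanning a large raw trace by eye, the elapsed-time tokens can be extracted programmatically. The following is a minimal, hypothetical Java sketch that sums and ranks the e=<microseconds> values in a raw trace file; it assumes the tokens appear in the form described above and is not a substitute for tkprof (covered on the next slide).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Minimal sketch: extract "e=<microseconds>" elapsed-time tokens from a raw
 * Oracle trace file and report the total and the largest single value.
 * This only illustrates the idea described above; use tkprof for real analysis.
 */
public final class TraceElapsedScan {

    private static final Pattern ELAPSED = Pattern.compile("\\be=(\\d+)");

    public static void main(String[] args) throws IOException {
        long totalMicros = 0;
        long maxMicros = 0;

        for (String line : Files.readAllLines(Paths.get(args[0]))) {
            Matcher m = ELAPSED.matcher(line);
            while (m.find()) {
                long micros = Long.parseLong(m.group(1));
                totalMicros += micros;
                maxMicros = Math.max(maxMicros, micros);
            }
        }

        System.out.printf("Total elapsed in e= tokens: %.3f s%n", totalMicros / 1_000_000.0);
        System.out.printf("Slowest single call:        %.3f s%n", maxMicros / 1_000_000.0);
    }
}
```

Run it against the raw trace file (for example, `java TraceElapsedScan mytrace.trc`) to see quickly whether a few slow calls or many uniformly slow calls dominate the elapsed time.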

Condensing the Raw Trace


• Raw traces can be tedious to go through
• If you want to identify the operations which consume most
of the time, use the Oracle tkprof utility
• Syntax
• # tkprof <TraceFile> <OutputFile>
• The default for this command is typically what you will want
to see
• The statement whose duration was the longest is at the top of the list
• Shows how many times the statement was executed, how
much time was consumed parsing the statement, and how
much time executing it
• All statements sharing the (exact) same format (same columns
updated, same where clause) are grouped together

Interpreting Oracle Database Traces (condensed trace)


UPDATE "ERP"."CUSTOMER" SET "ID" = :1, "ID_MINOR" = :2, "OPENDAY" = :3, "PRODAY" = :4, "UNSEENOPCNT" = :5, "OPNO" = :6, "OPDAY"
= :7, "OPTRANSDAY" = :8, "BALANCE" = :9, "LASTVISITDAY" = :10, "PAYROLLDAY" = :11, "EXPIRATIONDAY" = :12, "CAPYEAR" = :13,
"PU9DAY" = :14, "PU9CASH" = :15, "LASTOPNO" = :16, "CAPDAY" = :17, "SBOOKNO" = :18, "OPROW_MINOR" = :19, "OPROW" = :20,
"ASSIGNDAY" = :21 WHERE "ID" = :22 AND "ID_MINOR" = :23

call count cpu elapsed disk query current rows


------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 31918 0.33 0.39 0 0 0 0
Execute 31918 11.38 401.82 25230 114508 521516 66882
Fetch 0 0.00 0.00 0 0 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 63836 11.71 402.21 25230 114508 521516 66882

Misses in library cache during parse: 1


Optimizer mode: CHOOSE
Parsing user id: 539

Rows Row Source Operation


------- ---------------------------------------------------
1 UPDATE
1 PARTITION HASH SINGLE PARTITION: KEY KEY
1 INDEX UNIQUE SCAN PKDEPOSIT PARTITION: KEY KEY (object id 34496)

********************************************************************************

What the above means:


- It took 401.82 seconds of elapsed time to do 31,918 executes; 66,882 rows were affected by the updates
- It took roughly 6 milliseconds to update one row (401.82 s / 66,882 rows)
- CDC applied arraying in this statement (# of rows affected > # of executes)

Considering Alternative Architectures



Other Scalability Options


• Sometimes a standard configuration of CDC will not scale
well enough
• Multiple parallel subscriptions may be considered
• A single table can also be handled by multiple parallel subscriptions
• Employ row filtering to split the operations into ranges
• A Java user exit can determine the modulo of a numeric column (see the sketch below)
• Caveat: this does not work if the filtering column is updated!!!
• Or, even multiple CDC instances to spread the parsing
workload
• Do not spread the workload of a single table across CDC instances
(will have a counter-productive effect)
• An N-tier architecture can also sometimes help to improve
scalability (especially when working with older hardware)
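
As a rough illustration of the modulo-based row filtering mentioned above, here is a minimal, hypothetical Java sketch. Only the modulo logic is shown; the class and method names are invented for illustration, and the wiring into the actual CDC user-exit interfaces (and the matching row-filter expressions on each subscription) is deliberately omitted.

```java
/**
 * Minimal sketch of modulo-based row filtering for splitting one busy table
 * across N parallel subscriptions. The names are hypothetical; the actual
 * CDC user-exit interface wiring is not shown here.
 */
public final class ModuloRowFilter {

    /**
     * Returns true if a row with the given numeric key belongs to the
     * subscription identified by subscriptionIndex (0 .. totalSubscriptions-1).
     */
    public static boolean belongsTo(long numericKey, int subscriptionIndex,
                                    int totalSubscriptions) {
        // Math.floorMod keeps the result non-negative even for negative keys.
        return Math.floorMod(numericKey, totalSubscriptions) == subscriptionIndex;
    }

    public static void main(String[] args) {
        // Split CUSTOMER rows across 4 subscriptions by the ID column:
        // subscription 2 only replicates rows where ID mod 4 == 2.
        System.out.println(belongsTo(10_006L, 2, 4)); // true  (10006 % 4 == 2)
        System.out.println(belongsTo(10_007L, 2, 4)); // false (10007 % 4 == 3)
    }
}
```

Remember the caveat above: if the filtering column can be updated, a row could move between ranges and this approach breaks down.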

Standard Architecture

[Diagram: one subscription. A single source CDC instance (log reader plus subscription process) reads the redo/archive logs and sends changes to a single target CDC instance.]

 Simplest configuration to maintain, and most efficient use of resources
 One source instance reads and parses the Oracle database log entries
 Instance uses one connection to send to the target
 Subscription applies transactions using one database session on the target


Spreading Workload Among Subscriptions when Apply is the Bottleneck

[Diagram: multiple subscriptions. One source CDC instance reads the redo/archive logs and sends changes over several subscriptions to one target CDC instance.]

 Multiple subscriptions allow each subscription to have its own database
connection and apply data in parallel.
 Minimal impact on the source when subscriptions share the staging store and one
log reader and parser.
 If subscriptions cannot share the store, there may be times with multiple readers
and parsers, increasing parallelism on the source as well.
 Each subscription sends data through a separate network connection and
applies with a dedicated database session on the target


Spreading Workload Among Instances when Source is the Bottleneck

[Diagram: single subscription per instance. Two source CDC instances each read the redo/archive logs and send changes to one target CDC instance.]

 Adding an instance ensures the source subscriptions from the two instances read
the logs in parallel
 Each source instance reads the entire log, but discards out-of-scope entries prior to
parsing
 Each subscription sends data through a separate network connection and
applies with a dedicated database session on the target


Spreading Workload Among Instances and Subscriptions

[Diagram: multiple instances, multiple subscriptions. Several source CDC instances read the redo/archive logs and send changes over several subscriptions to one target CDC instance.]

 After tuning for the source bottleneck and then for the target one, or vice-versa, the
eventual configuration may have multiple instances with many subscriptions
each
 Ensures subscriptions from the multiple instances read the entire log and parse only
in-scope data in parallel
 Multiple subscriptions send to the target and apply data with dedicated
database sessions


Growth Strategy

Moving right in the table increases the number of sessions on the target database
(employ database scalability); moving down increases parsing capacity (more tables
in scope).

1 Instance:  1 Subscription | 2 Subscriptions | 3 Subscriptions | n Subscriptions
2 Instances: 1 Subscription per Instance | 2 Subscriptions per Instance | 3 Subscriptions per Instance | n Subscriptions per Instance
3 Instances: 1 Subscription per Instance | 2 Subscriptions per Instance | 3 Subscriptions per Instance | n Subscriptions per Instance
4 Instances: 1 Subscription per Instance | 2 Subscriptions per Instance | 3 Subscriptions per Instance | n Subscriptions per Instance

 Every instance allocates a minimum amount of memory
 Every instance will read the database logs


Remote Configuration with Oracle Source

[Diagram: four placement options for the scraper and apply components]

1. Scraper on source, Apply on target
2. Scraper and Apply on target *
3. Scraper and Apply on CDC server *
4. Scraper and Apply on source

* For configurations 2 & 3, optimal using a shared SAN for the logs


Advantages of N-tier Architecture


• CDC engine running on higher class hardware
• Faster processors
• More memory
• Faster disks
• Avoid limitations of the customer’s production hardware
• Customer may have relatively slow processors installed (for example HP PA-RISC)
• There may not be sufficient memory to house a CDC instance
• Customer may have concerns about CDC running in their production environment
• Advantages
• Reduced impact on production hardware
• Leverage IBM hardware to improve project
• Limitations
• N-tier currently only supported for DB2 LUW and Oracle databases
• If the source database server hardware and CDC server hardware differ, configure a system
parameter to inform CDC that the source logs come from a specific platform
• Endianness of operating systems must be the same (AIX server can process logs from AIX,
HP-UX and Solaris servers, but not from Linux or Windows servers)

Statistics collection for Advanced Analysis


• In CDC 6.5, performance statistics are collected by default
• If detailed performance analysis is required, use dmsupportinfo to gather the statistics,
and attach to a PMR
• The following parameters can also be set to adjust how statistics are gathered:
• stats_collect [True] Controls whether statistics are logged to CSV files. It is a dynamic parameter and will take effect after a
STATS_INTERVAL. Default is true.
• stats_interval [30] Determines approximately how often statistics are collected. Value is in seconds. Default is 30
seconds, minimum is 1 second. Generally you do not need to set this value lower unless trying to do focused
analysis over a small interval (one hour or less).
• stats_max_file_length_kbytes [not set] Maximum disk size for a single CSV file. A new CSV file will be created per
source or target subscription when the current file exceeds this size, or the number of rows in the file exceeds the tool limit
(65,535)
• stats_max_dir_space_mbytes [10] Determines how much directory space to use for statistics per subscription.
Default is 10 MB.
• stats_subscriptions [<not set>] Comma separated list of subscriptions to log. If empty, logs all subscriptions.
Defaults to empty. When logging a target subscription, please ensure the source ID is used, which may not be the
same as the subscription name. This name can be changed by the user and must not be longer than 8 characters in
length.
• stats_log_directory [<install.dir>/log] Determines the location where the log files will be generated. The default is
[install.dir]/log. The path must be an absolute path and cannot contain any environment variables. If the path does
not exist, CDC will attempt to create it.
• stats_file_label [<not set>] A user-specified label that can be applied to statistics file names. The default is unset.
Changing the file label changes the file name. This provides a mechanism for statistics users to force the creation of
new files and the retention of old files.
• stats_user_label [<not set>] The value to put in the user label property of the CSV file. This property allows users
to inject a specific value into their CSV files for their analysis.

Questions?
