Study Guide for DataStage Certification
This entry is a comprehensive guide to preparing for the DataStage 7.5 Certification exam.
This page can be reached via the shortened URL tinyurl.com/dsexam. All the
details on the DataStage 8 Certification exam can be found at
tinyurl.com/ds8exam.
Regular readers may be feeling a sense of déjà vu. Haven’t we seen this post before? I
originally posted this in 2006; this is the Director’s Cut - I’ve added some deleted scenes, a
commentary for DataStage 7.0 and 8.0 users, and generally improved the entry. By
reposting I retain the links from other sites such as my DataStage Certification Squidoo lens
with links to my certification blog entries and IBM certification pages.
This post shows all of the headings from the IBM exam Objectives and describes how to
prepare for that section.
Before you start, work out how to add environment variables to a job as job parameters,
as they are handy for exercises and testing. See the DataStage Designer Guide
for details.
Versions: Version 7.5.1 and 7.5.2 are the best to study and run exercises on. Version 6.x is
risky but mostly the same as 7. Version 8.0 is no good for any type of installation and
configuration preparation as it has a new approach to installation and user security.
Reading: Read the Installation and Upgrade Guide for DataStage, especially the section on
parallel installation. Read the pre-requisites for each type of install such as users and
groups, the compiler, project locations, kernel settings for each platform. Make sure you
know what goes into the dsenv file. Read the section on DataStage for USS as you might get
one USS question. Do a search for threads on dsenv on the dsxchange forum to become
familiar with how this file is used in different production environments.
Exercise: installing your own copy of DataStage Server Enterprise Edition is the best preparation -
getting it to connect to Oracle, DB2 and SQL Server is also beneficial. Run the DataStage
Administrator and create some users and roles and give them access to DataStage
functions.
I’ve moved sections 4 and 9 up to the front, as you need to study them before you run exercises
and read about parallel stages in the other sections. Understanding how to use and monitor
parallel jobs is worth a whopping 20%, so it’s a good one to know well.
Versions: you can study this using DataStage 6, 7 and 8. Version 8 has the best definition of
the parallel architecture with better diagrams.
Reading: the opening chapters of the Parallel Job Developers Guide cover what the parallel engine
and job partitioning are all about. Read about each partitioning type. Read how sequential file stages
partition or repartition data and why datasets don’t. The Parallel Job Advanced Developers
Guide has sections on environment variables to help with job monitoring, read about every
parameter with the word SCORE in it. The DataStage Director Guide describes how to run job
monitoring - use the right mouse click menu on the job monitor window to see extra parallel
information.
Try creating one-node, two-node and four-node config files and see how jobs behave under
each one. Try the remaining exercises on a couple of different configurations by adding the
configuration environment variable to the job. Try some pooling options. I’ve got to admit I
guessed my way through some of the pooling questions as I didn’t do many exercises.
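If you have not built one before, a minimal two-node configuration file looks something like this (the hostname and directory paths are illustrative; see the Parallel Job Developers Guide for the full syntax):

```
{
  node "node1"
  {
    fastname "etlhost"
    pools ""
    resource disk "/data/ds/datasets" {pools ""}
    resource scratchdisk "/data/ds/scratch" {pools ""}
  }
  node "node2"
  {
    fastname "etlhost"
    pools ""
    resource disk "/data/ds/datasets" {pools ""}
    resource scratchdisk "/data/ds/scratch" {pools ""}
  }
}
```

Point the $APT_CONFIG_FILE job parameter at different versions of this file to switch a job between one, two and four nodes without recompiling.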
Generate a set of rows into a sequential file for testing out various partitioning types. One
column with unique ids 1 to 100 and a second column with repeating codes such as A, A, A,
A, A, B, B, B, B, B etc. Write a job that reads from the input, sends it through a partitioning
stage such as a Transformer and writes it to a Peek stage. The Director log shows which
rows went where. You should also view the Director monitor and expand and show the row
counts on each instance of each stage in the job to see how stages are split and run on each
node and how many rows each instance gets.
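The test file described above can also be generated outside DataStage with a short script (a sketch; the file name and column names are illustrative):

```python
# Generate a two-column test file for partitioning exercises:
# unique ids 1 to 100, and a code column that repeats every five rows
# (A, A, A, A, A, B, B, B, B, B, ... through to T).
codes = "ABCDEFGHIJKLMNOPQRST"  # 20 codes x 5 repeats = 100 rows

with open("partition_test.txt", "w") as f:
    f.write("id,code\n")
    for i in range(1, 101):
        f.write(f"{i},{codes[(i - 1) // 5]}\n")
```

Import the file with a sequential file stage and switch the downstream partitioning type (round robin, hash on code, range on id) between runs to see how the rows are distributed.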
Use a filter stage to split the rows down two paths and bring them back together with a
funnel stage, then replace the funnel with a collector stage. Compare the two.
Test yourself on estimating how many processes will be created by a job and check the
result after the job has run using the Director monitor or log messages. Do this throughout
all your exercises across all sections as a habit.
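As a rough rule of thumb for those estimates (my own sketch, not an official formula, and it assumes no operator combining and no node pool constraints - both of which reduce the count):

```python
# Rule-of-thumb process count for a parallel job: one conductor process,
# one section leader per node, and one player per operator per node.
# Operator combining and constrained stages will lower the real number.
def process_estimate(operators, nodes):
    conductor = 1
    section_leaders = nodes
    players = operators * nodes
    return conductor + section_leaders + players
```

So a job whose score shows 3 operators on a two-node configuration would start around 9 processes; check your guess against the $APT_DUMP_SCORE output or the Director monitor.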
Section 2 - Metadata
I’ve merged these into one. Both sections talk about sequential files, datasets, XML files and
Cobol files.
Exercises: Step through the IBM XML tutorial to learn the tricky parts of reading XML files. Find
an XML file and do various exercises reading it and writing it to a sequential file. Switch
between different key fields to see the impact of the key on the flattening of the XML
hierarchy. Don’t worry too much about XML Transform.
Import from a database using the Orchestrate Import and the Plugin Import and compare
the table definitions. Run an exercise on column propagation using the Peek stage where a
partial schema is written to the Peek stage to reveal the propagated columns.
Create a job using the Row Generator stage. Define some columns, on the columns tab
doubleclick on a column to bring up the advanced column properties. Use some properties
to generate values for different data types. Get to know the advanced properties page.
Create a really large sequential file, dataset and fileset and use each as a reference in a
lookup stage. Monitor the Resource and Scratch directories as the job runs to see how these
lookup sources are prepared prior to and during a job run. Get to know the difference
between the lookup fileset and other sources for a lookup stage.
This is one of the more difficult topics if you get questions about a database you are not familiar with.
I got one database parallel connectivity question that I still can’t find the answer to in any of
the manuals.
Versions: DataStage 7.x any version or at a pinch DataStage 8. Earlier versions do not have
enough database stages and DataStage 8 has a new approach to database connections.
Reading: read the plugin guide for each enterprise database stage: Oracle, SQL Server, DB2
and ODBC. In version 8 read the improved Connectivity Guides for these targets. If you have
time you can dig deeper, the Parallel Job Developers Guide and/or the Advanced Developers
Guide has a section on the Oracle/DB2/Informix/Teradata/Sybase/SQL Server interface
libraries. Look for the section called "Operator action" and read it for each stage. It’s got
interesting bits like whether the stage can run in parallel, how it converts data and handles
record sizes.
Exercise: Add each Enterprise database stage to a parallel job as both an input and output
stage. Go in and fiddle around with all the different types of read and write options. You
don’t need to get a connection working or have access to that database, you just need to
have the stage installed and add it to your job. Look at the differences between
insert/update/add/load etc. Look at the different options for each database stage. If you
have time and a database try some loads to a database table.
Section 6 - Data Transformation (15%)
If you’ve used DataStage for longer than a year this is probably the topic you are going to
ace - as long as you have made some use of stage variables.
Reading: there is more value in using the transformation stages than in reading about them.
The Transformer stage is easier to navigate and understand when you are using it than when
you are only reading about it. If you have to make do with reading, visit the dsxchange
and look for threads on stage variables, the FAQ on the parallel number generator, removing
duplicates using a Transformer, and questions in the parallel forum on null handling. The
threads are better than the manuals because they are full of practical examples. Read the
Parallel Job Advanced Developers Guide section on "Specifying your own parallel stages".
Exercises: Focus on Transformer, Modify Stage (briefly), Copy Stage (briefly) and Filter
Stage.
Create some mixed-up source data with duplicate rows and a multi-field key. Try to
remove duplicates using a Transformer with a Sort stage and a combination of stage variables
that hold the prior row’s key value to compare against the new row’s key value.
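The stage-variable pattern being exercised here can be sketched outside DataStage (pure Python, with illustrative data; in the real job the comparison lives in stage variable derivations):

```python
# Sketch of the Transformer stage-variable de-duplication pattern:
# rows must arrive sorted on the key; a "prior key" variable is compared
# with the current row's key, and only rows whose key differs pass through.
rows = [
    ("A", 1), ("A", 1), ("B", 2), ("B", 2), ("B", 3), ("C", 4),
]

def dedupe_sorted(rows):
    prior_key = object()      # sentinel: never equal to a real key
    out = []
    for row in rows:
        if row != prior_key:  # the stage-variable comparison
            out.append(row)
        prior_key = row
    return out
```

Note the pattern only works on sorted input - which is why the exercise pairs the Transformer with a Sort stage.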
Process some data that has nulls in it. Use the null column in a Transformer concatenate
function with and without a nulltovalue function and with and without a reject link from the
Transformer. This gives you an understanding of how rows get dropped and/or trapped from
a transformer. Explore the right mouse click menu in the Transformer, output some of the
DS Macro values and System Variables to a peek stage and think of uses for them in various
data warehouse scenarios. Ignore DS Routine, it’s not on the test.
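The null-handling behaviour this exercise reveals can be sketched as follows (function names are illustrative stand-ins for the Transformer’s NullToValue and a concatenation derivation):

```python
# Sketch of null handling in a parallel Transformer derivation:
# a derivation that uses a null column without protection fails, and the
# row is dropped - or sent down the reject link if the job has one.
def null_to_value(value, default):
    """Illustrative stand-in for the Transformer NullToValue function."""
    return default if value is None else value

def derive(first, last):
    """Unprotected concatenation: fails (None) when either input is null."""
    if first is None or last is None:
        return None  # row would be dropped or rejected
    return first + " " + last

def derive_safe(first, last):
    """Protected concatenation: substitutes a default so no row is lost."""
    return null_to_value(first, "?") + " " + null_to_value(last, "?")
```

Running the same null-bearing rows through both derivations in a job, with and without a reject link, shows exactly which rows are dropped, rejected, or kept.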
Don’t spend much time on the Modify stage - it would take forever to memorize all its functions.
Just do an exercise on handle_null, string_trim and converting a string to a number. It can be
tricky to get working, and you might not even get a question about it.
I’ve combined these since they overlap. Don’t underestimate this section: it covers a very
narrow range of functionality, so it is an easy set of questions to prepare for and get right.
There are easy points on offer here.
Versions: Any version 7.x is best, version 6 has a completely different lookup stage, version
8 can be used but remember that the Range lookup functionality is new.
Reading: The Parallel Job Developers Guide has a table showing the differences between the
lookup, merge and join stages. Try to memorize the parts of this table about inputs and
outputs and reject links. This is a good place to learn about some more environment
variables. Read the Parallel Job Advanced Developers Guide looking for any environment
variables with the word SORT or SCORE in them.
Exercises: Compare the way join, lookup and merge work. Create a job that switches
between each type.
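To make the comparison concrete, here is a rough sketch of one key semantic difference - how an inner join and a lookup with a reject link treat unmatched rows (pure Python with illustrative data, not DataStage code):

```python
# Inner join: unmatched stream rows simply disappear from the output.
# Lookup with a reject link: unmatched stream rows are captured separately.
reference = {1: "Books", 2: "Music"}        # key -> description
stream = [(1, 9.99), (2, 5.00), (3, 1.50)]  # (key, amount); key 3 unmatched

def inner_join(stream, reference):
    return [(k, amt, reference[k]) for k, amt in stream if k in reference]

def lookup_with_reject(stream, reference):
    matched, rejected = [], []
    for k, amt in stream:
        if k in reference:
            matched.append((k, amt, reference[k]))
        else:
            rejected.append((k, amt))
    return matched, rejected
```

The manuals’ comparison table adds the other dimensions worth memorizing: the number of input links each stage accepts, its reject link rules, and its input sorting requirements.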
Create an annotation and a description annotation and explore the differences between the
two. Use the Multiple Compile tool in the DataStage Manager (version 6, 7) or Designer
(version 8). Create and use shared containers.
Versions: most deployment methods have remained the same from version 6, 7 and 8.
Version 8 has the same import and export functions. DataStage 8 parameter sets will not be
in the version 7 exam.
Reading: you don’t need to install or use the Version Control tool to pass this section;
however, you should read the PDF that comes with it to understand the IBM
recommendations for deployment. It covers the move from dev to test to prod. Read the
Server Job Developers Guide section on command-line calls to DataStage such as dsjob,
dssearch and dsadmin.
Exercises: practice a few dsjob and dssearch commands. Practice saving a log to a text file.
Create some job specific environment variables.
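A few dsjob commands worth practising are sketched below (the project name dstage1 and job name MyJob are placeholders; check the Server Job Developers Guide for the full option list):

```shell
dsjob -lprojects                     # list projects on the server
dsjob -ljobs dstage1                 # list jobs in a project
dsjob -run -jobstatus dstage1 MyJob  # run a job and wait for its finishing status
dsjob -logsum dstage1 MyJob > MyJob_log.txt  # save a log summary to a text file
```

Run these from the DataStage engine bin directory (after sourcing dsenv on Unix) so the client libraries resolve.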