DataStage Basics
Atul Singh
[email protected]
What is DataStage?
DataStage is an ETL tool used to design jobs for Extraction, Transformation, and Load.
It is an ideal tool for data integration projects such as data warehouses and data marts.
9/26/2013 4:25 AM
Data Warehouse
A data warehouse is a subject-oriented, integrated, time-varying, non-volatile collection of data.
Optimized Loader
ERP Systems
Purchased Data
to the
competition
The Motivation
DATA
INFORMATION
UNDERSTANDING
DECISION
INTRODUCTION TO DATASTAGE
What is DataStage?
DataStage is a client-server application.
The server can be installed on either Windows or Unix operating systems.
The clients can be installed on Windows.
The client tools communicate with the DataStage server.
It is used to design jobs for Extraction, Transformation, and Loading (ETL).
It is an ideal tool for data integration projects such as data warehouses, data marts, and system migration.
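To make the three ETL steps concrete, here is a minimal Python sketch of an extract, transform, and load pipeline. It is not DataStage code and does not reflect how DataStage runs jobs; the sample data, the `sales` table, and the function names are all assumptions chosen for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical source data standing in for a sequential file.
SOURCE_CSV = "id,name,amount\n1,ann,10.5\n2,bob,7\n"


def extract(text):
    """Extract: read delimited text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: convert types and tidy up the name column."""
    return [(int(r["id"]), r["name"].title(), float(r["amount"])) for r in rows]


def load(conn, records):
    """Load: write the transformed records into a target table."""
    conn.execute("CREATE TABLE sales (id INTEGER, name TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")  # in-memory target, for illustration only
load(conn, transform(extract(SOURCE_CSV)))
loaded = conn.execute("SELECT id, name, amount FROM sales ORDER BY id").fetchall()
print(loaded)
```

In DataStage the same three steps are drawn as stages connected by links rather than written as functions.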
Extract
Load
Transform
DataStage Architecture
Server (NT/UNIX; Intel, Alpha, Unix, Solaris): Engine
Client (WIN 95/NT): Manager, Designer, Director, Admin
Graphical workflow style tools for point-and-click specifications of sources, targets and transformation requirements
DataStage Terminology
Project: A project is a collection of related jobs.
Job: A job is an executable program built from different stages in the GUI.
Stages: They represent the processing steps required.
Links: They represent the flow of data between different stages.
Shared Containers: They define reusable logic.
Sequences: They allow a sequence of related jobs to be run.
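One way to picture how these terms relate is as a small data model. The following Python sketch is illustrative only and is not DataStage's internal representation; the class names, field names, and the sample job are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Stage:
    """A processing step in a job."""
    name: str
    kind: str  # e.g. "Sequential File" or "Transformer"


@dataclass
class Link:
    """Data flowing from one stage to another."""
    source: str
    target: str


@dataclass
class Job:
    """An executable program built from stages and links."""
    name: str
    stages: list = field(default_factory=list)
    links: list = field(default_factory=list)

    def downstream_of(self, stage_name):
        """Return the names of stages fed directly by the given stage."""
        return [link.target for link in self.links if link.source == stage_name]


@dataclass
class Project:
    """A collection of related jobs."""
    name: str
    jobs: list = field(default_factory=list)


# Hypothetical job: read a file, map its columns, write a file.
job = Job(
    name="LoadCustomers",
    stages=[
        Stage("Customers_In", "Sequential File"),
        Stage("Map_Columns", "Transformer"),
        Stage("Customers_Out", "Sequential File"),
    ],
    links=[
        Link("Customers_In", "Map_Columns"),
        Link("Map_Columns", "Customers_Out"),
    ],
)
project = Project(name="Training", jobs=[job])
print(job.downstream_of("Customers_In"))
```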
DataStage Administrator
Client Logon
DataStage Manager
DataStage Designer
DataStage Director
Developing in DataStage
Define global and project properties in Administrator
Import metadata into Manager
Build the job in Designer
Compile in Designer
Validate, run, and monitor in Director
DataStage Projects
Project Properties
Projects can be created and deleted in Administrator
Project properties and defaults are set in Administrator
Setting Project Properties
To set project properties, log on to Administrator, select your project, and then click Properties.
Environment Variables
Permissions Tab
Tunables Tab
Parallel Tab
What Is Metadata?
Data
Source
Meta Data
Transform
Target
Meta Data
DataStage Manager
Manager Contents
Metadata describing sources and targets: table definitions
DataStage objects: jobs, routines, table definitions, etc.
Export Procedure
In Manager, click Export > DataStage Components
Select DataStage objects for export
Specify the type of export: DSX or XML
Specify the file path on the client machine
Import Procedure
In Manager, click Import > DataStage Components
Select DataStage objects for import
Metadata Import
Import format and column definitions from sequential files
Import relational table column definitions
Imported as "Table Definitions"
Table definitions can be loaded into job stages
What Is a Job?
An executable DataStage program
Created in DataStage Designer, but can use components from Manager
Built using a graphical user interface
Compiles into Orchestrate shell language (OSH)
Designer Toolbar
Show/hide metadata markers
Job properties
Compile
Tools Palette
Transformer Stage
Used to define constraints, derivations, and column mappings
A column mapping maps an input column to an output column
In this module we will just define column mappings (no derivations)
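As a rough illustration of what a column mapping does, the Python sketch below copies each mapped input column to its output column and drops anything unmapped. This is an analogy, not Transformer stage syntax; the column names and the `apply_column_mapping` helper are hypothetical.

```python
def apply_column_mapping(row, mapping):
    """Build an output row by copying each mapped input column.

    mapping: input column name -> output column name.
    Input columns that are not in the mapping are not passed on.
    """
    return {out_col: row[in_col] for in_col, out_col in mapping.items()}


# Hypothetical mapping and input rows.
mapping = {"cust_id": "CustomerID", "cust_name": "CustomerName"}
rows = [
    {"cust_id": 1, "cust_name": "Ann", "unused": "x"},
    {"cust_id": 2, "cust_name": "Bob", "unused": "y"},
]

output_rows = [apply_column_mapping(row, mapping) for row in rows]
print(output_rows)
```

A derivation would compute the output value from an expression instead of copying it, and a constraint would decide whether a row is written to the output link at all.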
Result
Shows in Manager
Annotation stage
Is a stage on the tool palette
Shows on the job GUI (work area)
Compiling a Job
Running Jobs
DataStage Director
Can schedule, validate, and run jobs
Can be invoked from DataStage Manager or Designer
Tools > Run Director
Job Presentation
Naming conventions
Stages are named after the data they access
Container
Partitioner Collector
More Stages
Stage Variables
Show/Hide button
2 sorted input links, 1 output link
"Left outer" on primary input, "right outer" on secondary input
Pre-sorting makes joins "lightweight": few rows need to be in RAM
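The reason pre-sorted inputs keep memory use low can be seen in a merge-style join. The Python sketch below is a generic sort-merge left outer join written for illustration, not DataStage's implementation; the function name and sample rows are assumptions. It buffers only the secondary-input rows that share the current key.

```python
def merge_left_outer_join(left, right, key):
    """Left outer join of two iterables of dicts, both sorted by `key`.

    Only the group of right rows with the current key is held in memory.
    Unmatched left rows are emitted without any right-hand columns.
    """
    right_iter = iter(right)
    right_row = next(right_iter, None)
    group_key = None  # key of the buffered right-hand group
    group = []        # right rows whose key equals group_key

    for left_row in left:
        k = left_row[key]
        if group_key != k:
            group, group_key = [], k
            # Skip right rows whose key is smaller than the current left key.
            while right_row is not None and right_row[key] < k:
                right_row = next(right_iter, None)
            # Buffer the right rows that match the current left key.
            while right_row is not None and right_row[key] == k:
                group.append(right_row)
                right_row = next(right_iter, None)
        if group:
            for match in group:
                yield {**match, **left_row}
        else:
            yield dict(left_row)


# Hypothetical inputs, both already sorted by "id".
primary = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}, {"id": 4, "name": "Dee"}]
secondary = [
    {"id": 1, "city": "Oslo"},
    {"id": 3, "city": "Rome"},
    {"id": 4, "city": "Lima"},
    {"id": 4, "city": "Kyiv"},
]

joined = list(merge_left_outer_join(primary, secondary, "id"))
print(joined)
```

Because both inputs advance in key order, each row is read once and nothing has to be hashed or held for the whole run. A right outer join would follow the same pattern with the roles of the two inputs swapped.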
No basic coding
Job Sequencer
Built like a regular job
Type: "Job Sequence"
Has stages and links
A Job Activity stage represents a DataStage job
Links represent passing control
Stages
Example
Job Activity stage contains conditional triggers
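The idea of links passing control under conditional triggers can be sketched in a few lines of Python. This is a conceptual model only, not how a Job Sequence is defined or executed in DataStage; the job names, the status strings, and the `run_sequence` helper are hypothetical.

```python
def run_sequence(activities, triggers, start):
    """Run activities one after another, following the matching trigger.

    activities: activity name -> callable returning a status string
    triggers:   (activity name, status) -> name of the next activity
    The sequence stops when no trigger matches the last status.
    """
    history = []
    current = start
    while current is not None:
        status = activities[current]()
        history.append((current, status))
        current = triggers.get((current, status))
    return history


# Hypothetical activities standing in for real jobs.
activities = {
    "ExtractJob": lambda: "OK",
    "LoadJob": lambda: "Failed",
    "ReportJob": lambda: "OK",
    "NotifyFailure": lambda: "OK",
}

# Conditional triggers: which activity receives control for each outcome.
triggers = {
    ("ExtractJob", "OK"): "LoadJob",
    ("ExtractJob", "Failed"): "NotifyFailure",
    ("LoadJob", "OK"): "ReportJob",
    ("LoadJob", "Failed"): "NotifyFailure",
}

history = run_sequence(activities, triggers, "ExtractJob")
print(history)
```

Here the failed load passes control to the notification activity instead of the report, which is the kind of branching a conditional trigger expresses.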
QUESTIONS?