50% found this document useful (2 votes)
433 views

DataStage Basics

This document provides an overview of DataStage, an ETL tool used for data integration projects like data warehousing. It describes what DataStage is, its components, and how it is used to design jobs for extracting, transforming and loading data. Key aspects covered include the DataStage architecture, terminology, server and client components, how to develop jobs using the Designer, import and manage metadata, and set project properties.

Uploaded by

DataStage4You
Copyright
© Attribution Non-Commercial (BY-NC)
Available Formats
Download as PPS, PDF, TXT or read online on Scribd
50% found this document useful (2 votes)
433 views

DataStage Basics

This document provides an overview of DataStage, an ETL tool used for data integration projects like data warehousing. It describes what DataStage is, its components, and how it is used to design jobs for extracting, transforming and loading data. Key aspects covered include the DataStage architecture, terminology, server and client components, how to develop jobs using the Designer, import and manage metadata, and set project properties.

Uploaded by

DataStage4You
Copyright
© Attribution Non-Commercial (BY-NC)
Available Formats
Download as PPS, PDF, TXT or read online on Scribd
You are on page 1/ 90

DataWarehousing & DataStage - Session 2

Data Stage : Basics

Atul Singh
[email protected]

DataWarehousing & DataStage - Session 2

What is Datastage?
Datastage is an ETL tool used to design jobs for Extraction, Transformation and Load Ideal Tool for data integration projects-such as data warehouses and data marts.

9/26/2013 4:25 AM

Data Stage: Basics

Atul Singh
[email protected]

DataWarehousing & DataStage - Session 2

What is a DATA WAREHOUSE?


A single, complete and consistent store of data obtained from a variety of different sources made available to end users in a what they can understand and use in a business context.

9/26/2013 4:25 AM 3

Data Stage: Basics

Atul Singh
[email protected]

DataWarehousing & DataStage - Session 2

Data Warehouse
A data warehouse is a
subject-oriented integrated time-varying non-volatile

collection of data that is used primarily in organizational decision making.


-- Bill Inmon, Building the Data Warehouse 1996

9/26/2013 4:25 AM

Data Stage: Basics

Atul Singh
[email protected]

DataWarehousing & DataStage - Session 2

Data Warehouse Architecture


Relational Databases

Optimized Loader

ERP Systems

Extraction Cleansing Data Warehouse Engine Analyze Query

Purchased Data

Legacy Data Metadata Repository 9/26/2013 4:25 AM 5

Data Stage: Basics

Atul Singh
[email protected]

DataWarehousing & DataStage - Session 2

Why Data Warehouse?


Who are our lowest/highest margin customers ?

What is the most effective distribution channel?

Who are my customers and what products are they buying?

What product prom-otions have the biggest impact on revenue?

to the

competition

Which customers are most likely to go

9/26/2013 4:25 AM

What impact will new products/services have on revenue and margins?

Data Stage: Basics

Atul Singh
[email protected]

DataWarehousing & DataStage - Session 2

The Motivation

DATA

INFORMATION

UNDERSTANDING

DECISION

9/26/2013 4:25 AM

Data Stage: Basics

Atul Singh
[email protected]

DataWarehousing & DataStage - Session 2

INTRODUCTION TO DATASTAGE

9/26/2013 4:25 AM

Data Stage: Basics

Atul Singh
[email protected]

DataWarehousing & DataStage - Session 2

What is DataStage ?
DataStage is a client server application. Server can be installed in either Windows or Unix Operating Systems. Client can be installed in Windows Communication between the client tools and DataStage server Design jobs for Extraction, Transformation, and Loading (ETL) Ideal Tool for data integration projects-such as data warehouses, data marts and system migration.

9/26/2013 4:25 AM

Data Stage: Basics

9/26/2013

Atul Singh
[email protected]

DataWarehousing & DataStage - Session 2

DATASTAGE -- ETL TOOL

Extract

Load

Transform

The Foundation for Business Intelligence

9/26/2013 4:25 AM

10

Data Stage: Basics

Atul Singh
[email protected]

DataWarehousing & DataStage - Session 2

DataStage Architecture
SERVER

Multiple processes dynamically stated to support Parallelization and development


Extraction/ Transformation rules and Metadata

NT/ UNIX
Intel Alpha Unix Solaris ENGINE

WIN 95/NT

MANAGER
CLIENT

DESIGNER

DIRECTOR

ADMIN

Graphical workflow style tools for point-and-click specifications of sources, targets and transformation requirements
9/26/2013 4:25 AM 11

Data Stage: Basics

Atul Singh
[email protected]

DataWarehousing & DataStage - Session 2

DataStage Server and Clients

9/26/2013 4:25 AM

12

Data Stage: Basics

Atul Singh
[email protected]

DataWarehousing & DataStage - Session 2

DataStage Terminology
Project: A Project is a collection of related Jobs Job : A job is an executable Program which is built using different stages in GUI Stages: They represent the processing steps required. Links: They represent the flow of data between different stages. Shared Containers: Defines reusable logic Sequences: Allows to run a sequence of related Jobs.

9/26/2013 4:25 AM

13

Data Stage: Basics

Atul Singh
[email protected]

DataWarehousing & DataStage - Session 2

DATASTAGE SERVER COMPONENTS:


REPOSITORY Central Store for all info required to build a DW

DATASTAGE SERVER Runs jobs

DATASTAGE CLIENT COMPONENTS:


ADMINISTRATOR To create and move projects, set up DS users DESIGNER Design Interface to create DS jobs DIRECTOR GUI to schedule, run and monitor jobs. MANAGER GUI to view and edit Repository contents
9/26/2013 4:25 AM 14

Data Stage: Basics

Atul Singh
[email protected]

DataWarehousing & DataStage - Session 2

DataStage Administrator

9/26/2013 4:25 AM

15

Data Stage: Basics

Atul Singh
[email protected]

DataWarehousing & DataStage - Session 2

Client Logon

9/26/2013 4:25 AM

16

Data Stage: Basics

Atul Singh
[email protected]

DataWarehousing & DataStage - Session 2

DataStage Manager

9/26/2013 4:25 AM

17

Data Stage: Basics

Atul Singh
[email protected]

DataWarehousing & DataStage - Session 2

DataStage Designer

9/26/2013 4:25 AM

18

Data Stage: Basics

Atul Singh
[email protected]

DataWarehousing & DataStage - Session 2

DataStage Director

9/26/2013 4:25 AM

19

Data Stage: Basics

Atul Singh
[email protected]

DataWarehousing & DataStage - Session 2

Developing in DataStage
Define global and project properties in Administrator Import meta data into Manager Build job in Designer Compile Designer Validate, run, and monitor in Director

9/26/2013 4:25 AM

20

Data Stage: Basics

Atul Singh
[email protected]

DataWarehousing & DataStage - Session 2

DataStage Projects

9/26/2013 4:25 AM

21

Data Stage: Basics

Atul Singh
[email protected]

DataWarehousing & DataStage - Session 2

Project Properties
Projects can be created and deleted in Administrator Project properties and defaults are set in Administrator

9/26/2013 4:25 AM

22

Data Stage: Basics

Atul Singh
[email protected]

DataWarehousing & DataStage - Session 2

Setting Project Properties To set project properties, log onto Administrator, select your project, and then click Properties

9/26/2013 4:25 AM

23

Data Stage: Basics

Atul Singh
[email protected]

DataWarehousing & DataStage - Session 2

Projects General Tab

9/26/2013 4:25 AM

24

Data Stage: Basics

Atul Singh
[email protected]

DataWarehousing & DataStage - Session 2

Environment Variables

9/26/2013 4:25 AM

25

Data Stage: Basics

Atul Singh
[email protected]

DataWarehousing & DataStage - Session 2

Permissions Tab

9/26/2013 4:25 AM

26

Data Stage: Basics

Atul Singh
[email protected]

DataWarehousing & DataStage - Session 2

Tunables Tab

9/26/2013 4:25 AM

27

Data Stage: Basics

Atul Singh
[email protected]

DataWarehousing & DataStage - Session 2

Parallel Tab

9/26/2013 4:25 AM

28

Data Stage: Basics

Atul Singh
[email protected]

DataWarehousing & DataStage - Session 2

Managing Meta Data

Data Stage: Basics

Atul Singh
[email protected]

DataWarehousing & DataStage - Session 2

What Is Metadata?

Data

Source
Meta Data

Transform

Target
Meta Data

Meta Data Repository


9/26/2013 4:25 AM 30

Data Stage: Basics

Atul Singh
[email protected]

DataWarehousing & DataStage - Session 2

DataStage Manager

9/26/2013 4:25 AM

31

Data Stage: Basics

Atul Singh
[email protected]

DataWarehousing & DataStage - Session 2

Manager Contents
Metadata describing sources and targets: Table definitions DataStage objects: jobs, routines, table definitions, etc.

9/26/2013 4:25 AM

32

Data Stage: Basics

Atul Singh
[email protected]

DataWarehousing & DataStage - Session 2

Import and Export


Any object in Manager can be exported to a file Can export whole projects Use for backup Sometimes used for version control Can be used to move DataStage objects from one project to another Use to share DataStage jobs and projects with other developers

9/26/2013 4:25 AM

33

Data Stage: Basics

Atul Singh
[email protected]

DataWarehousing & DataStage - Session 2

Export Procedure
In Manager, click Export>DataStage Components Select DataStage objects for export Specified type of export: DSX, XML Specify file path on client machine

9/26/2013 4:25 AM

34

Data Stage: Basics

Atul Singh
[email protected]

DataWarehousing & DataStage - Session 2

Exporting DataStage Objects

9/26/2013 4:25 AM

35

Data Stage: Basics

Atul Singh
[email protected]

DataWarehousing & DataStage - Session 2

Exporting DataStage Objects

9/26/2013 4:25 AM

36

Data Stage: Basics

Atul Singh
[email protected]

DataWarehousing & DataStage - Session 2

Import Procedure
In Manager, click Import>DataStage Components Select DataStage objects for import

9/26/2013 4:25 AM

37

Data Stage: Basics

Atul Singh
[email protected]

DataWarehousing & DataStage - Session 2

Importing DataStage Objects

9/26/2013 4:25 AM

38

Data Stage: Basics

Atul Singh
[email protected]

DataWarehousing & DataStage - Session 2

Metadata Import
Import format and column destinations from sequential files Import relational table column destinations Imported as Table Definitions Table definitions can be loaded into job stages

9/26/2013 4:25 AM

39

Data Stage: Basics

Atul Singh
[email protected]

DataWarehousing & DataStage - Session 2

Sequential File Import Procedure


In Manager, click Import>Table Definitions>Sequential File Definitions Select directory containing sequential file and then the file Select Manager category Examined format and column definitions and edit is necessary

9/26/2013 4:25 AM

40

Data Stage: Basics

Atul Singh
[email protected]

DataWarehousing & DataStage - Session 2

Designing and Documenting Jobs

Data Stage: Basics

Atul Singh
[email protected]

DataWarehousing & DataStage - Session 2

What Is a Job?
Executable DataStage program Created in DataStage Designer, but can use components from Manager Built using a graphical user interface Compiles into Orchestrate shell language (OSH)

9/26/2013 4:25 AM

42

Data Stage: Basics

Atul Singh
[email protected]

DataWarehousing & DataStage - Session 2

Job Development Overview


In Manager, import metadata defining sources and targets In Designer, add stages defining data extractions and loads And Transformers and other stages to defined data transformations Add links defining the flow of data from sources to targets Compiled the job In Director, validate, run, and monitor your job
9/26/2013 4:25 AM 43

Data Stage: Basics

Atul Singh
[email protected]

DataWarehousing & DataStage - Session 2

Designer Work Area

9/26/2013 4:25 AM

44

Data Stage: Basics

Atul Singh
[email protected]

DataWarehousing & DataStage - Session 2

Designer Toolbar
Show/hide metadata markers

Job properties

Compile

9/26/2013 4:25 AM

45

Data Stage: Basics

Atul Singh
[email protected]

DataWarehousing & DataStage - Session 2

Tools Palette

9/26/2013 4:25 AM

46

Data Stage: Basics

Atul Singh
[email protected]

DataWarehousing & DataStage - Session 2

Adding Stages and Links


Stages can be dragged from the tools palette or from the stage type branch of the repository view Links can be drawn from the tools palette or by right clicking and dragging from one stage to another

9/26/2013 4:25 AM

47

Data Stage: Basics

Atul Singh
[email protected]

DataWarehousing & DataStage - Session 2

Sequential File Stage


Used to extract data from, or load data to, a sequential file Specify full path to the file Specify a file format: fixed width or delimited Specified column definitions Specify write action

9/26/2013 4:25 AM

48

Data Stage: Basics

Atul Singh
[email protected]

DataWarehousing & DataStage - Session 2

Designer - Create New Job

9/26/2013 4:25 AM

49

Data Stage: Basics

Atul Singh
[email protected]

DataWarehousing & DataStage - Session 2

Drag Stages and Links Using Palette

9/26/2013 4:25 AM

50

Data Stage: Basics

Atul Singh
[email protected]

DataWarehousing & DataStage - Session 2

Assign Meta Data

9/26/2013 4:25 AM

51

Data Stage: Basics

Atul Singh
[email protected]

DataWarehousing & DataStage - Session 2

Editing a Sequential Source Stage

9/26/2013 4:25 AM

52

Data Stage: Basics

Atul Singh
[email protected]

DataWarehousing & DataStage - Session 2

Transformer Stage
Used to define constraints, derivations, and column mappings A column mapping maps an input column to an output column In this module will just defined column mappings (no derivations)

9/26/2013 4:25 AM

53

Data Stage: Basics

Atul Singh
[email protected]

DataWarehousing & DataStage - Session 2

Transformer Stage Elements

9/26/2013 4:25 AM

54

Data Stage: Basics

Atul Singh
[email protected]

DataWarehousing & DataStage - Session 2

Create Column Mappings

9/26/2013 4:25 AM

55

Data Stage: Basics

Atul Singh
[email protected]

DataWarehousing & DataStage - Session 2

Creating Stage Variables

9/26/2013 4:25 AM

56

Data Stage: Basics

Atul Singh
[email protected]

DataWarehousing & DataStage - Session 2

Result

9/26/2013 4:25 AM

57

Data Stage: Basics

Atul Singh
[email protected]

DataWarehousing & DataStage - Session 2

Adding Job Parameters


Makes the job more flexible Parameters can be:
Used in constraints and derivations Used in directory and file names

Parameter values are determined at run time

9/26/2013 4:25 AM

58

Data Stage: Basics

Atul Singh
[email protected]

DataWarehousing & DataStage - Session 2

Adding Job Documentation


Job Properties
Short and long descriptions

Shows in Manager

Annotation stage
Is a stage on the tool palette Shows on the job GUI (work area)

9/26/2013 4:25 AM

59

Data Stage: Basics

Atul Singh
[email protected]

DataWarehousing & DataStage - Session 2

Job Properties Documentation

9/26/2013 4:25 AM

60

Data Stage: Basics

Atul Singh
[email protected]

DataWarehousing & DataStage - Session 2

Annotation Stage on the Palette

9/26/2013 4:25 AM

61

Data Stage: Basics

Atul Singh
[email protected]

DataWarehousing & DataStage - Session 2

Annotation Stage Properties

9/26/2013 4:25 AM

62

Data Stage: Basics

Atul Singh
[email protected]

DataWarehousing & DataStage - Session 2

Final Job Work Area with Documentation

9/26/2013 4:25 AM

63

Data Stage: Basics

Atul Singh
[email protected]

DataWarehousing & DataStage - Session 2

Compiling a Job

9/26/2013 4:25 AM

64

Data Stage: Basics

Atul Singh
[email protected]

DataWarehousing & DataStage - Session 2

Errors or Successful Message

9/26/2013 4:25 AM

65

Data Stage: Basics

Atul Singh
[email protected]

DataWarehousing & DataStage - Session 2

Running Jobs

Data Stage: Basics

Atul Singh
[email protected]

DataWarehousing & DataStage - Session 2

Prerequisite to Job Execution


Result from Designer compile

9/26/2013 4:25 AM

67

Data Stage: Basics

Atul Singh
[email protected]

DataWarehousing & DataStage - Session 2

DataStage Director
Can schedule, validating, and run jobs Can be invoked from DataStage Manager or Designer
Tools > Run Director

9/26/2013 4:25 AM

68

Data Stage: Basics

Atul Singh
[email protected]

DataWarehousing & DataStage - Session 2

Running Your Job

9/26/2013 4:25 AM

69

Data Stage: Basics

Atul Singh
[email protected]

DataWarehousing & DataStage - Session 2

Run Options Parameters and Limits

9/26/2013 4:25 AM

70

Data Stage: Basics

Atul Singh
[email protected]

DataWarehousing & DataStage - Session 2

Director Log View

9/26/2013 4:25 AM

71

Data Stage: Basics

Atul Singh
[email protected]

DataWarehousing & DataStage - Session 2

Message Details are Available

9/26/2013 4:25 AM

72

Data Stage: Basics

Atul Singh
[email protected]

DataWarehousing & DataStage - Session 2

Other Director Functions


Schedule job to run on a particular date/time Clear job log Set Director options
Row limits Abort after x warnings

9/26/2013 4:25 AM

73

Data Stage: Basics

Atul Singh
[email protected]

DataWarehousing & DataStage - Session 2

Standards and Techniques

Data Stage: Basics

Atul Singh
[email protected]

DataWarehousing & DataStage - Session 2

Job Presentation

Document using the annotation stage

9/26/2013 4:25 AM

75

Data Stage: Basics

Atul Singh
[email protected]

DataWarehousing & DataStage - Session 2

Naming conventions
Stages named after the
Data they access

Function they perform


DO NOT leave defaulted stage names like Sequential_File_0

Links named for the data they carry


DO NOT leave defaulted link names like DSLink3

9/26/2013 4:25 AM

76

Data Stage: Basics

Atul Singh
[email protected]

DataWarehousing & DataStage - Session 2

Stage and Link Names

Stages and links renamed to data they handle

9/26/2013 4:25 AM

77

Data Stage: Basics

Atul Singh
[email protected]

DataWarehousing & DataStage - Session 2

Create Reusable Job Components

Container

9/26/2013 4:25 AM

78

Data Stage: Basics

Atul Singh
[email protected]

DataWarehousing & DataStage - Session 2

Partitioning and Collecting Icons

Partitioner Collector

9/26/2013 4:25 AM

79

Data Stage: Basics

Atul Singh
[email protected]

DataWarehousing & DataStage - Session 2

More Stages

Data Stage: Basics

Atul Singh
[email protected]

DataWarehousing & DataStage - Session 2

Transformer Stage Functions


Control data flow Create derivations

9/26/2013 4:25 AM

81

Data Stage: Basics

Atul Singh
[email protected]

DataWarehousing & DataStage - Session 2

Rejecting Data Example

Property Reject Mode = Output

If Not Found property

Constraint Other/log option

9/26/2013 4:25 AM

82

Data Stage: Basics

Atul Singh
[email protected]

DataWarehousing & DataStage - Session 2

Transformer Stage Properties

9/26/2013 4:25 AM

83

Data Stage: Basics

Atul Singh
[email protected]

DataWarehousing & DataStage - Session 2

Stage Variables

Show/Hide button

9/26/2013 4:25 AM

84

Data Stage: Basics

Atul Singh
[email protected]

DataWarehousing & DataStage - Session 2

1. The Join Stage


Four types:

2 sorted input links, 1 output link "left outer" on primary input, "right outer" on secondary input Pre-sort make joins "lightweight": few rows need to be in RAM

Inner Left Outer Right Outer Full Outer

9/26/2013 4:25 AM

85

Data Stage: Basics

Atul Singh
[email protected]

DataWarehousing & DataStage - Session 2

Job Control Using the Job Sequencer

Data Stage: Basics

Atul Singh
[email protected]

DataWarehousing & DataStage - Session 2

Job Control Options


Manually write job control Code generated in Basic Use the job control tab on the job properties page Generates basic code which you can modify Job Sequencer Build a controlling job much the same way you build other jobs Comprised of stages and links

No basic coding
9/26/2013 4:25 AM 87

Data Stage: Basics

Atul Singh
[email protected]

DataWarehousing & DataStage - Session 2

Job Sequencer
Build like a regular job Type Job Sequence Has stages and links Job Activity stage represents a DataStage job Links represent passing control
9/26/2013 4:25 AM 88

Stages

Data Stage: Basics

Atul Singh
[email protected]

DataWarehousing & DataStage - Session 2

Example
Job Activity stage contains conditional triggers

9/26/2013 4:25 AM

89

Data Stage: Basics

Atul Singh
[email protected]

DataWarehousing & DataStage - Session 2

QUESTIONS ??????????
9/26/2013 4:25 AM 90

Data Stage: Basics

Atul Singh
[email protected]

You might also like