Data Warehousing Basics - III: Associate ID: 178467

Associate ID: 178467
Data Warehousing Basics - III
C3: Protected
2 2007, Cognizant Technology Solutions Confidential
About the Author
Created By: Dhana Lakshmi Thirunavukkarasu (154180)
Credential Information: 4 years of experience in DW and BI tools.
Version and Date: DWBASIC/PPT/0807/1.0
Icons Used
Questions
Contacts
Reference
Demonstration
Hands-on Exercise
Coding Standards
Test Your Understanding
Tools
A Welcome Break
Data Warehousing Basics - III: Overview
Introduction:
This chapter explains the following topics:
ETL process
Types of ETL
Metadata and its types
Data Warehousing tools
Data Warehousing Basics - III: Objective
Objective:
After completing this chapter, you will be able to:
Explain ETL concepts
Describe Metadata
What is ETL?
ETL (Extraction, Transformation, and Loading) is a process by
which data is integrated and transformed from the operational
systems into the Data Warehouse environment.
ETL can be one of the most challenging warehousing tasks.
Various ETL scenarios:
Different source data formats
Incremental updates
Inconsistent filenames
Missing column headers
ETL Process
Steps involved in the ETL process:
Capture
Scrub or data cleansing
Transform
Load and Index
ETL Process: Capture
Capture = extract: obtaining a snapshot of a chosen subset of the source data for
loading into the data warehouse
Static extract: Capturing a snapshot of the source data at a point in time
Incremental extract: Capturing changes that have occurred since the last static
extract
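The two capture styles can be sketched as follows. This is a minimal illustration, not a definitive implementation: the in-memory SOURCE_ROWS list, the updated_at change-tracking column, and both function names are assumptions made for the example.

```python
from datetime import datetime

# Hypothetical source rows; "updated_at" is an assumed change-tracking column.
SOURCE_ROWS = [
    {"id": 1, "name": "Asha", "updated_at": datetime(2007, 8, 1, 9, 0)},
    {"id": 2, "name": "Ravi", "updated_at": datetime(2007, 8, 3, 14, 30)},
    {"id": 3, "name": "Meena", "updated_at": datetime(2007, 8, 5, 11, 15)},
]


def static_extract(rows):
    """Static extract: copy every source row as it stands right now."""
    return [dict(row) for row in rows]


def incremental_extract(rows, last_extract_time):
    """Incremental extract: copy only rows changed after the previous extract."""
    return [dict(row) for row in rows if row["updated_at"] > last_extract_time]


snapshot = static_extract(SOURCE_ROWS)
changes = incremental_extract(SOURCE_ROWS, datetime(2007, 8, 2))
print(len(snapshot), [row["id"] for row in changes])
```

A timestamp column is only one way to detect changes; log-based or trigger-based capture would serve the same purpose.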
ETL Process: Scrub
Scrub = cleanse: uses pattern
recognition and AI techniques to
upgrade data quality
Fixing errors: Misspellings, erroneous
dates, incorrect field usage, mismatched
addresses, missing data, duplicate data,
inconsistencies
It also performs: Decoding, reformatting,
time stamping, conversion, key generation,
merging, error detection/logging, locating
missing data
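A small rule-based sketch of a few of these scrubbing tasks (reformatting, erroneous dates, missing data, duplicates, error logging) is given below. It uses plain regular expressions and date parsing rather than AI techniques, and the record layout (name, order_date) and the scrub function are hypothetical.

```python
import re
from datetime import datetime


def scrub(records):
    """Cleanse records: normalise names, validate dates, drop duplicates."""
    cleaned, errors, seen = [], [], set()
    for rec in records:
        # Reformatting: collapse repeated whitespace and normalise capitalisation.
        name = re.sub(r"\s+", " ", rec.get("name", "")).strip().title()
        raw_date = rec.get("order_date") or ""
        try:
            order_date = datetime.strptime(raw_date, "%d-%m-%Y").date()
        except ValueError:
            # Error detection/logging: keep the bad record and the reason.
            errors.append((rec, "erroneous or missing date"))
            continue
        key = (name, order_date)
        if key in seen:
            continue  # duplicate data is skipped
        seen.add(key)
        cleaned.append({"name": name, "order_date": order_date})
    return cleaned, errors


raw = [
    {"name": "  asha   kumar", "order_date": "15-10-1992"},
    {"name": "Asha Kumar", "order_date": "15-10-1992"},  # duplicate after cleanup
    {"name": "Ravi", "order_date": "31-02-2007"},        # impossible date
    {"name": "Meena"},                                   # missing date
]
cleaned, errors = scrub(raw)
print(cleaned, len(errors))
```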
ETL Process: Transform
Transform = convert data from the
format of the operational system to the
format of the data warehouse
Record-level:
Selection: Data partitioning
Joining: Data combining
Aggregation: Data summarization
Field-level:
Single-field: From one field to one field
Multi-field: From many fields to one,
or one field to many
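The three record-level operations can be illustrated with a short sketch. The orders and customers data and all field names are made up for the example.

```python
from collections import defaultdict

# Hypothetical operational data.
orders = [
    {"order_id": 1, "cust_id": 10, "region": "South", "amount": 250.0},
    {"order_id": 2, "cust_id": 11, "region": "North", "amount": 100.0},
    {"order_id": 3, "cust_id": 10, "region": "South", "amount": 150.0},
]
customers = {10: "Asha", 11: "Ravi"}

# Selection: partition the data by a criterion.
south_orders = [o for o in orders if o["region"] == "South"]

# Joining: combine each order with the matching customer name.
joined = [{**o, "cust_name": customers[o["cust_id"]]} for o in orders]

# Aggregation: summarise order amounts per customer.
totals = defaultdict(float)
for o in joined:
    totals[o["cust_name"]] += o["amount"]

print(len(south_orders), dict(totals))
```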
ETL Process: Load and Index
Load/Index = place transformed
data into the warehouse and create
indexes
Refresh mode: Bulk rewriting of target
data at periodic intervals
Update mode: Only changes in source
data are written to the data warehouse
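The difference between the two load modes can be sketched with Python's built-in sqlite3 module. The sales table, its columns, the index name, and the two helper functions are assumptions for illustration; a production warehouse would normally use its own bulk-load utilities.

```python
import sqlite3


def refresh_load(conn, rows):
    """Refresh mode: bulk-rewrite the whole target table."""
    with conn:  # commits on success, rolls back on error
        conn.execute("DELETE FROM sales")
        conn.executemany("INSERT INTO sales (sale_id, amount) VALUES (?, ?)", rows)


def update_load(conn, changed_rows):
    """Update mode: write only the new or changed rows."""
    with conn:
        conn.executemany(
            "INSERT OR REPLACE INTO sales (sale_id, amount) VALUES (?, ?)",
            changed_rows,
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, amount REAL)")
conn.execute("CREATE INDEX idx_sales_amount ON sales (amount)")  # the index step

refresh_load(conn, [(1, 100.0), (2, 200.0)])
update_load(conn, [(2, 250.0), (3, 300.0)])
result = conn.execute("SELECT sale_id, amount FROM sales ORDER BY sale_id").fetchall()
print(result)
```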
Extraction Types
The following diagram explains the types of extraction:
[Diagram: Extraction branches into Full Extract and Periodic/Incremental Extract]
Full Extract
The following diagram explains the full extraction:
[Diagram: Source System -> Full Extract -> Data Mart (existing data)]
Full Extract (Contd.)
The following diagram explains the full extraction:
[Diagram: Source System -> Full Extract -> Data Mart (new data)]
Incremental Extract
Incremental extraction is known as change data capture. The
following diagram explains the incremental extraction:
[Diagram: Source System -> Incremental Extract (incremental data) -> Data Mart (existing data)]
Incremental Extract (Contd.)
The following diagram explains the incremental extraction:
[Diagram: Source System (new data, changed data) -> Incremental Extract (incremental data) -> Data Mart (existing data)]
Incremental Extract (Contd.)
The following diagram explains the incremental extraction:
[Diagram: Source System (new data, changed data) -> Incremental Extract (incremental data) -> Data Mart, which then holds unchanged data, existing data updated using changed data, and newly inserted data as the incremental addition to the data mart]
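How incremental data is applied to a data mart can be sketched as a keyed merge. The dictionary-based data mart, the qty field, and the apply_incremental function are hypothetical and only illustrate the three outcomes: newly inserted, updated, and unchanged data.

```python
def apply_incremental(data_mart, incremental_data):
    """Merge incremental data (keyed by record id) into the data mart in place."""
    inserted, updated = [], []
    for key, row in incremental_data.items():
        if key in data_mart:
            updated.append(key)    # existing data updated using changed data
        else:
            inserted.append(key)   # newly inserted data
        data_mart[key] = row
    unchanged = [key for key in data_mart if key not in incremental_data]
    return inserted, updated, unchanged


data_mart = {1: {"qty": 5}, 2: {"qty": 7}}
incremental = {2: {"qty": 9}, 3: {"qty": 4}}
inserted, updated, unchanged = apply_incremental(data_mart, incremental)
print(inserted, updated, unchanged)
```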
Structural Transformation
Additive: OLTP (orders arrive every two minutes) -> Aggregate -> Data Warehouse
Average: OLTP (daily productivity figures) -> Average -> Data Warehouse
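The two structural transformations on this slide, additive aggregation and averaging, might look like this in code. The order timestamps, amounts, and productivity figures are invented sample values.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical OLTP orders arriving through the day: (timestamp, amount).
orders = [
    ("2007-08-01 09:00", 120.0),
    ("2007-08-01 09:02", 80.0),
    ("2007-08-02 10:00", 50.0),
]

# Additive: roll fine-grained orders up into one total per day.
daily_total = defaultdict(float)
for timestamp, amount in orders:
    daily_total[timestamp[:10]] += amount  # first 10 characters hold the date

# Average: store the mean of daily productivity figures instead of each figure.
daily_productivity = [42.0, 38.0, 40.0]
average_productivity = mean(daily_productivity)

print(dict(daily_total), average_productivity)
```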
Format Transformation
Data Type Conversions:
Source Schema: "32" (age as a string) -> Transformation -> Target Schema: 32 (age as an integer)
Splitting:
Source Schema: 15-10-1992 (date as a string) -> Transformation -> Target Schema: 15, 10, 1992 (day, month, year; date as a combination of 3 integer fields)
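Both format transformations can be sketched in a few lines. The function names and the day-month-year ordering with "-" separators are assumptions based on the sample values on the slide.

```python
def convert_age(age_text):
    """Data type conversion: age stored as a string becomes an integer."""
    return int(age_text.strip())


def split_date(date_text):
    """Splitting: a DD-MM-YYYY string becomes three integer fields."""
    day, month, year = (int(part) for part in date_text.split("-"))
    return {"day": day, "month": month, "year": year}


age = convert_age("32")
date_fields = split_date("15-10-1992")
print(age, date_fields)
```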
Simple Conversions
Source Schema -> Transformation -> Target Schema
Revenue in Rupees: Rs. 10000 -> Multiply by 1/43 -> Revenue in Dollars: $232.56
Production in Pounds: 1000 lbs. -> Multiply by 0.4536 -> Production in Kilograms: 453.6 kgs.
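The same conversions can be written as small functions. The factors are the illustrative ones from the slide (43 rupees per dollar, 0.4536 kg per pound); the constant and function names are assumptions, and a real job would read the exchange rate from a reference table.

```python
RUPEES_PER_DOLLAR = 43   # illustrative rate taken from the slide
KG_PER_POUND = 0.4536    # conversion factor taken from the slide


def rupees_to_dollars(rupees):
    """Convert revenue in rupees to dollars, rounded to cents."""
    return round(rupees / RUPEES_PER_DOLLAR, 2)


def pounds_to_kilograms(pounds):
    """Convert production in pounds to kilograms, rounded to two decimals."""
    return round(pounds * KG_PER_POUND, 2)


revenue_usd = rupees_to_dollars(10000)
production_kg = pounds_to_kilograms(1000)
print(revenue_usd, production_kg)
```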
Single-field Transformation
In general, a transformation function
translates data from the old form
to the new form
Algorithmic transformation:
uses a formula or logical
expression
Table lookup: another
approach, which uses a separate table keyed by the source value
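The two single-field approaches can be contrasted in a short sketch. The temperature formula and the state-code lookup table are hypothetical examples chosen for illustration; they do not come from the slide.

```python
def fahrenheit_to_celsius(fahrenheit):
    """Algorithmic transformation: the new value is computed by a formula."""
    return (fahrenheit - 32) * 5 / 9


# Table lookup: the new value is found in a separate table keyed by the old value.
STATE_LOOKUP = {"TN": "Tamil Nadu", "KA": "Karnataka"}  # hypothetical codes


def lookup_state(code):
    """Return the full state name, or UNKNOWN when the code is not in the table."""
    return STATE_LOOKUP.get(code, "UNKNOWN")


print(fahrenheit_to_celsius(212), lookup_state("TN"), lookup_state("XX"))
```

A lookup table is usually preferred when the mapping has no formula, for example when translating codes into descriptions.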
Multi-field Transformation
1:M: from one source
field to many target
fields
M:1: from many
source fields to one
target field
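A minimal sketch of both directions follows. The field names (full_name, quantity, unit_price) and the two functions are assumptions made for the example.

```python
def split_full_name(full_name):
    """1:M: one source field becomes two target fields."""
    first, _, last = full_name.strip().partition(" ")
    return {"first_name": first, "last_name": last}


def total_price(quantity, unit_price):
    """M:1: two source fields are combined into one target field."""
    return quantity * unit_price


name_fields = split_full_name("Ralph Kimball")
line_total = total_price(3, 2.5)
print(name_fields, line_total)
```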
Why ETL Tools?
An ETL tool:
Provides a facility to specify a large number of transformation rules through
a GUI.
Generates programs to transform data.
Handles multiple data sources.
Handles data redundancy.
Generates metadata as output.
Most tools exploit parallelism by running on multiple low-cost servers
in a multithreaded environment.
What is a Repository?
A repository is a database containing the enterprise's metadata
(data about data), together with an access and reporting mechanism for
that database.
Ideal Repository
An ideal repository has the following characteristics:
Openness: Standard APIs and development environment
Flexibility: Heterogeneous platform capabilities
Usability: Web-based access, good GUI capabilities
Extensibility: Definable, extensible information models, and functions
Versioning: Sophisticated configuration management capabilities
Performance: Optimised data store
What is Metadata?
Describes the WHAT, WHEN, WHO, WHERE, and HOW of the Data
Warehouse.
Describes the data being captured and loaded into the
Warehouse.
Is documented in IT tools, which improves both business and
technical understanding of data and data-related processes.
What is Metadata? (Contd.)
Defining Metadata
Simplest definition: Data about data.
Database table (columns): SALE_ID, CUST_ID, ITEM, DATE, QUANTITY, UNIT_PRICE, LOCATION, PROMOTION
Metadata:
TABLE_OWNER: MIS_OWNER
CREATE DATE: 25 OCT 2002 21:54:00
LAST MODIFIED DATE: 03 MAR 2003 09:30:00
LAST MODIFIED BY: MIS OWNER
PURPOSE: This table tracks customer sales
LINKS TO RELATED REPORTS: TOTAL SALES, CUSTOMER PROFILES
Importance of Metadata
Locating information:
How much time is spent looking for information?
How often is the information found?
What poor decisions were made based on incomplete information?
How much money was lost or earned as a result?
Importance of Metadata (Contd.)
Interpreting information:
How many times have businesses needed to rework or recall products?
What impact does it have on the bottom line?
How many mistakes were due to misinterpretation of existing
documentation?
How much interpretation results from too much metadata?
How much time is spent trying to determine whether any of the metadata is
accurate?
Importance of Metadata (Contd.)
Integrating information:
How do various data perspectives connect together?
How much time is spent trying to figure that out?
How much do inefficiency and the lack of metadata affect decision
making?
Consumers of Metadata
[Diagram: Metadata at the centre, surrounded by its consumers]
User: What data and prebuilt queries exist
S/W tools (ETL, modelling, OLAP, etc.): Support development of the data warehouse and data mart
DBA: Impact of changes in the operational system on the data warehouse and data mart
Developer: Impact analysis
Metadata Types
ETL metadata:
Holds only metadata related to ETL process.
Examples:
Sources/Targets:
Table name, comment, DB Type, DBD String
Fields:
Name, type, length, comment, nullable
Mappings:
Name, comment, source, target, transformation objects
Fields/ports:
Name, type, length, comment, transformation logic
Sessions:
Name, start time, repeat count, source/target connection, overrides
Transformation objects, mapplets, batches, folders
Metadata Types (Contd.)
Database metadata:
Holds database-related metadata.
Examples:
Tables/Views:
Name, comment, owner, size, indices, triggers
Fields:
Name, type, length, comment, nullable
Stored Procedures/Triggers:
Name, comment, code
Indices:
Name, comment, table.field(s)
Users:
Name, password, permissions
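Most databases expose this kind of metadata through a system catalog. As a hedged illustration, the sketch below reads table, field, and index metadata from SQLite using Python's built-in sqlite3 module; the sales table and index are hypothetical, and other databases use different catalog views.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, "
    "cust_id INTEGER NOT NULL, item TEXT)"
)
conn.execute("CREATE INDEX idx_sales_cust ON sales (cust_id)")

# Tables: names come from the sqlite_master catalog table.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
)]

# Fields: PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
fields = [
    (name, col_type, not bool(notnull))  # third item is "nullable"
    for _, name, col_type, notnull, _, _ in conn.execute("PRAGMA table_info(sales)")
]

# Indices: PRAGMA index_list yields the index name in the second column.
indices = [row[1] for row in conn.execute("PRAGMA index_list(sales)")]

print(tables, fields, indices)
```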
Metadata Types (Contd.)
Reporting metadata:
Holds Reporting metadata details.
Examples:
Reports:
Name, comment, table, fields, transformation logic
Tables:
Name, comment, owner
Fields:
Name, type, length, comment, nullable
Each BI product is unique:
Business Objects:
Universes, domains, documents, classes, objects
MicroStrategy:
Filters, templates, metrics, transformations
Metadata Types (Contd.)
Others:
Examples:
Business Rules:
Name, comment, owner, description/pseudocode
End user requirements:
Name, comment, owner, description
Test Cases:
Name, comment, owner, description
Servers:
Name, department, CPUs, memory, location, admin contact
Databases:
Name, department, type, location, admin contact
Resources:
Name, comment, department, phone, email
Dormant Data
Data that is hardly used in a Data Warehouse is called
dormant data.
The faster a Data Warehouse grows, the more data becomes
dormant. Over a period of time, the amount of dormant data in
a Data Warehouse increases.
Origins of Dormant Data
Storing history data that is not required
Storing columns that are never used
Storing detail level data when only summary level data is used
Creating summary data that is never used
Strategy for Removing Dormant Data
The strategy for removing dormant data might include:
Removing data after a period of time, say after two years.
Removing summary data that has not been accessed in the past six
months.
Removing columns that have never or only very infrequently been
accessed.
Retaining data for high-profile users even though that data has not been
accessed.
Retaining data for selected accounts even though that data has not been
accessed.
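A removal rule of this kind can be sketched as a filter over access records. The row layout (id, owner, last_accessed), the 183-day idle threshold standing in for "six months", and the find_removable function are all assumptions for illustration.

```python
from datetime import date, timedelta


def find_removable(rows, today, max_idle_days=183, protected_owners=frozenset()):
    """Return ids of rows idle longer than the threshold, skipping protected owners."""
    cutoff = today - timedelta(days=max_idle_days)
    return [
        row["id"]
        for row in rows
        if row["last_accessed"] < cutoff and row["owner"] not in protected_owners
    ]


rows = [
    {"id": 1, "owner": "analyst", "last_accessed": date(2006, 1, 10)},
    {"id": 2, "owner": "cfo", "last_accessed": date(2006, 1, 10)},   # high-profile user
    {"id": 3, "owner": "analyst", "last_accessed": date(2007, 7, 1)},
]
removable = find_removable(rows, date(2007, 8, 1), protected_owners={"cfo"})
print(removable)
```

Tracking last-access dates requires some form of usage monitoring, which this sketch assumes is already in place.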
Tuning a Data Warehouse
Some of the techniques that can be used for tuning a Data
Warehouse are:
Handling dormant data
Storing presummarized data based on data usage patterns
Creating indexes for data that is frequently used
Merging tables that have common and regular access
Data Warehousing Tools
Design Tools:
ERwin
Powersoft WarehouseArchitect
Oracle Designer
ETL Tools:
Oracle Warehouse Builder
PowerCenter/PowerMart from Informatica
DataStage from Ascential
Ab Initio
Reporting Tools:
Discoverer
Business Objects
Cognos
Crystal Reports
Questions from participants
Test Your Understanding
1. What is ETL?
2. Explain the types of Extracts.
3. Name the types of Transformations.
4. What is Metadata?
5. Why is Metadata needed?
6. Does dormant data affect the performance of a Data Warehouse?
7. What are the tuning tips for DWH?
Data Warehousing Basics - III: Summary
ETL (Extraction, Transformation, and Loading) is a process by
which data is integrated and transformed from the operational
systems into the Data Warehouse environment.
Metadata is nothing but data about data.
The data that is hardly used in a Data Warehouse is called
dormant data.
Data Warehousing Basics - III: Source
The Data Warehouse Lifecycle Toolkit by Ralph Kimball.
Disclaimer: Parts of the content of this course are based on the materials available from the Web sites and books
listed above. The materials that can be accessed from linked sites are not maintained by Cognizant Academy and
we are not responsible for the contents thereof. All trademarks, service marks, and trade names in this course are
the marks of the respective owner(s).
You have successfully completed
Data Warehousing Basics - III