Data Staging
LotaDoors is a company that sells building materials to trade and public customers. All
transactions are recorded in their branch-based operational systems, a schema of which
follows.
Management have asked you to write a system that will report, for each quarter (three
months) and for each product group (timber, building materials, etc.), how the company's
sales quantities compare with the national market totals.
The data for the full market sector will be supplied to you (at a cost!) by a market
intelligence company. The information arrives as a CSV (Comma-Separated Values) file
attached to an email at the end of each quarter. For example, the attachment might look
like this:
Year 2002, Quarter 1

ProdGrp     Value
Timber      250,000
Building    485,200
Hardware     94,500
The data format in the CSV file will look like this:
Year,Quarter,ProdGrp,Value
2002,1,"Timber",250000
2002,1,"Building",485200
2002,1,"Hardware",94500
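As a first staging step, the attachment can be parsed into typed records. The sketch below uses Python's standard csv module against the sample rows above; the lower-case field names and the in-memory list are illustrative assumptions, not part of the required answer.

```python
import csv
import io

# Sample data exactly as in the quarterly attachment shown above.
sample = """Year,Quarter,ProdGrp,Value
2002,1,"Timber",250000
2002,1,"Building",485200
2002,1,"Hardware",94500
"""

# Parse each CSV record into a typed dictionary (an assumed staging shape).
rows = []
for rec in csv.DictReader(io.StringIO(sample)):
    rows.append({
        "year": int(rec["Year"]),
        "quarter": int(rec["Quarter"]),
        "prod_grp": rec["ProdGrp"],
        "value": int(rec["Value"]),
    })
```

In a real Staging Area these records would be written to a table rather than kept in a list, but the typing step (text to integers) is the same.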
Inspect the following star structure to see if it will give you the desired answer:
We will now consider the ETL process and design the Staging Area. The process needs
to capture data from the LotaDoors system, incorporate the market sector data, cleanse,
merge, transform and load the data; the result of all this will be the populated star.
Exercise
Task 1: Identify the original data sources. For each of the required data sources
(LotaDoors, market data), show the source and the column names that you need
to extract and process.
Task 2: For each original data source, create a named Staging Area table and show the
columns it contains. This is the start of a processing stream to transform that
table's data.
Task 3: For each table in the final star, create a named Star table and show the columns
it contains.
Task 4: For each table in the final star, create a named Staging Area table and show the
columns it contains.
Task 5: Assume that you are designing the Staging Area to be used for the first time, ie
there is no existing data in the Staging Area or final Star. For each processing
stream, identify validation tests that should be carried out on the data. For each
test identify possible data errors. Show new named tables and their columns,
one each to hold valid and erroneous data (assuming there can be errors)
resulting from the validation. Show where erroneous data is returned into the
main processing stream when it has been corrected.
There will be situations where processing streams merge and/or terminate
having satisfied all their processing needs. Show new named tables and their
columns for all such merging of streams.
All processing streams should either terminate or contribute to the population of
one or more of the final Staging Area tables (as created in Task 4).
Note: for space reasons you will have to omit many of the error tables and
corrective loops. It is important that you still recognise their role in the processing!
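One way to picture the validation step in Task 5 is as a routine that routes each record into either a valid table or an error table. The sketch below is a minimal illustration for the market-data stream; the reference list of product groups, the table names and the error messages are all assumptions for the example.

```python
# Assumed reference list of acceptable product groups.
VALID_GROUPS = {"Timber", "Building", "Hardware"}

def validate(record):
    """Return a list of error messages; an empty list means the record is valid."""
    errors = []
    if record.get("prod_grp") not in VALID_GROUPS:
        errors.append("unknown product group")
    if not isinstance(record.get("value"), int) or record["value"] < 0:
        errors.append("value must be a non-negative integer")
    if record.get("quarter") not in (1, 2, 3, 4):
        errors.append("quarter out of range")
    return errors

# Illustrative "valid" and "error" staging tables (in-memory stand-ins).
stg_market_valid, stg_market_error = [], []
for rec in [
    {"year": 2002, "quarter": 1, "prod_grp": "Timber", "value": 250000},
    {"year": 2002, "quarter": 5, "prod_grp": "Glass", "value": -10},
]:
    errs = validate(rec)
    if errs:
        # Held in the error table until corrected, then re-entered upstream.
        stg_market_error.append({**rec, "errors": errs})
    else:
        stg_market_valid.append(rec)
```

The corrective loop mentioned in the task corresponds to fixing a row in the error table and feeding it back through the same validation before it rejoins the main stream.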
Task 6: Identify the procedures related to and the sequence of loading the Star tables
from the final Staging Area tables. You should carry out some research into this
aspect.
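A common finding from that research is that dimension tables must be loaded (and surrogate keys assigned) before the fact table, so that every fact row can resolve its foreign keys. The sketch below illustrates that ordering only; the table and column names are assumptions.

```python
# Illustrative in-memory stand-ins for a product dimension and a sales fact table.
dim_product = {}   # natural key (product group name) -> surrogate key
fact_sales = []

def load_dimension(prod_grps):
    """Step 1: ensure every product group has a surrogate key."""
    for name in prod_grps:
        if name not in dim_product:
            dim_product[name] = len(dim_product) + 1

def load_facts(staged_rows):
    """Step 2: fact rows look up the surrogate key assigned in step 1."""
    for row in staged_rows:
        fact_sales.append({
            "product_key": dim_product[row["prod_grp"]],  # must already exist
            "year": row["year"],
            "quarter": row["quarter"],
            "value": row["value"],
        })

staged = [{"year": 2002, "quarter": 1, "prod_grp": "Timber", "value": 250000}]
load_dimension(r["prod_grp"] for r in staged)  # dimensions first
load_facts(staged)                             # then facts
```

Running the two steps in the opposite order would fail, which is exactly the sequencing constraint Task 6 asks you to document.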
Task 7: With reference to each of the validation tests identified in Task 5, what action
may be possible in the original systems to remove or reduce the incidence of
errors, and what corrective action should be taken with the current data (the
data transferred into the error table)?
Task 8: Identify how the Staging Area and ETL processes will need to change to process
Quarter 2 data, ie when data already exists within the Star.
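One concrete change for Quarter 2 is that the load must become incremental: before appending new fact rows, the ETL should check which (year, quarter) combinations already exist in the Star. The guard below is a minimal sketch of that idea; the table shape and the Quarter 2 figure are illustrative assumptions.

```python
# Star fact table already holding Quarter 1 data (illustrative).
fact_sales = [{"year": 2002, "quarter": 1, "prod_grp": "Timber", "value": 250000}]

def incremental_load(staged_rows):
    """Append only rows for quarters not already present in the star."""
    existing = {(f["year"], f["quarter"]) for f in fact_sales}
    loaded = 0
    for row in staged_rows:
        if (row["year"], row["quarter"]) in existing:
            continue  # skip duplicates instead of double-counting a quarter
        fact_sales.append(row)
        loaded += 1
    return loaded

q2_batch = [
    {"year": 2002, "quarter": 1, "prod_grp": "Timber", "value": 250000},  # re-sent duplicate
    {"year": 2002, "quarter": 2, "prod_grp": "Timber", "value": 268000},  # hypothetical Q2 figure
]
n = incremental_load(q2_batch)
```

Your answer to Task 8 should also consider what "already loaded" means for dimension tables, where existing product groups must be reused rather than re-inserted.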
Task 9: What are the implications to Staging Area and ETL processes of incorporating an
ODS?
Task 10: Now download and inspect the LotaMerge database. Identify the new fact table
that has been created and how the Data Mart has been modified to deal with this
new table. Also download the CSV files and look at their structure. There is one
for each quarter of 2002.
Below is the entity-relationship diagram for the modified LotaStar (LotaMerge) Data Mart.
The major point that we are illustrating in this Exercise is that data Extraction,
Transformation and Loading is a complex process. Some authors estimate that it
consumes 70% of the budget of a data warehousing project. Unfortunately, it has to be
done, even for small projects, and it is not a phase that can easily be shortened.