Module 6_ETL (Extraction, Transformation, Loading)

The document provides an overview of the ETL (Extraction, Transformation, Loading) process, emphasizing its importance in data warehousing and the steps involved: extraction from various data sources, transformation to fit the data warehouse schema, and loading into the warehouse. It highlights common ETL tools, both commercial and open-source, and stresses that ETL is an ongoing process that must be well-documented and automated. The document also notes that underestimating the complexities of ETL can lead to failures in data warehousing efforts.

Uploaded by Dom Balseen
Copyright © All Rights Reserved

Lecture Notes in Data Warehousing
Module 6_ETL (Extraction, Transformation, Loading)
Professorial Lecturer: Dr. Domingo T. Balse, Jr, LPT

ETL (Extraction, Transformation, Loading)


1. Learning outcomes
• Discuss the steps in ETL
• Identify instances where ETL would be necessary in an organization

2. Games
- Related to the topic

3. Introduction
We have already learned that the Data Track of the Kimball Lifecycle involves the ETL (Extraction, Transformation, Loading) process. This module takes a closer look at that process.

4. ETL (Extraction, Transformation, Loading)


ETL is mostly done by business analytics people on an information technology track. Even so, it is useful for managers to know what happens during ETL.
The objective of ETL is to get data out of the source systems and load it into the data warehouse. In its simplest form, it is a process of copying data from one database to another: data is extracted from a source database, transformed to match the data warehouse schema, and loaded into the data warehouse database. When defining ETL for a data warehouse, it is important to think of ETL as a process, not as a physical implementation.
The process is usually handled using Structured Query Language (SQL) scripts. SQL is a special-purpose programming language designed for managing data held in a relational database.
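As a minimal sketch of the whole process, assuming Python's built-in sqlite3 module and two small in-memory databases, the snippet below copies customer rows from a source database into a warehouse table. The table names (customers, dim_customer) and the sample rows are hypothetical. SQL SELECT and INSERT statements do the extraction and loading, and one line of Python reshapes the data to fit the warehouse schema.

```python
import sqlite3

# Hypothetical source (operational) database.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE customers (id INTEGER, name TEXT, gender TEXT)")
source.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(101, "Ana", "Female"), (102, "Ben", "Male")],
)

# Hypothetical warehouse database whose schema stores one-letter gender codes.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE dim_customer (customer_id INTEGER, name TEXT, gender TEXT)")

# Extract: read rows out of the source with a SELECT statement.
rows = source.execute("SELECT id, name, gender FROM customers").fetchall()

# Transform: reshape each row to match the warehouse schema ("Male" -> "M").
transformed = [(cid, name, gender[0]) for cid, name, gender in rows]

# Load: write the transformed rows into the warehouse with INSERT statements.
warehouse.executemany("INSERT INTO dim_customer VALUES (?, ?, ?)", transformed)
warehouse.commit()

print(warehouse.execute("SELECT * FROM dim_customer ORDER BY customer_id").fetchall())
# [(101, 'Ana', 'F'), (102, 'Ben', 'M')]
```

Taking the first letter of the gender value is only a shortcut for this two-value example; a real transformation would usually rely on an explicit mapping.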
In extraction, data is extracted from heterogeneous data sources. Each data source has its own distinct set of characteristics that must be managed and integrated into the ETL system in order to extract the data effectively. This is usually done using SQL SELECT statements.
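The sketch below illustrates extraction from two heterogeneous sources, assuming one relational database (queried with a SELECT statement) and one CSV flat file. Both sources, their column names, and the helper functions extract_from_database and extract_from_csv are hypothetical. The point is that each source needs its own handling (for example, CSV values arrive as text and must be converted) before the rows share a common shape.

```python
import csv
import io
import sqlite3


def extract_from_database(conn):
    """Extract order rows from a relational source with a SELECT statement."""
    cur = conn.execute("SELECT order_id, amount FROM orders")
    return [{"order_id": oid, "amount": amt, "source": "db"} for oid, amt in cur]


def extract_from_csv(text):
    """Extract order rows from a flat-file source; CSV values arrive as strings."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {"order_id": int(r["order_id"]), "amount": float(r["amount"]), "source": "csv"}
        for r in reader
    ]


# Hypothetical sources standing in for two different operational systems.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (order_id INTEGER, amount REAL)")
db.execute("INSERT INTO orders VALUES (1, 250.0)")
csv_text = "order_id,amount\n2,99.5\n"

extracted = extract_from_database(db) + extract_from_csv(csv_text)
print(extracted)
# [{'order_id': 1, 'amount': 250.0, 'source': 'db'}, {'order_id': 2, 'amount': 99.5, 'source': 'csv'}]
```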
Transformation is the main step where ETL adds value. It changes the data and indicates whether the data can be used for its intended purposes. For example, "Male" is changed to "M" and "Yes" is changed to "1". Transformation is performed in a staging area.
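Here is a minimal sketch of the transformation step, assuming rows have already been copied into a staging area (modelled here as a Python list). The "Male" to "M" and "Yes" to "1" mappings come from the example above; the "Female" and "No" entries, the field names, and the transform function are hypothetical. Rows with unrecognized codes are set aside rather than loaded, which is one way this step can indicate whether data is fit for its intended purpose.

```python
# Code mappings: "Male"->"M" and "Yes"->"1" are from the text; the rest are hypothetical.
GENDER_CODES = {"Male": "M", "Female": "F"}
YES_NO_CODES = {"Yes": "1", "No": "0"}


def transform(staged_rows):
    """Standardize staged rows and split them into usable and rejected rows."""
    clean, rejected = [], []
    for row in staged_rows:
        gender = GENDER_CODES.get(row["gender"])
        subscribed = YES_NO_CODES.get(row["subscribed"])
        if gender is None or subscribed is None:
            # Unknown code: flag the row instead of loading bad data.
            rejected.append(row)
        else:
            clean.append({**row, "gender": gender, "subscribed": subscribed})
    return clean, rejected


staging = [
    {"name": "Ana", "gender": "Female", "subscribed": "Yes"},
    {"name": "Ben", "gender": "Male", "subscribed": "No"},
    {"name": "Cy", "gender": "?", "subscribed": "Yes"},
]
clean, rejected = transform(staging)
print(clean)     # [{'name': 'Ana', 'gender': 'F', 'subscribed': '1'}, {'name': 'Ben', 'gender': 'M', 'subscribed': '0'}]
print(rejected)  # [{'name': 'Cy', 'gender': '?', 'subscribed': 'Yes'}]
```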
Finally, in loading, the data is loaded into the data warehouse tables. Surrogate keys are created and assigned at this stage. Loading is usually done using SQL INSERT statements.
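A surrogate key is a key generated by the warehouse itself, independent of the identifier used in the source system. The following sketch, assuming SQLite and a hypothetical dim_customer table, lets the database assign the surrogate key on each INSERT and records which source code received which key. The function name load_customers and the sample rows are illustrative only.

```python
import sqlite3

wh = sqlite3.connect(":memory:")
# customer_key is the surrogate key: SQLite assigns it on INSERT, independent of
# the source system's own identifier (customer_code). Table names are hypothetical.
wh.execute(
    """CREATE TABLE dim_customer (
           customer_key INTEGER PRIMARY KEY AUTOINCREMENT,
           customer_code TEXT NOT NULL,
           name TEXT,
           gender TEXT)"""
)


def load_customers(conn, rows):
    """Insert transformed rows and return a map of source code -> surrogate key."""
    key_map = {}
    for code, name, gender in rows:
        cur = conn.execute(
            "INSERT INTO dim_customer (customer_code, name, gender) VALUES (?, ?, ?)",
            (code, name, gender),
        )
        key_map[code] = cur.lastrowid  # surrogate key generated by the database
    conn.commit()
    return key_map


keys = load_customers(wh, [("C-101", "Ana", "F"), ("C-102", "Ben", "M")])
print(keys)  # {'C-101': 1, 'C-102': 2}
```

A mapping like key_map is typically used afterwards when loading fact tables, so that fact rows reference the surrogate key rather than the source system's identifier; that follow-up step is not shown here.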
ETL is often a major failure point in data warehousing because the effort involved in the ETL process is underestimated. Underestimating data quality problems and the work of providing for contextual history are also prime culprits. The ETL process should therefore not be taken for granted.
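Because data quality problems are a common reason ETL efforts fail, it can help to profile extracted data before loading it. This is a minimal sketch with a hypothetical profile function that checks only two kinds of problems, duplicate keys and missing required fields; actual data quality rules depend on the organization's data.

```python
from collections import Counter


def profile(rows, key, required):
    """Report duplicate keys and missing required fields in extracted rows."""
    issues = []
    counts = Counter(r.get(key) for r in rows)
    for value, n in counts.items():
        if n > 1:
            issues.append(f"duplicate {key}={value!r} appears {n} times")
    for i, r in enumerate(rows):
        for field in required:
            if r.get(field) in (None, ""):
                issues.append(f"row {i}: missing {field}")
    return issues


# Hypothetical extracted rows with deliberate problems.
rows = [
    {"id": 1, "name": "Ana", "email": "ana@example.com"},
    {"id": 2, "name": "", "email": "ben@example.com"},
    {"id": 2, "name": "Ben", "email": None},
]
for issue in profile(rows, key="id", required=["name", "email"]):
    print(issue)
# duplicate id=2 appears 2 times
# row 1: missing name
# row 2: missing email
```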
Note that ETL is not a one-time event, because new data is added to the data warehouse periodically (monthly, daily, or hourly). Because ETL is an integral, ongoing, and recurring part of a data warehouse, it should be automated, well documented, and easy to change.
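Since ETL runs repeatedly, one common approach is an incremental load that copies only the rows added since the previous run. The sketch below assumes each hypothetical source row carries an ISO-8601 updated_at timestamp and keeps the last loaded timestamp (a "watermark") in a hypothetical etl_control table. Scheduling the function monthly, daily, or hourly (for example with an operating system scheduler) is outside the sketch.

```python
import sqlite3


def run_incremental_etl(source, warehouse):
    """One scheduled ETL run: copy only rows newer than the stored watermark."""
    (last,) = warehouse.execute(
        "SELECT COALESCE(MAX(last_loaded), '') FROM etl_control"
    ).fetchone()
    new_rows = source.execute(
        "SELECT sale_id, amount, updated_at FROM sales "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last,),
    ).fetchall()
    if new_rows:
        warehouse.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)", new_rows)
        # Remember the newest timestamp loaded so the next run starts after it.
        warehouse.execute("INSERT INTO etl_control VALUES (?)", (new_rows[-1][2],))
        warehouse.commit()
    return len(new_rows)


# Hypothetical source and warehouse tables.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE sales (sale_id INTEGER, amount REAL, updated_at TEXT)")
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE fact_sales (sale_id INTEGER, amount REAL, updated_at TEXT)")
warehouse.execute("CREATE TABLE etl_control (last_loaded TEXT)")

source.execute("INSERT INTO sales VALUES (1, 10.0, '2024-01-01T08:00')")
first = run_incremental_etl(source, warehouse)   # loads sale 1
source.execute("INSERT INTO sales VALUES (2, 20.0, '2024-01-01T09:00')")
second = run_incremental_etl(source, warehouse)  # loads only sale 2
third = run_incremental_etl(source, warehouse)   # nothing new
print(first, second, third)  # 1 1 0
```

Comparing with > assumes timestamps only increase; a late-arriving row stamped at or before the watermark would be skipped, so production designs often use a more robust change-tracking method.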

Several vendors offer strong ETL tools along with a fairly complete suite of supplementary tools. There are three general types of source-to-target tools:
1. Code generators - These generate and compile ETL code, typically COBOL, which is used by several large companies that run mainframes.
2. Engine-based - These have easy-to-use graphical interfaces and interpreter-style programs.
3. Database-based - These involve manual coding using SQL statements augmented by scripts.
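To illustrate the database-based style, the sketch below keeps the transformation inside the database as an INSERT ... SELECT statement with CASE expressions, while a short Python script only creates the tables and runs the SQL. The staging and dimension tables, the 'U' code for unknown values, and the sample rows are hypothetical; SQLite stands in for whatever database the warehouse actually uses.

```python
import sqlite3

# Hypothetical SQL script: transformation and load run inside the database as
# INSERT ... SELECT; the host script only orchestrates it.
TRANSFORM_AND_LOAD_SQL = """
INSERT INTO dim_customer (customer_code, gender, subscribed)
SELECT code,
       CASE gender WHEN 'Male' THEN 'M' WHEN 'Female' THEN 'F' ELSE 'U' END,
       CASE subscribed WHEN 'Yes' THEN '1' ELSE '0' END
FROM stg_customer
ORDER BY code;
"""

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE stg_customer (code TEXT, gender TEXT, subscribed TEXT);
    CREATE TABLE dim_customer (
        customer_key INTEGER PRIMARY KEY,  -- surrogate key assigned by SQLite
        customer_code TEXT, gender TEXT, subscribed TEXT);
    INSERT INTO stg_customer VALUES ('C-1', 'Male', 'Yes'), ('C-2', 'Female', 'No');
    """
)
conn.executescript(TRANSFORM_AND_LOAD_SQL)
print(conn.execute("SELECT * FROM dim_customer ORDER BY customer_key").fetchall())
# [(1, 'C-1', 'M', '1'), (2, 'C-2', 'F', '0')]
```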

Well-known ETL tools include the following:

1. Commercial
a. Ab Initio
b. IBM DataStage
c. Informatica PowerCenter
d. Microsoft SQL Server Integration Services (SSIS)
e. Oracle Data Integrator
f. SAP BusinessObjects Data Integrator
g. SAS Data Integration Studio
2. Open-Source Based
a. Adeptia Integration Suite
b. Apatar
c. CloverETL
d. Pentaho Data Integration (Kettle)
e. Talend Open Studio/Integration Suite
f. R/RStudio

Note that there is no single "best" tool; you will have to choose based on your own needs. You should also first check whether the standard tools from the big vendors are adequate.


5. Quiz / Activity

References

Book References:
Corr, Lawrence & Jim Stagnitto (2011). Agile Data Warehouse Design: Collaborative Dimensional Modeling, from Whiteboard to Star Schema.
Jarke, Matthias, Maurizio Lenzerini, Yannis Vassiliou & Panos Vassiliadis (2003). Fundamentals of Data Warehouses. Springer Berlin Heidelberg Publishing. ISBNs 978-3-54-042089-7, 978-3-64-207564-3, 978-3-66-205153-5. DOI 10.1007/978-3-662-05153-5.
Jukic, Nenad, Susan Vrbsky & Svetlozar Nestorov (2016). Database Systems: Introduction to Databases and Data Warehouses.
Kimball, Ralph & Margy Ross (2013). The Data Warehouse Toolkit: The Definitive Guide to Dimensional Modeling, 3rd Edition.
Linstedt, Daniel & Michael Olschimke (2015). Building a Scalable Data Warehouse with Data Vault 2.0.
Ponniah, Paulraj (2001). Data Warehousing Fundamentals: A Comprehensive Guide for IT Professionals, 1st Edition. Wiley-Interscience Publishing.
