
Imran Introduction To DWH-5

Here are the main mechanisms of data transformation, with examples:
1. Data cleaning: removing special characters, correcting spelling mistakes, and standardizing names, addresses, dates, etc. For example, cleaning a name field by removing extra spaces and capitalizing the first letters.
2. Data type conversion: converting data from one type to another, such as converting a string to a date. For example, converting a birthdate in MM/DD/YYYY format to a date data type.
3. Data validation: checking for valid values, ranges and formats. For example, validating that an age field accepts only numeric values between 0 and 150.
4. Data aggregation: summarizing or combining data at different granularities. For example
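The first three mechanisms can be sketched in Python as follows. This is a minimal illustration that assumes plain string inputs. The function names (`clean_name`, `to_date`, `is_valid_age`) are hypothetical and do not come from any ETL tool.

```python
from datetime import date, datetime

def clean_name(raw: str) -> str:
    """Data cleaning: collapse extra spaces and capitalize the first letter of each word."""
    return " ".join(word.capitalize() for word in raw.split())

def to_date(raw: str) -> date:
    """Data type conversion: parse an MM/DD/YYYY string into a date object."""
    return datetime.strptime(raw, "%m/%d/%Y").date()

def is_valid_age(raw: str) -> bool:
    """Data validation: accept only whole numbers between 0 and 150 (inclusive)."""
    return raw.isdecimal() and 0 <= int(raw) <= 150

print(clean_name("  jOHN   smith "))  # John Smith
print(to_date("07/04/1990"))          # 1990-07-04
print(is_valid_age("42"), is_valid_age("200"), is_valid_age("abc"))  # True False False
```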

Uploaded by imran saeed
Copyright © All Rights Reserved

ETL

Extract, Transform, Load


ETL
(Extract, Transform and Load)

Definition: ETL stands for extract, transform and load, three database functions that are combined into one tool to pull data out of one database, transform it as per requirements, and place it into another database.
WHAT IS ETL?

E  EXTRACT: extract data from disparate sources
T  TRANSFORM: transform the data
L  LOAD: load the data where we want it
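Here is a minimal end-to-end sketch of the three steps. It assumes Python's built-in `sqlite3` module, with two in-memory databases standing in for the source system and the data warehouse. The table and column names (`customers`, `dim_customer`) are hypothetical.

```python
import sqlite3

def extract(source: sqlite3.Connection) -> list[tuple]:
    """Extract: pull raw rows out of the source database."""
    return source.execute("SELECT id, name, country FROM customers").fetchall()

def transform(rows: list[tuple]) -> list[tuple]:
    """Transform: trim names and standardize country codes to upper case."""
    return [(cid, name.strip(), country.strip().upper()) for cid, name, country in rows]

def load(target: sqlite3.Connection, rows: list[tuple]) -> None:
    """Load: write the transformed rows into the target (warehouse) table."""
    target.execute(
        "CREATE TABLE IF NOT EXISTS dim_customer (id INTEGER PRIMARY KEY, name TEXT, country TEXT)"
    )
    target.executemany("INSERT INTO dim_customer VALUES (?, ?, ?)", rows)
    target.commit()

# Two in-memory databases stand in for the operational source and the warehouse.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE customers (id INTEGER, name TEXT, country TEXT)")
source.executemany("INSERT INTO customers VALUES (?, ?, ?)",
                   [(1, " Alice ", "us"), (2, "Bob", " pk ")])
warehouse = sqlite3.connect(":memory:")

load(warehouse, transform(extract(source)))
print(warehouse.execute("SELECT * FROM dim_customer ORDER BY id").fetchall())
# [(1, 'Alice', 'US'), (2, 'Bob', 'PK')]
```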
Why is ETL used?

• Companies need a way to analyze their data to make critical business decisions
• Transactional databases are built for day-to-day operations and cannot easily answer complex analytical business questions
• A data warehouse provides a common data repository
• ETL provides a method of moving the data from various sources into a data warehouse
ETL Concept and Usage

• A company's data may be scattered across different locations in different formats. ETL allows you to
– Migrate the data into a data warehouse
– Convert the various formats and types into one consistent system
• ETL is a predefined process for accessing and manipulating source data and loading it into the target database
• ETL can be used for various purposes, e.g. data warehousing, and merging or migrating data within the same organization
• ETL is a very effective tool and plays a central role in various fields

ETL Requirements

The ETL architecture must meet the following requirements:
– Business Requirements
– Data Profiling
– Data Security
– Right data at the right time
– Archiving the data
– Final end-user delivery interfaces
– Available skills
– Alignment with the overall enterprise architecture
ETL Workflow
ETL Layers

The three layers involved in an ETL cycle are:
• Staging layer:
– The staging layer stores the data extracted from the different source systems
• Data integration layer:
– The integration layer transforms the data from the staging layer and moves it into a database
• Access layer:
– This is where the data is queried
– End users use the access layer to retrieve data for analytical reporting
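The three layers can be sketched with SQL tables and a view. This minimal sketch places all three layers in one in-memory SQLite database and distinguishes them by hypothetical table prefixes (`stg_`, `int_`, `rpt_`). Real deployments often keep the layers in separate schemas or databases.

```python
import sqlite3

db = sqlite3.connect(":memory:")

# Staging layer: raw extracted data is stored as-is (amounts still text, mixed case).
db.execute("CREATE TABLE stg_sales (region TEXT, amount TEXT)")
db.executemany("INSERT INTO stg_sales VALUES (?, ?)",
               [("north", "100.5"), ("NORTH", "49.5"), ("south", "20")])

# Data integration layer: transform staged rows into a clean, typed table.
db.execute("CREATE TABLE int_sales (region TEXT, amount REAL)")
db.execute("""
    INSERT INTO int_sales
    SELECT UPPER(region), CAST(amount AS REAL) FROM stg_sales
""")

# Access layer: a reporting view that end users query.
db.execute("""
    CREATE VIEW rpt_sales_by_region AS
    SELECT region, SUM(amount) AS total FROM int_sales GROUP BY region
""")

print(db.execute("SELECT * FROM rpt_sales_by_region ORDER BY region").fetchall())
# [('NORTH', 150.0), ('SOUTH', 20.0)]
```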
ETL Process
Data Extraction

Data Extraction from various sources
• The main objective of this step is to retrieve all the required data from the source systems
• Source systems can be relational databases (RDBMS) or files (XML, Excel files, etc.)
• The extraction step should be designed so that it does not negatively affect the source system
Data Extraction

• Extraction methods (two types)
– Logical Extraction (two further methods)
• Full Extraction
• Incremental Extraction
– Physical Extraction (two further methods)
• Online Extraction
• Offline Extraction
• The extract step should be designed so that it does not negatively affect the source system in terms of performance, response time, or locking
Logical Extraction

• Full Extraction
– In this method, data is completely extracted from the source system. Because the extraction is complete, there is no need to track the source system for changes
• Incremental Extraction
– In incremental extraction, changes in the source data must be tracked since the last successful extraction.
– Only these changes are extracted and then loaded.
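Full and incremental extraction can be contrasted in a short sketch. It assumes that the source table has a last-modified timestamp column (`updated_at`, hypothetical) and that the ETL job stores a "watermark", which is the latest timestamp it has already extracted. This is one common way to track changes, not the only one.

```python
import sqlite3

src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE orders (id INTEGER, amount REAL, updated_at TEXT)")
src.executemany("INSERT INTO orders VALUES (?, ?, ?)", [
    (1, 10.0, "2024-01-01 09:00:00"),
    (2, 25.0, "2024-01-02 12:00:00"),
    (3, 40.0, "2024-01-03 08:30:00"),
])

def full_extract(conn):
    """Full extraction: read every row; no change tracking is needed."""
    return conn.execute("SELECT id, amount, updated_at FROM orders ORDER BY id").fetchall()

def incremental_extract(conn, last_watermark):
    """Incremental extraction: read only rows changed after the last successful run.

    Returns the changed rows and the new watermark to store for the next run.
    """
    rows = conn.execute(
        "SELECT id, amount, updated_at FROM orders WHERE updated_at > ? ORDER BY updated_at",
        (last_watermark,),
    ).fetchall()
    new_watermark = rows[-1][2] if rows else last_watermark
    return rows, new_watermark

print(len(full_extract(src)))  # 3
changed, watermark = incremental_extract(src, "2024-01-01 23:59:59")
print([r[0] for r in changed], watermark)  # [2, 3] 2024-01-03 08:30:00
```

A timestamp watermark does not detect rows deleted from the source. Capturing deletions usually needs change data capture (CDC) or a comparison of snapshots.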
Physical Extraction

• Online Extraction
– The extraction process connects directly to the source system and extracts the source data.
• Offline Extraction
– The data is not extracted directly from the source
system but is staged explicitly outside the original
source system.
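The two physical methods differ in where the ETL process reads from. In this minimal sketch, online extraction queries the source database directly. Offline extraction reads a flat file that the source side exported beforehand. The CSV format, the `products` table, and the `export_to_file` step are assumptions made for illustration.

```python
import csv
import sqlite3
import tempfile
from pathlib import Path

source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE products (sku TEXT, price REAL)")
source.executemany("INSERT INTO products VALUES (?, ?)", [("A1", 9.99), ("B2", 4.5)])

def online_extract(conn):
    """Online: the ETL process connects to the source system and queries it directly."""
    return conn.execute("SELECT sku, price FROM products ORDER BY sku").fetchall()

def export_to_file(conn, path):
    """Hypothetical source-side step: dump the table to a file outside the source system."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["sku", "price"])
        writer.writerows(conn.execute("SELECT sku, price FROM products ORDER BY sku"))

def offline_extract(path):
    """Offline: the ETL process reads the staged file and never touches the source database."""
    with open(path, newline="") as f:
        return [(row["sku"], float(row["price"])) for row in csv.DictReader(f)]

with tempfile.TemporaryDirectory() as tmp:
    dump = Path(tmp) / "products.csv"
    export_to_file(source, dump)          # done on the source side
    offline_rows = offline_extract(dump)  # done by the ETL process

online_rows = online_extract(source)
print(online_rows == offline_rows)  # True
```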
Transform

Data Transformation
• This step includes cleaning, filtering, validating and applying rules to the extracted data
• The main objective of this step is to bring the extracted data into a clean, common format before it is loaded into the target database
• This is needed because data is extracted from various sources, each with its own format
Transform

• For example, suppose there are two sources, A and B
• Source A's date format is dd/mm/yyyy
• Source B's date format is yyyy/mm/dd
• During transformation, these dates are converted into one common format
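Converting the two source date formats can be sketched with Python's `datetime.strptime`. The choice of ISO 8601 (YYYY-MM-DD) as the common target format is an assumption, and the `SOURCE_DATE_FORMATS` mapping is hypothetical.

```python
from datetime import datetime

# Assumed per-source formats, taken from the sources A and B example.
SOURCE_DATE_FORMATS = {"A": "%d/%m/%Y", "B": "%Y/%m/%d"}

def to_common_date(value: str, source: str) -> str:
    """Parse a date using its source's format and return it as ISO 8601 (YYYY-MM-DD)."""
    return datetime.strptime(value, SOURCE_DATE_FORMATS[source]).date().isoformat()

print(to_common_date("25/12/2023", "A"))  # 2023-12-25
print(to_common_date("2023/12/25", "B"))  # 2023-12-25
```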
Transform

Transformation Tasks
• Cleaning (Male to ‘M’ and Female to ‘F’, NULL to 0, consistent date formats)
• Deduplication (removing duplicate records)
• Format revision: Character set conversion, unit of
measurement conversion, date/time conversion, etc.
• Key restructuring: Establishing key relationships
across tables
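The four tasks in this list can be sketched on plain Python dictionaries. All field names are hypothetical, as are the ‘U’ code for an unrecognized gender, the pound-to-kilogram conversion (one example of a unit-of-measurement revision), and the use of email as the natural key.

```python
GENDER_CODES = {"male": "M", "female": "F"}
LB_TO_KG = 0.45359237  # exact definition of the international pound

def deduplicate(records):
    """Deduplication: drop records whose fields are all identical, keeping the first."""
    seen, unique = set(), []
    for r in records:
        key = tuple(sorted(r.items()))
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique

def clean(record):
    """Cleaning: map gender to M/F ('U' if unrecognized, an assumption) and NULL qty to 0."""
    out = dict(record)
    out["gender"] = GENDER_CODES.get(str(record["gender"]).strip().lower(), "U")
    out["qty"] = record["qty"] if record["qty"] is not None else 0
    return out

def revise_format(record):
    """Format revision: convert weight from pounds to kilograms (2 decimals)."""
    out = dict(record)
    out["weight_kg"] = round(out.pop("weight_lb") * LB_TO_KG, 2)
    return out

def assign_surrogate_keys(records, natural_key):
    """Key restructuring: give each natural key a stable integer surrogate key (in place)."""
    keys = {}
    for r in records:
        r["customer_key"] = keys.setdefault(r[natural_key], len(keys) + 1)
    return keys

raw = [
    {"email": "a@x.com", "gender": "Male", "qty": 2, "weight_lb": 10},
    {"email": "a@x.com", "gender": "Male", "qty": 2, "weight_lb": 10},  # exact duplicate
    {"email": "b@x.com", "gender": "female", "qty": None, "weight_lb": 5},
]
rows = [revise_format(clean(r)) for r in deduplicate(raw)]
keys = assign_surrogate_keys(rows, "email")
print([(r["customer_key"], r["gender"], r["qty"], r["weight_kg"]) for r in rows])
# [(1, 'M', 2, 4.54), (2, 'F', 0, 2.27)]
```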
Transform

Transformation Tasks
• Joining: linking data from multiple sources, for example, adding ad spend data across multiple platforms such as Google AdWords and Facebook Ads.
• Data validation: simple or complex validation rules, for example, if the first three columns in a row are empty, reject the row from processing
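Both tasks can be sketched in Python. Joining is shown as a full outer join of two platforms' daily spend on the date column. Validation rejects rows whose first three columns are all empty, treating `None` and the empty string as "empty" (an assumption). The spend figures and rows are made-up sample data.

```python
def join_ad_spend(google, facebook):
    """Joining: full outer join of two (date, spend) lists on date, plus a total column."""
    g, f = dict(google), dict(facebook)
    return [
        (day, g.get(day, 0.0), f.get(day, 0.0), g.get(day, 0.0) + f.get(day, 0.0))
        for day in sorted(g.keys() | f.keys())
    ]

def validate(rows):
    """Data validation: reject a row if its first three columns are all empty."""
    accepted, rejected = [], []
    for row in rows:
        if all(value in (None, "") for value in row[:3]):
            rejected.append(row)
        else:
            accepted.append(row)
    return accepted, rejected

google_ads = [("2024-03-01", 120.0), ("2024-03-02", 80.0)]
facebook_ads = [("2024-03-01", 60.0), ("2024-03-03", 40.0)]
joined = join_ad_spend(google_ads, facebook_ads)
print(joined)
# [('2024-03-01', 120.0, 60.0, 180.0), ('2024-03-02', 80.0, 0.0, 80.0), ('2024-03-03', 0.0, 40.0, 40.0)]

accepted, rejected = validate([("C1", "Ali", "Lahore", 5), (None, "", "", 7), ("C2", None, "", 3)])
print(len(accepted), rejected)  # 2 [(None, '', '', 7)]
```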
Transform

Transformation Tasks
• Summarization: Values are summarized to obtain
total figures which are calculated and stored at
multiple levels as business metrics – for example,
adding up all purchases a customer has made to
build a customer lifetime value (CLV) metric.
• Integration: Give each unique data element one
standard name with one standard definition. Data
integration reconciles different data names and
values for the same data element.
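Integration and summarization can be combined in one small sketch. Integration first renames each source's column names to one standard name. Summarization then adds up all purchases per customer, which is the simple customer lifetime value described in the bullet above. The column names and the `STANDARD_NAMES` mapping are hypothetical, and predictive CLV models are out of scope.

```python
from collections import defaultdict

# Integration: each source's name for a data element maps to one standard name (hypothetical).
STANDARD_NAMES = {"cust_id": "customer_id", "CustomerNo": "customer_id",
                  "amt": "amount", "Total": "amount"}

def integrate(record):
    """Rename columns to their standard names; unknown columns keep their names."""
    return {STANDARD_NAMES.get(k, k): v for k, v in record.items()}

def customer_lifetime_value(purchases):
    """Summarization: total purchase amount per customer."""
    clv = defaultdict(float)
    for p in purchases:
        clv[p["customer_id"]] += p["amount"]
    return dict(clv)

store_a = [{"cust_id": 1, "amt": 30.0}, {"cust_id": 2, "amt": 15.0}]
store_b = [{"CustomerNo": 1, "Total": 20.0}]
purchases = [integrate(r) for r in store_a + store_b]
print(customer_lifetime_value(purchases))  # {1: 50.0, 2: 15.0}
```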
Transform

Transformation Tasks
• Derivation: applying business rules to your data to derive new calculated values from existing data, for example, creating a revenue metric that subtracts taxes
• Filtering (selecting only certain columns to load)
• Enrichment (Full name to ‘first name’, ‘middle name’, ‘last name’)
• Splitting (splitting one column into multiple columns)
• Joining (combining data from multiple sources)
In some cases data does not need any transformation; this type of data is called rich data
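Several of these tasks can be applied to a single record in one sketch. All field names are hypothetical. The name rule assumes one-word first and last names, with everything between them treated as the middle name. Splitting is shown on an address in "City, Country" form.

```python
def split_full_name(full_name):
    """Enrichment: 'first middle last' -> three fields (assumes one-word first/last names)."""
    parts = full_name.split()
    first = parts[0] if parts else ""
    last = parts[-1] if len(parts) > 1 else ""
    middle = " ".join(parts[1:-1])
    return {"first_name": first, "middle_name": middle, "last_name": last}

KEEP = ("order_id", "customer", "address", "gross_sales", "tax")

def transform(row):
    row = {k: row[k] for k in KEEP}                   # filtering: keep only needed columns
    row["revenue"] = row["gross_sales"] - row["tax"]  # derivation: revenue net of tax
    row.update(split_full_name(row.pop("customer")))  # enrichment: split the full name
    city, country = (p.strip() for p in row.pop("address").split(",", 1))  # splitting
    row["city"], row["country"] = city, country
    return row

source_row = {"order_id": 7, "customer": "Muhammad Ali Khan", "address": "Lahore, Pakistan",
              "gross_sales": 1000.0, "tax": 170.0, "internal_note": "vip"}
result = transform(source_row)
print(result)
```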
Loading

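• Extracted and transformed data is of no use until it is loaded into the target database
• In this step, the extracted and transformed data is loaded into the target database
• Indexing affects load efficiency: indexes on key columns speed up the lookups that incremental loads perform, while for very large bulk loads indexes are often dropped or disabled during the load and rebuilt afterwards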
Loading

Data loading takes the prepared data, applies it to the data warehouse, and stores it in the database
• Initial load:
– Populating all the data warehouse tables for the first time
• Incremental load:
– Applying ongoing changes periodically, as necessary
• Full Refresh:
– Completely erasing the contents of one or more tables and reloading them with fresh data
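The three load types can be sketched against an in-memory SQLite table. The `dim_product` table is hypothetical. In this sketch, the incremental load uses SQLite's `INSERT OR REPLACE`, which overwrites any existing row with the same primary key. Other databases offer `MERGE` or similar upsert statements for this step.

```python
import sqlite3

dw = sqlite3.connect(":memory:")
dw.execute("CREATE TABLE dim_product (id INTEGER PRIMARY KEY, name TEXT, price REAL)")

def initial_load(conn, rows):
    """Initial load: populate the empty warehouse table for the first time."""
    conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?)", rows)
    conn.commit()

def incremental_load(conn, changed_rows):
    """Incremental load: apply only new or changed rows; existing ids are overwritten."""
    conn.executemany("INSERT OR REPLACE INTO dim_product VALUES (?, ?, ?)", changed_rows)
    conn.commit()

def full_refresh(conn, rows):
    """Full refresh: erase the table's contents and reload it with fresh data."""
    conn.execute("DELETE FROM dim_product")
    conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?)", rows)
    conn.commit()

def snapshot(conn):
    return conn.execute("SELECT * FROM dim_product ORDER BY id").fetchall()

initial_load(dw, [(1, "Pen", 1.5), (2, "Book", 12.0)])
incremental_load(dw, [(2, "Book", 10.0), (3, "Bag", 25.0)])  # price change + new product
print(snapshot(dw))  # [(1, 'Pen', 1.5), (2, 'Book', 10.0), (3, 'Bag', 25.0)]
full_refresh(dw, [(1, "Pen", 2.0)])
print(snapshot(dw))  # [(1, 'Pen', 2.0)]
```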
Home Assignment


Describe different mechanisms of transformation, with examples for each.
