Imran Introduction To DWH-5
ETL
ETL Requirements
• Full Extraction
– In this method, the data is extracted in full from
the source system. Because the extraction is complete,
there is no need to track changes in the source system.
• Incremental Extraction
– In incremental extraction, changes made to the source
data since the last successful extraction need to be
tracked.
– Only these changes are extracted and then loaded.
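The difference between the two methods can be sketched as follows. This is a minimal illustration, not a prescribed implementation: it assumes the source table carries an `updated_at` column that can serve as a change marker (a "watermark"), and the table, column names, and sample rows are hypothetical.

```python
import sqlite3

def extract_full(conn):
    """Full extraction: read every row; no change tracking is needed."""
    return conn.execute(
        "SELECT id, name, updated_at FROM customers ORDER BY id"
    ).fetchall()

def extract_incremental(conn, last_extracted_at):
    """Incremental extraction: read only rows changed since the last successful run."""
    return conn.execute(
        "SELECT id, name, updated_at FROM customers "
        "WHERE updated_at > ? ORDER BY id",
        (last_extracted_at,),
    ).fetchall()

# Hypothetical source table, built in memory for illustration.
source = sqlite3.connect(":memory:")
source.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, updated_at TEXT)"
)
source.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Ali", "2024-01-01"), (2, "Sara", "2024-01-05"), (3, "Omar", "2024-01-09")],
)

full_rows = extract_full(source)

# The watermark is the timestamp of the last successful extraction.
changed_rows = extract_incremental(source, "2024-01-04")

# Advance the watermark only after the extracted rows have been loaded safely.
new_watermark = max(row[2] for row in changed_rows)
```

ISO-formatted dates are used here so that plain string comparison orders them correctly.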
Physical Extraction
• Online Extraction
– In this process, the extraction process connects
directly to the source system and extracts the source data.
• Offline Extraction
– The data is not extracted directly from the source
system but is staged explicitly outside the original
source system.
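A small sketch may help contrast the two approaches. It assumes a hypothetical `orders` table and uses a CSV dump in a temporary directory as the staging area; real systems might stage data as flat files, dump files, or logs instead.

```python
import csv
import os
import sqlite3
import tempfile

# Hypothetical source system, built in memory for illustration.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
source.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 10.0), (2, 25.5)])

# Online extraction: the ETL process queries the source system directly.
online_rows = source.execute("SELECT id, amount FROM orders ORDER BY id").fetchall()

# Offline extraction, step 1: the data is staged outside the source system
# (here the source side writes a CSV dump).
stage_path = os.path.join(tempfile.mkdtemp(), "orders_dump.csv")
with open(stage_path, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["id", "amount"])
    writer.writerows(online_rows)

# Offline extraction, step 2: the ETL process reads only the staged file
# and never touches the source system.
with open(stage_path, newline="") as f:
    offline_rows = [(int(r["id"]), float(r["amount"])) for r in csv.DictReader(f)]
```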
Transform
Data Transformation
• This step includes cleaning, filtering, validating, and
applying rules to the extracted data
• The main objective of this step is to load the
extracted data into the target database in a clean,
common format
• This is necessary because data is extracted from
various sources, each with its own format
Transformation Tasks
• Cleaning: mapping values to standard codes (‘Male’ to
‘M’ and ‘Female’ to ‘F’), replacing nulls with 0, and
making date formats consistent
• Deduplication (removing duplicate records)
• Format revision: Character set conversion, unit of
measurement conversion, date/time conversion, etc.
• Key restructuring: Establishing key relationships
across tables
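The cleaning, date-format, and deduplication tasks above can be sketched in a few lines. This is only an illustration under assumptions: the record fields, the two accepted input date formats, and the sample data are all hypothetical.

```python
from datetime import datetime

GENDER_CODES = {"male": "M", "female": "F"}
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")  # assumed input formats

def normalize_date(text):
    """Convert any accepted input format to YYYY-MM-DD."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {text!r}")

def clean_record(rec):
    """Return a cleaned copy of one record."""
    gender = (rec.get("gender") or "").strip().lower()
    return {
        "id": rec["id"],
        # Map 'Male'/'Female' to 'M'/'F'; leave unknown values untouched.
        "gender": GENDER_CODES.get(gender, rec.get("gender")),
        # Replace a null purchase count with 0.
        "purchases": 0 if rec.get("purchases") is None else rec["purchases"],
        "joined": normalize_date(rec["joined"]),
    }

def deduplicate(records):
    """Keep the first occurrence of each distinct record."""
    seen = set()
    unique = []
    for rec in records:
        key = tuple(sorted(rec.items()))
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique

raw = [
    {"id": 1, "gender": "Male", "purchases": None, "joined": "05/01/2024"},
    {"id": 2, "gender": "female", "purchases": 3, "joined": "2024-01-07"},
    {"id": 1, "gender": "male", "purchases": 0, "joined": "2024-01-05"},
]
cleaned = deduplicate([clean_record(r) for r in raw])
```

Deduplicating after cleaning matters here: the first and third records differ only in formatting, so they become identical (and one is dropped) only once they have been cleaned.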
• Joining: Linking data from multiple sources, for
example combining ad spend data across multiple
platforms, such as Google Adwords and Facebook
Ads.
• Data validation: Simple or complex data validation,
for example, if the first three columns in a row are
empty then reject the row from processing
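Both tasks can be illustrated with a short sketch. The row layout, the per-day spend figures, and the choice of the date as the join key are assumptions made for the example.

```python
def is_valid(row):
    """Reject a row when its first three columns are all empty."""
    return any(value not in (None, "") for value in row[:3])

def total_spend_by_day(*sources):
    """Join ad spend from several platforms on the date key and add it up."""
    totals = {}
    for source in sources:
        for day, spend in source.items():
            totals[day] = totals.get(day, 0.0) + spend
    return totals

rows = [
    ("2024-01-01", "search", "PK", 120.0),
    ("", None, "", 40.0),          # first three columns empty: rejected
    ("2024-01-02", "", "", 75.0),  # only two of the three empty: kept
]
accepted = [r for r in rows if is_valid(r)]

# Hypothetical daily spend exported from each platform.
google_ads = {"2024-01-01": 120.0, "2024-01-02": 75.0}
facebook_ads = {"2024-01-01": 80.0, "2024-01-03": 30.0}
combined = total_spend_by_day(google_ads, facebook_ads)
```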
• Summarization: Values are summarized to obtain
total figures which are calculated and stored at
multiple levels as business metrics – for example,
adding up all purchases a customer has made to
build a customer lifetime value (CLV) metric.
• Integration: Give each unique data element one
standard name with one standard definition. Data
integration reconciles different data names and
values for the same data element.
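As a rough sketch of how integration and summarization fit together: the field names used by each source (`cust_id`, `CustomerNo`, `amt`, `purchase_total`) are hypothetical, and the CLV here is simply the sum of purchases, as in the example above.

```python
from collections import defaultdict

# Integration: map each source-specific field name to one standard name.
STANDARD_NAMES = {
    "cust_id": "customer_id",
    "CustomerNo": "customer_id",
    "amt": "amount",
    "purchase_total": "amount",
}

def integrate(record):
    """Rename the fields of one record to their standard names."""
    return {STANDARD_NAMES.get(k, k): v for k, v in record.items()}

def lifetime_value(purchases):
    """Summarization: add up all purchases per customer."""
    totals = defaultdict(float)
    for p in purchases:
        totals[p["customer_id"]] += p["amount"]
    return dict(totals)

web_orders = [{"cust_id": 1, "amt": 20.0}, {"cust_id": 2, "amt": 5.0}]
store_orders = [{"CustomerNo": 1, "purchase_total": 30.0}]

purchases = [integrate(r) for r in web_orders + store_orders]
clv = lifetime_value(purchases)
```

Summarizing is only possible after integration, because both sources must agree on what `customer_id` and `amount` mean before their values can be added together.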
• Derivation: Applying business rules to your data that
derive new calculated values from existing data, for
example, creating a revenue metric that subtracts taxes
• Filtering (selecting only certain columns to load)
• Enrichment (e.g. expanding a full name into ‘first
name’, ‘middle name’, ‘last name’)
• Splitting (splitting one column into multiple columns)
• Joining (bringing together data from multiple sources)
In some cases data does not need any transformation;
this type of data is called direct move or pass-through data
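Derivation, splitting, and filtering can be combined in one small sketch. The order fields (`gross`, `tax`, `note`) and the rule "revenue = gross minus tax" are assumptions chosen to match the examples above, and the name splitting is deliberately naive.

```python
def split_full_name(full_name):
    """Splitting: break one column into first, middle, and last name."""
    parts = full_name.split()
    first = parts[0] if parts else ""
    last = parts[-1] if len(parts) > 1 else ""
    middle = " ".join(parts[1:-1])
    return first, middle, last

def transform(order, keep=("order_id", "first_name", "middle_name",
                           "last_name", "revenue")):
    first, middle, last = split_full_name(order["full_name"])
    derived = dict(
        order,
        first_name=first,
        middle_name=middle,
        last_name=last,
        # Derivation: a new value calculated from existing columns.
        revenue=order["gross"] - order["tax"],
    )
    # Filtering: only the selected columns are passed on to the load step.
    return {col: derived[col] for col in keep}

order = {"order_id": 7, "full_name": "Ayesha Noor Khan",
         "gross": 115.0, "tax": 15.0, "note": "gift"}
result = transform(order)
```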
Loading
• Initial load:
– Populating all the data warehouse tables for the first
time
• Incremental load:
– Applying ongoing changes as necessary in a periodic
manner
• Full Refresh:
– Completely erasing the contents of one or more tables
and reloading with fresh data
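The three load types can be sketched against a single warehouse table. The `dim_customer` table and its rows are hypothetical, and `INSERT OR REPLACE` is just one simple way to apply changes in SQLite; other databases would typically use a merge or upsert statement.

```python
import sqlite3

def snapshot(conn):
    """Return the current contents of the warehouse table."""
    return conn.execute("SELECT id, name FROM dim_customer ORDER BY id").fetchall()

def initial_load(conn, rows):
    """Initial load: create the table and populate it for the first time."""
    conn.execute("CREATE TABLE dim_customer (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO dim_customer VALUES (?, ?)", rows)

def incremental_load(conn, changed_rows):
    """Incremental load: apply only new and changed rows."""
    conn.executemany("INSERT OR REPLACE INTO dim_customer VALUES (?, ?)", changed_rows)

def full_refresh(conn, rows):
    """Full refresh: erase the table contents and reload with fresh data."""
    conn.execute("DELETE FROM dim_customer")
    conn.executemany("INSERT INTO dim_customer VALUES (?, ?)", rows)

warehouse = sqlite3.connect(":memory:")

initial_load(warehouse, [(1, "Ali"), (2, "Sara")])
after_initial = snapshot(warehouse)

incremental_load(warehouse, [(2, "Sarah"), (3, "Omar")])  # one update, one insert
after_incremental = snapshot(warehouse)

full_refresh(warehouse, [(10, "Zain")])
after_refresh = snapshot(warehouse)
```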
Home Assignment