DM104 - Evaluation of Business Performance
Welcome Notes:
WELCOME BSIS STUDENTS!
I. INTRODUCTION:
This module contains the lesson about the extract, transform, and load (ETL) process. The ETL process is used to blend data from multiple sources and is often used to build a Data Warehouse, Data Hub, or Data Lake. This module will help you understand how data moves from the source database to the destination data warehouse.
II. OBJECTIVES:
Before you proceed to the main lesson, test yourself in this activity.
GREAT!!!
You may now proceed to the main lesson.
Based on the preliminary activities, what did you notice about them?
________________________________________________________
CONGRATULATIONS!
You may now proceed to the lesson.
To maintain its value as a tool for decision-makers, a data warehouse needs to change as the business changes. ETL is a recurring process (run daily, weekly, or monthly) in a data warehouse system and needs to be agile, automated, and well documented.
Data is extracted from online transaction processing (OLTP) databases, today more commonly known
just as 'transactional databases', and other data sources. OLTP applications have high throughput, with large
numbers of read and write requests. They do not lend themselves well to data analysis or business
intelligence tasks. Data is then transformed in a staging area. These transformations cover both data
cleansing and optimizing the data for analysis. The transformed data is then loaded into an online analytical
processing (OLAP) database, today more commonly known as just an analytics database.
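To make the flow above concrete, here is a minimal sketch in Python of the three steps, using sqlite3 in-memory databases to stand in for the OLTP source and the analytics target. The table and column names (orders, orders_clean, and so on) are illustrative assumptions, not part of the module.

import sqlite3

# --- Source: a small in-memory OLTP database with sample rows ---
oltp = sqlite3.connect(":memory:")
oltp.execute("CREATE TABLE orders (id INTEGER, customer TEXT, amount REAL)")
oltp.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, "  acme corp ", 120.5), (2, "Globex", None), (3, "Globex", 75.0)])

# --- Extract: read the raw rows from the transactional database ---
rows = oltp.execute("SELECT id, customer, amount FROM orders").fetchall()

# --- Transform (staging area): cleanse and reshape the data for analysis ---
clean = [(rid, customer.strip().title(), round(amount, 2))
         for rid, customer, amount in rows
         if amount is not None]            # drop rows that cannot be analyzed

# --- Load: write the transformed rows into the analytics (OLAP) database ---
olap = sqlite3.connect(":memory:")
olap.execute("CREATE TABLE orders_clean (id INTEGER, customer TEXT, amount REAL)")
olap.executemany("INSERT INTO orders_clean VALUES (?, ?, ?)", clean)
olap.commit()

print(olap.execute("SELECT * FROM orders_clean").fetchall())
# [(1, 'Acme Corp', 120.5), (3, 'Globex', 75.0)]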
Business intelligence (BI) teams then run queries on that data; the results are eventually presented to end users or to individuals responsible for making business decisions, or used as input for machine learning algorithms or other data science projects. One common problem encountered here is that if the OLAP summaries can't support the type of analysis the BI team wants to do, the whole process needs to run again, this time with different transformations.
Another problem is the rapid shift to cloud-based SaaS applications that now house significant amounts of business-critical data in their own databases, accessible through different technologies such as APIs and webhooks. Also, data today is frequently analyzed in raw form rather than from preloaded OLAP summaries. This has led to the development of lightweight, flexible, and transparent ETL systems in which data is extracted and loaded first, then transformed inside the analytics database.
The biggest advantage to this setup is that transformations and data modeling happen in the analytics
database, in SQL. This gives the BI team, data scientists, and analysts greater control over how they work with the data, in a common language they all understand.
ETL Phases
ETL consists of three separate phases, with cleansing usually handled as part of the transformation phase:
Extraction
Extraction is the operation of extracting information from a source system for further use in a data
warehouse environment. This is the first stage of the ETL process.
The extraction process is often one of the most time-consuming tasks in ETL.
The source systems might be complicated and poorly documented, and thus determining which data
needs to be extracted can be difficult.
The data has to be extracted several times in a periodic manner to supply all changed data to the
warehouse and keep it up-to-date.
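As a rough illustration of periodic extraction, the sketch below pulls only the rows changed since the previous run. It assumes the source table keeps an updated_at column; the table, columns, and timestamp values are hypothetical.

import sqlite3

def extract_changes(source, last_extract_time):
    """Return only the rows changed since the previous extraction run."""
    return source.execute(
        "SELECT id, customer, amount, updated_at FROM orders WHERE updated_at > ?",
        (last_extract_time,),
    ).fetchall()

# Demo with in-memory sample data
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE orders (id INTEGER, customer TEXT, amount REAL, updated_at TEXT)")
src.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", [
    (1, "Acme",   10.0, "2024-01-10"),   # unchanged since the last run
    (2, "Globex", 20.0, "2024-02-15"),   # changed after the last run
])

print(extract_changes(src, "2024-02-01"))
# [(2, 'Globex', 20.0, '2024-02-15')]: only the changed row is re-extracted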
Cleansing
The cleansing stage is crucial in a data warehouse technique because it is supposed to improve data
quality. The primary data cleansing features found in ETL tools are rectification and homogenization. They
use specific dictionaries to rectify typing mistakes and to recognize synonyms, as well as rule-based
cleansing to enforce domain-specific rules and defines appropriate associations between values.
The following examples show the essential of data cleaning:
If an enterprise wishes to contact its users or its suppliers, a complete, accurate and up-to-date list of
contact addresses, email addresses and telephone numbers must be available.
If a client or supplier calls, the staff responding should be able to quickly find the person in the enterprise database, but this requires that the caller's name or his/her company name is listed in the database.
If a user appears in the databases with two or more slightly different names or different account numbers,
it becomes difficult to update the customer's information.
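The sketch below illustrates the kind of dictionary-based rectification and rule-based cleansing described above, applied to made-up customer records. The synonym dictionary, field names, and phone-number rule are assumptions for illustration only.

# Dictionary-based rectification: fix common typing variants and synonyms.
SYNONYMS = {"st.": "street", "st": "street", "ave": "avenue", "rd": "road"}

def rectify_address(address):
    words = address.lower().strip().split()
    return " ".join(SYNONYMS.get(word, word) for word in words)

# Rule-based cleansing: enforce a simple domain rule for telephone numbers.
def clean_phone(phone):
    digits = "".join(ch for ch in phone if ch.isdigit())
    return digits if 7 <= len(digits) <= 15 else None   # reject implausible values

customers = [
    {"name": "Juan Dela Cruz", "address": "12 Rizal St.",    "phone": "0917-123-4567"},
    {"name": "JUAN DELA CRUZ", "address": "12 rizal street", "phone": "n/a"},
]

for record in customers:
    record["name"] = record["name"].title()                 # homogenize capitalization
    record["address"] = rectify_address(record["address"])
    record["phone"] = clean_phone(record["phone"])

# After cleansing, both records read "Juan Dela Cruz, 12 rizal street" and can
# be recognized as the same customer.
print(customers)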
Transformation
Transformation is the core of the reconciliation phase. It converts records from their operational source format into a particular data warehouse format. If we implement a three-layer architecture, this phase outputs our reconciled data layer.
The following points must be rectified in this phase:
Loose text may hide valuable information. For example, the name "XYZ PVT Ltd" does not explicitly show that this is a limited partnership company.
Different formats can be used for individual data items. For example, a date can be stored as a string or as three integers.
Following are the main transformation processes aimed at populating the reconciled data layer:
Conversion and normalization that operate on both storage formats and units of measure to make
data uniform.
Matching that associates equivalent fields in different sources.
Selection that reduces the number of source fields and records.
Cleansing and Transformation processes are often closely linked in ETL tools.
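The following sketch shows the three transformation processes named above (conversion and normalization, matching, and selection) reconciling two hypothetical sources that store the same sale in different formats. The field names and the fixed exchange rate are assumptions made for the example.

from datetime import date

# Two hypothetical sources store the same sale in different formats.
source_a = [{"cust_id": 7, "sale_date": "2024-03-05", "amount_usd": 100.0}]
source_b = [{"customer": 7, "day": 5, "month": 3, "year": 2024, "amount_php": 5600.0}]

PHP_PER_USD = 56.0   # assumed fixed rate, purely for illustration

def from_source_a(rec):
    # Conversion and normalization: parse the string date; USD is the common unit.
    year, month, day = map(int, rec["sale_date"].split("-"))
    return {"customer_id": rec["cust_id"],
            "sale_date": date(year, month, day),
            "amount_usd": rec["amount_usd"]}

def from_source_b(rec):
    # Matching: "customer" is the same field as "cust_id"; three integers
    # become one date; pesos are converted to the common unit of measure.
    return {"customer_id": rec["customer"],
            "sale_date": date(rec["year"], rec["month"], rec["day"]),
            "amount_usd": round(rec["amount_php"] / PHP_PER_USD, 2)}

# Selection: only the three reconciled fields are kept from either source.
reconciled = [from_source_a(r) for r in source_a] + [from_source_b(r) for r in source_b]
print(reconciled)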
Loading
The load is the process of writing the data into the target database. During the load step, it is necessary to ensure that the load is performed correctly and with as few resources as possible.
Loading can be carried out in two ways:
1. Refresh: Data warehouse data is completely rewritten, meaning the older data is replaced. Refresh is usually used in combination with static extraction to populate a data warehouse initially.
2. Update: Only the changes applied to the source information are added to the data warehouse. An update is typically carried out without deleting or modifying preexisting data. This method is used in combination with incremental extraction to update data warehouses regularly.
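A minimal sketch of the two load modes is shown below, again using sqlite3 and made-up table names: a refresh completely rewrites the target table, while an update only appends the rows supplied by incremental extraction.

import sqlite3

target = sqlite3.connect(":memory:")
target.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, amount REAL)")

def refresh_load(conn, rows):
    """Refresh: completely rewrite the warehouse table with the new extract."""
    conn.execute("DELETE FROM sales")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()

def update_load(conn, new_rows):
    """Update: add only the rows from incremental extraction,
    without deleting or modifying the data that is already there."""
    conn.executemany("INSERT INTO sales VALUES (?, ?)", new_rows)
    conn.commit()

refresh_load(target, [(1, 100.0), (2, 200.0)])            # initial population
update_load(target, [(3, 50.0)])                          # later, incremental change
print(target.execute("SELECT * FROM sales ORDER BY id").fetchall())
# [(1, 100.0), (2, 200.0), (3, 50.0)]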
ETL vs ELT
Extract, Transform and Load (ETL) is the technique of extracting records from sources (external or on-premises) into a staging area, then transforming or reformatting them with the business rules needed to fit operational needs or data analysis, and later loading them into the destination database or data warehouse.
Strengths
Development Time: Designing from the output backwards ensures that only information applicable to the solution is extracted and processed, potentially decreasing development, extraction, and processing overhead.
Targeted data: Due to the targeted nature of the load process, the warehouse contains only information relevant to the presentation. Reduced warehouse content simplifies the security regime that must be enforced and hence the administration overhead.
Tools Availability: The number of tools available that implement ETL provides flexibility of approach and the opportunity to identify the most appropriate tool. The proliferation of tools has led to a competitive functionality war, which often results in loss of maintainability.
Weaknesses
Flexibility: Targeting only relevant information for output means that any future requirement that needs data not included in the original design will have to be added to the ETL routines. Because of the tight dependency between the routines developed, this often leads to a need for fundamental redesign and development. As a result, it increases the time and cost involved.
Hardware: Most third-party tools use their own engine to implement the ETL phase. Regardless of the scale of the solution, this can necessitate investment in additional hardware to run the tool's ETL engine. The use of third-party tools to achieve the ETL process also compels the learning of new scripting languages and processes.
Learning Curve: Implementing a third-party tool that uses unfamiliar processes and languages results in the learning curve that is implicit in all technologies new to an organization, and can often lead to blind alleys in their use due to lack of experience.
Extract, Load and Transform (ELT) is a different way of looking at data migration or movement. ELT involves extracting information from the source system and loading it into the target system, without transformation between the extraction and loading phases. Once the data is copied or loaded into the target system, the transformation takes place there.
The extract and load steps can be isolated from the transformation process. Isolating the load phase from the transformation process removes an inherent dependency between these phases. In addition to containing the data necessary for the transformations, the extract and load process can include components of data that may be essential in the future. The load phase could even take the entire source and load it into the warehouse. Separating the phases enables the project to be broken down into smaller chunks, thus making it more specific and manageable.
Performing the data integrity analysis in the staging area enables a further phase in the process to be isolated and dealt with at the most appropriate point. This approach also helps to ensure that only cleaned and checked information is loaded into the warehouse for transformation. Isolating the transformations from the load steps helps to encourage a more staged approach to warehouse design and implementation.
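As a rough sketch of this ELT style, the example below loads a raw extract into the target engine first and then performs the cleansing and transformation there, in SQL. The table names, columns, and cleansing rule are illustrative assumptions.

import sqlite3

warehouse = sqlite3.connect(":memory:")

# Extract + Load: copy the source rows as-is, including fields (such as region)
# that are not needed yet but may serve future requirements.
warehouse.execute("CREATE TABLE raw_orders (id INTEGER, customer TEXT, amount TEXT, region TEXT)")
warehouse.executemany("INSERT INTO raw_orders VALUES (?, ?, ?, ?)", [
    (1, " acme ", "120.50", "NCR"),
    (2, "Globex", "bad-value", "Region IV-A"),
])

# Transform: performed inside the warehouse, in SQL, after the data is loaded.
warehouse.executescript("""
    CREATE TABLE orders_clean AS
    SELECT id,
           TRIM(customer)       AS customer,
           CAST(amount AS REAL) AS amount
    FROM raw_orders
    WHERE CAST(amount AS REAL) > 0;   -- rule-based cleansing in the target engine
""")

print(warehouse.execute("SELECT * FROM orders_clean").fetchall())
# [(1, 'acme', 120.5)]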
Strengths
Project Management: Being able to divide the warehouse process into specific and isolated functions enables a project to be designed on a smaller, function-by-function basis, so the project can be broken down into feasible chunks.
Flexible & Future Proof: In general, in an ELT implementation, all records from the sources are loaded into the data warehouse as part of the extract and load process. This, linked with the isolation of the transformation phase, means that future requirements can easily be incorporated into the data warehouse architecture.
Risk Minimization: Removing the close interdependencies between each step of the warehouse build process enables the development of each step to be isolated, and the individual process designs can thus also be separated. This provides a good platform for change, maintenance, and management.
Utilize Existing Hardware: In implementing ELT as a warehouse build process, the essential tools provided with the database engine can be used.
Utilize Existing Skill Sets: By using the functionality supported by the database engine, the existing investment in database skills is re-used to develop the warehouse. No new skills need to be learned, and the full weight of the existing experience with the engine's technology is utilized, further reducing the cost and risk in the development process.
Weaknesses
Against the Norm: ELT is a newer approach to data warehouse design and development. While it has proven itself many times over through its abundant use in implementations throughout the world, it does require a change in mentality and design approach compared with traditional methods.
Tools Availability: Being an emergent technology approach, ELT has limited availability of tools.
1. The mechanism of extracting information from source systems and bringing it into the data warehouse
is commonly called __________.
2. __________ is extracted from online transaction processing (OLTP) databases, today more commonly
known just as 'transactional databases', and other data sources.
3. __________ is the operation of extracting information from a source system for further use in a data
warehouse environment.
4. __________ stage is crucial in a data warehouse technique because it is supposed to improve data
quality.
5. __________ is the core of the reconciliation phase.
6. __________is the process of writing the data into the target database.
7. __________ involves the extraction of aggregate information from the source system and loading to
the target method instead of transformation between the extraction and loading phase.
8. __________ is usually used in combination with static extraction to populate a data warehouse initially.
9. __________ is typically carried out without deleting or modifying preexisting data.
10. Transformations are performed in ETL __________.
VI. GENERALIZATION
1. Extract
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
2. Transform
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
3. Load
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
KUDOS!
You have come to the end of Module 3.
OOPS! Don't forget that you still have an assignment to do.
Here it is….
VII. ASSIGNMENT
After your long journey of reading and accomplishing the module, let us now
challenge your mind by answering the evaluation part of this module.
VIII. EVALUATION