Topic 8 - Intro to Data Warehouse


SECD2523 DATABASE

TOPIC 8 | INTRODUCTION TO DATA WAREHOUSE

Content adapted from Connolly, T., Begg, C., 2015. Database Systems: A Practical Approach to Design, Implementation, and
Management, Global Edition. Pearson Education.

www.utm.my
LECTURE LEARNING OUTCOME
By the end of this lecture, students should be able to explain:

01 The main concepts and benefits associated with data warehousing.

02 How online transaction processing (OLTP) systems differ from a data warehouse.

03 The problems associated with data warehousing.

04 The architecture and main components of a data warehouse.

05 The concept of a data mart and the main reasons for implementing a data mart.
What is a Data Warehouse?

• A data warehouse (DW) is an environment, not a product.
• Data is often scattered across different databases, so a DW is needed to
obtain complete information.
• It is aimed at the effective integration of operational databases to enable
the strategic use of data.

Definition: Data Warehouse

• “A subject-oriented, integrated, time-variant, and non-volatile collection
of data in support of management’s decision-making process.” (W. H.
Inmon, 1993)
• “A data warehouse is a single, complete and consistent store of data
obtained from a variety of sources and made available to end users in a
way they can understand and use in a business context.”
• “A data warehouse is a collection of corporate information derived
directly from operational systems and some external data sources.”

01/24/2025 4
Data Warehousing - Introduction

• Data warehousing integrates data and information collected from various
sources into one comprehensive database.
(e.g. customer information from an organization’s point-of-sale systems, its mailing lists,
website and comment cards, etc.)
• A data warehouse is a centralized storage system, or central repository, for
storing, analyzing and interpreting data in order to facilitate better
decision making.
• A data warehouse is a type of data management system that facilitates and
supports business intelligence (BI) activities, specifically analysis.
• It is primarily designed to facilitate searches and analyses, and usually
contains large amounts of historical data.
Data Warehouse Usage - Examples
• Investment & Insurance Companies – to analyze customer & market trends and allied
data patterns.
• Retail Chains – used for marketing and distribution to track items, examine pricing
policies and analyze buying trends of customers.
• Healthcare – to generate treatment reports, share data with insurance companies &
medical units.
• Airline – for operational purposes such as crew assignment, route profitability, frequent flyer
program promotions, etc.
• Banking – to manage resources available on desk effectively.
• Public Sector – used for intelligence gathering, to maintain & analyze tax records,
health policy records, etc.
• Telecommunication – used for product promotions, sales decisions and to make
distribution decisions.
Concepts of Data Warehousing

• Data Integration: Combines data from various sources into a
unified repository.
• Historical Data Storage: Stores data over time, allowing for trend
analysis.
• Analytical Processing: Designed for complex queries, data mining,
and reporting.

Benefits of Data Warehousing
Successful implementation of a data warehouse can bring major benefits
to an organization:
• Improved Decision Making: Provides a single, reliable source of data for
analytics and reporting.
• Enhanced Data Quality and Consistency: Centralized data management
ensures data is cleansed, validated, and standardized.
• High Performance for Queries: Optimized for complex queries and large-
scale data analysis, improving response times.
• Historical Data Analysis: Enables long-term trend analysis and supports
strategic decision-making.
• Better Data Management: Simplifies data management by consolidating
multiple data sources into one location.
Characteristics of data in DW
The data held in a data warehouse is described as being subject-oriented, integrated,
time-variant, and nonvolatile (Inmon, 1993).

Subject-oriented
• The warehouse is organized around the major subjects of the enterprise
(e.g. customers, products, and sales) rather than the major application
areas (e.g. customer invoicing, stock control, and product sales).
• This is reflected in the need to store decision-support data rather than
application-oriented data.

Integrated
• The data warehouse integrates corporate application-oriented data from
different source systems, which often includes data that is inconsistent.
• The integrated data source must be made consistent to present a unified
view of the data to the users.

Time-variant
• Data in the warehouse is only accurate and valid at some point in time or
over some time interval.
• Time-variance is also shown in the extended time that the data is held, the
implicit or explicit association of time with all data, and the fact that the
data represents a series of snapshots.

Non-volatile
• Data in the warehouse is not normally updated in real-time (RT) but is
refreshed from operational systems on a regular basis. (However, the
emerging trend is towards RT or near-RT DWs.)
• New data is always added as a supplement to the database, rather than a
replacement.
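The time-variant and non-volatile characteristics can be illustrated with a small sketch. It assumes a hypothetical `customer_snapshot` table in an in-memory SQLite database and a hypothetical `refresh` helper. None of these names come from the lecture. Each refresh appends a dated snapshot and never overwrites earlier rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE customer_snapshot (
        customer_id   INTEGER,
        city          TEXT,
        snapshot_date TEXT,  -- explicit association of time with the data
        PRIMARY KEY (customer_id, snapshot_date)
    )
    """
)


def refresh(rows, snapshot_date):
    """Periodic refresh: append a new snapshot; earlier rows are never updated."""
    conn.executemany(
        "INSERT INTO customer_snapshot VALUES (?, ?, ?)",
        [(customer_id, city, snapshot_date) for customer_id, city in rows],
    )


refresh([(1, "Glasgow")], "2024-01-31")
refresh([(1, "Aberdeen")], "2024-02-29")  # customer moved; the old row is kept

# The warehouse now holds a series of snapshots for the same customer.
history = conn.execute(
    "SELECT snapshot_date, city FROM customer_snapshot "
    "WHERE customer_id = 1 ORDER BY snapshot_date"
).fetchall()
print(history)
```

An operational system would typically overwrite the customer's city. Here both values remain available for trend analysis.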
Design of Architecture of a Data Warehouse
• Designed for analytical processing.
• Optimized for complex queries, data aggregation and large-scale
reporting.
• Focuses on historical data and trends.
• A consolidated/integrated view of corporate data drawn from
disparate operational data sources and a range of end-user access
tools capable of supporting simple to highly complex queries to
support decision making.

Architecture Components of a Data Warehouse

• Data Sources: Include internal databases (OLTP systems), external data
sources, and other data feeds.
• ETL (Extract, Transform, Load) Process:
• Extract: Gathers data from various sources.
• Transform: Converts data into a usable format, ensuring consistency and
accuracy.
• Load: Loads transformed data into the data warehouse.
• Data Storage: Centralized storage area for structured and unstructured
data, often using a star or snowflake schema for organizing data.

Architecture Components of a Data Warehouse
(cont.)
• Data Marts: Subsets of the data warehouse, optimized for specific
departments or business functions.
• Metadata Management: Manages metadata, which describes the
structure, operations, and contents of the data warehouse.
• Data Access Tools: Include reporting, analysis, data mining, and
visualization tools for end-users.

Data Warehouse Architecture

Figure: The typical architecture of a data warehouse
Multi-Tiered Architecture of Data Warehouse

Designing a Data Warehouse
• Bottom-Up Approach

• The Bottom-Up Approach starts by creating small data marts to solve specific
business problems.
• These data marts can then be combined into a larger data warehouse.
Designing a Data Warehouse
• Top-Down Approach

• The Top-Down Approach suggests starting by creating an
enterprise-wide data warehouse and then, as specific business needs
are identified, creating smaller data marts.
Data Mart
• A database that contains a subset of corporate data to support
the analytical requirements of a particular business unit (such
as the Sales department) or to support users who share the
same requirements to analyse a particular business process
(such as property sales).
• A smaller, more focused version of a data warehouse tailored
to specific business areas (e.g., sales, finance).
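
As a rough illustration of a data mart as a subset of corporate data, the sketch below derives a hypothetical `sales_mart` table from a hypothetical `warehouse_sales` table in SQLite. The table names, columns, and sample rows are all assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE warehouse_sales (
        sale_id    INTEGER PRIMARY KEY,
        department TEXT,
        region     TEXT,
        amount     REAL
    );
    INSERT INTO warehouse_sales VALUES
        (1, 'Sales',   'North', 120.0),
        (2, 'Sales',   'South',  80.0),
        (3, 'Finance', 'North',  45.0);

    -- Hypothetical data mart: only the rows and columns the Sales unit needs.
    CREATE TABLE sales_mart AS
        SELECT sale_id, region, amount
        FROM warehouse_sales
        WHERE department = 'Sales';
    """
)

mart_rows = conn.execute(
    "SELECT region, amount FROM sales_mart ORDER BY sale_id"
).fetchall()
print(mart_rows)
```

The mart is smaller than the warehouse table and exposes only what one business unit analyzes most often.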

Reasons for Creating a Data Mart
• Improved Performance: Faster query response times for specific data
sets.
• Cost Efficiency: Reduces storage and processing costs by focusing on
specific data needs.
• Simplified Access: Makes data access easier and more relevant for end-
users (give users access to the data they need to analyze most often).
• Customized data models for specific business requirements (to provide
data in a form that matches the collective view of the data by a group of
users in a department or business application area).
Designing a Data Warehouse
Dimensionality Modeling:
• A logical design technique that aims to present the data in a standard,
intuitive form that allows for high-performance access.
• Two types in general:
• Star schema
• Snowflake schema
• Fact Table – the primary table in a dimensional model that is meant to
contain measurements of the business.
• Dimension Table – One of a set of companion tables to a fact table. Most
dimension tables contain many textual attributes that are the basis for
constraining and grouping within data warehouse queries.
Dimensionality Modeling

• Dimension tables usually contain descriptive textual information.
• Dimension attributes are used as the constraints in data
warehouse queries.
• Every dimensional model (DM) is composed of one table with a
composite primary key, called the fact table, and a set of smaller
tables called dimension tables.

Dimensionality Modeling (cont.)
• Star schema is a logical structure that has a fact table containing factual
data in the center, surrounded by dimension tables containing reference
data, which can be denormalized.
• Star schemas can be used to speed up query performance by
denormalizing reference information into a single dimension table.
• Snowflake schema is a variant of the star schema that has a fact table in
the center, surrounded by normalized dimension tables.
• Starflake schema is a hybrid structure that contains a mixture of star
(denormalized) and snowflake (normalized) dimension tables. It allows
dimensions to be present in both forms to cater for different query
requirements.
Star Schema

• A dimensional data model that has a fact table in the center,
surrounded by denormalized dimension tables.
• A fact table in the middle is connected to a set of dimension tables.
• Star schema is a logical structure that has a fact table (containing
factual data) in the center, surrounded by denormalized dimension
tables (containing reference data).
• Facts are generated by events that occurred in the past, and are
unlikely to change, regardless of how they are analyzed.
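
A minimal star schema can be sketched in SQLite as follows. The names `fact_sale`, `dim_branch`, and `dim_time`, along with the sample rows, are hypothetical and chosen only for illustration. The fact table has a composite primary key made up of the dimension keys, and each dimension is a single denormalized table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Denormalized dimension tables (reference data).
    CREATE TABLE dim_branch (branch_key INTEGER PRIMARY KEY, city TEXT, region TEXT);
    CREATE TABLE dim_time   (time_key INTEGER PRIMARY KEY, year INTEGER, quarter INTEGER);

    -- Fact table in the center, keyed by its dimensions.
    CREATE TABLE fact_sale (
        branch_key INTEGER REFERENCES dim_branch(branch_key),
        time_key   INTEGER REFERENCES dim_time(time_key),
        revenue    REAL,
        PRIMARY KEY (branch_key, time_key)
    );

    INSERT INTO dim_branch VALUES (1, 'Glasgow', 'Scotland'), (2, 'London', 'England');
    INSERT INTO dim_time   VALUES (1, 2001, 3), (2, 2001, 4);
    INSERT INTO fact_sale  VALUES (1, 1, 100.0), (1, 2, 150.0), (2, 1, 300.0);
    """
)

# One join per dimension is enough to group facts by a dimension attribute.
revenue_by_region = conn.execute(
    """
    SELECT b.region, SUM(f.revenue)
    FROM fact_sale AS f
    JOIN dim_branch AS b ON b.branch_key = f.branch_key
    GROUP BY b.region
    ORDER BY b.region
    """
).fetchall()
print(revenue_by_region)
```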

Example of Star Schema

Snowflake Schema
• A dimensional data model that has a fact table in the center,
surrounded by normalized dimension tables.
• A refinement of the star schema where some dimensional hierarchy is
normalized into a set of smaller dimension tables, forming a shape
similar to a snowflake.
• The bulk of the data in a data warehouse is in fact tables, which can be
extremely large.
• It is important to treat fact data as read-only reference data that will not
change over time.
• Most useful fact tables contain one or more numerical measures, or
‘facts’, that occur for each record and are numeric and additive.
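To make the idea of a normalized dimension hierarchy concrete, the sketch below splits a branch dimension into hypothetical `dim_branch`, `dim_city`, and `dim_region` tables. All names and rows are assumptions for illustration. Grouping by region now needs one join per level of the hierarchy.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension hierarchy normalized into smaller tables: branch -> city -> region.
    CREATE TABLE dim_region (region_key INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE dim_city (
        city_key   INTEGER PRIMARY KEY,
        city       TEXT,
        region_key INTEGER REFERENCES dim_region(region_key)
    );
    CREATE TABLE dim_branch (
        branch_key INTEGER PRIMARY KEY,
        city_key   INTEGER REFERENCES dim_city(city_key)
    );
    CREATE TABLE fact_sale (
        branch_key INTEGER REFERENCES dim_branch(branch_key),
        revenue    REAL
    );

    INSERT INTO dim_region VALUES (1, 'Scotland'), (2, 'England');
    INSERT INTO dim_city   VALUES (1, 'Glasgow', 1), (2, 'Aberdeen', 1), (3, 'London', 2);
    INSERT INTO dim_branch VALUES (1, 1), (2, 2), (3, 3);
    INSERT INTO fact_sale  VALUES (1, 100.0), (2, 50.0), (3, 300.0);
    """
)

snowflake_totals = conn.execute(
    """
    SELECT r.region, SUM(f.revenue)
    FROM fact_sale AS f
    JOIN dim_branch AS b ON b.branch_key = f.branch_key
    JOIN dim_city   AS c ON c.city_key = b.city_key
    JOIN dim_region AS r ON r.region_key = c.region_key
    GROUP BY r.region
    ORDER BY r.region
    """
).fetchall()
print(snowflake_totals)
```

Each region name is stored once, at the cost of extra joins at query time.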
Example of Snowflake Schema (1)

Example of Snowflake Schema (2)

Online Transaction Processing (OLTP)
• Major task of traditional relational DBMS.
• Designed for day-to-day transaction processing: purchasing, inventory,
banking, manufacturing, payroll, registration, accounting, etc.
• Optimized for data entry, updating and deletion.
• Focuses on real-time data management.

Online Analytical Processing (OLAP)
• Original definition - The dynamic synthesis, analysis, and consolidation of large
volumes of multi-dimensional data, Codd (1993).
• Describes a technology that is designed to optimize the storing and querying of
large volumes of multi-dimensional data that is aggregated (summarized) to
various levels of detail to support the analysis of this data.
• Enables users to gain a deeper understanding and knowledge about various
aspects of their corporate data through fast, consistent, interactive access to a
wide variety of possible views of the data.
• Allows users to view corporate data in such a way that it is a better model of the
true dimensionality of the enterprise.
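Aggregation "to various levels of detail" can be sketched in plain Python. The `sales` records and the `roll_up` helper below are hypothetical and are not part of any OLAP product. The same data is summarized by year, by year and quarter, and by city.

```python
from collections import defaultdict

# Hypothetical sales records: (year, quarter, city, revenue).
sales = [
    (2001, 3, "Glasgow", 100.0),
    (2001, 3, "Aberdeen", 50.0),
    (2001, 4, "Glasgow", 150.0),
    (2002, 1, "Glasgow", 70.0),
]


def roll_up(records, key):
    """Aggregate revenue at the level of detail chosen by `key`."""
    totals = defaultdict(float)
    for record in records:
        totals[key(record)] += record[3]
    return dict(totals)


by_year = roll_up(sales, key=lambda r: r[0])                  # coarsest view
by_year_quarter = roll_up(sales, key=lambda r: (r[0], r[1]))  # more detail
by_city = roll_up(sales, key=lambda r: r[2])                  # a different dimension
print(by_year, by_year_quarter, by_city)
```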
OLTP vs. OLAP

Examples of OLAP Applications in Various Functional
Areas

Comparison of OLTP Systems and Data Warehousing
• A DBMS built for online transaction processing (OLTP) is generally regarded as
unsuitable for data warehousing, because each system is designed with a differing
set of requirements in mind.
• For example, OLTP systems are designed to maximize the transaction processing
capacity, while data warehouses are designed to support ad hoc query processing.
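
The two workloads can be contrasted with a small sketch. It uses a single hypothetical `account` table in SQLite purely for brevity. In practice the analytical query would run against the warehouse rather than the operational database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (account_id INTEGER PRIMARY KEY, branch TEXT, balance REAL)"
)
conn.executemany(
    "INSERT INTO account VALUES (?, ?, ?)",
    [(1, "Glasgow", 500.0), (2, "Glasgow", 250.0), (3, "London", 900.0)],
)

# OLTP-style work: a short transaction that touches one row, located by key.
with conn:
    conn.execute("UPDATE account SET balance = balance - 100.0 WHERE account_id = 1")

# Warehouse-style work: an ad hoc query that reads many rows and aggregates them.
balance_by_branch = conn.execute(
    "SELECT branch, SUM(balance) FROM account GROUP BY branch ORDER BY branch"
).fetchall()
print(balance_by_branch)
```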

Data Warehouse Queries
• The types of queries that a data warehouse is expected to answer
range from the relatively simple to the highly complex and
depend on the type of end-user access tools used.
• End-user access tools include:
• Traditional reporting and query
• OLAP
• Data mining

Example - Data Warehouse Queries
• What was the total revenue for Scotland in the third quarter of 2001?
• What was the total revenue for property sales for each type of
property in Great Britain in 2000?
• What are the three most popular areas in each city for the renting of
property in 2001 and how does this compare with the figures for the
previous two years?
• What is the monthly revenue for property sales at each branch office,
compared with rolling 12-monthly prior figures?
Example - Data Warehouse Queries (cont.)
• Which type of property sells for prices above the average selling price
for properties in the main cities of Great Britain and how does this
correlate to demographic data?
• What is the relationship between the total annual revenue generated
by each branch office and the total number of sales staff assigned to
each branch office?
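
As a sketch of how the simplest of these questions (total revenue for Scotland in the third quarter of 2001) might be expressed, the example below queries a hypothetical `sale` table in SQLite. The table layout and the sample figures are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sale (region TEXT, sale_date TEXT, revenue REAL)")
conn.executemany(
    "INSERT INTO sale VALUES (?, ?, ?)",
    [
        ("Scotland", "2001-07-15", 100.0),
        ("Scotland", "2001-09-30", 200.0),
        ("Scotland", "2001-10-01", 400.0),  # fourth quarter, excluded
        ("England", "2001-08-01", 999.0),   # other region, excluded
    ],
)

# Quarter is derived as (month + 2) / 3 using SQLite integer division.
(scotland_q3_2001,) = conn.execute(
    """
    SELECT SUM(revenue)
    FROM sale
    WHERE region = 'Scotland'
      AND strftime('%Y', sale_date) = '2001'
      AND (CAST(strftime('%m', sale_date) AS INTEGER) + 2) / 3 = 3
    """
).fetchone()
print(scotland_q3_2001)
```

The more complex questions (rankings, comparisons with prior years, correlations) usually call for OLAP or data mining tools rather than a single query like this.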

ETL in Data Warehousing
• All data loaded into the data warehouse has to be converted to a standard
format. The process that does this is called Extraction-Transformation-Load
(ETL).
• It is a critical process in data warehousing that involves three main
stages to integrate data from multiple sources into a single, unified
data warehouse.
• The ETL process ensures that the data stored in the warehouse is
accurate, consistent, and ready for analysis.

ETL in Data Warehousing - Extraction
• Involves gathering data from multiple heterogeneous sources.
• Objective - To capture all necessary data required for analysis while
minimizing the impact on the source systems.
• Targets one or more data sources. These typically include OLTP databases
but can also include personal databases, spreadsheets, web services, and
other structured or unstructured data repositories.
• The data sources are normally internal but can also include external
sources such as the systems used by suppliers and/or customers.
• Challenges:
• Handling data from heterogeneous sources with varying formats and structures.
• Ensuring data integrity and consistency during extraction.
ETL in Data Warehousing - Transformation
• Involves converting the extracted data into a format suitable for analysis in
the data warehouse.
• This stage cleanses, filters, and structures the data to ensure it meets the
quality and format requirements of the data warehouse.
• Applies a series of rules or functions to the extracted data, which determines how the
data will be used for analysis and can involve transformations such as data summations,
data encoding, data merging, data splitting, data calculations, and creation of surrogate
keys.
• Objective: To ensure that data is accurate, consistent, and relevant for
decision-making.
• Challenges:
• Managing complex data transformations and maintaining data integrity.
• Ensuring that transformed data maintains its meaning and relevance.
ETL in Data Warehousing - Loading
• Involves moving the transformed data into the target data warehouse.
• The loading process can be done in batches or as a continuous stream,
depending on the data's volume and the business requirements.
• Objective: To ensure that data is accurately loaded into the data
warehouse without loss or corruption.
• Challenges:
• Ensuring data integrity and avoiding data loss or corruption during loading.
• Minimizing the load's impact on data warehouse performance.
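
The three ETL stages can be put together in a minimal batch sketch. The CSV source, the `COUNTRY_CODES` mapping, and the `extract`, `transform`, and `load` helpers are all hypothetical and stand in for whatever sources and rules a real warehouse would use.

```python
import csv
import io
import sqlite3

# Hypothetical source extract; a real pipeline would read from OLTP databases,
# spreadsheets, web services, and so on.
SOURCE_CSV = """customer,country,amount
 Alice ,UK,10.50
Bob,United Kingdom,4.00
Carol,MY,not-a-number
"""

# Assumed encoding rule that maps inconsistent source values to one standard code.
COUNTRY_CODES = {"UK": "GB", "United Kingdom": "GB", "MY": "MY"}


def extract(text):
    """Extract: read raw rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: cleanse, standardize codes, and drop rows that fail validation."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # reject rows whose amount is not numeric
        clean.append((row["customer"].strip(), COUNTRY_CODES[row["country"]], amount))
    return clean


def load(conn, rows):
    """Load: write the transformed batch in a single transaction."""
    with conn:
        conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (customer TEXT, country TEXT, amount REAL)")
load(conn, transform(extract(SOURCE_CSV)))

loaded = conn.execute("SELECT * FROM sales ORDER BY customer").fetchall()
print(loaded)
```

Loading the batch inside one transaction means a failure part-way through leaves the warehouse table unchanged, which addresses the data-loss concern listed above.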

Problems of Data Warehousing
• Data Integration Complexity: Combining data from various and often incompatible sources
can be challenging and time-consuming.
• High Costs: Requires substantial investment in hardware, software, and ongoing maintenance.
• Data Latency: Data is often not updated in real-time, leading to potential delays in data
availability.
• Data Quality Issues: Inconsistent or poor-quality data from source systems can negatively
impact the accuracy of analysis.
• Scalability Challenges: As data volume grows, the warehouse may require significant
upgrades to handle increased load and maintain performance.
• Maintenance Overhead: Ongoing effort is required to update, manage, and optimize the data
warehouse environment.
• Security Risks: Large repositories of sensitive data may pose security risks if not adequately
protected.
THANK YOU!
