Data Warehousing: Defined and Its Applications

The document defines a data warehouse as an architectural model that gathers data from various operational sources into a centralized database designed for analysis. It describes the key components of a data warehouse including hardware, database management software, front-end tools, and how data is extracted, transformed, and loaded from source systems. The benefits of a data warehouse for analysis are provided, along with some potential costs to consider for implementation and maintenance.

Uploaded by

Indumathi K
Data Warehousing: Defined and Its Applications

Pete Johnson

April 2002
Introduction
• The Past and The Problem
• What is a Data Warehouse?
• Components of a Data Warehouse
• OLAP, Metadata, Data Mining
• Getting the Data in
• Benefits vs. Costs
• Conclusion & Questions
The Past and The Problem
• Only had scattered transactional systems in the
organization – data spread among different
systems
• Transactional systems were not designed for
decision support analysis
• Data constantly changes on transactional systems
• Lack of historical data
• Often resources were taxed with both needs on the
same systems
The Past and The Problem
• Operational databases are designed to record
transactions from daily operations. They are
optimized to efficiently update or create
individual records
• A database for analysis on the other hand
needs to be geared toward flexible requests
or queries (Ad hoc, statistical analysis)
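The contrast between the two workloads can be sketched with Python's built-in sqlite3 module. This is only an illustration: the orders table, its columns, and the sample rows are assumptions made up for the example, not part of the original slides.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a few made-up order rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(1, "East", 100.0), (2, "West", 250.0), (3, "East", 50.0)],
    )
    return conn


def update_order_amount(conn, order_id, new_amount):
    """Operational-style work: change exactly one record, found by its key."""
    conn.execute(
        "UPDATE orders SET amount = ? WHERE order_id = ?", (new_amount, order_id)
    )


def total_by_region(conn):
    """Analysis-style work: scan many records and summarize them."""
    rows = conn.execute(
        "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
    ).fetchall()
    return dict(rows)
```

The first function touches a single row, which is what a transactional system is tuned for. The second reads every row to answer one question, which is the kind of request a database for analysis has to serve well.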
What is a Data Warehouse?
Data warehousing is an architectural model
designed to gather data from various
sources into a single unified data model for
analysis purposes.
What Is a Data Warehouse?
The term was introduced in 1990 by William
Inmon
A managed database in which the data is:
• Subject Oriented
• Integrated
• Time Variant
• Non Volatile
Subject Oriented
• Organized around major subject areas in the
enterprise (Sales, Inventory, Financial, etc.)
• Only includes data which is used in the
decision making processes
• Elements used for transactional processing are
removed
Integrated
• Data from different sources are brought
together and consolidated
• The data is cleaned and made consistent

Example – Bank Systems using Different Codes
Loan Department – COMM
Transactional System – C
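One way to picture the consolidation in the bank example is a lookup table that maps each source system's local code to a single warehouse code. This is a minimal sketch: the system names, the target code COMMERCIAL, and the assumption that COMM and C both mean a commercial customer are illustrative.

```python
# Hypothetical mapping from (source system, local code) to one shared code.
# The keys and the target value are assumptions made for this illustration.
CODE_MAP = {
    ("loan_department", "COMM"): "COMMERCIAL",
    ("transactional_system", "C"): "COMMERCIAL",
}


def to_warehouse_code(source_system, local_code):
    """Translate a source-specific code into the shared warehouse code."""
    key = (source_system, local_code.strip().upper())
    if key not in CODE_MAP:
        # Fail loudly so unmapped codes are fixed rather than loaded as-is.
        raise KeyError(f"No mapping for {local_code!r} from {source_system!r}")
    return CODE_MAP[key]
```

Raising an error on an unknown code is a design choice for the sketch; a real load process might instead route such rows to a review queue.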
Time Variant
• Data in a Data Warehouse contains both
current and historical information
• Operational Systems contain only current
data
Systems typically retain data:
Operational Systems – 60 to 90 Days
Data Warehouse – 5 to 10 Years
Non Volatile
• Operational systems have continually
changing data
• Data Warehouses continually absorb
current data and integrate it with their
existing data (Aggregate or Summary
tables)
An example of volatile data is an account balance at a bank
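The account balance example can be sketched as two small classes: one that overwrites its value the way an operational system does, and one that only appends dated snapshots the way a warehouse would. The class and method names are hypothetical and the figures are made up.

```python
from datetime import date


class OperationalAccount:
    """Volatile: the balance is overwritten on every transaction."""

    def __init__(self, balance):
        self.balance = balance

    def post(self, amount):
        self.balance += amount  # the previous value is lost


class BalanceHistory:
    """Non-volatile: snapshots are appended and never changed."""

    def __init__(self):
        self._snapshots = []  # list of (as_of_date, balance) pairs

    def load(self, as_of, balance):
        self._snapshots.append((as_of, balance))

    def balance_on(self, day):
        """Return the latest snapshot on or before `day`, or None."""
        eligible = [s for s in self._snapshots if s[0] <= day]
        if not eligible:
            return None
        return max(eligible, key=lambda s: s[0])[1]
```

Because the history keeps every snapshot, it can still answer what the balance was on an earlier date after the operational value has changed, which is also what makes the warehouse time variant.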
What Is a Data Warehouse?
• Not a product, it is a process
• Combination of hardware and software
• Concept of a Data Warehouse is not new,
but the technology that allows it is
What Is a Data Warehouse?
A Data Warehouse can often be set up as one
VLDB (Very Large Database) or as a collection
of subject areas called Data Marts.

There are now tools which “unify” these Data
Marts and make them appear as a single
database.
What Is a Data Warehouse?
[Figure: Transformation of Data to Information. Layers, in the order shown on the slide: Information; Exploration / Analysis; SQL Reporting; Relational Warehouse; Cleansing & Normalization; Data Transaction Processing]

Components of a Data Warehouse

Four General Components:
• Hardware
• DBMS - Database Management System
• Front End Access Tools
• Other Tools
In all components scalability is vital.
Scalability is the ability to grow as your data and processing needs
increase.
Components of a Data Warehouse -
Hardware
• Power - Number of Processors, Memory, I/O
Bandwidth, and Speed of the Bus
• Availability – Redundant equipment
• Disk Storage - Speed and enough storage for the
loaded data set
• Backup Solution - Automated, with support for
incremental backups and archiving of older data
Components of a Data Warehouse -
DBMS
• Physical storage capacity of the DBMS
• Loading, indexing, and processing speed
• Availability
• Ability to handle your data needs
• Operational integrity, reliability, and
manageability
Components of a Data Warehouse -
Front End & Other Tools
• Query Tools (SQL & GUI based)
• Report Writers
• Metadata Repositories
• OLAP (Online Analytical Processing)
• Data Mining Products
Components of a Data Warehouse –
Metadata Repositories
Metadata is Data about Data. Users and Developers
often need a way to find information on the data
they use. Information can include:
• Source System(s) of the Data, contact information
• Related tables or subject areas
• Programs or Processes which use the data
• Population rules (Update or Insert and how often)
• Status of the Data Warehouse’s processing and
condition
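A metadata repository entry holding the items listed above could look roughly like the following. This is a sketch only: every class name, field name, and default value here is an assumption chosen to mirror the bullets, not a real product's schema.

```python
from dataclasses import dataclass, field


@dataclass
class TableMetadata:
    """One hypothetical repository entry describing a warehouse table."""

    table_name: str
    source_systems: list            # where the data comes from
    contact: str                    # who to ask about the source
    related_tables: list = field(default_factory=list)
    used_by: list = field(default_factory=list)  # programs or processes
    population_rule: str = "insert"               # "insert" or "update"
    refresh_frequency: str = "daily"
    last_load_status: str = "unknown"


class MetadataRepository:
    """Minimal in-memory store so users can look up data about data."""

    def __init__(self):
        self._entries = {}

    def register(self, entry):
        self._entries[entry.table_name] = entry

    def lookup(self, table_name):
        return self._entries.get(table_name)

    def tables_from_source(self, source_system):
        return sorted(
            name
            for name, entry in self._entries.items()
            if source_system in entry.source_systems
        )
```

A user who wants to know which tables are fed by a given source system would call tables_from_source, while lookup returns the full entry for a single table.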
Components of a Data Warehouse –
OLAP Tools
OLAP - Online Analytical Processing. It works by
aggregating detail data and viewing it by dimension
• Gives the ability to “Drill Down” into the detail data
• Decision Support Analysis Tool
• Multidimensional DB focusing on retrieval of
precalculated data
• Ends the “big reports” with large amounts of detailed data
• These tools are often graphical and can run on a “thin
client” such as a web browser
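The ideas of aggregating by dimension and drilling down can be sketched in a few lines of plain Python. The SALES rows, the dimension names, and the function names are made up for the example, and unlike a real OLAP tool this sketch computes totals on request instead of retrieving precalculated ones.

```python
from collections import defaultdict

# Made-up detail rows; "region" and "product" act as dimensions,
# "amount" as the measure.
SALES = [
    {"region": "East", "product": "A", "amount": 100},
    {"region": "East", "product": "B", "amount": 50},
    {"region": "West", "product": "A", "amount": 70},
]


def roll_up(rows, dimensions, measure="amount"):
    """Sum the measure for each combination of the chosen dimensions."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        totals[key] += row[measure]
    return dict(totals)


def drill_down(rows, **filters):
    """Return the detail rows behind one aggregated cell."""
    return [r for r in rows if all(r[k] == v for k, v in filters.items())]
```

Calling roll_up(SALES, ["region"]) gives one total per region; adding "product" to the list gives a finer view, and drill_down(SALES, region="East") returns the detail rows behind the East total.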
Components of a Data Warehouse –
Data Mining
• Answers the questions you didn’t know to
ask
• Analyzes great amounts of data (usually
contained in a Data Warehouse) and looks
for trends in the data
• Technology now allows us to do this better
than in the past
Components of a Data Warehouse –
Data Mining
• Most famous example is the Huggies -
Heineken case
• Used in Retail sector to analyze buying
habits
• Used in financial areas to detect fraud
• Used in the stock market to find trends
• Used in scientific research
• Used in national security
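The retail use above, analyzing buying habits, can be illustrated by counting which pairs of items appear together in the same purchase. This is a deliberately simple sketch, not a full data mining algorithm, and the BASKETS data is invented purely for the illustration.

```python
from collections import Counter
from itertools import combinations


def pair_counts(baskets):
    """Count how often each pair of items is bought together."""
    counts = Counter()
    for basket in baskets:
        # Sort and de-duplicate so each pair has one canonical form.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return counts


# Invented sample purchases, for illustration only.
BASKETS = [
    ["diapers", "beer", "milk"],
    ["diapers", "beer"],
    ["milk", "bread"],
    ["diapers", "bread", "beer"],
]
```

With this sample data, pair_counts(BASKETS).most_common(1) surfaces the beer and diapers pair, which nobody had to name in advance. That is the sense in which data mining answers questions you did not know to ask.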
Getting the Data In

• Data will come from multiple databases and
files within the organization
• Also can come from outside sources
• Examples:
• Weather Reports
• Demographic information by Zip Code

Getting the Data In

Three Steps:
1. Extraction Phase
2. Transformation Phase
3. Loading Phase
Getting the Data In

Extraction Phase:
• Source systems export data via files or
populate the warehouse directly when the
databases can “talk” to each other
• Transfers the data to the Data Warehouse
server and puts it into some sort of staging
area
Getting the Data In

Transformation Phase:
• Takes data and turns it into a form that is suitable
for insertion into the warehouse
• Combines related data
• Removes redundancies
• Common Codes (Commercial Customer)
• Spelling Mistakes (Lozenges)
• Consistency (PA,Pa,Penna,Pennsylvania)
• Formatting (Addresses)
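Two of the transformations listed above, making values consistent and removing redundancies, can be sketched as follows. The variant table, the record layout with name and state keys, and the function names are assumptions for the example; the state spellings come from the slide.

```python
# Spellings from the slide, mapped to one standard code.
STATE_VARIANTS = {"pa": "PA", "penna": "PA", "pennsylvania": "PA"}


def normalize_state(raw):
    """Map known spellings of a state to one code; leave others as-is."""
    key = raw.strip().rstrip(".").lower()
    return STATE_VARIANTS.get(key, raw.strip())


def transform(records):
    """Clean each record, then drop rows that became duplicates."""
    seen = set()
    cleaned = []
    for rec in records:
        row = (rec["name"].strip().title(), normalize_state(rec["state"]))
        if row not in seen:
            seen.add(row)
            cleaned.append({"name": row[0], "state": row[1]})
    return cleaned
```

Duplicates are removed only after cleaning, because two rows that differ in spelling may turn out to be the same customer once the values are consistent.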
Getting the Data In

Loading Phase:
• Places the cleaned data into the DBMS in
its final, usable form
• Compares data between the source systems and
the Data Warehouse
• Documents the load information for the
users
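The three loading tasks above can be sketched with sqlite3: insert the staged rows, compare what arrived with what was staged, and record the result where users can see it. The customer and load_log tables, their columns, and the simple row-count check are assumptions for the example; a real load would likely compare more than counts.

```python
import sqlite3


def load_and_verify(conn, staged_rows):
    """Load staged (name, state) rows, check the count, and log the result."""
    conn.execute("CREATE TABLE IF NOT EXISTS customer (name TEXT, state TEXT)")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS load_log "
        "(rows_staged INTEGER, rows_loaded INTEGER, status TEXT)"
    )
    before = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
    conn.executemany("INSERT INTO customer VALUES (?, ?)", staged_rows)
    after = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]

    # Compare what was staged with what actually reached the warehouse.
    loaded = after - before
    status = "ok" if loaded == len(staged_rows) else "mismatch"

    # Document the load so users can check when and how data arrived.
    conn.execute(
        "INSERT INTO load_log VALUES (?, ?, ?)", (len(staged_rows), loaded, status)
    )
    conn.commit()
    return status
```

The load_log table plays the role of the load documentation: it is ordinary data, so the same query tools used for analysis can read it.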
Benefits vs. Costs
Benefits
• Creates a single point for all data
• System is optimized and designed
specifically for analysis
• Access data without impacting the
operational systems
• Users can access the data directly without
direct help from the IT department
Costs
• Cost of implementation & maintenance
(hardware, software, and staffing)
• Lack of compatibility between components
• Data from many sources are hard to
combine, data integrity issues
• Bad designs and practices can lead to costly
failures
Conclusion
• What is a Data Warehouse?
• Components of a Data Warehouse
• How the Data Gets In
• OLAP, Metadata, and Data Mining
• Benefits vs. Costs
Questions
