DWH Week 02

The document discusses the key components and defining features of a data warehouse. It distinguishes between how data is stored in operational systems versus a data warehouse. Specifically, it notes that in a data warehouse, data is stored by subject rather than application, is integrated from multiple sources, contains historical as well as current data, and remains static once loaded rather than being constantly updated. The document also contrasts top-down and bottom-up approaches to building a data warehouse.

Uploaded by

Sibtain Tahir
Copyright
© All Rights Reserved

Data Warehouse: The Building Blocks

CHAPTER OBJECTIVES

• Review formal definitions of a data warehouse
• Discuss the defining features
• Distinguish between data warehouses and data marts
• Study each component or building block that makes up a data warehouse
• Introduce metadata and highlight its significance


Introduction
Bill Inmon, considered to be the father of data
warehousing, provides the following definition:
“A Data Warehouse is a subject oriented,
integrated, nonvolatile, and time variant
collection of data in support of management’s
decisions.”

Sean Kelly defines the data warehouse in the
following way.
The data in the data warehouse is: Separate,
Available, Integrated, Time stamped, Subject
oriented, Nonvolatile and Accessible
Subject Oriented Data
• In operational systems, we store data by
individual applications.
• In the data sets for an order processing
application, we keep the data for that particular
application.
• These data sets provide the data for all the
functions for entering orders, checking stock,
verifying customer’s credit, and assigning the
order for shipment.
• Therefore, the data sets for each application
need to be organized around that specific
application.
Subject Oriented Data
• In striking contrast, in the data warehouse, data
is stored by subjects, not by applications.
• If data is stored by business subjects, what are
business subjects?
• Business subjects differ from enterprise to
enterprise. These are the subjects critical for the
enterprise.
• For a manufacturing company, sales, shipments,
and inventory are critical business subjects.
• For a retail store, sales at the check-out counter
is a critical subject.
Figure: How data is stored in operational systems and in the data
warehouse.
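
As a rough illustration of this contrast, the sketch below regroups records held in per-application data sets into per-subject collections. The application names, field names, and values are hypothetical and exist only for this example; a real warehouse would decide subjects through data modeling rather than a tag on each record.

```python
from collections import defaultdict

# Hypothetical operational data sets, each organized around one application.
order_processing = [
    {"subject": "sales", "order_id": 1, "amount": 250.0},
    {"subject": "shipments", "order_id": 1, "carrier": "truck"},
]
billing = [
    {"subject": "sales", "order_id": 2, "amount": 90.0},
]


def by_subject(*application_data_sets):
    """Regroup records from several applications by business subject."""
    subjects = defaultdict(list)
    for data_set in application_data_sets:
        for record in data_set:
            subjects[record["subject"]].append(record)
    return dict(subjects)


warehouse = by_subject(order_processing, billing)
print(sorted(warehouse))        # ['sales', 'shipments']
print(len(warehouse["sales"]))  # 2: sales records from both applications
```

The point of the sketch is only that the sales subject ends up holding records that originated in two different applications.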
Integrated Data
• For proper decision making, you need to pull
together all the relevant data from the various
applications.
• The data in the data warehouse comes from
several operational systems.
• Source data are in different databases, files, and
data segments.
• These are disparate applications, so the
operational platforms and operating systems
could be different.
Integrated Data
• Before the data from various disparate sources
can be usefully stored in a data warehouse, you
have to remove the inconsistencies.
• You have to standardize the various data
elements and make sure of the meanings of data
names in each source application.
• Before moving the data into the data warehouse,
you have to go through a process of
transformation, consolidation, and integration of
the source data.
A simple process of data integration for a banking institution.

The data warehouse is integrated.
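
The standardization step described above can be sketched as follows, assuming two hypothetical source applications at a bank that name and encode the same data elements differently. All field names, codes, and values are invented for illustration.

```python
# Hypothetical records from two source applications (names are assumptions).
savings_records = [{"cust_no": "00123", "acct_type": "S", "balance": "1,200.50"}]
loan_records = [{"customer_id": 123, "type": "loan", "amt": 5000}]

# One agreed-upon set of account type names for the warehouse.
ACCOUNT_TYPE_CODES = {"S": "savings", "L": "loan", "savings": "savings", "loan": "loan"}


def standardize_savings(rec):
    """Map the savings application's names and formats to the warehouse's."""
    return {
        "customer_id": int(rec["cust_no"]),
        "account_type": ACCOUNT_TYPE_CODES[rec["acct_type"]],
        "amount": float(rec["balance"].replace(",", "")),
    }


def standardize_loan(rec):
    """Map the loan application's names and formats to the warehouse's."""
    return {
        "customer_id": int(rec["customer_id"]),
        "account_type": ACCOUNT_TYPE_CODES[rec["type"]],
        "amount": float(rec["amt"]),
    }


# Consolidate both sources into one integrated collection.
integrated = [standardize_savings(r) for r in savings_records]
integrated += [standardize_loan(r) for r in loan_records]

for row in integrated:
    print(row)
```

After the transformation, both rows share the same field names and the same customer identifier format, so they can be analyzed together.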


Time variant Data
• For an operational system, the stored data contains
the current values.
• Of course, we store some past transactions in
operational systems, but, essentially, operational
systems reflect current information because these
systems support day-to-day current operations.
• On the other hand, the data in the data warehouse is
meant for analysis and decision making.
• If a user is looking at the buying pattern of a specific
customer, the user needs data not only about the
current purchase, but about past purchases as well.
Time variant Data
• A data warehouse, because of the very nature of
its purpose, has to contain historical data, not
just current values.
• Data is stored as snapshots over past and current
periods.
• Every data structure in the data warehouse
contains the time element.
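
A minimal sketch of snapshot storage, assuming a hypothetical table of customer balances in which every row carries its own snapshot date. The dates, identifiers, and amounts are illustrative only.

```python
from datetime import date

snapshots = []  # every row includes a time element


def take_snapshot(snapshot_date, balances):
    """Append one row per customer, stamped with the snapshot date."""
    for customer_id, balance in balances.items():
        snapshots.append(
            {"snapshot_date": snapshot_date, "customer_id": customer_id, "balance": balance}
        )


def history(customer_id):
    """Return all snapshots for one customer, oldest first."""
    rows = [r for r in snapshots if r["customer_id"] == customer_id]
    return sorted(rows, key=lambda r: r["snapshot_date"])


take_snapshot(date(2024, 1, 31), {123: 1000.0})
take_snapshot(date(2024, 2, 29), {123: 1250.0})

for row in history(123):
    print(row["snapshot_date"], row["balance"])
```

Because earlier snapshots are kept alongside the current one, a query can compare past and present values for the same customer.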
Time variant Data
• The time-variant nature of the data in a data
warehouse
▫ Allows for analysis of the past
▫ Relates information to the present
▫ Enables forecasts for the future
Nonvolatile Data
• Data from the operational systems are moved
into the data warehouse at specific intervals,
depending on the requirements of the business.
• You do not usually update the data in the data
warehouse.
• The data in a data warehouse is not as volatile as
the data in an operational database is.
• The data in a data warehouse is primarily for
query and analysis.
Nonvolatile Data

The data warehouse is nonvolatile.
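
The difference between an operational record that is updated in place and a warehouse that only receives periodic loads can be sketched as below. The `Warehouse` class, its method names, and the batch labels are hypothetical and are not taken from any particular product.

```python
class Warehouse:
    """Append-only store: rows are loaded in batches and then only read."""

    def __init__(self):
        self._rows = []

    def load_batch(self, batch_label, rows):
        """Add a batch of rows; existing rows are never modified."""
        for row in rows:
            self._rows.append({**row, "batch": batch_label})

    def query(self, predicate=lambda row: True):
        """Return copies of matching rows so callers cannot alter stored data."""
        return [dict(row) for row in self._rows if predicate(row)]


# The operational record is overwritten as the order progresses.
operational = {"order_id": 7, "status": "entered"}

wh = Warehouse()
wh.load_batch("day-1", [dict(operational)])

operational["status"] = "shipped"  # in-place update in the operational system
wh.load_batch("day-2", [dict(operational)])

statuses = [row["status"] for row in wh.query(lambda r: r["order_id"] == 7)]
print(statuses)  # ['entered', 'shipped']
```

The operational system retains only the latest status, while the warehouse keeps both loaded versions for query and analysis.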


Data Granularity
• In an operational system, data is usually kept at the
lowest level of detail.
• In a point-of-sale system for a grocery store, the
units of sale are captured and stored at the level of
units of a product per transaction at the check-out
counter.
• In an order entry system, the quantity ordered is
captured and stored at the level of units of a product
per order received from the customer.
• Whenever you need summary data, you add up the
individual transactions.
• You do not usually keep summary data in an
operational system.
Data Granularity
• When a user queries the data warehouse for
analysis, he or she usually starts by looking at
summary data.
• In a data warehouse, therefore, you find it
efficient to keep data summarized at different
levels.
• Depending on the query, you can then go to the
particular level of detail and satisfy the query.
• Data granularity in a data warehouse refers to
the level of detail.
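
One way to picture granularity is to roll the same detailed transactions up to coarser levels. The sketch below assumes hypothetical point-of-sale rows and summarizes units sold per product by day and by month; the data and names are made up for illustration.

```python
from collections import defaultdict

# Lowest level of detail: units of a product per transaction.
transactions = [
    {"date": "2024-03-01", "product": "milk", "units": 2},
    {"date": "2024-03-01", "product": "milk", "units": 1},
    {"date": "2024-03-02", "product": "milk", "units": 4},
    {"date": "2024-03-02", "product": "bread", "units": 3},
]


def summarize(rows, key_fn):
    """Add up units for each group produced by key_fn."""
    totals = defaultdict(int)
    for row in rows:
        totals[key_fn(row)] += row["units"]
    return dict(totals)


# Two summary levels derived from the same detail rows.
daily = summarize(transactions, lambda r: (r["product"], r["date"]))
monthly = summarize(transactions, lambda r: (r["product"], r["date"][:7]))

print(daily[("milk", "2024-03-01")])  # 3
print(monthly[("milk", "2024-03")])   # 7
```

A query that only needs monthly totals can read the monthly summary, and then go down to the daily or transaction level when more detail is required.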
DATA WAREHOUSES AND DATA MARTS
• Before deciding to build a data warehouse for
your organization, you need to ask the following
basic and fundamental questions and address
the relevant issues:
▫ Top-down or bottom-up approach?
▫ Enterprise-wide or departmental?
▫ Which first—data warehouse or data mart?
▫ Build pilot or go with a full-fledged
implementation?
▫ Dependent or independent data marts?
DATA WAREHOUSES AND DATA MARTS
• Here are the two different basic approaches:
• (1) Top-down: an overall data warehouse feeding dependent
data marts
• (2) Bottom-up: several departmental or local data marts
combining into a data warehouse.
• In the first approach, you extract data from the
operational systems; you then transform, clean,
integrate, and keep the data in the data
warehouse.
• So, which approach is best in your case, the top-
down or the bottom-up approach?
• Let us examine these two approaches carefully.
Top-Down Approach

• This is the big-picture approach in which you
build the overall, big, enterprise-wide data
warehouse.
• The data warehouse is large and integrated.
• This approach, however, would take longer to
build and has a high risk of failure.
• If you do not have experienced professionals on
your team, this approach could be dangerous.
• Also, it will be difficult to sell this approach to
senior management and sponsors. They are not
likely to see results soon enough.
Bottom-up Approach

• In this bottom-up approach, you build your
departmental data marts one by one.
• You would set a priority scheme to determine
which data marts you must build first.
• The most severe drawback of this approach is
data fragmentation.
• Each independent data mart will be blind to the
overall requirements of the entire organization.