Module 3 - Data Warehousing
Data Warehousing
Contents
Processes
What Is a Data Warehouse?
1. Subject oriented
2. Integrated
3. Time-variant (time series)
4. Nonvolatile
5. Web based
6. Relational/multidimensional
7. Client/server
8. Real time
9. Include metadata
Characteristics of Data Warehousing
1. Subject oriented
Data are organized by detailed subject, such as sales, products, or
customers, containing only information relevant for decision support.
Subject orientation enables users to determine not only how their
business is performing, but why.
Subject orientation provides a more comprehensive view of the
organization.
2. Integrated
A data warehouse is developed by integrating data from varied sources
into a consistent format.
The data must be stored in the warehouse in a consistent and universally
acceptable manner in terms of naming, format, and coding.
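A minimal sketch of what integration into a consistent format can look like. The two source systems ("crm" and "billing"), their field names, code tables, and date formats are all hypothetical and chosen only for illustration.

```python
from datetime import datetime

# Hypothetical code tables: each source system uses its own gender coding.
GENDER_CODES = {
    "crm": {"M": "male", "F": "female"},
    "billing": {"1": "male", "2": "female"},
}

# Hypothetical date formats used by each source system.
DATE_FORMATS = {"crm": "%d/%m/%Y", "billing": "%Y%m%d"}


def integrate(record, source):
    """Map one source record onto the warehouse's naming, format, and coding."""
    parsed = datetime.strptime(record["date"], DATE_FORMATS[source])
    return {
        "customer_id": int(record["id"]),
        "gender": GENDER_CODES[source][record["sex"]],
        "signup_date": parsed.date().isoformat(),
    }


crm_row = {"id": "17", "sex": "F", "date": "03/02/2021"}
billing_row = {"id": "42", "sex": "1", "date": "20210203"}

unified = [integrate(crm_row, "crm"), integrate(billing_row, "billing")]
```

After integration both rows share the same column names, the same gender coding, and ISO-formatted dates, regardless of how the source system stored them.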
3. Time variant
A warehouse maintains historical data. The data do not necessarily
provide current status (except in real-time systems).
They detect trends, deviations, and long-term relationships for
forecasting and comparisons, leading to decision making.
The data stored in a data warehouse is documented with an element of
time, either explicitly or implicitly.
Data for analysis from multiple sources contains multiple time points
(e.g., daily, weekly, monthly views).
4. Nonvolatile
Data once entered into a data warehouse must remain unchanged.
All data is read-only. Previous data is not erased when current data is
entered.
This helps you to analyze what has happened and when.
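The time-variant and nonvolatile properties can be sketched together as an append-only table in which every row carries a date. This is a toy illustration under assumed names (AppendOnlyFactTable, load, price_on), not how any particular warehouse product stores data.

```python
from datetime import date


class AppendOnlyFactTable:
    """Toy warehouse table: rows are only ever added, never updated or deleted."""

    def __init__(self):
        self._rows = []

    def load(self, product, price, as_of):
        # Each row carries an explicit time element (time variant).
        self._rows.append({"product": product, "price": price, "as_of": as_of})

    def history(self, product):
        # Return copies so callers cannot modify stored rows (read-only access).
        rows = [dict(r) for r in self._rows if r["product"] == product]
        return sorted(rows, key=lambda r: r["as_of"])

    def price_on(self, product, day):
        # Latest price known on or before the given day, or None.
        known = [r for r in self.history(product) if r["as_of"] <= day]
        return known[-1]["price"] if known else None


table = AppendOnlyFactTable()
table.load("widget", 9.99, date(2023, 1, 1))
table.load("widget", 11.49, date(2023, 6, 1))  # new price is added; old row stays
```

Because the earlier row is never overwritten, a question such as "what was the price in March?" can still be answered after the June price has been loaded.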
5. Web based
Data warehouses are typically designed to provide an efficient
computing environment for Web-based applications.
6. Relational/multidimensional
A data warehouse uses either a relational structure or a
multidimensional structure.
Relational models are flat, i.e., tables are two-dimensional;
multidimensional models can have more than two dimensions.
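The difference between the two structures can be sketched as follows. The sales figures, the three dimensions (quarter, region, product), and the roll_up helper are illustrative assumptions only.

```python
from collections import defaultdict

# Relational (flat) form: each row is one record in a two-dimensional table.
sales_rows = [
    ("2023-Q1", "East", "laptop", 120),
    ("2023-Q1", "West", "laptop", 80),
    ("2023-Q2", "East", "laptop", 150),
    ("2023-Q2", "East", "phone", 200),
]

# Multidimensional form: one cell per (quarter, region, product) coordinate.
cube = defaultdict(int)
for quarter, region, product, units in sales_rows:
    cube[(quarter, region, product)] += units


def roll_up(cube, keep):
    """Aggregate the cube, keeping only the dimensions at the given positions."""
    result = defaultdict(int)
    for coords, units in cube.items():
        result[tuple(coords[i] for i in keep)] += units
    return dict(result)


by_region = roll_up(cube, keep=[1])              # dimension 1 = region
by_quarter_product = roll_up(cube, keep=[0, 2])  # quarter and product
```

The same facts are held in both forms; the multidimensional form simply makes it direct to summarize along any chosen combination of dimensions.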
7. Client/server
A data warehouse uses the client/server architecture to provide easy
access for end users.
8. Real time
Newer data warehouses provide real-time, or active, data-access and
analysis capabilities.
9. Include metadata
A data warehouse contains metadata (data about data) about how
the data are organized and how to effectively use them.
Data Marts
Unlike the static contents of a data warehouse, the contents of an ODS are updated
than for the medium- and long-term decisions associated with an EDW.
An ODS is similar to short-term memory in that it stores only very recent
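[Figure: data warehouse framework. Data sources (POS, OLTP/Web, external data, other)
feed an ETL process (extract, transform, integrate, load) into the enterprise data
warehouse, which is replicated to data marts (Engineering, Finance, ...) and accessed
through API/middleware by applications such as mining, OLAP, dashboards, Web, and
custom-built applications.]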
1. Data sources
- Data are sourced from multiple independent operational "legacy" systems and
3. Data loading
- Data are loaded into a staging area, where they are transformed and cleansed.
- The data are then ready to load into the data warehouse and/or data marts.
4. Comprehensive database
- EDW to support all decision analysis by providing relevant summarized
5. Metadata
- Metadata are maintained so that they can be assessed by IT personnel
and users.
- Metadata include software programs about data and rules for organizing
data summaries that are easy to index and search, especially with Web
tools.
6. Middleware tools
- Middleware tools enable access to the data warehouse.
- Power users such as analysts may write their own SQL queries
- Others may employ a managed query environment, such as Business
Objects, to access data.
- There are many front-end applications that business users can use to
interact with data stored in the data repositories, including data mining,
OLAP, reporting tools, and data visualization tools.
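The flow described in the steps above (staging, cleansing, loading, and then querying with SQL) can be sketched end to end. Everything here is a simplified assumption: the raw rows, the cleanse rules, and the sales table are hypothetical, a Python list stands in for the staging area, and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3

# Hypothetical raw extract with inconsistent casing, whitespace, and a bad row.
raw_extract = [
    {"sku": " a-100 ", "qty": "3", "store": "north"},
    {"sku": "B-200", "qty": "", "store": "North"},  # missing quantity
    {"sku": "a-100", "qty": "2", "store": "SOUTH"},
]


def cleanse(row):
    """Transform one staged row; return None if it cannot be repaired."""
    if not row["qty"].strip().isdigit():
        return None
    return (row["sku"].strip().upper(), int(row["qty"]), row["store"].strip().title())


# Staging area: transform and cleanse before anything reaches the warehouse.
staged = [cleanse(r) for r in raw_extract]
clean_rows = [r for r in staged if r is not None]
rejected = len(staged) - len(clean_rows)

# Load: write the cleansed rows into the (toy) warehouse table.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE sales (sku TEXT, qty INTEGER, store TEXT)")
warehouse.executemany("INSERT INTO sales VALUES (?, ?, ?)", clean_rows)
warehouse.commit()

# Access: the kind of SQL query a power user might write directly.
totals = warehouse.execute(
    "SELECT sku, SUM(qty) FROM sales GROUP BY sku ORDER BY sku"
).fetchall()
```

Rows that cannot be repaired are counted and kept out of the warehouse, so the query at the end only sees consistent data.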
DATA WAREHOUSING ARCHITECTURES
Client/server or n-tier architectures
• two-tier architectures
• three-tier architectures
Multi-tiered architectures are known to be capable of serving the needs of large-scale,
performance-demanding information systems such as data warehouses.
• Three parts:
1. The data warehouse itself, which contains the data and associated software
2. Data acquisition (back-end) software, which extracts data from legacy systems and
external sources, consolidates and summarizes them, and loads them into the
data warehouse
3. Client (front-end) software, which allows users to access and analyze data from
the warehouse
3-tier architecture
Data Integration
Data integration comprises three major processes that, when correctly implemented,
permit data from many sources to be accessed and made accessible to an array of ETL and
analysis tools and the data warehousing environment:
- data access (i.e., the ability to access and extract data from any data
source)
- data federation (i.e., the integration of business views across multiple
data stores)
- change capture (based on the identification, capture, and delivery of
the changes made to enterprise data sources)
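Change capture can be illustrated by comparing two snapshots of a source table. This is only one simple approach, assumed here for illustration; real tools may instead read database logs or use triggers. The function name capture_changes and the sample data are hypothetical.

```python
def capture_changes(previous, current):
    """Compare two snapshots keyed by primary key and list the changes."""
    changes = []
    for key, row in current.items():
        if key not in previous:
            changes.append(("insert", key, row))
        elif previous[key] != row:
            changes.append(("update", key, row))
    for key in previous:
        if key not in current:
            changes.append(("delete", key, previous[key]))
    return changes


yesterday = {1: {"name": "Ann", "city": "Oslo"}, 2: {"name": "Bo", "city": "Rome"}}
today = {1: {"name": "Ann", "city": "Bergen"}, 3: {"name": "Cy", "city": "Lima"}}

delta = capture_changes(yesterday, today)
```

Only the identified changes (one update, one insert, one delete) need to be delivered downstream, rather than the whole table.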
Some vendors, such as SAS Institute, Inc., have developed strong data
integration tools.
The SAS enterprise data integration server includes customer data
integration tools that improve data quality in the integration process.
The Oracle Business Intelligence Suite assists in integrating data as well.
A major purpose of a data warehouse is to integrate data from multiple
systems.
Various integration technologies enable data and metadata integration:
• Enterprise application integration (EAI)
• Service-oriented architecture (SOA)
• Enterprise information integration (EII)
• Extraction, transformation, and load (ETL)
Enterprise application integration (EAI)
EAI provides a vehicle for pushing data from source systems into the data
warehouse.
It involves integrating application functionality and is focused on sharing
functionality (rather than data) across systems, thereby enabling flexibility and
reuse.
Traditionally, EAI solutions have focused on enabling application reuse at the
application programming interface (API) level.
More recently, EAI has been accomplished by using SOA coarse-grained services (a collection
of business processes or functions) that are well defined and documented.
• Using Web services is a specialized way of implementing an SOA.
• EAI can be used to facilitate data acquisition directly into a near-real-time data
warehouse or to deliver decisions to the OLTP systems.
• There are many different approaches to and tools for EAI implementation.
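The "push" character of EAI can be sketched with a source application that notifies subscribers whenever something happens. The OrderSystem class, its methods, and the list standing in for a near-real-time warehouse table are hypothetical; actual EAI tools typically use message brokers or services rather than in-process callbacks.

```python
class OrderSystem:
    """Hypothetical source application that pushes events to subscribers."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def place_order(self, order_id, amount):
        event = {"order_id": order_id, "amount": amount}
        for notify in self._subscribers:
            notify(event)  # push: the source initiates the transfer
        return event


warehouse_feed = []  # stands in for a near-real-time warehouse table

orders = OrderSystem()
orders.subscribe(warehouse_feed.append)
orders.place_order(1, 99.5)
orders.place_order(2, 10.0)
```

The warehouse side never asks for data here; each order arrives as soon as the source system records it.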
Enterprise information integration (EII)
EII is an evolving tool space that promises real-time data integration from a variety of
databases.
It is a mechanism for pulling data from source systems to satisfy a request for
information. EII tools use predefined metadata to populate views that make
mechanism for creating an integrated view with data warehouses and data marts.
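The "pull" idea behind EII can be sketched as a view that is assembled from the source systems only when someone asks for it. The two sources (CRM, BILLING), the VIEW_METADATA mapping, and the customer_view function are hypothetical stand-ins, with dictionaries in place of real databases.

```python
# Two hypothetical source systems, represented here as in-memory lookups.
CRM = {101: {"full_name": "Ann Lee"}}
BILLING = {101: {"balance_due": 250.0}}

# Predefined metadata: which source and field supply each column of the view.
VIEW_METADATA = {
    "name": (CRM, "full_name"),
    "balance": (BILLING, "balance_due"),
}


def customer_view(customer_id):
    """Pull fields from the sources at request time; nothing is stored centrally."""
    view = {"customer_id": customer_id}
    for column, (source, field) in VIEW_METADATA.items():
        view[column] = source.get(customer_id, {}).get(field)
    return view


snapshot = customer_view(101)
BILLING[101]["balance_due"] = 0.0  # the source changes...
refreshed = customer_view(101)     # ...and the next request sees it
```

Because the view is built on demand, a change in a source system is visible in the very next request, without any load step in between.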