Chapter 2 and 3

This document discusses key concepts related to data warehousing and online analytical processing (OLAP). It begins with an introduction to data warehouses and their purpose in supporting business decision making. The document then covers architectural components of data warehouses including data sources, the data warehouse itself, and business intelligence tools. It also discusses dimensional modeling, extract-transform-load processes, and techniques for efficiently querying and indexing multidimensional data.

Uploaded by

Harshad Nawghare


Data Warehousing & Mining

B. Tech and MBA. Tech. (Information Technology)

Semester-V

Chapter 2 Architecture and Infrastructure & Data Representation
Chapter 3 Information Access and Delivery

By. Prof. Bhushan Inje

Outline
• Architectural components
• Infrastructure and metadata
• Principles of dimensional modeling
• Dimensional modeling: advanced topics
• Data Extraction, Transformation and Loading
• Data quality

Introduction
• A data warehouse is a collection of corporate information,
derived directly from operational systems and some external data sources.
• Its specific purpose is to support business decisions, not business
operations.
• This is what a data warehouse is all about, helping your business ask
“What if?” questions.
• The answers to these questions will ensure your business is proactive,
instead of reactive, a necessity in today’s information age.
What is a Data Warehouse?
• A data warehouse provides architectures and tools for business executives to
systematically organise, understand, and use their data to make strategic
decisions.
• In simple terms, a data warehouse refers to a database that is maintained
separately from an organization’s operational databases.
• According to W. H. Inmon, a leading architect in the construction of data
warehouse systems,
“a data warehouse is a subject-oriented, integrated, time-variant, and
nonvolatile collection of data in support of management’s decision making
process.”
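• The four key words in this definition mean the following:
– Subject-oriented: organized around major subjects such as customer, product,
and sales, rather than around day-to-day operations.
– Integrated: built by integrating multiple heterogeneous sources, with
consistent naming conventions, encodings, and attribute measures.
– Time-variant: data are stored to provide a historical perspective, so the key
structures contain an element of time.
– Non-volatile: a physically separate store that is loaded and read, but not
updated by operational transactions.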
Use of Data Warehouses in organizations
• Many organizations are creating data warehouses to support
business decision-making activities for the following reasons:
– To increase customer focus.
– To reposition products and manage product portfolios.
– To analyze operations and look for sources of profit.
– To manage customer relationships.
– Data warehousing is also very useful for integrating
heterogeneous databases.
Differences between operational Database systems
and Data Warehouses
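• On-Line Transaction Processing (OLTP): operational database systems that
handle day-to-day transactions and queries.
• On-Line Analytical Processing (OLAP): data warehouse systems that serve
data analysis and decision making.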
Characteristics of Data Warehouse
• Subject oriented
• Integrated
• Time variant
• Non volatile
Data Warehouse components
• Data sources
• Data Warehouse
• Reporting
– Business intelligence tools
– Executive information systems (more widely known as business dashboards)
– OLAP Tools
– Data Mining

• Metadata
• Operations
Data Warehouse components Cont..

• Optional components
– Dependent Data Marts
– Logical Data Marts
– Operational Data Store
Designing the Data Warehouse
Data Warehouse Architecture
• Why do Business analysts need Data Warehouse?
Process of Data Warehouse Design
A data warehouse can be built using three approaches:
1. A top-down approach
2. A bottom-up approach
3. A combination of both approaches
• In general, the warehouse design process consists of the
following steps:
A three-tier Data Warehouse architecture
OLAP server architectures
There are three different possible designs:
1. Relational OLAP (ROLAP)
2. Multidimensional OLAP (MOLAP)
3. Hybrid OLAP (HOLAP)
Getting Multidimensional Data out of the Warehouse
A Multidimensional Data Model
• From Tables and Spreadsheets to Data Cubes
• What is a data cube?
– A data cube allows data to be modeled and viewed in
multiple dimensions. It is defined by dimensions and
facts
– Dimensions are the perspectives or entities with respect
to which an organization wants to keep records.
Schemas for Multidimensional Databases

• Stars, Snowflakes, and Fact Constellations:

• Star schema:
– (1) a large central table (fact table) containing the bulk of the data,
with no redundancy, and
– (2) a set of smaller attendant tables (dimension tables), one for each
dimension.
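As a rough illustration of the star schema described above, the sketch below builds one fact table and two dimension tables in an in-memory SQLite database, then answers a query by joining the fact table to a dimension. All table names, column names, and sample rows are assumptions made up for this example, not part of the original slides.

```python
import sqlite3


def build_star_schema(conn):
    """Create a central fact table and two dimension tables (illustrative names)."""
    conn.executescript(
        """
        CREATE TABLE dim_time (time_key INTEGER PRIMARY KEY, quarter TEXT, year INTEGER);
        CREATE TABLE dim_item (item_key INTEGER PRIMARY KEY, item_name TEXT, brand TEXT);
        CREATE TABLE fact_sales (
            time_key INTEGER REFERENCES dim_time(time_key),
            item_key INTEGER REFERENCES dim_item(item_key),
            units_sold INTEGER,
            dollars_sold REAL
        );
        """
    )
    # Hypothetical sample data: the dimension tables are small, the fact table holds the bulk.
    conn.executemany("INSERT INTO dim_time VALUES (?, ?, ?)",
                     [(1, "Q1", 2024), (2, "Q2", 2024)])
    conn.executemany("INSERT INTO dim_item VALUES (?, ?, ?)",
                     [(10, "computer", "BrandA"), (11, "phone", "BrandB")])
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
                     [(1, 10, 3, 3000.0), (1, 11, 5, 2500.0), (2, 10, 2, 2000.0)])


def sales_by_quarter(conn):
    """Join the fact table to the time dimension and aggregate the measure."""
    return conn.execute(
        """
        SELECT t.quarter, SUM(f.dollars_sold)
        FROM fact_sales AS f
        JOIN dim_time AS t ON f.time_key = t.time_key
        GROUP BY t.quarter
        ORDER BY t.quarter
        """
    ).fetchall()


conn = sqlite3.connect(":memory:")
build_star_schema(conn)
result = sales_by_quarter(conn)
print(result)
```

The fact table stores only keys and measures, while descriptive attributes such as `quarter` and `brand` live in the dimension tables.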
Examples for Defining Star, Snowflake,
and Fact Constellation Schemas
• Data warehouses and data marts can be defined using two language primitives,
one for cube definition and one for dimension definition. The cube definition
statement has the following syntax:
Measures: Their Categorization and Computation

• Note that a multidimensional point in the data cube space can be defined
by a set of dimension-value pairs, for example, (time = “Q1”, location =
“Vancouver”, item = “computer”).
• A data cube measure is a numerical function that can be evaluated
at each point in the data cube space.
• Measures can be organized into three categories (i.e., distributive,
algebraic, holistic), based on the kind of aggregate functions used.
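• Distributive: An aggregate function is distributive if it can be
computed in a distributed manner: the data are partitioned, the function is
applied to each partition, and the partial results can be combined into the
exact overall result (for example, sum(), count(), min(), max()).
• Algebraic: An aggregate function is algebraic if it can be
computed by an algebraic function with M arguments (where M is a bounded
positive integer), each of which is obtained by applying a distributive
aggregate function (for example, avg() = sum() / count()).
• Holistic: An aggregate function is holistic if there is no constant
bound on the storage size needed to describe a sub-aggregate (for example,
median() and mode()).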
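The three categories can be made concrete with a small sketch. It assumes the data are split into two hypothetical partitions and shows that a sum can be combined from partial sums (distributive), an average can be derived from a fixed number of distributive values (algebraic), and a median generally cannot be recovered from per-partition medians (holistic). The partition values are invented for illustration.

```python
from statistics import median

# Hypothetical data split across two partitions.
partitions = [[4, 8], [1, 3, 9]]
all_values = [x for part in partitions for x in part]

# Distributive: summing the partial sums gives the sum over all the data.
partial_sums = [sum(part) for part in partitions]
total = sum(partial_sums)

# Algebraic: avg needs only two distributive values per partition (sum and count).
partial_counts = [len(part) for part in partitions]
average = sum(partial_sums) / sum(partial_counts)

# Holistic: the true median needs all the values; combining the
# per-partition medians gives a different number here.
true_median = median(all_values)
median_of_medians = median(median(part) for part in partitions)

print(total, average, true_median, median_of_medians)
```

This is why warehouses often precompute distributive and algebraic measures per cuboid, while holistic measures are typically computed from detail data or approximated.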
Concept Hierarchies
• A concept hierarchy defines a sequence of mappings from a set of low-
level concepts to higher-level, more general concepts.
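A concept hierarchy can be sketched as a chain of lookups. The example below assumes a hypothetical location hierarchy (city < province_or_state < country), where each mapping takes a value one level up. The function name and level names are illustrative.

```python
# Each dictionary maps a lower-level concept to the next, more general level.
CITY_TO_PROVINCE = {
    "Vancouver": "British Columbia",
    "Toronto": "Ontario",
    "Chicago": "Illinois",
}
PROVINCE_TO_COUNTRY = {
    "British Columbia": "Canada",
    "Ontario": "Canada",
    "Illinois": "USA",
}


def generalize(city, level):
    """Map a city to the requested level of the location hierarchy."""
    if level == "city":
        return city
    province = CITY_TO_PROVINCE[city]
    if level == "province_or_state":
        return province
    if level == "country":
        return PROVINCE_TO_COUNTRY[province]
    raise ValueError(f"unknown level: {level}")


print(generalize("Vancouver", "country"))
```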
OLAP Operations in the Multidimensional Data Model
• “How are concept hierarchies useful in OLAP?”
OLAP Operations
• Roll-up: The roll-up operation (also called the drill-up operation by some vendors)
performs aggregation on a data cube, either by climbing up a concept hierarchy for
a dimension or by dimension reduction.
• Drill-down: Drill-down is the reverse of roll-up. It navigates from less detailed data
to more detailed data. Drill-down can be realized by either stepping down a concept
hierarchy for a dimension or introducing additional dimensions.
• Slice and dice: The slice operation performs a selection on one dimension of the
given cube, resulting in a sub-cube. The dice operation defines a sub-cube by
performing a selection on two or more dimensions.
• Pivot (rotate): Pivot (also called rotate) is a visualization operation that rotates the
data axes in view in order to provide an alternative presentation of the data.
• Other OLAP operations
– Drill-across
– Drill-through
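To make the operations listed above concrete, here is a minimal sketch that represents a tiny cube as a dictionary keyed by (time, location, item) and implements roll-up by dimension reduction, slice (selection on one dimension), and dice (selection on two or more dimensions). The cube contents and function names are assumptions for illustration; a real OLAP server would not store a cube this way.

```python
from collections import defaultdict

DIMS = ("time", "location", "item")

# Hypothetical sales cube: one cell per (time, location, item) combination.
cube = {
    ("Q1", "Vancouver", "computer"): 100,
    ("Q1", "Toronto", "computer"): 80,
    ("Q2", "Vancouver", "computer"): 120,
    ("Q2", "Vancouver", "phone"): 60,
}


def roll_up(cells, drop):
    """Roll-up by dimension reduction: aggregate away one dimension."""
    idx = DIMS.index(drop)
    out = defaultdict(int)
    for key, value in cells.items():
        out[key[:idx] + key[idx + 1:]] += value
    return dict(out)


def slice_cube(cells, dim, value):
    """Slice: select a single value on one dimension."""
    idx = DIMS.index(dim)
    return {k: v for k, v in cells.items() if k[idx] == value}


def dice(cells, criteria):
    """Dice: select allowed values on two or more dimensions."""
    return {
        k: v
        for k, v in cells.items()
        if all(k[DIMS.index(d)] in allowed for d, allowed in criteria.items())
    }


by_time_item = roll_up(cube, "location")
q1_only = slice_cube(cube, "time", "Q1")
q2_computers = dice(cube, {"time": {"Q2"}, "item": {"computer"}})
print(by_time_item, q1_only, q2_computers)
```

Drill-down is the inverse of `roll_up` and needs the more detailed cells to be available, which is why it cannot be computed from the rolled-up result alone.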
OLAP Systems versus Statistical Databases

• Many of the characteristics of OLAP systems, such as the use
of a multidimensional data model and concept hierarchies, the
association of measures with dimensions, and the notions of
roll-up and drill-down, also exist in earlier work on statistical
databases (SDBs).
• A statistical database is a database system that is designed to
support statistical applications. Similarities between the two
types of systems are rarely discussed, mainly due to
differences in terminology and application domains.
A Starnet Query Model for Querying Multidimensional
Databases
Data Warehouse Back-End Tools and Utilities
• Data extraction, which typically gathers data from multiple, heterogeneous, and
external sources
• Data cleaning, which detects errors in the data and rectifies them when possible
• Data transformation, which converts data from legacy or host format to
warehouse format
• Load, which sorts, summarizes, consolidates, computes views, checks integrity,
and builds indices and partitions
• Refresh, which propagates the updates from the data sources to the warehouse
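The back-end functions listed above can be sketched as a small pipeline. This is a minimal illustration under stated assumptions: the source records, the field names (`city`, `amount`), the cleaning rule (drop records whose amount is not numeric), and the dictionary used as the "warehouse" are all hypothetical.

```python
def extract(sources):
    """Extraction: gather raw records from several sources into one list."""
    return [record for source in sources for record in source]


def clean(records):
    """Cleaning: drop records whose amount is missing or not numeric."""
    cleaned = []
    for record in records:
        try:
            amount = float(record["amount"])
        except (KeyError, TypeError, ValueError):
            continue
        cleaned.append({**record, "amount": amount})
    return cleaned


def transform(records):
    """Transformation: convert source formats to one warehouse format."""
    return [{"city": r["city"].strip().title(), "amount": r["amount"]} for r in records]


def load(records, warehouse):
    """Load: summarize amounts per city into the warehouse store."""
    for r in records:
        warehouse[r["city"]] = warehouse.get(r["city"], 0.0) + r["amount"]
    return warehouse


def run_pipeline(sources, warehouse):
    return load(transform(clean(extract(sources))), warehouse)


# Hypothetical heterogeneous sources with inconsistent formatting.
crm = [{"city": " vancouver", "amount": "100.5"}, {"city": "Toronto", "amount": None}]
erp = [{"city": "VANCOUVER ", "amount": 50}, {"city": "toronto", "amount": "abc"}]

warehouse = run_pipeline([crm, erp], {})
after_initial_load = dict(warehouse)

# Refresh: propagate a later batch of source updates into the same warehouse.
warehouse = run_pipeline([[{"city": "Toronto", "amount": "20"}]], warehouse)
print(after_initial_load, warehouse)
```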
Types of OLAP Servers: ROLAP versus MOLAP
versus HOLAP
• Relational OLAP (ROLAP) servers:
• Multidimensional OLAP (MOLAP) servers:
• Hybrid OLAP (HOLAP) servers:
• Specialized SQL servers:
“How are data actually stored in ROLAP and MOLAP
architectures?”
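One simplified way to picture the difference: a ROLAP server keeps cells as rows in relational tables, while a MOLAP server keeps them in a multidimensional array addressed by position. The sketch below is only an illustration under that assumption. The data are made up, and real servers add indexing, sparse-array compression, and a way to tell empty cells from zero values.

```python
# ROLAP-style storage: one relational row per non-empty cell.
rolap_rows = [
    ("Q1", "computer", 100),
    ("Q2", "computer", 120),
    ("Q2", "phone", 60),
]

# MOLAP-style storage: a dense array indexed by dimension positions.
quarters = ["Q1", "Q2"]
items = ["computer", "phone"]
molap_array = [[0] * len(items) for _ in quarters]
for quarter, item, value in rolap_rows:
    molap_array[quarters.index(quarter)][items.index(item)] = value


def rolap_lookup(quarter, item):
    """Find a cell by scanning rows (a real server would use indexes)."""
    for q, i, value in rolap_rows:
        if q == quarter and i == item:
            return value
    return None


def molap_lookup(quarter, item):
    """Find a cell by direct array addressing."""
    return molap_array[quarters.index(quarter)][items.index(item)]


print(rolap_lookup("Q2", "phone"), molap_lookup("Q2", "phone"))
```

Note that the dense array reserves a slot for the empty (Q1, phone) cell, which hints at why sparse cubes are costly to store in pure array form.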
Data Warehouse Implementation
• Efficient Computation of Data Cubes
– The compute cube Operator and the Curse of Dimensionality
“How many cuboids are there in an n-dimensional data cube?”

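• If there were no hierarchies associated with each dimension, then the total
number of cuboids for an n-dimensional data cube, as we have seen above,
is $2^n$.
• In practice, many dimensions do have hierarchies. Time, for example, is usually
explored at multiple conceptual levels, such as in the hierarchy
“day < month < quarter < year”.
• For an n-dimensional data cube, the total number of cuboids that can be
generated is then

Total number of cuboids = $\prod_{i=1}^{n} (L_i + 1)$

where $L_i$ is the number of levels associated with dimension i. One is added to
$L_i$ to include the virtual top level, all.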
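A quick calculation shows how fast the cuboid count grows. The sketch below computes 2^n for the case without hierarchies and the product of (L_i + 1) over all dimensions for the case with hierarchies, where the extra 1 accounts for the top level "all". The function names and the three-dimension example are assumptions for illustration.

```python
from math import prod


def cuboids_without_hierarchies(n):
    """Each of the n dimensions is either present or aggregated away: 2^n cuboids."""
    return 2 ** n


def cuboids_with_hierarchies(levels):
    """Product of (L_i + 1), where levels[i] is the number of levels of dimension i."""
    return prod(level_count + 1 for level_count in levels)


# Hypothetical cube: time has 4 levels (day < month < quarter < year),
# item and location have 1 level each.
flat_count = cuboids_without_hierarchies(3)
hierarchy_count = cuboids_with_hierarchies([4, 1, 1])
print(flat_count, hierarchy_count)
```

When every dimension has exactly one level, the product reduces to 2^n, so the two formulas agree.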

Partial Materialization: Selected Computation of Cuboids

• There are three choices for data cube materialization given a base cuboid:
1. No materialization:
2. Full materialization:
3. Partial materialization:

• The partial materialization of cuboids or subcubes should consider three factors:

(1) identify the subset of cuboids or subcubes to materialize;
(2) exploit the materialized cuboids or subcubes during query processing; and
(3) efficiently update the materialized cuboids or subcubes during load and refresh.
Indexing OLAP Data
• How to index OLAP data by bitmap indexing and join indexing.
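Bitmap indexing can be sketched with Python integers used as bit vectors: for each distinct value of an attribute, bit i is set when row i holds that value, so multi-attribute selections become bitwise AND operations. The rows and helper names below are hypothetical, and the sketch covers only bitmap indexing, not join indexing.

```python
def build_bitmap_index(rows, attribute):
    """Map each distinct value to an int whose bit i is 1 when row i has that value."""
    index = {}
    for position, row in enumerate(rows):
        value = row[attribute]
        index[value] = index.get(value, 0) | (1 << position)
    return index


def matching_rows(bitmap):
    """Return the row positions whose bit is set in the bitmap."""
    positions = []
    position = 0
    while bitmap:
        if bitmap & 1:
            positions.append(position)
        bitmap >>= 1
        position += 1
    return positions


# Hypothetical base table.
rows = [
    {"item": "computer", "city": "Vancouver"},
    {"item": "phone", "city": "Toronto"},
    {"item": "computer", "city": "Toronto"},
    {"item": "computer", "city": "Vancouver"},
]

item_index = build_bitmap_index(rows, "item")
city_index = build_bitmap_index(rows, "city")

# item = "computer" AND city = "Vancouver" is a single bitwise AND.
hits = matching_rows(item_index["computer"] & city_index["Vancouver"])
print(hits)
```

Bitmap indexes work best on low-cardinality attributes, because one bit vector is kept per distinct value.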
Efficient Processing of OLAP Queries

1. Determine which operations should be performed on the
available cuboids:
2. Determine to which materialized cuboid(s) the relevant
operations should be applied:
• “Which of the above four cuboids should be
selected to process the query?”
• “How would the costs of each cuboid
compare if used to process the query?”
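The cuboid-selection step can be illustrated with a deliberately simplified cost model: a materialized cuboid can answer a query if it contains every dimension the query needs, and among the usable cuboids the one with the fewest rows is assumed to be cheapest. The cuboid names and row counts are hypothetical, and the sketch ignores concept-hierarchy levels and available indexes, both of which affect the real cost.

```python
def choose_cuboid(query_dims, materialized):
    """Return the name of the smallest materialized cuboid covering the query dimensions."""
    candidates = [
        (row_count, name)
        for name, (dims, row_count) in materialized.items()
        if set(query_dims) <= set(dims)
    ]
    if not candidates:
        return None  # No materialized cuboid can answer the query.
    return min(candidates)[1]


# Hypothetical materialized cuboids: name -> (dimensions, estimated row count).
materialized = {
    "base": (("time", "item", "location"), 1_000_000),
    "time_item": (("time", "item"), 50_000),
    "item": (("item",), 500),
}

best_for_time_item = choose_cuboid({"time", "item"}, materialized)
best_for_item = choose_cuboid({"item"}, materialized)
best_for_supplier = choose_cuboid({"supplier"}, materialized)
print(best_for_time_item, best_for_item, best_for_supplier)
```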
From Data Warehousing to Data Mining
• “How do data warehousing and OLAP relate to data mining?”
• Data Warehouse Usage
– There are three kinds of data warehouse applications:
• Information processing
• Analytical processing
• Data mining
• “How does data mining relate to information processing and
on-line analytical processing?”
• “Do OLAP systems perform data mining? Are OLAP systems actually data
mining systems?”
From On-Line Analytical Processing to
On-Line Analytical Mining
• On-line analytical mining (OLAM) (also called OLAP mining)
• OLAM is particularly important for the following reasons:
– High quality of data in data warehouses
– Available information processing infrastructure surrounding data
warehouses
– OLAP-based exploratory data analysis
– On-line selection of data mining functions
Architecture for On-Line Analytical Mining
Data Warehouse Deployment
• Lifecycle for data warehouse deployment project:
– 0. Project Scoping and Planning
– 1. Requirement
– 2. Front-End Design
– 3. Warehouse Schema Design
– 4. OLTP to data warehouse mapping
– 5. Implementation
– 6. Deployment
– 7. Management and Maintenance of the system
Growth and maintenance of Data warehouse

• Monitoring The Data Warehouse

– Collection of Statistics
– The following is a random list that includes statistics for different uses. You will find most
of these applicable to your environment.
• Physical disk storage space utilization
• Number of times the DBMS is looking for space in blocks or causes fragmentation
• Memory buffer activity
Collection of Statistics Cont..
• Buffer cache usage
• Input–output performance
• Memory management
• Profile of the warehouse content, giving number of distinct
entity occurrences (example: number of customers, products, etc.)
• Size of each database table
• Accesses to fact table records
• Usage statistics relating to subject areas
• Numbers of completed queries by time slots during the day
• Time each user stays online with the data warehouse
• Total number of distinct users per day
• Maximum number of users during time slots daily
• Duration of daily incremental loads
• Count of valid users
• Query response times
• Number of reports run each day
• Number of active tables in the database
Using Statistics for Growth Planning
• We indicate below the types of action that are prompted by the monitoring
statistics:
– Allocate more disk space to existing database tables
– Plan for new disk space for additional tables
– Modify file block management parameters to minimize fragmentation
– Create more summary tables to handle large number of queries looking
for summary information
– Reorganize the staging area files to handle more data volume
– Add more memory buffers and enhance buffer management
– Upgrade database servers
– Offload report generation to another middle tier
– Smooth out peak usage during the 24-hour cycle
– Partition tables to run loads in parallel and to manage backups
MANAGING THE DATA WAREHOUSE
[Figure: data warehouse management activities: platform upgrades, ongoing fine-tuning, managing data growth, information delivery enhancements, storage management, data model revisions, ETL management]
Thank You
