Data Mining Chapter 2
Decision making
Data Warehouse Characteristics
Subject oriented
Integrated
Time variant
Non volatile
Data warehousing is the process of constructing and using a data
warehouse.
Data Warehouse Architecture
A data warehouse architecture varies with the nature of an
organization’s operations. The basic architecture is built from the
components described next.
Data Warehouse Architecture Cont’d…
Operational System: used to process the day-to-day transactions of an
organization.
Flat File: a system of files in which transactional data is stored, and every file
in the system must have a different name.
Meta Data: a set of data that defines and gives information about other data.
Example: author, size, date created, date modified, etc.
Summarized data: the area of the data warehouse that stores all the predefined
lightly and highly summarized (aggregated) data generated by the warehouse
manager.
Its goal is to speed up query performance.
Data Warehouse Architecture Cont’d…
End-user access tools:
A data warehouse is principally used to provide useful information to
business managers for strategic decision making.
For this, they use end-user access tools. These tools can be:
Reporting and Query Tools
Application Development Tools
Executive Information Systems Tools
Online Analytical Processing Tools
Data Mining Tools
A Multidimensional Data Model
A multidimensional model views data in the form of a data cube.
A data cube enables data to be modeled and viewed in multiple
dimensions. It is defined by dimensions and facts.
The dimensions are the perspectives or entities concerning which
an organization keeps records.
A multidimensional data model is organized around a central
theme.
Facts represent numerical measures.
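The idea of dimensions and facts can be illustrated with a minimal sketch. The sales records, the dimension names (time, item, location) and the measure units_sold are all hypothetical and chosen only for illustration; the cube is simply a mapping from each combination of dimension values to the aggregated fact.

```python
from collections import defaultdict

# Hypothetical dimensions and records, for illustration only.
DIMENSIONS = ("time", "item", "location")

records = [
    {"time": "Q1", "item": "phone", "location": "North", "units_sold": 10},
    {"time": "Q1", "item": "laptop", "location": "North", "units_sold": 4},
    {"time": "Q1", "item": "phone", "location": "South", "units_sold": 7},
    {"time": "Q2", "item": "phone", "location": "North", "units_sold": 12},
    {"time": "Q2", "item": "phone", "location": "North", "units_sold": 3},
]


def build_cube(rows, dimensions, measure):
    """Map each combination of dimension values (a cell) to the summed measure."""
    cube = defaultdict(int)
    for row in rows:
        cell = tuple(row[d] for d in dimensions)  # one coordinate per dimension
        cube[cell] += row[measure]                # the fact stored in that cell
    return dict(cube)


cube = build_cube(records, DIMENSIONS, "units_sold")
for cell, units in sorted(cube.items()):
    print(cell, units)
```

Each key is a point in the three-dimensional space and each value is the fact for that point; the two Q2 phone sales in the North fall into the same cell and are added together.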
A Multidimensional Data Model Cont’d…
[Figures: data for a single location; data for many locations; multidimensional representation]
Data Cube Computation
Data Cube: data grouped in multidimensional matrices.
The general idea of this computation is to materialize (precompute)
certain expensive aggregations that are frequently queried.
A data cube is created from a subset of attributes in the database:
Measure attributes
Dimension or functional attributes
Measure attributes are aggregated according to the dimensions.
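As a rough sketch of what materializing a cube means, the code below precomputes the sum of a measure for every subset of the dimensions (each subset is one group-by, often called a cuboid). The fact rows and names are hypothetical, and computing every cuboid in full is only one possible strategy; real systems often materialize just a selection.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact rows: (time, item, location, units_sold).
DIMENSIONS = ("time", "item", "location")
facts = [
    ("Q1", "phone", "North", 10),
    ("Q1", "laptop", "North", 4),
    ("Q1", "phone", "South", 7),
    ("Q2", "phone", "North", 15),
]


def materialize_cuboids(rows, dimensions):
    """Precompute SUM(measure) grouped by every subset of the dimensions."""
    position = {name: i for i, name in enumerate(dimensions)}
    cuboids = {}
    for size in range(len(dimensions) + 1):
        for subset in combinations(dimensions, size):
            totals = defaultdict(int)
            for row in rows:
                key = tuple(row[position[d]] for d in subset)
                totals[key] += row[-1]  # the measure is the last field
            cuboids[subset] = dict(totals)
    return cuboids


cuboids = materialize_cuboids(facts, DIMENSIONS)
print(len(cuboids))                      # number of group-bys for 3 dimensions
print(cuboids[("item",)])                # totals per item
print(cuboids[()])                       # grand total (the apex cuboid)
```

With n dimensions there are 2^n cuboids, which is why full materialization becomes expensive as dimensions are added.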
Data Cube Computation Cont’d…
There are five basic operations on a data cube. These are:
Roll-up: create a new cube by aggregating data.
Drill-down: create a new cube by dividing data or adding detail
(the reverse of roll-up).
Slicing: create a new sub-cube by selecting a single value on one
dimension.
Dicing: create a new sub-cube by selecting values on two or more
dimensions.
Pivot: rotation of the dimensions to give a different view of the data.
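The five operations can be sketched on a small in-memory cube. This is a minimal illustration under stated assumptions, not how an OLAP server implements them: the cube contents, the city-to-country hierarchy and all function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical cube: (quarter, item, city) -> units sold.
DIMS = ("quarter", "item", "city")
cube = {
    ("Q1", "phone", "Paris"): 10,
    ("Q1", "laptop", "Paris"): 4,
    ("Q1", "phone", "Lyon"): 7,
    ("Q2", "phone", "Paris"): 15,
    ("Q2", "laptop", "Berlin"): 6,
}
# Hypothetical concept hierarchy used for roll-up and drill-down.
CITY_TO_COUNTRY = {"Paris": "France", "Lyon": "France", "Berlin": "Germany"}


def roll_up(cells, dim, mapping):
    """Aggregate by replacing values of `dim` with coarser ones (city -> country)."""
    i = DIMS.index(dim)
    out = defaultdict(int)
    for key, value in cells.items():
        out[key[:i] + (mapping[key[i]],) + key[i + 1:]] += value
    return dict(out)


def drill_down(detail_cells, dim, mapping, coarse_value):
    """Return the finer-grained cells that lie behind one coarse value."""
    i = DIMS.index(dim)
    return {k: v for k, v in detail_cells.items() if mapping[k[i]] == coarse_value}


def slice_cube(cells, dim, value):
    """Select a single value on one dimension."""
    i = DIMS.index(dim)
    return {k: v for k, v in cells.items() if k[i] == value}


def dice(cells, conditions):
    """Select allowed values on two or more dimensions (dim -> set of values)."""
    return {
        k: v
        for k, v in cells.items()
        if all(k[DIMS.index(d)] in allowed for d, allowed in conditions.items())
    }


def pivot(cells, new_order):
    """Rotate the cube by reordering the dimensions in each cell key."""
    idx = [DIMS.index(d) for d in new_order]
    return {tuple(k[i] for i in idx): v for k, v in cells.items()}


by_country = roll_up(cube, "city", CITY_TO_COUNTRY)
q2_only = slice_cube(cube, "quarter", "Q2")
q1_phones = dice(cube, {"quarter": {"Q1"}, "item": {"phone"}})
rotated = pivot(cube, ("city", "item", "quarter"))
germany_detail = drill_down(cube, "city", CITY_TO_COUNTRY, "Germany")
```

Note that drill-down needs the detailed data to be available; it cannot be recovered from the rolled-up cube alone.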
Data Warehouse Implementation
Data Warehouse Implementation Cont’d…
Requirements analysis and capacity planning: it involves:
Defining enterprise needs
Defining architectures
Carrying out capacity planning
Selecting hardware and software tools
This step requires consulting senior management and
stakeholders.
Data Warehouse Implementation Cont’d…
Hardware integration: it involves integrating:
Servers
Storage methods
User software tools
Modeling: involves designing:
The warehouse schema
Views
This requires using a modeling tool.
Data Warehouse Implementation Cont’d…
Physical modeling: it involves:
Designing the physical data warehouse organization
Data placement
Data partitioning
Deciding on access techniques
Deciding on indexing
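To make the indexing decision concrete, the sketch below uses Python's built-in sqlite3 module as a stand-in for a warehouse database and compares the query plan before and after an index is created. The table, column and index names are hypothetical, and the exact wording of the plan depends on the SQLite version; the example covers indexing only, not partitioning or data placement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical fact table; names are illustrative only.
conn.execute(
    "CREATE TABLE sales_fact ("
    "sale_id INTEGER PRIMARY KEY, sold_on TEXT, item TEXT, amount REAL)"
)
rows = [(i, f"2024-01-{(i % 28) + 1:02d}", "phone", 10.0) for i in range(1, 1001)]
conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?)", rows)


def query_plan(connection, sql, params=()):
    """Return the plan text SQLite reports for how it would run the query."""
    plan_rows = connection.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in plan_rows)  # detail is the last column


QUERY = "SELECT SUM(amount) FROM sales_fact WHERE sold_on = ?"

plan_before = query_plan(conn, QUERY, ("2024-01-15",))   # full table scan
conn.execute("CREATE INDEX idx_sales_sold_on ON sales_fact (sold_on)")
plan_after = query_plan(conn, QUERY, ("2024-01-15",))    # index lookup

print(plan_before)
print(plan_after)
```

The same question (which columns are filtered on most often, and is the index worth its storage and update cost) applies to any warehouse engine, whatever its indexing syntax.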
Data Warehouse Implementation Cont’d…
Sources: data comes from several sources.
This step involves identifying and connecting the data sources using
gateways, ODBC drivers, or other wrappers.
ETL:
Extract data from many sources
Transform the data in the staging area
Load the data into the data warehouse
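The three ETL steps can be sketched with the Python standard library. This is a minimal illustration, assuming a CSV export as the source and an in-memory SQLite database as the warehouse; the column names, the cleaning rules and the sales_fact table are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical source: a CSV export from an operational system.
SOURCE_CSV = """order_id,item,amount,sold_on
1, Phone ,199.90,2024-01-05
2,laptop,899.00,2024-01-06
3,PHONE,,2024-01-07
"""


def extract(text):
    """Extract: read raw rows from the source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: clean rows in staging (trim and lowercase names, drop rows with no amount)."""
    staged = []
    for row in rows:
        if not row["amount"].strip():
            continue  # assumed rule: a sale without an amount is rejected
        staged.append(
            (
                int(row["order_id"]),
                row["item"].strip().lower(),
                float(row["amount"]),
                row["sold_on"],
            )
        )
    return staged


def load(connection, staged):
    """Load: write the cleaned rows into the warehouse fact table."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS sales_fact ("
        "order_id INTEGER PRIMARY KEY, item TEXT, amount REAL, sold_on TEXT)"
    )
    connection.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?)", staged)
    connection.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(SOURCE_CSV)))
row_count = conn.execute("SELECT COUNT(*) FROM sales_fact").fetchone()[0]
items = conn.execute("SELECT item FROM sales_fact ORDER BY order_id").fetchall()
```

Keeping the three steps as separate functions mirrors the separation between sources, staging area and warehouse described above.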
Data Warehouse Implementation Cont’d…
Populate the data warehouse: moving data from the staging area into
the data warehouse using the ETL tools.
User applications: for the data warehouse to be helpful, there must
be end-user applications.
This step involves designing and implementing the applications
required by end users.
Roll-out the warehouse and applications: releasing the warehouse
system and its applications to the user community.
From Data Warehouse to Data Mining
Three kinds of data warehouse applications:
Information processing: supports querying, basic statistical
analysis, and reporting using crosstabs, tables, charts and graphs.
Analytical processing: multidimensional analysis of data
warehouse data using OLAP operations such as slice-and-dice,
drilling, and pivoting.
Data mining: knowledge discovery from hidden patterns.
It supports finding associations, constructing analytical models,
performing classification and prediction, and presenting the
mining results using visualization tools.
OLAP Technology, Attribute-Oriented Induction