Data Mining Chapter 2

The document discusses data warehousing and OLAP technology. It defines a data warehouse as an integrated data repository constructed from different sources to support analytical reporting, queries, and decision-making. It describes key characteristics of data warehouses and outlines common data warehouse architectures. It also discusses multidimensional data modeling, data cube computation, and the process of implementing a data warehouse system from requirements analysis through deployment. Finally, it explains how data warehousing enables data mining applications to discover patterns and extract knowledge from large datasets.

Uploaded by

jilalu nuredin
Copyright
© © All Rights Reserved
Chapter Two

Data Warehousing and OLAP Technology for Data Mining
By: Badimaw Terefe
What is a Data Warehouse?
 An information repository constructed by integrating data from
different sources.
 Data in a data warehouse supports:
 Analytical reporting
 Structured and ad hoc queries
 Decision making
Data Warehouse Characteristics
 Subject oriented

 Integrated

 Time variant

 Non volatile
 Data warehousing is the process of constructing and using a data
warehouse.
Data Warehouse Architecture
 A data warehouse architecture varies depending on the elements of an
organization’s operations. The basic architecture looks like the following:
[Figure: basic data warehouse architecture]
Data Warehouse Architecture Cont’d…
 Operational System: used to process the day-to-day transactions of an
organization.
 Flat File: a system of files that stores transactional data; every file
in the system must have a different name.
 Meta Data: a set of data that defines and gives information about other data.
 Example: author, size, date created, date modified, etc.
 Summarized data: the area of the data warehouse that stores all predefined
lightly and highly summarized (aggregated) data generated by the warehouse
manager.
 Its goal is to speed up query performance.
Data Warehouse Architecture Cont’d…
 End user access tools:
 A data warehouse is principally used to provide useful information to
business managers for strategic decision making.
 For this, they use end-user access tools. These tools include:
 Reporting and Query Tools
 Application Development Tools
 Executive Information Systems Tools
 Online Analytical Processing Tools
 Data Mining Tools
A Multidimensional Data Model
 A multidimensional model views data in the form of a data cube.
 A data cube enables data to be modeled and viewed in multiple
dimensions. It is defined by dimensions and facts.
 Dimensions are the perspectives or entities with respect to which
an organization keeps records.
 A multidimensional data model is organized around a central
theme.
 Facts represent numerical measures.
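As a minimal sketch of the idea above, a small cube can be held as a mapping from dimension values to a measure. The sales records, the dimension names (time, item, location) and the measure units_sold are all hypothetical and chosen only for illustration.

```python
# Hypothetical sales facts: three dimensions and one numerical measure.
facts = [
    {"time": "Q1", "item": "phone", "location": "Addis Ababa", "units_sold": 120},
    {"time": "Q1", "item": "laptop", "location": "Addis Ababa", "units_sold": 45},
    {"time": "Q2", "item": "phone", "location": "Bahir Dar", "units_sold": 80},
]

DIMENSIONS = ("time", "item", "location")

# Each cell of the cube is addressed by one value per dimension
# and holds the measure (the fact) for that combination.
cube = {}
for row in facts:
    cell = tuple(row[d] for d in DIMENSIONS)
    cube[cell] = cube.get(cell, 0) + row["units_sold"]

# Look up the measure for one combination of dimension values.
print(cube[("Q1", "phone", "Addis Ababa")])
```

The dimensions act as the coordinates of a cell and the fact is the value stored in it.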
A Multidimensional Data Model Cont’d…
[Figure: Single Location]
[Figure: Many Locations]
[Figure: Multidimensional Representation]
Data Cube Computation
 Data Cube: data grouped in multidimensional matrices.
 The general idea of this computation is to materialize (precompute
and store) expensive computations that are queried frequently.
 A data cube is created from a subset of attributes in the database:
 Measure attributes
 Dimension or functional attributes
 Measure attributes are aggregated according to the dimensions.
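The aggregation described above can be sketched as follows. This is only an illustration under assumed data: the rows, the dimension names and the helper aggregate are hypothetical. It precomputes the sum of the measure for every subset of the dimensions, which is one simple way to materialize a full cube.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical rows: (time, item, location, units_sold).
rows = [
    ("Q1", "phone", "Addis Ababa", 120),
    ("Q1", "laptop", "Addis Ababa", 45),
    ("Q2", "phone", "Bahir Dar", 80),
    ("Q2", "phone", "Addis Ababa", 60),
]
DIMENSIONS = ("time", "item", "location")


def aggregate(rows, dims):
    """Sum the measure (last column) grouped by the given dimensions."""
    positions = [DIMENSIONS.index(d) for d in dims]
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[i] for i in positions)
        totals[key] += row[-1]
    return dict(totals)


# Materialize one group-by per subset of dimensions (2^3 = 8 in total).
cuboids = {
    dims: aggregate(rows, dims)
    for k in range(len(DIMENSIONS) + 1)
    for dims in combinations(DIMENSIONS, k)
}

print(cuboids[("item",)])  # totals per item
print(cuboids[()])         # grand total over all dimensions
```

Materializing every group-by trades storage for query speed; a real system would usually precompute only some of them.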
Data Cube Computation Cont’d…
 There are five basic operations on a data cube. These are:
 Roll-up: create a new cube by aggregating data.
 Drill-down: create a new cube by dividing data or adding details.
 Slicing: create a new sub-cube by selecting a single value on
one dimension.
 Dicing: create a new sub-cube by selecting values on two or more
dimensions.
 Pivot: rotate the dimensions to view the data from another angle.
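The operations listed above can be sketched on a small cube stored as a dictionary. The cube contents and the function names roll_up, slice_cube, dice and pivot are hypothetical. Roll-up is shown here in its simplest form, aggregating one dimension away; drilling down is the reverse step and needs the more detailed cube, so it is not a separate function.

```python
from collections import defaultdict

# Hypothetical cube: (time, item, location) -> units sold.
cube = {
    ("Q1", "phone", "Addis Ababa"): 120,
    ("Q1", "laptop", "Addis Ababa"): 45,
    ("Q2", "phone", "Bahir Dar"): 80,
    ("Q2", "phone", "Addis Ababa"): 60,
}


def roll_up(cube, axis):
    """Aggregate one dimension away, giving a coarser cube."""
    out = defaultdict(int)
    for key, value in cube.items():
        out[key[:axis] + key[axis + 1:]] += value
    return dict(out)


def slice_cube(cube, axis, value):
    """Fix one dimension to a single value and drop that dimension."""
    return {
        key[:axis] + key[axis + 1:]: v
        for key, v in cube.items()
        if key[axis] == value
    }


def dice(cube, allowed):
    """Keep cells whose values fall in the allowed sets; allowed maps axis -> set."""
    return {
        key: v
        for key, v in cube.items()
        if all(key[axis] in values for axis, values in allowed.items())
    }


def pivot(cube, order):
    """Reorder the dimensions; the cell values stay the same."""
    return {tuple(key[i] for i in order): v for key, v in cube.items()}


by_time_item = roll_up(cube, axis=2)           # location aggregated away
q1_only = slice_cube(cube, axis=0, value="Q1")
q2_phones = dice(cube, {0: {"Q2"}, 1: {"phone"}})
location_first = pivot(cube, order=(2, 0, 1))
# Drill-down from by_time_item means going back to the detailed `cube`.
```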
Data Warehouse Implementation
 Requirements analysis and capacity planning: it involves:
 Defining enterprise needs
 Defining architectures
 Carrying out capacity planning
 Selecting hardware and software tools
This step requires consulting senior management and
stakeholders.
Data Warehouse Implementation Cont’d…
 Hardware integration: it involves integrating:
 Servers
 Storage methods
 User software tools
 Modeling: involves designing:
 The warehouse schema
 Views
 This requires using a modeling tool
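As a rough illustration of the modeling step, the sketch below defines a small warehouse schema and one view using Python's built-in sqlite3 module. The layout, one fact table surrounded by dimension tables (often called a star schema), and all table, column and view names are assumptions made for this example; the slides do not prescribe a particular schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema: two dimension tables, one fact table, one view.
conn.executescript("""
CREATE TABLE dim_time (time_id INTEGER PRIMARY KEY, quarter TEXT, year INTEGER);
CREATE TABLE dim_item (item_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE fact_sales (
    time_id    INTEGER REFERENCES dim_time(time_id),
    item_id    INTEGER REFERENCES dim_item(item_id),
    units_sold INTEGER
);
CREATE VIEW sales_by_quarter AS
    SELECT t.year, t.quarter, SUM(f.units_sold) AS units_sold
    FROM fact_sales AS f
    JOIN dim_time AS t ON f.time_id = t.time_id
    GROUP BY t.year, t.quarter;
""")

# Illustrative data only.
conn.executemany("INSERT INTO dim_time VALUES (?, ?, ?)",
                 [(1, "Q1", 2024), (2, "Q2", 2024)])
conn.executemany("INSERT INTO dim_item VALUES (?, ?)",
                 [(1, "phone"), (2, "laptop")])
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)",
                 [(1, 1, 120), (1, 2, 45), (2, 1, 80)])

# The view gives users a ready-made summary without writing the join.
summary = conn.execute(
    "SELECT year, quarter, units_sold FROM sales_by_quarter ORDER BY quarter"
).fetchall()
print(summary)
```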
Data Warehouse Implementation Cont’d…
 Physical modeling: it involves:
 Designing the physical data warehouse organization
 Data placement
 Data partitioning
 Deciding on access techniques
 Deciding on indexing
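Two of the physical design decisions above, partitioning and indexing, can be sketched as follows. The sample rows, the choice to partition by year, and the index name idx_sales_item are assumptions for illustration; SQLite is used only because it ships with Python, not because the slides name it.

```python
import sqlite3
from collections import defaultdict

# Hypothetical fact rows: (sale_date, item, units).
rows = [
    ("2023-11-02", "phone", 10),
    ("2024-01-15", "phone", 7),
    ("2024-03-09", "laptop", 3),
]

# Data partitioning: split the rows by year so that a query about one
# year only needs to read that year's partition.
partitions = defaultdict(list)
for sale_date, item, units in rows:
    partitions[sale_date[:4]].append((sale_date, item, units))

# Indexing: create an index on a column that queries filter on often.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact_sales (sale_date TEXT, item TEXT, units INTEGER)")
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)", rows)
conn.execute("CREATE INDEX idx_sales_item ON fact_sales (item)")

query = "SELECT SUM(units) FROM fact_sales WHERE item = ?"
total = conn.execute(query, ("phone",)).fetchone()[0]

# The query plan reports which access technique SQLite chose.
plan = conn.execute("EXPLAIN QUERY PLAN " + query, ("phone",)).fetchall()
for step in plan:
    print(step[-1])
```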

Data Warehouse Implementation Cont’d…
 Sources: data comes from several sources.
 It involves identifying and connecting data sources using
gateways, ODBC drivers, or other wrappers.
 ETL:
 Extract data from many sources
 Transform the data in the staging area
 Load the data into the data warehouse
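The three ETL steps can be sketched as three small functions. Everything concrete here is an assumption made for illustration: the two CSV sources, the cleaning rules (trim and lower-case item names, skip rows with a missing measure) and the fact_sales table.

```python
import csv
import io
import sqlite3

# Two hypothetical sources with inconsistent formatting.
SOURCE_A = "date,item,units\n2024-01-15,Phone,7\n2024-01-16,laptop,3\n"
SOURCE_B = "date,item,units\n2024-01-15, PHONE ,5\n2024-01-17,tablet,\n"


def extract(*sources):
    """Extract: read raw records from every source."""
    for text in sources:
        yield from csv.DictReader(io.StringIO(text))


def transform(records):
    """Transform: clean and unify records in a staging list."""
    staged = []
    for rec in records:
        if not rec["units"].strip():
            continue  # skip rows with a missing measure
        item = rec["item"].strip().lower()
        staged.append((rec["date"], item, int(rec["units"])))
    return staged


def load(conn, staged):
    """Load: write the staged rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_sales "
        "(sale_date TEXT, item TEXT, units INTEGER)"
    )
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)", staged)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(SOURCE_A, SOURCE_B)))

totals = conn.execute(
    "SELECT item, SUM(units) FROM fact_sales GROUP BY item ORDER BY item"
).fetchall()
print(totals)
```

After the transform step, "Phone" and " PHONE " from the two sources count as the same item, which is the kind of integration a warehouse depends on.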
Data Warehouse Implementation Cont’d…
 Populate the data warehouse: moving data from the staging area
into the data warehouse using ETL tools.
 User applications: for the data warehouse to be helpful, there must
be end-user applications.
 This step involves designing and implementing the applications
required by end users.
 Roll out the warehouse and applications: releasing the warehouse
system and its applications to the user community.
From Data Warehouse to Data Mining
 Three kinds of data warehouse applications
 Information processing: supports querying, basic statistical
analysis, and reporting using crosstabs, tables, charts and graphs.
 Analytical processing: multidimensional analysis of data
warehouse data using OLAP operations such as slice-dice, drilling, pivoting.
 Data mining: knowledge discovery by finding hidden patterns.
 Supports finding associations, constructing analytical models,
performing classification and prediction, and presenting the
mining results using visualization tools.
From Data Warehouse to Data Mining Cont’d…

 Good data mining starts with a good data warehouse. This means:
 Setting up the data correctly (ETL)
 Testing the data
 Choosing the correct data warehouse
 With a data warehouse, reliable data is readily available, so data
mining can be performed quickly and accurately, even on the largest
data sets.
OLAP Technology, Attribute-Oriented Induction

 OLAP: computer processing that enables a user to easily and selectively
extract and view data from different points of view.
 Attribute-oriented induction: extracts generalized rules from an
interesting set of data and discovers high-level data regularities.
 Developed for learning different kinds of knowledge rules such
as:
 Characteristic rules
 Discrimination rules
 Classification rules
 Data evolution regularities
 Association rules
 Cluster description rules
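A much simplified sketch of the generalization idea behind attribute-oriented induction is given below. It is not the full method: it only shows dropping an attribute that cannot be generalized, replacing values with higher-level concepts, and merging identical generalized tuples while counting them. The student records, the concept hierarchy MAJOR_HIERARCHY and the GPA threshold are all hypothetical.

```python
from collections import Counter

# Hypothetical task-relevant data.
students = [
    {"name": "Abebe", "major": "physics", "gpa": 3.7},
    {"name": "Sara", "major": "chemistry", "gpa": 3.9},
    {"name": "Kebede", "major": "history", "gpa": 2.8},
    {"name": "Hana", "major": "biology", "gpa": 3.6},
]

# Assumed concept hierarchy: each low-level value maps to a higher-level concept.
MAJOR_HIERARCHY = {
    "physics": "science",
    "chemistry": "science",
    "biology": "science",
    "history": "arts",
}


def generalize(row):
    """Drop 'name' (no hierarchy, too many distinct values) and
    replace the other attributes with higher-level concepts."""
    gpa_level = "excellent" if row["gpa"] >= 3.5 else "average"
    return (MAJOR_HIERARCHY[row["major"]], gpa_level)


# Identical generalized tuples are merged and their counts accumulated.
generalized = Counter(generalize(row) for row in students)

for (major_group, gpa_level), count in generalized.items():
    print(f"major={major_group}, gpa={gpa_level}: {count} student(s)")
```

Each merged tuple with its count can be read as a candidate characteristic rule for the data set.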
Questions?