Datawarehousing and Data Mining

This document discusses concepts related to data mining and data warehousing. It begins by defining key concepts in data mining including association rules, classification, clustering, and goals of data mining such as prediction, identification, and optimization. It then discusses applications of data mining in marketing, finance, manufacturing, and healthcare. Finally, it provides an overview of data warehousing and online analytical processing (OLAP), distinguishing them from traditional transactional databases and noting they are optimized for decision support.

Uploaded by

Souvik

Advance Database Engineering

Datawarehousing and Data Mining

Data Mining Concepts

Association Rules
Classification

Clustering
Approaches to Other Data Mining Problems
Applications of Data Mining

Data Mining Concepts

Data mining is the mining or discovery of new information, in terms of patterns or rules, from vast amounts of data.

To be practically useful, data mining must be carried out efficiently
on large files and databases.
Although some data mining features are being provided in
RDBMSs, data mining is not well-integrated with database
management systems.
Data mining can be used in conjunction with a data warehouse to
help with certain types of decision making.
Data mining can also be applied to operational databases with
individual transactions.
Data mining helps in extracting meaningful new patterns that
cannot necessarily be found by merely querying or processing
data or metadata in the data warehouse.

Data Mining
Example: A transaction database maintained by a specialty consumer goods retailer.

The result of mining may be to discover the following type of new
information:
Association rules: for example, whenever a customer buys video
equipment, he or she also buys another electronic gadget.
Sequential patterns: for example, suppose a customer buys a
camera, and within three months he or she buys photographic
supplies; then within six months he or she is likely to buy an accessory item
like an additional lens or filter. This defines a sequential pattern of
transactions.
Classification trees: for example, customers may be classified by
frequency of visits, types of financing used, amount of purchase, or
affinity for types of items; some revealing statistics may be generated
for such classes.

Data Mining
Many possibilities exist for discovering new knowledge
about buying patterns, relating factors such as age,
income group, and place of residence to what and how
much the customers purchase.
This information can then be utilized to plan additional
store locations based on demographics, run store
promotions, combine items in advertisements, or plan
seasonal marketing strategies.
The results of data mining may be reported in a variety
of formats, such as listings, graphic outputs, summary
tables, or visualizations.

Goals of Data Mining

Prediction
Identification

Classification
Optimization

Goals of Data Mining

Prediction. Data mining can show how certain attributes within
the data will behave in the future. Examples of predictive data
mining include the analysis of buying transactions to predict what
consumers will buy under certain discounts, how much sales
volume a store will generate in a given period, and whether
deleting a product line will yield more profits. In a scientific
context, certain seismic wave patterns may predict an earthquake
with high probability.

Goals of Data Mining

Classification. Data mining can partition the data so that
different classes or categories can be identified based on
combinations of parameters. For example, customers in a
supermarket can be categorized into:
discount-seeking shoppers
shoppers in a rush
loyal regular shoppers
shoppers attached to name brands
infrequent shoppers

This classification may be used in different analyses of customer
buying transactions as a post-mining activity.

Goals of Data Mining

Identification

Goals of Data Mining

Optimization. One eventual goal of data mining may be to
optimize the use of limited resources such as time, space, money,
or materials and to maximize output variables such as sales or
profits under a given set of constraints.
This goal resembles the objective function used in operations research
problems that deal with optimization under constraints.

Knowledge Classification
Knowledge discovered during data mining is classified as follows:
Association rules
Classification hierarchies
Sequential patterns
Patterns within time series
Clustering
For most applications, the desired knowledge is a combination of
the above types.

Association Rules
These rules correlate the presence of a set of items with
another range of values for another set of variables.
Examples:
(1) When a female retail shopper buys a handbag, she is likely to
buy shoes.
(2) An X-ray image containing characteristics a and b is likely to
also exhibit characteristic c.
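The slides do not say how the strength of such a rule is measured. A minimal sketch, assuming the two usual measures (support and confidence) and a made-up list of baskets, shows how the handbag and shoes rule could be scored. The names `transactions`, `support`, and `confidence` are illustrative only.

```python
# Hypothetical market-basket data: each set is one shopper's transaction.
transactions = [
    {"handbag", "shoes"},
    {"handbag", "shoes", "scarf"},
    {"handbag", "belt"},
    {"shoes"},
    {"scarf", "belt"},
]


def support(itemset, transactions):
    """Fraction of all transactions that contain every item in itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(lhs, rhs, transactions):
    """Among transactions containing lhs, the fraction that also contain rhs."""
    return support(lhs | rhs, transactions) / support(lhs, transactions)


# Rule under test: handbag => shoes
rule_support = support({"handbag", "shoes"}, transactions)
rule_confidence = confidence({"handbag"}, {"shoes"}, transactions)
print(f"support={rule_support:.2f}, confidence={rule_confidence:.2f}")
```

With this toy data, two of the five baskets contain both items and three contain a handbag, so the rule holds in two out of three handbag purchases.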

Other types of association rules

Association Rules among Hierarchies
Occur among hierarchies of items. Typically, it is possible to divide items
among disjoint hierarchies based on the nature of the domain. Example:
foods in a supermarket, articles in a sports shop

Multidimensional Associations

It may be of interest to find association rules that involve multiple
dimensions.
Example: Customer transactions with three dimensions:
Transaction_id, Time, and Items_bought:
Single dimension: Items_bought(milk) => Items_bought(juice)
Multiple dimensions: Time(6:30...8:00) => Items_bought(milk)
Rules like these are called multidimensional association rules.

Other types of association rules

Negative Associations

A negative association is of the following type: 60 percent of
customers who buy potato chips do not buy bottled water.
In general, we are interested in cases in which two specific sets
of items appear very rarely in the same transaction.
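As a rough illustration, the "60 percent" figure in the potato chips example can be computed by counting, among baskets that contain one item, how many lack the other. The basket data and the helper name `negative_rate` below are assumptions made up for this sketch.

```python
# Hypothetical baskets; five of them contain potato chips.
baskets = [
    {"potato chips", "soda"},
    {"potato chips", "bottled water"},
    {"potato chips"},
    {"potato chips", "salsa"},
    {"potato chips", "bottled water", "soda"},
    {"bottled water"},
]


def negative_rate(item, absent_item, baskets):
    """Share of baskets containing item that do NOT contain absent_item."""
    with_item = [b for b in baskets if item in b]
    if not with_item:
        return 0.0
    without_other = sum(1 for b in with_item if absent_item not in b)
    return without_other / len(with_item)


rate = negative_rate("potato chips", "bottled water", baskets)
print(f"{rate:.0%} of chip buyers did not buy bottled water")
```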

Classification Hierarchies
The goal is to work from an existing set of events or transactions to
create a hierarchy of classes.
Examples:
(1) A population may be divided into five ranges of credit
worthiness based on a history of previous credit card
transactions.
(2) A model may be developed for the factors that determine the
desirability of a store location on a 1-10 scale.

Sequential Patterns
A sequence of actions or events is sought.
Example:
If a patient underwent cardiac bypass surgery for blocked arteries
and an aneurysm and later developed high blood urea within a year
of surgery, he or she is likely to suffer from kidney failure within the
next 18 months.
Detection of sequential patterns is equivalent to detecting
associations among events with certain temporal relationships.
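To make "associations among events with certain temporal relationships" concrete, here is a minimal sketch that checks one customer's history against the camera pattern from the retailer example. It assumes each time window is measured from the previous matched event and takes the first matching event greedily; the data and the name `matches_sequence` are hypothetical.

```python
from datetime import date, timedelta

# Hypothetical purchase history for one customer: (purchase date, item category).
history = [
    (date(2023, 1, 10), "camera"),
    (date(2023, 3, 2), "photographic supplies"),
    (date(2023, 6, 20), "lens"),
]


def matches_sequence(history, steps):
    """Check whether the history contains the steps in order.

    steps is a list of (category, max_days_after_previous_step).
    The window of the first step is ignored because nothing precedes it.
    """
    events = sorted(history)
    previous = None
    for category, max_days in steps:
        found = None
        for when, item in events:
            if item != category:
                continue
            in_window = previous is None or (
                previous < when <= previous + timedelta(days=max_days)
            )
            if in_window:
                found = when
                break
        if found is None:
            return False
        previous = found
    return True


camera_pattern = [("camera", 0), ("photographic supplies", 90), ("lens", 180)]
print(matches_sequence(history, camera_pattern))
```

A real sequential-pattern miner would search for frequent sequences across many customers rather than test a single hand-written pattern.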

Patterns with time series

Similarities can be detected within positions of a time series of data,
which is a sequence of data taken at regular intervals such as daily
sales or daily closing stock prices.
Examples:
(1) Stocks of a utility company, ABC Power, and a financial
company, XYZ Securities, showed the same pattern during
2009 in terms of closing stock prices.
(2) Two products show the same selling pattern in summer but a
different one in winter.
(3) A pattern in solar magnetic wind may be used to predict
changes in Earth's atmospheric conditions.
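The slides do not name a similarity measure for time series. One simple choice, used here purely as an assumption, is the Pearson correlation between two equally spaced series: values near 1 suggest the two series moved together. The price lists below are invented for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical daily closing prices; the second series is the first one scaled.
abc_power = [10.0, 11.0, 12.0, 11.0, 13.0]
xyz_securities = [20.0, 22.0, 24.0, 22.0, 26.0]
other_stock = [13.0, 11.0, 12.0, 11.0, 10.0]

same_pattern = pearson(abc_power, xyz_securities)
different_pattern = pearson(abc_power, other_stock)
print(round(same_pattern, 3), different_pattern < 0)
```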

Clustering
A given population of events or items can be partitioned
(segmented) into sets of similar elements.
Examples:
(1) An entire population of treatment data on a disease may be
divided into groups based on the similarity of side effects
produced.
(2) The adult population in the United States may be categorized
into five groups from most likely to buy to least likely to buy a
new product.
(3) The Web accesses made by a collection of users against a set
of documents (say, in a digital library) may be analyzed in terms
of the keywords of documents to reveal clusters or categories of
users.
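The slides do not prescribe a clustering algorithm. As one possible sketch, a tiny one-dimensional k-means (a technique swapped in here for illustration) groups made-up "likelihood to buy" scores into two segments; the source example uses five groups, and two are used only to keep the sketch short. The starting centroids are fixed so the run is deterministic.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Very small k-means for one-dimensional data with given starting centroids."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: put each value in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical purchase-likelihood scores on a 0-100 scale.
scores = [5, 8, 10, 60, 65, 70]
centroids, clusters = kmeans_1d(scores, centroids=[0, 100])
print(clusters)
```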

Applications of Data Mining

Marketing
Analysis of consumer behavior based on buying patterns
Determination of marketing strategies, including advertising, store location, and targeted mailing
Segmentation of customers, stores, or products
Design of catalogs, store layouts, and advertising campaigns
Finance
Analysis of creditworthiness of clients
Segmentation of account receivables
Performance analysis of financial investments such as stocks, bonds, and mutual funds
Evaluation of financing options
Fraud detection

Applications of Data Mining

Manufacturing
Optimization of resources like machines, manpower, and materials
Optimal design of manufacturing processes and shop-floor layouts
Product design, such as for automobiles, based on customer requirements
Health Care
Radiological images
Analysis of microarray (gene-chip) experimental data to cluster genes and to relate them to symptoms or diseases
Analysis of side effects of drugs and effectiveness of certain treatments
Optimization of processes within a hospital
Relationship of patient wellness data with doctor qualifications

Data Mining Tools

Datawarehousing and OLAP

Characteristics
Data Modeling

Building a Datawarehouse
Typical Functionality
Datawarehouses versus Views

Difficulties of implementing a Datawarehouse

Datawarehousing and OLAP

In modern organizations, users of data are often completely removed
from the data sources.

Many people only need read access to data, but still need fast access
to a larger volume of data than can conveniently be downloaded to the
desktop. Often such data comes from multiple databases.
Because many of the analyses performed are recurrent and
predictable, software vendors and systems support staff are designing
systems to support these functions.
Presently there is a great need to provide decision makers from middle
management upward with information at the correct level of detail to
support decision making.
Data warehousing, online analytical processing (OLAP), and data
mining provide this functionality.

Datawarehousing and OLAP

Traditional databases are transactional (relational, object-oriented,
network, or hierarchical).
Data warehouses have the distinguishing characteristic that they are
mainly intended for decision-support applications. They are optimized
for data retrieval, not routine transaction processing.
Data warehouses are quite distinct from traditional databases in their
structure, functioning, performance, and purpose.
A data warehouse is a subject-oriented, integrated, nonvolatile, time-variant collection of data in support of management's decisions.
Data warehouses provide access to data for complex analysis,
knowledge discovery, and decision making. They support high-performance demands on an organization's data and information.

OLAP, DSS and Data Mining

Several types of applications (OLAP, DSS, and data mining
applications) are supported by a data warehouse.
OLAP (online analytical processing) is a term used to describe the
analysis of complex data from the data warehouse. OLAP tools use
distributed computing capabilities for analyses that require more
storage and processing power than can be economically and
efficiently located on an individual desktop.
DSS (decision-support systems), also known as EIS (executive
information systems), support an organization's leading decision
makers with higher-level data for complex and important decisions.
Data mining is used for knowledge discovery, the process of
searching data for unanticipated new knowledge.

Traditional DB vs Datawarehouse

Traditional Database
Supports online transaction processing (OLTP), i.e., insertions, updates, deletions, and queries
Supports transactions that deal with a few tuples per relation
Optimized to process queries that may touch a small part of the database
Updated in real time
Transactions are the unit and agent of change

Datawarehouse
Designed to support efficient extraction, processing, and presentation for analytic and decision-making purposes
Generally contains very large amounts of data from multiple sources, including databases and files
Optimized for OLAP, DSS, or data mining
Nonvolatile: information changes far less often and may be regarded as non-real-time with periodic updating

Characteristics of Datawarehouse
Multidimensional conceptual view
Generic dimensionality
Unlimited dimensions and aggregation levels
Unrestricted cross-dimensional operations
Dynamic sparse matrix handling
Client-server architecture
Multiuser support
Accessibility
Transparency
Intuitive data manipulation
Consistent reporting performance
Flexible reporting

Types of Datawarehouses
Enterprise-wide data warehouses are huge projects
requiring massive investment of time and resources.
Virtual data warehouses provide views of operational
databases that are materialized for efficient access.
Data marts generally are targeted to a subset of the
organization, such as a department, and are more tightly
focused, e.g., a sales data mart.

Data Modeling
Multidimensional data modeling

Populates data in multidimensional matrices called data cubes
Called hypercubes if they have more than three dimensions
Query performance in multidimensional matrices can be
much better than in the relational data model.
Examples of dimensions in a corporate data warehouse:
fiscal periods
products
regions

Two Dimensional Matrix

Example: a spreadsheet of regional sales by product for
a particular time period.

Data Cube
Example: Adding a time dimension, such as an organization's
fiscal quarters, would produce a three-dimensional matrix,
which could be represented using a data cube.

Data Cube
Example: Figure shows a three-dimensional data cube that organizes
product sales data by fiscal quarters and sales regions. Each cell could
contain data for a specific product, specific fiscal quarter, and specific
region.
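A data cube of this shape can be sketched, under simplifying assumptions, as a dictionary whose keys are (product, quarter, region) tuples. The sales figures and the names `cube`, `DIMENSIONS`, and `total_by` are invented for illustration; real OLAP servers use far more elaborate storage.

```python
from collections import defaultdict

# Hypothetical sales figures keyed by (product, fiscal quarter, region).
cube = {
    ("P1", "Q1", "East"): 100,
    ("P1", "Q1", "West"): 80,
    ("P1", "Q2", "East"): 120,
    ("P2", "Q1", "East"): 50,
    ("P2", "Q2", "West"): 70,
}
DIMENSIONS = ("product", "quarter", "region")


def total_by(cube, dimension):
    """Sum the cell values along every dimension except the named one."""
    axis = DIMENSIONS.index(dimension)
    totals = defaultdict(int)
    for key, sales in cube.items():
        totals[key[axis]] += sales
    return dict(totals)


# One cell: a specific product, fiscal quarter, and region.
cell = cube[("P1", "Q2", "East")]
by_quarter = total_by(cube, "quarter")
by_region = total_by(cube, "region")
print(cell, by_quarter, by_region)
```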

Hypercube and Pivoting

By including additional dimensions, a data hypercube could be
produced, although more than three dimensions cannot be easily
visualized or graphically presented.
The data can be queried directly in any combination of dimensions,
bypassing complex database queries.
Tools exist for viewing data according to the user's choice of
dimensions.
Changing from one dimensional hierarchy (orientation) to another is
easily accomplished in a data cube with a technique called pivoting
(also called rotation). In this technique the data cube can be thought of
as rotating to show a different orientation of the axes.
Example: you might pivot the data cube to show regional sales
revenues as rows, the fiscal quarter revenue totals as columns, and the
company's products in the third dimension.
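Pivoting can be illustrated with a small sketch that re-reads the same cells under a different choice of row and column axes. It assumes a cube stored as a dictionary keyed by (product, quarter, region); the data and the function name `pivot` are hypothetical.

```python
# Hypothetical cube keyed by (product, quarter, region); axis numbers are 0, 1, 2.
cube = {
    ("P1", "Q1", "East"): 100,
    ("P1", "Q1", "West"): 80,
    ("P1", "Q2", "East"): 120,
    ("P2", "Q1", "East"): 50,
    ("P2", "Q2", "West"): 70,
}


def pivot(cube, row_axis, col_axis, fixed_axis, fixed_value):
    """Return {row: {column: value}} for the cells where fixed_axis equals fixed_value."""
    table = {}
    for key, value in cube.items():
        if key[fixed_axis] != fixed_value:
            continue
        table.setdefault(key[row_axis], {})[key[col_axis]] = value
    return table


# Regions as rows, quarters as columns, one table per product (third dimension).
regions_by_quarter = pivot(cube, row_axis=2, col_axis=1, fixed_axis=0, fixed_value="P1")
# Rotated view: products as rows, regions as columns, for quarter Q1.
products_by_region = pivot(cube, row_axis=0, col_axis=2, fixed_axis=1, fixed_value="Q1")
print(regions_by_quarter)
print(products_by_region)
```

No data is recomputed in either view; only the orientation of the axes changes, which is the point of the rotation metaphor.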

Roll Up and Drill Down

Multidimensional models lend themselves readily to hierarchical
views in what is known as roll-up display and drill-down display.
A roll-up display moves up the hierarchy, grouping into larger units
along a dimension and providing a coarser-grained view.
Example: Summing weekly data by quarter or by year.
A drill-down display provides the opposite capability, furnishing a
finer-grained view.
Example: Disaggregating country sales by region and then regional
sales by sub-region, and also breaking up products by styles.
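The weekly-to-quarterly-to-yearly example can be sketched as repeated grouping with a coarser key. The weekly figures are made up, and the mapping of weeks to quarters (13 weeks per quarter, with any extra week folded into the fourth quarter) is an assumption of this sketch.

```python
# Hypothetical weekly sales keyed by (year, week number).
weekly_sales = {
    (2023, 1): 10,
    (2023, 2): 12,
    (2023, 14): 20,
    (2023, 27): 15,
    (2023, 40): 30,
    (2024, 1): 8,
}


def quarter_of_week(week):
    """Map a week number to a quarter, assuming 13-week quarters."""
    return min((week - 1) // 13 + 1, 4)


def roll_up(data, to_coarser_key):
    """Group values under a coarser key and sum them."""
    totals = {}
    for key, value in data.items():
        coarse = to_coarser_key(key)
        totals[coarse] = totals.get(coarse, 0) + value
    return totals


quarterly = roll_up(weekly_sales, lambda k: (k[0], quarter_of_week(k[1])))
yearly = roll_up(quarterly, lambda k: k[0])
print(quarterly)
print(yearly)
```

Drill-down is the reverse lookup: starting from a yearly total and returning to the quarterly or weekly entries that produced it, which requires the finer-grained data to have been kept.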

Roll Up Example
Figure shows a roll-up display that moves from individual products to a
coarser grain of product categories.

Drill Down Example

Figure shows a drill-down display that disaggregates country sales by
region and then regional sales by sub-region, and also breaks up
products by styles.

Dimension and Fact Tables

Multidimensional model involves two types of tables:
dimension tables
fact tables.
A dimension table consists of tuples of attributes of the
dimension.
A fact table can be thought of as having tuples, one per
recorded fact.
This fact contains some measured or observed variable(s)
and identifies it (them) with pointers to dimension tables.
The fact table contains the data, and the dimensions identify
each tuple in that data.
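A minimal sketch of this arrangement, using Python's built-in `sqlite3` module and an in-memory database, might look as follows. The table names, columns, and rows are all hypothetical; the point is only that the fact table holds the measure (`revenue`) plus keys pointing to the dimension tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension tables: one tuple per product or region, with its attributes.
    CREATE TABLE product (product_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE region  (region_id  INTEGER PRIMARY KEY, name TEXT);

    -- Fact table: one tuple per recorded fact, pointing to the dimensions.
    CREATE TABLE sales_fact (
        product_id INTEGER REFERENCES product(product_id),
        region_id  INTEGER REFERENCES region(region_id),
        quarter    TEXT,
        revenue    REAL
    );

    INSERT INTO product VALUES (1, 'Camera'), (2, 'Lens');
    INSERT INTO region  VALUES (1, 'East'), (2, 'West');
    INSERT INTO sales_fact VALUES
        (1, 1, 'Q1', 100.0),
        (1, 2, 'Q1', 80.0),
        (2, 1, 'Q1', 50.0),
        (2, 1, 'Q2', 70.0);
    """
)

# Resolve the pointers by joining the fact table to its dimension tables.
rows = conn.execute(
    """
    SELECT p.name, r.name, SUM(f.revenue)
    FROM sales_fact AS f
    JOIN product AS p ON p.product_id = f.product_id
    JOIN region  AS r ON r.region_id  = f.region_id
    GROUP BY p.name, r.name
    ORDER BY p.name, r.name
    """
).fetchall()
print(rows)
```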

Dimension and Fact Tables

Example of a fact table with multiple dimensions (star schema)

Schemas: Star and Snowflake

Two common multidimensional schemas:
Star schema
Snowflake schema.

The star schema consists of a fact table with a single
table for each dimension.
The snowflake schema is a variation on the star schema in
which the dimensional tables from a star schema are
organized into a hierarchy by normalizing them.
Some installations are normalizing data warehouses up to the
third normal form so that they can access the data warehouse
to the finest level of detail.

Example: Snowflake Schema

Fact Constellation
A fact constellation is a set of fact tables that share some dimension
tables.
Figure shows a fact constellation with two fact tables, business results
and business forecast. These share the dimension table called product.
Fact constellations limit the possible queries for the warehouse.

Typical Functionality
Facilitate complex, data-intensive, and frequent ad hoc queries.
Data warehouses must provide far greater and more efficient query
support than is demanded of transactional databases.
The data warehouse access component supports:
enhanced spreadsheet functionality
efficient query processing
structured queries
ad hoc queries
data mining
materialized views

Enhanced spreadsheet functionality includes support for state-of-the-art
spreadsheet applications (for example, MS Excel) as well as for OLAP
application programs.

Pre-programmed Functionality
Roll-up: Data is summarized with increasing generalization.
Example: weekly to quarterly to annually.
Drill-down: Increasing levels of detail are revealed (the
complement of roll-up).
Pivot: Cross tabulation (also referred to as rotation) is
performed.
Slice and dice: Projection operations are performed on the
dimensions.
Sorting: Data is sorted by ordinal value.
Selection: Data is available by value or range.
Derived (computed) attributes: Attributes are computed by
operations on stored and derived values.
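Slice and dice can be sketched on a dictionary-based cube keyed by (product, quarter, region). The interpretation below is the common OLAP one and is an assumption beyond what the slide states: a slice fixes one dimension to a single value and drops it, while a dice keeps only cells whose coordinates fall within chosen value sets. Data and function names are hypothetical.

```python
# Hypothetical cube keyed by (product, quarter, region); axis numbers are 0, 1, 2.
cube = {
    ("P1", "Q1", "East"): 100,
    ("P1", "Q1", "West"): 80,
    ("P1", "Q2", "East"): 120,
    ("P2", "Q1", "East"): 50,
    ("P2", "Q2", "West"): 70,
}


def slice_cube(cube, axis, value):
    """Fix one dimension to a single value and drop it from the keys."""
    return {
        key[:axis] + key[axis + 1:]: v
        for key, v in cube.items()
        if key[axis] == value
    }


def dice_cube(cube, allowed):
    """Keep cells whose coordinate on each listed axis is in the allowed set."""
    return {
        key: v
        for key, v in cube.items()
        if all(key[axis] in values for axis, values in allowed.items())
    }


q1_slice = slice_cube(cube, axis=1, value="Q1")
p1_east = dice_cube(cube, {0: {"P1"}, 2: {"East"}})
print(q1_slice)
print(p1_east)
```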
