Data Mining Techniques: Introductory and Advanced Topics


DATA MINING TECHNIQUES

Introductory and Advanced Topics

Eamonn Keogh
(some slides adapted from) Margaret Dunham

Dr. M.H. Dunham, Data Mining, Introductory and Advanced Topics, Prentice Hall, 2002.
http://iubio.bio.indiana.edu/treeapp/treeprint-sample1.html
© Prentice Hall 1
Data Mining Outline

– Introduction
– Related Concepts
– Data Mining Techniques

Introduction Outline
Goal: Provide an overview of data mining.

• Define data mining
• Data mining vs. databases
• Basic data mining tasks
• Data mining issues

Introduction
• Data is growing at a phenomenal rate (read “How Much Information Is There In the World?” by Michael Lesk).
• Users expect more sophisticated information.
• How?

UNCOVER HIDDEN INFORMATION

DATA MINING

Data Mining Definition
• Finding hidden information in a database
• Data mining has been defined as “the nontrivial extraction of implicit, previously unknown, and potentially useful information from data”.
• Similar terms
– Exploratory data analysis
– Data driven discovery
– Deductive learning
– Discovery Science
– Knowledge Discovery

Database Processing vs. Data Mining Processing
• Database processing
– Query: well defined; SQL
– Output: subset of database
• Data mining processing
– Query: poorly defined; no precise query language
– Output: not a subset of database

Query Examples
• Database
– Find all credit applicants with last name of Smith.
– Identify customers who have purchased more than $10,000 in the last month.
– Find all customers who have purchased milk.
• Data Mining
– Find all credit applicants who are poor credit risks. (classification)
– Identify customers with similar buying habits. (clustering)
– Find all items which are frequently purchased with milk. (association rules)
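The contrast between the two kinds of query can be sketched in a few lines of Python. The purchase records, customer names, and the `min_share` threshold are invented for illustration, and the second function is only a plain co-occurrence count, not a full association rule algorithm.

```python
from collections import Counter

# Hypothetical purchase records: customer -> set of items bought.
purchases = {
    "ann": {"milk", "bread", "eggs"},
    "bob": {"milk", "bread"},
    "cat": {"beer", "chips"},
    "dan": {"milk", "eggs", "bread"},
}

def customers_who_bought(item):
    """Database-style query: well defined, returns a subset of the data."""
    return sorted(name for name, basket in purchases.items() if item in basket)

def frequently_bought_with(item, min_share=0.5):
    """Mining-style query: the output is a pattern, not a subset of the rows.

    Returns each other item together with the share of baskets containing
    `item` in which it also appears, keeping shares >= min_share.
    """
    baskets = [b for b in purchases.values() if item in b]
    counts = Counter(other for b in baskets for other in b if other != item)
    return {other: n / len(baskets)
            for other, n in counts.items()
            if n / len(baskets) >= min_share}

milk_buyers = customers_who_bought("milk")
milk_companions = frequently_bought_with("milk")
print(milk_buyers)
print(milk_companions)
```

The first result is a list of rows that already exist in the data; the second is a summary that appears nowhere in the data and depends on a threshold the analyst has to choose.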
Data Mining Models and Tasks

Basic Data Mining Tasks I
• Classification maps data into predefined groups or classes.
– Supervised learning
– Pattern recognition
– Prediction
• Regression is used to map a data item to a real-valued prediction variable.
– Example: H = 1.31 (Fem + Fib) + 63.05
• Clustering groups similar data together into clusters.
– Unsupervised learning
– Segmentation
– Partitioning
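A minimal sketch of the three tasks side by side, in Python. The income threshold, the sample values, and the reading of H as a height and Fem, Fib as bone lengths are assumptions made for illustration; only the regression formula itself comes from the slide. The clustering part is a basic one-dimensional k-means with k = 2, chosen here as one simple way to group values.

```python
def classify_applicant(income, threshold=30_000):
    """Classification: map an item to one of a set of predefined classes.

    The threshold rule is invented; it stands in for a learned model.
    """
    return "good risk" if income >= threshold else "poor risk"

def predict_h(fem, fib):
    """Regression: map an item to a real-valued prediction.

    Uses the formula from the slide, H = 1.31 (Fem + Fib) + 63.05.
    """
    return 1.31 * (fem + fib) + 63.05

def two_means(values, iterations=10):
    """Clustering: group similar values with no predefined classes."""
    c0, c1 = min(values), max(values)  # initial centroids
    g0, g1 = list(values), []
    for _ in range(iterations):
        # Assign each value to the nearer centroid.
        g0 = [v for v in values if abs(v - c0) <= abs(v - c1)]
        g1 = [v for v in values if abs(v - c0) > abs(v - c1)]
        if not g0 or not g1:
            break
        # Move each centroid to the mean of its group.
        c0, c1 = sum(g0) / len(g0), sum(g1) / len(g1)
    return sorted(g0), sorted(g1)

print(classify_applicant(20_000))
print(predict_h(45, 37))
print(two_means([10, 12, 11, 95, 100, 98]))
```

Classification needs the class labels up front, regression returns a number rather than a label, and clustering discovers its two groups from the values alone.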
Basic Data Mining Tasks II
• Summarization maps data into subsets with associated simple descriptions.
– Characterization
– Generalization
• Link Analysis uncovers relationships among data.
– Affinity Analysis
– Association Rules
– Sequential Analysis determines sequential patterns.
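Two of these tasks can be illustrated with a short Python sketch. The sales rows, session logs, and the `min_count` threshold are hypothetical, and counting adjacent pairs is only one simple notion of a sequential pattern.

```python
from collections import Counter, defaultdict
from statistics import mean

# Hypothetical data: (customer segment, purchase amount).
sales = [("student", 12.0), ("student", 8.0), ("retiree", 30.0),
         ("retiree", 50.0), ("student", 10.0)]

def summarize(rows):
    """Summarization: describe each subset with a count and a mean."""
    groups = defaultdict(list)
    for segment, amount in rows:
        groups[segment].append(amount)
    return {seg: {"count": len(v), "mean": mean(v)}
            for seg, v in groups.items()}

# Hypothetical page-visit sequences, one list per session.
sessions = [["home", "search", "cart"],
            ["home", "search", "product"],
            ["search", "cart"]]

def frequent_pairs(seqs, min_count=2):
    """Sequential analysis: find 'a immediately followed by b' patterns."""
    pairs = Counter()
    for s in seqs:
        pairs.update(set(zip(s, s[1:])))  # count each pair once per session
    return {p: n for p, n in pairs.items() if n >= min_count}

summary = summarize(sales)
patterns = frequent_pairs(sessions)
print(summary)
print(patterns)
```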

KDD Process

Modified from [FPSS96C]

• Selection: Obtain data from various sources.
• Preprocessing: Cleanse data.
• Transformation: Convert to common format. Transform to new format.
• Data Mining: Obtain desired results.
• Interpretation/Evaluation: Present results to user in meaningful manner.
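The five steps can be sketched as a chain of small Python functions. Everything concrete here is an assumption for illustration: the sensor readings, the unit suffixes, and especially the `mine` step, which is a trivial placeholder (finding the largest value) rather than a real mining algorithm.

```python
# Hypothetical raw sources: readings as strings, some missing, mixed units.
source_a = [("s1", "20.5C"), ("s2", None), ("s3", "22.0C")]
source_b = [("s4", "70.7F"), ("s5", "68.0F")]

def select(*sources):
    """Selection: obtain data from various sources."""
    return [row for src in sources for row in src]

def preprocess(rows):
    """Preprocessing: cleanse data (here, drop missing readings)."""
    return [(key, text) for key, text in rows if text is not None]

def transform(rows):
    """Transformation: convert to a common format (degrees Celsius, float)."""
    out = []
    for key, text in rows:
        value, unit = float(text[:-1]), text[-1]
        out.append((key, value if unit == "C" else (value - 32) * 5 / 9))
    return out

def mine(rows):
    """Data mining: placeholder step that picks the warmest sensor."""
    return max(rows, key=lambda row: row[1])

def interpret(result):
    """Interpretation/evaluation: present the result in readable form."""
    key, celsius = result
    return f"warmest sensor: {key} ({celsius:.1f} C)"

report = interpret(mine(transform(preprocess(select(source_a, source_b)))))
print(report)
```

Keeping each step as its own function mirrors the slide: the mining step can be swapped out without touching selection, cleansing, or presentation.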
KDD Process Ex: Shuttle Data
• Selection:
– Select data (which missions, etc.) to use
• Preprocessing:
– Remove spikes
• Transformation:
– DFT, DWT, PAA, etc.
• Data Mining:
– Look for rules…
• Interpretation/Evaluation:
– Show rules to domain experts
• Potential User Applications:
– Prediction of failures

[Figure: two time series plots, each with a y-axis from 0 to 1 and an x-axis from 0 to 1000]
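The preprocessing and transformation steps for a time series might look like the following Python sketch. The slide does not say how spikes are removed, so the sliding median filter is an assumed choice; PAA (piecewise aggregate approximation) is named on the slide and is implemented here in its simplest form, for a series whose length divides evenly into segments. The sample series is invented.

```python
from statistics import median

def remove_spikes(series, window=3):
    """Preprocessing: replace each point with the median of its neighbors.

    A sliding median suppresses isolated spikes; the window shrinks at
    the two ends of the series.
    """
    half = window // 2
    return [median(series[max(0, i - half): i + half + 1])
            for i in range(len(series))]

def paa(series, segments):
    """Transformation: piecewise aggregate approximation.

    Splits the series into equal-length segments and keeps each segment's
    mean. Assumes len(series) is divisible by `segments`.
    """
    size = len(series) // segments
    return [sum(series[i * size:(i + 1) * size]) / size
            for i in range(segments)]

raw = [1.0, 1.0, 9.0, 1.0, 1.0, 3.0, 3.0, 3.0]  # 9.0 is a spike
clean = remove_spikes(raw)
reduced = paa(clean, 4)
print(clean)
print(reduced)
```

The spike disappears after filtering, and PAA then cuts the eight points down to four, which is the kind of reduced representation a rule-finding step would work on.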
Data Mining Development
• Similarity Measures
• Hierarchical Clustering
• IR Systems
• Imprecise Queries
• Textual Data
• Web Search Engines

• Relational Data Model
• SQL
• Association Rule Algorithms
• Data Warehousing
• Scalability Techniques

• Bayes Theorem
• Regression Analysis
• EM Algorithm
• K-Means Clustering
• Time Series Analysis

• Algorithm Design Techniques
• Algorithm Analysis
• Data Structures

• Neural Networks
• Decision Tree Algorithms

KDD Issues
• Human Interaction
• Overfitting
• Outliers
• Interpretation
• Visualization
• Large Datasets
• High Dimensionality

KDD Issues (cont’d)
• Multimedia Data
• Missing Data
• Irrelevant Data
• Noisy Data
• Changing Data (streams)
• Integration
• Application

Social Implications of DM
• Privacy
• Profiling
• Unauthorized use

Data Mining Metrics
• Usefulness
• Return on Investment (ROI)
• Accuracy
• Space/Time Complexity
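Two of these metrics have simple standard formulas, sketched below in Python. The labels and the monetary figures are made up for illustration; accuracy is taken as the fraction of correct predictions and ROI as net gain divided by cost.

```python
def accuracy(predicted, actual):
    """Fraction of predictions that match the true labels."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)

def roi(gain, cost):
    """Return on investment: net gain relative to what was spent."""
    return (gain - cost) / cost

# Hypothetical credit-risk predictions versus the true outcomes.
predicted = ["poor", "good", "good", "poor"]
actual = ["poor", "good", "poor", "poor"]

model_accuracy = accuracy(predicted, actual)
project_roi = roi(gain=15_000, cost=10_000)
print(model_accuracy)
print(project_roi)
```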
