DMlecture1

DATA MINING

Uploaded by

ppghoshin

Data Mining

Course Overview

1
Data Mining Overview

Understanding Data
Classification: Decision Trees, Bayesian classifiers, ANN, SVM
Association Rule Mining: Apriori, FP-growth
Clustering: Hierarchical and Partitioning approaches
Dimensionality Reduction
Advanced topics: Social network graph mining, outlier detection

2
What is Data Mining?

Data Mining is:

(1) The efficient discovery of previously unknown, valid, potentially useful, understandable patterns in large datasets

(2) The analysis of (often large) observational data sets to find unsuspected relationships and to summarize the data in novel ways that are both understandable and useful to the data owner

3
Overview of terms

Data: a set of facts (items) D, usually stored in a database
Pattern: an expression E in a language L that describes a subset of facts
Attribute: a field in an item i in D
Interestingness: a function I_{D,L} that maps an expression E in L into a measure space M

4
Overview of terms

The Data Mining Task:

For a given dataset D, language of facts L, interestingness function I_{D,L}, and threshold c, efficiently find the expressions E such that I_{D,L}(E) > c.
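As a rough illustration of this task definition, the sketch below takes the pattern language L to be the set of non-empty itemsets and the interestingness function to be support (the fraction of facts in D that contain the itemset). Both choices, the toy dataset, and the names `interestingness` and `mine` are assumptions made for illustration. The brute-force enumeration is also not the efficient search that the definition asks for.

```python
from itertools import combinations

# Hypothetical dataset D: each fact is a set of attribute values.
D = [
    {"bread", "milk"},
    {"bread", "beer"},
    {"bread", "milk", "beer"},
    {"milk"},
]

def interestingness(expression, dataset):
    """Assumed I_{D,L}: support, the fraction of facts containing the itemset."""
    return sum(expression <= fact for fact in dataset) / len(dataset)

def mine(dataset, c):
    """Return every non-empty itemset E whose interestingness exceeds c."""
    vocabulary = sorted(set().union(*dataset))
    results = {}
    for size in range(1, len(vocabulary) + 1):
        for items in combinations(vocabulary, size):
            score = interestingness(set(items), dataset)
            if score > c:
                results[items] = score
    return results

patterns = mine(D, c=0.4)
for expression, score in patterns.items():
    print(expression, score)
```

With the threshold c = 0.4, the pair (beer, milk) is dropped because it occurs in only one of the four facts.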

5
Knowledge Discovery

6
Examples of Data Mining Applications

1. Fraud detection: credit cards, phone calls

2. Marketing: customer targeting
3. Data Warehousing: Walmart
4. Astronomy
5. Molecular biology

7
How Data Mining is used

1. Identify the problem

2. Use data mining techniques to transform the
data into information
3. Act on the information
4. Measure the results

8
The Data Mining Process

1. Understand the domain

2. Create a dataset:
Select the interesting attributes
Data cleaning and preprocessing
3. Choose the data mining task and the specific
algorithm
4. Interpret the results, and possibly return to step 2

9
Origins of Data Mining

Draws ideas from machine learning/AI, pattern recognition, statistics, and database systems
Must address:
Enormity of data
High dimensionality of data
Heterogeneous, distributed nature of data
[Figure: Data Mining at the overlap of Statistics, AI / Machine Learning, and Database systems]
10
Data Mining Tasks

Prediction Methods
Use some variables to predict unknown or future
values of other variables.

Description Methods
Find human-interpretable patterns that describe the
data.

11
Data Mining Tasks...

Classification [Predictive]
Clustering [Descriptive]
Association Rule Discovery [Descriptive]
Sequential Pattern Discovery [Descriptive]
Regression [Predictive]
Deviation Detection [Predictive]

12
Data Mining Tasks

1. Classification: learning a function that maps an item into one of a set of predefined classes
2. Regression: learning a function that maps an item to a real value
3. Clustering: identifying a set of groups of similar items

13
Data Mining Tasks

4. Dependencies and associations: identifying significant dependencies between data attributes
5. Summarization: finding a compact description of the dataset or a subset of the dataset

14
Data Mining Methods

1. Decision Tree Classifiers:
Used for modeling and classification
2. Association Rules:
Used to find associations between sets of attributes
3. Sequential Patterns:
Used to find temporal associations in time series
4. Hierarchical Clustering:
Used to group customers, web users, etc.

15
Why Data Preprocessing?

Data in the real world is dirty

incomplete: lacking attribute values, lacking certain attributes of
interest, or containing only aggregate data
noisy: containing errors or outliers
inconsistent: containing discrepancies in codes or names
No quality data, no quality mining results!
Quality decisions must be based on quality data
Data warehouse needs consistent integration of quality data
Required for both OLAP and Data Mining!

16
Why can Data be Incomplete?

Attributes of interest are not available (e.g., customer information for sales transaction data)
Data were not considered important at the time of the transactions, so they were not recorded
Data were not recorded because of misunderstandings or malfunctions
Data may have been recorded and later deleted
Some values are missing or unknown

17
Data Cleaning
Data cleaning tasks
Fill in missing values
Identify outliers and smooth out noisy data
Correct inconsistent data
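A minimal sketch of two of these cleaning tasks, assuming a handful of made-up customer records: missing incomes are filled with the mean of the observed incomes, and inconsistent marital-status codes are mapped to one canonical name. The records, the `CANONICAL_STATUS` mapping, and the choice of mean imputation are all illustrative assumptions; other fill strategies are equally plausible.

```python
# Hypothetical customer records; None marks a missing value.
records = [
    {"id": 1, "income": 50.0, "status": "Married"},
    {"id": 2, "income": None, "status": "married"},
    {"id": 3, "income": 70.0, "status": "M"},
    {"id": 4, "income": 60.0, "status": "Single"},
]

# Assumed mapping from inconsistent codes to one canonical name.
CANONICAL_STATUS = {
    "married": "Married",
    "m": "Married",
    "single": "Single",
    "s": "Single",
}

def clean(rows):
    """Fill missing incomes with the observed mean and unify status codes."""
    observed = [r["income"] for r in rows if r["income"] is not None]
    mean_income = sum(observed) / len(observed)
    cleaned = []
    for r in rows:
        income = mean_income if r["income"] is None else r["income"]
        status = CANONICAL_STATUS.get(r["status"].strip().lower(), r["status"])
        cleaned.append({"id": r["id"], "income": income, "status": status})
    return cleaned

cleaned_records = clean(records)
for row in cleaned_records:
    print(row)
```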

18
Classification: Definition

Given a collection of records (training set)
Each record contains a set of attributes; one of the attributes is the class.
Find a model for the class attribute as a function of the values of the other attributes.
Goal: previously unseen records should be assigned a class as accurately as possible.
A test set is used to determine the accuracy of the model. Usually, the given data set is divided into training and test sets, with the training set used to build the model and the test set used to validate it.
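The split-then-validate procedure can be sketched as follows. The ten labeled records are the ones from the Classification Example in this lecture. The 70/30 split, the fixed seed, and the majority-class "model" are assumptions chosen to keep the example short; the baseline stands in for whatever classifier would actually be learned.

```python
import random
from collections import Counter

# (home_owner, marital_status, taxable_income_k, default)
records = [
    ("Yes", "Single", 125, "No"),
    ("No", "Married", 100, "No"),
    ("No", "Single", 70, "No"),
    ("Yes", "Married", 120, "No"),
    ("No", "Divorced", 95, "Yes"),
    ("No", "Married", 60, "No"),
    ("Yes", "Divorced", 220, "No"),
    ("No", "Single", 85, "Yes"),
    ("No", "Married", 75, "No"),
    ("No", "Single", 90, "Yes"),
]

def split(rows, test_fraction, seed):
    """Shuffle a copy of the rows and split them into training and test sets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def learn_majority_class(training_set):
    """Baseline model: always predict the most common training class."""
    majority = Counter(row[-1] for row in training_set).most_common(1)[0][0]
    return lambda attributes: majority

def accuracy(model, test_set):
    """Fraction of test records whose predicted class equals the true class."""
    correct = sum(model(row[:-1]) == row[-1] for row in test_set)
    return correct / len(test_set)

training_set, test_set = split(records, test_fraction=0.3, seed=0)
model = learn_majority_class(training_set)
print(len(training_set), len(test_set), accuracy(model, test_set))
```

The test records are held out of `learn_majority_class`, so the reported accuracy reflects records the model has not seen.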

19
Classification Example

Training Set:

Tid  Home Owner  Marital Status  Taxable Income  Default
1    Yes         Single          125K            No
2    No          Married         100K            No
3    No          Single          70K             No
4    Yes         Married         120K            No
5    No          Divorced        95K             Yes
6    No          Married         60K             No
7    Yes         Divorced        220K            No
8    No          Single          85K             Yes
9    No          Married         75K             No
10   No          Single          90K             Yes

Test Set:

Home Owner  Marital Status  Taxable Income  Default
No          Single          75K             ?
Yes         Married         50K             ?
No          Married         150K            ?
Yes         Divorced        90K             ?
No          Single          40K             ?
No          Married         80K             ?

[Figure: Training Set -> Learn Classifier -> Model; the Model is then applied to the Test Set]

20
Example of a Decision Tree

Training Data:

Tid  Home Owner  Marital Status  Taxable Income  Default
1    Yes         Single          125K            No
2    No          Married         100K            No
3    No          Single          70K             No
4    Yes         Married         120K            No
5    No          Divorced        95K             Yes
6    No          Married         60K             No
7    Yes         Divorced        220K            No
8    No          Single          85K             Yes
9    No          Married         75K             No
10   No          Single          90K             Yes

Model: Decision Tree (splitting attributes: HO, MarSt, TaxInc)

HO?
  Yes -> NO
  No  -> MarSt?
           Married          -> NO
           Single, Divorced -> TaxInc?
                                 < 80K -> NO
                                 > 80K -> YES
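The tree on this slide can be written as a small function and checked against the ten training records. This is a sketch: the function name `predict_default` is made up, and sending an income of exactly 80K down the "> 80K" branch is an assumption, since the slide leaves that boundary case open.

```python
# Training records from the slide:
# (home_owner, marital_status, taxable_income_k, default)
TRAINING = [
    ("Yes", "Single", 125, "No"),
    ("No", "Married", 100, "No"),
    ("No", "Single", 70, "No"),
    ("Yes", "Married", 120, "No"),
    ("No", "Divorced", 95, "Yes"),
    ("No", "Married", 60, "No"),
    ("Yes", "Divorced", 220, "No"),
    ("No", "Single", 85, "Yes"),
    ("No", "Married", 75, "No"),
    ("No", "Single", 90, "Yes"),
]

def predict_default(home_owner, marital_status, taxable_income_k):
    """Walk the tree: HO first, then MarSt, then TaxInc."""
    if home_owner == "Yes":
        return "No"
    if marital_status == "Married":
        return "No"
    # Single or Divorced: split on taxable income.
    # Assumption: exactly 80K is treated as the "> 80K" branch.
    return "Yes" if taxable_income_k >= 80 else "No"

matches = sum(
    predict_default(ho, ms, inc) == label for ho, ms, inc, label in TRAINING
)
print(f"{matches}/{len(TRAINING)} training records classified correctly")
# prints "10/10 training records classified correctly"
```

Fitting all ten training records says nothing yet about unseen records; that is what the test set is for.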

21
Another Example of Decision Tree

Training data: the same ten records (Tid 1-10) as in the Classification Example.

MarSt?
  Married          -> NO
  Single, Divorced -> HO?
                        Yes -> NO
                        No  -> TaxInc?
                                 < 80K -> NO
                                 > 80K -> YES

There could be more than one tree that fits the same data!
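To see that two differently shaped trees can fit the same data, the sketch below encodes the tree from this slide (MarSt, then HO, then TaxInc) alongside the ordering from the previous slide (HO, then MarSt, then TaxInc) and compares their predictions over a grid of attribute values. The function names and the grid are illustrative, and treating exactly 80K as "> 80K" is an assumption.

```python
from itertools import product

def tree_ho_first(home_owner, marital_status, income_k):
    """HO -> MarSt -> TaxInc ordering."""
    if home_owner == "Yes":
        return "No"
    if marital_status == "Married":
        return "No"
    return "Yes" if income_k >= 80 else "No"  # assumption: 80K counts as "> 80K"

def tree_marst_first(home_owner, marital_status, income_k):
    """MarSt -> HO -> TaxInc ordering (this slide)."""
    if marital_status == "Married":
        return "No"
    if home_owner == "Yes":
        return "No"
    return "Yes" if income_k >= 80 else "No"  # same assumption as above

# Every combination of home ownership, marital status, and income 0K..300K.
grid = product(["Yes", "No"], ["Single", "Married", "Divorced"], range(0, 301, 5))
disagreements = [
    case for case in grid if tree_ho_first(*case) != tree_marst_first(*case)
]
print(len(disagreements))  # prints 0
```

These two trees happen to agree everywhere. In general, two trees that fit the same training data can still disagree on unseen records.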

22
Classification: Application 1

Direct Marketing
Goal: Reduce cost of mailing by targeting a set of consumers
likely to buy a new cell-phone product.
Approach:
Use the data for a similar product introduced before.
We know which customers decided to buy and which decided
otherwise. This {buy, don’t buy} decision forms the class attribute.
Collect various demographic, lifestyle, and company-interaction
related information about all such customers.
Type of business, where they stay, how much they earn, etc.
Use this information as input attributes to learn a classifier model.

From [Berry & Linoff] Data Mining Techniques, 1997

23
Classification: Application 2

Fraud Detection
Goal: Predict fraudulent cases in credit card transactions.
Approach:
Use credit card transactions and information about the account holder as attributes.
When does a customer buy, what do they buy, how often do they pay on time, etc.
Label past transactions as fraud or fair transactions. This forms the
class attribute.
Learn a model for the class of the transactions.
Use this model to detect fraud by observing credit card
transactions on an account.

24
Clustering Definition

Given a set of data points, each having a set of attributes, and a similarity measure among them, find clusters such that
Data points in one cluster are more similar to one another.
Data points in separate clusters are less similar to one another.
Similarity Measures:
Euclidean Distance if attributes are continuous.
Other Problem-specific Measures.
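A small sketch of the Euclidean measure and of the two properties in the definition, assuming six made-up 3-D points that have already been assigned to two clusters. It compares the average distance within clusters to the average distance between them; it does not find the clusters.

```python
import math
from itertools import combinations

# Hypothetical 3-D data points already assigned to two clusters.
clusters = {
    "A": [(1.0, 1.0, 1.0), (1.0, 2.0, 1.0), (2.0, 1.0, 1.0)],
    "B": [(8.0, 8.0, 8.0), (9.0, 8.0, 8.0), (8.0, 9.0, 8.0)],
}

def euclidean(p, q):
    """Euclidean distance between two points with continuous attributes."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def mean(values):
    values = list(values)
    return sum(values) / len(values)

# Average distance between points that share a cluster.
intra = mean(
    euclidean(p, q)
    for points in clusters.values()
    for p, q in combinations(points, 2)
)
# Average distance between points in different clusters.
inter = mean(euclidean(p, q) for p in clusters["A"] for q in clusters["B"])

print(f"intra-cluster: {intra:.2f}, inter-cluster: {inter:.2f}")
```

A good clustering under this measure shows a small intra-cluster average relative to the inter-cluster average.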

25
Illustrating Clustering
Euclidean distance-based clustering in 3-D space.
Intracluster distances are minimized; intercluster distances are maximized.

26
Clustering: Application 1

Market Segmentation:
Goal: subdivide a market into distinct subsets of customers
where any subset may conceivably be selected as a market
target to be reached with a distinct marketing mix.
Approach:
Collect different attributes of customers based on their
geographical and lifestyle related information.
Find clusters of similar customers.
Measure the clustering quality by observing buying patterns of
customers in same cluster vs. those from different clusters.

27
Clustering: Application 2

Document Clustering:
Goal: To find groups of documents that are similar to
each other based on the important terms appearing in
them.
Approach: To identify frequently occurring terms in
each document. Form a similarity measure based on
the frequencies of different terms. Use it to cluster.
Gain: Information Retrieval can utilize the clusters to
relate a new document or search term to clustered
documents.
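One possible form of the similarity measure is sketched below: each document becomes a vector of term frequencies, and two documents are compared by cosine similarity. The slide does not name a specific measure, so cosine similarity is an assumption, as are the three toy documents. No stop-word filtering is done, so every term counts as "important" here.

```python
import math
from collections import Counter

# Hypothetical documents.
documents = {
    "d1": "data mining finds patterns in data",
    "d2": "mining patterns from large data sets",
    "d3": "the team won the football match",
}

def term_frequencies(text):
    """Count how often each term occurs in a document."""
    return Counter(text.lower().split())

def cosine_similarity(tf_a, tf_b):
    """Cosine of the angle between two term-frequency vectors."""
    dot = sum(tf_a[t] * tf_b[t] for t in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum(v * v for v in tf_a.values()))
    norm_b = math.sqrt(sum(v * v for v in tf_b.values()))
    return dot / (norm_a * norm_b)

tf = {name: term_frequencies(text) for name, text in documents.items()}
sim_12 = cosine_similarity(tf["d1"], tf["d2"])
sim_13 = cosine_similarity(tf["d1"], tf["d3"])
print(f"d1 vs d2: {sim_12:.2f}, d1 vs d3: {sim_13:.2f}")
```

The two data mining documents share terms and score above zero, while the football document shares none with `d1` and scores zero. A clustering algorithm could then group documents using these pairwise scores.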

28
Association Rule Discovery:
Definition
Given a set of records, each of which contains some number of items from a given collection,
produce dependency rules which will predict the occurrence of an item based on occurrences of other items.

TID  Items
1    Bread, Coke, Milk
2    Beer, Bread
3    Beer, Coke, Diaper, Milk
4    Beer, Bread, Diaper, Milk
5    Coke, Diaper, Milk

Rules Discovered:
{Milk} --> {Coke}
{Diaper, Milk} --> {Beer}
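The slide does not say how a rule is scored. The sketch below assumes the two commonly used measures, support (fraction of transactions containing all items in the rule) and confidence (fraction of transactions containing the antecedent that also contain the consequent), and evaluates the two discovered rules on the five transactions from the table.

```python
transactions = [
    {"Bread", "Coke", "Milk"},
    {"Beer", "Bread"},
    {"Beer", "Coke", "Diaper", "Milk"},
    {"Beer", "Bread", "Diaper", "Milk"},
    {"Coke", "Diaper", "Milk"},
]

def count(itemset):
    """Number of transactions that contain every item in the itemset."""
    return sum(itemset <= t for t in transactions)

def support(antecedent, consequent):
    """Fraction of all transactions containing antecedent and consequent."""
    return count(antecedent | consequent) / len(transactions)

def confidence(antecedent, consequent):
    """Of the transactions containing the antecedent, the fraction that
    also contain the consequent."""
    return count(antecedent | consequent) / count(antecedent)

rules = [({"Milk"}, {"Coke"}), ({"Diaper", "Milk"}, {"Beer"})]
for antecedent, consequent in rules:
    print(
        sorted(antecedent), "-->", sorted(consequent),
        f"support={support(antecedent, consequent):.2f}",
        f"confidence={confidence(antecedent, consequent):.2f}",
    )
# ['Milk'] --> ['Coke'] support=0.60 confidence=0.75
# ['Diaper', 'Milk'] --> ['Beer'] support=0.40 confidence=0.67
```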

29
Association Rule Discovery:
Application 1

Marketing and Sales Promotion:

Let the rule discovered be
{softdrinks, … } --> {Potato Chips}
Potato Chips as consequent => Can be used to determine what
should be done to boost its sales.
Softdrinks in the antecedent => Can be used to see which
products would be affected if the store discontinues selling
softdrinks.
Softdrinks in antecedent and Potato chips in consequent => Can
be used to see what products should be sold with softdrinks to
promote sale of Potato chips!

30
Association Rule Discovery: Application 2

Supermarket shelf management.

Goal: To identify items that are bought together by
sufficiently many customers.
Approach: Process the point-of-sale data collected
with barcode scanners to find dependencies among
items.
A classic rule:
If a customer buys diapers and milk, then he is very likely to buy beer.
So, don’t be surprised if you find six-packs stacked next to diapers!
31
Association Rule Discovery: Application 3

Inventory Management:
Goal: A consumer appliance repair company wants to anticipate the nature of repairs on its consumer products and keep its service vehicles equipped with the right parts, to reduce the number of visits to consumer households.
Approach: Process the data on tools and parts
required in previous repairs at different consumer
locations and discover the co-occurrence patterns.

32
Regression

Predict the value of a given continuous-valued variable based on the values of other variables, assuming a linear or nonlinear model of dependency.
Extensively studied in statistics and the neural network field.
Examples:
Predicting sales amounts of a new product based on advertising expenditure.
Predicting wind velocities as a function of temperature, humidity, air pressure, etc.
Time series prediction of stock market indices.
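For the linear case with a single predictor, the model can be fitted by ordinary least squares. The sketch below uses made-up advertising and sales figures that lie exactly on a line, so the fitted parameters are easy to check by hand; `fit_line` is an illustrative name, not a library function.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: advertising expenditure and resulting sales.
advertising = [1.0, 2.0, 3.0, 4.0]
sales = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(advertising, sales)
predicted = slope * 5.0 + intercept  # predicted sales for expenditure 5.0
print(slope, intercept, predicted)   # prints 2.0 1.0 11.0
```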

33
Deviation/Anomaly Detection
Detect significant deviations from normal behavior
Applications:
Credit Card Fraud Detection
Network Intrusion Detection
Typical network traffic at the university level may reach over 100 million connections per day.
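One simple way to flag deviations from normal behavior is a z-score test: a value is reported if it lies more than a chosen number of standard deviations from the mean. This is only one of many possible detectors, and the threshold of 2.0 and the sample amounts are assumptions for illustration.

```python
import statistics

def find_deviations(values, threshold=2.0):
    """Return the values whose z-score magnitude exceeds the threshold."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)  # population standard deviation
    if stdev == 0:
        return []  # all values identical, nothing deviates
    return [v for v in values if abs(v - mean) / stdev > threshold]

# Hypothetical daily transaction amounts for one account.
daily_amounts = [10, 11, 9, 10, 12, 10, 11, 95]
print(find_deviations(daily_amounts))  # prints [95]
```

A large outlier inflates the mean and standard deviation it is judged against, so more robust statistics such as the median are often preferred in practice.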

34
Challenges of Data Mining

Scalability
Dimensionality
Complex and Heterogeneous Data
Data Quality
Data Ownership and Distribution
Privacy Preservation
Streaming Data

35
Data Compression

[Figure: Original Data <-> Compressed Data (lossless); Compressed Data -> Original Data Approximated (lossy)]
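The difference between recovering the original data exactly and recovering only an approximation can be sketched with Python's standard `zlib` module for the lossless case. The rounding step is a stand-in for a lossy scheme, not a real codec, and both data samples are made up.

```python
import zlib

# Hypothetical, highly repetitive data compresses well.
original = b"AAAABBBBCCCCDDDD" * 64

# Lossless: decompression returns exactly the original bytes.
compressed = zlib.compress(original)
restored = zlib.decompress(compressed)
print(len(original), len(compressed), restored == original)

# Lossy stand-in: keep sensor readings to one decimal place only.
# The stored values are shorter, but the originals cannot be recovered.
readings = [20.13, 20.17, 20.42, 21.08]
approximated = [round(r, 1) for r in readings]
print(approximated)
```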

36
Numerosity Reduction:
Reduce the volume of data

Parametric methods
Assume the data fits some model, estimate model parameters,
store only the parameters, and discard the data (except
possible outliers)

Non-parametric methods
Do not assume models
Major families: histograms, clustering, sampling
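Two of the non-parametric families can be sketched in a few lines: an equal-width histogram that replaces the values with bin counts, and a simple random sample without replacement. The price list, the bin count, and the sample size are assumptions for illustration.

```python
import random

def equal_width_histogram(values, n_bins):
    """Return (bin_start, bin_end, count) triples spanning min..max.

    Assumes the values are not all equal, so the bin width is non-zero.
    """
    low, high = min(values), max(values)
    width = (high - low) / n_bins
    counts = [0] * n_bins
    for v in values:
        # The maximum value falls into the last bin.
        index = min(int((v - low) / width), n_bins - 1)
        counts[index] += 1
    return [
        (low + i * width, low + (i + 1) * width, counts[i])
        for i in range(n_bins)
    ]

# Hypothetical prices.
prices = [1, 1, 5, 5, 5, 8, 10, 10, 12, 14, 15, 18, 20, 21, 21, 25, 28, 30]

histogram = equal_width_histogram(prices, n_bins=3)
for start, end, n in histogram:
    print(f"[{start:.1f}, {end:.1f}): {n}")

# Simple random sample without replacement; the seed is fixed for repeatability.
sample = random.Random(0).sample(prices, k=5)
print(sample)
```

The histogram stores three counts instead of eighteen values, and the sample stores five. Both trade detail for volume.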

37
Clustering

Partitions the data set into clusters, and models it by one representative from each cluster
Can be very effective if data is clustered, but not if data is “smeared”
There are many choices of clustering definitions and clustering algorithms; more later!
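As one concrete choice among those many algorithms, the sketch below runs plain k-means (Lloyd's iterations) on 1-D data and keeps only the cluster centroids as representatives. k-means, the starting centroids, and the data values are assumptions for illustration; the slide does not prescribe a particular algorithm.

```python
def k_means_1d(values, centroids, iterations=10):
    """Run Lloyd's iterations on 1-D data; return centroids and clusters."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for v in values:
            # Assign each value to its nearest centroid.
            nearest = min(
                range(len(centroids)), key=lambda i: abs(v - centroids[i])
            )
            clusters[nearest].append(v)
        # Recompute centroids; keep the old one if a cluster is empty.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

values = [1, 2, 3, 10, 11, 12]  # hypothetical, clearly clustered data
representatives, clusters = k_means_1d(values, centroids=[1.0, 12.0])
print(representatives)  # prints [2.0, 11.0]
```

Six values are reduced to two representatives. With "smeared" data the centroids would describe the values far less faithfully.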

38
Recommended Reference Books

J. Han, M. Kamber, and J. Pei, Data Mining: Concepts and Techniques, 3rd ed. Morgan Kaufmann, 2011.
P.-N. Tan, M. Steinbach, and V. Kumar, Introduction to Data Mining. Addison-Wesley, 2005 (2nd ed., Pearson, 2018).
M. J. Zaki and W. Meira Jr., Data Mining and Analysis: Fundamental Concepts and Algorithms. Cambridge University Press, 2014.
