Classification and Prediction

The document discusses classification and prediction, describing classification as predicting categorical class labels by constructing a model based on training data, while regression models continuous functions. It covers issues in classification like data preparation and model evaluation, and describes decision tree induction as a method for classification that generates trees to partition data based on attribute tests at internal nodes.

Uploaded by

Bhagirath Prajapati

Classification and Prediction
• What is classification? What is regression?
• Issues regarding classification and prediction
• Classification by decision tree induction
• Scalable decision tree induction
Classification vs. Prediction
• Classification:
  • predicts categorical class labels
  • constructs a model from the training set and the values (class labels) of the classifying attribute, then uses that model to classify new data
• Regression:
  • models continuous-valued functions, i.e., predicts unknown or missing values
• Typical Applications
  • credit approval
  • target marketing
  • medical diagnosis
  • treatment effectiveness analysis
Why Classification? A motivating application
• Credit approval
  • A bank wants to classify its customers based on whether they are expected to pay back their approved loans
  • The history of past customers is used to train the classifier
  • The classifier provides rules, which identify potentially reliable future customers
• Classification rule:
  • If age = “31...40” and income = high then credit_rating = excellent
• Future customers
  • Paul: age = 35, income = high → excellent credit rating
  • John: age = 20, income = medium → fair credit rating
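The rule above can be applied mechanically to the two future customers. The following is a minimal sketch, not the bank's actual classifier: the function name `credit_rating` is hypothetical, and the "fair" fallback is an assumption, since the slide states only the rule for "excellent".

```python
def credit_rating(age, income):
    """Apply the single classification rule stated on the slide.

    Only the 'excellent' rule appears in the source; returning 'fair'
    otherwise is an assumption so that every customer gets a label.
    """
    if 31 <= age <= 40 and income == "high":
        return "excellent"
    return "fair"  # assumed default, not a rule given in the source


# The two future customers from the slide: name -> (age, income)
customers = {"Paul": (35, "high"), "John": (20, "medium")}
ratings = {name: credit_rating(age, income)
           for name, (age, income) in customers.items()}
print(ratings)
```

A real classifier would carry one rule per class (or a default class learned from the training data) rather than a hard-coded fallback.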
Classification—A Two-Step Process
• Model construction: describing a set of predetermined classes
  • Each tuple/sample is assumed to belong to a predefined class, as determined by the class label attribute
  • The set of tuples used for model construction is the training set
  • The model is represented as classification rules, decision trees, or mathematical formulae
• Model usage: for classifying future or unknown objects
  • Estimate accuracy of the model
    • The known label of each test sample is compared with the classified result from the model
    • Accuracy rate is the percentage of test set samples that are correctly classified by the model
    • The test set must be independent of the training set; otherwise the accuracy estimate is over-optimistic and over-fitting goes undetected
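The accuracy estimate described above can be sketched as a simple holdout split. This is an illustrative sketch under stated assumptions: the helper names `holdout_split` and `accuracy_rate`, the 30% test fraction, the toy labelled data, and the stand-in `model` are all hypothetical and do not come from the slides.

```python
import random


def holdout_split(samples, test_fraction=0.3, seed=0):
    """Shuffle the labelled samples and split them into disjoint
    training and test sets (test_fraction is an assumed default)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def accuracy_rate(model, test_set):
    """Percentage of test samples whose known label matches the
    label predicted by the model."""
    correct = sum(1 for features, label in test_set
                  if model(features) == label)
    return 100.0 * correct / len(test_set)


# Toy labelled data (hypothetical): (feature, class label)
samples = [(x, "high" if x >= 5 else "low") for x in range(10)]
train_set, test_set = holdout_split(samples)

# Stand-in for a model built from train_set; here it is a fixed rule.
model = lambda x: "high" if x >= 5 else "low"
print(accuracy_rate(model, test_set))
```

Because no tuple appears in both sets, the reported percentage reflects how the model behaves on data it was not built from.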

Classification Process (1): Model Construction

Training Data → Classification Algorithms → Classifier (Model)

Training Data:
NAME  RANK            YEARS  TENURED
Mike  Assistant Prof  3      no
Mary  Assistant Prof  7      yes
Bill  Professor       2      yes
Jim   Associate Prof  7      yes
Dave  Assistant Prof  6      no
Anne  Associate Prof  3      no

Classifier (Model):
IF rank = ‘professor’ OR years > 6 THEN tenured = ‘yes’
Classification Process (2): Use the Model in Prediction

Testing Data → Classifier → Accuracy = ?
Unseen Data → Classifier → Tenured?

Testing Data:
NAME     RANK            YEARS  TENURED
Tom      Assistant Prof  2      no
Mellisa  Associate Prof  7      no
George   Professor       5      yes
Joseph   Assistant Prof  7      yes

Unseen Data: (Jeff, Professor, 4)
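The "Accuracy = ?" and "Tenured?" questions can be answered by running the tenure rule from the model-construction slide over the testing data. The sketch below assumes that rule (rank is professor, or years > 6) is the classifier; the function name `predict_tenured` is hypothetical.

```python
def predict_tenured(rank, years):
    """Tenure rule from the model-construction slide:
    IF rank = 'professor' OR years > 6 THEN tenured = 'yes'."""
    return "yes" if rank == "Professor" or years > 6 else "no"


# Testing data from the slide: (name, rank, years, known label)
testing_data = [
    ("Tom", "Assistant Prof", 2, "no"),
    ("Mellisa", "Associate Prof", 7, "no"),
    ("George", "Professor", 5, "yes"),
    ("Joseph", "Assistant Prof", 7, "yes"),
]

# Compare each known label with the model's classification.
correct = sum(predict_tenured(rank, years) == label
              for _, rank, years, label in testing_data)
accuracy = 100.0 * correct / len(testing_data)
print(f"accuracy = {accuracy:.0f}%")  # 3 of 4 correct: 75%

# Unseen tuple (Jeff, Professor, 4)
jeff = predict_tenured("Professor", 4)
print("Jeff tenured?", jeff)  # yes
```

The one miss is Mellisa: the rule predicts "yes" because years > 6, while her known label is "no".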
Supervised vs. Unsupervised Learning
• Supervised learning (classification)
  • Supervision: The training data (observations, measurements, etc.) are accompanied by labels indicating the class of the observations
  • New data is classified based on the training set
• Unsupervised learning (clustering)
  • The class labels of the training data are unknown
  • Given a set of measurements, observations, etc., the aim is to establish the existence of classes or clusters in the data
Issues regarding classification and prediction (1): Data Preparation
• Data cleaning
  • Preprocess data in order to reduce noise and handle missing values
• Relevance analysis (feature selection)
  • Remove the irrelevant or redundant attributes
• Data transformation
  • Generalize and/or normalize data
    • numerical attribute income → categorical {low, medium, high}
    • normalize all numerical attributes to [0,1)
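Both data transformations named above can be sketched in a few lines. This is an illustration only: min-max scaling is one common normalization choice and is an assumption here (it maps the maximum to exactly 1, so the result lies in the closed interval [0, 1]), and the income cut-offs and sample incomes are hypothetical values not given in the slides.

```python
def min_max_normalize(values):
    """Min-max scaling (an assumed choice of normalization).
    Maps the minimum to 0.0 and the maximum to 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: avoid division by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def discretize_income(income, low_cutoff=30000, high_cutoff=70000):
    """Generalize a numerical income to {low, medium, high}.
    The cut-offs are hypothetical, chosen only for illustration."""
    if income < low_cutoff:
        return "low"
    if income < high_cutoff:
        return "medium"
    return "high"


incomes = [20000, 45000, 90000]  # hypothetical values
normalized = min_max_normalize(incomes)
labels = [discretize_income(v) for v in incomes]
print(normalized)
print(labels)
```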
Issues regarding classification and prediction (2): Evaluating Classification Methods
• Predictive accuracy
• Speed
  • time to construct the model
  • time to use the model
• Robustness
  • handling noise and missing values
• Scalability
  • efficiency in disk-resident databases
• Interpretability:
  • understanding and insight provided by the model
• Goodness of rules (quality)
  • decision tree size
  • compactness of classification rules
Classification by Decision Tree Induction
• Decision tree
  • A flow-chart-like tree structure
  • Internal node denotes a test on an attribute
  • Branch represents an outcome of the test
  • Leaf nodes represent class labels or class distribution
• Decision tree generation consists of two phases
  • Tree construction
    • At start, all the training examples are at the root
    • Partition examples recursively based on selected attributes
  • Tree pruning
    • Identify and remove branches that reflect noise or outliers
• Use of decision tree: Classifying an unknown sample
  • Test the attribute values of the sample against the decision tree
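The construction phase partitions examples on "selected attributes" without saying how they are selected. ID3, from which this deck's buys_computer training example is drawn, selects the attribute with the highest information gain. The sketch below computes that measure on the 14-tuple buys_computer training set from this deck; the function and variable names are hypothetical, and it shows only the selection at the root, not a full tree builder.

```python
from collections import Counter
from math import log2

ATTRIBUTES = ["age", "income", "student", "credit_rating"]

# The 14 training tuples; the last field is the class buys_computer.
ROWS = [
    ("<=30", "high", "no", "fair", "no"),
    ("<=30", "high", "no", "excellent", "no"),
    ("31...40", "high", "no", "fair", "yes"),
    (">40", "medium", "no", "fair", "yes"),
    (">40", "low", "yes", "fair", "yes"),
    (">40", "low", "yes", "excellent", "no"),
    ("31...40", "low", "yes", "excellent", "yes"),
    ("<=30", "medium", "no", "fair", "no"),
    ("<=30", "low", "yes", "fair", "yes"),
    (">40", "medium", "yes", "fair", "yes"),
    ("<=30", "medium", "yes", "excellent", "yes"),
    ("31...40", "medium", "no", "excellent", "yes"),
    ("31...40", "high", "yes", "fair", "yes"),
    (">40", "medium", "no", "excellent", "no"),
]


def entropy(labels):
    """Expected information (in bits) needed to classify a tuple."""
    total = len(labels)
    return -sum((n / total) * log2(n / total)
                for n in Counter(labels).values())


def information_gain(rows, attr_index):
    """Entropy of the class labels minus the weighted entropy
    remaining after partitioning the rows on one attribute."""
    labels = [row[-1] for row in rows]
    partitions = {}
    for row in rows:
        partitions.setdefault(row[attr_index], []).append(row[-1])
    remainder = sum(len(part) / len(rows) * entropy(part)
                    for part in partitions.values())
    return entropy(labels) - remainder


gains = {name: information_gain(ROWS, i)
         for i, name in enumerate(ATTRIBUTES)}
best = max(gains, key=gains.get)
print(best, gains)
```

The attribute with the largest gain becomes the test at the root; the same computation is then repeated on each partition.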
Training Dataset
This follows an example from Quinlan’s ID3.

age     income  student  credit_rating  buys_computer
<=30    high    no       fair           no
<=30    high    no       excellent      no
31…40   high    no       fair           yes
>40     medium  no       fair           yes
>40     low     yes      fair           yes
>40     low     yes      excellent      no
31…40   low     yes      excellent      yes
<=30    medium  no       fair           no
<=30    low     yes      fair           yes
>40     medium  yes      fair           yes
<=30    medium  yes      excellent      yes
31…40   medium  no       excellent      yes
31…40   high    yes      fair           yes
>40     medium  no       excellent      no
Output: A Decision Tree for “buys_computer”

age?
  <=30   → student?
             no  → no
             yes → yes
  31…40  → yes
  >40    → credit rating?
             excellent → no
             fair      → yes
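Classifying an unknown sample means following the branches that match its attribute values until a leaf is reached. The sketch below encodes the buys_computer tree as nested dictionaries, with the middle age branch taken as 31...40 to match the age ranges in the training data. The dictionary layout, the `classify` function, and the two samples are hypothetical illustrations.

```python
# Internal nodes are dicts naming the tested attribute and its
# branches; leaves are plain class-label strings.
TREE = {
    "attribute": "age",
    "branches": {
        "<=30": {
            "attribute": "student",
            "branches": {"no": "no", "yes": "yes"},
        },
        "31...40": "yes",
        ">40": {
            "attribute": "credit_rating",
            "branches": {"excellent": "no", "fair": "yes"},
        },
    },
}


def classify(tree, sample):
    """Walk from the root, taking the branch that matches the
    sample's value for each tested attribute, until a leaf."""
    node = tree
    while isinstance(node, dict):
        node = node["branches"][sample[node["attribute"]]]
    return node


# Two hypothetical unknown samples
young_student = {"age": "<=30", "income": "medium",
                 "student": "yes", "credit_rating": "fair"}
senior_excellent = {"age": ">40", "income": "low",
                    "student": "no", "credit_rating": "excellent"}
print(classify(TREE, young_student))     # yes
print(classify(TREE, senior_excellent))  # no
```

Note that income is never tested: the tree reaches a class label using at most two attributes per sample.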
Scalable Decision Tree Induction Methods
• SLIQ (EDBT’96, Mehta et al.)
  • Builds an index for each attribute; only the class list and the current attribute list reside in memory
• SPRINT (VLDB’96, J. Shafer et al.)
  • Constructs an attribute list data structure
• PUBLIC (VLDB’98, Rastogi & Shim)
  • Integrates tree splitting and tree pruning: stop growing the tree earlier
• RainForest (VLDB’98, Gehrke, Ramakrishnan & Ganti)
  • Builds an AVC-list (attribute, value, class label)
• BOAT (PODS’99, Gehrke, Ganti, Ramakrishnan & Loh)
  • Uses bootstrapping to create several small samples
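To make the RainForest bullet more concrete, the sketch below builds a small (attribute value, class label) count table for one attribute. It assumes the AVC structure stores aggregated counts per (value, class label) pair, which goes beyond the slide's one-line description; the function name `avc_counts` and the sample rows are hypothetical, and this is not the RainForest implementation.

```python
from collections import Counter


def avc_counts(rows, attr_index, class_index=-1):
    """Count how often each (attribute value, class label) pair
    occurs. The table's size depends on the number of distinct
    values and classes, not on the number of rows."""
    return Counter((row[attr_index], row[class_index]) for row in rows)


# Hypothetical rows: (age, buys_computer)
rows = [
    ("<=30", "no"),
    ("<=30", "no"),
    ("31...40", "yes"),
    (">40", "yes"),
    (">40", "no"),
]
counts = avc_counts(rows, 0)
print(counts)
```

Counts like these are enough to evaluate a split on the attribute without keeping the individual tuples in memory.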
