5. Classification and Prediction
Classification:
Data mining is an interdisciplinary field. It draws on a range of disciplines
such as analytics, database systems, machine learning, simulation, and
information science. Classifying data mining systems helps users understand
a system and match their requirements to it. Classification is the task of
discovering a model that distinguishes data classes and concepts. The purpose
of such a model is to predict the class of objects whose class label is
unknown. The derived model is based on the analysis of a set of training data.
A classification task begins with a data set in which the class assignments
are known. For example, based on observed data for many loan borrowers over
a period of time, a classification model could be built that forecasts credit
risk. The data might track employment history, homeownership or leasing,
years of residence, number and type of deposits, historical credit rating,
and so on. Credit rating would be the target, the other attributes would be
the predictors, and the data for each customer would constitute a case.
With the help of the bank loan application discussed above, let us understand
the working of classification. The data classification process includes two
steps (a code sketch of both steps follows below):
Building a Classifier or Model
Using the Classifier for Classification
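As a minimal sketch of these two steps, assuming scikit-learn is available,
the example below builds a classifier on a hypothetical loan data set and
then uses it to classify new applicants. The feature names and values are
made up for illustration, not taken from a real bank data set.

    # A minimal sketch of the two-step classification process,
    # assuming scikit-learn. The loan data below is hypothetical.
    from sklearn.tree import DecisionTreeClassifier

    # Step 1: build a classifier from training data whose class
    # labels (credit ratings) are already known.
    # Features: [years_employed, owns_home (1/0), years_at_address]
    X_train = [[10, 1, 8], [1, 0, 1], [7, 1, 5], [2, 0, 2], [15, 1, 12]]
    y_train = ["good", "bad", "good", "bad", "good"]  # known ratings

    model = DecisionTreeClassifier(random_state=0)
    model.fit(X_train, y_train)

    # Step 2: use the classifier to predict the class of new
    # applicants whose credit rating is unknown.
    X_new = [[5, 1, 3], [1, 0, 0]]
    print(model.predict(X_new))  # e.g. ['good' 'bad']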
Prediction:
Prediction is a data mining technique that discovers the relationship between
independent variables and dependent variables. It uses regression analysis to
estimate inaccessible or missing numeric values in the data. When the class
label is absent, the prediction is performed using classification. Prediction
is common because of its relevance in business intelligence.
The following is an example of a case where the data analysis task is
prediction:
Suppose a marketing manager needs to predict how much a particular customer
will spend at his company during a sale. Here we are concerned with
forecasting a numeric value, so this data mining task is an example of
numeric prediction. In this case, a model or predictor will be constructed
that forecasts a continuous-valued or ordered-value function.
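As a minimal sketch of such a numeric predictor, assuming scikit-learn, the
example below fits a linear regression model. The customer features and spend
amounts are hypothetical, purely to illustrate the idea.

    # A minimal sketch of numeric prediction with linear regression,
    # assuming scikit-learn. All values below are hypothetical.
    from sklearn.linear_model import LinearRegression

    # Features: [past_annual_spend, visits_per_month]
    X_train = [[1200, 4], [300, 1], [800, 3], [2000, 6]]
    y_train = [150.0, 20.0, 90.0, 260.0]  # spend during past sales

    model = LinearRegression()
    model.fit(X_train, y_train)

    # Predict a continuous value (expected spend) for a new customer.
    print(model.predict([[1000, 3]]))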
For accuracy:
The precision metric shows the accuracy of the positive class. It measures
how likely a positive-class prediction is to be correct.
Sensitivity computes the ratio of positive classes correctly detected. This
metric shows how good the model is at recognizing the positive class.
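In terms of confusion-matrix counts (TP = true positives, FP = false
positives, FN = false negatives), these standard definitions are
Precision = TP / (TP + FP) and Sensitivity = TP / (TP + FN). A minimal
sketch computing both, with made-up counts:

    # A minimal sketch of precision and sensitivity (recall) from
    # confusion-matrix counts; the counts below are hypothetical.
    def precision(tp, fp):
        # Of all predicted positives, how many are truly positive?
        return tp / (tp + fp)

    def sensitivity(tp, fn):
        # Of all actual positives, how many did the model detect?
        return tp / (tp + fn)

    tp, fp, fn = 40, 10, 5  # hypothetical counts
    print(precision(tp, fp))    # 0.8
    print(sensitivity(tp, fn))  # 0.888...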
The conditional probability of X given Y is
P(X | Y) = P(X ⋂ Y) / P(Y),
where P(X ⋂ Y) is the joint probability of both X and Y being true and P(Y)
is the probability of Y. This rule underlies the Bayesian network described
next.
Bayesian Network:
A Bayesian network is represented by a Directed Acyclic Graph (DAG). Like any
other statistical graph, a DAG consists of a set of nodes and links, where
the links signify the connections between the nodes.
The nodes represent random variables, and the edges define the relationships
between these variables.
A DAG models the uncertainty of an event occurring based on the Conditional
Probability Distribution (CPD) of each random variable. A Conditional
Probability Table (CPT) is used to represent the CPD of each variable in the
network.
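As a minimal sketch, assuming a tiny hypothetical two-node network
Rain -> WetGrass, the CPTs can be stored as plain Python dictionaries and
combined by the chain rule P(Rain, WetGrass) = P(Rain) * P(WetGrass | Rain):

    # A minimal sketch of a two-node Bayesian network Rain -> WetGrass,
    # with CPTs stored as dictionaries. The structure and the
    # probabilities are hypothetical.
    p_rain = {True: 0.2, False: 0.8}  # CPT for Rain (no parents)

    # CPT for WetGrass conditioned on its parent Rain.
    p_wet_given_rain = {
        True:  {True: 0.9, False: 0.1},   # P(WetGrass | Rain=True)
        False: {True: 0.1, False: 0.9},   # P(WetGrass | Rain=False)
    }

    def joint(rain, wet):
        # Chain rule: P(Rain, WetGrass) = P(Rain) * P(WetGrass | Rain)
        return p_rain[rain] * p_wet_given_rain[rain][wet]

    print(joint(True, True))    # 0.18
    print(joint(False, True))   # 0.08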
Example: Suppose a candidate has a job offer and wants to decide whether to
accept it or not. To solve this problem, the decision tree starts with the
root node (the Salary attribute, chosen by an attribute selection measure,
ASM). The root node splits further into the next decision node (distance
from the office) and one leaf node based on the corresponding labels. The
next decision node splits further into one decision node (cab facility) and
one leaf node. Finally, that decision node splits into two leaf nodes
(Accepted offer and Declined offer). The structure of this tree is sketched
in code below.
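A minimal sketch of the decision logic described above, written as nested
conditionals; the attribute names follow the text, but the thresholds are
hypothetical, since the text names only the attributes:

    # A minimal sketch of the job-offer decision tree described above.
    # Attribute names follow the text; the thresholds are hypothetical.
    def decide(salary, distance_km, cab_facility):
        if salary < 50000:            # root node: Salary (chosen by ASM)
            return "Declined offer"   # leaf node
        if distance_km > 30:          # decision node: distance from office
            if cab_facility:          # decision node: cab facility
                return "Accepted offer"   # leaf node
            return "Declined offer"       # leaf node
        return "Accepted offer"       # leaf node

    print(decide(salary=60000, distance_km=40, cab_facility=True))   # Accepted offer
    print(decide(salary=60000, distance_km=40, cab_facility=False))  # Declined offer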
Concept of Entropy
Entropy controls how a Decision Tree decides to split the data. It actually
affects how a Decision Tree draws its boundaries.
The figure below depicts the splitting process. Red rings and blue crosses
symbolize elements with two different labels. The decision starts by
evaluating the feature values of the elements inside the initial set. Based
on their values, elements are put in Set 1 or Set 2. In this example, the
state seems tidier after the split: most of the red rings have been put in
Set 1, while a majority of the blue crosses are in Set 2.
So decision trees are there to tidy the data set by looking at the values of
the feature vector associated with each data point. Based on the values of
each feature, decisions are made that eventually lead to a leaf and an
answer.
At each step, at each branching, we want to decrease the entropy, so this
quantity is computed before the cut and after the cut. If it decreases, the
split is validated and we can proceed to the next step; otherwise, we must
try to split on another feature or stop this branch.
Before and after the decision, the sets are different and have different sizes. Still,
entropy can be compared between these sets, using a weighted sum, as we will
see in the next section.
Equation of Entropy:
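For a set S containing elements of c different classes, where p_i is the
proportion of elements belonging to class i, the Shannon entropy used by
decision trees is

Entropy(S) = -\sum_{i=1}^{c} p_i \log_2 p_i

A minimal sketch, computing the entropy of a set and the weighted entropy of
a split so the before/after comparison described above can be made; the
labels are hypothetical, standing in for the red rings and blue crosses:

    # A minimal sketch of entropy and the weighted post-split entropy
    # used to validate a decision tree split. Labels are hypothetical.
    from collections import Counter
    from math import log2

    def entropy(labels):
        # Entropy(S) = -sum(p_i * log2(p_i)) over the classes in S.
        n = len(labels)
        return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

    def weighted_entropy(subsets):
        # Weighted sum of subset entropies; weights = relative sizes.
        n = sum(len(s) for s in subsets)
        return sum(len(s) / n * entropy(s) for s in subsets)

    before = ["ring"] * 6 + ["cross"] * 6   # initial mixed set
    set1   = ["ring"] * 5 + ["cross"] * 1   # mostly rings
    set2   = ["ring"] * 1 + ["cross"] * 5   # mostly crosses

    print(entropy(before))                  # 1.0 (maximally mixed)
    print(weighted_entropy([set1, set2]))   # ~0.65: entropy decreased,
                                            # so the split is validated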
Non-Linear Regression:
A scatter plot of a country's GDP against time shows that the relationship
is not linear. Instead, after 2005 the line starts to curve and no longer
follows a straight linear path. In such cases, a special estimation method
called non-linear regression is required.
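A minimal sketch of one common non-linear approach, fitting a quadratic
polynomial with NumPy; the GDP figures below are synthetic, purely for
illustration:

    # A minimal sketch of non-linear regression: fitting a quadratic
    # trend to synthetic GDP-vs-year data with NumPy polyfit.
    import numpy as np

    years = np.array([2000, 2002, 2004, 2006, 2008, 2010, 2012])
    gdp   = np.array([1.00, 1.10, 1.22, 1.45, 1.80, 2.30, 2.95])  # synthetic

    # A degree-2 polynomial captures the curvature that a straight
    # line misses after 2005.
    coeffs = np.polyfit(years, gdp, deg=2)
    model = np.poly1d(coeffs)

    print(model(2014))  # extrapolated GDP under the fitted curve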