DWM UNIT-V NOTES

The document discusses classification and prediction in data analysis, highlighting their applications in banking, marketing, and medical research. It outlines the processes involved in classification, including data cleaning, relevance analysis, and decision tree induction, as well as comparing different classification methods like Bayesian classifiers and Support Vector Machines. Key concepts such as information gain and rule-based classification are also explained, emphasizing their importance in building effective predictive models.

UNIT-V-I

Classification and Prediction

What Is Classification? What Is Prediction?

A bank loans officer needs analysis of her data in order to learn which loan applicants are
"safe" and which are "risky" for the bank. A marketing manager at All Electronics needs data
analysis to help guess whether a customer with a given profile will buy a new computer. A medical
researcher wants to analyze breast cancer data in order to predict which one of three specific
treatments a patient should receive. In each of these examples, the data analysis task is classification,
where a model or classifier is constructed to predict categorical labels, such as "safe" or "risky" for
the loan application data; "yes" or "no" for the marketing data; or "treatment A," "treatment B," or
"treatment C" for the medical data. These categories can be represented by discrete values, where the
ordering among values has no meaning. For example, the values 1, 2, and 3 may be used to
represent treatments A, B, and C, where there is no ordering implied among this group of treatment
regimes.

Suppose that the marketing manager would like to predict how much a given customer will spend
during a sale at AllElectronics. This data analysis task is an example of numeric prediction, where the
model constructed predicts a continuous-valued function, or ordered value, as opposed to a categorical
label. This model is a predictor.

"How does classification work?" Data classification is a two-step process, as shown for the
loan application data of Figure 6.1. (The data are simplified for illustrative purposes. In reality, we
may expect many more attributes to be considered.) In the first step, a classifier is built describing a
predetermined set of data classes or concepts. This is the learning step (or training phase), where a
classification algorithm builds the classifier by analyzing or "learning from" a training set made up of
database tuples and their associated class labels.
Issues Regarding Classification and Prediction

• Data cleaning: This refers to the preprocessing of data in order to remove or reduce noise (by
applying smoothing techniques, for example) and the treatment of missing values (e.g., by
replacing a missing value with the most commonly occurring value for that attribute, or with
the most probable value based on statistics). Although most classification algorithms have some
mechanisms for handling noisy or missing data, this step can help reduce confusion during
learning.

• Relevance analysis: Many of the attributes in the data may be redundant. Correlation analysis
can be used to identify whether any two given attributes are statistically related. For example, a
strong correlation between attributes A1 and A2 would suggest that one of the two could be
removed from further analysis. A database may also contain irrelevant attributes. Attribute subset
selection can be used in these cases to find a reduced set of attributes such that the resulting
probability distribution of the data classes is as close as possible to the original distribution
obtained using all attributes. Hence, relevance analysis, in the form of correlation analysis and
attribute subset selection, can be used to detect attributes that do not contribute to the
classification or prediction task. Including such attributes may otherwise slow down, and
possibly mislead, the learning step. Ideally, the time spent on relevance analysis, when added
to the time spent on learning from the resulting "reduced" attribute (or feature) subset, should be
less than the time that would have been spent on learning from the original set of attributes.
Hence, such analysis can help improve classification efficiency and scalability.

• Data transformation and reduction: The data may be transformed by normalization,
particularly when neural networks or methods involving distance measurements are used in the
learning step. Normalization involves scaling all values for a given attribute so that they fall
within a small specified range, such as -1.0 to 1.0, or 0.0 to 1.0. In methods that use distance
measurements, for example, this would prevent attributes with initially large ranges (like, say,
income) from outweighing attributes with initially smaller ranges (such as binary attributes).
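For instance, min-max normalization rescales an attribute such as income into a small range like 0.0 to 1.0 before distance-based learning. The following is a minimal sketch; the function name and the income values are illustrative, not from the text:

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Rescale a list of numeric attribute values into [new_min, new_max]."""
    old_min, old_max = min(values), max(values)
    if old_max == old_min:                      # constant attribute: avoid division by zero
        return [new_min for _ in values]
    scale = (new_max - new_min) / (old_max - old_min)
    return [new_min + (v - old_min) * scale for v in values]

# Hypothetical income values; after normalization they fall in [0.0, 1.0]
incomes = [30000, 45000, 72000, 120000]
print(min_max_normalize(incomes))   # [0.0, 0.1666..., 0.4666..., 1.0]
```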

Comparing Classification and Prediction Methods


Classification and prediction methods can be compared and evaluated according to the following
criteria:
• Accuracy: the ability of the model to correctly predict the class label (for classification) or value (for prediction) of new or previously unseen data.
• Speed: the computational cost involved in generating and using the model.
• Robustness: the ability of the model to make correct predictions given noisy data or data with missing values.
• Scalability: the ability to construct the model efficiently given large amounts of data.
• Interpretability: the level of understanding and insight provided by the model.

Classification by Decision Tree Induction (16 Mark Question)

Decision tree induction is the learning of decision trees from class-labeled training tuples. A
decision tree is a flowchart-like tree structure, where each internal node (nonleaf node) denotes a test
on an attribute, each branch represents an outcome of the test, and each leaf node (or terminal node)
holds a class label. The topmost node in a tree is the root node.

A typical decision tree is shown in Figure. It represents the concept buys computer, that is, it predicts
whether a customer at All Electronics is likely to purchase a computer. Internal nodes are denoted by
rectangles, and leaf nodes are denoted by ovals. Some decision tree algorithms produce only binary
trees (where each internal node branches to exactly two other nodes), whereas others can produce
nonbinary trees.

“How are decision trees used for classification?” Given a tuple, X, for which the associated class
label is unknown, the attribute values of the tuple are tested against the decision tree. A path is traced
from the root to a leaf node, which holds the class prediction for that tuple. Decision trees can easily
be converted to classification rules.

Decision Tree Induction

The algorithm is called with three parameters: D, attribute list, and Attribute selection method. We
refer to D as a data partition. Initially, it is the complete set of training tuples and their associated class
labels. The parameter attribute list is a list of attributes describing the tuples. Attribute selection
method specifies a heuristic procedure for selecting the attribute that
"best" discriminates the given tuples according to class. This procedure employs an attribute
selection measure, such as information gain or the Gini index. Whether the tree is strictly binary is
generally driven by the attribute selection measure. Some attribute selection measures, such as the
Gini index, enforce the resulting tree to be binary. Others, like information gain, do not, therein
allowing multiway splits (i.e., two or more branches to be grown from a node).
• The tree starts as a single node, N, representing the training tuples in D (step 1)

• If the tuples in D are all of the same class, then node N becomes a leaf and is labeled with
that class (steps 2 and 3). Note that steps 4 and 5 are terminating conditions. All of the
terminating conditions are explained at the end of the algorithm.

• Otherwise, the algorithm calls Attribute selection method to determine the splitting criterion.
The splitting criterion tells us which attribute to test at node N by determining the "best" way
to separate or partition the tuples in D into individual classes (step 6). The splitting criterion also
tells us which branches to grow from node N with respect to the outcomes of the chosen test.
More specifically, the splitting criterion indicates the splitting attribute and may also indicate
either a split-point or a splitting subset. The splitting criterion is determined so that, ideally,
the resulting partitions at each branch are as "pure" as possible. A partition is pure if all of the
tuples in it belong to the same class. In other words, if we were to split up the tuples in D
according to the mutually exclusive outcomes of the splitting criterion, we hope for the resulting
partitions to be as pure as possible.

• The node N is labeled with the splitting criterion, which serves as a test at the node (step 7).
A branch is grown from node N for each of the outcomes of the splitting criterion. The tuples
in D are partitioned accordingly (steps 10 to 11). There are three possible scenarios, as
illustrated in Figure. Let A be the splitting attribute. A has v distinct values, {a1, a2, ..., av},
based on the training data.
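The basic strategy above can be written as a short recursive procedure. Below is a minimal sketch, assuming discrete-valued attributes and tuples given as (attribute_dict, class_label) pairs; the names build_tree, majority_class, and attribute_selection_method are illustrative, not the textbook's pseudocode:

```python
from collections import Counter

def majority_class(tuples):
    """Return the most common class label among the tuples (used for leaf labeling)."""
    return Counter(label for _, label in tuples).most_common(1)[0][0]

def build_tree(D, attribute_list, attribute_selection_method):
    """Recursively grow a decision tree from class-labeled tuples D.

    D is a list of (attribute_dict, class_label) pairs; attribute_list is the
    list of candidate attributes; attribute_selection_method(D, attrs) returns
    the "best" splitting attribute.
    """
    labels = [label for _, label in D]
    if len(set(labels)) == 1:                 # all tuples in the same class -> leaf
        return labels[0]
    if not attribute_list:                    # no attributes left -> majority-vote leaf
        return majority_class(D)

    split_attr = attribute_selection_method(D, attribute_list)
    node = {"attribute": split_attr, "branches": {}}
    remaining = [a for a in attribute_list if a != split_attr]

    # Grow one branch per outcome (distinct value) of the splitting attribute
    for value in set(x[split_attr] for x, _ in D):
        Dj = [(x, label) for x, label in D if x[split_attr] == value]
        node["branches"][value] = build_tree(Dj, remaining, attribute_selection_method)
    return node
```

Any attribute selection measure, such as information gain (next section) or the Gini index, can be plugged in as attribute_selection_method.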

Information gain

ID3 uses information gain as its attribute selection measure.

Information gain is defined as the difference between the original information requirement
(i.e., based on just the proportion of classes) and the new requirement (i.e., obtained after partitioning
on A). The expected information (entropy) needed to classify a tuple in D is

Info(D) = -Σ pi log2(pi), summed over the m classes,    (6.1)

where pi is the probability that an arbitrary tuple in D belongs to class Ci. If the tuples in D are
partitioned on attribute A into v partitions {D1, D2, ..., Dv}, the information still needed to classify a
tuple is

Info_A(D) = Σ (|Dj| / |D|) × Info(Dj), summed over the v partitions.    (6.2)

Information gain is then

Gain(A) = Info(D) - Info_A(D).    (6.3)

In other words, Gain(A) tells us how much would be gained by branching on A. It is the expected
reduction in the information requirement caused by knowing the value of A. The attribute A with the
highest information gain, Gain(A), is chosen as the splitting attribute at node N.
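As an illustration, the expected information and information gain can be computed directly from class counts. A minimal sketch; the helper names entropy and information_gain are illustrative:

```python
from math import log2
from collections import Counter

def entropy(labels):
    """Info(D): expected information (in bits) needed to classify a tuple in D."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(D, attribute):
    """Gain(A) = Info(D) - Info_A(D) for tuples D given as (attribute_dict, label) pairs."""
    labels = [label for _, label in D]
    info_D = entropy(labels)
    info_A = 0.0
    for value in set(x[attribute] for x, _ in D):
        subset = [label for x, label in D if x[attribute] == value]
        info_A += (len(subset) / len(D)) * entropy(subset)
    return info_D - info_A

# With 9 "yes" and 5 "no" tuples, Info(D) = 0.940 bits, as in Example 6.1
print(round(entropy(["yes"] * 9 + ["no"] * 5), 3))   # 0.94 (i.e., 0.940 bits)
```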

Example 6.1 Induction of a decision tree using information gain.

Table 6.1 presents a training set, D, of class-labeled tuples randomly selected from the All
Electronics customer database. (The data are adapted from [Qui86]. In this example, each attribute is
discrete-valued. Continuous-valued attributes have been generalized.) The class label attribute, buys
computer, has two distinct values (namely, {yes, no}); therefore, there are two distinct classes (that is,
m = 2). Let class C1 correspond to yes and class C2 correspond to no. There are nine tuples of class
yes and five tuples of class no. A (root) node N is created for the tuples in D. To find the splitting
criterion for these tuples, we must compute the information gain of each attribute. We first use
Equation (6.1) to compute the expected information needed to classify a tuple in D:

Info(D) = -(9/14) log2(9/14) - (5/14) log2(5/14) = 0.940 bits.

The expected information needed to classify a tuple in D if the tuples are partitioned according to age is

Info_age(D) = (5/14) × [-(2/5) log2(2/5) - (3/5) log2(3/5)]
            + (4/14) × [-(4/4) log2(4/4)]
            + (5/14) × [-(3/5) log2(3/5) - (2/5) log2(2/5)]
            = 0.694 bits.

Hence, the gain in information from such a partitioning would be

Gain(age) = Info(D) - Info_age(D) = 0.940 - 0.694 = 0.246 bits.

Similarly, we can compute Gain(income) = 0.029 bits, Gain(student) = 0.151 bits, and Gain(credit
rating) = 0.048 bits. Because age has the highest information gain among the attributes, it is selected
as the splitting attribute. Node N is labeled with age, and branches are grown for each of the
attribute’s values. The tuples are then partitioned accordingly, as shown in Figure 6.5. Notice that the
tuples falling into the partition for age = middle aged all belong to the same class. Because they all
belong to class “yes,” a leaf should therefore be created at the end of this branch and labeled with “yes.”
The final decision tree returned by the algorithm is shown in Figure 6.5.
Bayesian Classification (16 Mark Question)

“What are Bayesian classifiers?” Bayesian classifiers are statistical classifiers. They can predict
class membership probabilities, such as the probability that a given tuple belongs to a particular class.

Bayesian classification is based on Bayes' theorem, described below. Studies comparing classification
algorithms have found a simple Bayesian classifier known as the naive Bayesian classifier to be
comparable in performance with decision tree and selected neural network classifiers. Bayesian
classifiers have also exhibited high accuracy and speed when applied to large databases.

1) Bayes’ Theorem

Bayes' theorem is named after Thomas Bayes, a nonconformist English clergyman who did
early work in probability and decision theory during the 18th century. Let X be a data tuple. In Bayesian
terms, X is considered "evidence." As usual, it is described by measurements made on a set of n
attributes. Let H be some hypothesis, such as that the data tuple X belongs to a specified class C. For
classification problems, we want to determine P(H|X), the probability that the hypothesis H holds
given the "evidence" or observed data tuple X. In other words, we are looking for the probability that
tuple X belongs to class C, given that we know the attribute description of X.

"How are these probabilities estimated?" P(H), P(X|H), and P(X) may be estimated from the
given data, as we shall see below. Bayes' theorem is useful in that it provides a way of calculating the
posterior probability, P(H|X), from P(H), P(X|H), and P(X). Bayes' theorem is

P(H|X) = P(X|H) P(H) / P(X).
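As a tiny numeric illustration of the theorem (the probability values below are made up for the example, not taken from any data set):

```python
def posterior(p_h, p_x_given_h, p_x):
    """Bayes' theorem: P(H|X) = P(X|H) * P(H) / P(X)."""
    return p_x_given_h * p_h / p_x

# Illustrative estimates: prior P(H), likelihood P(X|H), evidence P(X)
print(posterior(p_h=0.3, p_x_given_h=0.5, p_x=0.25))   # 0.6, i.e., P(H|X) = 60%
```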
2) Naïve Bayesian Classification
Rule-Based Classification
We look at rule-based classifiers, where the learned model is represented as a set of IF-THEN rules. We
first examine how such rules are used for classification. We then study ways in which they can be
generated, either from a decision tree or directly from the training data using a sequential covering
algorithm.

1) Using IF-THEN Rules for Classification

Rules are a good way of representing information or bits of knowledge. A rule-based classifier uses a set
of IF-THEN rules for classification. An IF-THEN rule is an expression of the form

IF condition THEN conclusion.

An example is rule R1,

R1: IF age = youth AND student = yes THEN buys computer = yes.

The "IF" part (or left-hand side) of a rule is known as the rule antecedent or precondition. The
"THEN" part (or right-hand side) is the rule consequent. In the rule antecedent, the condition
consists of one or more attribute tests (such as age = youth and student = yes) that are logically
ANDed. The rule's consequent contains a class prediction (in this case, we are predicting whether a
customer will buy a computer). R1 can also be written as

R1: (age = youth) ∧ (student = yes) ⇒ (buys computer = yes).

If the condition (that is, all of the attribute tests) in a rule antecedent holds true for a given tuple,
we say that the rule antecedent is satisfied (or simply, that the rule is satisfied) and that the rule covers
the tuple.

A rule R can be assessed by its coverage and accuracy. Given a tuple, X, from a class-labeled
data set D, let ncovers be the number of tuples covered by R; ncorrect be the number of tuples correctly
classified by R; and |D| be the number of tuples in D. We can define the coverage and accuracy of R as

coverage(R) = ncovers / |D|
accuracy(R) = ncorrect / ncovers

That is, a rule’s coverage is the percentage of tuples that are covered by the rule (i.e. whose
attribute values hold true for the rule’s antecedent). For a rule’s accuracy, we look at the tuples that it
covers and see what percentage of them the rule can correctly classify.
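A minimal sketch of these two measures, assuming each tuple is an (attribute_dict, class_label) pair and a rule is given as (antecedent_dict, predicted_class); these representations are illustrative choices for the example:

```python
def rule_coverage_and_accuracy(rule, D):
    """Return (coverage, accuracy) of a rule over a class-labeled data set D.

    rule is (antecedent, predicted_class), where antecedent is a dict of
    attribute tests; D is a list of (attribute_dict, class_label) pairs.
    """
    antecedent, predicted_class = rule
    covered = [(x, label) for x, label in D
               if all(x.get(attr) == val for attr, val in antecedent.items())]
    n_covers = len(covered)
    n_correct = sum(1 for _, label in covered if label == predicted_class)
    coverage = n_covers / len(D) if D else 0.0
    accuracy = n_correct / n_covers if n_covers else 0.0
    return coverage, accuracy

# Rule R1: IF age = youth AND student = yes THEN buys_computer = yes
R1 = ({"age": "youth", "student": "yes"}, "yes")
# rule_coverage_and_accuracy(R1, D) would return R1's coverage and accuracy over a data set D
```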

2) Rule Extraction from a Decision Tree

We learned how to build a decision tree classifier from a set of training data. Decision tree
classifiers are a popular method of classification—it is easy to understand how decision trees work and
they are known for their accuracy. Decision trees can, however, become large and difficult to interpret. In this
subsection, we look at how to build a rule-based classifier by extracting IF-THEN rules from a decision
tree. In comparison with a decision tree, the IF-THEN rules may be easier for humans to understand,
particularly if the decision tree is very large.

To extract rules from a decision tree, one rule is created for each path from the root to a leaf
node. Each splitting criterion along a given path is logically ANDed to form the rule antecedent ("IF"
part). The leaf node holds the class prediction, forming the rule consequent ("THEN" part).
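A minimal sketch of this path-walking idea, using the same nested-dict tree representation as the induction sketch earlier; both the representation and the function name extract_rules are assumptions of these notes' examples, not the textbook's pseudocode:

```python
def extract_rules(tree, conditions=None):
    """Turn each root-to-leaf path of a decision tree into an IF-THEN rule string.

    tree is either a class label (leaf) or a dict of the form
    {"attribute": name, "branches": {value: subtree, ...}}.
    """
    conditions = conditions or []
    if not isinstance(tree, dict):                       # leaf: emit one rule for this path
        antecedent = " AND ".join(conditions) or "TRUE"
        return [f"IF {antecedent} THEN class = {tree}"]
    rules = []
    for value, subtree in tree["branches"].items():      # one branch per test outcome
        rules += extract_rules(subtree, conditions + [f'{tree["attribute"]} = {value}'])
    return rules
```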

Support Vector Machines


Support Vector Machine or SVM is one of the most popular Supervised Learning algorithms,
which is used for Classification as well as Regression problems. However, primarily, it is used for
Classification problems in Machine Learning.
The goal of the SVM algorithm is to create the best line or decision boundary that can segregate
n-dimensional space into classes so that we can easily put a new data point in the correct category
in the future. This best decision boundary is called a hyperplane.
SVM chooses the extreme points/vectors that help in creating the hyperplane. These extreme cases
are called support vectors, and hence the algorithm is termed a Support Vector Machine.
Consider the below diagram in which there are two different categories that are classified using a
decision boundary or hyperplane:

➢ Here, the solid line is called the hyperplane, and the other two lines are known as boundary lines.

Example:

SVM can be understood with the example that we have used in the KNN classifier.
Suppose we see a strange cat that also has some features of dogs. If we want a model that can
accurately identify whether it is a cat or a dog, such a model can be created using the SVM algorithm.
We first train our model with lots of images of cats and dogs so that it can learn their different
features, and then we test it with this strange creature.
Because the SVM creates a decision boundary between these two classes (cat and dog) and chooses
the extreme cases (support vectors), it will look at the extreme cases of cat and dog and, on the basis
of the support vectors, classify the new instance as a cat. Consider the below diagram:

The SVM algorithm can be used for face detection, image classification, text categorization, etc.
Types of SVM

SVM can be of two types:

Linear SVM: Linear SVM is used for linearly separable data, which means that if a dataset can be
classified into two classes by using a single straight line, then such data is termed linearly separable
data, and the classifier used is called a Linear SVM classifier.
Non-linear SVM: Non-linear SVM is used for non-linearly separable data, which means that if a dataset
cannot be classified by using a straight line, then such data is termed non-linear data, and the classifier
used is called a Non-linear SVM classifier.
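As a rough illustration of the two types, scikit-learn's SVC supports both a linear kernel and non-linear kernels such as RBF. The tiny 2-D data set below is made up for the example, and the sketch assumes scikit-learn is installed:

```python
from sklearn.svm import SVC

# Toy 2-D points with two classes (made-up data for illustration)
X = [[0, 0], [1, 1], [1, 0], [4, 4], [5, 5], [4, 5]]
y = [0, 0, 0, 1, 1, 1]

linear_svm = SVC(kernel="linear").fit(X, y)   # Linear SVM: straight-line hyperplane
rbf_svm = SVC(kernel="rbf").fit(X, y)         # Non-linear SVM: kernel trick (RBF kernel)

print(linear_svm.predict([[0.5, 0.5], [4.5, 4.5]]))   # expected: [0 1]
print(linear_svm.support_vectors_)                    # the extreme points (support vectors)
```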

SVM PRIMAL AND DUAL:


SVM can be formulated in two ways: the primal form and the dual form. Both yield the same
optimization result, but the way they reach it is very different. Before delving into the mathematics,
note which one is used when: the primal form is preferred when we do not need to apply the kernel
trick to the data and the dataset is large but the dimension of each data point is small; the dual form is
preferred when the data has a huge dimension and we need to apply the kernel trick. The two
hard-margin formulations are sketched below.
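For reference, the standard hard-margin, linearly separable formulations are as follows (w is the weight vector, b the bias, and the alpha_i are Lagrange multipliers; this is the textbook linear case, sketched here for orientation):

```latex
% Primal form: minimize the norm of w so that every training point lies on the correct side of the margin
\min_{\mathbf{w},\, b} \ \tfrac{1}{2}\lVert \mathbf{w} \rVert^{2}
\quad \text{subject to} \quad y_i(\mathbf{w}\cdot\mathbf{x}_i + b) \ge 1, \quad i = 1,\dots,N

% Dual form: depends on the data only through dot products x_i . x_j, which is what makes the kernel trick possible
\max_{\boldsymbol{\alpha}} \ \sum_{i=1}^{N}\alpha_i
  - \tfrac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}\alpha_i \alpha_j\, y_i y_j\, (\mathbf{x}_i\cdot\mathbf{x}_j)
\quad \text{subject to} \quad \alpha_i \ge 0, \quad \sum_{i=1}^{N}\alpha_i y_i = 0
```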
Lazy Learners (or Learning from Your Neighbors)
• Lazy vs. eager learning: Eager learning (e.g., decision tree induction, Bayesian classification,
rule-based classification) constructs a classification model from a given training set before
receiving new (e.g., test) data to classify.
• Lazy learning (e.g., k-nearest-neighbor classifiers, case-based reasoning classifiers) simply
stores the training data (or does only minor processing) and waits until it is given a new
instance.
• Lazy learners spend less time in training but more time in predicting: they store training
examples and delay the processing ("lazy evaluation") until a new instance must be classified.
• Accuracy: a lazy method effectively uses a richer hypothesis space, since it uses many local
linear functions to form its implicit global approximation to the target function; an eager
method must commit to a single hypothesis that covers the entire instance space.

Example Problem: Face Recognition


We have a database of (say) 1 million face images. We are given a new image and want to find the
most similar images in the database.
• Represent faces by (relatively) invariant values, e.g., the ratio of nose width to eye width;
each image is then represented by a large number of numerical features.
• Problem: given the features of a new face, find those faces in the database that are close in
at least ¾ (say) of the features.
• Typical approaches: the k-nearest-neighbor approach, in which instances are represented as
points in a Euclidean space, and case-based reasoning, which uses symbolic representations
and knowledge-based inference.
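A minimal k-nearest-neighbor sketch in plain Python; the 2-D feature vectors, labels, and k value are illustrative only, and note that all of the work happens at prediction time, which is what makes the learner "lazy":

```python
from collections import Counter
from math import dist   # Euclidean distance between two points (Python 3.8+)

def knn_classify(query, training_data, k=3):
    """Classify a query point by majority vote among its k nearest training examples.

    training_data is a list of (feature_vector, class_label) pairs; nothing is
    learned up front -- all distance computations happen when a query arrives.
    """
    neighbors = sorted(training_data, key=lambda item: dist(query, item[0]))[:k]
    return Counter(label for _, label in neighbors).most_common(1)[0][0]

# Illustrative 2-D feature vectors and labels
train = [([1.0, 1.0], "A"), ([1.2, 0.9], "A"),
         ([5.0, 5.1], "B"), ([4.8, 5.3], "B"), ([5.2, 4.9], "B")]
print(knn_classify([1.1, 1.0], train, k=3))   # "A"
```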
