AIML - UNIT-4 Modified
Unit-IV
Contents
Basic idea:
If it walks like a duck, quacks like a duck, then
it’s probably a duck
[Figure: the nearest-neighbor procedure — compute the distance from the test record to the training records, then choose the k "nearest" records.]
Nearest Neighbor Classifiers
[Figure: two pairs of binary vectors — the Euclidean distance is 1.4142 for both pairs, but the cosine similarity measure has different values for these pairs, so the choice of proximity measure matters.]
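A minimal sketch of this point, using hypothetical binary vectors (the original figure's vectors are not recoverable): both pairs are equally far apart in Euclidean distance, yet their cosine similarities differ.

```python
# Two pairs of binary vectors with identical Euclidean distance (1.4142)
# but different cosine similarity (0.75 vs 0.0). Vectors are illustrative.
import numpy as np

def euclidean(a, b):
    return float(np.linalg.norm(a - b))

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

pair1 = (np.array([1, 1, 1, 1, 0]), np.array([0, 1, 1, 1, 1]))
pair2 = (np.array([1, 0, 0, 0, 0]), np.array([0, 0, 0, 0, 1]))

for a, b in (pair1, pair2):
    print(euclidean(a, b), cosine(a, b))   # 1.4142 for both; cosine 0.75 vs 0.0
```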
Nearest Neighbor Classification…
Algorithm:
Majority Voting
Distance-Weighted Voting
One way to reduce the impact of k is to weight the influence of
each nearest neighbor xi according to its distance from the test
instance z: wi = 1 / d(xi, z)^2.
As a result, training examples that are located far away from z
have a weaker impact on the classification compared to those
that are located close to z.
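A minimal sketch of both voting schemes (majority voting and distance-weighted voting with wi = 1 / d(xi, z)^2); the training data shown is illustrative.

```python
# k-nearest-neighbor classification with majority voting and
# distance-weighted voting.
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, z, k=3, weighted=False):
    dists = np.linalg.norm(X_train - z, axis=1)     # distance from z to every training record
    nn = np.argsort(dists)[:k]                      # indices of the k nearest records
    if not weighted:
        return Counter(y_train[i] for i in nn).most_common(1)[0][0]   # majority vote
    votes = {}
    for i in nn:
        w = 1.0 / (dists[i] ** 2 + 1e-12)           # closer neighbors get larger weights
        votes[y_train[i]] = votes.get(y_train[i], 0.0) + w
    return max(votes, key=votes.get)

X_train = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.2, 4.8], [4.9, 5.1]])
y_train = np.array(["A", "A", "B", "B", "B"])
print(knn_predict(X_train, y_train, np.array([1.1, 0.9]), k=3))                 # "A"
print(knn_predict(X_train, y_train, np.array([1.1, 0.9]), k=3, weighted=True))  # "A"
```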
Nearest-neighbor classifiers
Nearest-neighbor classifiers are local classifiers.
They can produce decision boundaries of arbitrary shapes.
Characteristics of Nearest-Neighbor Classifiers
Nearest-neighbor classification is part of a more general
technique known as instance-based learning, which
uses specific training instances to make predictions without
having to maintain an abstraction (or model) derived from
data. Instance-based learning algorithms require a proximity
measure to determine the similarity or distance between
instances and a classification function that returns the
predicted class of a test instance based on its proximity to
other instances.
Lazy learners such as nearest-neighbor classifiers do not
require model building. However, classifying a test example can be
quite expensive, because we need to compute the proximity values
individually between the test example and every training example.
In contrast, eager learners often spend the bulk of their computing
resources on model building. Once a model has been built,
classifying a test example is extremely fast.
Nearest-neighbor classifiers make their predictions based on local
information, whereas decision tree and rule-based classifiers attempt
to find a global model that fits the entire input space. Because the
classification decisions are made locally, nearest-neighbor classifiers
(with small values of k) are quite susceptible to noise.
Nearest-neighbor classifiers can produce arbitrarily shaped
decision boundaries. Such boundaries provide a more flexible
model representation compared to decision tree and rule-based
classifiers that are often constrained to rectilinear decision boundaries.
Nearest-neighbor classifiers can produce wrong predictions unless the
appropriate proximity measure and data preprocessing steps are used.
For example, suppose we want to classify a group of people based on
attributes such as height (measured in meters) and weight (measured
in pounds). The height attribute has a low variability, ranging from 1.5
m to 1.85 m, whereas the weight attribute may vary from 90 lb. to 250
lb. If the scale of the attributes is not taken into consideration, the
proximity measure may be dominated by differences in people's weights.
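A minimal sketch of this effect with hypothetical height/weight values: without standardization the Euclidean distance is dominated by weight; after standardization both attributes contribute comparably.

```python
# Raw vs standardized distances for three hypothetical people.
import numpy as np

X = np.array([[1.60, 110.0],     # height in meters, weight in pounds
              [1.85, 115.0],
              [1.62, 240.0]])

def standardize(X):
    return (X - X.mean(axis=0)) / X.std(axis=0)   # zero mean, unit variance per attribute

print(np.linalg.norm(X[0] - X[1]), np.linalg.norm(X[0] - X[2]))    # ~5.0 vs ~130: weight dominates
Xs = standardize(X)
print(np.linalg.norm(Xs[0] - Xs[1]), np.linalg.norm(Xs[0] - Xs[2]))  # comparable after scaling
```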
Conditional Probability:
Bayes theorem:
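The two formulas referred to above are the standard definitions, stated here for completeness:
Conditional probability: P(Y | X) = P(X, Y) / P(X), and likewise P(X | Y) = P(X, Y) / P(Y)
Bayes theorem: P(Y | X) = P(X | Y) P(Y) / P(X)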
Bayes Classifier
Approach:
compute the posterior probability P(Y | X1, X2, …, Xd) using Bayes' theorem,
and assign the test record to the class Y with the largest posterior
(equivalently, the class that maximizes P(X1, X2, …, Xd | Y) P(Y), since the
denominator P(X1, X2, …, Xd) is the same for every class)
We need to estimate P(Evade = Yes | X) and P(Evade = No | X).
In the following we will replace Evade = Yes by Yes, and Evade = No by No.
Example Data
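(The data table itself did not survive extraction; the reconstruction below is consistent with all counts and probabilities used on the later slides.)

Tid  Refund  Marital Status  Taxable Income  Evade
1    Yes     Single          125K            No
2    No      Married         100K            No
3    No      Single          70K             No
4    Yes     Married         120K            No
5    No      Divorced        95K             Yes
6    No      Married         60K             No
7    Yes     Divorced        220K            No
8    No      Single          85K             Yes
9    No      Married         75K             No
10   No      Single          90K             Yes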
Given a Test Record: X = (Refund = No, Marital Status = Divorced, Taxable Income = 120K)
Conditional Independence
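The factorization applied below follows from the naïve Bayes conditional independence assumption — the attributes are assumed conditionally independent given the class:
P(X1, X2, …, Xd | Y = y) = P(X1 | y) x P(X2 | y) x … x P(Xd | y)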
P(X | Yes) =
P(Refund = No | Yes) x
P(Divorced | Yes) x
P(Income = 120K | Yes)
P(X | No) =
P(Refund = No | No) x
P(Divorced | No) x
P(Income = 120K | No)
Estimate Probabilities for Categorical Attributes
P(y) = fraction of instances of class y
e.g., P(No) = 7/10, P(Yes) = 3/10
For categorical attributes: P(Xi = c | y) = nc / n
where nc is the number of instances having attribute value Xi = c and
belonging to class y, and n is the total number of instances of class y
Examples:
P(Status = Married | No) = 4/7
P(Refund = Yes | Yes) = 0
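A minimal sketch that computes these estimates by counting; the records are the reconstructed example data above (Refund, Marital Status, class only).

```python
# Estimate P(y) and P(Xi = c | y) = nc / n by counting training records.
from collections import Counter

# (refund, marital_status, evade) for the 10 training records
records = [("Yes", "Single", "No"), ("No", "Married", "No"), ("No", "Single", "No"),
           ("Yes", "Married", "No"), ("No", "Divorced", "Yes"), ("No", "Married", "No"),
           ("Yes", "Divorced", "No"), ("No", "Single", "Yes"), ("No", "Married", "No"),
           ("No", "Single", "Yes")]

class_counts = Counter(r[2] for r in records)

def prior(y):                     # P(y) = fraction of instances of class y
    return class_counts[y] / len(records)

def cond(attr_index, value, y):   # P(Xi = c | y) = nc / n
    nc = sum(1 for r in records if r[attr_index] == value and r[2] == y)
    return nc / class_counts[y]

print(prior("No"), prior("Yes"))          # 0.7, 0.3
print(cond(1, "Married", "No"))           # 4/7 ≈ 0.571
print(cond(0, "Yes", "Yes"))              # 0.0
```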
Estimate Probabilities for Continuous Attributes
For a continuous attribute, assume a normal (Gaussian) distribution within each class and estimate its parameters from the training records of that class:
the mean of the population is estimated by the sample mean μ
the population variance is estimated by the sample variance σ²
Then P(Xi = xi | y) = (1 / sqrt(2 π σ²)) exp(−(xi − μ)² / (2 σ²)),
where π ≈ 3.14159265359 and e ≈ 2.7182 are the usual mathematical constants.
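A minimal sketch of the Gaussian estimate, reproducing the two values used on the next slides: P(Income = 120 | No) ≈ 0.0072 and P(Income = 120 | Yes) ≈ 1.2 x 10^-9.

```python
# Gaussian class-conditional density for a continuous attribute.
import math

def gaussian(x, mean, variance):
    return math.exp(-(x - mean) ** 2 / (2 * variance)) / math.sqrt(2 * math.pi * variance)

print(gaussian(120, 110, 2975))   # ≈ 0.0072   (class = No:  sample mean 110, variance 2975)
print(gaussian(120, 90, 25))      # ≈ 1.2e-09  (class = Yes: sample mean 90,  variance 25)
```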
Example of Naïve Bayes Classifier
Given a Test Record: X = (Refund = No, Marital Status = Divorced, Taxable Income = 120K)
P(Marital Status = Single | Yes) = 2/3
P(Marital Status = Divorced | Yes) = 1/3
P(Marital Status = Married | Yes) = 0
For Taxable Income:
If class = No: sample mean = 110, sample variance = 2975
If class = Yes: sample mean = 90, sample variance = 25
Since P(X | No) P(No) > P(X | Yes) P(Yes), therefore P(No | X) > P(Yes | X) => Class = No
Example of Naïve Bayes Classifier
Given a Test Record: X = (Refund = No, Marital Status = Divorced, Taxable Income = 120K)

Naïve Bayes Classifier:
P(Refund = Yes | No) = 3/7
P(Refund = No | No) = 4/7
P(Refund = Yes | Yes) = 0
P(Refund = No | Yes) = 1
P(Marital Status = Single | No) = 2/7
P(Marital Status = Divorced | No) = 1/7
P(Marital Status = Married | No) = 4/7
P(Marital Status = Single | Yes) = 2/3
P(Marital Status = Divorced | Yes) = 1/3
P(Marital Status = Married | Yes) = 0
For Taxable Income:
If class = No: sample mean = 110, sample variance = 2975
If class = Yes: sample mean = 90, sample variance = 25

Even in the absence of information about any attributes, we can use the
apriori probabilities of the class variable:
P(Yes) = 3/10
P(No) = 7/10

If we only know that marital status is Divorced, then:
P(Yes | Divorced) = 1/3 x 3/10 / P(Divorced)
P(No | Divorced) = 1/7 x 7/10 / P(Divorced)

If we also know that Refund = No, then:
P(Yes | Refund = No, Divorced) = 1 x 1/3 x 3/10 / P(Divorced, Refund = No)
P(No | Refund = No, Divorced) = 4/7 x 1/7 x 7/10 / P(Divorced, Refund = No)

If we also know that Taxable Income = 120, then:
P(Yes | Refund = No, Divorced, Income = 120) = 1.2 x 10^-9 x 1 x 1/3 x 3/10 / P(Divorced, Refund = No, Income = 120)
P(No | Refund = No, Divorced, Income = 120) = 0.0072 x 4/7 x 1/7 x 7/10 / P(Divorced, Refund = No, Income = 120)
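A minimal end-to-end sketch of this decision, using the conditional probabilities and Gaussian estimates listed above for the test record X = (Refund = No, Divorced, Income = 120K).

```python
# Compare P(X | No) P(No) against P(X | Yes) P(Yes) for the test record.
import math

def gaussian(x, mean, variance):
    return math.exp(-(x - mean) ** 2 / (2 * variance)) / math.sqrt(2 * math.pi * variance)

p_no, p_yes = 7 / 10, 3 / 10

# P(X | No)  = P(Refund = No | No)  x P(Divorced | No)  x P(Income = 120 | No)
px_no = (4 / 7) * (1 / 7) * gaussian(120, 110, 2975)
# P(X | Yes) = P(Refund = No | Yes) x P(Divorced | Yes) x P(Income = 120 | Yes)
px_yes = 1.0 * (1 / 3) * gaussian(120, 90, 25)

print(px_no * p_no, px_yes * p_yes)                                   # ≈ 4.1e-04 vs ≈ 1.2e-10
print("Class =", "No" if px_no * p_no > px_yes * p_yes else "Yes")    # Class = No
```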
Issues with Naïve Bayes Classifier
Given a Test Record: X = (Married)
Since P(Marital Status = Married | Yes) = 0, the class-conditional probability
P(X | Yes) becomes zero, so the Yes class can never be predicted for this record:
a single zero conditional probability wipes out the entire product. To avoid
such zero probabilities, the conditional probability estimates are smoothed
using the following quantities:
n: number of training instances belonging to class y
nc: number of instances with Xi = c and Y = y
v: total number of attribute values that Xi can take
p: initial estimate of P(Xi = c | y) known apriori
m: hyper-parameter for our confidence in p
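These quantities correspond to the standard smoothed estimates used to avoid zero probabilities:
Original estimate: P(Xi = c | y) = nc / n
Laplace estimate: P(Xi = c | y) = (nc + 1) / (n + v)
m-estimate: P(Xi = c | y) = (nc + m p) / (n + m)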
Example of Naïve Bayes Classifier
A: attributes
M: mammals
N: non-mammals
Conditional Independence
In the example network: D is a parent of C, A is a child of C, B is a descendant
of D, and D is an ancestor of A; both B and D are non-descendants of A.
A node in a Bayesian network is conditionally independent of its non-descendants
given its parents; for example, A is conditionally independent of both B and D
given its parent C.
Ensemble Methods
Example: Why Do Ensemble Methods Work?
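The slide's figure did not survive extraction. A common illustration (assumed here: 25 independent base classifiers, each with error rate 0.35, combined by majority vote) shows how far the ensemble error can drop below the base error:

```python
# Error of a majority-vote ensemble of 25 independent base classifiers with
# error rate 0.35: the ensemble errs only when 13 or more of them are wrong.
from math import comb

n, eps = 25, 0.35
ensemble_error = sum(comb(n, i) * eps**i * (1 - eps)**(n - i) for i in range(13, n + 1))
print(round(ensemble_error, 3))   # ≈ 0.06, far below 0.35
```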
Necessary Conditions for Ensemble Methods
Rationale for Ensemble Learning
Constructing Ensemble Classifiers
Methods for Constructing an Ensemble Classifier
1. By manipulating the training set.
In this approach, multiple training sets are
created by resampling the original data
according to some sampling distribution and
constructing a classifier from each training set.
The sampling distribution determines how likely
it is that an example will be selected for training,
and it may vary from one trial to another.
Bagging and boosting are two examples of
ensemble methods that manipulate their training
sets
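A minimal sketch of the resampling step: each training set is a bootstrap sample drawn with replacement from the original data (a uniform sampling distribution is assumed here).

```python
# Create multiple training sets by sampling n instances with replacement.
import numpy as np

def bootstrap_sample(X, y, rng):
    idx = rng.integers(0, len(X), size=len(X))   # n indices, drawn with replacement
    return X[idx], y[idx]

rng = np.random.default_rng(0)
X = np.arange(20).reshape(10, 2)
y = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
training_sets = [bootstrap_sample(X, y, rng) for _ in range(5)]   # 5 resampled training sets
```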
Methods for Constructing an Ensemble Classifier
2. By manipulating the input features.
In this approach, a subset of input features is
chosen to form each training set.
The subset can be either chosen randomly or
based on the recommendation of domain
experts.
Some studies have shown that this approach
works very well with data sets that contain
highly redundant features.
Random forest is an ensemble method that manipulates its input
features and uses decision trees as its base classifiers.
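A minimal sketch of feature-subset manipulation: each base classifier is trained on a randomly chosen subset of the input features (random selection assumed; the data is illustrative).

```python
# Choose a random subset of feature columns for each training set.
import numpy as np

def feature_subset(X, n_features, rng):
    cols = rng.choice(X.shape[1], size=n_features, replace=False)   # random feature indices
    return X[:, cols], cols

rng = np.random.default_rng(1)
X = np.random.default_rng(2).normal(size=(100, 10))   # 100 instances, 10 features
X_sub, chosen = feature_subset(X, n_features=4, rng=rng)
print(chosen, X_sub.shape)
```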
Methods for Constructing an Ensemble Classifier
3. By manipulating the class labels.
This method can be used when the number of classes is
sufficiently large.
The training data is transformed into a binary class
problem by randomly partitioning the class labels into two
disjoint subsets, A0 and A1.
Training examples whose class label belongs to the subset
A0 are assigned to class 0, while those that belong to the
subset A1 are assigned to class 1. The relabeled examples
are then used to train a base classifier.
By repeating this process multiple times, an ensemble of
base classifiers is obtained.
When a test example is presented, each base classifier Ci
is used to predict its class label.
If the test example is predicted as class 0, then all the
classes that belong to A0 will receive a vote. Conversely, if
it is predicted to be class 1, then all the classes that belong
to A1 will receive a vote.
The votes are tallied and the class that receives the highest number of
votes is assigned to the test example.
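A minimal sketch of this label-partitioning scheme; the base learner (scikit-learn's DecisionTreeClassifier) and the data are assumptions made only for illustration.

```python
# Build an ensemble by randomly splitting the class labels into two groups
# (A0 -> 0, A1 -> 1), training a binary base classifier on the relabeled data,
# and giving a vote to every class in the predicted group.
import numpy as np
from collections import Counter
from sklearn.tree import DecisionTreeClassifier

def train_ensemble(X, y, classes, n_rounds, rng):
    ensemble = []
    for _ in range(n_rounds):
        perm = rng.permutation(classes)
        A0, A1 = set(perm[:len(classes) // 2]), set(perm[len(classes) // 2:])
        y_bin = np.array([0 if label in A0 else 1 for label in y])
        clf = DecisionTreeClassifier(max_depth=3).fit(X, y_bin)
        ensemble.append((clf, A0, A1))
    return ensemble

def predict(ensemble, x):
    votes = Counter()
    for clf, A0, A1 in ensemble:
        group = A0 if clf.predict(x.reshape(1, -1))[0] == 0 else A1
        votes.update(group)                  # every class in the predicted group gets a vote
    return votes.most_common(1)[0][0]        # class with the most votes wins

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 4))
y = np.array(["a", "b", "c", "d"])[np.argmax(X, axis=1)]   # illustrative 4-class labels
ensemble = train_ensemble(X, y, classes=np.array(["a", "b", "c", "d"]), n_rounds=15, rng=rng)
print(predict(ensemble, X[0]))
```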
Methods for Constructing an Ensemble Classifier
4. By manipulating the learning algorithm.
Many learning algorithms can be manipulated in such a way that applying
the algorithm several times on the same training data will result in the
construction of different classifiers.
For example, an ensemble of decision trees can be constructed by injecting
randomness into the tree-growing procedure: instead of choosing the best
splitting attribute at each node, we can randomly choose one of the top k
attributes for splitting.
Once an ensemble of classifiers has been learned, a test example x is
classified by combining the predictions made by the base classifiers.
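A minimal sketch of the randomized split selection: the attribute scores are hypothetical stand-ins for an impurity-reduction measure.

```python
# Instead of always taking the best-scoring attribute, randomly choose one of
# the top k attributes at each node.
import random

def choose_split_attribute(attribute_scores, k=3, rng=None):
    rng = rng or random.Random(0)
    top_k = sorted(attribute_scores, key=attribute_scores.get, reverse=True)[:k]
    return rng.choice(top_k)

scores = {"refund": 0.12, "status": 0.30, "income": 0.25, "age": 0.05}
print(choose_split_attribute(scores, k=2))   # randomly "status" or "income"
```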
Bagging (Bootstrap AGGregatING)
Bagging Algorithm
Bagging Example
[Figure: the base classifier is a one-level decision tree (decision stump) that tests x ≤ k and predicts y_left on the True branch and y_right on the False branch.]
Bagging Example
Use majority vote (sign of sum of predictions) to
determine class of ensemble classifier
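A minimal sketch of this combination step, assuming each base classifier outputs a prediction in {-1, +1}.

```python
# Majority vote via the sign of the sum of the base classifiers' +/-1 predictions.
import numpy as np

def ensemble_predict(base_predictions):
    # base_predictions: shape (n_classifiers, n_examples), entries in {-1, +1}
    return np.sign(np.sum(base_predictions, axis=0))

preds = np.array([[+1, -1, +1],
                  [+1, +1, -1],
                  [-1, +1, +1]])
print(ensemble_predict(preds))   # [1. 1. 1.]
```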
Boosting
AdaBoost
Weight update:
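The update formula itself did not survive extraction; the standard AdaBoost rule, in terms of the weighted error εj of base classifier Cj, is:
Importance of Cj: αj = (1/2) ln((1 − εj) / εj)
Weight update for example xi:
wi(j+1) = (wi(j) / Zj) x e^(−αj) if Cj(xi) = yi (correctly classified)
wi(j+1) = (wi(j) / Zj) x e^(+αj) if Cj(xi) ≠ yi (misclassified)
where Zj is a normalization factor that makes the updated weights sum to 1.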
[Figure: the base classifier is again a decision stump that tests x ≤ k, with leaf predictions y_left and y_right.]
Random Forest Algorithm
Construct an ensemble of decision trees by manipulating the training set
as well as the input features.
Random Forest Algorithm
Given a training set D consisting of n instances and d attributes, the basic procedure
of training a random forest classifier can be summarized using the following steps:
1. Construct a bootstrap sample Di of the training set by randomly sampling n instances with replacement.
2. Learn an (unpruned) decision tree Ti from Di, considering only a random subset of the attributes as split candidates at each node.
3. Repeat steps 1–2 to obtain an ensemble of trees, and combine their predictions (e.g., by majority vote) to classify a test example.
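A minimal sketch of this procedure using scikit-learn (the library choice is an assumption; the slides do not name one). RandomForestClassifier handles both the bootstrap sampling and the per-node random attribute selection.

```python
# Train a random forest: bootstrap samples of the n instances plus a random
# subset of the d attributes considered at each split.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))                   # n = 200 instances, d = 8 attributes
y = (X[:, 0] + X[:, 3] > 0).astype(int)         # illustrative labels

forest = RandomForestClassifier(
    n_estimators=100,        # number of bootstrap samples / trees
    max_features="sqrt",     # attributes sampled at each node
    bootstrap=True,          # sample n instances with replacement
    random_state=0,
).fit(X, y)

print(forest.predict(X[:5]))
```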
Random Forest Algorithm
Decision trees involved in a random forest
are unpruned trees, as they are allowed to
grow to their largest possible size till every
leaf is pure. Hence, the base classifiers of
random forest represent unstable
classifiers that have low bias but high
variance, because of their large size.
Another property of the base classifiers
learned in random forests is the lack of
correlation among their model parameters
and test predictions.
Characteristics of Random Forest
Gradient Boosting
Constructs a series of models
Models can be any predictive model that has
a differentiable loss function
Commonly, trees are the chosen model
XGBoost (extreme gradient boosting) is a popular
package because of its impressive performance.
Boosting can be viewed as optimizing the loss
function by iterative functional gradient
descent.
Implementations of various boosted algorithms
are available in Python, R, Matlab, and more.
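A minimal sketch of boosting as iterative functional gradient descent for squared loss: each new tree is fit to the negative gradient of the loss (the current residuals), and the ensemble takes a small step in that direction. Regression trees from scikit-learn and synthetic data are used only for illustration.

```python
# Gradient boosting with squared loss: fit each tree to the residuals of the
# current ensemble, then add a shrunken version of its predictions.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=300)

learning_rate, n_rounds = 0.1, 100
prediction = np.full_like(y, y.mean())             # F0: constant model
trees = []
for _ in range(n_rounds):
    residuals = y - prediction                     # negative gradient of squared loss
    tree = DecisionTreeRegressor(max_depth=2).fit(X, residuals)
    prediction += learning_rate * tree.predict(X)  # functional gradient step
    trees.append(tree)

print(np.mean((y - prediction) ** 2))              # training MSE shrinks over the rounds
```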