UNIT3_class
1. Model Parameters
2. Model Hyperparameters
• Model Parameters: configuration variables that are internal to the model; the
model learns them on its own.
o They are used by the model for making predictions.
o They are learned by the model from the data itself.
o They are usually not set manually.
o They are part of the model and key to a machine learning algorithm.
e.g.,
o weights or coefficients of the independent variables in a linear regression
model or in an SVM,
o weights and biases of a neural network,
o cluster centroids in clustering.
Hyperparameters
• Hyperparameters are parameters that are explicitly defined by the user to
control the learning process. Some key points for hyperparameters are as
follows:
o These are usually set manually by the machine learning engineer.
o One cannot know the exact best value of a hyperparameter for a given
problem. The best value can be determined either by rule of thumb or by
trial and error.
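As a minimal sketch of the distinction (scikit-learn and the toy data below are
assumptions, not part of the original notes): the regularisation strength C is a
hyperparameter chosen by the user, while the learned weights and bias are model
parameters.
```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=4, random_state=0)

# C is a hyperparameter: set manually by the user before training.
model = LogisticRegression(C=1.0)
model.fit(X, y)

# coef_ (weights) and intercept_ (bias) are model parameters: learned
# from the data itself, not set manually.
print("weights:", model.coef_)
print("bias:", model.intercept_)
```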
Bias and variance
Bias is the error that comes from overly simple assumptions in the model
(underfitting); variance is the error that comes from sensitivity to small
fluctuations in the training data (overfitting). Reducing one typically
increases the other, which is known as the bias-variance trade-off.
Ensemble learning
Ensemble learning is a machine learning paradigm where multiple models (often
called “weak learners”) are trained to solve the same problem and combined to get
better results. The main hypothesis is that when weak models are correctly
combined, we can obtain more accurate and/or robust models.
Ensemble learning tries to balance this bias-variance trade-off by reducing either the bias or
the variance.
Ensemble learning improves a model’s performance in mainly three ways:
• By reducing the variance of weak learners
• By reducing the bias of weak learners
• By improving the overall accuracy of strong learners.
Bagging is used to reduce the variance of weak learners.
Boosting is used to reduce the bias of weak learners.
Stacking is used to improve the overall accuracy of strong learners.
By suitably combining multiple base learners, accuracy can be improved.
How can the learners be different?
1. Different algorithms make different assumptions about the data and lead to
different classifiers.
2. The same learning algorithm can be used with different hyperparameters.
3. Separate base-learners may use different representations of the same input
object or event, making it possible to integrate different types of
sensors/measurements/modalities.
4. Different base-learners can be trained on different subsets of the
training set.
Model combination schemes
• Bagging often considers homogeneous weak learners, learns them
independently from each other in parallel, and combines them following some
kind of deterministic averaging process.
• Boosting often considers homogeneous weak learners, learns them
sequentially in a very adaptive way (each base model depends on the previous
ones), and combines them following a deterministic strategy.
• Stacking often considers heterogeneous weak learners, learns them in
parallel, and combines them by training a meta-model to output a prediction
based on the different weak models' predictions.
A sketch of all three schemes follows.
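A minimal sketch of the three schemes, assuming scikit-learn; the toy data and
the choice of estimators are illustrative assumptions, not from the notes.
```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, BaggingClassifier,
                              StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, random_state=0)

# Bagging: homogeneous weak learners trained independently, in parallel,
# combined by deterministic voting/averaging.
bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=10)

# Boosting: homogeneous weak learners trained sequentially, each new
# model adapting to the errors of the previous ones.
boosting = AdaBoostClassifier(n_estimators=10)

# Stacking: heterogeneous weak learners whose predictions are fed to a
# meta-model that produces the final prediction.
stacking = StackingClassifier(
    estimators=[("tree", DecisionTreeClassifier()), ("svm", SVC())],
    final_estimator=LogisticRegression())

for name, clf in [("bagging", bagging), ("boosting", boosting),
                  ("stacking", stacking)]:
    print(name, clf.fit(X, y).score(X, y))
```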
Bagging
• Bagging stands for Bootstrap Aggregating.
• It is used to combine weak learners of high variance.
• These weak learners are homogeneous (of the same type).
• It consists of two steps: 1. bootstrapping and 2. aggregation.
Bootstrapping
• Involves resampling subsets of data with replacement from an initial dataset.
• In other words, subsets of data are taken from the initial dataset.
• These subsets of data are called bootstrapped datasets or, simply, bootstraps.
• Resampled ‘with replacement’ means an individual data point can be sampled
multiple times.
• Each bootstrap dataset is used to train a weak learner.
The steps of bagging are as follows:
1. We have an initial training dataset containing n instances.
2. We create m subsets of data from the training set. Each subset contains n
sample points drawn from the initial dataset with replacement, which means a
specific data point can be sampled more than once.
3. For each subset of data, we train the corresponding weak learner
independently. These models are homogeneous, meaning they are of the same type.
4. Each model makes a prediction.
5. The predictions are aggregated into a single prediction using either max
voting or averaging.
• When bootstrap aggregating is performed, two independent sets are created:
the bootstrap sample used for training, and the out-of-bag set of points that
were not sampled, which can be used for validation. A from-scratch sketch of
the steps above follows.
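This sketch assumes numpy and scikit-learn, with shallow decision trees as the
homogeneous weak learners; all of these choices are illustrative assumptions.
```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, random_state=0)  # step 1
rng = np.random.default_rng(0)

m = 10  # number of bootstrapped subsets
learners = []
for _ in range(m):
    # Step 2: draw n sample points *with replacement* (a bootstrap).
    idx = rng.integers(0, len(X), size=len(X))
    # Step 3: train a homogeneous weak learner on each bootstrap.
    learners.append(DecisionTreeClassifier(max_depth=2).fit(X[idx], y[idx]))

# Steps 4-5: each model predicts; aggregate by max voting
# (for 0/1 labels, rounding the mean vote; ties here go to class 0).
votes = np.array([clf.predict(X) for clf in learners])
bagged = np.round(votes.mean(axis=0)).astype(int)
print("training accuracy:", (bagged == y).mean())
```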
Bagging vs Boosting:
• Bagging is the simplest way of combining predictions that belong to the same
type; boosting is a way of combining predictions that belong to different types.
• Bagging aims to decrease variance, not bias; boosting aims to decrease bias,
not variance.
• In bagging, base classifiers are trained in parallel; in boosting, base
classifiers are trained sequentially.
Types of Boosting (a sketch follows this list):
Gradient Boosting
XGBoost
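As a hedged sketch: gradient boosting is available in scikit-learn as
GradientBoostingClassifier, and xgboost.XGBClassifier is a common drop-in
alternative; the data and settings below are illustrative assumptions.
```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=300, random_state=0)

# Each new tree is fitted to the errors of the current ensemble,
# which is how boosting reduces bias.
gbm = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1)
print("training accuracy:", gbm.fit(X, y).score(X, y))
```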
Stacking
Stacking trains a meta-model on the predictions of several heterogeneous base
models, as described under the model combination schemes above.
K-Nearest Neighbours (K-NN)
Suppose we have the height, weight and T-shirt size of some customers, and we
need to predict the T-shirt size of a new customer given only the height and
weight information we have. Data including height, weight and T-shirt size is
shown below. Assume k = 5.
Find the T-shirt size of a new customer named 'Ravi', who has a height of
161 cm and a weight of 61 kg.
The Euclidean distance formula says:
d = √[(x2 – x1)² + (y2 – y1)²]
where (x1, y1) are the coordinates of one point, (x2, y2) are the coordinates
of the other point, and d is the distance between them. Here
(x1, y1) = (161, 61), Ravi's height and weight.
Height (cm)   Weight (kg)   T-shirt size   Euclidean distance
158           58            M              4.24
158           59            M              3.61
158           63            M              3.61
160           59            M              2.24
160           60            M              1.41
163           60            M              2.24
163           61            M              2.00
160           64            L              3.16
163           64            L              3.61
165           61            L              4.00
165           62            L              4.12
165           65            L              5.66
168           62            L              7.07
168           63            L              7.28
168           66            L              8.60
170           63            L              9.22
170           64            L              9.49
170           68            L              11.40
Distance   Size   Rank (sorted smallest to largest)
1.41       M      1
2.00       M      2
2.24       M      3
2.24       M      4
3.16       L      5
3.61       M      6
3.61       M      7
3.61       L      8
4.00       L      9
4.12       L      10
4.24       M      11
5.66       L      12
7.07       L      13
7.28       L      14
8.60       L      15
9.22       L      16
9.49       L      17
11.40      L      18
Since k = 5, look at the first 5 ranks and take their majority size. The
majority size of the first 5 is 'M', so Ravi's T-shirt size is 'M'.
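The worked example can be reproduced with a short sketch in plain Python; the
function and variable names here are mine, not from the notes.
```python
import math
from collections import Counter

data = [  # (height, weight, T-shirt size) from the table above
    (158, 58, "M"), (158, 59, "M"), (158, 63, "M"), (160, 59, "M"),
    (160, 60, "M"), (163, 60, "M"), (163, 61, "M"), (160, 64, "L"),
    (163, 64, "L"), (165, 61, "L"), (165, 62, "L"), (165, 65, "L"),
    (168, 62, "L"), (168, 63, "L"), (168, 66, "L"), (170, 63, "L"),
    (170, 64, "L"), (170, 68, "L"),
]

def knn_predict(height, weight, k=5):
    # Rank all points by Euclidean distance, then take the majority
    # label among the k nearest neighbours.
    ranked = sorted((math.hypot(h - height, w - weight), size)
                    for h, w, size in data)
    labels = [size for _, size in ranked[:k]]
    return Counter(labels).most_common(1)[0][0]

print(knn_predict(161, 61))  # -> 'M', matching the table above
```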
How to Choose the Value of K in the K-NN Algorithm
There is no particular way of choosing the value of K, but here are some common
conventions to keep in mind (a cross-validation sketch follows the list):
• A very low value of K is sensitive to noise and will most likely lead to
inaccurate predictions.
• A commonly used value of K is 5.
• Use an odd value of K for two-class problems so that majority voting cannot
end in a tie.
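One common way to pick K in practice is cross-validation over candidate values.
A minimal sketch assuming scikit-learn; the toy data and candidate grid are
illustrative choices.
```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=200, random_state=0)

# Try odd values of K and keep the one with the best 5-fold CV score.
search = GridSearchCV(KNeighborsClassifier(),
                      {"n_neighbors": [1, 3, 5, 7, 9, 11]}, cv=5)
search.fit(X, y)
print("best K:", search.best_params_["n_neighbors"])
```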
K-NN vs K-Means
K-NN is a supervised algorithm that classifies a new point by the majority
label of its K nearest labelled neighbours; K-Means is an unsupervised
clustering algorithm that partitions unlabelled data into K clusters around
centroids.
Exercise 1
Apply KNN classification on the following dataset and predict the quality of
Paper_5, which has Acid Durability = 3 and Strength = 7, for K = 3 (nearest
neighbours).
Data from a survey and objective testing with two attributes (Acid Durability
and Strength) can be used to classify whether the quality of a sample paper is
good or bad.
The table below shows the four training samples:
Paper     Acid Durability   Strength   Quality
Paper_1   7                 7          Bad
Paper_2   7                 4          Bad
Paper_3   3                 4          Good
Paper_4   1                 4          Good
Solution
Using the Euclidean distance from (3, 7):
Paper_1: √[(7 – 3)² + (7 – 7)²] = 4.00 (Bad)
Paper_2: √[(7 – 3)² + (4 – 7)²] = 5.00 (Bad)
Paper_3: √[(3 – 3)² + (4 – 7)²] = 3.00 (Good)
Paper_4: √[(1 – 3)² + (4 – 7)²] = 3.61 (Good)
The K = 3 nearest neighbours are Paper_3 (Good), Paper_4 (Good) and Paper_1
(Bad), so the predicted quality of Paper_5 is Good.
Exercise 2
The following table gives the brightness, saturation and class for some bulbs.
Given a new bulb with brightness (x2) = 20 and saturation (y2) = 35, find which
class it belongs to. Take k = 5.
BRIGHTNESS SATURATION CLASS
40 20 Red
50 50 Blue
60 90 Blue
10 25 Red
70 70 Blue
60 10 Red
25 80 Blue
Solution (k = 5), with distances sorted smallest to largest:
BRIGHTNESS   SATURATION   CLASS   DISTANCE
10           25           Red     14.14
40           20           Red     25.00
50           50           Blue    33.54
25           80           Blue    45.28
60           10           Red     47.17
70           70           Blue    61.03
60           90           Blue    68.01
Of the 5 nearest neighbours, three are Red and two are Blue, so the new bulb is
classified as Red.
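The distances and the final vote can be checked with a short sketch in plain
Python (variable names are mine):
```python
import math
from collections import Counter

bulbs = [(40, 20, "Red"), (50, 50, "Blue"), (60, 90, "Blue"),
         (10, 25, "Red"), (70, 70, "Blue"), (60, 10, "Red"),
         (25, 80, "Blue")]

# Distance of each bulb from the new point (brightness 20, saturation 35).
ranked = sorted((round(math.hypot(b - 20, s - 35), 2), c)
                for b, s, c in bulbs)
print(ranked)  # starts (14.14, 'Red'), (25.0, 'Red'), (33.54, 'Blue'), ...
print(Counter(c for _, c in ranked[:5]).most_common(1)[0][0])  # -> 'Red'
```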