UNIT3_class

The document outlines key concepts in machine learning, focusing on model parameters and hyperparameters, as well as ensemble learning techniques such as bagging, boosting, and stacking. It explains how these methods improve model accuracy by combining weak learners and addresses instance-based learning with a detailed explanation of the K-nearest neighbors (KNN) algorithm. Additionally, it provides examples and exercises to illustrate the application of KNN in classification tasks.

Parameters for Machine Learning

1. Model Parameters
2. Model Hyperparameters

• Model Parameters: configuration variables that are internal to the model; the model learns them on its own.
o They are used by the model for making predictions.
o They are learned by the model from the data itself.
o They are usually not set manually.
o They are part of the model and key to a machine learning algorithm.

e.g.,
• weights or coefficients of the independent variables in a linear regression model,
• weights or coefficients of the independent variables in an SVM,
• weights and biases of a neural network,
• cluster centroids in clustering.
Hyperparameters
• Hyperparameters are parameters that are explicitly defined by the user to control the learning process. Some key points for model hyperparameters are as follows:
o They are usually set manually by the machine learning engineer.
o One cannot know the exact best value of a hyperparameter for a given problem in advance. The best value is determined either by rules of thumb or by trial and error (a sketch contrasting parameters and hyperparameters follows).
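To make the distinction concrete, here is a minimal sketch, assuming Python with scikit-learn (the dataset and the choice of ridge regression are illustrative, not part of the slides): the regularisation strength alpha is a hyperparameter we choose before training, while the fitted coefficients and intercept are model parameters learned from the data.

```python
# A minimal sketch (assuming scikit-learn) contrasting hyperparameters,
# which the user sets before training, with model parameters, which are learned.
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

X, y = make_regression(n_samples=100, n_features=3, noise=0.1, random_state=0)

# Hyperparameter: the regularisation strength alpha is chosen by the user.
model = Ridge(alpha=1.0)
model.fit(X, y)

# Model parameters: the coefficients and intercept are learned from the data.
print("learned coefficients:", model.coef_)
print("learned intercept:", model.intercept_)
```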

Bias and variance
Ensemble learning
Ensemble learning is a machine learning paradigm where multiple models (often
called “weak learners”) are trained to solve the same problem and combined to get
better results. The main hypothesis is that when weak models are correctly combined
we can obtain more accurate and/or robust models.

Ensemble learning tries to balance the bias-variance trade-off by reducing either the bias or the variance.
Ensemble learning improves a model's performance in mainly three ways:
• By reducing the variance of weak learners
• By reducing the bias of weak learners
• By improving the overall accuracy of strong learners
 Bagging is used to reduce the variance of weak learners.
 Boosting is used to reduce the bias of weak learners.
 Stacking is used to improve the overall accuracy of strong learners.
Ensemble learning…
By suitably combining multiple base learners, accuracy can be improved.
How the learners can be different:
1. Different algorithms make different assumptions about the data and lead to different classifiers.
2. The same learning algorithm (or different ones) can be used with different hyperparameters.
3. Separate base learners may use different representations of the same input object or event, making it possible to integrate different types of sensors/measurements/modalities.
4. Another possibility is to train different base learners on different subsets of the training set.
Model combination schemes
• Bagging often considers homogeneous weak learners, learns them independently from each other in parallel, and combines them following some kind of deterministic averaging process.
• Boosting often considers homogeneous weak learners, learns them sequentially in a very adaptive way (a base model depends on the previous ones), and combines them following a deterministic strategy.
• Stacking often considers heterogeneous weak learners, learns them in parallel, and combines them by training a meta-model to output a prediction based on the different weak models' predictions.
Bagging
• Bootstrap Aggregating.
• We use bagging for combining weak learners of high variance.
• These weak learners are homogeneous (of the same type).
• It consists of two steps: 1. bootstrapping and 2. aggregation.

Bootstrapping
• Involves resampling subsets of data with replacement from an initial dataset.
• In other words, subsets of data are taken from the initial dataset.
• These subsets of data are called bootstrapped datasets or, simply, bootstraps.
• Resampled ‘with replacement’ means an individual data point can be sampled
multiple times.
• Each bootstrap dataset is used to train a weak learner.
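As an illustration, a small sketch of drawing a single bootstrap sample, assuming NumPy (the toy dataset, seed, and variable names are assumptions for the example):

```python
# A small illustrative sketch (assuming NumPy) of one bootstrap sample:
# indices are drawn with replacement, so some points repeat and
# others are left out ("out-of-bag").
import numpy as np

rng = np.random.default_rng(seed=42)
data = np.arange(10)                        # a toy dataset of 10 instances

boot_idx = rng.choice(len(data), size=len(data), replace=True)
bootstrap_sample = data[boot_idx]           # "in-the-bag" points (may repeat)
out_of_bag = np.setdiff1d(data, bootstrap_sample)  # points never sampled

print("bootstrap sample:", bootstrap_sample)
print("out-of-bag points:", out_of_bag)
```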
The steps of bagging are as follows (a short code sketch is given after the steps):
1. We have an initial training dataset containing n instances.
2. We create m subsets of data from the training set. For each subset we take a sample of n points from the initial dataset, sampled with replacement. This means that a specific data point can be sampled more than once.
3. For each subset of data, we train the corresponding weak learner independently. These models are homogeneous, meaning that they are of the same type.
4. Each model makes a prediction.
5. The predictions are aggregated into a single prediction. For this, either max voting or averaging is used.
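The steps above can be sketched in a few lines of Python, assuming NumPy and scikit-learn; the decision trees, the subset count m, and the toy dataset are illustrative choices, not part of the original slides.

```python
# A minimal bagging sketch (assuming NumPy and scikit-learn) mirroring the
# steps above: bootstrap subsets, independent homogeneous learners, and
# max voting to aggregate the predictions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, random_state=0)
rng = np.random.default_rng(seed=0)

m = 10                          # number of bootstrap subsets / weak learners
learners = []
for _ in range(m):
    idx = rng.choice(len(X), size=len(X), replace=True)   # bootstrap sample
    tree = DecisionTreeClassifier().fit(X[idx], y[idx])   # independent learner
    learners.append(tree)

# Aggregate by max voting: each learner predicts, the majority class wins.
votes = np.array([t.predict(X) for t in learners])         # shape (m, n_samples)
bagged_pred = np.apply_along_axis(
    lambda col: np.bincount(col).argmax(), axis=0, arr=votes)
print("bagged accuracy on the training data:", (bagged_pred == y).mean())
```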
• When bootstrap aggregating is performed, two independent sets are created:
o One set, the bootstrap sample, is the data chosen to be "in-the-bag" by sampling with replacement.
o The out-of-bag set is all data not chosen in the sampling process.
Boosting
• We use boosting for combining weak learners with high bias.
• Boosting aims to produce a model with a lower bias than that of the individual models.
• As in bagging, the weak learners are homogeneous.
• Boosting aggregates the results at each step, using weighted averaging.
• Boosting involves sequentially training weak learners.
• Each subsequent learner improves on the errors of the previous learners in the sequence.
• A sample of data is first taken from the initial dataset. This sample is used to train the first model, and the model makes its prediction.
• The samples can be either correctly or incorrectly predicted.
• The samples that are wrongly predicted are reused for training the next model. In this way, subsequent models can improve on the errors of previous models.
The boosting procedure, step by step (an illustrative sketch follows the steps):
1. We sample m subsets from an initial training dataset.
2. Using the first subset, we train the first weak learner.
3. We test the trained weak learner using the training data. As a result of the testing, some data points will be incorrectly predicted.
4. Each data point with a wrong prediction is sent into the second subset of data, and this subset is updated.
5. Using this updated subset, we train and test the second weak learner.
6. We continue with the following subsets until the total number of subsets is reached.
7. We now have the total prediction. The overall prediction has already been aggregated at each step, so there is no need to calculate it separately.
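As a hedged illustration of this sequential scheme, the sketch below uses scikit-learn's AdaBoostClassifier (one of the boosting types listed later); the toy dataset and settings are assumptions for the example.

```python
# A short boosting sketch (assuming scikit-learn) with AdaBoost: shallow
# trees are trained sequentially, each focusing on the examples the
# previous ones got wrong, and their weighted votes form the final prediction.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

booster = AdaBoostClassifier(n_estimators=50, random_state=0)
booster.fit(X_train, y_train)
print("AdaBoost test accuracy:", booster.score(X_test, y_test))
```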
Boosting vs Bagging
1. Both are ensemble methods that obtain N learners from 1 learner.
2. Both generate several training datasets by random sampling.
3. Both make the final decision by combining the N learners, either by averaging or by majority voting.
4. Both provide higher stability than a single learner.
Bagging | Boosting
The simplest way of combining predictions that belong to the same type. | A way of combining predictions that belong to different types.
Aims to decrease variance, not bias. | Aims to decrease bias, not variance.
Each model receives equal weight. | Models are weighted according to their performance.
Each model is built independently. | New models are influenced by the performance of previously built models.
Different training data subsets are selected by row sampling with replacement from the entire training dataset. | Every new subset contains the elements that were misclassified by previous models.
If the classifier is unstable (high variance), apply bagging. | If the classifier is stable and simple (high bias), apply boosting.
Base classifiers are trained in parallel. | Base classifiers are trained sequentially.
Example: the Random Forest model uses bagging. | Example: AdaBoost uses boosting.

Types Of Boosting

Adaptive Boosting or AdaBoost

Gradient Boosting

XGBoost
Stacking

• Stacking often considers heterogeneous weak learners,
• learns them in parallel, and
• combines them by training a meta-learner to output a prediction based on the different weak learners' predictions.
• The meta-learner takes the base learners' predictions as input features, with the target being the ground-truth values of the data D.
• It attempts to learn how to best combine the input predictions to make a better output prediction.
Stacking…
Example of stacking
How does stacking work?

1. We split the training data into K-folds just like


K-fold cross-validation.
2. A base model is fitted on the K-1 parts and
predictions are made for Kth part.
3. We do for each part of the training data.
4. The base model is then fitted on the whole
train data set to calculate its performance on the test
set.
5. We repeat the last 3 steps for other base
models.
6. Predictions from the train set are used as
features for the second level model.
7. Second level model is used to make a
prediction on the test set.
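A minimal sketch of this procedure, assuming scikit-learn's StackingClassifier; the particular base learners, meta-learner, and toy dataset are illustrative assumptions.

```python
# A minimal stacking sketch (assuming scikit-learn): heterogeneous base
# learners are trained in parallel and a logistic-regression meta-learner
# is fitted on their K-fold (out-of-fold) predictions.
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)

base_learners = [
    ("tree", DecisionTreeClassifier(random_state=0)),
    ("knn", KNeighborsClassifier(n_neighbors=5)),
]
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(),   # the meta-learner
    cv=5,                                   # K-fold predictions become meta-features
)
stack.fit(X, y)
print("training accuracy:", stack.score(X, y))
```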
Instance-based learning

• Also called memory-based learning or lazy learning (because these methods delay processing until a new instance must be classified).
• These are systems that learn the training examples by heart and then generalize to new instances based on some similarity measure.
• The hypotheses are built from the training instances themselves.
• Whenever a new query instance is encountered, the previously stored data is examined and a target function value is assigned to the new instance.
Some of the instance-based learning algorithms are :
K Nearest Neighbor (KNN)
Self-Organizing Map (SOM)
Learning Vector Quantization (LVQ)
Locally Weighted Learning (LWL)
Case-Based Reasoning
K-nearest neighbors (KNN)
• The K-nearest neighbors (KNN) algorithm is a type of supervised ML algorithm which can be used for both classification and regression predictive problems.
 Lazy learning algorithm − KNN is a lazy learning algorithm because it does not have a specialized training phase; it uses all the training data at classification time.
 Non-parametric learning algorithm − KNN is also a non-parametric learning algorithm because it does not assume anything about the underlying data.
Working of the KNN Algorithm
• Step 1 − For implementing any algorithm, we need a dataset. So during the first step of KNN, we must load the training as well as the test data.
• Step 2 − Next, we need to choose the value of K, i.e. the number of nearest data points to consider. K can be any integer.
• Step 3 − For each point in the test data, do the following:
 3.1 − Calculate the distance between the test point and each row of the training data using a distance measure such as Euclidean, Manhattan, or Hamming distance. Euclidean distance is the most commonly used.
 3.2 − Based on the distance values, sort the rows in ascending order.
 3.3 − Choose the top K rows from the sorted array.
 3.4 − Assign a class to the test point based on the most frequent class among these rows.
• Step 4 − End (a short from-scratch sketch of these steps is given below).
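The four steps of Step 3 can be written directly as a short from-scratch sketch, assuming NumPy; the function name and toy data are illustrative, not part of the slides.

```python
# A from-scratch KNN sketch (assuming NumPy): compute Euclidean distances,
# sort them, take the top-K rows, and assign the most frequent class
# among those neighbours.
from collections import Counter
import numpy as np

def knn_predict(X_train, y_train, x_query, k=3):
    # Step 3.1: Euclidean distance from the query to every training row.
    distances = np.sqrt(((X_train - x_query) ** 2).sum(axis=1))
    # Steps 3.2 and 3.3: sort ascending and keep the K nearest rows.
    nearest = np.argsort(distances)[:k]
    # Step 3.4: majority vote among the K nearest labels.
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Tiny illustrative usage with made-up points.
X_train = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 3.0], [6.0, 7.0]])
y_train = np.array(["A", "A", "A", "B"])
print(knn_predict(X_train, y_train, np.array([2.0, 2.5]), k=3))  # -> "A"
```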
KNN example

Suppose we have the height, weight, and T-shirt size of some customers, and we need to predict the T-shirt size of a new customer given only their height and weight. The data, including height, weight, and T-shirt size, is shown below. Assume K = 5.
Find the T-shirt size of a new customer named 'Ravi' who has height 161 cm and weight 61 kg.
The Euclidean distance formula is:
d = √[(x2 − x1)² + (y2 − y1)²]
where (x1, y1) and (x2, y2) are the coordinates of the two points and d is the distance between them.
Here the query point is Ravi: x1 = 161 cm, y1 = 61 kg.

Height (cm) | Weight (kg) | T-shirt size | Euclidean distance
158 | 58 | M | 4.24
158 | 59 | M | 3.61
158 | 63 | M | 3.61
160 | 59 | M | 2.24
160 | 60 | M | 1.41
163 | 60 | M | 2.24
163 | 61 | M | 2.00
160 | 64 | L | 3.16
163 | 64 | L | 3.61
165 | 61 | L | 4.00
165 | 62 | L | 4.12
165 | 65 | L | 5.66
168 | 62 | L | 7.07
168 | 63 | L | 7.28
168 | 66 | L | 8.60
170 | 63 | L | 9.22
170 | 64 | L | 9.49
170 | 68 | L | 11.40
Sorted by distance (smallest to largest):

Distance | Size | Rank
1.41 | M | 1
2.00 | M | 2
2.24 | M | 3
2.24 | M | 4
3.16 | L | 5
3.61 | M | 6
3.61 | M | 7
3.61 | L | 8
4.00 | L | 9
4.12 | L | 10
4.24 | M | 11
5.66 | L | 12
7.07 | L | 13
7.28 | L | 14
8.60 | L | 15
9.22 | L | 16
9.49 | L | 17
11.40 | L | 18

Since K = 5, look at the first 5 ranks and their majority size. The majority size among the first 5 is 'M', so Ravi's T-shirt size is 'M'.
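For completeness, a quick sketch (assuming scikit-learn, which is not used in the slides) that reproduces this worked example:

```python
# Fitting KNN with K=5 on the height/weight data and querying Ravi's
# measurements (161 cm, 61 kg) should return 'M', matching the manual
# calculation above.
from sklearn.neighbors import KNeighborsClassifier

X = [[158, 58], [158, 59], [158, 63], [160, 59], [160, 60], [163, 60],
     [163, 61], [160, 64], [163, 64], [165, 61], [165, 62], [165, 65],
     [168, 62], [168, 63], [168, 66], [170, 63], [170, 64], [170, 68]]
y = ["M", "M", "M", "M", "M", "M", "M", "L", "L",
     "L", "L", "L", "L", "L", "L", "L", "L", "L"]

knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X, y)
print(knn.predict([[161, 61]]))   # expected: ['M']
```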
How to Choose the Value of K in the K-NN Algorithm

There is no particular way of choosing the value of K, but here are some common conventions to keep in mind (a small cross-validation sketch follows):
• Choosing a very low value will most likely lead to inaccurate predictions.
• A commonly used value of K is 5.
• Prefer an odd value of K, which avoids ties in the majority vote.
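One common practical approach, sketched below under the assumption that scikit-learn and its Iris dataset are available, is to cross-validate a few odd values of K and keep the best-scoring one.

```python
# A sketch of choosing K by cross-validation (assuming scikit-learn).
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

scores = {}
for k in [1, 3, 5, 7, 9, 11]:                  # odd values avoid voting ties
    knn = KNeighborsClassifier(n_neighbors=k)
    scores[k] = cross_val_score(knn, X, y, cv=5).mean()

best_k = max(scores, key=scores.get)
print("cross-validated accuracy per K:", scores)
print("best K:", best_k)
```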
K-NN vs K-Means
Exercise 1
Apply KNN classification on the following dataset and predict the quality of Paper_5, which has Acid Durability = 3 and Strength = 7, for K = 3 (nearest neighbors).
The data from a survey and objective testing with two attributes (Acid Durability and Strength) can be used to classify whether the quality of a sample paper is good or bad.
The table below shows four training samples:

Sample Paper | Acid Durability | Strength | Quality
Paper_1 | 7 | 7 | Bad
Paper_2 | 7 | 4 | Bad
Paper_3 | 3 | 4 | Good
Paper_4 | 1 | 4 | Good
Solution
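A hedged solution sketch (assuming scikit-learn): the three nearest papers to (3, 7) are Paper_3 (distance 3, Good), Paper_4 (distance ≈ 3.61, Good) and Paper_1 (distance 4, Bad), so the majority vote predicts Good.

```python
# KNN with K=3 on the paper-quality data; the query is Paper_5 = (3, 7).
from sklearn.neighbors import KNeighborsClassifier

X = [[7, 7], [7, 4], [3, 4], [1, 4]]      # Acid Durability, Strength
y = ["Bad", "Bad", "Good", "Good"]

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X, y)
print(knn.predict([[3, 7]]))              # expected: ['Good']
```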
Exercise 2
The following table gives the brightness, saturation, and class for a set of bulbs. Given a new bulb with brightness (x2) = 20 and saturation (y2) = 35, find which class it belongs to. Take K = 5.
Brightness | Saturation | Class
40 | 20 | Red
50 | 50 | Blue
60 | 90 | Blue
10 | 25 | Red
70 | 70 | Blue
60 | 10 | Red
25 | 80 | Blue
Solution (K = 5), sorted by Euclidean distance to (20, 35):

Brightness | Saturation | Class | Distance
10 | 25 | Red | 14.14
40 | 20 | Red | 25.00
50 | 50 | Blue | 33.54
25 | 80 | Blue | 45.28
60 | 10 | Red | 47.17
70 | 70 | Blue | 61.03
60 | 90 | Blue | 68.01

Among the 5 nearest bulbs there are three Red and two Blue, so the new bulb is classified as Red.
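The same result can be checked with a short scikit-learn sketch (an illustrative verification, not part of the original slides):

```python
# KNN with K=5 on the bulb data; the query bulb is (brightness 20, saturation 35).
from sklearn.neighbors import KNeighborsClassifier

X = [[40, 20], [50, 50], [60, 90], [10, 25], [70, 70], [60, 10], [25, 80]]
y = ["Red", "Blue", "Blue", "Red", "Blue", "Red", "Blue"]

knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X, y)
print(knn.predict([[20, 35]]))            # expected: ['Red']
```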
