Machine Learning
Chapter 2 - Supervised Learning
Outline
❑ Regression
❑ Classification
❑ KNN, Naïve Bayes, Logistic Regression, SVM
❑ Evaluating the performance of SL algorithms
Introduction
• Supervised learning is used whenever we want to predict a certain outcome
from a given input, and we have examples of input/output pairs. We build a
machine learning model from these input/output pairs, which comprise our
training set.
• It is a research field at the intersection of statistics, artificial intelligence, and
computer science and is also known as predictive analytics or statistical
learning.
• Our goal is to make accurate predictions for new, never-before-seen data.
• Supervised learning often requires human effort to build the training set, but
afterward it automates, and often speeds up, an otherwise laborious or
infeasible task.
Supervised Learning
Supervised learning tasks fall into two categories: Classification and Regression.
Types of Classification Algorithms
Classification algorithms can be divided into two main categories:
•Linear Models
• Logistic Regression
• Support Vector Machines
•Non-linear Models
• K-Nearest Neighbors
• Naïve Bayes
• Decision Tree Classification
• Random Forest Classification
Evaluating a Classification Model
1. Log Loss or Cross-Entropy Loss:
• It is used for evaluating the performance of a classifier whose output is
a probability value between 0 and 1.
• For a good binary classification model, the value of log loss should be
near 0.
2. Confusion Matrix:
• The confusion matrix provides a matrix/table as output that
describes the performance of the model.
• It is also known as the error matrix.
• The matrix summarizes the prediction results, showing the total
number of correct and incorrect predictions.
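As a quick illustration, here is a minimal sketch of both metrics, assuming scikit-learn (the slides do not name a library) and a toy set of labels and predicted probabilities:

```python
# A minimal sketch, assuming scikit-learn, of log loss and a confusion
# matrix for a small binary classification example.
from sklearn.metrics import log_loss, confusion_matrix

y_true = [0, 1, 1, 0, 1, 0]               # actual class labels
y_prob = [0.1, 0.9, 0.8, 0.3, 0.6, 0.2]   # predicted probability of class 1

# Log loss: close to 0 when the predicted probabilities match the true labels.
print("log loss:", log_loss(y_true, y_prob))

# The confusion matrix needs hard labels, so threshold the probabilities at 0.5.
y_pred = [1 if p >= 0.5 else 0 for p in y_prob]
print(confusion_matrix(y_true, y_pred))
# Rows are actual classes, columns are predicted classes:
# [[TN, FP],
#  [FN, TP]]
```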
k-Nearest Neighbors
• The k-NN algorithm is arguably the simplest machine learning
algorithm.
• Building the model consists only of storing the training dataset.
• To make a prediction for a new data point, the algorithm finds the
closest data points in the training dataset: its "nearest neighbors."
k-NN classification
• In its simplest version, the k-NN algorithm only considers exactly one
nearest neighbor, which is the closest training data point to the point
we want to make a prediction for.
Predictions made by the one-nearest-neighbor model on the forge dataset
• Instead of considering only the closest neighbor, we can also consider an
arbitrary number, k, of neighbors. When considering more than one neighbor,
we use voting to assign a label. This means that for each test point, we count
how many neighbors belong to class 0 and how many neighbors belong to
class 1. We then assign the class that is more frequent: in other words, the
majority class among the k-nearest neighbors.
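A minimal sketch of k-NN voting, assuming scikit-learn; make_blobs stands in here for the forge dataset shown on the previous slide:

```python
# A minimal sketch, assuming scikit-learn, of k-NN classification with voting.
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_blobs(n_samples=40, centers=2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# With n_neighbors=3, each test point is assigned the majority class
# among its 3 nearest training points.
clf = KNeighborsClassifier(n_neighbors=3)
clf.fit(X_train, y_train)   # "building the model" = storing the training set
print("test set predictions:", clf.predict(X_test))
print("test set accuracy:", clf.score(X_test, y_test))
```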
Naïve Bayes Classifier
• The Naïve Bayes algorithm is a supervised learning algorithm based on
Bayes' theorem and used for solving classification problems.
• It is mainly used in text classification that includes a high-dimensional
training dataset.
• The Naïve Bayes classifier is one of the simplest and most effective
classification algorithms; it helps build fast machine learning models
that can make quick predictions.
• It is a probabilistic classifier, which means it predicts on the basis of
the probability of an object.
• Some popular applications of the Naïve Bayes algorithm are spam
filtering, sentiment analysis, and classifying articles.
Naïve Bayes Classifier …Cont’d
Bayes' Theorem
• Bayes' theorem is also known as Bayes' Rule or Bayes' law, which is
used to determine the probability of a hypothesis with prior
knowledge. It depends on the conditional probability.
• The formula for Bayes' theorem is given as:
P(A|B) = P(B|A) * P(A) / P(B)
Where,
P(A|B) is Posterior probability: Probability of hypothesis A on the observed event B.
P(B|A) is Likelihood probability: Probability of the evidence given that hypothesis A is true.
P(A) is Prior Probability: Probability of hypothesis before observing the evidence.
P(B) is Marginal Probability: Probability of the evidence.
Working of the Naïve Bayes Classifier
Suppose we have a dataset of weather conditions and a corresponding
target variable "Play". Using this dataset, we need to decide whether we
should play or not on a particular day according to the weather
conditions. To solve this problem, we follow the steps below:
1. Convert the given dataset into frequency tables.
2. Generate a likelihood table by finding the probabilities of the given features.
3. Use Bayes' theorem to calculate the posterior probability.
Dataset
Problem: If the weather is sunny, should the player play or not?

 #    Outlook    Play
 0    Rainy      Yes
 1    Sunny      Yes
 2    Overcast   Yes
 3    Overcast   Yes
 4    Sunny      No
 5    Rainy      Yes
 6    Sunny      Yes
 7    Overcast   Yes
 8    Rainy      No
 9    Sunny      No
10    Sunny      Yes
11    Rainy      No
12    Overcast   Yes
13    Overcast   Yes
Frequency table for the weather conditions:

Weather     Yes   No
Overcast      5    0
Rainy         2    2
Sunny         3    2
Total        10    4
Likelihood table of the weather conditions:

Weather     Count   P(Weather)
Overcast      5     5/14 = 0.35
Rainy         4     4/14 = 0.29
Sunny         5     5/14 = 0.35

P(Yes) = 10/14 = 0.71,  P(No) = 4/14 = 0.29
Applying Bayes' theorem

P(Yes|Sunny) = P(Sunny|Yes) * P(Yes) / P(Sunny)
P(Sunny|Yes) = 3/10 = 0.3
P(Yes) = 10/14 = 0.71
P(Sunny) = 5/14 = 0.35
So P(Yes|Sunny) = 0.3 * 0.71 / 0.35 = 0.60

P(No|Sunny) = P(Sunny|No) * P(No) / P(Sunny)
P(Sunny|No) = 2/4 = 0.5
P(No) = 4/14 = 0.29
So P(No|Sunny) = 0.5 * 0.29 / 0.35 = 0.41

Since P(Yes|Sunny) > P(No|Sunny), the player should play on a sunny day.
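The same decision can be reproduced in code. Below is a minimal sketch assuming scikit-learn's CategoricalNB and an integer encoding of the outlook values (0 = Overcast, 1 = Rainy, 2 = Sunny); neither the library nor the encoding is prescribed by the slides.

```python
# A minimal sketch, assuming scikit-learn; encoding: 0=Overcast, 1=Rainy, 2=Sunny.
import numpy as np
from sklearn.naive_bayes import CategoricalNB

# The 14-row weather dataset from the slides, encoded as integers.
outlook = np.array([[1], [2], [0], [0], [2], [1], [2],
                    [0], [1], [2], [2], [1], [0], [0]])
play = np.array([1, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1])  # 1=Yes, 0=No

# alpha is set near zero to disable smoothing and match the hand calculation.
clf = CategoricalNB(alpha=1e-10)
clf.fit(outlook, play)

print(clf.predict([[2]]))        # -> [1]: play on a sunny day
print(clf.predict_proba([[2]]))  # -> approx. [[0.40, 0.60]]
```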
Support Vector Machine (SVM)
• The goal of the SVM algorithm is to create the best line or decision
boundary that can separate n-dimensional space into classes so that we
can easily put the new data point in the correct category in the future.
This best decision boundary is called a hyperplane.
How does SVM work?
• The SVM algorithm finds the best line or decision boundary; this best
boundary or region is called a hyperplane. The algorithm finds the
points of each class that lie closest to the boundary. These points are
called support vectors. The distance between the support vectors and
the hyperplane is called the margin, and the goal of SVM is to
maximize this margin. The hyperplane with the maximum margin is
called the optimal hyperplane.
Support Vector Machine …Cont’d
SVM can be of two types (see the sketch below):
1. Linear SVM: Linear SVM is used for linearly separable data. If a dataset
can be classified into two classes using a single straight line, the data is
termed linearly separable, and the classifier used is called a Linear SVM
classifier.
2. Non-linear SVM: Non-linear SVM is used for non-linearly separable
data. If a dataset cannot be classified using a straight line, the data is
termed non-linear, and the classifier used is called a Non-linear SVM
classifier.
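A minimal sketch of this distinction, assuming scikit-learn; make_circles produces data that no straight line can separate, so the linear kernel struggles while the RBF kernel does not:

```python
# A minimal sketch, assuming scikit-learn, contrasting linear and
# non-linear (kernel) SVMs on circularly separated data.
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_circles(n_samples=200, factor=0.3, noise=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for kernel in ("linear", "rbf"):
    clf = SVC(kernel=kernel).fit(X_train, y_train)
    print(kernel, "accuracy:", clf.score(X_test, y_test))
# The RBF kernel typically scores much higher here, since the classes
# are separated by a circle, not a straight line.
```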
Regression
Classification and Regression
• An easy way to distinguish between classification and regression tasks
is to ask whether there is some kind of continuity in the output. If there
is continuity between possible outcomes, then the problem is a
regression problem.
• By contrast, for the task of recognizing the language of a website (which
is a classification problem), there is no matter of degree. A website is in
one language, or it is in another. There is no continuity between
languages, and there is no language that is between English and French.
Linear Regression
• Linear regression is a supervised machine learning algorithm that
models a linear relationship between a dependent variable and one or
more independent variables.
• The equation for a simple linear regression is Y = b0 + b1X + e, where Y
is the dependent variable, X is the independent variable, b0 is the
intercept, b1 is the slope, and e is the error term.
• Linear regression is used to predict numeric values and can be used for
both simple and multiple regression problems.
• Applications of linear regression include predicting sales, demand,
revenue, stock prices, and housing prices, among others. It is widely
used in business, economics, finance, and social sciences.
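A minimal sketch of simple linear regression, assuming scikit-learn and synthetic data generated from known coefficients so the fit can be checked against them:

```python
# A minimal sketch, assuming scikit-learn, fitting Y = b0 + b1*X + e.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(50, 1))             # one independent variable
y = 3.0 + 2.0 * X[:, 0] + rng.normal(0, 1, 50)   # b0=3, b1=2, plus noise e

model = LinearRegression().fit(X, y)
print("intercept b0:", model.intercept_)   # close to 3
print("slope b1:", model.coef_[0])         # close to 2
print("prediction at X=5:", model.predict([[5.0]]))
```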
Polynomial Regression
• Polynomial regression is a regression algorithm that models the
relationship between a dependent variable (y) and an independent
variable (x) as an nth-degree polynomial.
• The polynomial regression equation is given below:
y = b0 + b1*x + b2*x^2 + b3*x^3 + ... + bn*x^n
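A minimal sketch of this idea, assuming scikit-learn: expand x into polynomial features, then fit an ordinary linear model on them.

```python
# A minimal sketch, assuming scikit-learn, of degree-2 polynomial regression.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(60, 1))
y = 1.0 - 2.0 * X[:, 0] + 0.5 * X[:, 0] ** 2 + rng.normal(0, 0.3, 60)

# PolynomialFeatures turns x into [1, x, x^2]; LinearRegression fits b0..b2.
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(X, y)
print("prediction at x=2:", model.predict([[2.0]]))  # close to 1 - 4 + 2 = -1
```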
Introduction to AI
"Artificial intelligence is the future and the future
is here.” Dave Waters
31