
Introduction to Machine Learning
Chapter 2 - Supervised learning

by Mintesinot Getachew (MSc.)


Outline
❑ Introduction
❑ Regression
❑ Classification
❑ KNN, Naïve Bayes, Logistic regression, SVM
❑ Evaluating the performance of supervised learning algorithms
Introduction
• Supervised learning is used whenever we want to predict a certain outcome from a given input, and we have examples of input/output pairs. We build a machine learning model from these input/output pairs, which comprise our training set.
• It is a research field at the intersection of statistics, artificial intelligence, and computer science, and is also known as predictive analytics or statistical learning.
• Our goal is to make accurate predictions for new, never-before-seen data.
• Supervised learning often requires human effort to build the training set, but afterward it automates, and often speeds up, an otherwise laborious or infeasible task.
Supervised Learning
The two major types of supervised machine learning problems:
• Classification: to predict a class label
• Regression: to predict a continuous number


In classification, the goal is to predict a class label, which is a choice from a predefined list of possibilities.

• There are two types of classification:
1. Binary classifier: if the classification problem has only two possible outcomes, it is called a binary classifier.
   Examples: YES or NO, MALE or FEMALE, SPAM or NOT SPAM, CAT or DOG, etc.
2. Multi-class classifier: if a classification problem has more than two outcomes, it is called a multi-class classifier.
   Examples: classification of types of animals, classification of types of music.
Types of Classification Algorithms
Classification algorithms can be divided into two main categories:
•Linear Models
• Logistic Regression
• Support Vector Machines
•Non-linear Models
• K-Nearest Neighbors
• Naïve Bayes
• Decision Tree Classification
• Random Forest Classification
Evaluating a Classification model
1. Log Loss or Cross-Entropy Loss:
• It is used for evaluating the performance of a classifier whose output is a probability value between 0 and 1.
• For a good binary classification model, the value of log loss should be close to 0.
2. Confusion Matrix:
• The confusion matrix provides a matrix/table as output that describes the performance of the model.
• It is also known as the error matrix.
• The matrix summarizes the prediction results, giving the total numbers of correct and incorrect predictions.
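A minimal sketch of how both metrics can be computed with scikit-learn; the true labels and predicted probabilities below are invented for illustration:

```python
from sklearn.metrics import log_loss, confusion_matrix

y_true = [1, 0, 1, 1, 0, 1]               # actual class labels
y_prob = [0.9, 0.2, 0.7, 0.6, 0.4, 0.8]   # predicted P(class = 1)
y_pred = [1 if p >= 0.5 else 0 for p in y_prob]  # threshold at 0.5

print(log_loss(y_true, y_prob))           # lower (closer to 0) is better
print(confusion_matrix(y_true, y_pred))   # rows: actual, columns: predicted
```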
k-Nearest Neighbors
• The k-NN algorithm is arguably the simplest machine learning
algorithm.
• Building the model consists only of storing the training dataset.
• To make a prediction for a new data point, the algorithm finds the closest data points in the training dataset, its "nearest neighbors."

k-NN classification
• In its simplest version, the k-NN algorithm only considers exactly one
nearest neighbor, which is the closest training data point to the point
we want to make a prediction for.

Predictions made by the one-nearest-neighbor model on the forge dataset
• Instead of considering only the closest neighbor, we can also consider an
arbitrary number, k, of neighbors. When considering more than one neighbor,
we use voting to assign a label. This means that for each test point, we count
how many neighbors belong to class 0 and how many neighbors belong to
class 1. We then assign the class that is more frequent: in other words, the
majority class among the k-nearest neighbors.

Predictions made by the three-nearest-neighbor model on the forge dataset
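As a brief sketch of this voting scheme, here is k-NN with scikit-learn's KNeighborsClassifier; the tiny two-class dataset is invented for illustration (the forge dataset itself ships with the mglearn package):

```python
from sklearn.neighbors import KNeighborsClassifier

X_train = [[1.0, 2.0], [1.5, 1.8], [5.0, 8.0], [6.0, 9.0], [1.2, 0.5]]
y_train = [0, 0, 1, 1, 0]

# k=3: each prediction is the majority class among the 3 nearest neighbors
clf = KNeighborsClassifier(n_neighbors=3)
clf.fit(X_train, y_train)
print(clf.predict([[1.1, 1.9], [5.5, 8.5]]))  # -> [0 1]
```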


Strengths, weaknesses, and parameters
There are two important parameters to the k-NN classifier:
1. The number of neighbors
2. How you measure distance between data points
• Using a small number of neighbors, such as three or five, often works well.
• The most common distance measure used in k-NN classification is the
Euclidean distance.

Naïve Bayes Classifier
• The Naïve Bayes algorithm is a supervised learning algorithm based on Bayes' theorem and used for solving classification problems.
• It is mainly used in text classification with high-dimensional training datasets.
• The Naïve Bayes classifier is one of the simplest and most effective classification algorithms; it helps build fast machine learning models that can make quick predictions.
• It is a probabilistic classifier, which means it predicts on the basis of the probability of an object.
• Some popular applications of the Naïve Bayes algorithm are spam filtering, sentiment analysis, and classifying articles.
Naïve Bayes Classifier …Cont’d

Bayes' Theorem
• Bayes' theorem is also known as Bayes' Rule or Bayes' law, which is
used to determine the probability of a hypothesis with prior
knowledge. It depends on the conditional probability.
• The formula for Bayes' theorem is given as:
P(A|B) = P(B|A) * P(A) / P(B)
Where,
P(A|B) is Posterior probability: Probability of hypothesis A on the observed event B.
P(B|A) is Likelihood probability: Probability of the evidence given that the hypothesis is true.
P(A) is Prior Probability: Probability of hypothesis before observing the evidence.
P(B) is Marginal Probability: Probability of the evidence.
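The rule translates directly into code; as a one-line sketch (the numbers are the sunny-day example worked out on the following slides):

```python
# Bayes' rule: P(A|B) = P(B|A) * P(A) / P(B)
def bayes(p_b_given_a, p_a, p_b):
    return p_b_given_a * p_a / p_b

print(bayes(0.3, 0.71, 0.35))  # P(Yes|Sunny), about 0.6
```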
Working of Naïve Bayes' Classifier
Suppose we have a dataset of weather conditions and a corresponding target variable "Play". Using this dataset, we need to decide whether we should play on a particular day according to the weather conditions. To solve this problem, we follow the steps below:
1. Convert the given dataset into frequency tables.
2. Generate Likelihood table by finding the probabilities of given features.
3. Now, use Bayes theorem to calculate the posterior probability.

Dataset
Problem: If the weather is sunny, should the player play or not?

     Outlook   Play
0    Rainy     Yes
1    Sunny     Yes
2    Overcast  Yes
3    Overcast  Yes
4    Sunny     No
5    Rainy     Yes
6    Sunny     Yes
7    Overcast  Yes
8    Rainy     No
9    Sunny     No
10   Sunny     Yes
11   Rainy     No
12   Overcast  Yes
13   Overcast  Yes
Frequency table for the weather conditions:

Weather    Yes   No
Overcast   5     0
Rainy      2     2
Sunny      3     2
Total      10    4
Likelihood table of weather conditions:

Weather    No            Yes           Likelihood
Overcast   0             5             5/14 = 0.35
Rainy      2             2             4/14 = 0.29
Sunny      2             3             5/14 = 0.35
All        4/14 = 0.29   10/14 = 0.71
Applying Bayes' theorem

P(Yes|Sunny) = P(Sunny|Yes) * P(Yes) / P(Sunny)
P(Sunny|Yes) = 3/10 = 0.3
P(Sunny) = 5/14 = 0.35
P(Yes) = 10/14 = 0.71
So P(Yes|Sunny) = 0.3 * 0.71 / 0.35 = 0.60

P(No|Sunny) = P(Sunny|No) * P(No) / P(Sunny)
P(Sunny|No) = 2/4 = 0.5
P(No) = 4/14 = 0.29
P(Sunny) = 5/14 = 0.35
So P(No|Sunny) = 0.5 * 0.29 / 0.35 = 0.41

As we can see from the above calculations, P(Yes|Sunny) > P(No|Sunny).
Hence, on a sunny day, the player can play the game.
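A plain-Python sketch reproducing this hand calculation (with unrounded fractions, P(No|Sunny) comes out as 0.40; the slide's 0.41 reflects rounding the intermediate values):

```python
outlook = ["Rainy", "Sunny", "Overcast", "Overcast", "Sunny", "Rainy", "Sunny",
           "Overcast", "Rainy", "Sunny", "Sunny", "Rainy", "Overcast", "Overcast"]
play    = ["Yes", "Yes", "Yes", "Yes", "No", "Yes", "Yes",
           "Yes", "No", "No", "Yes", "No", "Yes", "Yes"]

n = len(play)

def posterior(label, weather="Sunny"):
    # P(label | weather) = P(weather | label) * P(label) / P(weather)
    prior = play.count(label) / n                       # e.g. P(Yes) = 10/14
    likelihood = sum(1 for o, p in zip(outlook, play)
                     if o == weather and p == label) / play.count(label)
    evidence = outlook.count(weather) / n               # P(Sunny) = 5/14
    return likelihood * prior / evidence

print(posterior("Yes"))  # ~0.60
print(posterior("No"))   # ~0.40
```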
Logistic Regression
• Logistic regression is used for predicting the categorical dependent
variable using a given set of independent variables.
• Logistic regression predicts the output of a categorical dependent variable; therefore, the outcome must be a categorical or discrete value. It can be Yes or No, 0 or 1, True or False, etc., but instead of giving the exact values 0 and 1, it gives probabilistic values that lie between 0 and 1.
• Logistic regression is similar to linear regression except in how it is used: linear regression is used for solving regression problems, whereas logistic regression is used for solving classification problems.
Logistic Regression …Cont’d

• In logistic regression, instead of fitting a regression line, we fit an "S"-shaped logistic function, which predicts two maximum values (0 or 1).
• The curve from the logistic function indicates the likelihood of something, such as whether cells are cancerous or not, or whether a mouse is obese or not based on its weight.
• Logistic regression is a significant machine learning algorithm because it has the ability to provide probabilities and classify new data using continuous and discrete datasets.
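A minimal sketch with scikit-learn's LogisticRegression; the one-feature dataset (e.g., mouse weight vs. obese or not) is invented for illustration:

```python
from sklearn.linear_model import LogisticRegression

X = [[2.0], [2.5], [3.0], [3.5], [4.0], [4.5]]   # weights
y = [0, 0, 0, 1, 1, 1]                           # 0 = not obese, 1 = obese

model = LogisticRegression()
model.fit(X, y)
print(model.predict([[3.2]]))        # hard class label (0 or 1)
print(model.predict_proba([[3.2]]))  # probabilities between 0 and 1
```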
Support Vector Machine
• The Support Vector Machine (SVM) is used for classification as well as regression problems. However, it is primarily used for classification problems in machine learning.

• The goal of the SVM algorithm is to create the best line or decision
boundary that can separate n-dimensional space into classes so that we
can easily put the new data point in the correct category in the future.
This best decision boundary is called a hyperplane.

• SVM chooses the extreme points/vectors that help in creating the hyperplane. These extreme cases are called support vectors.
How does SVM work?
• Suppose we have a dataset that has two tags (green and blue), and the dataset has two features, x1 and x2. We want a classifier that can classify the pair (x1, x2) of coordinates as either green or blue.

• As it is a 2-D space, we can easily separate these two classes just by using a straight line. But there can be multiple lines that separate these classes.
How does SVM work? …Cont’d
• Hence, the SVM algorithm helps to find the best line or decision boundary; this best boundary or region is called a hyperplane. The SVM algorithm finds the closest points of the lines from both classes. These points are called support vectors. The distance between the vectors and the hyperplane is called the margin, and the goal of SVM is to maximize this margin. The hyperplane with the maximum margin is called the optimal hyperplane.
Support Vector Machine …Cont’d
SVM can be of two types:
1. Linear SVM: used for linearly separable data. If a dataset can be classified into two classes by using a single straight line, the data is termed linearly separable, and the classifier used is called a linear SVM classifier.
2. Non-linear SVM: used for non-linearly separable data. If a dataset cannot be classified by using a straight line, the data is termed non-linear, and the classifier used is called a non-linear SVM classifier.
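A short sketch of both variants using scikit-learn's SVC; the six toy points are invented for illustration:

```python
from sklearn.svm import SVC

X = [[1, 2], [2, 3], [3, 3], [6, 5], [7, 8], [8, 8]]
y = [0, 0, 0, 1, 1, 1]

linear_svm = SVC(kernel="linear").fit(X, y)   # straight-line boundary
rbf_svm = SVC(kernel="rbf").fit(X, y)         # non-linear boundary

print(linear_svm.predict([[4, 4]]))
print(linear_svm.support_vectors_)  # the extreme points defining the margin
```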
Regression

• For regression tasks, the goal is to predict a continuous number, or a floating-point number in programming terms (or a real number in mathematical terms).
• Predicting a person’s annual income from their education, their age, and
where they live is an example of a regression task.
• When predicting income, the predicted value is an amount, and can be
any number in a given range.

Classification and Regression
• An easy way to distinguish between classification and regression tasks
is to ask whether there is some kind of continuity in the output. If there
is continuity between possible outcomes, then the problem is a
regression problem.
• By contrast, for the task of recognizing the language of a website (which
is a classification problem), there is no matter of degree. A website is in
one language, or it is in another. There is no continuity between
languages, and there is no language that is between English and French.
Linear Regression
• Linear regression is a supervised machine learning algorithm that
models a linear relationship between a dependent variable and one or
more independent variables.
• The equation for a simple linear regression is Y = b0 + b1X + e, where Y
is the dependent variable, X is the independent variable, b0 is the
intercept, b1 is the slope, and e is the error term.
• Linear regression is used to predict numeric values and can be used for
both simple and multiple regression problems.
• Applications of linear regression include predicting sales, demand,
revenue, stock prices, and housing prices, among others. It is widely
used in business, economics, finance, and social sciences.
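A minimal sketch of simple linear regression with scikit-learn; the five data points are invented and follow roughly Y = 2X:

```python
from sklearn.linear_model import LinearRegression

X = [[1], [2], [3], [4], [5]]      # independent variable
y = [2.1, 4.0, 6.2, 7.9, 10.1]     # dependent variable

model = LinearRegression().fit(X, y)
print(model.intercept_, model.coef_)  # b0 and b1
print(model.predict([[6]]))           # predicted value for X = 6
```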
Polynomial Regression
• Polynomial regression is a regression algorithm that models the relationship between a dependent variable (y) and an independent variable (x) as an nth-degree polynomial.
• The polynomial regression equation is given below:
  y = b0 + b1*x + b2*x^2 + ... + bn*x^n
• It is a linear model with some modifications made in order to increase the accuracy.
• The dataset used for training in polynomial regression is non-linear in nature.
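A sketch of polynomial regression as a linear model over expanded features, using scikit-learn's PolynomialFeatures; the roughly quadratic data is invented:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

X = np.array([[1], [2], [3], [4], [5]])
y = np.array([1.2, 4.1, 9.3, 15.8, 25.2])    # roughly y = x^2

poly = PolynomialFeatures(degree=2)           # adds bias and x^2 columns
X_poly = poly.fit_transform(X)

model = LinearRegression().fit(X_poly, y)     # still linear in the b's
print(model.predict(poly.transform([[6]])))   # ~36
```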
Evaluating Regression models
• There are several metrics that are commonly used to evaluate the
performance of a regression model. Some of the most commonly used
regression evaluation metrics are:
1. R-squared (R2): measures the proportion of variance in the dependent
variable that is explained by the independent variables. The value of
R-squared ranges from 0 to 1, with a higher value indicating a better
fit.
2. Mean Squared Error (MSE): measures the average squared difference
between the predicted values and the actual values. The lower the
MSE, the better the fit.
3. Root Mean Squared Error (RMSE): the square root of MSE; it provides a measure of the average distance between the predicted values and the actual values. A lower RMSE indicates a better fit.
Evaluating Regression models …Cont’d

4. Mean Absolute Error (MAE): measures the average absolute difference between the predicted values and the actual values. A lower MAE indicates a better fit.
5. Adjusted R-squared: adjusts the R-squared value for the number of independent variables in the model. It penalizes the use of additional variables that do not improve the fit.
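A short sketch computing these metrics with scikit-learn; the actual and predicted values are invented, and since scikit-learn has no built-in adjusted R-squared, it is computed from the standard formula:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_true = [3.0, 5.0, 7.5, 9.0, 11.2]
y_pred = [2.8, 5.3, 7.0, 9.4, 10.9]

r2 = r2_score(y_true, y_pred)
mse = mean_squared_error(y_true, y_pred)

print(r2)                                   # closer to 1 is better
print(mse)                                  # lower is better
print(np.sqrt(mse))                         # RMSE
print(mean_absolute_error(y_true, y_pred))  # lower is better

# Adjusted R-squared, assuming p = 1 predictor for this toy example
n, p = len(y_true), 1
print(1 - (1 - r2) * (n - 1) / (n - p - 1))
```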

Introduction to AI
"Artificial intelligence is the future and the future
is here.” Dave Waters

31
