Machine Learning
Chapter 2 - Supervised Learning
Outline
❑ Regression
❑ Classification
❑ KNN, Naïve Bayes, Logistic Regression, SVM
❑ Evaluating the performance of SL algorithms
Introduction
• Supervised learning is used whenever we want to predict a certain outcome
from a given input, and we have examples of input/output pairs. We build a
machine learning model from these input/output pairs, which comprise our
training set.
• It is a research field at the intersection of statistics, artificial intelligence, and
computer science and is also known as predictive analytics or statistical
learning.
• Our goal is to make accurate predictions for new, never-before-seen data.
• Supervised learning often requires human effort to build the training set, but
afterward it automates, and often speeds up, an otherwise laborious or
infeasible task.
Supervised Learning
Supervised learning tasks fall into two categories: Classification and Regression.
Types of Classification Algorithms
Classification algorithms can be divided into two main categories:
•Linear Models
• Logistic Regression
• Support Vector Machines
•Non-linear Models
• K-Nearest Neighbors
• Naïve Bayes
• Decision Tree Classification
• Random Forest Classification
Evaluating a Classification Model
1. Log Loss or Cross-Entropy Loss:
• It is used for evaluating the performance of a classifier whose output is
a probability value between 0 and 1.
• For a good binary classification model, the value of log loss should be
near 0.
2. Confusion Matrix:
• The confusion matrix provides a matrix/table as output that
describes the performance of the model.
• It is also known as the error matrix.
• The matrix summarizes the prediction results, showing the total
number of correct and incorrect predictions.
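As a quick illustration, here is a minimal sketch of both metrics, assuming scikit-learn (the slides do not name a library) and a toy set of labels and predicted probabilities:

```python
# A minimal sketch, assuming scikit-learn, of log loss and a confusion
# matrix for a small binary classification example.
from sklearn.metrics import log_loss, confusion_matrix

y_true = [0, 1, 1, 0, 1, 0]               # actual class labels
y_prob = [0.1, 0.9, 0.8, 0.3, 0.6, 0.2]   # predicted probability of class 1

# Log loss: close to 0 when the predicted probabilities match the true labels.
print("log loss:", log_loss(y_true, y_prob))

# The confusion matrix needs hard labels, so threshold the probabilities at 0.5.
y_pred = [1 if p >= 0.5 else 0 for p in y_prob]
print(confusion_matrix(y_true, y_pred))
# Rows are actual classes, columns are predicted classes:
# [[TN, FP],
#  [FN, TP]]
```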
k-Nearest Neighbors
• The k-NN algorithm is arguably the simplest machine learning
algorithm.
• Building the model consists only of storing the training dataset.
• To make a prediction for a new data point, the algorithm finds the
closest data points in the training dataset: its "nearest neighbors."
k-NN classification
• In its simplest version, the k-NN algorithm only considers exactly one
nearest neighbor, which is the closest training data point to the point
we want to make a prediction for.
Predictions made by the one-nearest-neighbor model on the forge dataset
• Instead of considering only the closest neighbor, we can also consider an
arbitrary number, k, of neighbors. When considering more than one neighbor,
we use voting to assign a label. This means that for each test point, we count
how many neighbors belong to class 0 and how many neighbors belong to
class 1. We then assign the class that is more frequent: in other words, the
majority class among the k-nearest neighbors.
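A minimal sketch of k-NN voting, assuming scikit-learn; make_blobs stands in here for the forge dataset shown on the previous slide:

```python
# A minimal sketch, assuming scikit-learn, of k-NN classification with voting.
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_blobs(n_samples=40, centers=2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# With n_neighbors=3, each test point is assigned the majority class
# among its 3 nearest training points.
clf = KNeighborsClassifier(n_neighbors=3)
clf.fit(X_train, y_train)   # "building the model" = storing the training set
print("test set predictions:", clf.predict(X_test))
print("test set accuracy:", clf.score(X_test, y_test))
```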
Naïve Bayes Classifier
• The Naïve Bayes algorithm is a supervised learning algorithm based on
Bayes' theorem and used for solving classification problems.
• It is mainly used in text classification that includes a high-dimensional
training dataset.
• The Naïve Bayes classifier is one of the simplest and most effective
classification algorithms; it helps build fast machine learning models
that can make quick predictions.
• It is a probabilistic classifier, which means it predicts on the basis of
the probability of an object.
• Some popular applications of the Naïve Bayes algorithm are spam
filtering, sentiment analysis, and classifying articles.
Naïve Bayes Classifier …Cont’d
Bayes' Theorem
• Bayes' theorem is also known as Bayes' Rule or Bayes' law, which is
used to determine the probability of a hypothesis with prior
knowledge. It depends on the conditional probability.
• The formula for Bayes' theorem is given as:
P(A|B) = P(B|A) * P(A) / P(B)
Where,
P(A|B) is Posterior probability: Probability of hypothesis A on the observed event B.
P(B|A) is Likelihood probability: Probability of the evidence given that hypothesis A is true.
P(A) is Prior Probability: Probability of hypothesis before observing the evidence.
P(B) is Marginal Probability: Probability of the evidence.
Working of the Naïve Bayes Classifier
Suppose we have a dataset of weather conditions and a corresponding
target variable "Play". Using this dataset, we need to decide whether we
should play or not on a particular day according to the weather
conditions. To solve this problem, we follow the steps below:
1. Convert the given dataset into frequency tables.
2. Generate a likelihood table by finding the probabilities of the given features.
3. Use Bayes' theorem to calculate the posterior probability.
Dataset
Problem: If the weather is sunny, should the player play or not?

 #    Outlook    Play
 0    Rainy      Yes
 1    Sunny      Yes
 2    Overcast   Yes
 3    Overcast   Yes
 4    Sunny      No
 5    Rainy      Yes
 6    Sunny      Yes
 7    Overcast   Yes
 8    Rainy      No
 9    Sunny      No
10    Sunny      Yes
11    Rainy      No
12    Overcast   Yes
13    Overcast   Yes
Frequency table for the weather conditions:

Weather     Yes   No
Overcast      5    0
Rainy         2    2
Sunny         3    2
Total        10    4
Likelihood table of the weather conditions:

Weather     Count   P(Weather)
Overcast      5     5/14 = 0.35
Rainy         4     4/14 = 0.29
Sunny         5     5/14 = 0.35

P(Yes) = 10/14 = 0.71,  P(No) = 4/14 = 0.29
Applying Bayes' theorem

P(Yes|Sunny) = P(Sunny|Yes) * P(Yes) / P(Sunny)
P(Sunny|Yes) = 3/10 = 0.3
P(Yes) = 10/14 = 0.71
P(Sunny) = 5/14 = 0.35
So P(Yes|Sunny) = 0.3 * 0.71 / 0.35 = 0.60

P(No|Sunny) = P(Sunny|No) * P(No) / P(Sunny)
P(Sunny|No) = 2/4 = 0.5
P(No) = 4/14 = 0.29
So P(No|Sunny) = 0.5 * 0.29 / 0.35 = 0.41

Since P(Yes|Sunny) > P(No|Sunny), the player should play on a sunny day.
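The same decision can be reproduced in code. Below is a minimal sketch assuming scikit-learn's CategoricalNB and an integer encoding of the outlook values (0 = Overcast, 1 = Rainy, 2 = Sunny); neither the library nor the encoding is prescribed by the slides.

```python
# A minimal sketch, assuming scikit-learn; encoding: 0=Overcast, 1=Rainy, 2=Sunny.
import numpy as np
from sklearn.naive_bayes import CategoricalNB

# The 14-row weather dataset from the slides, encoded as integers.
outlook = np.array([[1], [2], [0], [0], [2], [1], [2],
                    [0], [1], [2], [2], [1], [0], [0]])
play = np.array([1, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1])  # 1=Yes, 0=No

# alpha is set near zero to disable smoothing and match the hand calculation.
clf = CategoricalNB(alpha=1e-10)
clf.fit(outlook, play)

print(clf.predict([[2]]))        # -> [1]: play on a sunny day
print(clf.predict_proba([[2]]))  # -> approx. [[0.40, 0.60]]
```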
Support Vector Machine (SVM)
• The goal of the SVM algorithm is to create the best line or decision
boundary that can separate n-dimensional space into classes so that we
can easily put the new data point in the correct category in the future.
This best decision boundary is called a hyperplane.
How does SVM work?
• The SVM algorithm finds the best line or decision boundary; this best
boundary or region is called a hyperplane. The algorithm finds the
points of each class that lie closest to the boundary. These points are
called support vectors. The distance between the support vectors and
the hyperplane is called the margin, and the goal of SVM is to
maximize this margin. The hyperplane with the maximum margin is
called the optimal hyperplane.
Support Vector Machine …Cont’d
SVM can be of two types (see the sketch below):
1. Linear SVM: Linear SVM is used for linearly separable data. If a dataset
can be classified into two classes using a single straight line, the data is
termed linearly separable, and the classifier used is called a Linear SVM
classifier.
2. Non-linear SVM: Non-linear SVM is used for non-linearly separable
data. If a dataset cannot be classified using a straight line, the data is
termed non-linear, and the classifier used is called a Non-linear SVM
classifier.
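A minimal sketch of this distinction, assuming scikit-learn; make_circles produces data that no straight line can separate, so the linear kernel struggles while the RBF kernel does not:

```python
# A minimal sketch, assuming scikit-learn, contrasting linear and
# non-linear (kernel) SVMs on circularly separated data.
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_circles(n_samples=200, factor=0.3, noise=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for kernel in ("linear", "rbf"):
    clf = SVC(kernel=kernel).fit(X_train, y_train)
    print(kernel, "accuracy:", clf.score(X_test, y_test))
# The RBF kernel typically scores much higher here, since the classes
# are separated by a circle, not a straight line.
```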
Regression
Classification and Regression
• An easy way to distinguish between classification and regression tasks
is to ask whether there is some kind of continuity in the output. If there
is continuity between possible outcomes, then the problem is a
regression problem.
• By contrast, for the task of recognizing the language of a website (which
is a classification problem), there is no matter of degree. A website is in
one language, or it is in another. There is no continuity between
languages, and there is no language that is between English and French.
Linear Regression
• Linear regression is a supervised machine learning algorithm that
models a linear relationship between a dependent variable and one or
more independent variables.
• The equation for a simple linear regression is Y = b0 + b1X + e, where Y
is the dependent variable, X is the independent variable, b0 is the
intercept, b1 is the slope, and e is the error term.
• Linear regression is used to predict numeric values and can be used for
both simple and multiple regression problems.
• Applications of linear regression include predicting sales, demand,
revenue, stock prices, and housing prices, among others. It is widely
used in business, economics, finance, and social sciences.
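A minimal sketch of simple linear regression, assuming scikit-learn and synthetic data generated from known coefficients so the fit can be checked against them:

```python
# A minimal sketch, assuming scikit-learn, fitting Y = b0 + b1*X + e.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(50, 1))             # one independent variable
y = 3.0 + 2.0 * X[:, 0] + rng.normal(0, 1, 50)   # b0=3, b1=2, plus noise e

model = LinearRegression().fit(X, y)
print("intercept b0:", model.intercept_)   # close to 3
print("slope b1:", model.coef_[0])         # close to 2
print("prediction at X=5:", model.predict([[5.0]]))
```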
Polynomial Regression
• Polynomial regression is a regression algorithm that models the
relationship between a dependent variable (y) and an independent
variable (x) as an nth-degree polynomial.
• The polynomial regression equation is given below:
y = b0 + b1*x + b2*x^2 + b3*x^3 + ... + bn*x^n
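A minimal sketch of this idea, assuming scikit-learn: expand x into polynomial features, then fit an ordinary linear model on them.

```python
# A minimal sketch, assuming scikit-learn, of degree-2 polynomial regression.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(60, 1))
y = 1.0 - 2.0 * X[:, 0] + 0.5 * X[:, 0] ** 2 + rng.normal(0, 0.3, 60)

# PolynomialFeatures turns x into [1, x, x^2]; LinearRegression fits b0..b2.
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(X, y)
print("prediction at x=2:", model.predict([[2.0]]))  # close to 1 - 4 + 2 = -1
```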
Introduction to AI
"Artificial intelligence is the future and the future
is here.” Dave Waters
31