Unit 2 - Machine Learning

United College of Engineering and Research
Department of Computer Science and Engineering

Machine Learning Technique

Presented By
Dr. Anita
OUTLINE
• Regression
  - Linear Regression
  - Logistic Regression
• Bayes' Theorem
• Bayes Optimal Classifier
• Naive Bayes Classifier
• SVM
Regression
INTRODUCTION
• Regression analysis is a fundamental concept in the field of machine learning.
• It is a supervised learning technique.
• Regression is a technique used to model and analyze the relationship between a dependent variable and one or more independent variables.
• Linear Regression
• Linear regression is a supervised machine learning algorithm.
• It tries to find the best linear relationship that describes the given data.
• It assumes that there exists a linear relationship between a dependent variable and the independent variable(s).
• The value of the dependent variable in a linear regression model is continuous, i.e. a real number.
Representing the Linear Regression Model
• A linear regression model represents the linear relationship between a dependent variable and independent variable(s) via a sloped straight line.
• Simple Linear Regression
• In simple linear regression, the dependent variable depends on only a single independent variable.
• For simple linear regression, the form of the model is:
Y = β0 + β1X
• Here,
  - Y is the dependent variable.
  - X is the independent variable.
  - β0 and β1 are the regression coefficients.
  - β0 is the intercept of the line.
  - β1 is the slope, the weight that specifies the factor by which X has an impact on Y.
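As a concrete illustration, here is a minimal Python sketch (with made-up toy data) that fits β0 and β1 by ordinary least squares:

import numpy as np

# Made-up toy data for Y = b0 + b1*X.
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # independent variable
Y = np.array([2.1, 4.3, 6.2, 8.1, 10.2])  # dependent variable (continuous)

# Closed-form least-squares estimates:
# b1 = cov(X, Y) / var(X), b0 = mean(Y) - b1 * mean(X)
b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
b0 = Y.mean() - b1 * X.mean()
print(f"Y = {b0:.2f} + {b1:.2f} X")  # Y = 0.18 + 2.00 X for this toy data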

• Multiple Linear Regression
• In multiple linear regression, the dependent variable depends on more than one independent variable.
• For multiple linear regression, the form of the model is:
Y = β0 + β1X1 + β2X2 + β3X3 + …… + βnXn
• Here,
  - Y is the dependent variable.
  - X1, X2, …, Xn are the independent variables.
  - β0, β1, …, βn are the regression coefficients.
  - βj is the slope, the weight that specifies the factor by which Xj has an impact on Y.
• Interpreting regression coefficients
• Consider the model y = 0.8 + 1.2x1 + 2x2 + 4x3 + 1x4 (the actual values of the x's and y are not given).
• The coefficient of x3 is 4, the maximum value.
• Assuming the features are on a comparable scale, x3 is therefore the most appropriate feature for finding y (the dependent variable).
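A small Python sketch of this comparison; the single input point is made up, and the coefficients are the ones above:

import numpy as np

beta0 = 0.8
betas = np.array([1.2, 2.0, 4.0, 1.0])  # coefficients of x1..x4 from the model above
x = np.array([1.0, 1.0, 1.0, 1.0])      # one hypothetical input, all features on the same scale

y = beta0 + betas @ x
print("prediction:", y)  # 0.8 + 1.2 + 2 + 4 + 1 = 9.0
print("most influential feature: x%d" % (np.argmax(np.abs(betas)) + 1))  # x3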
Logistic Regression
• Logistic regression is one of the most popular machine learning algorithms; it comes under the supervised learning technique.
• It is used for predicting a categorical dependent variable from a given set of independent variables.
• Logistic regression predicts the output of a categorical dependent variable, so the outcome must be a categorical or discrete value: Yes or No, 0 or 1, True or False, etc.
• However, instead of giving an exact value of 0 or 1, it gives probabilistic values that lie between 0 and 1.
• Logistic regression is much like linear regression except in how the two are used: linear regression is used for solving regression problems, whereas logistic regression is used for solving classification problems.
• In logistic regression, instead of fitting a regression line, we fit an "S"-shaped logistic function (the sigmoid), which predicts two maximum values (0 or 1).
• The sigmoid function converts the independent variable into an expression of probability, ranging from 0 to 1, with respect to the dependent variable.
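A minimal Python sketch of the sigmoid; the coefficients b0 and b1 are hypothetical, chosen only to show how scores map to probabilities and then to the two classes:

import numpy as np

def sigmoid(z):
    # Squashes any real-valued score into a probability between 0 and 1.
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical fitted coefficients, for illustration only.
b0, b1 = -4.0, 1.5

for x in [0.0, 2.0, 4.0, 6.0]:
    p = sigmoid(b0 + b1 * x)
    label = 1 if p >= 0.5 else 0  # threshold the probability at 0.5
    print(f"x={x}: P(y=1)={p:.2f} -> class {label}")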
[Figure: sigmoid curve for binary classification, separating Class A from Class B.]
• Logistic regression uses the same concept of predictive modeling as regression; therefore, it is called logistic regression. But because it is used to classify samples, it falls under the classification algorithms.
Bayes' Theorem
• Bayes' theorem is a simple mathematical formula used for calculating conditional probabilities;
• that is, the probability of one event occurring given that another event has already occurred.
• Conditional probabilities are thus essential for making accurate predictions and probability estimates in machine learning.
• More data can contribute to more accurate results.
• Example of conditional probability
• P(X|A), where X is the element being selected and A is what it is selected from.
• Suppose we have Box A, containing 5 red balls and 2 white balls.
• If we want to select a red ball from Box A, the probability is:
• P(R|A) = (number of red balls) / (total number of balls) = 5 / (5 + 2) = 5/7
• When two conditional probabilities are used, two events are involved: event A and event B.
• P(A|B) means the probability of event A, given that event B has already occurred.
• The joint probability can be written both ways:
  P(A ∩ B) = P(A|B) P(B) and P(A ∩ B) = P(B|A) P(A)
• The right-hand sides are the same, so equating them and solving gives Bayes' theorem:
  P(A|B) = P(B|A) P(A) / P(B)
• Naive Bayes Classifier
• Naive Bayes classifiers are a collection of classification algorithms based on Bayes' theorem.
• It is not a single algorithm but a family of algorithms, all of which share a common principle:
• every pair of features being classified is independent of each other.

• The Naïve Bayes algorithm is a supervised learning algorithm based on Bayes' theorem and used for solving classification problems.
• It is mainly used in text classification with high-dimensional training datasets.
• The Naïve Bayes classifier is one of the simplest and most effective classification algorithms; it helps in building fast machine learning models that can make quick predictions.
• It is a probabilistic classifier, which means it predicts on the basis of the probability of an object.
• Some popular examples of the Naïve Bayes algorithm are spam filtering, sentiment analysis, and classifying articles.
• Why is it called Naïve Bayes?
• The name Naïve Bayes is made up of two words, Naïve and Bayes, which can be described as follows:
• Naïve: It is called naïve because it assumes that the occurrence of a certain feature is independent of the occurrence of other features. For example, if a fruit is identified on the basis of color, shape, and taste, then a red, spherical, and sweet fruit is recognized as an apple: each feature individually contributes to identifying it as an apple, without depending on the others.
• Bayes: It is called Bayes because it depends on the principle of Bayes' theorem.
• Bayes' Theorem:
• Bayes' theorem is also known as Bayes' rule or Bayes' law. It is used to determine the probability of a hypothesis given prior knowledge, and it depends on conditional probability.
• The formula for Bayes' theorem is:
  P(A|B) = P(B|A) P(A) / P(B)
• Where,
  - P(A|B) is the posterior probability: the probability of hypothesis A given the observed event B.
  - P(B|A) is the likelihood: the probability of the evidence given that hypothesis A is true.
  - P(A) is the prior probability: the probability of the hypothesis before observing the evidence.
  - P(B) is the marginal probability: the probability of the evidence.
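A tiny Python sketch of the formula, with hypothetical probabilities chosen only for illustration:

def bayes(p_b_given_a, p_a, p_b):
    # Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
    return p_b_given_a * p_a / p_b

# Hypothetical example: P(A) = 0.3, P(B|A) = 0.8, P(B) = 0.5
posterior = bayes(p_b_given_a=0.8, p_a=0.3, p_b=0.5)
print(f"P(A|B) = {posterior:.2f}")  # 0.8 * 0.3 / 0.5 = 0.48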
• Working of the Naïve Bayes Classifier:
• The working of the Naïve Bayes classifier can be understood with the help of the following example.
• Suppose we have a dataset of weather conditions and a corresponding target variable "Play". Using this dataset, we need to decide whether we should play on a particular day, given the weather conditions. To solve this problem, we follow the steps below (a code sketch follows the list):
• Convert the given dataset into frequency tables.
• Generate a likelihood table by finding the probabilities of the given features.
• Now, use Bayes' theorem to calculate the posterior probability.
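A minimal sketch of these steps, assuming scikit-learn is available: CategoricalNB builds the per-category frequency/likelihood tables internally. The tiny weather dataset and its integer encoding below are made up for illustration.

from sklearn.naive_bayes import CategoricalNB

# Hypothetical data: Outlook encoded as 0=Sunny, 1=Overcast, 2=Rainy,
# and the target Play as 0=No, 1=Yes.
X = [[0], [0], [1], [2], [2], [1], [0], [2]]
y = [0, 0, 1, 1, 0, 1, 1, 0]

model = CategoricalNB()
model.fit(X, y)  # internally tallies frequency/likelihood tables per class

# Posterior P(Play | Outlook = Overcast) via Bayes' theorem.
print(model.predict_proba([[1]]))  # probabilities for [No, Yes]
print(model.predict([[1]]))        # predicted class -> [1] (Yes)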
• Advantages of the Naïve Bayes Classifier:
  - Naïve Bayes is one of the fastest and easiest ML algorithms for predicting the class of a dataset.
  - It can be used for binary as well as multi-class classification.
  - It performs well in multi-class prediction compared to other algorithms.
  - It is the most popular choice for text classification problems.
• Disadvantages of the Naïve Bayes Classifier:
  - Naive Bayes assumes that all features are independent or unrelated, so it cannot learn relationships between features.

• Applications of the Naïve Bayes Classifier:
  - It is used for credit scoring.
  - It is used in medical data classification.
  - It can be used for real-time prediction because the Naïve Bayes classifier is an eager learner.
  - It is used in text classification, such as spam filtering and sentiment analysis.
• Types of Naïve Bayes Model:
• There are three types of Naive Bayes model, given below (a code sketch contrasting the three follows the list):
  - Gaussian: The Gaussian model assumes that features follow a normal distribution. This means that if predictors take continuous values instead of discrete ones, the model assumes these values are sampled from a Gaussian distribution.
  - Multinomial: The Multinomial Naïve Bayes classifier is used when the data is multinomially distributed. It is primarily used for document classification problems, i.e. deciding which category a particular document belongs to, such as sports, politics, or education. The classifier uses the frequencies of words as the predictors.
  - Bernoulli: The Bernoulli classifier works similarly to the Multinomial classifier, but the predictor variables are independent Boolean variables, such as whether a particular word is present in a document or not. This model is also well known for document classification tasks.
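A minimal sketch, assuming scikit-learn is available; the three toy datasets below are invented purely to show the kind of input each variant expects:

import numpy as np
from sklearn.naive_bayes import GaussianNB, MultinomialNB, BernoulliNB

y = np.array([0, 0, 1, 1])

# Gaussian: continuous features, assumed normally distributed per class.
X_cont = np.array([[1.0, 2.1], [0.9, 1.8], [3.2, 4.0], [3.0, 4.2]])
print(GaussianNB().fit(X_cont, y).predict([[3.1, 4.1]]))      # expected: [1]

# Multinomial: word-count style features (document classification).
X_counts = np.array([[2, 0, 1], [3, 0, 0], [0, 4, 1], [0, 3, 2]])
print(MultinomialNB().fit(X_counts, y).predict([[0, 5, 1]]))  # expected: [1]

# Bernoulli: binary features (word present / absent in a document).
X_bin = np.array([[1, 0, 1], [1, 0, 0], [0, 1, 1], [0, 1, 0]])
print(BernoulliNB().fit(X_bin, y).predict([[0, 1, 1]]))       # expected: [1]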
Principle of the Naive Bayes Classifier:
• The crux of the classifier is based on Bayes' theorem.
• Using Bayes' theorem, we can find the probability of A happening, given that B has occurred.
Cont….
• Here, B is the evidence and A is the hypothesis.
• The assumption made here is that the predictors/features are independent; that is, the presence of one particular feature does not affect the others. Hence it is called naive.
• Example of the Naive Bayes Classifier
• Consider the given dataset of 1200 fruits (Orange, Banana, and Others, counted against the features Yellow, Sweet, and Long) and use the naive Bayes algorithm to predict which type of fruit has the following properties:
• Fruit(Yellow, Sweet, Long)

• Case: Fruit = Orange
• P(Yellow|Orange) = P(Orange|Yellow) P(Yellow) / P(Orange)
  - P(Orange|Yellow) = 350/800, P(Yellow) = 800/1200, P(Orange) = 650/1200
  - P(Yellow|Orange) = (350/800)(800/1200) / (650/1200) = 350/650 ≈ 0.53
• P(Sweet|Orange) = P(Orange|Sweet) P(Sweet) / P(Orange)
  - P(Orange|Sweet) = 450/850, P(Sweet) = 850/1200, P(Orange) = 650/1200
  - P(Sweet|Orange) = 450/650 ≈ 0.69
• P(Long|Orange) = P(Orange|Long) P(Long) / P(Orange)
  - P(Orange|Long) = 0/400, so P(Long|Orange) = 0
• P(Fruit = Orange | Yellow, Sweet, Long) ∝ 0.53 × 0.69 × 0 = 0
• Since the score is 0, a yellow, sweet, long fruit is not an orange.
• Homework:
• Repeat the computation for Banana:
  - P(Yellow|Banana) = ?
  - P(Sweet|Banana) = ?
  - P(Long|Banana) = ?
• And for Others:
  - P(Yellow|Others) = ?
  - P(Sweet|Others) = ?
  - P(Long|Others) = ?
• SVM (Support Vector Machine)
• Introduction
• Types of Support Vector Kernels
  - Linear Kernel
  - Polynomial Kernel
  - Gaussian Kernel
• Hyperplane
• Properties of SVM
• Issues of SVM
• SVM (Support Vector Machine)
• Support Vector Machine, or SVM, is one of the most popular supervised learning algorithms.
• It is used for classification as well as regression problems.
• However, it is primarily used for classification problems in machine learning.
• Suppose we see a strange cat that also has some features of dogs.
• If we want a model that can accurately identify whether it is a cat or a dog, such a model can be created using the SVM algorithm.
• We first train our model with many images of cats and dogs, so that it can learn the different features of cats and dogs, and then we test it.
• The support vector machine creates a decision boundary between these two classes (cat and dog) and chooses the extreme cases (the support vectors).
• The SVM algorithm can be used for face detection, image classification, text categorization, etc.
Types of SVM
• Linear SVM: Linear SVM is used for linearly separable data, which means a dataset that can be classified into two classes using a single straight line.
• Non-linear SVM: Non-linear SVM is used for non-linearly separable data, which means a dataset that cannot be classified using a straight line.

• Suppose we have a dataset that has two labels (green and blue), and the dataset has two features, x1 and x2.
• We want a classifier that can classify each pair (x1, x2) of coordinates as either green or blue.
• Since this is a 2-D space, we can separate these two classes with just a straight line, but there can be multiple lines that separate the classes.
• The SVM algorithm helps to find the best line or decision boundary; this best boundary or region is called a hyperplane.
• The SVM algorithm finds the points from both classes that are closest to the line. These points are called support vectors.
• The distance between the support vectors and the hyperplane is called the margin.
• The goal of SVM is to maximize this margin. The hyperplane with the maximum margin is called the optimal hyperplane.
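A minimal sketch of this with scikit-learn's SVC; the six toy points are made up, and the fitted model exposes the support vectors that determine the margin:

import numpy as np
from sklearn.svm import SVC

# Made-up, linearly separable toy points for two classes.
X = np.array([[1, 2], [2, 3], [2, 1], [6, 5], [7, 7], [8, 6]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)

# The support vectors are the points closest to the maximum-margin hyperplane.
print("support vectors:\n", clf.support_vectors_)
print("prediction for (3, 3):", clf.predict([[3, 3]]))  # expected: class 0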

Non-Linear SVM:
• If data is linearly arranged, we can separate it using a straight line, but for non-linear data we cannot draw a single straight line.
• To separate these data points, we need to add one more dimension.
• For linear data we used the two dimensions x and y, so for non-linear data we add a third dimension z (for example, z = x² + y²).
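A sketch of this idea, assuming scikit-learn is available: points on two concentric circles are not linearly separable in (x, y), but become separable after the explicit lift z = x² + y², and an RBF-kernel SVM performs this kind of lift implicitly:

import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric circles: not linearly separable in the (x, y) plane.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# Explicit third dimension z = x^2 + y^2 lifts the inner circle away from the outer one.
z = (X ** 2).sum(axis=1).reshape(-1, 1)
X3 = np.hstack([X, z])
print("linear SVM on (x, y, z):", SVC(kernel="linear").fit(X3, y).score(X3, y))

# The RBF kernel achieves a similar separation without building z by hand.
print("RBF SVM on (x, y):", SVC(kernel="rbf").fit(X, y).score(X, y))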

Linear SVM vs Non-Linear SVM
• Linear SVM: the data can be easily separated with a straight line, and is classified with the help of a hyperplane.
• Non-Linear SVM: the data cannot be easily separated with a straight line; kernels are used to make the non-separable data separable, by mapping it into a high-dimensional space for classification.
Hyperplane and Support Vectors in the SVM algorithm:
• Hyperplane: There can be multiple lines/decision boundaries that separate the classes, but we need to find the best decision boundary for classifying the data points. This best boundary is known as the hyperplane of the SVM.
Support Vectors:
• The data points or vectors that are closest to the hyperplane, and which affect the position of the hyperplane, are termed support vectors.
• Since these vectors support the hyperplane, they are called support vectors.
Kernel Functions
• A kernel function is a method used to take data as input (low-dimensional) and transform it into the required form for processing (high-dimensional).
• The term "kernel" is used because the set of mathematical functions used in the Support Vector Machine provides the window to manipulate the data.
• So, a kernel function generally transforms the training set of data so that a non-linear decision surface can be transformed into a linear equation in a higher-dimensional space.
• SVM Kernel Functions
• SVM algorithms use a set of mathematical functions that are defined as the kernel.
• The function of the kernel is to take data as input and transform it into the required form.
• Different SVM algorithms use different types of kernel functions, for example linear, nonlinear, polynomial, radial basis function (RBF), and sigmoid.
• The most preferred kind of kernel function is the RBF, because it is localized and has a finite response along the complete x-axis.

• Kernel functions return the scalar product between two points in a suitably chosen feature space, thus defining a notion of similarity at little computational cost, even in very high-dimensional spaces.
1. Linear Kernel
• It is the most basic type of kernel, usually one-dimensional in nature.
• It proves to be the best function when there are many features. The linear kernel is mostly preferred for text-classification problems, as most of these classification problems can be linearly separated.
• Linear kernel functions are faster than other kernel functions.
• Linear kernel formula:
  F(x, xj) = sum(x · xj)
• Here, x and xj represent the data you are trying to classify.
2. Polynomial Kernel
• It is a more generalized representation of the linear kernel.
• It is not as preferred as other kernel functions, as it is less efficient and less accurate.
• Polynomial kernel formula:
  F(x, xj) = (x · xj + 1)^d
• Here '·' denotes the dot product of the two values, and d denotes the degree.
• F(x, xj) represents the decision boundary that separates the given classes.
3. Gaussian Radial Basis Function (RBF)
• It is one of the most preferred and most used kernel functions in SVM.
• It is usually chosen for non-linear data. It helps to make a proper separation when there is no prior knowledge of the data.
• Gaussian radial basis formula:
  F(x, xj) = exp(-gamma * ||x - xj||^2)
• The value of gamma varies from 0 to 1. You have to provide the value of gamma manually in the code; the most preferred value for gamma is 0.1.
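A minimal sketch comparing these kernels via scikit-learn's SVC on a generated toy dataset; gamma = 0.1 follows the suggestion above, and the printed training accuracies are purely illustrative:

from sklearn.datasets import make_classification
from sklearn.svm import SVC

# A generated toy dataset, used only to compare the kernels side by side.
X, y = make_classification(n_samples=300, n_features=5, random_state=0)

for kernel in ["linear", "poly", "rbf", "sigmoid"]:
    # degree applies to the polynomial kernel; gamma to poly, rbf, and sigmoid.
    clf = SVC(kernel=kernel, degree=3, gamma=0.1).fit(X, y)
    print(f"{kernel:8s} training accuracy: {clf.score(X, y):.2f}")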

4. Sigmoid Kernel
• It is mostly preferred for neural networks.
• This kernel function is similar to a two-layer perceptron model of a neural network, where it works as an activation function for neurons.
• It can be shown as the sigmoid kernel function:
  F(x, xj) = tanh(α x · xj + c)

5. Gaussian Kernel
• It is a commonly used kernel. It is used when there is no prior knowledge of a given dataset.
• Gaussian kernel formula:
  F(x, xj) = exp(-||x - xj||^2 / (2σ^2))
• Properties of SVM
  - Ability to handle large feature spaces (complexity does not depend on the dimensionality of the feature space).
  - Overfitting can be controlled by the soft-margin approach.
  - Nice mathematical properties.
  - Feature selection.
  - Flexibility in choosing a similarity function.
Advantages of SVM
• Good for smaller datasets.
• Accurate results.
• Useful for both linearly separable and non-linearly separable data.
• Effective in high-dimensional spaces.
Disadvantages of SVM
• Not suitable for large datasets, as the training time can be too long.
• Not very effective on datasets with overlapping classes.
• Picking the right kernel can be computationally intensive.
• SVM Applications
SVM has been used successfully in many real-world problems:
1. Text (and hypertext) categorization
2. Image classification
3. Bioinformatics (protein classification, cancer classification)
4. Hand-written character recognition
THANK YOU!
