Unit 2 - Machine Learning
Presented By
Dr. Anita
OUTLINE
Regression
• Linear Regression
• Logistic Regression
Bayes' Theorem
Bayes Optimal Classifier
Naive Bayes Classifier
SVM
Regression
INTRODUCTION
Regression analysis is a fundamental concept in the field of machine learning. It is a supervised learning technique.
The multiple linear regression model is:
Y = β0 + β1X1 + β2X2 + … + βnXn
Here,
Y is the dependent variable.
X1, X2, …, Xn are independent variables.
β0, β1, …, βn are the regression coefficients.
βj is the slope or weight that specifies the factor by which Xj has an impact on Y.
Regression coefficient example
y = 0.8 + 1.2x1 + 2x2 + 4x3 + 1x4
Here only the coefficients are compared; the x and y values themselves are not added.
The coefficient of x3 is 4, the maximum value,
so x3 is the most appropriate feature for predicting y (the dependent variable).
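A minimal sketch of this idea in code (assuming scikit-learn and NumPy are available; the data below is synthetic and invented purely for illustration):

import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data whose true model mirrors the slide's example:
# y = 0.8 + 1.2*x1 + 2*x2 + 4*x3 + 1*x4 (plus a little noise).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 0.8 + X @ np.array([1.2, 2.0, 4.0, 1.0]) + rng.normal(scale=0.1, size=200)

model = LinearRegression().fit(X, y)
print("intercept (beta0):", model.intercept_)
print("coefficients (beta1..beta4):", model.coef_)
# The largest learned coefficient (about 4, on x3) marks the feature
# with the biggest impact on y, as stated above.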
Logistic Regression
Logistic regression is one of the most popular Machine Learning algorithms; it comes under the Supervised Learning technique.
It is used for binary classification, i.e. assigning each input to one of two classes (e.g. Class A or Class B).
Logistic regression uses the same idea of predictive modeling as regression, but it predicts the probability that an input belongs to a class rather than a continuous value.
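A minimal sketch of binary classification with logistic regression (again assuming scikit-learn; the two classes below are synthetic):

import numpy as np
from sklearn.linear_model import LogisticRegression

# One feature, two synthetic classes (0 and 1) centred at -2 and +2.
rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(-2, 1, 100), rng.normal(2, 1, 100)]).reshape(-1, 1)
y = np.array([0] * 100 + [1] * 100)

clf = LogisticRegression().fit(X, y)
print(clf.predict_proba([[0.5]]))  # probabilities for class 0 and class 1
print(clf.predict([[0.5]]))        # hard class label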
Bayes' Theorem
Bayes' Theorem is a simple mathematical formula used for calculating conditional probabilities:
P(A|B) = P(B|A) × P(A) / P(B)
A conditional probability is the probability of one event occurring given that another event has already occurred.
Conditional probabilities are therefore essential for making accurate predictions and probability estimates in Machine Learning.
More data can contribute to more accurate results.
Example of conditional probability
P(X|A)
where X = the element being selected, and A = the set it is selected from.
Suppose we have BOX A containing 5 red balls and 2 white balls.
Then P(Red|A) = 5/7 and P(White|A) = 2/7.
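As a tiny sketch in pure Python (standard library only; the counts come from the box example above), a conditional probability is just counting within the conditioning set:

from fractions import Fraction

# BOX A: 5 red balls and 2 white balls.
box_a = ["red"] * 5 + ["white"] * 2

# P(Red|A): the fraction of balls in BOX A that are red.
p_red_given_a = Fraction(box_a.count("red"), len(box_a))
print(p_red_given_a)  # 5/7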
In Bayes' Theorem, two conditional probabilities are used, i.e. two events are involved: event A and event B.
Naive Bayes Classifier
Naive Bayes classifiers are a collection of classification algorithms based on Bayes' Theorem.
Example: classifying a fruit as Fruit = Orange.
P(Yellow|Orange) = P(Orange|Yellow) × P(Yellow) / P(Orange)
Find the probability that the fruit is yellow given that it is an orange:
P(Orange|Yellow) = 350/800
P(Yellow) = 800/1200
P(Orange) = 650/1200
so P(Yellow|Orange) = (350/800 × 800/1200) / (650/1200) = 350/650 ≈ 0.53
P(Sweet|Orange) = P(Orange|Sweet) × P(Sweet) / P(Orange)
P(Orange|Sweet) = 450/850
P(Sweet) = 850/1200
P(Orange) = 650/1200
so P(Sweet|Orange) = (450/850 × 850/1200) / (650/1200) = 450/650 ≈ 0.69
P(Long|Orange) = P(Orange|Long) × P(Long) / P(Orange)
P(Orange|Long) = 0/400 = 0, so P(Long|Orange) = 0
Combining the three features naively:
P(Fruit = Orange | Yellow, Sweet, Long) ∝ 0.53 × 0.69 × 0 = 0
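A minimal sketch of this computation in plain Python (the counts for Orange come from the worked example above; the totals of 800 yellow, 850 sweet, and 400 long fruits out of 1200 are implied by its fractions):

from fractions import Fraction

total, oranges = 1200, 650
feature_total = {"yellow": 800, "sweet": 850, "long": 400}
orange_count = {"yellow": 350, "sweet": 450, "long": 0}

# P(feature|Orange) = P(Orange|feature) * P(feature) / P(Orange)
p_orange = Fraction(oranges, total)
score = Fraction(1)
for f in feature_total:
    p_f = Fraction(feature_total[f], total)
    p_orange_given_f = Fraction(orange_count[f], feature_total[f])
    score *= p_orange_given_f * p_f / p_orange

print(float(score))  # 0.0 -- the zero count for "long" zeroes out the product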
Homework
Banana = ?
P(Yellow|Banana) = ?
P(Sweet|Banana) = ?
P(Long|Banana) = ?
Others = ?
P(Yellow|Others) = ?
P(Sweet|Others) = ?
P(Long|Others) = ?
SVM (Support Vector Machine)
Introduction
Types of Support Vector Kernel
Linear Kernel
Polynomial Kernel
Gaussian Kernel
Hyperplane
Properties of SVM
Issues of SVM
SVM (Support Vector Machine)
Support Vector Machine or SVM is one of the most popular Supervised Learning algorithms.
Example: training and testing a cat/dog classifier.
We first train our model with lots of images of cats and dogs, so that it can learn the different features of cats and dogs, and then we test it on new images.
The SVM creates a decision boundary between these two classes (cat and dog) and chooses the extreme cases (the support vectors).
The SVM algorithm can be used for face detection, image classification, text categorization, etc.
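A minimal sketch of this train-then-test workflow (assuming scikit-learn; synthetic blobs stand in for cat/dog feature vectors):

from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Two synthetic classes standing in for "cat" and "dog".
X, y = make_blobs(n_samples=200, centers=2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = SVC(kernel="linear").fit(X_train, y_train)    # training
print("test accuracy:", clf.score(X_test, y_test))  # testing
print("support vectors:", clf.support_vectors_)     # the extreme cases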
Types of SVM
Linear SVM: used when the data is linearly separable.
Non-Linear SVM: used when the data cannot be separated by a straight line.
Linear SVM:
Since the data is in a 2-D space, we can easily separate the two classes by just using a straight line.
But there can be multiple lines that can separate these classes.
The SVM algorithm helps find the best line or decision boundary; this best boundary or region is called the hyperplane.
The SVM algorithm finds the closest points of the lines from both classes. These points are called support vectors.
The distance between the support vectors and the hyperplane is called the margin.
The goal of SVM is to maximize this margin.
The hyperplane with maximum margin is called the optimal hyperplane.
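As a hedged sketch (scikit-learn again, with synthetic data), the margin of a trained linear SVM can be read off from the learned weight vector, since the margin width is 2/‖w‖:

import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=100, centers=2, random_state=6)
clf = SVC(kernel="linear", C=1000).fit(X, y)  # large C approximates a hard margin

w = clf.coef_[0]                # normal vector of the hyperplane
margin = 2 / np.linalg.norm(w)  # distance between the two margin boundaries
print("margin width:", margin)
print("support vectors:", clf.support_vectors_)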
Non-Linear SVM:
If data is linearly arranged, we can separate it by using a straight line, but for non-linear data we cannot draw a single straight line.
[Figure: non-linearly separable data points]
So to separate these data points, we need to add one more dimension.
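A minimal sketch of adding that extra dimension (assuming scikit-learn; make_circles produces data no straight line can separate, and the added feature z = x1² + x2² makes it linearly separable):

import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Concentric circles: not separable by any straight line in 2-D.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# Add a third dimension z = x1^2 + x2^2; a plane can now separate the circles.
z = (X ** 2).sum(axis=1).reshape(-1, 1)
X3 = np.hstack([X, z])
print("linear SVM in 3-D:", SVC(kernel="linear").fit(X3, y).score(X3, y))

# An RBF (Gaussian) kernel performs an analogous mapping implicitly.
print("RBF SVM in 2-D:", SVC(kernel="rbf").fit(X, y).score(X, y))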
Linear SVM vs. Non-Linear SVM:
Linear SVM
• The data can be easily separated with a linear line.
• Data is classified with the help of a hyperplane.
• Data can be easily classified by drawing a straight line.
Non-Linear SVM
• The data cannot be easily separated with a linear line.
• We use kernels to make non-separable data separable.
• We map the data into a high-dimensional space to classify it.
Hyperplane and Support Vectors in the SVM algorithm:
Hyperplane: There can be multiple lines/decision boundaries to separate the classes, but we need to find the best decision boundary, the one that helps classify the data points most accurately.
4. Sigmoid Kernel
It is mostly preferred for neural networks. This kernel function is similar to a two-layer perceptron model of a neural network, which works as an activation function for neurons.
It can be shown as the sigmoid kernel function:
F(x, xj) = tanh(α · x · xj + c)
5. Gaussian Kernel
It is a commonly used kernel. It is used when there is no prior knowledge about a given dataset.
Gaussian kernel formula:
F(x, xj) = exp(−‖x − xj‖² / (2σ²))
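A minimal sketch of both kernel functions in NumPy (α, c, and σ are free parameters that would normally be tuned; the values below are arbitrary):

import numpy as np

def sigmoid_kernel(x, xj, alpha=0.01, c=0.0):
    # F(x, xj) = tanh(alpha * <x, xj> + c)
    return np.tanh(alpha * np.dot(x, xj) + c)

def gaussian_kernel(x, xj, sigma=1.0):
    # F(x, xj) = exp(-||x - xj||^2 / (2 * sigma^2))
    return np.exp(-np.sum((x - xj) ** 2) / (2 * sigma ** 2))

a, b = np.array([1.0, 2.0]), np.array([2.0, 0.5])
print(sigmoid_kernel(a, b), gaussian_kernel(a, b))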
Properties of SVM
Ability to handle large feature spaces (complexity does not depend on the dimensionality of the feature space)
Overfitting can be controlled by the soft margin approach
Nice math properties (training is a convex optimization problem)
Feature selection
Flexibility in choosing a similarity function
Advantages of SVM
Accurate results.