MLT by Engineering Express
UNIT-2
BAYESIAN LEARNING - Bayes theorem, Concept learning, Bayes Optimal Classifier, Naïve Bayes classifier, Bayesian belief networks, EM algorithm.
SUPPORT VECTOR MACHINE: Introduction, Types of support vector kernel - (Linear kernel, Polynomial kernel, and Gaussian kernel), Hyperplane - (Decision surface), Properties of SVM, and Issues in SVM.
Regression
• Regression is a technique in machine learning and statistics used to
find relationships between variables and make predictions. It helps us
understand how the value of one thing (called the dependent
variable) changes when other things (called independent variables)
change.
• For example, if you want to predict the price of a house based on its
size, regression can help create a model that shows how the house
price changes as the size (in square feet) increases.
• This way, you can estimate the price of a house just by knowing its
size.
• In simple terms, regression is like drawing a line (or curve) through
data points to see how one thing affects another and make
predictions about future outcomes.
Types of Regression
1. Linear Regression
Linear Regression is a simple statistical method used to predict a
continuous outcome (like price or temperature) based on the
relationship between two variables, where one is the input
(independent variable) and the other is the output (dependent
variable). It models this relationship using a straight line to make
predictions.
The formula for linear regression is:
Y=mX+b
Where:
Y is the predicted output (dependent variable).
X is the input (independent variable).
m is the slope of the line (how much Y changes with X).
b is the intercept (the value of Y when X is 0).
Linear regression assumes a linear relationship between the input and output.
It fits the line by minimizing the squared differences between the actual
data points and the line's predictions (a method called least squares).
Example:
If you want to predict a house's price based on its size, linear
regression can help you find a straight line that best fits the data
points representing different house sizes and their corresponding
prices.
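The sketch below is a minimal illustration in Python, fitting the line Y = mX + b with NumPy's least-squares polyfit; the size/price numbers are invented for demonstration.

    import numpy as np

    # Invented data: house sizes (sq ft) and prices (in $1000s)
    sizes = np.array([1000, 1500, 2000, 2500, 3000])
    prices = np.array([200, 280, 370, 450, 520])

    # Fit Y = mX + b by least squares (a degree-1 polynomial fit)
    m, b = np.polyfit(sizes, prices, 1)
    print(f"slope m = {m:.3f}, intercept b = {b:.1f}")

    # Predict the price of a 2200 sq ft house from the fitted line
    print(f"predicted price: {m * 2200 + b:.0f} thousand dollars")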
2. Logistic Regression
Logistic Regression is a statistical method used in machine learning to
predict a categorical outcome (like yes/no, true/false) based on one or
more input variables. Unlike linear regression, it predicts probabilities
and uses a logistic function to model a binary outcome (0 or 1).
In simple terms, it's used when the result is something like "Will it
happen or not?" instead of predicting a continuous value.
Example:
If you want to predict whether an email is spam or not based on
features like the presence of certain words, the sender's address, or
the email length, logistic regression can help you estimate the
probability that an email is spam.
The formula for Logistic Regression is:
P(Y=1) = 1 / (1 + e^-(b0 + b1X))
Where,
P(Y=1) is the predicted probability that the dependent variable Y equals 1
(e.g., the event occurs).
X is the input (independent variable).
b0 is the intercept (the value of the log-odds when X is 0).
b1 is the coefficient (the change in the log-odds of Y for a one-unit increase in X).
e is the base of the natural logarithm (approximately 2.71828).
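As a small sketch, the Python function below evaluates this formula directly; the coefficients b0 = -4 and b1 = 0.5 are arbitrary values chosen for illustration, not fitted ones.

    import math

    def logistic_probability(x, b0, b1):
        # P(Y=1) = 1 / (1 + e^-(b0 + b1*x))
        return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

    # e.g., x could be the count of spam keywords in an email
    for x in [2, 6, 10]:
        print(x, round(logistic_probability(x, -4.0, 0.5), 3))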
Bayesian Learning
Bayesian Learning is a statistical approach to machine learning that
applies Bayes' theorem to update the probability of a hypothesis as
new evidence is acquired. It focuses on incorporating prior knowledge
and evidence to improve predictions and decision-making.
Formula:
The theorem can be expressed mathematically as follows:
P(H|D) = [P(D|H) × P(H)] / P(D)
Where,
P(H|D): Posterior Probability - The probability of the hypothesis H
being true after observing the data D.
P(D|H): Likelihood - The probability of observing the data D given
that the hypothesis H is true.
P(H): Prior Probability - The initial probability of the hypothesis H
before observing any data.
P(D): Marginal Probability - The total probability of the data D
occurring under all hypotheses.
Bayes Theorem
Bayes' Theorem is a fundamental concept in probability theory that
describes how to update the probability of a hypothesis based on new
evidence. It provides a way to calculate the conditional probability of
an event based on prior knowledge and observed data.
Formula:
The theorem can be expressed mathematically as follows:
P(H|D) = [P(D|H) × P(H)] / P(D)
Where,
P(H|D): Posterior Probability - The probability of the hypothesis H
being true after observing the data D.
P(D|H): Likelihood - The probability of observing the data D given
that the hypothesis H is true.
P(H): Prior Probability - The initial probability of the hypothesis H
before observing any data.
P(D): Marginal Probability - The total probability of the data D
occurring under all hypotheses.
Explanation:
• Prior Probability reflects your belief about the hypothesis before
considering the evidence.
• Likelihood measures how well the hypothesis explains the observed data.
• Posterior Probability is the updated belief after taking the evidence
into account.
Example:
Suppose you want to determine the probability that someone has a
disease (hypothesis H) after testing positive for it (evidence D).
You would use Bayes' theorem to update your belief about the
disease's probability based on the test's accuracy and the general
prevalence of the disease in the population.
Importance:
Bayes' theorem is widely used in various fields, including statistics,
machine learning, finance, and medicine, as it provides a coherent
method for reasoning about uncertainty and making decisions based
on incomplete information.
Question Example
You are a doctor, and you know that there is a disease (F) known as
"tuberculosis." You have observed that:
• 5% of people have tuberculosis.
P(F) = 0.05
• If someone has tuberculosis, 90% of them have a persistent cough.
P(C|F) = 0.9
• However, if someone does not have tuberculosis, only 10% of them
have a persistent cough.
P(C|¬F) = 0.1
If someone has a persistent cough (C), what is the probability that
they have tuberculosis, P(F|C)?
Answer Steps
1. Prior Probability:
P(F) = 0.05
P(¬F) = 1 - P(F) = 1 - 0.05 = 0.95
2. Likelihood:
P(C|F) = 0.9
P(C|¬F) = 0.1
3. Marginal Probability:
P(C) = P(C|F) × P(F) + P(C|¬F) × P(¬F) = (0.9 × 0.05) + (0.1 × 0.95) = 0.045 + 0.095 = 0.14
4. Posterior Probability (Bayes' theorem):
P(F|C) = [P(C|F) × P(F)] / P(C) = 0.045 / 0.14 ≈ 0.321
So a person with a persistent cough has roughly a 32% chance of having tuberculosis.
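A quick Python check of this calculation, written as a small generic helper (the name bayes_posterior is just illustrative):

    def bayes_posterior(prior, likelihood, likelihood_not):
        # P(H|D) = P(D|H)P(H) / [P(D|H)P(H) + P(D|not H)P(not H)]
        evidence = likelihood * prior + likelihood_not * (1 - prior)
        return likelihood * prior / evidence

    # Tuberculosis example: P(F) = 0.05, P(C|F) = 0.9, P(C|not F) = 0.1
    print(round(bayes_posterior(0.05, 0.9, 0.1), 3))  # 0.321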
Bayes Optimal Classifier
The Bayes Optimal Classifier classifies a new instance by combining the
predictions of all hypotheses, weighted by their posterior probabilities,
and choosing the most probable classification.
Example:
In spam email classification, it would compute the probability of an
email being spam or not spam based on the presence of certain
keywords, choosing the class with the highest probability.
Limitations:
• Computationally Intensive: Requires significant computation, especially
with many features or classes.
• Independence Assumption: Assumes features are independent given the
class, which may not be true in real scenarios.
Importance:
It serves as a benchmark for evaluating other classification algorithms,
like Naive Bayes, which simplifies some of the assumptions of the
Bayes Optimal Classifier.
Naïve Bayes Classifier
The Naïve Bayes Classifier is a simple, yet powerful classification
algorithm based on Bayes' theorem. It is called "naïve" because it
assumes that the features (or attributes) used for classification are
independent of each other, which is often not the case in real-world data.
Types of Naïve Bayes Classifier:
2. Multinomial Naïve Bayes: Suitable for discrete data, often used in text
classification.
Advantages
Simplicity: Easy to understand and implement.
Limitations
Independence Assumption: The assumption of feature independence is
rarely true, which can lead to inaccuracies.
Importance
The Naïve Bayes classifier is widely used in various applications,
including text classification (spam detection, sentiment analysis),
document categorization, and recommendation systems, due to its
efficiency and effectiveness in handling large datasets.
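As a rough sketch of such an application, the code below trains scikit-learn's MultinomialNB on a tiny invented spam dataset; the messages and labels are made up for demonstration.

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB

    # Tiny invented dataset: 1 = spam, 0 = not spam
    messages = ["win money now", "lowest price offer",
                "meeting at noon", "lunch tomorrow?"]
    labels = [1, 1, 0, 0]

    # Turn each message into word-count features
    vectorizer = CountVectorizer()
    X = vectorizer.fit_transform(messages)

    # Fit the classifier and label a new message
    model = MultinomialNB()
    model.fit(X, labels)
    print(model.predict(vectorizer.transform(["free money offer"])))  # [1] = spam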
Bayesian Belief Networks (BBNs)
Bayesian Belief Networks (BBNs), also known as Bayesian Networks,
are graphical models that represent the probabilistic relationships
among a set of variables using a directed acyclic graph (DAG).
Applications
Used in medical diagnosis, decision-making systems, and predictive
analytics.
Advantages
Handles uncertainty well and shows causal relationships clearly.
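To make the idea concrete, here is a minimal plain-Python sketch of a two-node network (Rain -> WetGrass) with invented probability tables, computing P(Rain | WetGrass) by enumeration:

    # Invented conditional probability tables for the tiny DAG: Rain -> WetGrass
    p_rain = 0.2                   # P(Rain)
    p_wet_given = {True: 0.9,      # P(WetGrass | Rain)
                   False: 0.1}     # P(WetGrass | no Rain)

    # Marginal P(WetGrass): sum over both values of the parent node Rain
    p_wet = p_wet_given[True] * p_rain + p_wet_given[False] * (1 - p_rain)

    # Bayes' theorem gives the posterior P(Rain | WetGrass)
    print(round(p_wet_given[True] * p_rain / p_wet, 3))  # 0.692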
EM Algorithm (Expectation-Maximization Algorithm)
The Expectation-Maximization (EM) Algorithm is a statistical method
used for estimating the parameters of probabilistic models, particularly
when dealing with incomplete or missing data. It is commonly applied
in various fields like machine learning, computer vision, and natural
language processing.
Key Concepts:
1. Latent Variables: These are variables that are not directly observed but
are inferred from the observed data. The EM algorithm is particularly
useful in scenarios involving latent variables.
Applications
Clustering: Used in Gaussian Mixture Models to find clusters in data.
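The sketch below runs EM for a two-component 1D Gaussian mixture in NumPy; the data and initial guesses are invented, and the E-step responsibilities play the role of the latent variables described above.

    import numpy as np

    # Invented 1D data drawn near two centers unknown to the algorithm
    rng = np.random.default_rng(0)
    data = np.concatenate([rng.normal(0, 1, 100), rng.normal(5, 1, 100)])

    # Initial guesses for the two components' means, std devs, and weights
    mu, sigma, weight = np.array([1.0, 4.0]), np.array([1.0, 1.0]), np.array([0.5, 0.5])

    def gauss(x, m, s):
        return np.exp(-0.5 * ((x - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))

    for _ in range(50):
        # E-step: responsibility of each component for each point (latent variables)
        resp = np.stack([w * gauss(data, m, s) for w, m, s in zip(weight, mu, sigma)])
        resp /= resp.sum(axis=0)
        # M-step: re-estimate parameters from the responsibilities
        n_k = resp.sum(axis=1)
        mu = (resp * data).sum(axis=1) / n_k
        sigma = np.sqrt((resp * (data - mu[:, None]) ** 2).sum(axis=1) / n_k)
        weight = n_k / len(data)

    print(mu.round(2), sigma.round(2), weight.round(2))  # means near 0 and 5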
Support Vector Machine (SVM)
A Support Vector Machine is a supervised learning algorithm that classifies
data by finding the boundary (hyperplane) that best separates the classes.
Key concepts:
1. Hyperplane: The decision boundary that separates the data points of
different classes; in two dimensions it is simply a line.
2. Support Vectors: The data points that are closest to the hyperplane
are called "support vectors." These points are critical to SVM as they
determine the position of the hyperplane and help differentiate
between the classes.
3. Margin: The distance between the hyperplane and the nearest support
vectors on both sides is called the margin. SVM aims to find a
hyperplane with the maximum margin, as a larger margin results in
more robust classification.
5. Kernel Trick: When data is not linearly separable, SVM applies kernel
functions to project the data into higher dimensions, where it can be
separated easily. Popular kernel functions include the Linear kernel, the
Polynomial kernel, and the Radial Basis Function (RBF) kernel, also known
as the Gaussian kernel.
Types of SVM Kernels:
1. Linear Kernel:
Definition: The linear kernel is used when the data can be separated
by a straight line (or hyperplane in higher dimensions). It is the
simplest form of kernel.
Usage: It is typically used when the data is linearly separable,
meaning the two classes can be separated by a straight boundary.
Example: Classifying spam and non-spam emails when the features
(like words) can be separated by a linear boundary.
2. Polynomial Kernel:
Definition: The polynomial kernel is useful for datasets that are not
linearly separable but can be separated using polynomial decision
boundaries. It adds more complexity by creating curved boundaries.
Usage: It is used when the relationship between the data points is
non-linear but can be captured by polynomial functions.
Example: When the data points form a circular or curved pattern, a
polynomial kernel can be used to classify them.
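As a rough comparison, the sketch below fits scikit-learn's SVC with each kernel on an invented circular dataset; the polynomial and RBF kernels should score better than the linear one on this curved boundary.

    from sklearn.datasets import make_circles
    from sklearn.svm import SVC

    # Invented two-class data arranged in concentric circles (not linearly separable)
    X, y = make_circles(n_samples=200, noise=0.1, factor=0.4, random_state=0)

    for kernel in ["linear", "poly", "rbf"]:
        model = SVC(kernel=kernel, degree=2, coef0=1)  # degree/coef0 only affect poly
        model.fit(X, y)
        print(kernel, round(model.score(X, y), 2))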
Decision Surface:
The hyperplane is also called the decision surface because it helps
SVM decide which category the points belong to.
Best Hyperplane: SVM's task is to find the hyperplane that creates the
maximum margin between the classes, meaning it tries to keep the
points of different classes as far away from the hyperplane as possible
for more accurate classification.
Some Properties of SVM:
1. Maximum Margin Classifier:
SVM's objective is to find the maximum margin between classes to
separate them as effectively as possible. The farther the hyperplane is
from the class points, the better the classification.
4. Robustness to Overfitting:
If the data is well-separated, SVM reduces the chances of overfitting.
Overfitting occurs when a model fits the training data so closely that it
performs poorly on new (testing) data.
5. Support Vectors:
In SVM, only a few important points (called support vectors) are used
to define the hyperplane. These points lie very close to the hyperplane
and help form the decision boundary.
Issues in SVM (Challenges with SVM)
1. Computational Complexity:
If you have a very large dataset, the training process of SVM can be
slow, as finding the best hyperplane requires significant computation.
2. Choice of Kernel:
For non-linear data, selecting the right kernel function is crucial. If
the wrong kernel is chosen, SVM’s performance can degrade. It’s often
difficult to determine which kernel (Polynomial, Gaussian, or others)
is the best fit for the data.
4. Interpretability:
The results of SVM can be harder to interpret. When working in high
dimensions, it’s not always easy to understand how or why the
hyperplane is being formed.