
UNIT 05

Naïve Bayes Classifier

• The Naïve Bayes algorithm is a supervised learning algorithm, which is based on Bayes' theorem and used for solving classification problems.
• It is mainly used in text classification, where training datasets are typically high-dimensional.
• It is a probabilistic classifier, which means it predicts on the basis of the probability that an object belongs to each class.
• Some popular applications of the Naïve Bayes algorithm are spam filtering, sentiment analysis, and classifying articles.
• Naïve: it is called naïve because it assumes that the occurrence of a certain feature is independent of the occurrence of other features.
Bayes' Theorem:

• Bayes' theorem finds the probability of an event occurring given the probability of another event that has already occurred. Bayes' theorem is stated mathematically as the following equation:

P(A|B) = P(B|A) · P(A) / P(B)

• where A and B are events and P(B) ≠ 0.
• P(A) is the prior probability of A, i.e. the probability of the event before the evidence is seen.
• P(B|A) is the likelihood: the probability of the evidence given that the hypothesis A is true.
• P(A|B) is the posterior probability of A given B, i.e. the probability of the event after the evidence is seen.
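As a quick worked example, with made-up numbers for illustration only: suppose 20% of emails are spam, the word "free" appears in 60% of spam emails, and in 5% of non-spam emails. Bayes' theorem then gives the probability that an email containing "free" is spam:

```python
# Hypothetical probabilities, purely for illustration:
p_spam = 0.2                # P(A): prior probability of spam
p_free_given_spam = 0.6     # P(B|A): likelihood of "free" given spam
p_free_given_ham = 0.05     # likelihood of "free" given not spam

# Total probability of the evidence: P(B) = P("free")
p_free = p_free_given_spam * p_spam + p_free_given_ham * (1 - p_spam)

# Bayes' theorem: P(spam | "free") = P("free" | spam) * P(spam) / P("free")
p_spam_given_free = p_free_given_spam * p_spam / p_free
print(round(p_spam_given_free, 3))  # 0.75
```

Seeing the word "free" raises the probability of spam from the 20% prior to a 75% posterior.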
Working of Naïve Bayes' Classifier:

• Convert the given dataset into frequency tables.
• Generate a likelihood table by finding the probabilities of the given features.
• Now, use Bayes' theorem to calculate the posterior probability for each class.
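A minimal from-scratch sketch of these three steps in Python; the weather/play dataset and all names are invented for illustration, not taken from the slides:

```python
from collections import Counter

# Hypothetical training data: (weather, play) pairs.
data = [("Sunny", "No"), ("Sunny", "No"), ("Overcast", "Yes"), ("Rainy", "Yes"),
        ("Rainy", "Yes"), ("Rainy", "No"), ("Overcast", "Yes"), ("Sunny", "Yes"),
        ("Sunny", "Yes"), ("Rainy", "Yes"), ("Sunny", "Yes"), ("Overcast", "Yes"),
        ("Overcast", "Yes"), ("Rainy", "No")]

# Step 1: build frequency tables.
class_counts = Counter(label for _, label in data)  # counts per class
joint_counts = Counter(data)                        # counts per (feature, class)

def posterior(weather, label):
    """Step 2: likelihood from the frequency tables;
    step 3: Bayes' theorem, P(label | weather) ∝ P(weather | label) * P(label)."""
    prior = class_counts[label] / len(data)
    likelihood = joint_counts[(weather, label)] / class_counts[label]
    return likelihood * prior

# Compare unnormalised posteriors for a new sunny day.
scores = {label: posterior("Sunny", label) for label in class_counts}
print(max(scores, key=scores.get))  # -> "Yes"
```

The denominator P(B) is the same for every class, so comparing the unnormalised products is enough to pick the winning class.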
Advantages of Naïve Bayes Classifier:

• Naïve Bayes is one of the fastest and simplest ML algorithms for predicting the class of a dataset.
• It can be used for binary as well as multi-class classification.
• It performs well in multi-class prediction compared to many other algorithms.
• It is one of the most popular choices for text classification problems.
Disadvantages of Naïve Bayes Classifier:

• Naïve Bayes assumes that all features are independent or unrelated, so it cannot learn relationships between features.
Applications of Naïve Bayes Classifier:

• It is used for credit scoring.
• It is used in medical data classification.
• It can be used for real-time predictions, because the Naïve Bayes classifier is an eager learner.
• It is used in text classification, such as spam filtering and sentiment analysis.
Types of Naïve Bayes Model:

• Gaussian: The Gaussian model assumes that features follow a normal distribution. This means that if predictors take continuous values instead of discrete ones, the model assumes these values are sampled from a Gaussian distribution.

• Multinomial: The Multinomial Naïve Bayes classifier is used when the data is multinomially distributed. It is primarily used for document classification problems, i.e. deciding which category a particular document belongs to, such as sports, politics, or education. The classifier uses the frequencies of words as the predictors.

• Bernoulli: The Bernoulli classifier works similarly to the Multinomial classifier, but the predictor variables are independent Boolean variables, such as whether a particular word is present in a document or not. This model is also popular for document classification tasks.
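All three model types are available, for example, in scikit-learn. A minimal sketch, assuming scikit-learn and NumPy are installed; the toy data is random and purely illustrative:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB, MultinomialNB, BernoulliNB

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=100)  # two arbitrary classes

# Gaussian NB: continuous features, assumed normally distributed per class.
X_cont = rng.normal(size=(100, 3))
print(GaussianNB().fit(X_cont, y).predict(X_cont[:3]))

# Multinomial NB: non-negative counts, e.g. word frequencies per document.
X_counts = rng.integers(0, 10, size=(100, 3))
print(MultinomialNB().fit(X_counts, y).predict(X_counts[:3]))

# Bernoulli NB: binary features, e.g. word present / absent in a document.
X_bin = rng.integers(0, 2, size=(100, 3))
print(BernoulliNB().fit(X_bin, y).predict(X_bin[:3]))
```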
K-Nearest Neighbour (K-NN)

• K-Nearest Neighbour (K-NN) is a supervised learning technique.
• The K-NN algorithm assumes similarity between the new case/data and the available cases, and puts the new case into the category that is most similar to the available categories.
• The K-NN algorithm stores all the available data and classifies a new data point based on similarity. This means that when new data appears, it can easily be classified into a suitable category.
• The K-NN algorithm can be used for regression as well as classification, but it is mostly used for classification problems.
• K-NN is a non-parametric algorithm.
• It is also called a lazy learner algorithm because it does not learn from the training set immediately; instead, it stores the dataset and performs an action on it at the time of classification.
Working of K-NN
• The working of K-NN can be explained on the basis of the algorithm below:
• Step 1: Select the number K of neighbours.
• Step 2: Calculate the Euclidean distance from the new data point to every point in the training data.
• Step 3: Take the K nearest neighbours as per the calculated Euclidean distances.
• Step 4: Among these K neighbours, count the number of data points in each category.
• Step 5: Assign the new data point to the category for which the number of neighbours is maximum.
Firstly, we will choose the number of neighbours; here we choose K = 5. Next, we calculate the Euclidean distance between the new data point and each existing data point. The Euclidean distance is the distance between two points, which we have already studied in geometry. Between points (x₁, y₁) and (x₂, y₂) it can be calculated as:

d = √((x₂ − x₁)² + (y₂ − y₁)²)

Suppose that, of the 5 nearest neighbours, 3 are from category A and 2 are from category B. Since most of the nearest neighbours are from category A, the new data point is assigned to category A.
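A minimal from-scratch sketch of the five steps above in Python; the training points, categories, and the helper name knn_classify are made up for illustration:

```python
import math
from collections import Counter

def knn_classify(train, new_point, k=5):
    # Step 2: Euclidean distance from the new point to every training point.
    distances = [(math.dist(point, new_point), label) for point, label in train]
    # Step 3: take the k nearest neighbours.
    nearest = sorted(distances)[:k]
    # Steps 4-5: majority vote among the k neighbours.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical training data: ((x, y), category).
train = [((1, 2), "A"), ((2, 3), "A"), ((3, 3), "A"),
         ((6, 5), "B"), ((7, 7), "B"), ((8, 6), "B")]
print(knn_classify(train, (3, 4), k=5))  # -> "A" (3 of the 5 neighbours are A)
```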
• Feature scaling is crucial for the KNN algorithm, as it helps in
preventing features with larger magnitudes from dominating the
distance calculations.
• Feature scaling is an essential step in the data preprocessing
pipeline, especially for distance-based algorithms like the KNN.
• Distance-based algorithms, such as the KNN, calculate the
distance between data points to determine their similarity.
• Features with larger magnitudes can disproportionately
influence the distance calculation, leading to biased results.
• Feature scaling addresses this issue by transforming the features
to a comparable range or scale, ensuring that each feature
contributes fairly to the final result.
• Imagine you're measuring the similarity between two houses based on their size (in square feet) and the number of rooms.
• If you don't scale the features, the difference in size will dominate the distance calculation, while the difference in the number of rooms will barely contribute.
• Feature scaling helps to balance these two features, allowing for a more accurate comparison.
• Two common feature scaling methods are Min-Max scaling and Standardization.
• Min-Max scaling transforms the features by scaling their values to a specific range, typically [0, 1].
• It is calculated using the formula:

X' = (X − Xmin) / (Xmax − Xmin)

• We subtract the minimum value from each feature and divide the result by the difference between the maximum and minimum values.
• Standardization centres the feature values around the mean (0) and scales them based on the standard deviation.
• It is calculated using the formula:

X' = (X − μ) / σ

where μ is the mean and σ is the standard deviation of the feature.
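A minimal NumPy sketch of both scalers applied to the house example; the size and room values are invented for illustration:

```python
import numpy as np

# Columns: size in square feet, number of rooms.
X = np.array([[1400.0, 3], [2400.0, 4], [1000.0, 2], [3000.0, 5]])

# Min-Max scaling: X' = (X - min) / (max - min), mapping each feature to [0, 1].
x_min, x_max = X.min(axis=0), X.max(axis=0)
X_minmax = (X - x_min) / (x_max - x_min)

# Standardization: X' = (X - mean) / std, giving mean 0 and unit variance.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

print(X_minmax)
print(X_std)
```

After either transformation, size and room count contribute on a comparable scale to the Euclidean distance.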
Advantages of KNN Algorithm:

• It is simple to implement.
• It is robust to noisy training data.
• It can be more effective when the training data is large.
Disadvantages of KNN Algorithm:

• The value of K always needs to be determined, which can sometimes be complex.
• The computation cost is high, because the distance from the new data point to all the training samples must be calculated.
Support Vector Machine

• Support Vector Machine (SVM) is a supervised learning algorithm that can be used for classification as well as regression problems. However, it is primarily used for classification problems in machine learning.
• The goal of the SVM algorithm is to create the best line or decision boundary that can segregate n-dimensional space into classes, so that we can easily put new data points in the correct category in the future. This best decision boundary is called a hyperplane.
• SVM chooses the extreme points/vectors that help in creating the hyperplane. These extreme cases are called support vectors, and hence the algorithm is termed a Support Vector Machine.
• The SVM algorithm can be used for face detection, image classification, text categorization, etc.

Consider a diagram in which there are two different categories that are classified using a decision boundary or hyperplane.
Types of SVM

• Linear SVM: Linear SVM is used for linearly separable data. If a dataset can be classified into two classes by using a single straight line, the data is termed linearly separable, and the classifier used is called a Linear SVM classifier.
• Non-linear SVM: Non-linear SVM is used for non-linearly separable data. If a dataset cannot be classified by using a straight line, the data is termed non-linear, and the classifier used is called a Non-linear SVM classifier.
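A minimal sketch of both SVM types with scikit-learn (assumed available), trained on a toy non-linearly separable dataset; the kernel and gamma choices are illustrative, not tuned:

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

# Two interleaving half-moons: not separable by a single straight line.
X, y = make_moons(n_samples=200, noise=0.15, random_state=0)

# Linear SVM: the decision boundary is a single straight line.
linear_svm = SVC(kernel="linear").fit(X, y)

# Non-linear SVM: the RBF kernel lets the boundary curve around the classes.
nonlinear_svm = SVC(kernel="rbf", gamma=2.0).fit(X, y)

print("linear accuracy:", linear_svm.score(X, y))
print("rbf accuracy:", nonlinear_svm.score(X, y))
```

On data like this, the RBF-kernel SVM should fit noticeably better than the linear one, which is exactly the distinction the two types above describe.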
