
Naive Bayes Classifier

Probability and Statistics


• Sample Space (S): The set of all possible outcomes of a random
experiment. Example: Rolling a die. S = {1, 2, 3, 4, 5, 6}.
• Outcomes: An individual result of the random experiment. Outcomes
are the elements of the sample space.
• Event (E): A subset of the sample space. It consists of one or more
outcomes and represents a specific condition or result.
• Experiment: Rolling a die, flipping a coin twice, etc.
• Sample Space: S = {1, 2, 3, 4, 5, 6} and S = {HH, HT, TH, TT}
• Event: E = {1, 3, 5} (rolling an odd number)
• E = {HH, HT, TH} (getting at least one head)
• Probability deals with predicting the likelihood of future events.
• Statistics involves the analysis of the frequency of past events.
• In probability, we are given a model and asked what kind of data we
are likely to see.
• In statistics, we are given data and asked what kind of model is likely to
have generated it.
• In probability, a random variable is a variable that represents the
outcome of an event. It assigns a numerical value to each possible
result of the event.
• A random variable is a variable that assumes numerical values
associated with the random outcomes of an experiment, where one
(and only one) numerical value is assigned to each sample point.
• A random variable takes numerical values that can be discrete or
continuous.
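
A minimal Python sketch of the probability-versus-statistics distinction above, using the die experiment (all names and the number of rolls are illustrative choices, not from these slides):

import random
from collections import Counter

# Probability view: given the model (a fair die), what data do we expect?
model = {outcome: 1 / 6 for outcome in range(1, 7)}  # P(X = k) = 1/6

# Statistics view: given observed data, estimate the model.
rolls = [random.randint(1, 6) for _ in range(10_000)]  # simulate the experiment
counts = Counter(rolls)
estimated = {outcome: counts[outcome] / len(rolls) for outcome in sorted(counts)}

print("model:    ", model)
print("estimated:", estimated)  # relative frequencies approach 1/6

Here the roll outcome is the random variable: it assigns one numerical value to each sample point.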
Introduction: Naïve Bayes
• The naive Bayes classifier is probably among the most effective
algorithms for learning tasks that classify text documents.
• The naive Bayes technique is extremely helpful for huge
datasets.
• For example, Google employs a naive Bayes classifier to correct
spelling mistakes in the text typed in by users.
• It gives a meaningful perspective to the comprehension of
various learning algorithms that do not explicitly manipulate
probabilities.
• Bayes theorem is the cornerstone of Bayesian learning methods.
Bayes theorem
• Bayes theorem offers a method of calculating the probability of a
hypothesis on the basis of its prior probability, the probabilities of
observing different data given the hypothesis, and the observed data itself.
• The distribution of all possible values of a discrete random variable y is
expressed as a probability distribution.

• We assume that there is some a priori probability (or simply prior) P(yq)
that the next feature vector belongs to class yq.
Bayes theorem
• The continuous attributes are binned and converted to categorical variables.
• Therefore, each attribute xj is assumed to have a countable value set.
• Bayes theorem provides a way to calculate the posterior P(yk|x), k ∈ {1, …, M},
from the known priors P(yq), together with the known conditional probabilities
P(x|yq), q = 1, …, M:

P(yk|x) = P(x|yk)·P(yk) / P(x)

• The posterior on the left is difficult to calculate directly; using this relation,
it is obtained from the quantities on the right, which are easier to estimate.
• P(x) expresses the variability of the observed data, independent of the class.
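
A minimal numeric sketch of this relation in Python (the class names, priors, and likelihoods below are made-up illustration values, not from these slides):

# Bayes theorem with two classes and one observed feature vector x.
priors = {"spam": 0.3, "ham": 0.7}       # P(y_q)
likelihood = {"spam": 0.8, "ham": 0.1}   # P(x | y_q)

# P(x) is class-independent: sum over classes of P(x | y_q) * P(y_q).
evidence = sum(priors[c] * likelihood[c] for c in priors)

# Posterior P(y_q | x) = P(x | y_q) * P(y_q) / P(x).
posterior = {c: priors[c] * likelihood[c] / evidence for c in priors}
print(posterior)  # {'spam': 0.774..., 'ham': 0.225...}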


Naive Bayes Classifier
• Treats the features as equally important and independent of
each other, given the class.
• This is rarely the scenario in real-life data.
• Each of the priors P(yq) may be estimated simply by counting the frequency with
which class yq occurs in the training data:

P(yq) = nq / n, where nq is the number of training samples of class yq and n is
the total number of training samples.
• If the decision must be made with so little information, it seems logical to
use the following rule (just like the naive rule): assign the pattern to the
class with the largest prior P(yq).
• For balanced data this rule will not work; the decision will usually be right
only when one prior is very much greater than the others.

Naive Bayes Classifier
• In most other circumstances, we need to estimate the class-conditional
probabilities P(x|yq) as well.

• According to the naive assumption (attribute values are conditionally
independent, given the class), the probability of observing the conjunction
x1, x2, …, xn is just the product of the probabilities for the individual
attributes:

P(x1, x2, …, xn | yq) = P(x1|yq)·P(x2|yq)· … ·P(xn|yq)
Naive Bayes Classifier

yNB = argmax over q of P(yq)·P(x1|yq)· … ·P(xn|yq)

• where yNB denotes the class output by the naive Bayes classifier.
• The number of distinct P(xj | yq) terms that must be estimated from the
training data is just the number of distinct attributes (n) times the number
of distinct classes (M).
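
As a concrete illustration, here is a minimal from-scratch sketch of this counting-based classifier in Python (no smoothing yet; all function and variable names are illustrative, not from the slides):

from collections import Counter, defaultdict

def train_nb(X, y):
    """Estimate P(y_q) and P(x_j | y_q) by counting frequencies."""
    n = len(y)
    class_counts = Counter(y)
    priors = {c: class_counts[c] / n for c in class_counts}  # P(y_q)
    cond = defaultdict(Counter)  # (class, attribute index) -> value counts
    for xs, c in zip(X, y):
        for j, v in enumerate(xs):
            cond[(c, j)][v] += 1
    def likelihood(c, j, v):  # P(x_j = v | y_q = c)
        return cond[(c, j)][v] / class_counts[c]
    return priors, likelihood

def predict_nb(priors, likelihood, xs):
    """y_NB = argmax over classes of P(y_q) * prod_j P(x_j | y_q)."""
    scores = {}
    for c, p in priors.items():
        for j, v in enumerate(xs):
            p *= likelihood(c, j, v)
        scores[c] = p
    return max(scores, key=scores.get)

# Tiny usage example with made-up weather rows:
priors, lik = train_nb([["Sunny", "Hot"], ["Rain", "Cool"], ["Rain", "Mild"]],
                       ["No", "Yes", "Yes"])
print(predict_nb(priors, lik, ["Rain", "Cool"]))  # -> 'Yes'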
Bayesian Classification: Why?
• Probabilistic learning: Calculates explicit probabilities for hypotheses;
among the most practical approaches to certain types of learning
problems
• Incremental: Each training example can incrementally
increase/decrease the probability that a hypothesis is correct. Prior
knowledge can be combined with observed data.
• Probabilistic prediction: Predict multiple hypotheses, weighted by
their probabilities
• Standard: Even when Bayesian methods are computationally
intractable, they can provide a standard of optimal decision making
against which other methods can be measured
Bayesian classification
• The classification problem may be formalized using a-posteriori
probabilities:
• P(C|X) = probability that the sample tuple
X=<x1,…,xk> is of class C.

• E.g. P(class=N | outlook=sunny, windy=true,…)

• Idea: assign to sample X the class label C such that P(C|X) is maximal

Estimating a-posteriori probabilities
• Bayes theorem:
P(C|X) = P(X|C)·P(C) / P(X)
• P(X) is constant for all classes
• P(C) = relative frequency of class C samples
• C such that P(C|X) is maximum =
C such that P(X|C)·P(C) is maximum

Naïve Bayesian Classification
• Naïve assumption: attribute independence
P(x1,…,xk|C) = P(x1|C)·…·P(xk|C)
• If the i-th attribute is categorical:
P(xi|C) is estimated as the relative frequency of samples
having value xi as the i-th attribute in class C

• If the i-th attribute is continuous:
P(xi|C) is estimated through a Gaussian density function
• Computationally easy in both cases
Play Tennis Dataset
Outlook Temperature Humidity Wind Play Tennis
Sunny Hot High Weak No
Sunny Hot High Strong No
Overcast Hot High Weak Yes
Rain Mild High Weak Yes
Rain Cool Normal Weak Yes
Rain Cool Normal Strong No
Overcast Cool Normal Strong Yes
Sunny Mild High Weak No
Sunny Cool Normal Weak Yes
Rain Mild Normal Weak Yes
Sunny Mild Normal Strong Yes
Overcast Mild High Strong Yes
Overcast Hot Normal Weak Yes
Rain Mild High Strong No
Naïve Bayesian Classifier
Play Tennis Example [Classifying Test Sample x]
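
The worked numbers for this slide are not reproduced here, so below is a short Python sketch of the computation, assuming the classic test sample x = (Outlook = Sunny, Temperature = Cool, Humidity = High, Wind = Strong); this particular x is an assumption, not taken from the slides:

# Classify a Play Tennis test sample with naive Bayes counts.
data = [  # (Outlook, Temperature, Humidity, Wind, PlayTennis)
    ("Sunny", "Hot", "High", "Weak", "No"), ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"), ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"), ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"), ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"), ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"), ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"), ("Rain", "Mild", "High", "Strong", "No"),
]
x = ("Sunny", "Cool", "High", "Strong")  # assumed test sample

for c in ("Yes", "No"):
    rows = [r for r in data if r[-1] == c]
    score = len(rows) / len(data)              # prior P(c)
    for j, v in enumerate(x):                  # times each P(x_j | c)
        score *= sum(1 for r in rows if r[j] == v) / len(rows)
    print(c, score)

With these counts the No score (≈ 0.0206) exceeds the Yes score (≈ 0.0053), so the classifier predicts Play Tennis = No for this x.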
Problem of zero probability
• Laplace smoothing replaces the raw relative frequency with:

P(xi | y) = (count(xi and y) + α) / (count(y) + α · Vi)

Where:
• α is the smoothing parameter (usually α = 1).
• count(xi and y) is the count of instances where xi occurs with y.
• count(y) is the count of instances where y occurs.
• Vi is the number of unique values (distinct categories) for the feature xi.
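
A one-function sketch of this formula in Python (function and variable names are illustrative):

def smoothed_likelihood(count_xi_and_y, count_y, n_values, alpha=1.0):
    """P(x_i | y) = (count(x_i and y) + alpha) / (count(y) + alpha * V_i)."""
    return (count_xi_and_y + alpha) / (count_y + alpha * n_values)

# Outlook has V_i = 3 values; e.g. P(overcast | No) with a zero raw count:
print(smoothed_likelihood(0, 5, 3))  # 1/8 = 0.125 instead of 0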
Smoothing Outlook Feature

Yes class:
P(sunny | Yes) = (2 + α) / (9 + 3α)
P(overcast | Yes) = (4 + α) / (9 + 3α)
P(rain | Yes) = (3 + α) / (9 + 3α)

No class:
P(sunny | No) = (3 + α) / (5 + 3α)
P(overcast | No) = (0 + α) / (5 + 3α) = α / (5 + 3α)
P(rain | No) = (2 + α) / (5 + 3α)

• The 3α term appears because the Outlook feature has three different values
(sunny, overcast, rain); the counts come from the Play Tennis table, which has
9 Yes and 5 No samples.
When to Adjust Alpha
• Laplace Smoothing is the method to handle zero probabilities.
• With Laplace Smoothing, none of the probabilities are zero, and the
model can handle unseen data without invalidating predictions.
• Small Dataset: Use 𝛼>1 to add more smoothing since fewer examples
increase the likelihood of zero probabilities.
• Large Dataset: 𝛼=1 is typically sufficient since larger datasets naturally
reduce the chances of zero probabilities.
• Validation: Test different 𝛼 values using cross-validation to find the best
fit for your dataset.
• The default value of alpha is 1 in scikit-learn's categorical naive Bayes:

from sklearn.naive_bayes import CategoricalNB
nb = CategoricalNB(alpha=1)
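
A short end-to-end sketch with a toy integer-encoded dataset (CategoricalNB expects non-negative integer category codes; the data below is made up for illustration):

from sklearn.naive_bayes import CategoricalNB

# Toy data: one categorical feature already encoded as codes 0..2.
X = [[0], [0], [1], [1], [2]]
y = [0, 0, 1, 1, 1]

nb = CategoricalNB(alpha=1)   # alpha=1 gives Laplace smoothing
nb.fit(X, y)
print(nb.predict([[2]]))      # -> [1]; category 2 was seen only with class 1

To tune α as suggested above, the estimator can be wrapped in sklearn.model_selection.GridSearchCV with a grid of alpha values and scored by cross-validation.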
Problem: (Confident = Yes, Sick = No) => (Fail or Pass)

• Dataset for classification:

• Find out whether the student with attribute Confident = Yes, Sick =
No will Fail or Pass using Bayesian classification.
• Let, C1 correspond to the class Result = Pass and C2 correspond to
Result = Fail.
Solution
• We wish to determine whether the test feature vector X = (Confident = Yes,
Sick = No) is more likely to belong to C1 or C2.
• The classifier predicts that the class label of tuple X is the class Ci if
and only if P(Ci|X) > P(Cj|X) for all j ≠ i.
• Step 1: Compute prior probability
• The prior probability of each class, can be computed based on the
training set:
• Step 2: (Compute likelihood probability)
• To compute P(X| Ci), for i = 1, 2, we compute the following
conditional probabilities:

• Step 3: (Compute posterior probability)


• Predict the class for test observation feature vector X

• Therefore, the naive Bayesian classifier predicts Result = Pass for tuple X.
How to handle continuous features in naïve Bayes
Gaussian Naïve Bayes
• In Gaussian naïve Bayes, the continuous values associated with each
feature are assumed to follow a Gaussian distribution. A random variable
is said to follow a Gaussian/normal distribution when, plotted, it gives a
bell-shaped curve that is symmetric about the mean.
• The likelihood of the feature is assumed to be Gaussian, and hence the
conditional probability is given by:

P(xi | y) = (1 / √(2π σy²)) · exp(−(xi − μy)² / (2σy²))

where μy and σy² are the mean and variance of the feature computed from the
training samples of class y.
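
A minimal sketch of this density and of scikit-learn's GaussianNB on made-up one-dimensional data (the values are illustrative only):

import math
from sklearn.naive_bayes import GaussianNB

def gaussian_likelihood(x, mu, var):
    """P(x_i | y) under a normal density with class mean mu and variance var."""
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

# Toy continuous data: one feature, two classes.
X = [[6.0], [5.9], [5.6], [5.0], [4.9], [5.1]]
y = [1, 1, 1, 0, 0, 0]

model = GaussianNB().fit(X, y)   # estimates per-class mean and variance
print(model.predict([[5.8]]))    # -> [1]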
Solve Problem using Gaussian Distribution
Multivariate Gaussian distribution
Application of Naïve Bayes
• Spam Detection: One of the earliest and most famous applications of
Naive Bayes is in the filtering of unwanted emails based on the
likelihood of certain words appearing in spam versus non-spam emails.
• Sentiment Analysis: Naive Bayes is commonly used in sentiment
analysis, determining whether a text expresses positive, negative, or
neutral sentiments, particularly useful in social media monitoring and
market research.
• Document Classification: It is extensively used in classifying
documents, such as categorizing news articles into various topics or
organizing books into genres.
• Healthcare: Naive Bayes has applications in the medical field for
disease prediction and discovering relationships between various risk
factors and diagnosis.
Advantages of Using Naive Bayes
• Efficiency: Naive Bayes is known for its simplicity and speed. It can
make quick predictions even with large datasets, which is invaluable
in real-time applications.
• Easy to Implement: With fewer parameters to tune, Naive Bayes can
be easier to implement compared to more complex models like
neural networks.
• Good Performance with Small Data: Unlike some models that require
vast amounts of training data to perform well, Naive Bayes can
achieve good results even with a smaller dataset.
• Probabilistic Interpretation: The model provides probabilities for
outcomes, offering more insight into the results, such as how likely a
given class is the correct classification.
Limitations and Considerations
• Independence Assumption: The biggest limitation is the
assumption of independent predictors. In real-world scenarios,
features often influence each other, and this assumption can
lead to incorrect predictions.
• Zero-Frequency Problem: If a categorical variable has a
category in test data set which was not observed in training
data set, it will assign a 0 probability and will be unable to
make a prediction. This is often mitigated by a smoothing
technique.
• Biased Estimates: Because it relies heavily on the actual
distribution of classes and features in the training set, Naive
Bayes can produce biased estimates if the training data is not
representative.
