AI&ML Unit 2
PART – A
As we know, Bayes' theorem defines the probability of an event based on prior knowledge of the conditions related to the event. If we know the conditional probability, we can easily find the reverse probabilities using Bayes' theorem.
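The naive Bayes classification rule, in its standard form, is
vNB = argmax over vj in V of P(vj) × P(a1 | vj) × P(a2 | vj) × … × P(an | vj)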
where vNB denotes the target value output by the naive Bayes classifier. In a naive Bayes classifier, the number of distinct P(ai | vj) terms that must be estimated from the training data is just the number of distinct attribute values times the number of distinct target values, a much smaller number than if we were to estimate the P(a1, a2, …, an | vj) terms as first contemplated.
The conditional probability table for the Campfire node is shown at the right, where Campfire is abbreviated to C, Storm to S, and BusTourGroup to B.
10. What is MLE?
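Maximum Likelihood Estimation (MLE) is a method of estimating the parameters of a probability model by choosing the parameter values that maximize the likelihood function, i.e. the parameter values under which the observed training data are most probable.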
A Bayesian network is used to represent a graphical model of the probability relationships among a set of variables.
A Bayesian network is a probabilistic graphical model that measures the
conditional dependence structure of a set of random variables based on the Bayes theorem
A Bayesian network is a probabilistic graphical model which represents a set of variables and their conditional dependencies using a directed acyclic graph.
It is also called a Bayes network, belief network, decision network, or Bayesian
model.
Bayesian networks are probabilistic, because these networks are built from
a probability distribution, and also use probability theory for prediction and anomaly
detection.
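P(A|B) = P(B|A) × P(A) / P(B)     … (a)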
The above equation (a) is called Bayes' rule or Bayes' theorem. This equation is the basis of most modern AI systems for probabilistic inference.
where:
P(A|B) is the posterior probability of event A given that event B has already happened,
P(B|A) is the likelihood of B given A,
P(A) is the prior probability of A, and
P(B) is the marginal probability of B.
Suppose we want to calculate the probability of event A when event B has already occurred, i.e. "the probability of A under the condition of B". This can be written as:
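P(A|B) = P(A ∩ B) / P(B)
where P(A ∩ B) is the joint probability of A and B, and P(B) is the probability of B.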
Doctors can diagnose patients by using the information that the classifier
provides. Healthcare professionals can use Naive Bayes to indicate if a patient is at
high risk for certain diseases and conditions, such as heart disease, cancer, and other
ailments.
o News Classification
With the help of a Naive Bayes classifier, Google News recognizes whether the
news is political, world news, and so on.
20. What are the advantages of Naïve Bayes model?
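o It is simple to implement and fast to train and to make predictions, even on large datasets.
o It needs only a small amount of training data to estimate the required probabilities.
o It performs well with high-dimensional data, which is why it is widely used for text classification and spam filtering.
o It naturally handles multi-class prediction problems.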
PART B
Let E1, E2,…, En be a set of events associated with a sample space S, where all the events E1,
E2,…, En have nonzero probability of occurrence and they form a partition of S. Let A be any
event associated with S, then according to Bayes theorem,
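P(Ei | A) = P(Ei) × P(A | Ei) / [ P(E1) × P(A | E1) + P(E2) × P(A | E2) + … + P(En) × P(A | En) ], for any i = 1, 2, …, n.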
Note:
The following terminologies are also used when the Bayes theorem is applied:
Hypotheses: The events E1, E2, …, En are called the hypotheses.
Priori Probability: The probability P(Ei) is considered as the priori probability of hypothesis Ei.
Posteriori Probability: The probability P(Ei|A) is considered as the posteriori probability of hypothesis Ei.
Bayes' theorem is also called the formula for the probability of "causes". Since the Ei's form a partition of the sample space S, one and only one of the events Ei occurs (i.e. one of the events Ei must occur, and only one can occur). Hence, the above formula gives us the probability of a particular Ei (i.e. a "cause"), given that the event A has occurred.
b) Explain the axioms of probability.
Axioms of Probability
There are three axioms of probability that form the foundation of probability theory:
Axiom 1: Probability of Event
The first axiom is that the probability of an event is always between 0 and 1. A probability of 1 indicates that an outcome of the event is certain to occur, while 0 indicates that no outcome of the event is possible.
Axiom 2: Probability of Sample Space
The probability of the entire sample space is 1: the probability that at least one of all the possible outcomes of a process (such as rolling a die) will occur is 1.
Axiom 3: Mutually Exclusive Events
The third axiom is that the probability of the union of two mutually disjoint (mutually exclusive) events is the sum of their individual probabilities. If two events A and B are mutually exclusive, then the probability of either A or B occurring is the probability of A occurring plus the probability of B occurring.
1. Probability of Event
The first axiom of probability is that the probability of any event is between 0 and 1.
As we know, the probability of an event is the number of outcomes in the event divided by the total number of outcomes in the sample space.
Let’s take an example from the dataset. Suppose we need to find out the probability of
churning for the female customers by their occupation type.
In our dataset, we have 4 female customers; one of them is salaried and three of them are self-employed. The salaried female is going to churn, so the number of salaried female customers who are not going to churn is 0. Among the 3 self-employed female customers, two are going to churn and one is not going to churn. This is the complete dataset:
So, for the churning status of a female customer by profession, the sample space of the problem consists of four outcomes:
Salaried Churn, Salaried Not churn, Self-employed Churn, Self-employed Not churn
and, as discussed above, their distribution in this sample space of female customers is:
Salaried Churn = 1
Salaried Not churn = 0
Self-employed Churn = 2
Self-employed Not churn = 1
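From these counts, with 4 female customers in total:
P(Salaried Churn) = 1/4 = 0.25
P(Salaried Not churn) = 0/4 = 0
P(Self-employed Churn) = 2/4 = 0.50
P(Self-employed Not churn) = 1/4 = 0.25
Each of these probabilities lies between 0 and 1, as Axiom 1 requires, and together they sum to 1 because the four outcomes cover the entire sample space (Axiom 2).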
If you recall the union formula, P(A ∪ B) = P(A) + P(B) − P(A ∩ B), you will notice that the intersection term is not present here, which means there is nothing common between A and B. Events of this particular type are called mutually exclusive events.
Mutually exclusive events cannot occur together; in other words, they have no common outcomes, so their intersection is zero/null. We can also represent such events as follows:
A ∩ B = ∅
This means that the intersection is empty, i.e. the events do not have any common value. For example, if event A is getting a number greater than 4 on rolling a die, the possible outcomes are 5 and 6.
Event B is getting a number less than 3 on rolling a die. Here the possible outcomes are 1 and 2.
Clearly, these two events cannot have any common outcome. An interesting thing to note here is that events A and B are not complements of each other, yet they are mutually exclusive.
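Here P(A) = 2/6 and P(B) = 2/6, and since A and B are mutually exclusive,
P(A ∪ B) = P(A) + P(B) = 2/6 + 2/6 = 4/6 = 2/3.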
2. We have prior knowledge that, over the entire population, only 0.008 of people have cancer. The test returns a correct positive result in only 98% of the cases in which the disease is actually present, and a correct negative result in only 97% of the cases in which the disease is not present. In the other cases, the test returns the opposite result. A patient takes a lab test and the result comes back positive. Evaluate whether the patient has cancer or not using Bayes learning. Nov/Dec 20
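A short Python sketch of the Bayes computation for this problem (the probabilities are taken directly from the statement above):

# Bayes learning for the cancer lab-test problem
p_cancer = 0.008                 # prior P(cancer)
p_not_cancer = 1 - p_cancer      # prior P(no cancer) = 0.992
p_pos_given_cancer = 0.98        # P(+ | cancer), the test's sensitivity
p_pos_given_not = 1 - 0.97       # P(+ | no cancer), the false-positive rate = 0.03

# Unnormalized posteriors for a positive test result
score_cancer = p_pos_given_cancer * p_cancer    # 0.98 * 0.008 = 0.00784
score_not = p_pos_given_not * p_not_cancer      # 0.03 * 0.992 = 0.02976

p_cancer_given_pos = score_cancer / (score_cancer + score_not)
print(p_cancer_given_pos)        # about 0.21, so the MAP hypothesis is "no cancer"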
3. Given 14 training examples of the target concept PlayTennis with attributes outlook, temperature, humidity and wind. The frequency of Play = Yes is 9 and the frequency of Play = No is 5. The conditional probabilities are given as:
P(outlook = rainy | Play = Yes) = 2/9, P(temp = cool | Play = Yes) = 3/9, P(humidity = high | Play = Yes) = 3/9, P(windy = true | Play = Yes) = 3/9
P(outlook = rainy | Play = No) = 3/5, P(temp = cool | Play = No) = 1/5, P(humidity = high | Play = No) = 4/5, P(windy = true | Play = No) = 3/5
Classify the new instance (Outlook = rainy, Temp = hot, Humidity = high, Wind = false) as Play = Yes or Play = No using the Naive Bayes classifier.
Apr/May 21
Solution:
Assumption
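The attribute values are assumed to be conditionally independent of one another given the target value (Play = Yes or Play = No); this is the standard naive Bayes assumption.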
Computation of probabilities
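A Python sketch of the computation, using the conditional probabilities supplied in the question (it is assumed here that these are the conditionals that apply to the attribute values of the instance being classified):

# Naive Bayes scores for the PlayTennis instance, using the supplied probabilities
p_yes, p_no = 9/14, 5/14                 # class priors from the given frequencies

cond_yes = [2/9, 3/9, 3/9, 3/9]          # P(outlook), P(temp), P(humidity), P(windy) given Play = Yes
cond_no = [3/5, 1/5, 4/5, 3/5]           # the same conditionals given Play = No

score_yes = p_yes
for p in cond_yes:
    score_yes *= p                       # 9/14 * 2/9 * 3/9 * 3/9 * 3/9 is about 0.0053

score_no = p_no
for p in cond_no:
    score_no *= p                        # 5/14 * 3/5 * 1/5 * 4/5 * 3/5 is about 0.0206

print("Play =", "Yes" if score_yes > score_no else "No")   # score_no is larger, so Play = No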
Remarks
5. Write a short note on Bayesian network. Or: Explain the Bayesian network by taking an example. How does a Bayesian network represent uncertain knowledge?
Apr/May 21
For example, a Bayesian network could represent the probabilistic relationships between diseases and symptoms: given symptoms, the network can be used to compute the probabilities of the presence of various diseases. Efficient algorithms can perform inference and learning in Bayesian networks.
Another Example:
Applications:
Prediction
Anomaly detection
Diagnostics
Automated insight
Reasoning
Time series prediction
Decision making under uncertainty.
6.a. Explain the role of prior probability and posterior probability in Bayesian classification.
Prior Probability:
In Bayesian statistical inference, the prior probability is the probability of an event before new data is collected. It is the best rational assessment of the probability of an outcome based on current knowledge.
For example,
In the Mortgage case, P(Y) is the default rate on a home mortgage, which is 2%. P(Y|X) is called the
conditional probability, which provides the probability of an outcome given the evidence, that is, when
the value of X is known.
Posterior Probability:
It is calculated using Bayes’ Theorem. Prior probability gets updated when new data is available, to
produce a more accurate measure of a potential outcome.
A posterior probability can subsequently become a prior for a new updated posterior probability as new
information arises and is incorporated into the analysis.
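A minimal sketch of such an update for the mortgage example; the likelihoods P(X|Y) and P(X|not Y) used below are illustrative assumptions, not figures from the text:

# Updating the prior default rate P(Y) = 0.02 with evidence X (illustrative likelihoods)
p_default = 0.02             # prior P(Y)
p_x_given_default = 0.40     # assumed P(X | Y): the evidence is common among defaulters
p_x_given_ok = 0.05          # assumed P(X | not Y)

p_x = p_x_given_default * p_default + p_x_given_ok * (1 - p_default)
posterior = p_x_given_default * p_default / p_x
print(posterior)             # about 0.14, so the posterior P(Y | X) is well above the 2% prior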
Answer:
Inference over a Bayesian network can come in two forms. The first is simply evaluating the joint probability of a particular assignment of values for each variable (or a subset of them) in the network. The second is computing the posterior distribution of one or more query variables given observed values (evidence) for other variables.
Exact Inference:
In exact inference, we analytically compute the conditional probability distribution over the variables of interest.
Simulation Methods:
It uses the network to generate samples from the conditional probability distribution and estimate
conditional probabilities of interest when the number of samples is sufficiently large.
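A minimal illustration of the sampling idea (not a specific algorithm from the text), estimating P(Campfire | Storm = true) for a two-node network Storm -> Campfire with assumed CPT values:

import random

P_S = 0.3                                 # assumed P(Storm = true)
P_C_given_S = {True: 0.6, False: 0.1}     # assumed P(Campfire = true | Storm)

matches, hits = 0, 0
random.seed(0)
for _ in range(100_000):
    s = random.random() < P_S             # sample Storm from its prior
    c = random.random() < P_C_given_S[s]  # sample Campfire given Storm
    if s:                                 # keep only samples consistent with the evidence Storm = true
        matches += 1
        hits += c
print(hits / matches)                     # close to 0.6, the true P(Campfire | Storm = true)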
With machine learning, by contrast, the inputs are known exactly but the model is unknown prior to training. Regarding the output, the differences are more subtle: both approaches give an output, but the source of uncertainty is different.
Variational Methods:
Variational Bayesian methods are a family of techniques for approximating intractable integrals arising
in Bayesian inference and machine learning.
PART C
1. a) An experiment consists of observing the sum of the outcomes when two fair dice are
thrown. Find the probability that the sum is 7 and find the probability that the sum is greater than
10. May/June 2016
Solution
Sample space (all 36 equally likely outcomes):
(1,1),(1,2),(1,3),(1,4),(1,5),(1,6),
(2,1),(2,2),(2,3),(2,4),(2,5),(2,6),
(3,1),(3,2),(3,3),(3,4),(3,5),(3,6),
(4,1),(4,2),(4,3),(4,4),(4,5),(4,6),
(5,1),(5,2),(5,3),(5,4),(5,5),(5,6),
(6,1),(6,2),(6,3),(6,4),(6,5),(6,6)
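Total number of equally likely outcomes = 36.
Outcomes with sum = 7: (1,6), (2,5), (3,4), (4,3), (5,2), (6,1), i.e. 6 outcomes, so P(sum = 7) = 6/36 = 1/6.
Outcomes with sum > 10: (5,6), (6,5), (6,6), i.e. 3 outcomes, so P(sum > 10) = 3/36 = 1/12.
The counts can be checked with a short Python enumeration:

from itertools import product
outcomes = list(product(range(1, 7), repeat=2))    # all 36 equally likely rolls
print(sum(a + b == 7 for a, b in outcomes) / 36)   # 0.1666..., i.e. 1/6
print(sum(a + b > 10 for a, b in outcomes) / 36)   # 0.0833..., i.e. 1/12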
b)
Solution
Let A, B, C denote the events that a randomly selected bulb was manufactured in factory
A, B, C respectively. Let D denote the event that a bulb is defective. We have the
following data:
2. Consider a training data set consisting of the fauna of the world. Each unit has three features
named “Swim”, “Fly” and “Crawl”. Let the possible values of these features be as follows:
Swim: Fast, Slow, No
Fly: Long, Short, Rarely, No
Crawl: Yes, No
For simplicity, each unit is classified as “Animal”, “Bird” or “Fish”. Let the training data set be as
in Table. Use naive Bayes algorithm to classify a particular species if its features are (Slow, Rarely,
No)?
Solution
In this example, the features are
F1 = “Swim”; F2 = “Fly”; F3 = “Crawl”.
The class labels are
c1 = “Animal”; c2 = “Bird”; c3 = “Fish”.
The test instance is (Slow, Rarely, No) and so we have:
x1 = “Slow”; x2 = “Rarely”; x3 = “No”.
We construct the frequency table shown in Table which summarizes the data. (It may be noted that the
construction of the frequency table is not part of the algorithm.)
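The Python sketch below only illustrates how the naive Bayes scores are then computed for the test instance (Slow, Rarely, No) once the frequency table is available; the count values used here are placeholders, not the actual figures from the table:

# Naive Bayes scoring for the instance (Slow, Rarely, No); the counts are placeholders
class_counts = {"Animal": 4, "Bird": 4, "Fish": 2}        # hypothetical class frequencies
value_counts = {                                          # hypothetical counts of each test value per class
    "Animal": {"Slow": 2, "Rarely": 1, "No": 3},
    "Bird":   {"Slow": 1, "Rarely": 2, "No": 1},
    "Fish":   {"Slow": 1, "Rarely": 0, "No": 2},
}
total = sum(class_counts.values())

scores = {}
for c, n_c in class_counts.items():
    score = n_c / total                                   # prior P(c)
    for count in value_counts[c].values():
        score *= count / n_c                              # likelihood P(xi | c) estimated from counts
    scores[c] = score

print(max(scores, key=scores.get))                        # the class with the largest posterior score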
3. Harry installed a new burglar alarm at his home to detect burglary. The alarm responds reliably to a burglary but also responds to minor earthquakes. Harry has two neighbors, David and Sophia, who have taken responsibility for informing Harry at work when they hear the alarm. David always calls Harry when he hears the alarm, but sometimes he gets confused with the phone ringing and calls then too. On the other hand, Sophia likes to listen to loud music, so sometimes she misses hearing the alarm. Here we would like to compute the probability of the burglar alarm. Using the Bayesian network, calculate the probability that the alarm has sounded, but neither a burglary nor an earthquake has occurred, and both David and Sophia have called Harry.
Solution:
The Bayesian network for the above problem is given below. The network structure shows that Burglary and Earthquake are the parent nodes of Alarm and directly affect the probability of the alarm going off, whereas David's and Sophia's calls depend only on the alarm.
The network thus represents the assumptions that David and Sophia do not perceive the burglary directly, do not notice the minor earthquake, and do not confer before calling.
The conditional distribution for each node is given as a conditional probability table, or CPT.
Each row in a CPT must sum to 1 because the entries in the row represent an exhaustive set of cases for the variable.
In a CPT, a Boolean variable with k Boolean parents contains 2^k probabilities, one for each combination of parent values. Hence, if there are two parents, the CPT will contain 4 probability values.
o List of all events occurring in this network:
Burglary (B)
Earthquake(E)
Alarm(A)
David Calls(D)
Sophia calls(S)
We can write the events of the problem statement in the form of the probability P[D, S, A, B, E]; using the joint probability distribution and the chain rule, this can be rewritten as:
P[D, S, A, B, E] = P[D | S, A, B, E] × P[S, A, B, E]
= P[D | S, A, B, E] × P[S | A, B, E] × P[A, B, E]
= P[D | A] × P[S | A, B, E] × P[A, B, E]
= P[D | A] × P[S | A] × P[A | B, E] × P[B, E]
= P[D | A] × P[S | A] × P[A | B, E] × P[B | E] × P[E]
= P[D | A] × P[S | A] × P[A | B, E] × P[B] × P[E]   (since Burglary and Earthquake are independent)
Let's take the observed probability for the Burglary and earthquake component:
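A minimal Python sketch of the required evaluation using the factorization derived above; the CPT values below are illustrative stand-ins (assumptions), not the entries of the actual tables:

# P(D, S, A, ~B, ~E) = P(D|A) * P(S|A) * P(A|~B,~E) * P(~B) * P(~E)
# The numbers below are assumed for illustration only.
p_B, p_E = 0.002, 0.001        # assumed P(Burglary), P(Earthquake)
p_A_given_notB_notE = 0.001    # assumed P(Alarm | no burglary, no earthquake)
p_D_given_A = 0.91             # assumed P(David calls | Alarm)
p_S_given_A = 0.75             # assumed P(Sophia calls | Alarm)

p = p_D_given_A * p_S_given_A * p_A_given_notB_notE * (1 - p_B) * (1 - p_E)
print(p)                       # about 0.00068 with these illustrative values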
5. In a clinic, the probability of a patient having the HIV virus is 0.15. A blood test is done on patients: if the patient has the virus, then the test is +ve with probability 0.95; if the patient does not have the virus, then the test is +ve with probability 0.02. Assign labels to the events: H = patient has the virus, P = test is +ve. Given: P(H) = 0.15, P(P|H) = 0.95, P(P|¬H) = 0.02. Find: if the test is +ve, what are the probabilities that the patient i) has the virus, i.e. P(H|P); ii) does not have the virus, i.e. P(¬H|P); if the test is -ve, what are the probabilities that the patient iii) has the virus, i.e. P(H|¬P); iv) does not have the virus, i.e. P(¬H|¬P)?
To solve this problem, we can use Bayes' theorem, which allows us to update our
beliefs about the probability of an event given new evidence. Let's denote:
H: Patient has the virus (HIV).
¬H: Patient does not have the virus (HIV).
P: Test result is positive.
¬P: Test result is negative.
Given probabilities:
P(H)=0.15: Probability of a patient having the virus.
P(P∣H)=0.95: Probability of a positive test result given the patient has the virus.
P(P∣¬H)=0.02: Probability of a positive test result given the patient does not have
the virus.
We need to find:
i)P(H∣P): Probability that the patient has the virus given the test is positive.
ii) P(¬H∣P): Probability that the patient does not have the virus given the test is
positive.
iii) P(H∣¬P): Probability that the patient has the virus given the test is negative.
iv) P(¬H∣¬P): Probability that the patient does not have the virus given the test is
negative.
Applying Bayes' Theorem:
i) P(H∣P)= P(P∣H)×P(H)/ P(P)
ii) P(¬H∣P)=P(P∣¬H)×P(¬H)/ P(P)
iii) P(H∣¬P)= P(¬P∣H)×P(H)/ P(¬P)
iv) P(¬H∣¬P)= P(¬P∣¬H)×P(¬H)/ P(¬P)
Where:
P(P)=P(P∣H)×P(H)+P(P∣¬H)×P(¬H)
P(¬P)=1−P(P)
P(¬P∣H)=1−P(P∣H)
P(¬P∣¬H)=1−P(P∣¬H)
Calculations:
Given:
P(H)=0.15
P(¬H)=1−P(H)=0.85
P(P∣H)=0.95
P(P∣¬H)=0.02
Calculate:
P(P)
P(¬P)
P(¬P∣H)
P(¬P∣¬H)
Then, substitute the values into Bayes' theorem to find P(H∣P),P(¬H∣P), P(H∣¬P),
and P(¬H∣¬P).
Let's do the calculations:
1. P(P)=P(P∣H)×P(H)+P(P∣¬H)×P(¬H)
=0.95×0.15+0.02×0.85
=0.1425+0.017
=0.1595
2. P(¬P)=1−P(P)
=1−0.1595
=0.8405
3. P(¬P∣H)=1−P(P∣H)
=1−0.95
=0.05
4. P(¬P∣¬H)=1−P(P∣¬H)
=1−0.02
=0.98
Now, apply Bayes' theorem:
i) P(H∣P)= P(P∣H)×P(H)/ P(P)
=0.95×0.15/0.1595
≈0.1425/0.1595
≈0.893
ii) P(¬H∣P)= P(P∣¬H)×P(¬H)/ P(P)
=0.02×0.85/0.1595
≈0.017/0.1595
≈0.107
iii) P(H∣¬P)= P(¬P∣H)×P(H)/ P(¬P)
=0.05×0.15/0.8405
≈0.0075/0.8405
≈0.009
iv) P(¬H∣¬P)= P(¬P∣¬H)×P(¬H) / P(¬P)
=0.98×0.85/0.8405
≈0.833/0.8405
≈0.991
Results:
i) P(H∣P)≈0.893 or 89.3%
ii) P(¬H∣P)≈0.107 or 10.7%
iii) P(H∣¬P)≈0.009 or 0.9%
iv) P(¬H∣¬P)≈0.991 or 99.1%
These probabilities represent the likelihood of a patient having or not having the virus
given the test results.
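The calculations above can be checked with a few lines of Python:

# Bayes' theorem for the HIV test example
p_H = 0.15
p_P_given_H, p_P_given_notH = 0.95, 0.02

p_P = p_P_given_H * p_H + p_P_given_notH * (1 - p_H)      # 0.1595
p_notP = 1 - p_P                                          # 0.8405

print(p_P_given_H * p_H / p_P)                            # P(H|P)   is about 0.893
print(p_P_given_notH * (1 - p_H) / p_P)                   # P(¬H|P)  is about 0.107
print((1 - p_P_given_H) * p_H / p_notP)                   # P(H|¬P)  is about 0.009
print((1 - p_P_given_notH) * (1 - p_H) / p_notP)          # P(¬H|¬P) is about 0.991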
6. Construct a Bayesian network and define the necessary CPTs for the given scenario.
We have a bag of three biased coins a, b and c with probabilities of coming up heads of 20%, 60% and 80% respectively. One coin is drawn randomly from the bag (with equal likelihood of drawing each of the three coins) and then the coin is flipped three times to generate the outcomes X1, X2 and X3.
a. Draw a Bayesian network corresponding to this setup and define the relevant CPTs.
b. Calculate which coin is most likely to have been drawn if the flips come up HHT.
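For part (a), the network has a single node Coin with children X1, X2 and X3. The CPT for Coin is uniform, P(Coin = a) = P(Coin = b) = P(Coin = c) = 1/3, and each flip Xi has P(Xi = H | Coin = a) = 0.2, P(Xi = H | Coin = b) = 0.6 and P(Xi = H | Coin = c) = 0.8, with the flips conditionally independent given the coin.
For part (b), a short Python sketch of the posterior computation after observing H, H, T:

# Posterior over the coins after observing the flips H, H, T
priors = {"a": 1/3, "b": 1/3, "c": 1/3}
p_heads = {"a": 0.2, "b": 0.6, "c": 0.8}

scores = {}
for coin in priors:
    h = p_heads[coin]
    likelihood = h * h * (1 - h)             # P(H, H, T | coin), flips independent given the coin
    scores[coin] = priors[coin] * likelihood

total = sum(scores.values())
posteriors = {coin: s / total for coin, s in scores.items()}
print(posteriors)    # coin b has the largest posterior (about 0.47), so b is most likely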