
BAYES THEOREM
Submitted To: Dr. Rachana P G
Submitted By:
Shashti D B 4GM20CS095
Sushma B C 4GM20CS110
Vanamali 4GM20CS121
Sumanth 4GM21CS408
Contents

• Bayes Theorem: Introduction
• Bayes Theorem and Concept Learning
• Brute-Force Bayes Concept Learning
• MAP Hypotheses and Consistent Learners
INTRODUCTION:
Bayesian learning methods are relevant to the study of machine learning for two different reasons.

• First, Bayesian learning algorithms that calculate explicit probabilities for hypotheses, such as the naive Bayes classifier, are among the most practical approaches to certain types of learning problems.

• Second, they provide a useful perspective for understanding many learning algorithms that do not explicitly manipulate probabilities.
FEATURES
• Incremental: each observed training example can incrementally increase or decrease the estimated probability that a hypothesis is correct.
• Bayesian: prior knowledge can be combined with observed data to determine the final probability of a hypothesis.
• Probabilistic: hypotheses that make probabilistic predictions can be accommodated.
• Combination: new instances can be classified by combining the predictions of multiple hypotheses, weighted by their probabilities.
• Standard: Bayesian methods provide a standard of optimal decision making against which other practical methods can be measured.

PRACTICAL DIFFICULTIES
• Probabilities: Bayesian methods typically require initial knowledge of many probabilities.
• Cost estimation: significant computational cost can be required to determine the Bayes optimal hypothesis in the general case.
BAYES THEOREM
Bayes theorem provides a way to calculate the probability of a hypothesis based on its prior probability, the
probabilities of observing various data given the hypothesis, and the observed data itself.

NOTATIONS: the Bayes theorem posterior P(h|D), the MAP (maximum a posteriori) hypothesis, and the ML (maximum likelihood) hypothesis.
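For reference, the standard definitions of these quantities are:

P(h|D) = P(D|h) P(h) / P(D)
h_MAP = argmax over h in H of P(h|D) = argmax over h in H of P(D|h) P(h)
h_ML  = argmax over h in H of P(D|h)

where P(h) is the prior probability of hypothesis h, P(D) is the probability of observing the data D, P(D|h) is the probability of observing D given h, and P(h|D) is the posterior probability of h given D.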


EXAMPLE
Consider a medical diagnosis problem in which there are two alternative hypotheses:
• The patient has a particular form of cancer (denoted by cancer)
• The patient does not (denoted by ¬cancer)
The available data is from a particular laboratory test with two possible outcomes: + (positive) and - (negative).

• Suppose a new patient is observed for whom the lab test returns a positive (+) result.
• Should we diagnose the patient as having cancer or not?
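The slide does not give numeric probabilities; the following worked sketch uses assumed illustrative values (the prior and likelihoods below are assumptions chosen for the illustration, not taken from the slide):

P(cancer) = 0.008, P(¬cancer) = 0.992
P(+|cancer) = 0.98, P(+|¬cancer) = 0.03

P(+|cancer) P(cancer) = 0.98 × 0.008 ≈ 0.0078
P(+|¬cancer) P(¬cancer) = 0.03 × 0.992 ≈ 0.0298

Since 0.0298 > 0.0078, h_MAP = ¬cancer: even after a positive test, the MAP diagnosis is that the patient does not have cancer, because the prior probability of cancer is so small. Normalizing gives P(cancer|+) ≈ 0.0078 / (0.0078 + 0.0298) ≈ 0.21.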
BAYES THEOREM AND CONCEPT LEARNING

What is the relationship between Bayes theorem and the problem of concept learning?
Bayes theorem provides a principled way to calculate the posterior probability of each hypothesis given the training data, so we can use it as the basis for a straightforward learning algorithm that calculates the posterior probability of each possible hypothesis and then outputs the most probable one.
BRUTE-FORCE BAYES CONCEPT LEARNING

We can design a straightforward concept learning algorithm to output the maximum a posteriori hypothesis,
based on Bayes theorem, as follows:

Brute-Force MAP Learning Algorithm
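In its standard formulation, the algorithm is:

1. For each hypothesis h in H, calculate the posterior probability
   P(h|D) = P(D|h) P(h) / P(D)
2. Output the hypothesis h_MAP with the highest posterior probability:
   h_MAP = argmax over h in H of P(h|D)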

In order to specify a learning problem for the BRUTE-FORCE MAP LEARNING algorithm, we must specify what values are to be used for P(h) and for P(D|h).
Let's choose P(h) and P(D|h) to be consistent with the following assumptions:
• The training data D is noise free (i.e., di = c(xi))
• The target concept c is contained in the hypothesis space H
• We have no a priori reason to believe that any hypothesis is more probable than any other.
WHAT VALUES SHOULD WE SPECIFY FOR P(h)?

• Given no prior knowledge that one hypothesis is more likely than another, it is reasonable to assign the same prior probability to every hypothesis h in H.
• Assume the target concept is contained in H and require that these prior probabilities sum to 1.
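Together, these two choices give the uniform prior:

P(h) = 1 / |H|  for all h in H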

WHAT CHOICE SHALL WE MAKE FOR P(D|h)?

• P(D|h) is the probability of observing the target values D = (d1 . . . dm) for the fixed set of instances (x1 . . . xm), given a world in which hypothesis h holds.
• Since we assume noise-free training data, the probability of observing classification di given h is just 1 if di = h(xi) and 0 if di ≠ h(xi).
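Assuming the training examples are generated independently given h (the usual assumption in this setting), the per-example statement above combines over the whole training set as:

P(D|h) = 1 if di = h(xi) for every di in D
P(D|h) = 0 otherwise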
Given these choices for P(h) and P(D|h), we now have a fully defined problem for the above BRUTE-FORCE MAP LEARNING algorithm.
As a first step, we have to determine the posterior probabilities P(h|D). If h is inconsistent with the training data D, then P(D|h) = 0 and therefore P(h|D) = 0. If h is consistent with D, then P(D|h) = 1, P(h) = 1/|H|, and P(D) = |VS_{H,D}| / |H|, so Bayes theorem gives P(h|D) = (1 · 1/|H|) / (|VS_{H,D}| / |H|) = 1/|VS_{H,D}|.

To summarize, Bayes theorem implies that the posterior probability P(h|D) under our assumed P(h) and P(D|h) is

P(h|D) = 1 / |VS_{H,D}|  if h is consistent with D
P(h|D) = 0               otherwise

where |VS_{H,D}| is the number of hypotheses from H consistent with D (the size of the version space).
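A minimal Python sketch of the BRUTE-FORCE MAP LEARNING algorithm under these assumptions (uniform prior, noise-free data); the hypothesis representation and the tiny two-feature boolean example below are invented purely for illustration:

from itertools import product

def brute_force_map(hypotheses, training_data, prior):
    # Step 1: for each h in H, compute P(h|D) = P(D|h) P(h) / P(D).
    # P(D|h) under the noise-free assumption: 1 if h fits every example, else 0.
    def likelihood(h):
        return 1.0 if all(h(x) == d for x, d in training_data) else 0.0

    unnormalized = {h: likelihood(h) * prior(h) for h in hypotheses}
    p_data = sum(unnormalized.values())          # P(D) = sum_h P(D|h) P(h)
    posteriors = {h: p / p_data for h, p in unnormalized.items()}
    # Step 2: output the hypothesis with the highest posterior probability.
    h_map = max(posteriors, key=posteriors.get)  # any consistent h ties for the maximum
    return h_map, posteriors

# Tiny illustrative hypothesis space: all 16 boolean functions over two binary features.
instances = list(product([0, 1], repeat=2))
hypotheses = [
    (lambda x, table=table: table[instances.index(x)])
    for table in product([0, 1], repeat=len(instances))
]
uniform_prior = lambda h: 1.0 / len(hypotheses)  # P(h) = 1/|H|

# Noise-free training data for the (assumed) target concept "x1 AND x2";
# the instance (0, 1) is deliberately left unobserved.
D = [((0, 0), 0), ((1, 0), 0), ((1, 1), 1)]
h_map, posteriors = brute_force_map(hypotheses, D, uniform_prior)

# Every hypothesis consistent with D ends with posterior 1/|VS_{H,D}|, inconsistent ones with 0.
consistent = [p for p in posteriors.values() if p > 0]
print(len(consistent), consistent[0])            # prints: 2 0.5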


THE EVOLUTION OF PROBABILITIES ASSOCIATED WITH HYPOTHESES

• In figure (a), all hypotheses have the same prior probability.
• In figures (b) and (c), as training data accumulates, the posterior probability of inconsistent hypotheses becomes zero, while the total probability (which sums to 1) is shared equally among the remaining consistent hypotheses.
MAP HYPOTHESES AND CONSISTENT LEARNERS
A learning algorithm is a consistent learner if it outputs a hypothesis that commits zero errors over the training examples. Every consistent learner outputs a MAP hypothesis if we assume a uniform prior probability distribution over H (P(hi) = P(hj) for all i, j) and deterministic, noise-free training data (P(D|h) = 1 if D and h are consistent, and 0 otherwise).
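This follows directly from the posterior derived earlier: every hypothesis consistent with D has posterior P(h|D) = 1/|VS_{H,D}| and every inconsistent hypothesis has posterior 0, so 1/|VS_{H,D}| is the maximum posterior and any consistent hypothesis the learner outputs is a MAP hypothesis.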

Example:
• Since FIND-S outputs a consistent hypothesis, it will output a MAP hypothesis under the probability distributions P(h) and P(D|h) defined above.
• Are there other probability distributions for P(h) and P(D|h) under which FIND-S outputs MAP hypotheses? Yes.
• Because FIND-S outputs a maximally specific hypothesis from the version space, its output hypothesis will be a MAP hypothesis relative to any prior probability distribution that favours more specific hypotheses.
• The Bayesian framework is a way to characterize the behaviour of learning algorithms.
• By identifying probability distributions P(h) and P(D|h) under which the output is an optimal hypothesis, the implicit assumptions of the algorithm (its inductive bias) can be characterized.
• Inductive inference is modelled by an equivalent probabilistic reasoning system based on Bayes theorem.
THANK YOU
