CS6364 Lecture12 - AI Ch13 Prob Reasoning - Rev4(1)

The document discusses probabilistic reasoning in artificial intelligence, particularly in the context of medical diagnosis. It highlights the challenges of using first-order logic due to the complexity and uncertainty in medical conditions, emphasizing the importance of probability theory to manage degrees of belief. Key concepts such as random variables, atomic events, probability distributions, and conditional probabilities are explained to illustrate how to infer probabilities and make decisions under uncertainty.


Artificial Intelligence

CS 6364

Section 12
Probabilistic Reasoning
Acting under uncertainty
• Logical agents assume propositions are
- True
- False
- Unknown → acting under uncertainty
Example: medical diagnosis. A dental diagnosis rule in first-order logic:
∀p Symptom(p, Toothache) ⇒ Disease(p, Cavity)
This rule is wrong, because it claims that all patients with toothaches must have cavities.
Why? Not all patients p with toothaches have cavities!
Some have gum disease, an abscess, or other problems.

Conclusion: to make the rule true, we would have to add an almost unlimited list of possible causes:
∀p Symptom(p, Toothache) ⇒ Disease(p, Cavity) ∨ Disease(p, GumDisease) ∨ Disease(p, Abscess) ∨ …
Medical Diagnosis
Trying to use first-order logic to cope with a domain like
medical diagnosis fails because
1. Laziness: too much work to list the complete set of rules, and too hard to use such rules.
Example of a causal rule:
∀p Disease(p, Cavity) ⇒ Symptom(p, Toothache)
Wrong: not all cavities cause pain → we would need to augment the antecedent with all the conditions under which a cavity causes a toothache.
2. Theoretical ignorance: Medical science has no complete
theory for the domain
3. Practical ignorance: Even if we know all the rules, we
might be uncertain about a particular patient, because
not all necessary tests have been or can be run (too
costly or too time-consuming)

Degree of belief
• When propositions are not known to be true or false, the agent can at best provide a degree of belief in relevant sentences.
• The main tool for dealing with degrees of belief is probability theory.

function DT-AGENT(percept) returns an action
  static: belief_state, probabilistic beliefs about the current state of the world
          action, the agent's action
  1. update belief_state based on action and percept
  2. calculate outcome probabilities for actions,
     given action descriptions and current belief_state
  3. select action with highest expected utility,
     given probabilities of outcomes and utility information
  return action
Basic Probability Notation
• Random variables can be thought of as referring to a "part" of the world whose "status" is unknown.
• Random variables have domains: the values they may take. Depending on the domain, random variables may be classified as
- Boolean random variables → the domain is <true, false>
  Example: Cavity = true; Cavity = false (written ¬cavity)
- Discrete random variables → take values from a countable domain
- Continuous random variables → take values from the real numbers
  Example: the proposition X = 4.02 asserts that the random variable X has the exact value 4.02. We can also have propositions that use inequalities, such as X ≤ 4.20.
Atomic Events
• An atomic event is a complete specification of the state of the world about which the agent is uncertain. If the world is described by a set of random variables, an atomic event is a particular assignment of values to all the random variables.

Example: 2 random variables, Cavity and Toothache.
How many atomic events? 4:
e1: (cavity = false) ∧ (toothache = false)
e2: (cavity = false) ∧ (toothache = true)
e3: (cavity = true) ∧ (toothache = false)
e4: (cavity = true) ∧ (toothache = true)

Properties:
a) Mutually exclusive: at most one can be true
b) The set of all possible atomic events is exhaustive
Axioms of Probability
1. 0 ≤ P(a) ≤ 1
2. P(true) = 1, P(false) = 0
3. P(a ∨ b) = P(a) + P(b) − P(a ∧ b)

We can deduce P(¬a) = 1 − P(a):
P(true) = P(a ∨ ¬a) = 1 and P(false) = P(a ∧ ¬a) = 0,
so applying axiom 3:
P(true) = P(a) + P(¬a) − P(false)
⇒ P(a) + P(¬a) = 1
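As a quick check (a sketch, not part of the slides), the axioms and the derived complement rule can be verified numerically. The joint distribution over Cavity and Toothache below is an illustrative assumption, not course data:

```python
joint = {  # assumed joint over (cavity, toothache); values sum to 1
    (True, True): 0.12, (True, False): 0.08,
    (False, True): 0.08, (False, False): 0.72,
}

def P(event):
    """Probability of a proposition, summed over the atomic events where it holds."""
    return sum(p for world, p in joint.items() if event(world))

cavity = lambda w: w[0]
toothache = lambda w: w[1]

# Axiom 3: P(a or b) = P(a) + P(b) - P(a and b)
lhs = P(lambda w: cavity(w) or toothache(w))
rhs = P(cavity) + P(toothache) - P(lambda w: cavity(w) and toothache(w))
assert abs(lhs - rhs) < 1e-12

# Derived complement rule: P(not a) = 1 - P(a)
assert abs(P(lambda w: not cavity(w)) - (1 - P(cavity))) < 1e-12
```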
Prior probability
 Prior probability (or unconditional probability) associated
with proposition a is the degree of belief accorded to it in the
absence of any other information
Example: P(cavity = true) = 0.1
 Important: P(a) can be used only when there is no other
information. As soon as some new information is known, we
must reason with the conditional probability of a, given that
new information
 Sometimes we are interested in all possible values of a
random variable  use expressions such as P(weather)
which denotes a vector of values for the probabilities of each
individual state of the weather
Example: P(weather = sunny) = 0.7
P(weather = rain) = 0.2
P(weather = cloudy) = 0.08
P(weather = snow) = 0.02
also written as:
P(weather) = <0.7, 0.2, 0.08, 0.02>
Probability distributions
Example 1:
Hair_color is a discrete random variable.
It has the domain <blond, brown, red, black, white, none>
Out of a sample of 10,000 people, we find that 1872 had blond hair, 4325 had brown hair, 2135 had black hair, 652 had red hair, 321 had white hair, and the remaining 695 were bald.
The probability distribution (over <blond, brown, red, black, white, none>) is:
P(Hair_color) = <0.1872, 0.4325, 0.0652, 0.2135, 0.0321, 0.0695>

Example 2:
UTD_student is a binary random variable → the domain is <true, false>.
Out of a sample of 1000 youngsters between 19 and 21 in the Dallas area, 321 were students at UTD.
P(UTD_student) = <0.321, 0.679>
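The count-to-distribution computation above can be sketched in a few lines (the dictionary layout is an illustrative choice, not course code):

```python
# Turn the hair-color sample counts into a probability distribution
# (domain: blond, brown, red, black, white, none).
counts = {"blond": 1872, "brown": 4325, "red": 652,
          "black": 2135, "white": 321, "none": 695}
total = sum(counts.values())                   # 10000 people sampled
dist = {color: n / total for color, n in counts.items()}
assert abs(sum(dist.values()) - 1.0) < 1e-12   # a valid distribution
print(dist["brown"], dist["red"])              # 0.4325 0.0652
```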
Full Joint Probability Distribution
Suppose the world consists only of the variables:
- Cavity: a binary random variable (2 values)
- Toothache: a binary random variable (2 values)
- Weather: 4 values (sunny, rain, cloudy, snow)
→ 2 × 2 × 4 = 16 entries for the joint distribution

P(Cavity) = <0.23, 0.77>
P(Toothache) = <0.35, 0.65>
P(Weather) = <0.7, 0.2, 0.08, 0.02>

Represent the joint probability distribution (we need to know it to compute conditional probabilities):

                 Cavity                  ¬Cavity
Weather   Toothache  ¬Toothache   Toothache  ¬Toothache
Sunny         ?          ?            ?          ?
Rain          ?          ?            ?          ?
Cloudy        ?          ?            ?          ?
Snow          ?          ?            ?          ?
Probability Density Functions
• For continuous variables, it is not possible to write out the entire distribution as a table, because there are infinitely many values.
• Instead, we define the probability that a random variable takes on some value x as a parameterized function of x.
• Example: let the random variable X denote tomorrow's maximum rainfall in Dallas.
• The sentence P(X = x) = U[1,3](x) expresses the belief that X is distributed uniformly between 1 in and 3 in.
Conditional Probability
• When new evidence concerning a previously unknown random variable is found, prior probabilities no longer apply.
• We use conditional probabilities:
- a: a random variable
- b: a random variable
- P(a | b) denotes "the probability of a, given that all we know is b"
• Example: P(cavity | toothache) = 0.8
• How do we compute P(a | b)?
P(a | b) = P(a ∧ b) / P(b)
• The same rule rearranged: P(a ∧ b) = P(a | b) P(b) → the product rule!
Conditional Distributions
• If two random variables X and Y define the world, P(X | Y) gives the values of P(X = xi | Y = yj) for each possible pair i and j.
• Expressed with the product rule, entry by entry:
P(X = x1 ∧ Y = y1) = P(X = x1 | Y = y1) P(Y = y1)
P(X = x1 ∧ Y = y2) = P(X = x1 | Y = y2) P(Y = y2)
…
• This can be combined into a single equation:
P(X, Y) = P(X | Y) P(Y)
• This denotes a set of equations relating the corresponding individual entries in the tables (not a matrix multiplication of the tables).
Inference Using the Full Joint Distribution
Example: the domain consists of three Boolean variables: Toothache, Cavity, and Catch (the dentist's nasty steel probe catches in my tooth).

              Toothache               ¬Toothache
           Catch    ¬Catch         Catch    ¬Catch
Cavity     0.108    0.012          0.072    0.008
¬Cavity    0.016    0.064          0.144    0.576

How many atomic events? 2^3 = 8 (as many as the entries in the table!)

e1: (cavity=false) ∧ (toothache=false) ∧ (catch=false)   P(e1) = 0.576
e2: (cavity=false) ∧ (toothache=false) ∧ (catch=true)    P(e2) = 0.144
e3: (cavity=false) ∧ (toothache=true) ∧ (catch=false)    P(e3) = 0.064
e4: (cavity=false) ∧ (toothache=true) ∧ (catch=true)     P(e4) = 0.016
e5: (cavity=true) ∧ (toothache=false) ∧ (catch=false)    P(e5) = 0.008
e6: (cavity=true) ∧ (toothache=false) ∧ (catch=true)     P(e6) = 0.072
e7: (cavity=true) ∧ (toothache=true) ∧ (catch=false)     P(e7) = 0.012
e8: (cavity=true) ∧ (toothache=true) ∧ (catch=true)      P(e8) = 0.108
Inferring Probabilities
Given any proposition a, we can derive its probability as the sum of the probabilities of the atomic events in which it holds:
P(a) = Σ_{ei ∈ e(a)} P(ei)

Example: a = cavity ∨ toothache → six events: e3, e4, e5, e6, e7, e8
P(cavity ∨ toothache) = 0.064 + 0.016 + 0.008 + 0.072 + 0.012 + 0.108 = 0.28

When adding all the probabilities in one row of the joint table, we obtain the unconditional or marginal probability:
P(cavity) = 0.108 + 0.012 + 0.072 + 0.008 = 0.2
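Summing atomic events can be sketched directly from the table (the helper name `P` is an illustrative choice, not from the lecture):

```python
# Full joint over (cavity, toothache, catch), from the lecture's table.
joint = {
    (True,  True,  True):  0.108, (True,  True,  False): 0.012,
    (True,  False, True):  0.072, (True,  False, False): 0.008,
    (False, True,  True):  0.016, (False, True,  False): 0.064,
    (False, False, True):  0.144, (False, False, False): 0.576,
}

def P(prop):
    """Sum the probabilities of the atomic events where `prop` holds."""
    return sum(p for event, p in joint.items() if prop(event))

print(round(P(lambda e: e[0]), 3))           # 0.2  (marginal P(cavity))
print(round(P(lambda e: e[0] or e[1]), 3))   # 0.28 (P(cavity or toothache))
```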
e5 (cavity=true)(toothache=false)(catch=fal P(e5)=0.00
Probabilistic Inference by Enumeration
• Given a full joint distribution to work with, ENUMERATE-JOINT-ASK is a complete algorithm for answering probabilistic queries for discrete variables.
Probabilistic Inference by Enumeration
• Problem: for a domain described by n Boolean variables, the table has size O(2^n) and takes O(2^n) time to process!
• The full joint distribution in tabular form is not a practical tool for building reasoning systems.
• It can be viewed as the theoretical foundation on which more effective approaches may be built.
Marginalization and Conditioning Rules
• Given any two random variables Y and Z:
P(Y) = Σ_z P(Y, z)    (marginalization rule)
• A variant involves conditional probabilities instead of joint probabilities:
P(Y) = Σ_z P(Y | z) P(z)    (conditioning rule)

Using it on the dental table:
P(cavity) = 0.108 + 0.012 + 0.072 + 0.008 = 0.2, so P(¬cavity) = 0.8
P(toothache) = 0.108 + 0.012 + 0.016 + 0.064 = 0.2, so P(¬toothache) = 0.8
P(catch) = 0.108 + 0.016 + 0.072 + 0.144 = 0.34, so P(¬catch) = 0.66
Marginalization and Conditioning Rules
• If we have three random variables X, Y, and Z:
P(X) = Σ_y Σ_z P(X, y, z)    (marginalization rule)
P(X) = Σ_y Σ_z P(X | y, z) P(y | z) P(z)    (conditioning rule)
• From the product rule:
P(a | b, c) = P(a, b, c) / P(b, c) = P(a, b, c) / (P(b | c) P(c))
⇒ P(a, b, c) = P(a | b, c) P(b | c) P(c)
P ( a, b)
P ( a | b) 
P (b)

Conditional Probabilities P ( a, b) P ( a | b) P (b)


P ( a | b)  P ( a | b) 1

P (cavity  toothache)
P (cavity | toothache)  toothache toothache
P (toothache) catch catch catch catch

0.108  0.012 cavity 0.108 0.012 0.07 0.008


 0.6 2
0.108  0.012  0.016  0.064 cavity 0.016 0.064 0.14 0.576
4
P( cavity  toothache)
Also : P( cavity | toothache) 
P(toothache)
0.016  0.064 Added to 1
 0.4
0.108  0.012  0.016  0.064 P ( a | b)  P ( a | b) 1

 Notice: P(toothache) remains the same in both calculations  it


acts like a normalization constant for the distribution P(cavity|
toothache) with P( cavity|toochache), added to a sum of 1.
22
Normalization Constants
α: the normalization constant

P(Cavity | toothache)
= α P(Cavity, toothache)
= α [ P(Cavity, toothache, catch) + P(Cavity, toothache, ¬catch) ]
= α [ <0.108, 0.016> + <0.012, 0.064> ] = α <0.12, 0.08>

(In each vector, the first component is the cavity entry and the second the ¬cavity entry, read from the joint table.)

Normalization constants are useful shortcuts in many probability computations!
Normalization Constants
P(Cavity | toothache)
= α P(Cavity, toothache)
= α [ P(Cavity, toothache, catch) + P(Cavity, toothache, ¬catch) ]
= α [ <0.108, 0.016> + <0.012, 0.064> ] = α <0.12, 0.08>

Since the entries must sum to 1:
α (0.12 + 0.08) = 1 → α (0.2) = 1 → α = 1/0.2 = 5

Thus α × 0.12 = 5 × 0.12 = 0.6 for P(cavity | toothache)
and α × 0.08 = 5 × 0.08 = 0.4 for P(¬cavity | toothache)
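A minimal sketch of the normalization shortcut, using the joint-table entries from the slides:

```python
# Unnormalized entries of P(Cavity, toothache), summing out Catch.
p_cavity_toothache = 0.108 + 0.012      # cavity entry
p_nocavity_toothache = 0.016 + 0.064    # no-cavity entry

# alpha makes the two entries sum to 1.
alpha = 1.0 / (p_cavity_toothache + p_nocavity_toothache)
print(round(alpha, 3))                         # 5.0
print(round(alpha * p_cavity_toothache, 3))    # 0.6
print(round(alpha * p_nocavity_toothache, 3))  # 0.4
```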
General Inference Procedure
• Notation:
- X is the query variable (Cavity in the example)
- E is the set of evidence variables (Toothache in the example)
- e are the observed values of the evidence
- Y are the remaining unobserved (hidden) variables
• The query: P(X | e)
• Evaluated as:
P(X | e) = α P(X, e) = α Σ_y P(X, e, y)

Example: P(Cavity | toothache)
= α [ P(Cavity, toothache, catch) + P(Cavity, toothache, ¬catch) ]
summing the hidden variable Catch over both of its values.
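The general procedure can be sketched as a small enumeration function over the dental joint distribution (the function name and index-based variable encoding are illustrative assumptions):

```python
# Full joint over (cavity, toothache, catch), indexed 0, 1, 2.
joint = {
    (True,  True,  True):  0.108, (True,  True,  False): 0.012,
    (True,  False, True):  0.072, (True,  False, False): 0.008,
    (False, True,  True):  0.016, (False, True,  False): 0.064,
    (False, False, True):  0.144, (False, False, False): 0.576,
}

def query(x_index, evidence):
    """P(X | e) = alpha * sum_y P(X, e, y): keep worlds consistent with the
    evidence dict {var_index: value}, sum out the hidden variables, normalize."""
    dist = {}
    for world, p in joint.items():
        if all(world[i] == v for i, v in evidence.items()):
            dist[world[x_index]] = dist.get(world[x_index], 0.0) + p
    alpha = 1.0 / sum(dist.values())
    return {value: alpha * p for value, p in dist.items()}

res = query(0, {1: True})   # P(Cavity | toothache = true)
print(round(res[True], 3), round(res[False], 3))  # 0.6 0.4
```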
Independence
              Toothache               ¬Toothache
           Catch    ¬Catch         Catch    ¬Catch
Cavity     0.108    0.012          0.072    0.008
¬Cavity    0.016    0.064          0.144    0.576

• Let us add a fourth variable, Weather, with 4 values.
• The full distribution becomes P(Toothache, Catch, Cavity, Weather), which has 8 × 4 = 32 entries.
• This table contains four "editions" of the 8-entry table above, one for each kind of weather.
• What relation do these editions have to each other and to the original 3-variable table?
• How are P(toothache, catch, cavity, Weather=cloudy) and P(toothache, catch, cavity) related?
• Use the product rule:
P(toothache, catch, cavity, Weather=cloudy) =
P(Weather=cloudy | toothache, catch, cavity) × P(toothache, catch, cavity)
• If Weather is independent of the others, then
P(Weather=cloudy | toothache, catch, cavity) = P(Weather=cloudy)
Absolute Independence
• Weather is independent of one's dental problems:
P(Toothache, Catch, Cavity, Weather)
= P(Toothache, Catch, Cavity) P(Weather)
• The 32-element table can be constructed from one 8-element table and one 4-element table.
Independence in Equations
• If propositions a and b are independent:
P(a ∧ b) = P(a) P(b)
P(a | b) = P(a)
P(b | a) = P(b)
(any one of these implies the other two)
• Independence between variables X and Y is written:
P(X, Y) = P(X) P(Y)
P(X | Y) = P(X)
P(Y | X) = P(Y)
Bayes' Rule
• From the product rule: P(a ∧ b) = P(a | b) P(b) = P(b | a) P(a)
• Then: P(a | b) P(b) = P(b | a) P(a), so

P(b | a) = P(a | b) P(b) / P(a)    (Bayes' rule)

• For multi-valued variables, Bayes' rule is:

P(Y | X) = P(X | Y) P(Y) / P(X)

a set of equations, each dealing with specific values of the variables.
Examples
P(cavity | toothache) = P(toothache | cavity) P(cavity) / P(toothache)

P(¬cavity | toothache) = P(toothache | ¬cavity) P(¬cavity) / P(toothache)

P(cavity | ¬toothache) = P(¬toothache | cavity) P(cavity) / P(¬toothache)

P(¬cavity | ¬toothache) = P(¬toothache | ¬cavity) P(¬cavity) / P(¬toothache)
Applying Bayes' Rule: The Simple Case
• Example: medical diagnosis
• Meningitis is a disease caused by the inflammation of the protective membranes covering the brain and spinal cord, known as the meninges.
• A doctor knows that meningitis causes a stiff neck 50% of the time: P(s | m) = 0.5
The doctor also knows some unconditional facts:
- the prior probability that a patient has meningitis is 1/50,000 → P(m) = 1/50,000
- the prior probability that any patient has a stiff neck is 1/20 → P(s) = 1/20

P(m ∧ s) = P(s ∧ m) → P(m | s) P(s) = P(s | m) P(m)

P(m | s) = P(s | m) P(m) / P(s) = (0.5 × 1/50,000) / (1/20) = 0.0002

Only 1 in 5,000 patients with a stiff neck is expected to have meningitis.
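The arithmetic above can be reproduced in a few lines:

```python
# Bayes' rule for the meningitis example: P(m | s) = P(s | m) P(m) / P(s)
p_s_given_m = 0.5        # meningitis causes a stiff neck 50% of the time
p_m = 1 / 50_000         # prior probability of meningitis
p_s = 1 / 20             # prior probability of a stiff neck

p_m_given_s = p_s_given_m * p_m / p_s
print(round(p_m_given_s, 6))   # 0.0002, i.e. 1 in 5000
```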
Another way
P(m | s) = P(s | m) P(m) / P(s) = (0.5 × 1/50,000) / (1/20) = 0.0002
is small because P(m) = 1/50,000 << P(s) = 1/20.

• We can still compute P(m | s) without knowing P(s):
instead, compute the posterior probability for each value of the query variable (here m and ¬m) and normalize the results, using
P(s) = P(s | m) P(m) + P(s | ¬m) P(¬m)

Then:
P(m | s) = P(s | m) P(m) / [ P(s | m) P(m) + P(s | ¬m) P(¬m) ]
Similarly:
P(¬m | s) = P(s | ¬m) P(¬m) / [ P(s | m) P(m) + P(s | ¬m) P(¬m) ]

• In vector form: P(M | s) = α P(M, s) = α < P(s | m) P(m), P(s | ¬m) P(¬m) >
• This can also be obtained by applying Bayes' rule with normalization:
P(Y | X) = α P(X | Y) P(Y)
where α is the normalization constant needed to make the entries in P(Y | X) sum to 1.
Example of Bayes' Rule with Normalization
We have two discrete random variables:
• X describing weather conditions, with the domain X = {sunny, rain, cloudy, snow}
• Y describing clothes, with the domain Y = {t-shirt, long-sleeves, coat}
The distributions of X and Y are:
• P(X) = <0.331, 0.26, 0.159, 0.25>
• P(Y) = <0.5, 0.3, 0.2>
We also have the values of the joint probabilities:
P(t-shirt, sunny)=0.32    P(long-sleeves, sunny)=0.01    P(coat, sunny)=0.001
P(t-shirt, rain)=0.08     P(long-sleeves, rain)=0.15     P(coat, rain)=0.03
P(t-shirt, cloudy)=0.09   P(long-sleeves, cloudy)=0.05   P(coat, cloudy)=0.019
P(t-shirt, snow)=0.01     P(long-sleeves, snow)=0.09     P(coat, snow)=0.15
Example of Bayes' Rule with Normalization
• From these, calculate the conditional probabilities P(Y | X), that is, the probability of clothing given the weather:

P(t-shirt | sunny) = P(t-shirt, sunny)/P(sunny) = 0.32/0.331 = 0.967
P(long-sleeves | sunny) = P(long-sleeves, sunny)/P(sunny) = 0.01/0.331 = 0.0302
P(coat | sunny) = P(coat, sunny)/P(sunny) = 0.001/0.331 = 0.003
P(t-shirt | rain) = P(t-shirt, rain)/P(rain) = 0.08/0.26 = 0.307
P(long-sleeves | rain) = P(long-sleeves, rain)/P(rain) = 0.15/0.26 = 0.577
P(coat | rain) = P(coat, rain)/P(rain) = 0.03/0.26 = 0.1154
P(t-shirt | cloudy) = P(t-shirt, cloudy)/P(cloudy) = 0.09/0.159 = 0.566
P(long-sleeves | cloudy) = P(long-sleeves, cloudy)/P(cloudy) = 0.05/0.159 = 0.314
P(coat | cloudy) = P(coat, cloudy)/P(cloudy) = 0.019/0.159 = 0.1195
P(t-shirt | snow) = P(t-shirt, snow)/P(snow) = 0.01/0.25 = 0.04
P(long-sleeves | snow) = P(long-sleeves, snow)/P(snow) = 0.09/0.25 = 0.36
P(coat | snow) = P(coat, snow)/P(snow) = 0.15/0.25 = 0.6
Computing the Normalization Constant
• Bayes' rule: trying to guess the weather from the clothes people wear:
P(X | Y) = α P(Y | X) P(X) = α P(X, Y)
The entries of P(X | Y) add up to 1 for each value of Y.

1. P(sunny | t-shirt) = α1 P(t-shirt, sunny) = α1 × 0.32
2. P(rain | t-shirt) = α1 P(t-shirt, rain) = α1 × 0.08
3. P(cloudy | t-shirt) = α1 P(t-shirt, cloudy) = α1 × 0.09
4. P(snow | t-shirt) = α1 P(t-shirt, snow) = α1 × 0.01
Since P(sunny ∨ rain ∨ cloudy ∨ snow | t-shirt) = α1 (0.32 + 0.08 + 0.09 + 0.01) = 1,
α1 = 1/0.5 = 2. Thus:
1. P(sunny | t-shirt) = 2 × 0.32 = 0.64
2. P(rain | t-shirt) = 2 × 0.08 = 0.16
3. P(cloudy | t-shirt) = 2 × 0.09 = 0.18
4. P(snow | t-shirt) = 2 × 0.01 = 0.02
Computing the Normalization Constant
• For long-sleeves: α2 (0.01 + 0.15 + 0.05 + 0.09) = 1, so α2 = 1/0.3 = 3.333
1. P(sunny | long-sleeves) = α2 × 0.01 = 0.033
2. P(rain | long-sleeves) = α2 × 0.15 = 0.5
3. P(cloudy | long-sleeves) = α2 × 0.05 = 0.167
4. P(snow | long-sleeves) = α2 × 0.09 = 0.3

• For coat: α3 (0.001 + 0.03 + 0.019 + 0.15) = 1, so α3 = 1/0.2 = 5
1. P(sunny | coat) = α3 × 0.001 = 0.005
2. P(rain | coat) = α3 × 0.03 = 0.15
3. P(cloudy | coat) = α3 × 0.019 = 0.095
4. P(snow | coat) = α3 × 0.15 = 0.75
Example - Summary
Joint probabilities P(X, Y):

               sunny   rain   cloudy   snow  |  P(Y)
t-shirt         .32    .08     .09     .01   |  .5
long-sleeves    .01    .15     .05     .09   |  .3
coat            .001   .03     .019    .15   |  .2
P(X)            .331   .26     .159    .25

Conditional probabilities P(Y | X), clothes given weather (each row sums to 1):

               t-shirt  long-sleeves  coat
sunny           .967      .0302       .003
rain            .307      .577        .1154
cloudy          .566      .314        .1195
snow            .04       .36         .6

Conditional probabilities P(X | Y), weather given clothes (each row sums to 1):

               sunny   rain   cloudy   snow
t-shirt         .64     .16    .18     .02
long-sleeves    .033    .5     .167    .3
coat            .005    .15    .095    .75
More on Bayes' rule
• What happens if one has the conditional probability in one direction but not the other?
Example: the meningitis domain.
Suppose instead that the doctor knows that a stiff neck implies meningitis in 1 of 5000 cases
→ the doctor has quantitative information in the diagnostic direction, from symptoms to causes. Lucky case: the doctor has no need to use Bayes' rule.
• Note: unfortunately, diagnostic knowledge is often more fragile than causal knowledge.
Causal Knowledge
• Note: unfortunately, diagnostic knowledge is often more fragile than causal knowledge.
Why? If there is a sudden epidemic of meningitis, the prior probability of meningitis P(m) will go up. A doctor who estimated the diagnostic probability P(m | s) directly from statistical information will not know how to update it.

But P(m | s) should go up proportionally with P(m):
P(m | s) = P(s | m) P(m) / P(s)
Important: P(s | m) is unaffected by the epidemic, because it simply reflects how meningitis works (how often meningitis causes a stiff neck).

• Conclusion: using causal or model-based knowledge provides robustness → feasible probabilistic reasoning.
Combining Evidence
• Until now, we considered probabilistic information available in the form P(effect | cause), i.e., a single piece of evidence.
• What happens when there are multiple pieces of evidence?
Example: the dentist domain:
P(Cavity | toothache ∧ catch) = α <0.108, 0.016> ≈ <0.871, 0.129>
This will not scale up to larger numbers of variables:
P(Cavity | toothache ∧ catch) = α P(toothache ∧ catch | Cavity) P(Cavity)
→ we need to know the conditional probabilities of the conjunction toothache ∧ catch for all values of Cavity.
• If we have n possible evidence variables (X-rays, diet, oral hygiene, …), there are 2^n possible combinations of observed values, and for each we need to know the conditional probabilities.
Solution
• Consider the notion of independence.
• Three variables: Cavity, Toothache, Catch.
• Which pairs are independent?
- (Cavity, Catch)? Not independent: if the probe catches in the tooth, the tooth probably has a cavity.
- (Toothache, Catch)? Not independent in general: a catch suggests a cavity, and a cavity makes a toothache more likely.
- (Toothache, Cavity)? Not independent: a cavity may cause a toothache, though toothaches are not caused only by cavities.
Conditional Independence
P(toothache ∧ catch | Cavity) = P(toothache | Cavity) P(catch | Cavity)
• Conditional independence of toothache and catch, given Cavity.
• We also know (Bayes' rule with normalization):
P(Cavity | toothache ∧ catch) = α P(toothache ∧ catch | Cavity) P(Cavity)
• Then, using conditional independence, we can interpret:
P(Cavity | toothache ∧ catch) = α P(toothache | Cavity) P(catch | Cavity) P(Cavity)
                                   (effect1 | cause)  (effect2 | cause)   (cause)
where Cavity is the cause, and toothache and catch are effect1 and effect2.
Computational Complexity
• General definition of conditional independence of two variables X and Y, given a third variable Z:
P(X, Y | Z) = P(X | Z) P(Y | Z)
Example:
P(Toothache, Catch | Cavity) = P(Toothache | Cavity) P(Catch | Cavity)
We can now derive the decomposition:
P(Toothache, Catch, Cavity)
= P(Toothache, Catch | Cavity) P(Cavity)    (product rule)
= P(Toothache | Cavity) P(Catch | Cavity) P(Cavity)
      Table 1               Table 2            Table 3
Initial table size for (Toothache, Catch, Cavity): 2^3 − 1 = 7 values (2^3 in all, but since they sum to 1 we do not need the last one).
Table 1 size: 2 values (one per value of Cavity); Table 2 size: same; Table 3 size: 2^1 − 1 = 1 value. Total: 2 + 2 + 1 = 5 values.
With n conditionally independent symptoms, the size of the representation grows as O(n) instead of O(2^n).
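A sketch of the factored representation: it stores only the 5 parameters named above yet reconstructs all 8 atomic-event probabilities. The conditional probability values (other than P(cavity) = 0.2) are illustrative assumptions, not course data:

```python
p_cavity = 0.2                              # Table 3: P(cavity)
p_tooth_given = {True: 0.6, False: 0.1}     # Table 1: assumed P(toothache | Cavity)
p_catch_given = {True: 0.9, False: 0.2}     # Table 2: assumed P(catch | Cavity)

def joint(cav, tooth, catch):
    """Reconstruct an atomic-event probability from the 5 stored parameters,
    using P(T, C, Cav) = P(T | Cav) P(C | Cav) P(Cav)."""
    p = p_cavity if cav else 1 - p_cavity
    p *= p_tooth_given[cav] if tooth else 1 - p_tooth_given[cav]
    p *= p_catch_given[cav] if catch else 1 - p_catch_given[cav]
    return p

# The 8 reconstructed entries still form a valid distribution.
total = sum(joint(c, t, k)
            for c in (True, False) for t in (True, False) for k in (True, False))
assert abs(total - 1.0) < 1e-12
print(round(joint(True, True, True), 4))   # 0.2 * 0.6 * 0.9 = 0.108
```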
Separation (Conditional Independence)
P(Cause, Effect1, …, Effectn) = P(Cause) Π_i P(Effecti | Cause)

P(Toothache, Catch, Cavity)
= P(Toothache | Cavity) P(Catch | Cavity) P(Cavity)

• Here Cavity separates Toothache and Catch because it is a cause of both of them (with the naïve assumption that Toothache and Catch are, or are believed to be, conditionally independent given Cavity).
• This is the naïve Bayes model. It is called naïve because it is often applied even when the effect variables are not actually conditionally independent; even then, it works surprisingly well in practice.
The Wumpus World Revisited
After finding a breeze in both [1,2] and [2,1], the agent is stuck: there is no provably safe place to explore.
(Grid: the agent has visited [1,1], [1,2], and [2,1], all marked OK; breezes were observed in [1,2] and [2,1].)

Goal: compute the probability that each of the three neighboring squares contains a pit:
P(Pit[1,3]), P(Pit[2,2]), P(Pit[3,1])

Information:
- a pit causes a breeze in all neighboring squares
- each square other than [1,1] contains a pit with probability 0.2

Step 1: identify the random variables:
- Pij = true if square [i,j] contains a pit
- Bij = true if square [i,j] is breezy; included only for the observed squares, [1,1], [1,2], and [2,1] so far
- Pij and Bij are Boolean variables


Probabilistic Reasoning for the Wumpus
Notation: Pij for pit, Bij for breezy.
Next step: specify the full joint distribution:
P(P11, …, P44, B11, B12, B21)
= P(B11, B12, B21 | P11, …, P44) P(P11, …, P44)

The prior probability of a pit configuration (assuming independence of each cell):
P(P11, …, P44) = Π_{i,j} P(Pij)

If a particular configuration has n pits [where P(a cell has a pit) = 0.2], then:
P(p11, …, p44) = (0.2)^n (0.8)^(16−n)
Combining Evidence
The evidence: the observed breezes in the visited squares, plus the fact that each visited square contains no pit:
b = ¬b11 ∧ b12 ∧ b21    ([1,2] and [2,1] have a breeze; [1,1] does not)
known = ¬p11 ∧ ¬p12 ∧ ¬p21

Query: P(P13 | known, b)
How likely is it that [1,3] contains a pit, given the observations so far?

(Grid: the known squares are [1,1], [1,2], [2,1]; the frontier squares [2,2] and [3,1] border the visited squares; all remaining squares are "other".)
Answering the Query
• To answer P(P13 | known, b), we sum over entries from the full joint distribution.
• Let Unknown be a composite variable consisting of the Pij variables for squares other than the known squares and the query square [1,3]:

P(P13 | known, b) = α Σ_unknown P(P13, unknown, known, b)

• How many terms? With 4 × 4 = 16 squares, there are 16 − 3 − 1 = 12 unknown squares → the summation contains 2^12 = 4096 terms (too many!)
Careful Computation
Why would the contents of [4,4] affect whether [1,3] has a pit?
• Let Frontier be the set of pit variables of the unvisited squares adjacent to visited squares: Frontier = {[2,2], [3,1]}.
• Let Other be the set of pit variables of the remaining unknown squares (10 of them).
• The observed breezes are conditionally independent of all other variables, given the known, frontier, and query variables:

P(P13 | known, b)
= α Σ_unknown P(P13, known, b, unknown)
= α Σ_unknown P(b | P13, known, unknown) P(P13, known, unknown)
= α Σ_frontier Σ_other P(b | known, P13, frontier, other) P(P13, known, frontier, other)
= α Σ_frontier Σ_other P(b | known, P13, frontier) P(P13, known, frontier, other)

The final step uses conditional independence: b is independent of other, given known, P13, and frontier.
Continue Computation
P(P13 | known, b) = α Σ_frontier Σ_other P(b | known, P13, frontier) P(P13, known, frontier, other)

The first term in this expression does not depend on the Other variables, so we can move the summation inwards:

P(P13 | known, b) = α Σ_frontier P(b | known, P13, frontier) Σ_other P(P13, known, frontier, other)

The prior term can be factored (by independence of the pit variables) and the terms reordered:

P(P13 | known, b)
= α Σ_frontier P(b | known, P13, frontier) Σ_other P(P13) P(known) P(frontier) P(other)
= α P(known) P(P13) Σ_frontier P(b | known, P13, frontier) P(frontier) Σ_other P(other)

Since Σ_other P(other) = 1:

P(P13 | known, b) = α′ P(P13) Σ_frontier P(b | known, P13, frontier) P(frontier)
Finishing
P(P13 | known, b) = α′ P(P13) Σ_frontier P(b | known, P13, frontier) P(frontier)

frontier = {[2,2], [3,1]}; e.g., if both [2,2] and [3,1] have pits, P(frontier) = 0.2 × 0.2 = 0.04

How do we build the models of the frontier? Since the breezes B12 and B21 each signal a pit in a neighboring square, a pit may be in P13, P22, or P31. Only frontier models consistent with the observed breezes contribute; the prior P(P13) is factored out front:

Case (a): [1,3] has a pit (P13 = 1)
  P22=1, P31=1   Model 1   0.2 × 0.2 = 0.04
  P22=1, P31=0   Model 2   0.2 × 0.8 = 0.16
  P22=0, P31=1   Model 3   0.8 × 0.2 = 0.16

Case (b): [1,3] has no pit (P13 = 0)
  P22=1, P31=1   Model 4   0.2 × 0.2 = 0.04
  P22=1, P31=0   Model 5   0.2 × 0.8 = 0.16
Likelihood of Pit at [1,3]
P(P13 | known, b) = α′ P(P13) Σ_frontier P(b | known, P13, frontier) P(frontier)

Models 1, 2, and 3 apply for P13 = 1; models 4 and 5 for P13 = 0:

P(P13 | known, b) = α′ <0.2 × (0.04 + 0.16 + 0.16), 0.8 × (0.04 + 0.16)>
= α′ <0.2 × 0.36, 0.8 × 0.2>
= α′ <0.072, 0.16>

Because α′ (0.072 + 0.16) = α′ × 0.232 = 1,
α′ = 1/0.232 ≈ 4.31

→ P(P13 | known, b) ≈ <0.3103, 0.6897>
Interpretation
• From P(P13 | known, b) = <0.3103, 0.6897>, we know that [1,3] contains a pit with roughly 31% probability. (By symmetry, the same holds for [3,1].)
• Similarly, computing P(P22 | known, b) shows that [2,2] contains a pit with roughly 86% probability.
• That is, the agent should avoid [2,2]!

Lessons:
• Seemingly complicated problems can be formulated precisely in probability theory and solved using simple algorithms.
• Efficient solutions are obtained when independence and conditional independence relationships are used to simplify the summations.
• Independence corresponds to our natural understanding of how the problem should be decomposed.
