CS 6364
Section 12
Probabilistic Reasoning
Acting under uncertainty
Logical agents assume each proposition is
- True
- False
- Unknown ⇒ the agent must act under uncertainty
Example: dental diagnosis (in medicine) using first-order logic.
It is wrong to state that all patients with toothaches must have cavities:
∀p Symptom(p, Toothache) ⇒ Disease(p, Cavity)
Why?
Not all patients p with toothaches have cavities!
Some have gum disease, an abscess, or other problems.
Degree of belief
When propositions are not known to be true or false, the agent can at
best provide a degree of belief in relevant sentences.
The main tool for dealing with degrees of belief is probability theory.
Basic Probability Notation
Random Variables – thought of as referring to a “part” of the
world whose “status” is unknown.
Random variables have domains: the sets of values they may take.
Depending on the domain, random variables may be
classified as
- Boolean random variables have the domain <true, false>
Example: Cavity = true (abbreviated cavity), Cavity = false (abbreviated ¬cavity)
- Discrete random variables take values from a
countable domain
- Continuous random variables take values from the real
numbers
Example: the proposition X = 4.02 asserts that the
random variable X has the exact value 4.02. We can
also have propositions that use inequalities, such as X ≤ 4.20
Atomic Events
An atomic event is a complete specification of the state
of the world about which the agent is uncertain. If the
world is described by a set of random variables, an
atomic event is a particular assignment of values to the
random variables
Properties:
a) Mutually exclusive: at most one can be true
b) The set of all possible atomic events is exhaustive
Axioms of Probability
1. 0 ≤ P(a) ≤ 1
2. P(true) = 1, P(false) = 0
3. P(a ∨ b) = P(a) + P(b) – P(a ∧ b)
From the axioms: P(a) + P(¬a) = 1
(take b = ¬a, so a ∨ ¬a = true and a ∧ ¬a = false)
Prior probability
Prior probability (or unconditional probability) associated
with proposition a is the degree of belief accorded to it in the
absence of any other information
Example: P(Cavity = true) = 0.1
Important: P(a) can be used only when there is no other
information. As soon as some new information is known, we
must reason with the conditional probability of a, given that
new information
Sometimes we are interested in all possible values of a
random variable; then we use expressions such as P(weather),
which denotes a vector of values for the probabilities of each
individual state of the weather
Example: P(weather = sunny) = 0.7
P(weather = rain) = 0.2
P(weather = cloudy) = 0.08
P(weather = snow) = 0.02
also written as:
P(weather) = <0.7, 0.2, 0.08, 0.02>
Probability distributions
Example 1:
Hair_color is a discrete random variable.
It has the domain <blond, brown, red, black, white, none>
Out of a sample of 10000 people, we find that 1872 had
blond hair, 4325 had brown hair, 2135 had black hair, 652
had red hair, 321 had white hair and the rest were bald
The probability distribution is:
P(Hair_color) = <0.1872, 0.4325, 0.0652, 0.2135, 0.0321, 0.0695>
Example 2:
UTD_student is a Boolean random variable; its domain is <true, false>
Out of a sample of 1000 youngsters between 19 and 21 in
the Dallas area, 321 were students at UTD
P(UTD_student) = <0.321, 0.679>
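Both examples are obtained by normalizing raw counts; here is a minimal Python sketch of that step (the function and variable names are illustrative, not from the slides):

```python
# Build a probability distribution by normalizing raw counts.
# The counts below are the ones given in the two examples above.

def distribution(counts):
    """Return P(X) as a dict: each count divided by the total."""
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()}

hair_counts = {"blond": 1872, "brown": 4325, "red": 652,
               "black": 2135, "white": 321, "none": 695}
print(distribution(hair_counts))
# {'blond': 0.1872, 'brown': 0.4325, 'red': 0.0652,
#  'black': 0.2135, 'white': 0.0321, 'none': 0.0695}

utd_counts = {True: 321, False: 679}
print(distribution(utd_counts))   # {True: 0.321, False: 0.679}
```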
Full Joint Probability Distribution
Suppose the world consists only of the variables Cavity, Toothache, and Weather: there are 2×2×4 = 16 possible atomic events.
For the dentistry variables Cavity, Toothache, and Catch (2×2×2 = 8 atomic events), the full joint distribution is:
e1 (cavity=false)(toothache=false)(catch=false)   P(e1)=0.576
e2 (cavity=false)(toothache=false)(catch=true)    P(e2)=0.144
e3 (cavity=false)(toothache=true)(catch=false)    P(e3)=0.064
e4 (cavity=false)(toothache=true)(catch=true)     P(e4)=0.016
e5 (cavity=true)(toothache=false)(catch=false)    P(e5)=0.008
e6 (cavity=true)(toothache=false)(catch=true)     P(e6)=0.072
e7 (cavity=true)(toothache=true)(catch=false)     P(e7)=0.012
e8 (cavity=true)(toothache=true)(catch=true)      P(e8)=0.108
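As an aside (not on the original slides), the eight atomic events can be stored as a small Python dictionary keyed by (cavity, toothache, catch) truth values; since the events are mutually exclusive and exhaustive, the entries must sum to 1:

```python
# Full joint distribution over (Cavity, Toothache, Catch),
# keyed by (cavity, toothache, catch) truth values.
joint = {
    (False, False, False): 0.576,
    (False, False, True):  0.144,
    (False, True,  False): 0.064,
    (False, True,  True):  0.016,
    (True,  False, False): 0.008,
    (True,  False, True):  0.072,
    (True,  True,  False): 0.012,
    (True,  True,  True):  0.108,
}

# Atomic events are mutually exclusive and exhaustive,
# so their probabilities must sum to 1.
assert abs(sum(joint.values()) - 1.0) < 1e-9
```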
Inferring Probabilities
Given any proposition a, we can derive its probability as the
sum of the probabilities of the atomic events in which it holds:
P(a) = Σ_{ei ∈ e(a)} P(ei)

Example 1: a = cavity ∨ toothache
Six atomic events satisfy a: e3, e4, e5, e6, e7, e8
P(cavity ∨ toothache) = 0.064 + 0.016 + 0.008 + 0.072 + 0.012 + 0.108 = 0.28

             toothache              ¬toothache
             catch      ¬catch      catch      ¬catch
 cavity      0.108      0.012       0.072      0.008
¬cavity      0.016      0.064       0.144      0.576

Adding all the probabilities in a row gives an unconditional or marginal probability, e.g.
P(cavity) = 0.108 + 0.012 + 0.072 + 0.008 = 0.2
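A minimal Python sketch of this summation (the predicate-style interface is an illustrative choice, not from the slides):

```python
# Probability of a proposition = sum of the atomic events in which it holds.
joint = {  # P(cavity, toothache, catch) from the table above
    (False, False, False): 0.576, (False, False, True): 0.144,
    (False, True,  False): 0.064, (False, True,  True): 0.016,
    (True,  False, False): 0.008, (True,  False, True): 0.072,
    (True,  True,  False): 0.012, (True,  True,  True): 0.108,
}

def prob(holds):
    """P(a), where `holds(cavity, toothache, catch)` says whether a is true."""
    return sum(p for (c, t, k), p in joint.items() if holds(c, t, k))

print(round(prob(lambda c, t, k: c or t), 3))   # P(cavity or toothache) = 0.28
print(round(prob(lambda c, t, k: c), 3))        # P(cavity) = 0.2
```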
Probabilistic Inference by
Enumeration
Given a full joint distribution to work with, ENUMERATE-JOINT-ASK is
a complete algorithm for answering probabilistic queries for
discrete variables
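ENUMERATE-JOINT-ASK itself is given in the textbook as pseudocode; the following is a rough Python sketch of the same idea for the dentistry joint distribution (the data layout and names are illustrative assumptions): add up the joint entries consistent with the evidence, grouped by the value of the query variable, then normalize.

```python
# Sketch of inference by enumeration over a full joint distribution.
joint = {  # the dentistry example: P(Cavity, Toothache, Catch)
    (False, False, False): 0.576, (False, False, True): 0.144,
    (False, True,  False): 0.064, (False, True,  True): 0.016,
    (True,  False, False): 0.008, (True,  False, True): 0.072,
    (True,  True,  False): 0.012, (True,  True,  True): 0.108,
}
VARS = ("Cavity", "Toothache", "Catch")   # order of the tuple keys

def enumerate_joint_ask(X, evidence):
    """Return the distribution P(X | evidence) by enumeration."""
    dist = {}
    for values, p in joint.items():
        event = dict(zip(VARS, values))
        if all(event[var] == val for var, val in evidence.items()):
            dist[event[X]] = dist.get(event[X], 0.0) + p
    alpha = 1.0 / sum(dist.values())          # normalization constant
    return {x: alpha * p for x, p in dist.items()}

print(enumerate_joint_ask("Cavity", {"Toothache": True}))
# ≈ {False: 0.4, True: 0.6}, i.e. P(Cavity | toothache) = <0.6, 0.4>
```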
Probabilistic Inference by
Enumeration
Problems: for a domain described by n Boolean variables,
the table has size O(2^n) and it takes O(2^n) time to
process it!
Marginalization and Conditioning
Rules
Given any two random variables Y and Z,
P(Y) = Σ_z P(Y, z)          (marginalization rule)

Using it to compute marginals of the dentistry joint distribution:
P(cavity) = 0.108 + 0.012 + 0.072 + 0.008 = 0.2,  so P(¬cavity) = 0.8
P(toothache) = 0.108 + 0.012 + 0.016 + 0.064 = 0.2,  so P(¬toothache) = 0.8
P(catch) = 0.108 + 0.016 + 0.072 + 0.144 = 0.34,  so P(¬catch) = 0.66
Marginalization and Conditioning
Rules
If we have three random variables X, Y and Z:
P(X) = Σ_y Σ_z P(X, y, z)                        (marginalization rule)
P(X) = Σ_y Σ_z P(X | y, z) P(y | z) P(z)         (conditioning rule)

Conditional probability and the product rule:
P(a | b) = P(a, b) / P(b)
P(a, b) = P(a | b) P(b)
P(a | b) + P(¬a | b) = 1

From the product rule:
P(a | b, c) = P(a, b, c) / P(b, c) = P(a, b, c) / (P(b | c) P(c))
so P(a, b, c) = P(a | b, c) P(b | c) P(c)
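A quick numeric check of this chain-rule factorization on the dentistry numbers (plain Python arithmetic; the joint values come from the table above):

```python
# Verify P(a, b, c) = P(a | b, c) * P(b | c) * P(c)
# with a = cavity, b = toothache, c = catch.
p_abc = 0.108                            # P(cavity, toothache, catch)
p_bc  = 0.108 + 0.016                    # P(toothache, catch) = 0.124
p_c   = 0.108 + 0.016 + 0.072 + 0.144    # P(catch) = 0.34

p_a_given_bc = p_abc / p_bc              # P(cavity | toothache, catch)
p_b_given_c  = p_bc / p_c                # P(toothache | catch)

assert abs(p_a_given_bc * p_b_given_c * p_c - p_abc) < 1e-12
```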
P(a | b) = P(a, b) / P(b)

P(cavity | toothache) = P(cavity ∧ toothache) / P(toothache)
(using the full joint distribution table over Toothache, Catch, Cavity)

α – the normalization constant
P(Cavity | toothache) = α P(Cavity, toothache)
= α [ P(Cavity, toothache, catch) + P(Cavity, toothache, ¬catch) ]
= α [ <0.108, 0.016> + <0.012, 0.064> ] = α <0.12, 0.08> = <0.6, 0.4>
A notation:
- X is the query variable (Cavity in the example)
- E is the set of evidence variables (Toothache in the example)
- e are the observed values for the evidence
- Y are the remaining, hidden variables (Catch in the example)
The query P(X | e) is evaluated as
P(X | e) = α P(X, e) = α Σ_y P(X, e, y)

For P(Cavity | toothache), the hidden variable is Catch, so we compare
P(cavity, toothache, catch) + P(cavity, toothache, ¬catch)
vs. P(¬cavity, toothache, catch) + P(¬cavity, toothache, ¬catch)
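A quick numeric check of this formula for the example query (the hidden variable Catch is summed out, then α rescales the two sums so they add to 1):

```python
# P(Cavity, toothache): sum out the hidden variable Catch.
p_cavity_toothache     = 0.108 + 0.012   # catch + no catch = 0.12
p_not_cavity_toothache = 0.016 + 0.064   # catch + no catch = 0.08

alpha = 1.0 / (p_cavity_toothache + p_not_cavity_toothache)   # 1 / 0.2 = 5
print(alpha * p_cavity_toothache, alpha * p_not_cavity_toothache)
# ≈ 0.6 0.4, i.e. P(Cavity | toothache) = <0.6, 0.4>
```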
Independence
             toothache              ¬toothache
             catch      ¬catch      catch      ¬catch
 cavity      0.108      0.012       0.072      0.008
¬cavity      0.016      0.064       0.144      0.576
Independence in Equations
If propositions a and b are independent:
P(a ∧ b) = P(a) P(b)
P(a | b) = P(a, b) / P(b) = P(a) P(b) / P(b) = P(a)

For independent random variables X and Y:
P(X, Y) = P(X) P(Y)
P(X | Y) = P(X)
P(Y | X) = P(Y)
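To illustrate the savings, here is a small sketch assuming (as in the AIMA example these slides follow) that Weather is independent of the dental variables; the P(Weather) values are taken from the earlier slide. The 8-entry dental joint and the 4-entry weather distribution together determine all 32 entries of the combined joint:

```python
# Independence: P(Toothache, Catch, Cavity, Weather)
#             = P(Toothache, Catch, Cavity) * P(Weather)
dental = {  # 8-entry joint over (cavity, toothache, catch)
    (False, False, False): 0.576, (False, False, True): 0.144,
    (False, True,  False): 0.064, (False, True,  True): 0.016,
    (True,  False, False): 0.008, (True,  False, True): 0.072,
    (True,  True,  False): 0.012, (True,  True,  True): 0.108,
}
weather = {"sunny": 0.7, "rain": 0.2, "cloudy": 0.08, "snow": 0.02}

# The 32-entry joint is just the product of the two factors.
combined = {(d, w): p * q for d, p in dental.items()
                          for w, q in weather.items()}

assert len(combined) == 32
assert abs(sum(combined.values()) - 1.0) < 1e-9
# Marginalizing Weather back out recovers a dental joint entry.
assert abs(sum(p for (d, w), p in combined.items()
               if d == (True, True, True)) - 0.108) < 1e-9
```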
P(a | b) = P(a, b) / P(b), and P(a, b) = P(a | b) P(b) = P(b | a) P(a), so

P(b | a) = P(a | b) P(b) / P(a)        (Bayes’ Rule)
Applying Bayes’ Rule: The Simple
Case
Example: Medical diagnosis
Meningitis is a disease caused by the inflammation of the protective membranes
covering the brain and spinal cord known as the meninges.
A doctor knows that meningitis causes a stiff neck 50% of the time: P(s | m) = 0.5
The doctor also knows some unconditional facts:
the prior probability that the patient has meningitis is 1/50,000
P(m)=1/50,000
the prior probability that any patient has a stiff neck is 1/20 P(s)=1/20
P(m ∧ s) = P(s ∧ m)  ⇒  P(m | s) P(s) = P(s | m) P(m)

P(m | s) = P(s | m) P(m) / P(s) = (0.5 × 1/50,000) / (1/20) = 0.0002
Only 1 in 5,000 patients with a stiff neck is expected to have meningitis
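The same computation in a few lines of Python (a sketch; the numbers are the ones from the slide):

```python
# Bayes' rule for the meningitis example.
p_s_given_m = 0.5      # P(s | m): meningitis causes a stiff neck 50% of the time
p_m = 1 / 50_000       # prior P(m)
p_s = 1 / 20           # prior P(s)

p_m_given_s = p_s_given_m * p_m / p_s
print(p_m_given_s)     # 0.0002, i.e. 1 in 5,000
```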
Example of Bayes’ Rule with
Normalization
Let X be the weather and Y the clothes people wear. From the joint
probabilities P(X, Y), calculate the conditional probabilities P(X | Y),
that is, the probability of the weather given the clothing.
Computing the Normalization Constant
Bayes’ rule, trying to guess the weather from the clothes people wear:
P(X | Y) = α P(Y | X) P(X) = α P(X, Y)
The values P(X | Y) must add up to 1 for each value of Y.

For Y = t-shirt (with normalization constant α₁):
1. P(sunny | t-shirt)  = α₁ P(t-shirt, sunny)  = α₁ × 0.32 = 2 × 0.32 = 0.64
2. P(rain | t-shirt)   = α₁ P(t-shirt, rain)   = α₁ × 0.08 = 2 × 0.08 = 0.16
3. P(cloudy | t-shirt) = α₁ P(t-shirt, cloudy) = α₁ × 0.09 = 2 × 0.09 = 0.18
4. P(snow | t-shirt)   = α₁ P(t-shirt, snow)   = α₁ × 0.01 = 2 × 0.01 = 0.02
where α₁ (0.32 + 0.08 + 0.09 + 0.01) = 1, so α₁ = 1/0.5 = 2
Conditional Probabilities P(X | Y)   (each row sums to 1)

 clothes Y \ weather X    sunny    rain    cloudy    snow
 t-shirt                  0.64     0.16    0.18      0.02
 long-sleeve              0.033    0.5     0.167     0.3
 coat                     0.005    0.15    0.095     0.75
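A short Python sketch of the same normalization for the t-shirt row (the only row of the joint P(X, Y) given on the slides):

```python
# Normalize one row of the joint P(weather, clothes) to get P(weather | t-shirt).
joint_tshirt = {"sunny": 0.32, "rain": 0.08, "cloudy": 0.09, "snow": 0.01}

alpha = 1.0 / sum(joint_tshirt.values())     # 1 / 0.5 = 2
p_weather_given_tshirt = {w: alpha * p for w, p in joint_tshirt.items()}

print(p_weather_given_tshirt)
# ≈ {'sunny': 0.64, 'rain': 0.16, 'cloudy': 0.18, 'snow': 0.02}
```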
Suppose there is an epidemic of meningitis, so the prior probability P(m) goes up.
But P(m | s) = P(s | m) P(m) / P(s) should go up proportionally with P(m).
Important: P(s | m) is unaffected by the epidemic, because it reflects the way
meningitis works (the proportion of meningitis patients who get a stiff neck).
The Wumpus World Revisited
After finding a breeze in both [1,2] and [2,1], the agent is stuck, because there is
no provably safe square left to explore.
Goal: compute the probability that each of the 3 unvisited neighboring
squares contains a pit:
P(Pit[1,3]), P(Pit[2,2]), P(Pit[3,1])
The full joint distribution over the 16 pit variables and the observed breeze variables is
P(P11, P12, ..., P44, B11, B12, B21) = P(B11, B12, B21 | P11, ..., P44) P(P11, ..., P44)
Each square contains a pit with probability 0.2, independently of the others, so for a
configuration with n pits
P(P11, ..., P44) = (0.2)^n (0.8)^(16−n)
Why would the contents of [4,4] affect whether [1,3] has a pit?
Let us consider the set Frontier, containing the pit variables (other than the
query variable) that are adjacent to the visited squares: Frontier = {[2,2], [3,1]}.
Let us also consider the set Other, containing the pit variables for the
remaining unknown squares (10 of them).
The observed breezes are conditionally independent of all other
variables, given the known, frontier, and query variables.
P(P13 | known, b)
  = α Σ_unknown P(P13, known, b, unknown)
  = α Σ_unknown P(b | P13, known, unknown) P(P13, known, unknown)
  = α Σ_frontier Σ_other P(b | known, P13, frontier, other) P(P13, known, frontier, other)
  = α Σ_frontier Σ_other P(b | known, P13, frontier) P(P13, known, frontier, other)

The first term in this expression does not depend on the Other variables, so
we can move the summation inwards:

P(P13 | known, b) = α Σ_frontier P(b | known, P13, frontier) Σ_other P(P13, known, frontier, other)

The prior term can be factored (the pit variables are independent) and then the
terms can be reordered:

P(P13 | known, b)
  = α Σ_frontier P(b | known, P13, frontier) Σ_other P(P13) P(known) P(frontier) P(other)
  = α P(known) P(P13) Σ_frontier P(b | known, P13, frontier) P(frontier) Σ_other P(other)
  = α′ P(P13) Σ_frontier P(b | known, P13, frontier) P(frontier)

where α′ = α P(known) and we used the fact that Σ_other P(other) = 1.
Finishing the computation
Frontier = {[2,2], [3,1]}, so the summation runs over its four configurations.
For example, if both [2,2] and [3,1] have pits, P(frontier) = 0.2 × 0.2 = 0.04.
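A Python sketch (not from the slides) of this finishing step, under the standard assumptions of the example: squares [1,1], [1,2], [2,1] are visited and pit-free, no breeze was felt in [1,1], breezes were felt in [1,2] and [2,1], and each remaining square holds a pit with prior probability 0.2. The Other squares are omitted, exactly because their sum contributes a factor of 1.

```python
from itertools import product

PIT_PRIOR = 0.2
breeze_obs = {(1, 1): False, (1, 2): True, (2, 1): True}   # observed percepts
fringe = [(1, 3), (2, 2), (3, 1)]   # unknown squares adjacent to visited squares

def adjacent(a, b):
    """True if two squares share an edge."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1]) == 1

def consistent(pits):
    """Do these pit locations produce exactly the observed breeze percepts?"""
    return all(any(adjacent(sq, p) for p in pits) == has_breeze
               for sq, has_breeze in breeze_obs.items())

def pit_probability(query):
    """P(pit in `query` | known, b), summing over the frontier configurations."""
    frontier = [sq for sq in fringe if sq != query]
    totals = {True: 0.0, False: 0.0}
    for query_pit in (True, False):
        for combo in product((True, False), repeat=len(frontier)):
            pits = [sq for sq, has in zip(frontier, combo) if has]
            if query_pit:
                pits.append(query)
            if consistent(pits):
                p = PIT_PRIOR if query_pit else 1 - PIT_PRIOR   # P(P_query)
                for has in combo:                               # P(frontier)
                    p *= PIT_PRIOR if has else 1 - PIT_PRIOR
                totals[query_pit] += p
    alpha = 1.0 / (totals[True] + totals[False])                # normalization
    return {k: round(alpha * v, 3) for k, v in totals.items()}

print(pit_probability((1, 3)))   # {True: 0.31, False: 0.69}
print(pit_probability((2, 2)))   # {True: 0.862, False: 0.138}
```

So [1,3] (and, by symmetry, [3,1]) contains a pit with probability about 0.31, while [2,2] is far more dangerous, at about 0.86.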
Lessons:
Seemingly complicated problems can be formulated precisely
in probability theory and solved using simple algorithms
Efficient solutions are obtained when independence and
conditional independence relationships are used to simplify
the summations
Independence corresponds to our natural understanding of
how the problem should be decomposed