Lecture 29

Reasoning under Uncertainty

Instructors: Dr. Durgesh Singh


CSE Discipline, PDPM IIITDM, Jabalpur -482005
Reasoning under uncertainty

▪ Agents in the real world need to handle uncertainty, whether


due to partial observability, nondeterminism, or adversaries.
▪ An agent may never know for sure what state it is in now or
where it will end up after a sequence of actions.
Nature of Uncertain Knowledge

▪ Let us try to write rules for dental diagnosis using propositional


logic, so that we can see how the logical approach breaks down.
Consider the following simple rule:
Toothache ⇒ Cavity.
▪ The problem is that this rule is wrong.
▪ Not all patients with toothaches have cavities; some of them
have gum disease, swelling, or one of several other problems:
Toothache ⇒ Cavity ∨ GumProblem ∨ Swelling ∨ ……..
Nature of Uncertain Knowledge

▪ In order to make the rule true, we have to add an almost


unlimited list of possible problems. We could try turning the rule
into a causal rule:
Cavity ⇒ Toothache
But this rule is also not right; not all cavities cause pain.
A toothache and a cavity are not always connected, so the
judgement may go wrong.
Nature of Uncertain Knowledge

▪ This is typical of the medical domain, as well as most other


judgmental domains: law, business, design, automobile repair,
gardening, dating, and so on.
▪ The agent’s knowledge can at best provide only a degree of
belief in the relevant sentences.
▪ Our main tool for dealing with degrees of belief is probability
theory.
▪ A logical agent believes each sentence to be true or false or has
no opinion, whereas a probabilistic agent may have a numerical
degree of belief between 0 (for sentences that are certainly
false) and 1 (certainly true).
Basic Probability Notation

▪ Random variables are typically divided into three kinds,


depending on the type of the domain:
▪ Boolean random variables, such as Cavity, have the domain
{true, false} (or {1, 0}).
▪ Discrete random variables take on values from a countable
domain. For example, the domain of Weather might be {sunny,
rainy, cloudy, snow}.
▪ Continuous random variables (bounded or unbounded) take on
values from the real numbers, e.g., Temp = 21.4, or propositions
such as Temp < 21.4.
Atomic events or sample points
▪ Atomic event: A complete specification of the state of the world
about which the agent is uncertain
▪ E.g., if the world consists of only two Boolean variables Cavity
and Toothache, then there are 4 distinct atomic events:
Cavity = false ∧ Toothache = false
Cavity = false ∧ Toothache = true
Cavity = true ∧ Toothache = false
Cavity = true ∧ Toothache = true
▪ Atomic events are mutually exclusive and exhaustive
▪ When two events are mutually exclusive, it means they cannot both occur at
the same time.
▪ When two events are exhaustive, it means that one of them must occur.
Axioms of Probability Theory

▪ All probabilities between 0 and 1


– 0 ≤ P(A) ≤ 1
– P(true) = 1
– P(false) = 0.
▪ The probability of disjunction is:
P(A B) = P(A)+ P(B)− P(A B)
Prior probability

▪ The unconditional or prior probability associated with a


proposition A is the degree of belief accorded to it in the absence of
any other information;
▪ It is written as P ( A ).
▪ For example, if the prior probability that I have a cavity is 0.1,
then we would write
P ( Cavity= true ) = 0.1 or P ( cavity ) = 0.1
▪ P ( A ) can be used only when there is no other information.
▪ As soon as some new information is known, we must reason with
the conditional probability of A given that new information.
Prior probability…
▪ Sometimes, we will want to talk about the probabilities of all the
possible values of a random variable.
▪ In that case, we will use an expression such as P(Weather), which
denotes a vector of values for the probabilities of each individual state
of the weather.
▪ Instead of writing these four equations
P ( Weather = sunny) = 0.7
P ( Weather= rain) = 0.2
P ( Weather= cloudy) = 0.08
P(Weather = snow) = 0.02
we may simply write: P( Weather) = (0.7,0.2,0.08,0.02) (Note that the
probabilities sum to 1 )
▪ This statement defines a prior probability distribution for the random
variable Weather.
Prior probability…
▪ Joint probability distribution for a set of random variables gives
the probability of every atomic event on those random variables
▪ P(Weather, Cavity) = a 4 × 2 matrix of values:

                 Weather = sunny   rainy   cloudy   snow
Cavity = true             0.144    0.02    0.016    0.02
Cavity = false            0.576    0.08    0.064    0.08

▪ A full joint distribution specifies the probability of every atomic


event and is therefore a complete specification of one's
uncertainty about the world in question.
Conditional or posterior probability

▪ The notation used is P(a | b), where a and b are any propositions.


This is read as "the probability of a, given that all we know is b."
For example,
P(cavity | toothache) = 0.8
“indicates that if a patient is observed to have a toothache and no
other information is yet available, then the probability of the
patient's having a cavity will be 0.8.”
Conditional or posterior probability

▪ Conditional probabilities can be defined in terms of


unconditional probabilities.

P(a | b) = P(a ∧ b) / P(b)

which holds whenever P(b) > 0.

This equation can also be written as
P(a ∧ b) = P(a | b) P(b)   (the product rule)
or, equivalently, as
P(a ∧ b) = P(b | a) P(a)
Chain Rule/Product Rule
Example

A domain consisting of just the three Boolean variables
Toothache, Cavity, and Catch (the dentist’s nasty steel probe
catches in my tooth).
Inference Using Full Joint Distributions

P(toothache) = 0.108 + 0.012 + 0.016 + 0.064
             = 0.20  (i.e., 20%)
Inference Using Full Joint Distributions

P(toothache ∨ cavity) = 0.20 + 0.072 + 0.008
                      = 0.28
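The sums above come from the full joint distribution over Toothache, Catch, and Cavity. Below is a minimal Python sketch of this marginalization; the six entries that appear in the sums on these slides are taken as given, while the two remaining entries (0.144 and 0.576) are assumed standard textbook values that make the table sum to 1.

```python
# Full joint distribution over (Toothache, Catch, Cavity).
# The six entries used in the sums above are from the slides; the last two
# (0.144, 0.576) are assumed values chosen so the table sums to 1.
NAMES = ("toothache", "catch", "cavity")
JOINT = {
    (True,  True,  True):  0.108, (True,  False, True):  0.012,
    (False, True,  True):  0.072, (False, False, True):  0.008,
    (True,  True,  False): 0.016, (True,  False, False): 0.064,
    (False, True,  False): 0.144, (False, False, False): 0.576,  # assumed
}

def prob(event):
    """Sum the joint entries consistent with `event`, e.g. {'toothache': True}."""
    total = 0.0
    for values, p in JOINT.items():
        world = dict(zip(NAMES, values))
        if all(world[var] == val for var, val in event.items()):
            total += p
    return total

print(round(prob({"toothache": True}), 3))   # 0.108 + 0.012 + 0.016 + 0.064 = 0.20
# P(toothache or cavity) via the disjunction axiom:
print(round(prob({"toothache": True}) + prob({"cavity": True})
            - prob({"toothache": True, "cavity": True}), 3))   # 0.28
```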
Inference Using Full Joint Distributions
Problems with the full joint distribution?
▪ Worst-case inference time: O(d^n)
▪ where d = max arity (domain size) of a variable
▪ and n = number of random variables
▪ Space complexity is also O(d^n)
▪ i.e., the size of the joint distribution itself

Independence

▪ A and B are independent iff:

P(A | B) = P(A)
P(B | A) = P(B)

Therefore, if A and B are independent:

P(A | B) = P(A ∧ B) / P(B) = P(A)

P(A ∧ B) = P(A) P(B)
Independence…

Complete independence is powerful but rare. What to do if it


doesn’t hold?
Conditional Independence
Conditional Independence

▪ The general definition of conditional independence of two


variables X and Y, given a third variable Z, is

(I)  P(X, Y | Z) = P(X | Z) P(Y | Z)

or, equivalently,

(II) P(X | Y, Z) = P(X | Z)  and  P(Y | X, Z) = P(Y | Z)
Conditional Independence II

P(catch | toothache, cavity) = P(catch | cavity)

P(catch | toothache, ¬cavity) = P(catch | ¬cavity)
Power of Cond. Independence

▪ Often, using conditional independence reduces the storage


complexity of the joint distribution from exponential to linear!!

▪ Conditional independence is the most basic & robust form of


knowledge about uncertain environments.
Bayes Rule
P(H | E) = P(E | H) P(H) / P(E)

Simple proof from the definition of conditional probability:

(1)  P(H | E) = P(H ∧ E) / P(E)          (def. of cond. prob.)

(2)  P(E | H) = P(H ∧ E) / P(H)          (def. of cond. prob.)

(3)  P(H ∧ E) = P(E | H) P(H)            (multiply (2) by P(H))

(4)  P(H | E) = P(E | H) P(H) / P(E)     (substitute (3) into (1))
Use to Compute Diagnostic Probability from Causal
Probability

E.g. let M be meningitis, S be stiff neck


P(M) = 0.0001,
P(S) = 0.1,
P(S|M)= 0.8

P(M|S) = P(S|M) P(M) / P(S) = (0.8 × 0.0001) / 0.1 = 0.0008
Bayes Rule

▪ Does patient have cancer or not?

Given: A patient takes a lab test, and the result comes back
positive. The test returns a correct positive result in only 98% of
the cases in which the disease is present, and a correct negative
result in only 97% of the cases in which the disease is not present.
Furthermore, 0.008 of the entire population have this cancer.
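A short Python sketch of the Bayes-rule computation with the figures given above (sensitivity 0.98, specificity 0.97, prior 0.008); the variable names are just illustrative:

```python
# Figures given on the slide.
p_cancer = 0.008               # prior P(cancer)
p_pos_given_cancer = 0.98      # P(+ | cancer): correct positive rate
p_neg_given_no_cancer = 0.97   # P(- | no cancer): correct negative rate

p_no_cancer = 1.0 - p_cancer
p_pos_given_no_cancer = 1.0 - p_neg_given_no_cancer   # false-positive rate

# Normalizing constant: total probability of a positive result.
p_pos = (p_pos_given_cancer * p_cancer
         + p_pos_given_no_cancer * p_no_cancer)

# Bayes' rule: P(cancer | +) = P(+ | cancer) P(cancer) / P(+).
p_cancer_given_pos = p_pos_given_cancer * p_cancer / p_pos
print(round(p_cancer_given_pos, 3))   # about 0.21: cancer is still unlikely
```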
Bayesian Networks

▪ In general, a joint distribution over a set of variables (X1, X2, ..., Xn)
requires exponential space for representation & inference.
▪ We also saw that independence and conditional independence
relationships among variables can greatly reduce the number of
probabilities that need to be specified in order to define the full
joint distribution.
▪ A Bayesian network (BN) is a graphical data structure that
▪ represents the dependencies among variables and
▪ gives a concise specification of any full joint probability distribution
Chain rule in Bayesian Networks

The general assertion is that, for every variable Xi in the
Bayesian network,

P(Xi | Xi-1, ..., X1) = P(Xi | Parents(Xi)),

provided that Parents(Xi) ⊆ {Xi-1, ..., X1}. Combined with the chain rule,
this gives the factorization P(X1, ..., Xn) = ∏i P(Xi | Parents(Xi)).
Bayes Networks

▪ A Bayesian network is a directed graph in which each node is


annotated with quantitative probability information.
▪ The full specification is as follows:
1. Each node corresponds to a random variable, which may be discrete
or continuous.
2. Directed links or arrows connect pairs of nodes. If there is an arrow
from node X to node Y , X is said to be a parent of Y.
3. Each node Xi, has a conditional probability distribution
P (Xi | Parents (Xi)) that quantifies the effect of the parents on the
node.
4. The graph has no directed cycles (and hence is a directed, acyclic
graph, or DAG).
Example

Topology of network encodes conditional independence


assertions:
Example: Burglar Alarm

▪ You have a new burglar alarm installed at home.


▪ It is reliable at detecting a burglary, but also responds on
occasion to minor earthquakes.
▪ You also have two neighbors, John and Mary, who have
promised to call you at work when they hear the alarm.
▪ John always calls when he hears the alarm, but sometimes
confuses the telephone ringing with the alarm and calls then,
too.
▪ Mary, on the other hand, likes loud music and sometimes misses
the alarm altogether.
Given the evidence of who has or has not called, we would like to
estimate the probability of a burglary.
Example: Burglar Alarm

Earthquake Burglary

Alarm

JohnCalls MaryCalls
Example Bayes Net: Burglar Alarm
▪ Notice that the network does not have nodes corresponding to
▪ Mary’s currently listening to loud music or
▪ The telephone ringing and confusing John.
▪ These factors are summarized in the uncertainty associated with
the links from Alarm to JohnCalls and MaryCalls.
Conditional probability table, or CPT

▪ Each row in a CPT contains the conditional probability of each


node value for a conditioning case.
▪ A conditioning case is just a possible combination of values for
the parent nodes.
▪ Each row must sum to 1.
▪ For Boolean variables, once you know that the probability of a
true value is p , the probability of false must be 1-p, so we often
omit the second number.
▪ In general, a table for a Boolean variable with k Boolean parents
contains 2^k independently specifiable probabilities.
▪ A node with no parents has only one row, representing the prior
probabilities of each possible value of the variable.
Syntax of BNs

▪ a set of nodes, one per random variable


▪ a directed, acyclic graph (link ≈ "directly influences")
▪ a conditional distribution for each node given its
parents: P (Xi | Parents (Xi))
▪ For discrete variables, conditional probability table (CPT)=
distribution over Xi for each combination of parent values
Example Bayes Net: Burglar Alarm
Earthquake Burglary

Alarm

JohnCalls MaryCalls

▪ If I know whether the Alarm has gone off, no other evidence influences
my degree of belief in JohnCalls
▪ P(J|M,A,E,B) = P(J|A)
▪ also: P(M|J,A,E,B) = P(M|A) and P(E|B) = P(E)
▪ By the chain rule we have
P(J,M,A,E,B) = P(J|M,A,E,B) · P(M|A,E,B) · P(A|E,B) · P(E|B) · P(B)
             = P(J|A) · P(M|A) · P(A|B,E) · P(E) · P(B)
▪ The full joint then requires only 10 parameters (1 + 1 + 4 + 2 + 2), as
illustrated by the sketch below.
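Below is a minimal Python sketch of this factored computation. The CPT numbers are placeholder values assumed for illustration (the slides do not list them), so only the structure of the product should be read from the example:

```python
# Placeholder (assumed) CPTs for the burglar-alarm network; the slides do not
# give these numbers, so only the structure of the product matters here.
P_B_TRUE = 0.001                      # P(Burglary)
P_E_TRUE = 0.002                      # P(Earthquake)
P_A_GIVEN_BE = {                      # P(Alarm = true | B, E)
    (True, True): 0.95, (True, False): 0.94,
    (False, True): 0.29, (False, False): 0.001,
}
P_J_GIVEN_A = {True: 0.90, False: 0.05}   # P(JohnCalls = true | Alarm)
P_M_GIVEN_A = {True: 0.70, False: 0.01}   # P(MaryCalls = true | Alarm)

def bern(p_true, value):
    """Probability that a Boolean variable with P(true) = p_true takes `value`."""
    return p_true if value else 1.0 - p_true

def joint(j, m, a, e, b):
    """P(J=j, M=m, A=a, E=e, B=b) using the factorization on this slide."""
    return (bern(P_J_GIVEN_A[a], j) * bern(P_M_GIVEN_A[a], m)
            * bern(P_A_GIVEN_BE[(b, e)], a)
            * bern(P_E_TRUE, e) * bern(P_B_TRUE, b))

# Probability that both neighbours call and the alarm rings,
# with neither a burglary nor an earthquake:
print(joint(True, True, True, False, False))
```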
BNs: Qualitative Structure

▪ Graphical structure of BN reflects conditional independence


among variables
▪ Each variable X is a node in the DAG
▪ Edges denote direct probabilistic influence
▪ parents of X are denoted Par(X)
▪ Each variable X is conditionally independent of all non-descendants,
given its parents.
▪ Graphical test exists for more general independence
▪ “Markov Blanket”
Given Parents, X is Independent of Non-Descendants

Fig: A node X is conditionally independent of its non-descendants (e.g., the
Zij's) given its parents (the Ui's, shown in the gray area).
For Example

Earthquake Burglary

Alarm

JohnCalls MaryCalls
Example
Given Markov Blanket, X is Independent of
All Other Nodes

MB(X) = Par(X) ∪ Childs(X) ∪ Par(Childs(X))


Example
Conditional Probability Tables

(Figure: a Bayes net with nodes Earthquake, Burglary, Radio, Alarm,
Nbr1Calls and Nbr2Calls; Earthquake and Burglary are the parents of Alarm.)

Pr(B=t) = 0.05,  Pr(B=f) = 0.95

Pr(A=t | E,B)  (Pr(A=f | E,B) in parentheses):
 e,  b : 0.9  (0.1)
 e, ¬b : 0.2  (0.8)
¬e,  b : 0.85 (0.15)
¬e, ¬b : 0.01 (0.99)
Conditional Probability Tables
▪ For a complete specification of the joint distribution, quantify the BN

▪ For each variable X, specify CPT: P(X | Par(X))


▪ number of params locally exponential in |Par(X)|

▪ If X1, X2,... Xn is any topological sort of the network, then we are


assured:
P(Xn,Xn-1,...X1) = P(Xn| Xn-1,...X1)·P(Xn-1 | Xn-2,… X1)
… P(X2 | X1) · P(X1)
= P(Xn| Par(Xn)) · P(Xn-1 | Par(Xn-1)) … P(X1)
Exact Inference in BNs

▪ The graphical independence representation


▪ yields efficient inference schemes
▪ We generally want to compute
▪ Marginal probability: Pr(Z), or
▪ Pr(Z|E) where E is (conjunctive) evidence
▪ Z: query variable(s),
▪ E: evidence variable(s)
▪ everything else: hidden variable
▪ One simple algorithm:
▪ Inference by enumeration with variable elimination (VE)
Inference in BNs

▪ Let E be the list of evidence variables, let e be the list of
observed values for them, and let Y be the remaining
unobserved (hidden) variables. The query P(X | e) can
be evaluated as

P(X | e) = α P(X, e) = α Σy P(X, e, y)

where α is a normalizing constant and the summation is over all possible
values y (i.e., all possible combinations of values of the unobserved
variables Y).
▪ Now, a Bayes net gives a complete representation of the full joint
distribution.
▪ Therefore, a query can be answered using a Bayes net by computing
sums of products of conditional probabilities from the network.
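A small Python sketch of this kind of enumeration, reusing the dentist full joint from earlier (the 0.144 and 0.576 entries are again assumed textbook values): the hidden variables are summed out and the result is normalized, which plays the role of α.

```python
# Dentist full joint over (toothache, catch, cavity); the 0.144 and 0.576
# entries are assumed values that make the table sum to 1.
NAMES = ("toothache", "catch", "cavity")
JOINT = {
    (True,  True,  True):  0.108, (True,  False, True):  0.012,
    (False, True,  True):  0.072, (False, False, True):  0.008,
    (True,  True,  False): 0.016, (True,  False, False): 0.064,
    (False, True,  False): 0.144, (False, False, False): 0.576,
}

def query(var, evidence):
    """P(var | evidence): sum out the hidden variables, then normalize (alpha)."""
    dist = {True: 0.0, False: 0.0}
    for values, p in JOINT.items():
        world = dict(zip(NAMES, values))
        if all(world[e] == v for e, v in evidence.items()):
            dist[world[var]] += p
    alpha = 1.0 / (dist[True] + dist[False])
    return {value: p * alpha for value, p in dist.items()}

print(query("cavity", {"toothache": True}))   # roughly {True: 0.6, False: 0.4}
```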
Example: P(B | J=true, M=true)

Earthquake Burglary

Alarm

John Mary

P(B|j,m) = P(B) P(E) P(A|B,E)P(j|A)P(m|A)


E A
Burglar Alarm Example …

(Figure: Earthquake and Burglary → Alarm → JohnCalls and MaryCalls.)
Inference by Enumeration

Dynamic Programming
Variable Elimination

▪ A factor is a function from some set of variables into a specific


value: e.g., f(E,A, B)
▪ CPTs are factors, e.g., P(A|E,B) function of A,E,B
▪ VE works by eliminating all variables in turn until there is a factor
with only query variable
▪ To eliminate a variable:
▪ join all factors containing that variable (like DB)
▪ sum out the influence of the variable on new factor
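A compact Python sketch of these two factor operations, with a factor represented as a pair of a variable tuple and a table from assignment tuples to numbers; the names join and sum_out and the domains argument are illustrative choices, not a fixed API.

```python
from itertools import product

# A factor is a pair (variables, table): `variables` is a tuple of names and
# `table` maps each assignment tuple over those variables to a number.

def join(f1, f2, domains):
    """Pointwise product of two factors over the union of their variables."""
    (v1, t1), (v2, t2) = f1, f2
    variables = v1 + tuple(x for x in v2 if x not in v1)
    table = {}
    for assignment in product(*(domains[x] for x in variables)):
        world = dict(zip(variables, assignment))
        key1 = tuple(world[x] for x in v1)
        key2 = tuple(world[x] for x in v2)
        table[assignment] = t1[key1] * t2[key2]
    return variables, table

def sum_out(var, factor):
    """Eliminate `var` from a factor by summing over its possible values."""
    variables, table = factor
    i = variables.index(var)
    new_vars = variables[:i] + variables[i + 1:]
    new_table = {}
    for assignment, value in table.items():
        key = assignment[:i] + assignment[i + 1:]
        new_table[key] = new_table.get(key, 0.0) + value
    return new_vars, new_table
```

A VE run then repeatedly joins all factors that mention the next variable to be eliminated and replaces them with the summed-out result, until only factors over the query variable remain.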
Example of VE: P(J)

P(J)
= ΣM,A,B,E P(J,M,A,B,E)
= ΣM,A,B,E P(J|A) P(M|A) P(B) P(A|B,E) P(E)
= ΣA P(J|A) ΣM P(M|A) ΣB P(B) ΣE P(A|B,E) P(E)
= ΣA P(J|A) ΣM P(M|A) ΣB P(B) f1(A,B)
= ΣA P(J|A) ΣM P(M|A) f2(A)
= ΣA P(J|A) f3(A)
= f4(J)
Example: P(B | J=true, M=true) using VE
Example: Traffic Domain

▪ Random Variables
▪ R: Raining
▪ T: Traffic
▪ L: Late for class!

CPTs (structure R → T → L):

P(R):       +r  0.1
            -r  0.9

P(T | R):   +r +t  0.8
            +r -t  0.2
            -r +t  0.1
            -r -t  0.9

P(L | T):   +t +l  0.3
            +t -l  0.7
            -t +l  0.1
            -t -l  0.9
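Using those CPTs, P(L) can be computed by eliminating R and then T, mirroring the P(J) derivation above; a self-contained Python sketch:

```python
# CPTs from the traffic example (R: raining, T: traffic, L: late).
P_R = {"+r": 0.1, "-r": 0.9}
P_T_GIVEN_R = {("+r", "+t"): 0.8, ("+r", "-t"): 0.2,
               ("-r", "+t"): 0.1, ("-r", "-t"): 0.9}
P_L_GIVEN_T = {("+t", "+l"): 0.3, ("+t", "-l"): 0.7,
               ("-t", "+l"): 0.1, ("-t", "-l"): 0.9}

# Eliminate R: f1(T) = sum_R P(R) P(T | R)
f1 = {t: sum(P_R[r] * P_T_GIVEN_R[(r, t)] for r in P_R) for t in ("+t", "-t")}

# Eliminate T: P(L) = sum_T f1(T) P(L | T)
p_l = {l: sum(f1[t] * P_L_GIVEN_T[(t, l)] for t in f1) for l in ("+l", "-l")}

print(f1)    # {'+t': 0.17, '-t': 0.83}
print(p_l)   # {'+l': 0.134, '-l': 0.866}
```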
