CS480 Lecture, October 26th
2
Plan for Today
Decision Networks
3
Playing Minesweeper with Bayes’ Rule
[Board diagrams: prior probability / belief about square X vs. posterior probability / belief about X after observing the revealed squares]
4
Naive Bayes Spam Filter
Email = Spam
5
Agents and Belief State
[Diagram: agent and a partially observable environment. The agent's sensor reports A = 4 and B = 7; the agent's model of the world records those readings. The environment could be in one of several states consistent with them (C is unobserved), and the agent can only consult its model / belief state. Assume the domain of C is D_C = {0, 1, 2, 3}.]
6
Decision Theory
Decisions: every plan (sequence of actions) leads to an outcome (state)
Agents have preferences (preferred outcomes)
Preferences → outcome utilities
Agents have degrees of belief (probabilities) about the outcomes of actions
8
Maximum Expected (Average) Utility
The principle of maximum expected utility (MEU): a rational agent should choose the action that maximizes its expected utility, a* = argmax_a EU(a | e).
9
Agent's Decisions
Recall that agent ACTIONS change the state: if we are in state s, action a is expected to lead to another state s' (outcome).
10
State Utility Function
Agent’s preferences (desires) are captured by
the Utility function U(s).
11
Expected Action Utility
The expected utility of an action a given the evidence e is the average utility value of all possible outcomes s' of action a, weighted by their probability (belief) of occurrence:
EU(a | e) = Σ_s' P(Result(a) = s' | a, e) * U(s')
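A minimal sketch of this formula in Python (hypothetical dictionaries stand in for the outcome distribution P(Result(a) = s' | a, e) and the utility function U):

```python
# Expected utility of an action: sum over outcomes s' of
# P(Result(a) = s' | a, e) * U(s').
def expected_utility(outcome_probs, utility):
    """outcome_probs: dict mapping outcome s' -> P(Result(a) = s' | a, e)
    utility: dict mapping outcome s' -> U(s')"""
    return sum(p * utility[s] for s, p in outcome_probs.items())

# Illustrative numbers: an action with two possible outcomes.
print(expected_utility({"s1": 0.7, "s2": 0.3}, {"s1": 20, "s2": 70}))  # 35.0
```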
12
How Did We Get Here?
Let's start with the relationships (and related notation) between an agent's preferences:
agent prefers A over B:
A ≻ B
agent is indifferent between A and B:
A ~ B
agent prefers A over B or is indifferent between A and B (weak preference):
A ⪰ B
13
The Concept of Lottery
Let’s assume the following:
an action a is a lottery ticket
the set of outcomes (resulting states) is a lottery
A lottery L with possible outcomes S1, ..., Sn that occur with probabilities p1, ..., pn is written as:
L = [p1, S1; p2, S2; ... ; pn, Sn]
15
Lottery Constraints: Transitivity
Given three lotteries A, B, and C, if an agent
prefers A to B AND prefers B to C, then the agent
must prefer A to C:
(A ≻ B) ∧ (B ≻ C) ⇒ (A ≻ C)
16
Lottery Constraints: Continuity
If some lottery B is between A and C in preference,
then there is some probability p for which the
rational agent will be indifferent between getting B
for sure or some other lottery that yields A with
probability p and C with probability 1 - p:
(A ≻ B ≻ C) ⇒ ∃p [p, A; 1 − p, C] ~ B
17
Lottery Constraints: Substitutability
If an agent is indifferent between two lotteries A
and B, then the agent is indifferent between two
more complex lotteries that are the same, except
that B is substituted for A in one of them:
(A ~ B) ⇒ [p, A; 1 − p, C] ~ [p, B; 1 − p, C]
18
Lottery Constraints: Monotonicity
Suppose two lotteries have the same two possible
outcomes, A and B. If an agent prefers A to B, then
the agent must prefer the lottery that has a higher
probability for A:
(A ≻ B) ⇒ (p ≥ q ⇔ [p, A; 1 − p, B] ⪰ [q, A; 1 − q, B])
19
Lottery Constraints: Decomposability
Compound lotteries can be reduced to smaller ones
using the laws of probability:
[p, A; 1 − p, [q, B; 1 − q, C]] ~ [p, A; (1 − p)q, B; (1 − p)(1 − q), C]
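A small sanity check of this reduction (the probabilities p and q below are arbitrary illustrative values): sampling the compound lottery branch by branch produces the same outcome frequencies as the flattened lottery.

```python
import random

# Decomposability: [p, A; 1-p, [q, B; 1-q, C]] has the same outcome
# distribution as the flat lottery [p, A; (1-p)q, B; (1-p)(1-q), C].
p, q = 0.4, 0.25
random.seed(0)

counts = {"A": 0, "B": 0, "C": 0}
n = 100_000
for _ in range(n):
    if random.random() < p:        # first stage: A vs. the sub-lottery
        counts["A"] += 1
    elif random.random() < q:      # second stage of the sub-lottery: B vs. C
        counts["B"] += 1
    else:
        counts["C"] += 1

empirical = {o: c / n for o, c in counts.items()}
flat = {"A": p, "B": (1 - p) * q, "C": (1 - p) * (1 - q)}
print(empirical)  # roughly {'A': 0.40, 'B': 0.15, 'C': 0.45}
print(flat)       # {'A': 0.4, 'B': 0.15, 'C': 0.45} (up to float rounding)
```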
20
Preferences and Utility Function
An agent whose preferences between lotteries follow the set of axioms (of utility theory) below:
Orderability
Transitivity
Continuity
Substitutability
Monotonicity
Decomposability
can be described as possessing a utility function and acting to maximize its expected value.
21
Preferences and Utility Function
If an agent's preferences obey the axioms of utility theory, then there exists a function U such that:
U(A) > U(B) ⇔ A ≻ B
and
U(A) = U(B) ⇔ A ~ B
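A short sketch of how such a U is used (the outcome utilities and the two lotteries below are illustrative, not from the slides): lotteries are compared by their expected utility under U.

```python
# With a utility function U over outcomes, lotteries are compared by
# expected utility: lottery A is preferred to B exactly when EU(A) > EU(B).
U = {"S1": 100, "S2": 0, "S3": 60}       # illustrative outcome utilities

def eu(lottery):
    """lottery: dict mapping outcome -> probability (must sum to 1)."""
    return sum(p * U[s] for s, p in lottery.items())

A = {"S1": 0.5, "S2": 0.5}               # the lottery [0.5, S1; 0.5, S2]
B = {"S3": 1.0}                          # S3 for certain

print(eu(A), eu(B))                      # 50.0 60.0 -> the agent prefers B
```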
22
Multiattribute Outcomes
Outcomes can be characterized by more than one
attribute. Decisions in such cases are handled by
Multiattribute Utility Theory.
Attributes: X = X1, ..., Xn
Assigned values: x = <x1, ..., xn>
23
Strict Dominance: Deterministic
B strictly dominates A:
B is better than A for both X1 and X2
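A tiny sketch of this test (the attribute tuples and the assumption that larger values are better are illustrative choices, not from the slides):

```python
# Deterministic strict dominance: B strictly dominates A if B is better
# on every attribute (here, larger attribute values are assumed better).
def strictly_dominates(b, a):
    return all(bv > av for bv, av in zip(b, a))

A = (3, 5)   # (X1, X2) values of outcome A -- illustrative numbers
B = (4, 6)
D = (7, 2)

print(strictly_dominates(B, A))  # True: B is better on both X1 and X2
print(strictly_dominates(D, A))  # False: D is better than A only on X1
```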
24
Strict Dominance: Deterministic
D does not strictly dominate A:
D is better than A only for X1
25
Strict Dominance: Uncertain
B strictly dominates A:
B is better than A for both X1 and X2
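Under uncertainty the same idea can be applied to the sets of outcomes each choice might produce. A sketch (the outcome sets are illustrative, and this reading of strict dominance with uncertain outcomes assumes every possible outcome of B beats every possible outcome of A):

```python
# Strict dominance with uncertain outcomes (sketch): B strictly dominates A
# when every outcome B might yield is better on all attributes than every
# outcome A might yield (larger values assumed better).
def better_on_all_attributes(b, a):
    return all(bv > av for bv, av in zip(b, a))

A_outcomes = [(2, 3), (3, 4)]   # possible (X1, X2) results of choosing A
B_outcomes = [(5, 6), (6, 7)]   # possible (X1, X2) results of choosing B

print(all(better_on_all_attributes(b, a)
          for b in B_outcomes for a in A_outcomes))  # True: B dominates A
```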
26
Decision Network (Influence Diagram)
Decision networks (also called influence diagrams)
are structures / mechanisms for making rational
decisions.
27
Decision Networks
The most basic decision network needs to include:
information about current state s
possible actions
resulting state s’ (after applying chosen action a)
utility of the resulting state U(s’)
28
Decision Network Nodes
Decision networks are built using the following
nodes:
chance nodes (random variables), e.g. X
decision nodes (points where the agent chooses the value of a decision variable), e.g. Y
utility nodes (representing the agent's utility function)
29
Decision Network: Example
30
Decision Network: Example
Nodes describing
current state
31
Decision Network: Example
Evidence
32
Decision Network: Example
The Airport Site decision changes the conditional distributions of the Safety, Quietness, and Frugality nodes
33
Decision Network: Example
Parents of the
utility node
34
Decision Network: Example
Outcome nodes
35
Decision Network: Example
Nodes directly
influencing utility
36
Decision Network: Evaluation
The algorithm for decision network evaluation is as
follows:
1. Set the evidence variables for the current state
2. For each possible value a of decision node:
a. Set the decision node to that value
b. Calculate the posterior probabilities for the parent
nodes of the utility node
c. Calculate the expected utility for the action / value a
3. Return the action with the highest expected utility
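A minimal sketch of this loop (the posterior-inference step is stubbed out as a lookup; the weather probabilities and utility table are the ones used in the umbrella example later in the deck, and the forecast evidence is ignored here):

```python
# Decision network evaluation: for each value of the decision node, set it,
# compute the posterior over the utility node's parents, compute the
# expected utility, and return the best action.
def evaluate_decision_network(actions, posterior, utility, evidence):
    best_action, best_eu = None, float("-inf")
    for a in actions:                                         # step 2: each decision value
        dist = posterior(a, evidence)                         # step 2b: P(parents | a, e)
        eu = sum(p * utility(a, s) for s, p in dist.items())  # step 2c
        if eu > best_eu:
            best_action, best_eu = a, eu
    return best_action, best_eu                               # step 3

P_weather = {"sun": 0.70, "rain": 0.30}                       # P(W), no forecast used
U = {("take", "sun"): 20, ("take", "rain"): 70,
     ("leave", "sun"): 100, ("leave", "rain"): 0}

print(evaluate_decision_network(
    actions=["take", "leave"],
    posterior=lambda a, e: P_weather,       # Weather does not depend on the decision
    utility=lambda a, s: U[(a, s)],
    evidence=None,
))  # ('leave', 70.0)
```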
37
Decision Network: Example
38
Decision Network: Simplified Form
Action-Utility table
is used to get
expected utility
39
(Single-Stage) Decision Networks
[Side by side: the general-structure decision network and the simplified-structure decision network]
40
(Single-Stage) Decision Networks
[Side by side: the general-structure and simplified-structure decision networks, each with its table. The general structure lists a utility for each combination of chance-variable values (Q, F, ... = low / high); the simplified structure's action-utility table lists an expected utility for each combination (L, C, ... = low / high)]
41
Decision Network: Evaluation
The algorithm for decision network evaluation is as
follows:
1. Set the evidence variables for the current state
2. For each possible value a of decision node:
a. Set the decision node to that value
b. Calculate the posterior probabilities for the parent
nodes of the utility node
c. Calculate the expected utility for the action / value a
3. Return the action with the highest expected utility
42
Agent’s Decisions
Recall that agent ACTIONS change the state: if we are in state s, action a is expected to lead to another state s' (outcome).
43
Expected Action Utility
The expected utility of an action a given the evidence e is the average utility value of all possible outcomes s' of action a, weighted by their probability (belief) of occurrence:
EU(a | e) = Σ_s' P(Result(a) = s' | a, e) * U(s')
44
Decision Networks: Example
Decision: take umbrella | Decision: leave umbrella
[Two copies of the network, one per decision: Weather and the Decision node are parents of the utility node U; utility table with columns D, W, U]
45
Decision Networks: Example
Decision: take umbrella | Decision: leave umbrella
[The same two networks, annotated with the expected utilities of the outcomes s'; utility table with columns D, W, U]
46
Decision Networks: Example
Decision: take umbrella | Decision: leave umbrella
Utility table U(D, W): U(leave, sun) = 100, U(leave, rain) = 0, U(take, sun) = 20, U(take, rain) = 70
Outcomes: S1': D = take, W = sun; S2': D = take, W = rain; S3': D = leave, W = sun; S4': D = leave, W = rain
EU(take) = P(Result(take) = S1')*U(S1') + P(Result(take) = S2')*U(S2') = 0.70 * 20 + 0.30 * 70 = 35
EU(leave) = P(Result(leave) = S3')*U(S3') + P(Result(leave) = S4')*U(S4') = 0.70 * 100 + 0.30 * 0 = 70
47
Decision Networks: Example
Which action to choose: take or leave Umbrella?
Utility table U(D, W): U(leave, sun) = 100, U(leave, rain) = 0, U(take, sun) = 20, U(take, rain) = 70
EU(take) = P(Result(take) = S1')*U(S1') + P(Result(take) = S2')*U(S2') = 0.70 * 20 + 0.30 * 70 = 35
EU(leave) = P(Result(leave) = S3')*U(S3') + P(Result(leave) = S4')*U(S4') = 0.70 * 100 + 0.30 * 0 = 70
EU(leave) > EU(take), so without additional evidence the rational choice is to leave the umbrella.
48
Decision Networks: Example
Decision: take umbrella | Decision: leave umbrella
[The two networks and the utility table with columns D, W, U]
49
Decision Networks: Example
Decision: take umbrella given e | Decision: leave umbrella given e
[The same two networks, now evaluated given evidence e; utility table with columns D, W, U]
50
Decision Networks: Example
Decision: take umbrella given e | Conditional probabilities
Assume that we are given the sensor model P(F | W) (a table with columns F, W, P(F|W)) and the prior P(W); in particular P(F = sun | W = sun) = 0.80, P(W = sun) = 0.70, and the resulting marginal P(F = sun) = 0.59.
By Bayes' Theorem:
P(W = sun | F = sun) = P(F = sun | W = sun) * P(W = sun) / P(F = sun) = (0.80 * 0.70) / 0.59 ≈ 0.95
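A small sketch of this computation; P(F = sun | W = rain) = 0.10 is an assumed value (not on the slide) chosen so that the marginal comes out to the 0.59 used above.

```python
# Posterior weather belief given the forecast, via Bayes' rule.
P_W = {"sun": 0.70, "rain": 0.30}                  # prior P(W)
P_Fsun_given_W = {"sun": 0.80, "rain": 0.10}       # P(F = sun | W); rain value assumed

# Marginal: P(F = sun) = sum_w P(F = sun | w) * P(w)
p_f_sun = sum(P_Fsun_given_W[w] * P_W[w] for w in P_W)
print(round(p_f_sun, 2))                           # 0.59

# Bayes' rule: P(W = sun | F = sun)
p_w_sun_given_f_sun = P_Fsun_given_W["sun"] * P_W["sun"] / p_f_sun
print(round(p_w_sun_given_f_sun, 2))               # 0.95
```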
51
Decision Networks: Example
Decision: take umbrella given F = sun | Decision: leave umbrella given F = sun
[The two networks with the Forecast observed as F = sun; Weather now follows P(W | F = sun); utility table with columns D, W, U]
52
Decision Networks: Example
Decision: take umbrella given F = sun | Decision: leave umbrella given F = sun
Posterior: P(W = sun | F = sun) = 0.95, P(W = rain | F = sun) = 0.05
Utility table U(D, W): U(leave, sun) = 100, U(leave, rain) = 0, U(take, sun) = 20, U(take, rain) = 70
EU(take | F = sun) = P(Result(take) = S1' | e)*U(S1') + P(Result(take) = S2' | e)*U(S2') = 0.95 * 20 + 0.05 * 70 = 22.5
EU(leave | F = sun) = P(Result(leave) = S3' | e)*U(S3') + P(Result(leave) = S4' | e)*U(S4') = 0.95 * 100 + 0.05 * 0 = 95
53
Decision Networks: Example
Decision: take umbrella given F = rain | Decision: leave umbrella given F = rain
[The two networks with the Forecast observed as F = rain; Weather now follows P(W | F = rain); utility table with columns D, W, U]
54
Decision Networks: Example
Decision: take umbrella given F = rain | Decision: leave umbrella given F = rain
Posterior: P(W = sun | F = rain) = 0.34, P(W = rain | F = rain) = 0.66
Utility table U(D, W): U(leave, sun) = 100, U(leave, rain) = 0, U(take, sun) = 20, U(take, rain) = 70
EU(take | F = rain) = P(Result(take) = S1' | e)*U(S1') + P(Result(take) = S2' | e)*U(S2') = 0.34 * 20 + 0.66 * 70 = 53
EU(leave | F = rain) = P(Result(leave) = S3' | e)*U(S3') + P(Result(leave) = S4' | e)*U(S4') = 0.34 * 100 + 0.66 * 0 = 34
55
Decision Networks: Example
Decision: take umbrella given F = rain | Decision: leave umbrella given F = rain
Given F = rain, EU(take | F = rain) = 53 > EU(leave | F = rain) = 34, so the best decision is to take the umbrella.
57
Value of Perfect Information
The value / utility of the best action α without additional evidence (information) is:
MEU(α | e) = max_a Σ_s' P(Result(a) = s' | a, e) * U(s')
58
Decision Network: Example
[Left: the decision network with Weather, Forecast, Decision, and U. Right: the outcome tree over Weather | e (sun / rain) with leaves U(t, s), U(t, r), U(l, s), U(l, r)]
The value of the best action α without additional evidence:
MEU(α) = MEU(leave) = 70
With evidence / information (E = e) given by the Forecast:
MEU(α_rain | F = rain) = MEU(take | F = rain) = 53
MEU(α_sun | F = sun) = MEU(leave | F = sun) = 95
The value of additional evidence / information from F is:
VPI(E) = ( Σ_k P(E = e_k) * MEU(α_k | E = e_k) ) − MEU(α)
VPI(F) = (P(F = rain) * MEU(take | F = rain) + P(F = sun) * MEU(leave | F = sun)) − MEU(leave) =
(0.41 * 53 + 0.59 * 95) − 70 = 7.78
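A two-line check of this calculation (numbers copied from the slide):

```python
# Value of perfect information for the Forecast.
P_F = {"rain": 0.41, "sun": 0.59}
meu_without_evidence = 70                       # MEU(alpha) = MEU(leave)
meu_given_forecast = {"rain": 53, "sun": 95}    # MEU of the best action for each forecast

vpi = sum(P_F[f] * meu_given_forecast[f] for f in P_F) - meu_without_evidence
print(round(vpi, 2))                            # 7.78
```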
59
Decision Networks: Example
Decision: leave umbrella (no additional evidence) | Decision: take umbrella given F = rain
[The two networks side by side: without evidence the best action is leave; after observing F = rain the best action is take]
60
Utility & Value of Perfect Information
[Figure: distributions of the utilities of actions a1 and a2 in three cases, illustrating when additional information is worth more or less]
61
VPI Properties
Given a decision network with possible observations Ej (sources of new information / evidence):
VPI is non-negative: VPI_e(Ej) ≥ 0 for every Ej and evidence e
VPI is not additive (in general): VPI_e(Ej, Ek) ≠ VPI_e(Ej) + VPI_e(Ek)
VPI is order-independent: VPI_e(Ej, Ek) = VPI_e(Ej) + VPI_{e,ej}(Ek) = VPI_e(Ek) + VPI_{e,ek}(Ej)
62
Information Gathering Agent
63