0% found this document useful (0 votes)

61 views

Decision Making

A rational agent should choose the action that maximizes agent's expected utility. Great progress has been made in this area recently. A truly rational agent takes into account the utility of reasoning as well.

Uploaded by

Tariq Bezinjo

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PPT, PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

61 views

Decision Making

Uploaded by

Tariq Bezinjo

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PPT, PDF, TXT or read online on Scribd

You are on page 1/ 63

Decision Making Under Uncertainty

Russell and Norvig: ch 16, 17 CMSC421 Fall 2005

Utility-Based Agent
sensors

?
agent actuators

environment

Non-deterministic vs. Probabilistic Uncertainty

? ?

a
{a,b,c}

{a(pa),b(pb),c(pc)} decision that maximizes expected utility value Probabilistic model

decision that is best for worst case Non-deterministic model ~ Adversarial search

Expected Utility
Random variable X with n values x1,,xn and distribution (p1,,pn) E.g.: X is the state reached after doing an action A under uncertainty Function U of X E.g., U is the utility of a state The expected utility of A is EU[A] = Si=1,,n p(xi|A)U(xi)

One State/One Action Example

s0
EU(A1) = 100 x 0.2 + 50 x 0.7 + 70 x 0.1 = 20 + 35 + 7 = 62

s1
0.2 100

s2
0.7 50

s3
0.1 70

One State/Two Actions Example

s0
EU(AI) = 62 EU(A2) = 74 EU(S0) = max{EU(A1),EU(A2)} = 74 A2

s1
0.2 100

s2
0.7 0.2 50

s3
0.1 70

s4
0.8 80

Introducing Action Costs

s0
EU(A1) = 62 5 = 57 EU(A2) = 74 25 = 49 EU(S0) = max{EU(A1),EU(A2)} = 57 A2

-5

-25

s1
0.2 100

s2
0.7 0.2 50

s3
0.1 70

s4
0.8 80

MEU Principle
rational agent should choose the action that maximizes agents expected utility this is the basis of the field of decision theory normative criterion for rational choice of action

Not quite
Must have complete model of:

Actions Utilities States

Even if you have a complete model, will be computationally intractable In fact, a truly rational agent takes into account the utility of reasoning as well---bounded rationality Nevertheless, great progress has been made in this area recently, and we are able to solve much more complex decision theoretic problems than ever before

Well look at
Decision Theoretic Planning

Simple decision making (ch. 16) Sequential decision making (ch. 17)

Decision Networks
Extend BNs to handle actions and utilities Also called Influence diagrams Make use of BN inference Can do Value of Information calculations

Decision Networks cont.

Chance nodes: random variables, as in BNs Decision nodes: actions that decision maker can take Utility/value nodes: the utility of the outcome state.

R&N example

Umbrella Network
take/dont take P(rain) = 0.4

Take Umbrella umbrella

P(umb|take) = 1.0 P(~umb|~take)=1.0

rain

happiness U(~umb, ~rain) = 100 U(~umb, rain) = -100

U(umb,~rain) = 0

U(umb,rain) = -25

Evaluating Decision Networks

Set the evidence variables for current state For each possible value of the decision node:

Set decision node to that value Calculate the posterior probability of the parent nodes of the utility node, using BN inference Calculate the resulting utility for action

return the action with the highest utility

Umbrella Network
take/dont take P(rain) = 0.4

Take Umbrella umbrella

P(umb|take) = 1.0 P(~umb|~take)=1.0

rain #1
umb rain 0 1 0 1 P(umb,rain | take)

happiness U(~umb, ~rain) = 100 U(~umb, rain) = -100

U(umb,~rain) = 0

0 0 1 1

U(umb,rain) = -25

#2: EU(take)

Umbrella Network
take/dont take P(rain) = 0.4

Take Umbrella umbrella

P(umb|take) = 1.0 P(~umb|~take)=1.0

rain #1
umb rain 0 1 0 1 P(umb,rain | ~take)

happiness U(~umb, ~rain) = 100 U(~umb, rain) = -100

U(umb,~rain) = 0

0 0 1 1

U(umb,rain) = -25

#2: EU(~take)

Value of Information (VOI)

suppose agents current knowledge is E. The value of the current best action is
EU( | E) max U(Re sult i ( A))P(Re sult i ( A ) | E, Do ( A ))
A i

the value of the new best action (after new evidence E is obtained):
EU( | E, E) max U(Re sult i ( A))P(Re sult i ( A) | E, E, Do ( A ))
A i

the value of information for E is:

VOI (E)

P (e
k

| E)EU( ek | e k , E) EU( | E)

Umbrella Network
take/dont take P(rain) = 0.4

Take Umbrella umbrella

P(umb|take) = 1.0 P(~umb|~take)=1.0

rain

happiness U(~umb, ~rain) = 100 U(~umb, rain) = -100

U(umb,~rain) = 0

forecast
R 0 F 0 P(F|R) 0.8

0
1 1

1
0 1

0.2
0.3 0.7

U(umb,rain) = -25

VOI
VOI(forecast)= P(rainy)EU(rainy) + P(~rainy)EU(~rainy) EU()

P(F=rainy) = 0.4

Umbrella Network
take/dont take

F 0 0 1 1

R 0 1 0 1

P(R|F) 0.8 0.2 0.3 0.7

P(rain) = 0.4

Take Umbrella umbrella

P(umb|take) = 1.0 P(~umb|~take)=1.0

rain

happiness U(~umb, ~rain) = 100 U(~umb, rain) = -100

U(umb,~rain) = 0

forecast
R 0 F 0 P(F|R) 0.8

0
1 1

1
0 1

0.2
0.3 0.7

U(umb,rain) = -25

umb 0 0 1 1

rain 0 1 0 1

P(umb,rain | take, rainy)

umb
0 0 1 1

rain
0 1 0 1

P(umb,rain | take, ~rainy)

#1: EU(take|rainy)

#3: EU(take|~rainy)

umb 0 0 1 1

rain 0 1 0 1

P(umb,rain | ~take, rainy)

umb 0 0 1 1

rain 0 1 0 1

P(umb,rain |~take, ~rainy)

#2: EU(~take|rainy)

#4: EU(~take|~rainy)

VOI
VOI(forecast)= P(rainy)EU(rainy) + P(~rainy)EU(~rainy) EU()

Sequential Decision Making

Finite Horizon Infinite Horizon

Simple Robot Navigation Problem

In each state, the possible actions are U, D, R, and L

Probabilistic Transition Model

In each state, the possible actions are U, D, R, and L The effect of U is as follows (transition model): With probability 0.8 the robot moves up one square (if the robot is already in the top row, then it does not move) With probability 0.1 the robot moves right one square (if the robot is already in the rightmost row, then it does not move)

Probabilistic Transition Model

Markov Property
The transition properties depend only on the current state, not on previous history (how that state was reached)

Sequence of Actions
3 2 1 [3,2]

2 3 4 Planned sequence of actions: (U, R)

Sequence of Actions
3 2 1 [3,2] [3,2] [3,3] [4,2]

2 3 4 Planned sequence of actions: (U, R) U is executed

Histories
3 2 1 [3,2] [3,2] [3,3] [4,2] [3,1] [3,2] [3,3] [4,1] [4,2] [4,3]

2 3 4 Planned sequence of actions: (U, R) U has been executed R is executed

There are 9 possible sequences of states called histories and 6 possible final states for the robot!

Probability of Reaching the Goal

Note importance of Markov property 2 in this derivation
1 3

P([4,3] | (U,R).[3,2]) = P([4,3] | R.[3,3]) x P([3,3] | U.[3,2]) + P([4,3] | R.[4,2]) x P([4,2] | U.[3,2]) P([4,3] | R.[3,3]) = 0.8 P([3,3] | U.[3,2]) = 0.8 P([4,3] | R.[4,2]) = 0.1 P([4,2] | U.[3,2]) = 0.1 P([4,3] | (U,R).[3,2]) = 0.65

Utility Function
3 2 1 +1 -1

[4,3] provides power supply [4,2] is a sand area from which the robot cannot escape

Utility Function
3 2 1 +1 -1

[4,3] provides power supply [4,2] is a sand area from which the robot cannot escape The robot needs to recharge its batteries

Utility Function
3 2 1 +1 -1

[4,3] provides power supply [4,2] is a sand area from which the robot cannot escape The robot needs to recharge its batteries [4,3] or [4,2] are terminal states

Utility of a History
3 2 1 +1 -1

[4,3] provides power supply [4,2] is a sand area from which the robot cannot escape The robot needs to recharge its batteries [4,3] or [4,2] are terminal states The utility of a history is defined by the utility of the last state (+1 or 1) minus n/25, where n is the number of moves

Utility of an Action Sequence

3 2 1 +1 -1

Consider the action sequence (U,R) from [3,2]

Utility of an Action Sequence

3 2 1 +1 -1 [3,2] [3,2] [3,3] [4,2] [3,1] [3,2] [3,3] [4,1] [4,2] [4,3]

Consider the action sequence (U,R) from [3,2] A run produces one among 7 possible histories, each with some probability

Utility of an Action Sequence

3 2 1 +1 -1 [3,2] [3,2] [3,3] [4,2] [3,1] [3,2] [3,3] [4,1] [4,2] [4,3]

Consider the action sequence (U,R) from [3,2] A run produces one among 7 possible histories, each with some probability The utility of the sequence is the expected utility of the histories:

U = ShUh P(h)

Optimal Action Sequence

3 2 1 +1 -1 [3,2] [3,2] [3,3] [4,2] [3,1] [3,2] [3,3] [4,1] [4,2] [4,3]

Consider the action sequence (U,R) from [3,2] A run produces one among 7 possible histories, each with some probability The utility of the sequence is the expected utility of the histories The optimal sequence is the one with maximal utility

Optimal Action Sequence

3 2 1 +1 -1 [3,2] [3,2] [3,3] [4,2] [3,1] [3,2] [3,3] [4,1] [4,2] [4,3]

Consider the action sequence (U,R) from [3,2] A run produces onethe sequence is histories, each with some only if among 7 possible executed blindly! probability The utility of the sequence is the expected utility of the histories The optimal sequence is the one with maximal utility But is the optimal action sequence what we want to compute?

Reactive Agent Algorithm

Accessible or Repeat: observable state s sensed state If s is terminal then exit a choose action (given s) Perform a

Policy

(Reactive/Closed-Loop Strategy)
3 2 1 +1 -1

A policy P is a complete mapping from states to actions

Reactive Agent Algorithm

Repeat: s sensed state If s is terminal then exit a P(s) Perform a

Optimal Policy
3 2 1 +1 -1

A policy P is a completeNote that [3,2] is a dangerous mapping from states to actions state that the optimal policy The optimal policy P* is the one that always yields a tries to avoid history (ending at a terminal state) with maximal expected utility Makes sense because of Markov property

Optimal Policy
3 2 1 +1 -1

This problem states to A policy P is a complete mapping from is called aactions Markov Decision Problem (MDP) The optimal policy P* is the one that always yields a history with maximal expected utility

How to compute P*?

Additive Utility
History H = (s0,s1,,sn) The utility of H is additive iff: U(s ,s ,,s ) = R(0) + U(s ,,s ) = S R(i)
0 1 n 1 n

Reward

Additive Utility
History H = (s0,s1,,sn) The utility of H is additive iff: U(s ,s ,,s ) = R(0) + U(s ,,s )
0 1 n 1 n

S R(i)

Robot navigation example: R(n) = +1 if sn = [4,3] R(n) = -1 if sn = [4,2] R(i) = -1/25 if i = 0, , n-1

Principle of Max Expected Utility

History H = (s0,s1,,sn) Utility of H: U(s0,s1,,sn) = S

R(i)
+1 -1

First-step analysis

U(i) = R(i) + maxa SkP(k | a.i) U(k)

P*(i) = arg maxa SkP(k | a.i) U(k)

Value Iteration
U (i) = 0
0

Initialize the utility of each non-terminal state si to For t = 0, 1, 2, , do:

Ut+1(i) R(i) + maxa SkP(k | a.i) Ut(k)

+1 -1

3 2 1 1 2 3

Value Iteration

Note the importance of terminal state and Initialize the utility of each non-terminalstates si to connectivity of the U0(i) = 0 state-transition graph For t = 0, 1, 2, , do:

Ut+1(i) R(i) + maxa SkP(k | a.i) Ut(k)

Ut([3,1])
+1 -1
0.611 0.5 0 0 10

3 2 1

0.812 0.868 0.918 0.762 0.660

0.705 0.655 0.611 0.388

30 t

Policy Iteration
Pick a policy P at random

Policy Iteration
Pick a policy P at random Repeat:

Compute the utility of each state for P Ut+1(i) R(i) + SkP(k | P(i).i) Ut(k)

Policy Iteration
Pick a policy P at random Repeat:

Compute the utility of each state for P Ut+1(i) R(i) + SkP(k | P(i).i) Ut(k) Compute the policy P given these utilities
P(i) = arg maxa

S P(k | a.i) U(k)

Policy Iteration
Pick a policy P at random Repeat:

Compute the utility of each state for P Ut+1(i) R(i) + SkP(k | P(i).i) Ut(k) Compute the policy P given these utilities
P(i) = arg maxa
Or S P(k solve the set of linear equations: | a.i) U(k)
k

If P = P then return Pa sparse system) (often

U(i) = R(i) + SkP(k | P(i).i) U(k)

Example: Tracking a Target

The robot must keep the target in view The targets trajectory is not known in advance

robot

target

Example: Tracking a Target

Infinite Horizon
In many problems, e.g., the robot navigation example, histories are What if the robot lives forever? potentially unbounded and the same One trick: state can be reached many times
3 2 1 1 2 3 4 +1 -1

Use discounting to make infinite Horizon problem mathematically tractible

POMDP (Partially Observable Markov Decision Problem)

A sensing operation returns multiple states, with a probability distribution Choosing the action that maximizes the expected utility of this state distribution assuming state utilities computed as above is not good enough, and actually does not make sense (is not rational)

Example: Target Tracking

There is uncertainty in the robots and targets positions, and this uncertainty grows with further motion There is a risk that the target escape behind the corner requiring the robot to move appropriately But there is a positioning landmark nearby. Should the robot tries to reduce position uncertainty?

Summary
Decision making under uncertainty Utility function Optimal policy Maximal expected utility Value iteration Policy iteration

Cs 188 HW Solutions Artificial Intelligence
No ratings yet
Cs 188 HW Solutions Artificial Intelligence
7 pages
Year 4 Maths Test - Symmetry - Questions
100% (2)
Year 4 Maths Test - Symmetry - Questions
6 pages
A Concise Introduction To Models and Methods For Automated Planning
No ratings yet
A Concise Introduction To Models and Methods For Automated Planning
143 pages
GRE Magoosh Practice Questions
100% (1)
GRE Magoosh Practice Questions
15 pages
Decision Making Under Uncertainty
No ratings yet
Decision Making Under Uncertainty
63 pages
Decision Making Under Uncertainty
No ratings yet
Decision Making Under Uncertainty
63 pages
MDP PDF
No ratings yet
MDP PDF
37 pages
Lec 25
No ratings yet
Lec 25
20 pages
08 MDPs
No ratings yet
08 MDPs
110 pages
[24F-COSE361] 5. Markov Decision Process (1)
No ratings yet
[24F-COSE361] 5. Markov Decision Process (1)
40 pages
A17 Complexdecisions
No ratings yet
A17 Complexdecisions
28 pages
Markov Decision Processes: Stochastic, Sequential Environments
No ratings yet
Markov Decision Processes: Stochastic, Sequential Environments
20 pages
Lec 26
No ratings yet
Lec 26
21 pages
UNIT 4 (2)
No ratings yet
UNIT 4 (2)
6 pages
DOC-20250127-WA0003.
No ratings yet
DOC-20250127-WA0003.
16 pages
Markov Decision Process
No ratings yet
Markov Decision Process
29 pages
Lecture 4: Sequential Decision Making: Simon Parsons
No ratings yet
Lecture 4: Sequential Decision Making: Simon Parsons
94 pages
22 Reinforcement Learning
No ratings yet
22 Reinforcement Learning
18 pages
c26 Dtheory
No ratings yet
c26 Dtheory
19 pages
A16 Simple Decisions
No ratings yet
A16 Simple Decisions
16 pages
07. Chapter. 07 - Expectimax Search and Utilities
No ratings yet
07. Chapter. 07 - Expectimax Search and Utilities
47 pages
A16 Simple Decisions
No ratings yet
A16 Simple Decisions
16 pages
A16-Simple Decisions
No ratings yet
A16-Simple Decisions
16 pages
AI Notes
No ratings yet
AI Notes
37 pages
Chapter17 1
No ratings yet
Chapter17 1
40 pages
Lec 08
No ratings yet
Lec 08
59 pages
Lect28 4up
No ratings yet
Lect28 4up
11 pages
Machine Learning Unit4
No ratings yet
Machine Learning Unit4
21 pages
ML Unit 4
No ratings yet
ML Unit 4
9 pages
07 Expectimax
No ratings yet
07 Expectimax
46 pages
Nips00 Bs
No ratings yet
Nips00 Bs
7 pages
16 Decision Making
No ratings yet
16 Decision Making
14 pages
Markov Decision Process I
No ratings yet
Markov Decision Process I
111 pages
Reinforcement Learning Model Based Planning Dynamic Programming
No ratings yet
Reinforcement Learning Model Based Planning Dynamic Programming
17 pages
CS480 Lecture October26th
No ratings yet
CS480 Lecture October26th
63 pages
Dicision Networks
No ratings yet
Dicision Networks
4 pages
(Partially Observable) Markov Decision Processes: Frederike Petzschner & Lionel Rigoux
No ratings yet
(Partially Observable) Markov Decision Processes: Frederike Petzschner & Lionel Rigoux
19 pages
16 RL PDF
No ratings yet
16 RL PDF
87 pages
Expectimax Search and Utilities
No ratings yet
Expectimax Search and Utilities
44 pages
06 MDP
No ratings yet
06 MDP
89 pages
5 - MDP
No ratings yet
5 - MDP
42 pages
16 - Reinforcement Learning and Bandits.pptx
No ratings yet
16 - Reinforcement Learning and Bandits.pptx
41 pages
Crib Sheet
No ratings yet
Crib Sheet
2 pages
Sections
No ratings yet
Sections
76 pages
2_ U1_AIML_PART2
No ratings yet
2_ U1_AIML_PART2
86 pages
Reinforcement Learning: Csci 5512: Artificial Intelligence Ii
No ratings yet
Reinforcement Learning: Csci 5512: Artificial Intelligence Ii
30 pages
L12 Reinforcement Learning 2
No ratings yet
L12 Reinforcement Learning 2
26 pages
18 AI BasicRL
No ratings yet
18 AI BasicRL
96 pages
Algorithms To Solve An MDP
No ratings yet
Algorithms To Solve An MDP
24 pages
5.4-Reinforcement Learning-Part1-Introduction
No ratings yet
5.4-Reinforcement Learning-Part1-Introduction
15 pages
Reinforcement Learning
No ratings yet
Reinforcement Learning
50 pages
Add-On DRL CS06
No ratings yet
Add-On DRL CS06
23 pages
Sp14 Cs188 Lecture 9 - Mdps II
No ratings yet
Sp14 Cs188 Lecture 9 - Mdps II
48 pages
T02640020220124069T0264-14 Probabilistic Reasoning Over Time
No ratings yet
T02640020220124069T0264-14 Probabilistic Reasoning Over Time
32 pages
A Brief Introduction To Reinforcement Learning
No ratings yet
A Brief Introduction To Reinforcement Learning
4 pages
Junaid109,Assignment No1,BSAI(3B)
No ratings yet
Junaid109,Assignment No1,BSAI(3B)
7 pages
ML unit 4
No ratings yet
ML unit 4
17 pages
Reinforcement Learning
No ratings yet
Reinforcement Learning
48 pages
Uninformed Search Chapter 3 Hamid
No ratings yet
Uninformed Search Chapter 3 Hamid
86 pages
RL Frra
No ratings yet
RL Frra
10 pages
lecture-06
No ratings yet
lecture-06
98 pages
Maai 6
No ratings yet
Maai 6
143 pages
Student Solutions Manual to Accompany Economic Dynamics in Discrete Time, secondedition
From Everand
Student Solutions Manual to Accompany Economic Dynamics in Discrete Time, secondedition
Yue Jiang
4.5/5 (2)
Data Communication:: Major Components of DC
No ratings yet
Data Communication:: Major Components of DC
11 pages
Project Management Tools and Techniques
100% (1)
Project Management Tools and Techniques
32 pages
Roots: Latin Root Basic Meaning Example Words Example Words
No ratings yet
Roots: Latin Root Basic Meaning Example Words Example Words
6 pages
Personnel Policy: Mir Shai Mazar Baloch
No ratings yet
Personnel Policy: Mir Shai Mazar Baloch
27 pages
Training & Development
No ratings yet
Training & Development
9 pages
System Development Life Cycle (SDLC)
100% (1)
System Development Life Cycle (SDLC)
16 pages
Overview of Strategic Management
No ratings yet
Overview of Strategic Management
56 pages
Vec 2
No ratings yet
Vec 2
21 pages
Unit 1 Grade 8
No ratings yet
Unit 1 Grade 8
4 pages
Specimen paper 2 (Mathematics)
0% (1)
Specimen paper 2 (Mathematics)
12 pages
Recursion
No ratings yet
Recursion
9 pages
Question On CFT
No ratings yet
Question On CFT
2 pages
Jiao (2020 Berkeley) - Spectral Representation of Random Processes
No ratings yet
Jiao (2020 Berkeley) - Spectral Representation of Random Processes
5 pages
State Tables and State Diagrams
No ratings yet
State Tables and State Diagrams
3 pages
Bessel'S Function: R R N X X J
No ratings yet
Bessel'S Function: R R N X X J
4 pages
Pdfcoffee.com Worksheet on Indices PDF Free
No ratings yet
Pdfcoffee.com Worksheet on Indices PDF Free
9 pages
Boundary Value Problems: On Higher Order Differential Equations
No ratings yet
Boundary Value Problems: On Higher Order Differential Equations
3 pages
Business Math Lesson Guide
No ratings yet
Business Math Lesson Guide
9 pages
9ma0_01 Mock Paper a (Edm)
No ratings yet
9ma0_01 Mock Paper a (Edm)
11 pages
grade 11 Test 2
No ratings yet
grade 11 Test 2
2 pages
2003 Solutions Cayley Contest: Canadian Mathematics Competition
No ratings yet
2003 Solutions Cayley Contest: Canadian Mathematics Competition
9 pages
P.7 MTC WEEKEND TEST SET TWO
No ratings yet
P.7 MTC WEEKEND TEST SET TWO
12 pages
BS111
No ratings yet
BS111
2 pages
Lesson Plan 2-3 Solving Word Problems
No ratings yet
Lesson Plan 2-3 Solving Word Problems
21 pages
Data Types
No ratings yet
Data Types
8 pages
5.5 Binomial Identities
No ratings yet
5.5 Binomial Identities
26 pages
Ryan PDavies Workshop Paper
No ratings yet
Ryan PDavies Workshop Paper
6 pages
Chapter Two - 2.1 Geometry Notes
No ratings yet
Chapter Two - 2.1 Geometry Notes
8 pages
Kuppi
No ratings yet
Kuppi
6 pages
R. Young - Evolutionary Biologoy and Ideology. Then and Now
No ratings yet
R. Young - Evolutionary Biologoy and Ideology. Then and Now
20 pages
A1 C7 L7 Answer Key
100% (1)
A1 C7 L7 Answer Key
27 pages
Week002 LabEx
No ratings yet
Week002 LabEx
3 pages
Binomial Distribution
No ratings yet
Binomial Distribution
6 pages
Chapter Two-Computational Complexity
No ratings yet
Chapter Two-Computational Complexity
19 pages

Decision Making

Uploaded by

Decision Making

Uploaded by

Decision Making Under Uncertainty

Russell and Norvig: ch 16, 17 CMSC421 Fall 2005

Non-deterministic vs. Probabilistic Uncertainty

{a(pa),b(pb),c(pc)} decision that maximizes expected utility value Probabilistic model

One State/One Action Example

One State/Two Actions Example

Introducing Action Costs

Actions Utilities States

Decision Networks cont.

Take Umbrella umbrella

happiness U(~umb, ~rain) = 100 U(~umb, rain) = -100

Evaluating Decision Networks

return the action with the highest utility

Take Umbrella umbrella

happiness U(~umb, ~rain) = 100 U(~umb, rain) = -100

Take Umbrella umbrella

happiness U(~umb, ~rain) = 100 U(~umb, rain) = -100

Value of Information (VOI)

the value of information for E is:

Take Umbrella umbrella

happiness U(~umb, ~rain) = 100 U(~umb, rain) = -100

P(R|F) 0.8 0.2 0.3 0.7

Take Umbrella umbrella

happiness U(~umb, ~rain) = 100 U(~umb, rain) = -100

P(umb,rain | take, rainy)

P(umb,rain | take, ~rainy)

P(umb,rain | ~take, rainy)

P(umb,rain |~take, ~rainy)

Sequential Decision Making

Simple Robot Navigation Problem

In each state, the possible actions are U, D, R, and L

Probabilistic Transition Model

Probabilistic Transition Model

Probabilistic Transition Model

2 3 4 Planned sequence of actions: (U, R)

2 3 4 Planned sequence of actions: (U, R) U is executed

2 3 4 Planned sequence of actions: (U, R) U has been executed R is executed

Probability of Reaching the Goal

Utility of an Action Sequence

Consider the action sequence (U,R) from [3,2]

Utility of an Action Sequence

Utility of an Action Sequence

Optimal Action Sequence

Optimal Action Sequence

Reactive Agent Algorithm

A policy P is a complete mapping from states to actions

Reactive Agent Algorithm

How to compute P*?

Principle of Max Expected Utility

U(i) = R(i) + maxa SkP(k | a.i) U(k)

Initialize the utility of each non-terminal state si to For t = 0, 1, 2, , do:

Ut+1(i) R(i) + maxa SkP(k | a.i) Ut(k)

Ut+1(i) R(i) + maxa SkP(k | a.i) Ut(k)

0.812 0.868 0.918 0.762 0.660

0.705 0.655 0.611 0.388

S P(k | a.i) U(k)

If P = P then return Pa sparse system) (often

U(i) = R(i) + SkP(k | P(i).i) U(k)

Example: Tracking a Target

Example: Tracking a Target

Example: Tracking a Target

Use discounting to make infinite Horizon problem mathematically tractible

POMDP (Partially Observable Markov Decision Problem)

Example: Target Tracking

You might also like