Reinforcement Learning: Russell and Norvig: CH 21
Nifty applets:
for blackjack
for robot motion
for a pendulum controller
Formalization
Given:
a state space S
a set of actions a1, ..., ak
reward value at the end of each trial (may be positive or negative)
Output:
a mapping from states to actions
Example: ALVINN (driving agent)
state: configuration of the car
learn a steering action for each state
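As a concrete illustration (the state and action names below are placeholders, not from the slides), the learned output can be represented as a simple table from states to actions:

# Hypothetical illustration of the input/output of the learning problem
states = ["s1", "s2", "s3"]                      # state space S (placeholder names)
actions = ["a1", "a2"]                           # actions a1, ..., ak
policy = {"s1": "a1", "s2": "a2", "s3": "a1"}    # output: a mapping state -> action
print(policy["s2"])                              # the action to take in state s2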
Reactive Agent Algorithm
(the state is assumed accessible or observable)
Repeat:
s ← sensed state
If s is terminal then exit
a ← choose action (given s)
Perform a
Policy (Reactive/Closed-Loop Strategy)
[Figure: 4x3 grid world with terminal rewards +1 and -1]
Repeat:
s ← sensed state
If s is terminal then exit
a ← Π(s)
Perform a
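A minimal Python sketch of this reactive loop, assuming hypothetical sense_state(), is_terminal(), and perform() interfaces to the environment:

def run_policy(policy, sense_state, is_terminal, perform):
    # Closed-loop execution of a fixed policy: sense, act, repeat.
    while True:
        s = sense_state()       # s <- sensed state
        if is_terminal(s):      # if s is terminal then exit
            return
        a = policy[s]           # a <- Pi(s)
        perform(a)              # perform a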
Approaches
Learn a policy directly: a function mapping from states to actions
Learn utility values for states (i.e., the value function)
Value Function
The agent knows what state it is in
The agent has a number of actions it can perform in
each state.
Initially, it doesn't know the value of any of the states
If the outcome of performing an action at a state is
deterministic, then the agent can update the utility
value U() of states:
U(oldstate) = reward + U(newstate)
The agent learns the utility values of states as it
works its way through the state space
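A minimal sketch of this deterministic update, assuming an episode is given as a list of (state, reward) pairs in the order visited; sweeping the episode backward is one way to apply the rule, and the episode itself is made up for illustration:

def update_utilities(episode, U):
    # Walk the episode backward so each state uses the already-updated
    # utility of its successor: U(oldstate) = reward + U(newstate)
    next_value = 0.0
    for state, reward in reversed(episode):
        U[state] = reward + next_value
        next_value = U[state]
    return U

U = {}
episode = [("s0", 0.0), ("s1", 0.0), ("s2", 1.0)]   # +1 reward at the end of the trial
print(update_utilities(episode, U))                  # every state on the path gets value 1.0

Note that without a discount factor every state on the path ends up with the same value; the discount factor discussed next makes states closer to the reward worth more.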
Exploration
The agent may occasionally choose to explore
suboptimal moves in the hopes of finding better
outcomes
Only by visiting all the states frequently enough can we
guarantee learning the true values of all the states
A discount factor is often introduced to prevent utility
values from diverging and to promote the use of
shorter (more efficient) sequences of actions to
attain rewards
The update equation using a discount factor is:
U(oldstate) = reward + γ * U(newstate)
Normally, γ is set between 0 and 1
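A quick numeric check of both points (γ = 0.9 is an arbitrary choice): with γ < 1 the discounted sum of even an endless reward stream stays bounded at r / (1 - γ), and a reward reached in fewer steps is discounted less, so shorter action sequences are preferred.

gamma, r = 0.9, 1.0
bounded = sum(gamma**t * r for t in range(1000))
print(bounded, r / (1 - gamma))      # both approximately 10.0: no divergence
print(gamma**1 * r, gamma**3 * r)    # 0.9 vs 0.729: the closer reward is worth more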
Q-Learning
Q-learning augments value iteration by
maintaining an estimated utility value
Q(s,a) for every action at every state
The utility of a state U(s), or Q(s), is
simply the maximum Q value over all
the possible actions at that state
Learns utilities of actions (not states)
model-free learning
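In code, the relation U(s) = max over a of Q(s,a) is a one-line lookup over the Q table; representing the table as a dict keyed by (state, action) is an assumption reused in the sketch below.

def utility(Q, s, actions):
    # U(s) = max over all actions a of Q(s, a); Q is a table keyed by (state, action)
    return max(Q[(s, a)] for a in actions)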
Q-Learning
for each state s
  for each action a
    Q(s,a) = 0
s = current state
do forever
  a = select an action
  do action a
  r = reward from doing a
  t = resulting state from doing a
  Q(s,a) = (1 - α) Q(s,a) + α (r + γ Q(t))
  s = t
The learning coefficient, α, determines how quickly our estimates are updated
Normally, α is set to a small positive constant less than 1
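A runnable Python sketch of the loop above, assuming a hypothetical tabular environment with reset(), actions(s), and step(s, a) methods; the epsilon-greedy rule used for "select an action" and the parameter values are illustrative choices, not prescribed by the slides.

import random
from collections import defaultdict

def q_learning(env, episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1):
    # Q(s,a) = 0 for every state and action (defaultdict returns 0.0 for unseen keys)
    Q = defaultdict(float)
    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            # select an action: mostly exploit current estimates, sometimes explore
            if random.random() < epsilon:
                a = random.choice(env.actions(s))
            else:
                a = max(env.actions(s), key=lambda act: Q[(s, act)])
            r, t, done = env.step(s, a)        # do action a, observe reward and next state
            # Q(t) is the max Q value over the actions available in t (0 if terminal)
            q_t = 0.0 if done else max(Q[(t, a2)] for a2 in env.actions(t))
            # Q(s,a) = (1 - alpha) Q(s,a) + alpha (r + gamma Q(t))
            Q[(s, a)] = (1 - alpha) * Q[(s, a)] + alpha * (r + gamma * q_t)
            s = t
    return Q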
Selecting an Action
Simply choose the action with the highest (current) expected utility?
(always exploiting in this way can leave the agent stuck in a rut)
Problem: each action has two effects
yields a reward (or penalty) on the current sequence
information is received and used in learning for future sequences
Trade-off: immediate good for long-term well-being
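One simple way to make this trade-off concrete is the epsilon-greedy rule already used in the Q-learning sketch above; epsilon-greedy is an illustrative choice, the slides do not commit to a particular selection scheme.

import random

def select_action(Q, s, actions, epsilon=0.1):
    # With probability epsilon explore a random action so learning keeps
    # gathering information; otherwise exploit the highest current estimate.
    if random.random() < epsilon:
        return random.choice(actions)                  # long-term well-being
    return max(actions, key=lambda a: Q[(s, a)])       # immediate good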