Assignment 7 (Sol.) : Reinforcement Learning
Reinforcement Learning
Prof. B. Ravindran
1. Consider example 7.1 and figure 7.2 in the textbook. Which among the following steps (keeping all other factors unchanged) will result in a decrease in the RMS errors shown in the graphs?
Sol. (b)
Note that the graphs are generated by averaging over the first 10 episodes. If we increase the
number of episodes considered, the error shown in the graphs would reduce as evaluation of
the policy improves.
2. Considering episodic tasks and for λ ∈ (0, 1), is it true that the one-step return always gets assigned the maximum weight in the λ-return?
(a) no
(b) yes
Sol. (a)
This is not necessarily true and depends on the length of the episode (as well as the value of λ). For example, for the return computed from the start of an episode of length 3 with λ = 0.7, the one-step return receives weight (1 − λ) = 0.3, while the final (Monte Carlo) return receives weight λ^2 = 0.49, which is larger.
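As a quick numerical check, here is a minimal Python sketch of the λ-return weights (the weights (1 − λ)λ^(n−1) for the intermediate n-step returns and λ^(T−t−1) for the final return follow the textbook's λ-return definition; the episode length and λ are the values from the example above):

# Weights the lambda-return assigns to each n-step return when T - t steps remain:
# (1 - lam) * lam**(n - 1) for n < T - t, and lam**(T - t - 1) for the final return.
lam = 0.7
remaining = 3  # episode of length 3, evaluated from its first state

weights = {n: (1 - lam) * lam ** (n - 1) for n in range(1, remaining)}
weights[remaining] = lam ** (remaining - 1)  # the terminal (Monte Carlo) return

print(weights)                 # {1: 0.3, 2: 0.21, 3: 0.49} (up to float error)
print(sum(weights.values()))   # ~1.0 -- the weights sum to one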
3. In the TD(λ) algorithm, if λ = 1 and γ = 1, then which among the following are true?
(a) the method behaves like a Monte Carlo method for an undiscounted task
(b) the eligibility traces do not decay
(c) the values of all states are updated by the TD error in each episode
(d) this method is not suitable for continuing tasks
4. Assume you have an MDP with |S| states. You decide to use an n-step truncated corrected return for the evaluation problem on this MDP. Do you think that there is any utility in considering values of n which exceed |S| for this problem?
(a) no
(b) yes
Sol. (b)
Note that the number of steps in an n-step truncated corrected return is related to the length of a trajectory, which can exceed the number of states in the state space of the problem.
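For reference, the n-step truncated corrected return backs up the first n rewards and then corrects with the current value estimate (V_t denotes the value estimate at time t):

R_t^(n) = r_{t+1} + γ r_{t+2} + ... + γ^(n−1) r_{t+n} + γ^n V_t(s_{t+n})

Since n counts reward steps along a trajectory, which may revisit states, it is not bounded by |S|.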
5. Which among the following are reasons to support your answer in the previous question?
(a) only values of n ≤ |S| should be considered as the number of states is only |S|
(b) all implementations with n > |S| will result in the same evaluation at each stage of the
iterative process
(c) the length of each episode may exceed |S|, and hence values of n > |S| should be considered
(d) regardless of the number of states, different values of n will always lead to different
evaluations (at each step of the iterative process) and hence cannot be disregarded
Sol. (c)
6. Consider the textbook's figure 5.1, describing the first-visit MC prediction algorithm, and figure 7.7, describing the TD(λ) algorithm. Will these two algorithms behave identically for λ = 1? If so, what kind of eligibility trace will result in equivalence?
(a) no
(b) yes, accumulating traces
(c) yes, replacing traces
(d) yes, dutch traces, with α = 0.5
Sol. (a)
The two algorithms are not identical since figure 7.7 describes the online version of the TD(λ) algorithm, whereas the MC algorithm of figure 5.1 makes its updates only at the end of each episode, not after each individual reward is observed.
7. Given the following sequence of states observed from the beginning of an episode,
s_2, s_1, s_3, s_2, s_1, s_2, s_1, s_6
what is the eligibility value, e_7(s_1), of state s_1 at time step 7, given trace decay parameter λ, discount rate γ, and initial value e_0(s_1) = 0, when accumulating traces are used?
(a) γ^7 λ^7
(b) (γλ)^7 + (γλ)^6 + (γλ)^3 + γλ
(c) γλ(1 + γ^2 λ^2 + γ^5 λ^5)
(d) γ^7 λ^7 + γ^3 λ^3 + γλ
Sol. (c)
According to the non-recursive expression for the accumulating eligibility trace, we have

e_t(s) = Σ_{k=0}^{t} (γλ)^{t−k} I_{s s_k},

where I_{s s_k} is an indicator equal to 1 if s_k = s and 0 otherwise. State s_1 is visited at time steps 1, 4, and 6, so

e_7(s_1) = (γλ)^6 + (γλ)^3 + γλ = γλ(1 + γ^2 λ^2 + γ^5 λ^5).
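As a sanity check, a minimal Python sketch of this computation, assuming the states are observed at time steps 0 through 7 and using arbitrary test values for γ and λ:

# Accumulating eligibility trace of s1 over the observed state sequence.
# Recursion: e_t(s) = gamma*lam*e_{t-1}(s) + 1 if s is visited at time t,
#            e_t(s) = gamma*lam*e_{t-1}(s)     otherwise.
gamma, lam = 0.9, 0.5                                        # arbitrary test values
episode = ["s2", "s1", "s3", "s2", "s1", "s2", "s1", "s6"]   # time steps 0..7

e = 0.0                                                      # initial trace of s1
for state in episode:
    e = gamma * lam * e + (1.0 if state == "s1" else 0.0)

closed_form = (gamma * lam) ** 6 + (gamma * lam) ** 3 + gamma * lam
print(e, closed_form)   # both ~0.5494, matching option (c)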
8. For the above question, what is the eligibility value if replacing traces are used?
(a) γ^7 λ^7
(b) γλ
(c) γλ + 1
(d) 3γλ
Sol. (b)
We know that when using replacing traces, the eligibility trace of a state is set to 1 if that state is visited and decayed by a factor of γλ otherwise. Thus, the latest occurrence of state s_1, just before state s_6, causes e_6(s_1) to be set to 1, and after s_6 is observed this decays to e_7(s_1) = γλ.
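The same check with replacing traces (same assumed indexing and test values as in the sketch above):

# Replacing eligibility trace of s1: reset to 1 on a visit, decay by gamma*lam otherwise.
gamma, lam = 0.9, 0.5
episode = ["s2", "s1", "s3", "s2", "s1", "s2", "s1", "s6"]

e = 0.0
for state in episode:
    e = 1.0 if state == "s1" else gamma * lam * e

print(e, gamma * lam)   # both 0.45, i.e. e_7(s1) = γλ, option (b)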
9. In solving the control problem, suppose that at the start of an episode the first action taken is not an optimal action according to the current policy. Would an update be made corresponding to this action and the subsequent reward received in Watkins's Q(λ) algorithm?
(a) no
(b) yes
Sol. (b)
This is immediately clear from Watkins's Q(λ) algorithm described in the text: at every step, including the first, the trace of the state-action pair just executed is incremented and the TD error (computed from the reward received and the greedy action value at the next state) is applied to all traced pairs; only after this update may the traces be zeroed, depending on whether the next selected action is greedy.
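A minimal sketch of one step of tabular Watkins's Q(λ) illustrating this ordering (not the full control loop; the array shapes, α, γ, λ and the example indices below are hypothetical placeholders):

import numpy as np

def watkins_q_lambda_step(Q, e, s, a, r, s_next, a_next, alpha, gamma, lam):
    """One step of tabular Watkins's Q(lambda): the update for (s, a) and the
    reward r is always applied; traces are zeroed afterwards only when the
    action actually selected at s_next is non-greedy."""
    a_star = np.argmax(Q[s_next])                  # greedy action at the next state
    delta = r + gamma * Q[s_next, a_star] - Q[s, a]
    e[s, a] += 1.0                                 # accumulating trace
    Q += alpha * delta * e                         # update every traced pair
    if Q[s_next, a_next] == Q[s_next, a_star]:     # a_next ties with the greedy action
        e *= gamma * lam                           # keep and decay the traces
    else:
        e[:] = 0.0                                 # exploratory action: cut the traces
    return Q, e

# Even if action a was exploratory, Q[s, a] is still updated in this step.
Q, e = np.zeros((4, 2)), np.zeros((4, 2))
Q, e = watkins_q_lambda_step(Q, e, s=0, a=1, r=1.0, s_next=2, a_next=0,
                             alpha=0.1, gamma=0.9, lam=0.8)
print(Q[0, 1])   # 0.1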
10. Suppose that in a particular problem, the agent keeps going back to the same state in a loop.
What is the maximum value that can be taken by the eligibility trace of such a state if we
consider accumulating traces with λ = 0.25 and γ = 0.8?
(a) 1.25
(b) 5.0
(c) ∞
(d) insufficient data
Sol. (a)
For accumulating traces, the maximum increase in eligibility occurs when the state is visited at every time step: e_t(s) = γλ e_{t−1}(s) + 1. At the maximum, e_t(s) = e_{t−1}(s), giving e_t(s) = 1/(1 − γλ) = 1/(1 − 0.8 × 0.25) = 1/0.8 = 1.25.
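A quick sketch confirming the limit, assuming the state is visited at every time step:

# The accumulating trace of a state visited at every step converges to 1/(1 - gamma*lam).
gamma, lam = 0.8, 0.25
e = 0.0
for _ in range(200):             # iterate the recursion e <- gamma*lam*e + 1
    e = gamma * lam * e + 1.0

print(e, 1.0 / (1.0 - gamma * lam))   # both 1.25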