CH5_Function Approximation (1)
m = 65/50 = 13/10 = 1.3
b = 5.5/5 = 1.1
y = mx + b = 1.3x + 1.1
Function Approximation
Types of Function Approximation in RL
Nonlinear Function Approximation
• Uses models such as neural networks to approximate value functions or policies.
• Deep RL methods rely on nonlinear approximators:
• Deep Q-Networks (DQN) – use a deep network for Q-value approximation.
• Actor-Critic methods – use neural networks for both the policy and the value function (a minimal sketch of such an approximator follows below).
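To make the idea concrete, here is a minimal sketch of a nonlinear state-value approximator, assuming PyTorch; the state dimension and layer widths are illustrative assumptions, not prescribed by the slides.

```python
import torch
import torch.nn as nn

# Minimal sketch: a small MLP that maps a state vector to a scalar value estimate V(s).
# state_dim and the hidden sizes are illustrative assumptions.
state_dim = 4

value_net = nn.Sequential(
    nn.Linear(state_dim, 64),
    nn.ReLU(),
    nn.Linear(64, 64),
    nn.ReLU(),
    nn.Linear(64, 1),                  # scalar value estimate V(s)
)

state = torch.randn(1, state_dim)      # dummy state for illustration
print(value_net(state).item())         # approximate V(s)
```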
Function Approximation - Challenges
Overfitting
• During training, an agent can effectively memorize specific states, which hinders its ability to generalize to new scenarios. This can be mitigated through careful design of the approximation model (for example, keeping it simple and regularizing it).
Exploration-Exploitation Tradeoff
• To find the best policies, agents must balance exploiting what they already know with exploring unfamiliar states and actions. Striking this balance requires carefully designed reward structures and exploration strategies, such as the ε-greedy rule sketched below.
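A minimal sketch of ε-greedy action selection, a common exploration strategy; the function name and the use of a plain list of Q-values are illustrative assumptions, not part of the slides.

```python
import random

def epsilon_greedy(q_values, epsilon=0.1):
    """Pick a random action with probability epsilon (explore),
    otherwise pick the action with the highest estimated Q-value (exploit)."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))                        # explore
    return max(range(len(q_values)), key=lambda a: q_values[a])       # exploit

# Example: Q-estimates for 3 actions in some state
print(epsilon_greedy([0.2, 0.8, 0.5], epsilon=0.1))
```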
Tile Coding
• Tile coding is used to shrink the feature vector, thereby improving computational efficiency.
• We cover more states with fewer features.
• With one-hot encoding, the indicated point (the middle cell of a 3×3 grid) is represented as (0,0,0,0,1,0,0,0,0).
• Instead, take four 2×2 boxes (tilings) and shift each one slightly.
• Now you cover 10 states with only 4 dimensions, or 4 inputs: the red box, green box, blue box, and purple box.
• The same middle point can now be represented as (1,1,1,1).
• This means you can generalize better. Before, gradient descent would only affect the parameters of the middle point; now, since a point is influenced by a combination of features, the parameters of all of those features are updated, which also allows for faster learning (a minimal coding sketch follows below).
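As a concrete illustration, here is a minimal tile-coding sketch in Python (numpy assumed). The grid size, offsets, and the wrap-around at the boundary are simplifying assumptions for illustration, not the exact scheme on the slide.

```python
import numpy as np

def tile_features(x, y, n_tilings=4, tiles_per_dim=2, tile_width=1.0):
    """Minimal tile-coding sketch for a 2-D point (x, y).

    Each tiling is a coarse grid of tiles_per_dim x tiles_per_dim tiles,
    offset by a fraction of the tile width. The point activates exactly
    one tile per tiling, so the binary feature vector has n_tilings ones.
    """
    features = []
    for t in range(n_tilings):
        offset = t * tile_width / n_tilings              # shift each tiling slightly
        one_hot = np.zeros(tiles_per_dim * tiles_per_dim)
        col = int((x + offset) // tile_width) % tiles_per_dim
        row = int((y + offset) // tile_width) % tiles_per_dim
        one_hot[row * tiles_per_dim + col] = 1.0          # the single active tile
        features.append(one_hot)
    return np.concatenate(features)

# Example: a point activates one tile in each of the four tilings.
print(tile_features(0.9, 0.9))
```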
What is Deep Q-Learning?
• The state is given as the input, and the Q-values of all possible actions are generated as the output. The key difference from tabular Q-learning is that the Q-table is replaced by a neural network mapping each state to the Q-values of every action (a minimal network sketch follows below).
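A minimal sketch of such a Q-network, assuming PyTorch; the state dimension, number of actions, and layer widths are illustrative assumptions.

```python
import torch
import torch.nn as nn

state_dim, n_actions = 4, 2                # illustrative assumptions

# The network takes a state vector and outputs one Q-value per action.
q_net = nn.Sequential(
    nn.Linear(state_dim, 64),
    nn.ReLU(),
    nn.Linear(64, n_actions),
)

state = torch.randn(1, state_dim)          # dummy state
q_values = q_net(state)                    # Q(s, a) for every action a
greedy_action = q_values.argmax(dim=1)     # action with the highest Q-value
print(q_values, greedy_action)
```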
Deep Q-Learning
• Observe that in the update equation target = R(s,a,s′) + γ max_{a′} Q_k(s′,a′), the term γ max_{a′} Q_k(s′,a′) depends on the network's current estimates, so it changes as the parameters change.
• Therefore the target for the neural network is non-stationary, unlike typical deep learning problems, where the target is fixed.
• This problem is overcome by using two neural networks instead of one. One network (the primary, or online, network) has its parameters adjusted by training; the other (the target network) is used for computing the target. It has the same architecture as the first network, but its parameters are frozen.
• After every x iterations of training the primary network, its parameters are copied to the target network (a minimal sketch of this setup follows below).
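A minimal sketch of the two-network setup, assuming PyTorch; the sync interval, network sizes, and variable names are illustrative assumptions.

```python
import copy
import torch
import torch.nn as nn

state_dim, n_actions, gamma = 4, 2, 0.99   # illustrative assumptions

q_net = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
target_net = copy.deepcopy(q_net)          # same architecture, frozen copy
for p in target_net.parameters():
    p.requires_grad = False

def td_target(reward, next_state, done):
    """Compute target = r + gamma * max_a' Q_target(s', a') with the frozen network."""
    with torch.no_grad():
        next_q = target_net(next_state).max(dim=1).values
    return reward + gamma * next_q * (1.0 - done)

def sync_target(step, x=1000):
    """Every x training steps, copy the primary network's parameters into the target network."""
    if step % x == 0:
        target_net.load_state_dict(q_net.state_dict())

# Example usage with dummy tensors:
r, s2, d = torch.tensor([1.0]), torch.randn(1, state_dim), torch.tensor([0.0])
print(td_target(r, s2, d))
```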
Challenges in Deep RL as Compared to Deep Learning
• So far, this all looks great: we have seen how neural networks can help the agent learn the best actions. However, there is a challenge when we compare deep RL to deep learning (DL): in supervised DL the targets are fixed labels, whereas in deep RL the target depends on the very network being trained, and the training data comes from the agent's own correlated interactions.
Challenges in Deep RL as Compared to Deep Learning
• The concepts we have learned so far combine to form the deep Q-learning algorithm, which was used to achieve human-level performance in Atari games (using just the video frames of the game).
Summary - Deep Q-Learning
• Deep Q-Learning is a type of reinforcement learning algorithm
that uses a deep neural network to approximate the Q-
function, which is used to determine the optimal action to take
in a given state. The Q-function represents the expected
cumulative reward of taking a certain action in a certain state
and following a certain policy. In Q-Learning, the Q-function is
updated iteratively as the agent interacts with the environment.
Deep Q-Learning is used in various applications such as game
playing, robotics and autonomous vehicles.
• Deep Q-Learning is a variant of Q-Learning that uses a deep neural network to represent the Q-function, rather than a simple table of values. This allows the algorithm to handle environments with a large number of states and actions, as well as to learn from high-dimensional inputs such as images.
Summary - Deep Q-Learning
• Experience replay is a technique where the agent stores a
subset of its experiences (state, action, reward, next state) in a
memory buffer and samples from this buffer to update the Q-
function. This helps to decorrelate the data and make the
learning process more stable.
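A minimal sketch of such a replay buffer in Python; the buffer capacity and batch size are illustrative assumptions.

```python
import random
from collections import deque

class ReplayBuffer:
    """Stores (state, action, reward, next_state, done) tuples and samples
    random mini-batches, which decorrelates the training data."""

    def __init__(self, capacity=10_000):
        self.buffer = deque(maxlen=capacity)   # oldest experiences are dropped automatically

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=32):
        batch = random.sample(self.buffer, batch_size)
        # Regroup into tuples of states, actions, rewards, next_states, dones.
        return tuple(zip(*batch))

    def __len__(self):
        return len(self.buffer)

# Example usage with dummy transitions:
buf = ReplayBuffer()
for i in range(100):
    buf.push(state=i, action=0, reward=1.0, next_state=i + 1, done=False)
states, actions, rewards, next_states, dones = buf.sample(batch_size=4)
print(states, rewards)
```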
At each step:
1. The Actor picks an action a based on the current policy π(s).
2. The environment returns reward r and next state s′.
3. The Critic estimates the advantage, or TD error: δ = r + γV(s′) − V(s).
4. The Critic updates its value function V(s).
5. The Actor updates its policy to make actions with positive advantage δ more likely.
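A minimal one-step actor-critic update sketch, assuming PyTorch and a discrete action space; the network sizes, learning rates, and variable names are illustrative assumptions.

```python
import torch
import torch.nn as nn

state_dim, n_actions, gamma = 4, 2, 0.99     # illustrative assumptions

actor = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
critic = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, 1))
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-3)
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)

def pick_action(s):
    """Step 1: the Actor samples an action a from the current policy pi(.|s)."""
    probs = torch.softmax(actor(s), dim=1)
    return torch.multinomial(probs, 1)

def update(s, a, r, s_next, done):
    """Steps 3-5 for one transition (the environment supplies r and s_next in step 2)."""
    # Step 3: Critic estimates the TD error delta = r + gamma*V(s') - V(s).
    v_s = critic(s)
    with torch.no_grad():
        v_next = critic(s_next) * (1.0 - done)
    delta = r + gamma * v_next - v_s

    # Step 4: Critic update -- minimize the squared TD error.
    critic_opt.zero_grad()
    (delta ** 2).mean().backward()
    critic_opt.step()

    # Step 5: Actor update -- raise the log-probability of actions with positive delta.
    log_prob = torch.log(torch.softmax(actor(s), dim=1).gather(1, a))
    actor_opt.zero_grad()
    (-log_prob * delta.detach()).mean().backward()
    actor_opt.step()

# Example usage with dummy tensors standing in for one environment transition:
s, s_next = torch.randn(1, state_dim), torch.randn(1, state_dim)
a = pick_action(s)
update(s, a, torch.tensor([[1.0]]), s_next, torch.tensor([[0.0]]))
```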