
DDPG (Deep Deterministic Policy Gradient)

Deep Deterministic Policy Gradient (DDPG) is a reinforcement learning algorithm that uses deep neural networks to predict optimal actions and maximize rewards through a policy-gradient approach. It employs an actor-critic architecture with two networks: an actor that determines the best action and a critic that evaluates that action using the state-action value function Q. DDPG is applicable in real-world scenarios such as autonomous driving, financial trading, and healthcare.

Uploaded by

Muhammad Zahran
Copyright
© All Rights Reserved

DDPG

Osama Javid
What is Deep Deterministic Policy Gradient?
Deep: Refers to the use of deep neural networks.
These are powerful tools used in artificial intelligence
that can learn complex patterns and relationships from
data.
What is Deep Deterministic Policy Gradient?
Deterministic: The algorithm directly predicts the best action to take in a given situation, rather than guessing or exploring randomly. It aims to be precise and intentional in its decision-making.
What is Deep Deterministic Policy Gradient?
Policy Gradient: optimizing the policy directly with respect to its parameters θ.
e.g. self-driving a car (receiving positive rewards only when we avoid hitting another car)

Updating the parameters θ in a way that maximizes the reward
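The idea above can be sketched in a few lines: nudge θ in the direction that increases the reward. This is a toy one-parameter example (the reward function and learning rate are illustrative, not part of DDPG itself):

```python
# Minimal sketch of the policy-gradient idea: move the parameter theta
# in the direction that increases reward. The toy reward
# R(theta) = -(theta - 2)^2 is maximized at theta = 2.

def reward_gradient(theta):
    # Analytic gradient of the toy reward with respect to theta.
    return -2.0 * (theta - 2.0)

theta = 0.0          # initial parameter
learning_rate = 0.1
for _ in range(100):
    theta += learning_rate * reward_gradient(theta)  # gradient *ascent*

print(round(theta, 3))  # converges toward 2.0
```

In DDPG the same ascent step is taken on the actor's network weights, with the gradient supplied by the critic rather than a known reward function.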


What is Deep Deterministic Policy Gradient?
DDPG is an algorithm used in reinforcement learning where a deep neural network:
• is trained to predict the best action to take
• directly estimates the optimal policy (π)
• adjusts the parameters (θ) for maximum reward
ACTOR-CRITIC ARCHITECTURE WITH TWO NETWORKS
• Takes the state-action value function (Q) and combines it with the policy gradient
• The actor determines the best action in a state by tuning the parameters θ
• The critic evaluates the action produced by the actor, using the TD error
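A minimal sketch of the two networks, with tiny single-layer stand-ins for the deep networks (the state/action sizes and random weights are illustrative assumptions, not part of the algorithm):

```python
import numpy as np

# Hedged sketch: a tiny actor and critic for a 3-dim state and 1-dim
# action, using single linear layers in place of deep networks.
rng = np.random.default_rng(0)

# Actor mu(s; theta_mu): state -> action, squashed to [-1, 1] with tanh.
W_actor = rng.normal(size=(1, 3))

def actor(state):
    return np.tanh(W_actor @ state)          # deterministic action

# Critic Q(s, a; theta_Q): (state, action) -> scalar Q-value.
W_critic = rng.normal(size=(1, 4))

def critic(state, action):
    return float(W_critic @ np.concatenate([state, action]))

state = np.array([0.1, -0.2, 0.3])
action = actor(state)                        # actor picks the action
q_value = critic(state, action)              # critic evaluates it
print(action.shape, type(q_value))
```

The key structural point is the data flow: the actor maps state to action, and the critic scores the (state, action) pair.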
DDPG - ALGO
TWO NETWORKS: ACTOR AND CRITIC

01 Actor Network: μ(s; θ^μ)
1. Takes a state s as input and outputs an action
2. θ^μ are the actor network's weights
DDPG - ALGO
TWO NETWORKS: ACTOR AND CRITIC

02 Critic Network: Q(s, a; θ^Q)
1. Takes a state and an action as input and outputs the Q-value
2. θ^Q are the critic network's weights
DDPG - ALGO
TWO NETWORKS: ACTOR AND CRITIC

03 Target Networks: μ(s; θ^μ′) and Q(s, a; θ^Q′)
θ^μ′ and θ^Q′ are the weights of the target actor and target critic
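Initializing the target networks is simple: they start as exact copies of the online networks' weights, then drift slowly behind them (the weight vectors here are illustrative stand-ins for full network parameters):

```python
import numpy as np

# Sketch: target networks start as exact copies of the online networks'
# weights; keeping them slowly updated stabilizes the TD targets later.
theta_mu = np.array([0.5, -1.2, 0.3])    # online actor weights (illustrative)
theta_q = np.array([0.7, 0.1, -0.4])     # online critic weights (illustrative)

theta_mu_target = theta_mu.copy()        # theta^mu' starts equal to theta^mu
theta_q_target = theta_q.copy()          # theta^Q'  starts equal to theta^Q

# The copies are independent: later online updates won't leak into targets.
theta_mu[0] = 9.0
print(theta_mu_target[0])  # still 0.5
```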
NEXT STEPS

01 Update the actor's weights with policy gradients

02 Update the critic's weights with gradients calculated from the TD error
NEXT STEPS

03 Select an action by adding exploration noise N to the action produced by the actor network: μ(s; θ^μ) + N

04 1. Perform the action in state s,
2. Receive a reward r,
3. Move to the next state s′

Store this transition (s, a, r, s′) in a replay buffer
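Steps 03 and 04 can be sketched with a bounded buffer and Gaussian exploration noise (the noise scale, buffer size, and transition values are illustrative assumptions):

```python
import random
from collections import deque

# Sketch of steps 03-04: act with exploration noise, then store the
# transition (s, a, r, s') in a bounded replay buffer.
buffer = deque(maxlen=10_000)   # old transitions drop out automatically

def select_action(mu_of_s, noise_scale=0.1):
    # Deterministic action from the actor plus Gaussian exploration noise N.
    return mu_of_s + random.gauss(0.0, noise_scale)

state, next_state, reward = 0.5, 0.6, 1.0   # illustrative values
action = select_action(mu_of_s=0.2)

buffer.append((state, action, reward, next_state))   # store the transition

# Later, training samples a random minibatch from the buffer.
batch = random.sample(list(buffer), k=1)
print(len(batch))  # 1
```

Sampling random minibatches breaks the correlation between consecutive transitions, which is why the buffer exists at all.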
FINAL STEPS
After some iterations:

01 Sample transitions from the replay buffer (M is the number of samples)

02 Train the networks

03 Calculate the target Q-value using the target networks:
y_i = r_i + γ Q(s_{i+1}, μ(s_{i+1}; θ^μ′); θ^Q′)

04 Compute the TD error as the loss:
L = (1/M) Σ_i (y_i − Q(s_i, a_i; θ^Q))²

Update the critic's weights with gradients calculated from this loss L
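The target and loss computation can be written out for a small minibatch (M = 3 here; the reward and Q-value numbers are illustrative):

```python
import numpy as np

# Sketch of the critic loss on a minibatch of M = 3 transitions.
gamma = 0.99
rewards = np.array([1.0, 0.0, 0.5])          # r_i
target_q_next = np.array([2.0, 1.5, 0.8])    # Q(s_{i+1}, mu(s_{i+1}; theta^mu'); theta^Q')
q_values = np.array([2.5, 1.2, 1.0])         # Q(s_i, a_i; theta^Q)

y = rewards + gamma * target_q_next          # TD targets y_i
loss = float(np.mean((y - q_values) ** 2))   # L = (1/M) sum (y_i - Q_i)^2
print(round(loss, 4))  # 0.1323
```

Note that y is computed entirely from the target networks, so the regression target stays fixed while θ^Q is updated.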
UPDATING THE POLICY GRADIENT
• We update the policy network's weights using the policy gradient
• Then update the weights of the actor and critic in the target networks
• Soft replacement: update θ′ slowly, for stability
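The soft replacement in the last bullet is a one-line rule, θ′ ← τθ + (1 − τ)θ′, sketched here with illustrative weights (τ = 0.005 is a common choice, not mandated by the slides):

```python
# Sketch of soft replacement: target weights take a small step of size tau
# toward the online weights each update, instead of being copied outright.
tau = 0.005

def soft_update(theta_online, theta_target, tau):
    # theta' <- tau * theta + (1 - tau) * theta'
    return [tau * w + (1.0 - tau) * wt
            for w, wt in zip(theta_online, theta_target)]

online = [1.0, 2.0]    # illustrative online weights
target = [0.0, 0.0]    # illustrative target weights
target = soft_update(online, target, tau)
print(target)  # [0.005, 0.01]
```

With a small τ the target networks change slowly, which keeps the TD targets from chasing a moving critic.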
REAL-WORLD EXAMPLES
• Tesla Autopilot
• Financial trading
• Game optimization
• Healthcare
THANK
YOU
