DDPG(Deep Deterministic Policy Gradient)
DDPG(Deep Deterministic Policy Gradient)
Osama Javid
What is Deep
deterministic policy
gradient
Deep: Refers to the use of deep neural networks.
These are powerful tools used in artificial intelligence
that can learn complex patterns and relationships from
data.
What is Deep
deterministic policy
gradient
Deterministic: This means that the algorithm tries to
directly predict the best action to take in a given
situation, rather than just guessing or exploring randomly.
It aims to be more precise and intentional in its decision-
making.
What is Deep
deterministic policy
gradient
Policy Gradient: directly optimizing the policy by some
parameter θ.
e.g. Self Driving a car (to receive only positive rewards
when we avoid hitting another car)
Critic Network
02 Q(s,a:θ**Q)
Target Network
03 μ(s:θ**μ/)
Q(s,a:θ**Q/)
M is number of smaples RB
Update our critic’s weight with
gradients calculated from this
L
UPDATING POLICY
GRADIENT
• We update our policy network weights using a
policy gradient,
• Then Update the weights of Actor and Critic
Network in the target network
• Soft replacement: update the θ slowly for
stability
REAL WORLD
EXAMPLES
• Tesla auto pilot
• Finance trading
• Game optimizations
• Health Care
THANK
YOU