Reinforcement Learning in Social Robots: A Comprehensive Study

Mentored By: Dr. Manju Bhardwaj, Dr. Shweta Sankhwar
Presented By: Aaditri Saraswat, Bhavya Singh
Social Robots

A social robot is a type of robot designed to interact and communicate with humans in a social way. Some key features and characteristics of social robots include:

Human-like appearance

Emotional intelligence

Natural language processing (NLP)

Social skills
Reinforcement Learning

● RL is an area of machine learning in which an agent learns through a "trial and error" methodology.
● The agent makes decisions in a given environment so as to maximise a reward signal.
● The robot learns by interacting with its environment, observing the outcomes, and receiving a reward (positive or negative).
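A minimal sketch of this trial-and-error loop is shown below. It uses the Gymnasium toy task FrozenLake and random actions purely to illustrate the act-observe-reward cycle; this is an assumption for demonstration, not the Gazebo/ROS setup used in this study.

```python
# Illustrative agent-environment loop (Gymnasium's FrozenLake, random policy).
import gymnasium as gym

env = gym.make("FrozenLake-v1")
state, info = env.reset(seed=0)          # start a new episode
total_reward = 0.0

for t in range(100):
    action = env.action_space.sample()   # "trial": pick an action (random here)
    state, reward, terminated, truncated, info = env.step(action)  # observe outcome
    total_reward += reward               # accumulate the reward signal ("error" feedback)
    if terminated or truncated:          # episode ends on goal, failure, or time limit
        break

print("episode return:", total_reward)
```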
Markov Decision Process

The agent interacts with an environment, which changes its state and yields a reward for the action. Then, another round begins. Mathematically, this cycle is described by a Markov Decision Process (MDP). Such an MDP consists of five components: S (states), A (actions), R (reward function), P (transition probabilities), and p₀ (initial state distribution).

Fig 1: Markov decision process (reinforcement learning model)
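Written out, the five components form the tuple below, together with the discounted-return objective the agent maximises. The discount factor γ is standard in this formulation, although it is not listed among the five components on the slide.

```latex
% MDP tuple and the discounted-return objective (standard formulation)
\[
  \mathcal{M} = (\mathcal{S},\, \mathcal{A},\, R,\, P,\, p_0),
  \qquad s_0 \sim p_0, \quad s_{t+1} \sim P(\cdot \mid s_t, a_t), \quad r_t = R(s_t, a_t)
\]
\[
  \max_{\pi}\; \mathbb{E}_{\pi}\!\left[ \sum_{t=0}^{\infty} \gamma^{t}\, r_t \right],
  \qquad 0 \le \gamma < 1
\]
```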
Objectives

➢ Identification of the types of reinforcement learning algorithms and mechanisms used in social robot navigation

➢ Comparison of reinforcement learning algorithms in social robot navigation
Workflow

1. Creating the environment using the Gazebo Simulator
2. Designing a mobile robot/agent using ROS
3. Setting a reward function in Environment 1 for the different reinforcement learning algorithms (a sketch of such a reward function follows below)
4. Training the agent
5. Testing the learned policy in both Environment 1 and Environment 2 for the different reinforcement learning algorithms
6. Evaluating the performance based on total reward and number of successful episodes for each reinforcement learning algorithm
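A reward function of the kind referred to in step 3 might look like the following sketch for goal-directed navigation with obstacle avoidance. The function name, thresholds, and constants are illustrative assumptions; the exact shaping used in this study is not specified.

```python
# Hypothetical navigation reward; all constants below are illustrative only.
def navigation_reward(dist_to_goal, prev_dist_to_goal, min_obstacle_dist):
    if dist_to_goal < 0.2:           # reached the goal region
        return 100.0
    if min_obstacle_dist < 0.15:     # collision (or near-collision) with an obstacle
        return -100.0
    # Dense shaping: reward progress toward the goal, small penalty per step.
    return 10.0 * (prev_dist_to_goal - dist_to_goal) - 0.1
```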
Virtual Environments

A virtual environment is defined by the arrangement of different obstacles.

Environment 1 (yields training accuracy)
Environment 2 (yields test accuracy)
Algorithms Used

❏ Q-Learning
❏ SARSA
❏ DQN (Deep Q-Network)
❏ DDQN (Double Deep Q-Network)
Comparative Analysis

Q-Learning
Accuracy: Q-learning is model-free and can be accurate if the state-action space is well defined and discretized.
Reward: Q-learning relies on explicit rewards and learns optimal Q-values through exploration. It may struggle with complex continuous state spaces.
Action/State: Q-learning directly updates Q-values based on the observed reward and the maximum Q-value of the next state.

SARSA
Accuracy: SARSA is an on-policy algorithm that updates the Q-value based on the actual action taken and the next state's action.
Reward: Similar to Q-learning, SARSA depends on explicit rewards and exploration to learn optimal Q-values.
Action/State: SARSA updates Q-values based on the observed reward, the next state's action, and that action's Q-value.
Comparative Analysis Contd.

DQN
Accuracy: DQN utilizes a deep neural network to approximate Q-values, which can lead to more accurate estimations compared to traditional Q-learning.
Reward: The reward mechanism depends on the specific task and environment. DQN can learn to optimize rewards effectively, but careful reward design is crucial for good performance.
Action/State: DQN selects actions based on the highest predicted Q-value from the neural network output.

DDQN
Accuracy: DDQN addresses the overestimation bias present in DQN by decoupling action selection (online network) from action evaluation (target network), potentially leading to more accurate Q-value estimations.
Reward: Similar to DQN, DDQN's reward handling depends on proper reward design for the task.
Action/State: DDQN acts on the highest predicted Q-value from the online network; in the update target, the online network selects the next action and the target network evaluates it.
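The difference between the DQN and DDQN rows lies only in how the bootstrap target is formed. The sketch below uses PyTorch-style tensors; the names online_net and target_net, the batch shapes, and the discount value are assumptions for illustration, not the study's implementation.

```python
import torch

# online_net and target_net are assumed torch.nn.Module Q-networks mapping a
# batch of states to a (batch, n_actions) tensor of Q-values.
# `done` is a float {0, 1} mask marking terminal transitions.

def dqn_target(reward, next_state, done, target_net, gamma=0.99):
    # DQN: the target network both selects and evaluates the next action,
    # which tends to overestimate Q-values.
    next_q = target_net(next_state).max(dim=1).values
    return reward + gamma * (1.0 - done) * next_q

def ddqn_target(reward, next_state, done, online_net, target_net, gamma=0.99):
    # DDQN: the online network selects the action, the target network evaluates
    # it, which reduces the overestimation bias.
    best_action = online_net(next_state).argmax(dim=1, keepdim=True)
    next_q = target_net(next_state).gather(1, best_action).squeeze(1)
    return reward + gamma * (1.0 - done) * next_q
```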
Results

S.No   Algorithm     Training Accuracy (%)   Test Accuracy (%)
1      Q-Learning    68                      67
2      SARSA         77                      75
3      DQN           87                      85
4      DDQN          92                      91
Conclusion

In this study we evaluated the performance of different reinforcement learning algorithms based on their reward function and total number of successful episodes. The results are as follows:

1. DDQN achieved the highest accuracy: 92% in Env1 and 91% in Env2.

2. DQN achieved accuracies of 87% in Env1 and 85% in Env2, followed by SARSA and Q-learning.

3. DDQN accumulated the highest reward for the same number of action states in both Env1 and Env2.