Application of Reinforcement Learning to a Two DOF Robot Arm Control
Abstract: The control of automatic manipulators poses a complex challenge for control systems, which have to deal with various dynamic and nonlinear effects. This paper presents a novel approach to the motion control of a two-DOF robot arm based on reinforcement learning. A new method to reduce the high computational effort that comes along with this method is presented. In order to accelerate the convergence of the learning process, a fuzzy logic system is integrated into the reward function. For further optimization of the implemented algorithm, a library of already learned motions is created. The experimental results show a significant improvement in learning efficiency.
Key words: reinforcement learning, robotics, computational
intelligence
1. INTRODUCTION

One of the biggest challenges in current robotics research is that robots "leave" their well-structured environment and are confronted with new tasks in a more complex environment. Due to this, a robot can only be successful, and thus useful, when it is able to adapt itself and learn from experience. Reinforcement Learning (RL), a branch of machine learning (Mitchell, 1997), is one possible solution. RL is a learning process which uses reward and punishment from the interaction with the environment to learn a policy for achieving tasks. Various RL methods, e.g. Q-learning (Watkins, 1989), have been studied in recent decades, and it has been shown that two problems must be considered. The first is the high computational effort: RL suffers from the "curse of dimensionality" (Bellman, 1957), which refers to the tendency of a state space to grow exponentially in its dimension, that is, in the number of state variables (Sutton & Barto, 1998). The second is that a Q-table is created for one specific task; it requires an extremely large space to store policies for all possible tasks, which strongly restricts the practical application of this learning method.

In (Martin & De Lope, 2007), the authors present a distributed RL architecture to address the first problem, which uses several small, low-dimensional Q-tables instead of one global high-dimensional Q-table holding the evaluations of actions for all states. In this paper, another approach, based on an optimized state space representation, is proposed for improving the learning ability of RL.
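To make the scale of this problem concrete, the following Python sketch contrasts one global Q-table over both joints with one small Q-table per joint, as in a distributed architecture of the kind described by (Martin & De Lope, 2007). It is a minimal illustration, not the implementation used in this work; the bin counts, learning rate and discount factor are assumed values.

    import numpy as np

    # Assumed discretization: N_ANG angle bins and N_VEL velocity bins per joint,
    # N_ACT torque actions per joint; ALPHA and GAMMA are assumed learning parameters.
    N_ANG, N_VEL, N_ACT = 36, 11, 3
    ALPHA, GAMMA = 0.1, 0.9

    # Global Q-table over both joints: 6-dimensional, its size grows multiplicatively.
    q_global = np.zeros((N_ANG, N_VEL, N_ANG, N_VEL, N_ACT, N_ACT))

    # Decomposed architecture: one small 3-dimensional Q-table per joint.
    q_joints = [np.zeros((N_ANG, N_VEL, N_ACT)) for _ in range(2)]

    def q_update(table, state, action, reward, next_state):
        """Tabular Q-learning update on one per-joint table; state = (angle_bin, vel_bin)."""
        td_target = reward + GAMMA * table[next_state].max()
        table[state + (action,)] += ALPHA * (td_target - table[state + (action,)])

    print("global table entries:   ", q_global.size)                  # 1,411,344 cells
    print("two per-joint tables:   ", sum(t.size for t in q_joints))  # 2,376 cells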
2. APPROACH

2.1 Overview
The schematic of the robot arm with two degrees of freedom is illustrated in Fig. 1. The system is dynamic and nonlinear because the inertia of the upper arm changes due to a variable angle (Denzinger & Laureyns, 2008). Since the inverse kinematics is redundant, the angle set which is closer to the target state is selected as the end position. Because our work is focused on how to reach the goal, the motion accuracy of intermediate states is not of interest; thus the interval of a state close to the target can be much smaller than that of a state far away from it. A relative position is adopted to normalize the distribution of the state space, whose values always change depending on the target position. Moreover, a fuzzy-logic system is integrated into the reward function for evaluating the executed actions, and a coordinate transformation is applied to represent different tasks with one Q-table in the library.
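A minimal sketch of such a fuzzy evaluation of an executed action is given below. It assumes simple distance-to-target membership sets and crisp output levels; all numerical ranges and reward levels are assumptions, not the values used in the experiments.

    def triangular(x, left, center, right):
        """Triangular fuzzy membership function."""
        if x <= left or x >= right:
            return 0.0
        if x <= center:
            return (x - left) / (center - left)
        return (right - x) / (right - center)

    def fuzzy_reward(dist_to_target):
        """Grade an executed action by the fuzzified remaining distance to the target."""
        near   = max(0.0, min(1.0, (0.2 - dist_to_target) / 0.2))   # left shoulder set
        medium = triangular(dist_to_target, 0.1, 0.35, 0.6)
        far    = max(0.0, min(1.0, (dist_to_target - 0.4) / 0.6))   # right shoulder set
        weights, levels = (near, medium, far), (1.0, 0.2, -0.5)
        # Weighted-average (Sugeno-style) defuzzification into a crisp reward value.
        return sum(w * l for w, l in zip(weights, levels)) / (sum(weights) or 1.0)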
2.2 Uneven distribution of the state space
The state space of the robot arm consists of its joint angles and angular velocities. A Q-table in this case is a 6-dimensional hyperspace over angles, velocities and actions:

$Q = Q(\theta_1, \theta_2, \dot{\theta}_1, \dot{\theta}_2, a_1, a_2)$ (1)

According to the approach of (Martin & De Lope, 2007), we can decompose this Q-table into $Q_1(\theta_1, \dot{\theta}_1, a_1)$ and $Q_2(\theta_2, \dot{\theta}_2, a_2)$. With the help of the fixed target position, each position can be calculated as a constant value plus a difference, as shown in the following equations:

$\theta_1 = \theta_{1,T} + \Delta\theta_1, \quad \theta_2 = \theta_{2,T} + \Delta\theta_2$ (2)

If the state space is built not according to $\theta_i$ but to $\Delta\theta_i$, the target position can always be located at $\Delta\theta_1 = \Delta\theta_2 = 0$. In this case, the state space can easily be divided with different intervals.
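As an illustration of this relative, unevenly divided state space, the sketch below maps a joint state onto bins of $\Delta\theta$ that become narrower near the target, so that the goal always falls into the same fine cell regardless of the actual target angle. The bin edges are assumptions, not the intervals used in the paper.

    import bisect

    # Assumed bin edges for the relative angle (rad): fine near zero, coarse far away.
    DELTA_EDGES = (-3.14, -1.5, -0.6, -0.2, -0.05, 0.05, 0.2, 0.6, 1.5, 3.14)
    VEL_EDGES = (-2.0, -0.5, 0.5, 2.0)

    def relative_state(theta, theta_dot, theta_target):
        """Map a joint's absolute state onto the target-relative, unevenly divided grid."""
        delta = theta - theta_target                 # relative position Delta-theta
        ang_idx = bisect.bisect(DELTA_EDGES, delta)  # narrow bins around delta = 0
        vel_idx = bisect.bisect(VEL_EDGES, theta_dot)
        return ang_idx, vel_idx

    # The target itself always maps onto the same cell, whatever theta_target is:
    assert relative_state(1.2, 0.0, 1.2) == relative_state(-0.7, 0.0, -0.7)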