0% found this document useful (0 votes)
57 views8 pages

Humanoids2020 A Framework For Autonomous Impedance Regulation of Robots Based On Imitation Learning and Optimal Control

This document proposes a framework to autonomously regulate robot impedance during constrained manipulation tasks based on imitation learning and optimal control. The framework extracts task stiffness geometry from human demonstrations using a human arm stiffness model. It then encodes this offline using Gaussian mixture models and reproduces it online using Gaussian mixture regression. Additionally, an optimal control problem is formulated using linear quadratic regulation to determine the full Cartesian impedance, considering the extracted task stiffness as a time-varying weighting matrix. Comparative experiments on a door opening task show this framework achieves better tracking and interaction forces than alternatives without the optimal control or imitation learning components.

Uploaded by

이재봉
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
57 views8 pages

Humanoids2020 A Framework For Autonomous Impedance Regulation of Robots Based On Imitation Learning and Optimal Control

This document proposes a framework to autonomously regulate robot impedance during constrained manipulation tasks based on imitation learning and optimal control. The framework extracts task stiffness geometry from human demonstrations using a human arm stiffness model. It then encodes this offline using Gaussian mixture models and reproduces it online using Gaussian mixture regression. Additionally, an optimal control problem is formulated using linear quadratic regulation to determine the full Cartesian impedance, considering the extracted task stiffness as a time-varying weighting matrix. Comparative experiments on a door opening task show this framework achieves better tracking and interaction forces than alternatives without the optimal control or imitation learning components.

Uploaded by

이재봉
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 8

1

A Framework for Autonomous Impedance Regulation of Robots


Based on Imitation Learning and Optimal Control
Yuqiang Wu, Fei Zhao, Tao Tao, and Arash Ajoudani

Abstract—In this work, we propose a framework to address


the autonomous impedance regulation problem of robots in a Human Robot
SoftHand
class of constrained manipulation tasks. In this framework, a demonstration execution
human arm endpoint stiffness model is used to extract the 𝑍
task stiffness geometry along the constrained trajectory, which 𝑋 𝑌
is then encoded offline and reproduced online by a Gaussian 𝚺𝑺
Mixture Model (GMM) and the Gaussian Mixture Regression
(GMR), respectively. Furthermore, the full Cartesian impedance
of the robot is formulated through an optimal control problem,
i.e., the Linear-Quadratic Regulator (LQR), in which the task
stiffness geometry (extracted from human demonstrations) is
considered as the time-varying weighting matrix Q. The optimal
impedance is eventually realised by the robot through a task 3D motion
geometry consistent Cartesian impedance controller. A tank- tracking device Franka
based passivity observer is implemented to give evidence on Emika Panda 𝑍
𝑋 𝚺𝑹
the stability of the system during online impedance variations. 𝑌
To evaluate the performance of the framework, a comparative
experiment with three different impedance settings (i.e., the Fig. 1: The task stiffness geometry is extracted from human obser-
proposed framework, the framework without LQR and the vations (left), and encoded and reproduced by the robot (right) using
framework without GMM/GMR) for Franka Emika Panda to the proposed autonomous control framework.
perform a door opening task was conducted. The results reveal
that our framework outperforms the other two, in terms of the adaptation of force and impedance was proposed in [5],
tracking error and the interaction forces.
leading to a similar adaptation behavior observed in human
experiments. The online human arm impedance estimation
I. I NTRODUCTION techniques in 3D was first proposed under the concept of
In the last decades, Cartesian impedance control techniques teleimpedance control [6], and then extended to the entire arm
have been developed for the new generation of torque con- workspace through a reduced-complexity representation in [7].
trolled light-weight robots to guarantee safe interaction with An extension of this work to enable transferring of whole-
humans and their surroundings [1]. In such control techniques, body human motor behaviours to teleoperated robots was pre-
the impedance parameters regulate the relationship between sented in [8], where a MObile Collaborative robotic Assistant
the trajectory tracking errors and the interaction forces, playing (MOCA) was operated to execute complex loco-manipulation
a crucial role in the interaction performance of the robots for tasks. Although teleimpedance control has shown a high
a given task. Nevertheless, for tasks that require continuously potential in the execution of remote interactive tasks, in most
varying interaction patterns, the adjustment of the impedance applications, an autonomous approach is preferred.
parameters remains a difficult issue, also considering the To achieve autonomous interaction control of robots, the
dynamic uncertainties [2], [3]. stiffness profile can be encoded offline and reproduced online.
Inspired by the human manipulation capability, researchers An elegant and efficient way to implement this is through
have been trying to understand how human beings regulate imitation learning. In this direction, Dynamic Movement Prim-
their arm impedance in performing interactive tasks. In this itives (DMPs) [9], which consists of a canonical system, a
direction, authors in [4] proposed a 2D stiffness model of transformation system, and a nonlinear function estimator is
the human arm by assuming that the arm endpoint stiffness widely used to encode complex behaviors in human motor
is linearly related to the magnitude of the joint torques. control and robotics. However, due to the dependency of the
Relying on this assumption, an arm control model including canonical system on time or phase input in this framework,
it can hardly deal with multi-dimensional input data. Another
This work has been carried out in HRI2 Lab, Istituto Italiano di Tecnologia, popular movement representation technique is through Gaus-
Genoa, Italy. This work was supported in part by the ERC starting grant Ergo-
Lean (Grant Agreement No.850932), in part by the National Natural Science sian Mixture Regression (GMR) [10]. Firstly, it models the
Foundation of China under Grant 91748208, and in part by the Department of joint probability density of the input and output data in the
Science and Technology of Shaanxi Province under Grant 2017ZDL-G-3-1. form of a Gaussian Mixture Model (GMM), which can be
In this work, Y. Wu was also supported by the China Scholarship Council.
Yuqiang Wu, Fei Zhao and Tao Tao are with the State Key Laboratory estimated by an Expectation-Maximization (EM) procedure.
for Manufacturing System Engineering, Xi’an Jiaotong University, Shaanxi Furthermore, the regression function is derived by computing
Key Laboratory of Intelligent Robots and School of Mechanical Engineering, the conditional probability distribution of GMM. It directly
Xi’an Jiaotong University, Xi’an Shaanxi 710049, China; Yuqiang Wu is also
with the HRI2 Lab of IIT; Arash Ajoudani is with the HRI2 Lab of IIT. maps the input to output and both can be multi-dimensional.
[email protected] In addition, GMR is a probabilistic model, which can not
2

Human Arm Offline Training


CDS
IMITATION Human Stiffness
Model GMM
LEARNING Demos POSE

Model Parameters

OPTIMAL Environment CDS


LQR GMR
CONTROL Dynamics

Full Impedance Online Reproduction

Finite State Trajectory


𝑥𝑑 Variable 𝜏 Franka 𝑥
Impedance Environment
Machine Planner Robot ·
Controller
𝑥

Fig. 2: The architecture of the proposed framework. The full impedance is regulated by an imitation learning algorithm (GMM/GMR) and
an optimal control method (LQR), which is then realised by the robot Cartesian impedance controller.

only predict the mean value but also covariance of the out- Improvement with Path Integrals (PI2 ) algorithm is proposed
put variables. In [11], the robot’s movement was encoded to accomplish variable impedance control for a given task. In
and reproduced by GMM/GMR. They consider the predicted [17], an approach called OutPut FeedBack Adaptive Dynamic
covariance as uncertainty in the environment and then use Programming (OPFB ADP) is proposed to solve the problem
its inverse as the stiffness in Cartesian impedance control by only using measured output data, considering that in
of robots. The same idea but with a minimal intervention most robot-environment interaction applications the dynamic
control framework was established in [12]. However, the processes are partially observable. A detailed discussion of the
inverse of the covariance embedded in multiple movement progress in RL for variable impedance control is beyond the
demonstrations doesn’t correspond to the physical meaning of scope of this article.
stiffness, and this framework will not work in the constrained In this work, we propose a framework that merges the
motion scenarios, such as door opening and valve turning. advantages of imitation learning and optimal control to achieve
Instead of extracting stiffness information from variability an adaptive and optimal interaction performance with a gen-
in multiple demonstrations, there are some works about es- eralization potential. Our aim is to transfer task stiffness
timating it based on interaction forces and tracking errors, geometry, provided by human demonstrations, to robots in
followed by imitation learning algorithms to reproduce the an autonomous and optimal way. The door opening has been
learned stiffness profiles online. The authors in [13] proposed chosen as a representative example of tasks with constrained
an approach to collaboratively assemble a wooden table by a and continuously interactive motion as shown in Fig. 1. Due
human subject and a robot. By taking into account the sensed to a potential instability of the system acting under variable
force information, a Weighted Least-Squares (WLS) method is impedance, we developed a tank-based passivity observer to
employed to estimate the stiffness. In [14], the authors used a monitor the stability of the system. In addition, to verify the
similar framework but adopted a sliding window technique to synergistic contribution of the imitation learning and optimal
carry out local stiffness estimation instead of WLS. However, control in this framework, two additional impedance settings
in this way, the position and force data are usually demon- were tested: one used GMR to reproduce the varying task
strated via kinesthetic teaching, which may cause discontinuity stiffness geometry but with a constant volume coefficient
in a continuous interaction process. It is not very reliable instead of LQR, the other one was the opposite, i.e., a constant
also due to the fact that the first estimation result of stiffness task stiffness geometry with LQR. The performances of the
usually does not comply with Symmetric Positive-Definite controllers under different impedance settings were compared
(SPD) constraints. A more direct way to teach robot compliant in terms of stiffness profiles, tracking error, and interaction
manipulation is through human-robot interaction proposed forces.
in [3]. However, for a task requiring continuously varying
impedance, this way may bring inconvenience to the teacher. II. F RAMEWORK
Some works consider impedance regulation as an optimal The overall framework is depicted in Fig. 2. Inspired by
control problem [15], [16]. In this field, most of the work the imitation learning approach, we attempt to learn the task
are based on Linear Quadratic Regulator (LQR) formulation, stiffness geometry from human demonstrations. In contrast
however, the choices of Q and R matrices play an important with the work in [12]–[14], we propose to utilize physically
role which are usually hard to design for individual tasks. In meaningful stiffness information, which is extracted directly
most cases, Q and R are fixed which results in a hard-coded from a human arm endpoint stiffness model. In this model, the
behavior. To achieve an online adaptation, adaptive optimal arm endpoint stiffness is decomposed into two components,
control or Reinforcement Learning (RL) can be employed. An i.e., the Configuration Dependent Stiffness (CDS) and the
example of this approach is presented in [2], where a Policy Common Mode Stiffness (CMS), as proposed in [18], [19].
3

by the ellipsoid’s principal axes and their lengths, namely the


eigenvectors and eigenvalues of the stiffness matrix, based on
the arm configuration.
As shown in Fig. 3, we use a two-segment human arm
skeleton structure in the 3D space. The hand-forearm and
the upper arm segments in this model permit us to form an
arm triangle at any non-singular configuration. Relying on the
dominant contribution of the arm configuration to the endpoint
stiffness geometry, we propose to use the vector from the
Fig. 3: The geometric relationship between the human arm configu-
centre of shoulder joint to the position of the hand (~l ∈ R3 ),
ration and the principal axes of the endpoint stiffness ellipsoid. n~ is to identify the major principal direction of the human arm
perpendicular to the plane formed by ~l and ~r . ~l and n
~ are parallel to endpoint stiffness ellipsoid. The minor principal axis direction
the major and minor principal axes of the endpoint stiffness ellipsoid, n ∈ R3 ), instead, is defined to be perpendicular to the arm
(~
respectively. triangle plane
Under the assumption that human beings adapt their arm ~ = ~r × ~l,
n (1)
endpoint stiffness ellipsoid’s geometry (represented by CDS)
where ~r ∈ R3 represents the vector from the centre of
and volume (represented by CMS) to match the interaction
shoulder to the centre of elbow. The remaining principal
requirements of a task, we propose to only encode human
axis of the stiffness ellipsoid, which lies on the arm triangle
demonstrated CDS by GMM offline (pose as input while CDS
plane, is calculated based on the orthogonality of the three
as output in the model) and reproduce it online by GMR. The
principal axes. Under the assumption that the ratio of the
CMS component, which regulates the volume of the endpoint
length of median principal axis to the major principal axis of
stiffness ellipsoid, is not observed from human demonstrations
the stiffness ellipsoid is inversely proportional to the distance
since it can depend on task specifics. For instance, in door
d1 ∈ R, from the hand position to the centre of shoulder, while
opening or valve turning tasks, the CMS would depend on the
the ratio of the length of the minor principal axis to the major
friction properties or the elastic response of a spring-loaded
principal axis is proportional to the distance d2 ∈ R, from the
door or a valve. Instead, to construct the complete impedance
centre of the elbow to the major principal axis, the estimated
(i.e. the full stiffness and the damping term), the LQR is
endpoint stiffness matrix K̂ c ∈ R3×3 is formulated by
adopted to take CDS as the weighting matrix Q in an optimal
control problem to match different environment dynamics. K̂ c = V Acc D s V T . (2)
Eventually, the optimal impedance profile is implemented
by a task geometry consistent Cartesian variable impedance Here, Acc ∈ R is the co-contraction activation index of human
controller, whose desired pose is provided by a trajectory arm muscles corresponding to CMS, while V D s V T ∈ R3×3
planner and a Finite State Machine (FSM). In the following corresponding to CDS, and V ∈ R3×3 and D s ∈ R3×3
sections, each part of the overall framework will be explained are respectively eigenvectors representing the orientation and
in more detail. eigenvalues representing the shape of the stiffness ellipsoid
and formulated by:
" #
A. Learning CDS based on GMM/GMR ~l (~r × ~l) × ~l ~r × ~l
V = , , , (3)
Imitation learning has been widely used in modeling of the k ~l k k (~r × ~l) × ~l k k ~r × ~l k
task kinematics, while there are very few works about learning
the dynamics. The barrier is the difficulty to extract physically D s = Diag(1, α1 /d1 , α2 d2 ), (4)
meaningful dynamic parameters from human demonstrations. where α1 ∈ R and α2 ∈ R are subject related parameters
To exploit the potential of imitation learning in the execution in the model. For the detailed explanation and further model
of physical interaction tasks in this work, a human arm end- parameter identification, please refer to [19].
point stiffness model is first introduced. Next, the GMM/GMR According to the introduced model, to extract CDS infor-
is applied to encode and reproduce the task stiffness geometry mation from human demonstrations for a given task, only
through the CDS. positional information of the shoulder, elbow and hand is
1) Human arm stiffness model and demonstrations: In [19], necessary which is quite easy to get. The CDS information
we propose a principally simplified, and intuitive online model can be extracted followed by a post-processing of the collected
of the human arm endpoint stiffness. The new model is based 3D position data according to the proposed model.
on the large dependency of the shape and orientation of the 2) Encoding CDS profile by GMM/GMR: In this part, we
stiffness ellipsoid on the arm configuration. In fact, previous first give a recap of GMM/GMR algorithm and then introduce
studies [20], [21] indicate that i) the major principal axis of how to apply it to encode the CDS profiles. The Gaussian
the arm endpoint stiffness ellipse passes close to the hand- mixture distribution can be written as a linear superposition
shoulder vector, and ii) when the arm is extended and the of Gaussians in the form
hand moves further from the shoulder, the ellipse becomes K
more elongated and conversely, it becomes more isotropic. p(ξ) =
X
πk N (ξ|µk , Σk ), (5)
Inspired by these findings, the new model in 3D is constructed k=1
4

where ξ ∈ Rd and p(ξ) ∈ R, respectively, represent the vector CDS into the product of a lower triangular matrix and its
of variables and joint probability distribution. πk ∈ R, µk ∈ transpose, and then we arrange its elements in a vector form.
Rd and Σk ∈ Rd×d represent the prior probability, mean and Next, the offline training phase takes the compact pose and
covariance of the k-th Gaussian component respectively, while the CDS vectors as the vector of variables ξ. Subsequently,
K ∈ R is the total number of Gaussian components. the GMR reproduces the CDS vector online by taking robot
The goal of the offline training phase is to maximize the end-effector pose as input, followed by a reverse process of
log likelihood function Eq. (6) with respect to the model Cholesky decomposition to construct CDS matrix whose SPD
parameters (πk , µk and Σk ), which results in an EM procedure property is guaranteed.
(E-step in (7) and M-step in (8) and (9)) to iteratively update
the model parameters until convergence B. Constructing full impedance through optimal control
N K
For the control of robots, we need to derive the full stiffness,
X X 
ln p([ξ 1 , ξ 2 , . . . , ξ N ]) = ln πk N (ξ n |µk , Σk ) , and eventually the full impedance from CDS information. To
n=1 k=1
(6) take environment dynamics into account and construct the
full impedance from an optimal point of view, [12] proposed
πk N (ξ n |µk , Σk ) to formulate it as a LQR problem in which the predicted
γn,k = PK , (7)
covariance extracted from multiple movement demonstrations
j=1 πj N (ξ n |µj , Σj )
was adopted as the weighting matrix Q. This part follows
PN PN the same idea with the difference that instead of using the
γn,k γn,k ξ n predicted covariance, we are using the physically meaningful
πknew = n=1
, µnew
k
n=1
= PN
, (8)
N n=1 γn,k CDS formulated by the human arm endpoint stiffness model.
PN In the following, a brief description of LQR problem is
γn,k (ξ n − µnew )(ξ n − µnew )T
Σnew
k = n=1
PN
k k
. (9) given. Consider the discrete state space equation of a linear
n=1 γn,k environment system as:
Here, N ∈ R is the total number of data points in a training ζ n+1 = Aζ n + Bun , (14)
dataset and ξ n ∈ Rd is the n-th training data point.
> >
After modeling the joint probability distribution of the where ζ n = [x>n , ẋn ] ∈ R
2×p
is the system state, xn ∈ Rp
p
training data offline, to derive GMR, we use superscripts I and and ẋn ∈ R respectively represent the pose and velocity.
O to denote the dimensions of the input and output variables. un ∈ Rr is the system input, and A ∈ R(2×p)×(2×p) and B ∈
At iteration n, η In and η O
n respectively represent the input and
R(2×p)×r are the discrete state and input matrices respectively.
output variables. With this notation, the data point η n ∈ Rd , For the subsequent form of control input:
mean µk and covariance Σk of the k-th Gaussian component
un = K̂ n (x̂n − xn ) − D̂ n ẋn , (15)
can be decomposed as
 II where x̂n ∈ Rp is the desired pose, K̂ n ∈ Rr×p and
ΣIO
 I  I 
ηn µk Σk
ηn = O , µk = , Σ k = k
. (10) D̂ n ∈ Rr×p are complete stiffness and damping matrices esti-
ηn µO k ΣOI
k ΣOO
k mated by LQR with time-varying CDS. For an infinite-horizon,
In the online reproduction phase, the best estimation of discrete-time LQR, the solution is obtained by minimizing the
output η̂ O I
n for a given input η n is the mean µ̂n of the following cost function:
conditional probability distribution η̂ O I
n |η n ∼ N (µ̂n , Σ̂n ), ∞
X
with parameters which was reported in [22]: J= (x̂n − xn )> Qn (x̂n − xn ) + u>
n Run , (16)
K n=0
X
µ̂n = E(η̂ O I
n |η n ) = hk (η In )µk (η In ), (11) where Qn ∈ Rp×p and R ∈ Rr×r are the weighting matrix
k=1 and the time-varying Qn will be set to CDS in this framework.
where The feedback gain is updated by
πk N (η In |µIk , ΣII
k )
hk (η In ) = PK II
, (12) [K̂ n , D̂ n ] = (R + B > P B)−1 B > P A, (17)
I I
j=1 πj N (η n |µj , Σj )
where P ∈ R(2×p)×(2×p) is the unique positive definite
OI II −1
µk (η In ) = µO
k + Σk (Σk ) (η In − µIk ). (13) solution to the following Discrete time Algebraic Riccati
In this framework, only the mean value is used although Equation (DARE):
covariance can be derived as well. human arm endpoint pose P = A> A − A> P B(R + B > P B)−1 B > P A + Qn , (18)
and CDS are respectively considered as input and output of  
GMM. Pose data can be represented in vector form while the (2×p)×(2×p) Qn 0p×p
where Qn ∈ R and Qn = .
CDS profile is a series of SPD matrices, it’s not possible to 0p×p 0p×p
apply GMM to encode it directly considering GMM can only In this framework, compared with the heuristic way to con-
deal with independent variables in vector form. In [14], two struct impedance in [11], LQR is a more systematic method,
different ways to represent stiffness matrix are provided. To which takes the environment dynamics into account and holds
make it simple, we use Cholesky decomposition to decompose a good balance between the tracking error and the control
5

input (which equals to the interaction force in the point view where ṽ ed ∈ R6 is the velocity error and it equals to v d − v e ,
of Cartesian impedance control). The environment dynamics v d ∈ R6 is the desired end-effector velocity, ped ∈ R3 is
is set to a double integrator system similar to [12]. In the the position error and the orientation error is represented in
following, we discuss how to implement the full impedance axis/angle oed ∈ R3 and oed = r sin(θ), R(θ, r) = Rd RTe .
profile on robots by using a task geometry consistent Cartesian Rd ∈ R3×3 and Re ∈ R3×3 are respectively the rotation
impedance controller whose choice of parameters is further matrices from the desired and actual end-effector frames to
discussed in Sec. III. base frame. D̂ n and K̂ n are the impedance parameters derived
from Eq. (17), with r = p = 6 in Cartesian space.
C. Robot controller By adding Eq. (19) and Eq. (22) to eliminate v̇ e as in [1],
the following control law is derived:
1) Task geometry consistent Cartesian impedance control:  
At present, the most widely used Cartesian impedance con- p
F τ = Λ(q)v̇ d + µ(q, q̇)v d + D̂ n ṽ ed + K̂ n ed . (23)
troller for the new generation light-weight robots (such as oed
KUKA LWR and Franka Emika Panda) follows the Jacobian The joint torque command can be computed by following
transpose scheme, in which the measurement of external the Jacobian transpose scheme.
wrench is not necessary by avoiding the inertia shaping. A
systematic description of this scheme is reported in [1], where τ = J (q)T F τ
 
the analytic Jacobian is used due to the minimal representation T p
= J (q) (Λ(q)v̇ d + µ(q, q̇)v d + D̂ n ṽ ed + K̂ n ed ).
of the orientation. Such a representation is known to suffer oed
from singularity and the task geometry consistency problem, (24)
i.e., the equivalent rotational stiffness between the orientation If v d = 0, then the simplified task geometry consistent
displacement and the elastic moment, which depends on the Cartesian impedance controller is
desired end-effector orientation. This issue was first discussed
 
T p
in [23], in which an inverse dynamics strategy is employed τ = J (q) (−D̂ n J (q)q̇ + K̂ n ed ). (25)
oed
to implement Cartesian impedance control, resulting in the re-
quirement of measuring contact forces and moments. Although 2) Tank-based system passivity observer: Considering that
only the translational stiffness is modeled in Sec. II-A and the overall framework assumes a variable impedance control,
is dominant in most applications, the rotational stiffness still the passivity of the system and hence the stability may be lost
needs to be set when the robot executes a task. To eliminate if the impedance values change too fast. In this framework, a
the effects brought by task geometry consistency, a derivation tank-based approach proposed in [24] is used to monitor the
of task geometry consistent Cartesian impedance control is stability of the system. Formally, the extended dynamics is
performed by following the Jacobian transpose scheme. given by:
 
In contrast with [1], instead of representing the operational c p
space dynamic equation by using analytic Jacobian, here the Λ(q)ṽ˙ ed + (µ(q, q̇) + D̂ n )ṽ ed + K̂ n ed − ωxt = F ext ,
oed
geometric Jacobian J (q) ∈ R6×n is adopted to address σ > >
the task geometry consistency problem. Without an explicit ẋt = (ṽ ed D̂ n ṽ ed ) − ω ṽ ed .
xt
representation of Cartesian coordinates by using v e = J (q)q̇ (26)
c v c
and v̇ e = J˙(q)q̇ + J (q)q̈ and neglecting the gravity term for where K̂ n = K̂ n + K̂ n , K̂ n ∈ R6×6 is the constant stiffness
v
simplicity, the dynamic model for the manipulator in Cartesian term and K̂ n ∈ R6×6 is the time-varying stiffness. The scalar
space is xt ∈ R is the state associated with the tank and the tank energy
T ∈ R+ and T = 12 x2t . σ ∈ R and ω ∈ R6 respectively are
Λ(q)v̇ e + µ(q, q̇)v e = F τ − F ext , (19) v
K̂ [p> ,o> >
( (
1 if T ≤ T̄u ed ]
− n ed if T ≥ T̄l
where v e ∈ R6 is the end-effector velocity, q ∈ Rn is the σ= , ω= x t .
joint coordinates. F τ ∈ R6 and F ext ∈ R6 are respectively 0 otherwise 0 otherwise
the Cartesian control input and external wrench. Λ(q) ∈ R6×6 (27)
and µ(q, q̇) ∈ R6×6 are the Cartesian inertia and Corio- Here, T̄u ∈ R+ is a suitable, application dependent, upper
lis/centrifugal matrices respectively bound that avoid excessive energy storing in the tank while
T̄l ∈ R+ is a lower bound below, which energy cannot be
extracted by the tank for avoiding singularities in Eq. (26)
Λ(q) = J (q)−T M (q)J (q)−1 , (20) v
and thus the time-varying stiffness K̂ n will be removed. For
µ(q, q̇) = J (q)−T (C(q, q̇) − M (q)J (q)−1 J˙(q))J (q)−1 , a detailed analysis of the system passivity, please refer to [24].
(21)
where M (q) and C(q, q̇) are respectively the joint space III. E XPERIMENTS AND RESULTS
inertia and Coriolis/centrifugal matrices. The desired Cartesian The experimental setup is shown in Fig. 1. On the human
impedance relationship derived from an energy-based argu- side of this setup, the MVN Biomech suit (Xsens Tech)
ment when considering task geometry consistency is [23]: was used for collecting the 3D position information required
to construct CDS patterns and to train the GMM. On the
 
p
Λ(q)ṽ˙ ed + (µ(q, q̇) + D̂ n )ṽ ed + K̂ n ed = F ext , (22) robot side of this setup, the torque-controlled Franka Emika
oed
6

Panda arm was used, which was equipped with the Pisa/IIT
SoftHand for grasping of the door handle. For the door opening Case 1
Case 2
task, the significant motion and interaction happened on X-Y
Case 3
plane. Therefore, in the following, all the stiffness profiles,
trajectories and forces data are illustrated on X-Y plane.
As shown in Fig. 4, the subject repeated the door opening
task for five times. The black lines represent human arm
endpoint trajectories, while the green ellipses represent the
demonstrated human arm endpoint stiffness profile. The CDS
part was constructed according to the collected 3D position
data of human hand, elbow, and shoulder at 100Hz with model
parameters (α1 = 0.2450, α2 = 2.7118), which was reported
Fig. 6: Tank energy variations in Case 1 (blue), Case 2 (red) and
in [19]. The CMS value was selected experimentally to match Case 3 (yellow). The lower bound was set T̄l = 0.2J.
the range of trajectories, since it was neither used in the
training of the GMM nor for reproduction by GMR. Since the main components of the proposed framework (i.e., imitation
motion was constrained, the demonstrated trajectories didn’t learning and optimal control), a comparative experiment was
show much variability, which means that the method proposed conducted under three different cases.
in [12] would fail to provide meaningful stiffness profiles Case 1: This case implements the proposed framework: The
along the trajectory. CDS extracted from human demonstrations was encoded by
After constructing the input (endpoint trajectory) and output GMM offline and reproduced by GMR online (current end-
(endpoint stiffness) data of the GMM, an EM training process effector position as the input). For LQR, the weighting matrix
was performed to estimate the model parameters. The repro- Q was set to CDS and R was set to 1 × 10−6 I (I is identity
duction of CDS by GMR was simulated by taking a planned matrix), which was decided by limiting the resulted largest
circular trajectory as input. The results are depicted in Fig. 5. stiffness within the stiffness range of Franka Emika Panda
robot.
Case 2: In this case, the process of CDS information was
the same as in Case 1. But instead of using LQR, a constant
CMS was set to 1040.37 which is decided by keeping average
area of stiffness ellipses on X-Y plane along the trajectory is
the same in Case 1 and Case 2. The damping term was set to
guarantee a damping ratio of √12 .
Case 3: Here, instead of encoding and reproducing the time-
varying CDS by GMM/GMR, a constant CDS was used, which
equals to the value of the generated CDS at the starting point
in Case 1, and then a LQR procedure was followed.
In all above cases, the desired path was generated by
the same circular trajectory planner, and the environment
dynamics was set to a double integrator system. The tank-
Fig. 4: Human arm endpoint trajectories (black lines) and stiffness
profiles (green ellipses) during the demonstration phase. The task based stability observer was used to monitor system passivity.
stiffness was calculated by the model (resulting by Eq. (2)) and only As shown in Fig. 6, the tank energy was above the lower
CDS was extracted from human demonstrations. bound (T̄l = 0.2J) in all three cases, which means that the
full stiffness including the constant (i.e. the average stiffness
along the whole path was considered as the constant stiffness
in each case) and varying parts was realised. The resulted
trajectories and the stiffness profiles of Franka Emika Panda
in three cases are shown in Fig. 7. Compared with Case
1, the stiffness ellipses in Case 2 without the LQR process
were more elongated, which may cause large interaction force
when the door is in the middle. The stiffness ellipses in
Case 3, which used a constant CDS with the LQR process
demonstrated a more isotropic shape in comparison to those
in Case 2. However, due to the use of a constant CDS, although
the resulted stiffness matched the task requirements in the
beginning of the trajectory, however, it didn’t yield a desired
stiffness ellipse geometry in the end.
Fig. 5: CDS reproduction (green ellipses) based on a planned circular Due to the difference in impedance behaviors in the above
trajectory (black line) as input. three cases, the trajectory tracking and the interacting perfor-
To reveal the effectiveness and the added value of the two mance of the robot were different. The resulting tracking errors
7

Case 1
Case 2
Case 3

Starting Middle Ending

Case 1

Fig. 8: Tracking error in the constrained (bottom) and free (top)


motion directions in Case 1 (green), Case 2 (yellow) and Case 3
(blue).

Case 1
Case 2
Case 3

Case 2

Fig. 9: Interaction forces (control input) in the constrained (top) and


free (bottom) motion directions in Case 1 (green), Case 2 (yellow)
and Case 3 (blue).
was in the middle at 22s. In the free-motion direction, the
tracking errors (top in Fig. 8) in three cases were initialised by
the same zero value, thus resulted in the same zero force initial
values (bottom in Fig. 9). Case 1 showed less tracking errors
Case 3 and less control input compared with Case 2 and Case 3 along
almost the entire path. In the constrained-motion direction in
which a compliant behavior was expected, due to the existence
of an initial position mismatch and different stiffness settings,
Case 2 showed a different initial tracking error (bottom in
Fig. 8) and interaction force (top in Fig. 9) behavior compared
with Case 1 and Case 3 which were using the same stiffness
values at the starting point. During the door moving to the
middle, the tracking error in all three cases were decreasing,
and it was less in Case 2 and Case 3 due to a larger stiffness
than that in Case 1. However, the interaction force in Case 2
was much larger than the other two, which was unnecessary.
When the robot moved the door from the middle to the end,
the tracking error in Case 1 and Case 2 was increasing since
the stiffness in that direction was decreasing. In Case 3 the
Fig. 7: The executed trajectories (black lines) and stiffness profiles
(green ellipses) in three cases. The snapshots shows the starting, tracking error didn’t show obvious changes because of the
middle and ending phases of the door opening experiment, the solid near constant stiffness settings, however, it resulted in a larger
red and yellow arrows attached on the door respectively indicate the interaction force due to the inconsistency between stiffness
constrained-motion and free-motion directions. and task geometry in the end.
and the interaction forces along the constrained-motion and Due to the fact that the varying CDS encoded by
free-motion directions (as shown in Fig. 7, the solid red and GMM/GMR is not directly related to time but to current end-
yellow arrows attached on the door respectively indicate the effector position, the online generated impedance parameters
constrained-motion and free-motion directions.) are illustrated were always synchronized with the task geometry even if
in Fig. 8 and Fig. 9, respectively. The door opening task lasted external interventions (external forces were applied to confront
10s, starting and ending at 17s and 27s respectively. The door the door’s motion at around 11s and 27s) were applied, as
8

x [3] K. Kronander and A. Billard, “Learning compliant manipulation through


y
kinesthetic and tactile human-robot interaction,” IEEE transactions on
haptics, vol. 7, no. 3, pp. 367–380, 2013.
[4] K. P. Tee, E. Burdet, C.-M. Chew, and T. E. Milner, “A model of
x
force and impedance in human arm movements,” Biological cybernetics,
y vol. 90, no. 5, pp. 368–375, 2004.
[5] E. Burdet, K. P. Tee, I. Mareels, T. E. Milner, C.-M. Chew, D. W.
Franklin, R. Osu, and M. Kawato, “Stability and motor adaptation in
human arm movements,” Biological cybernetics, vol. 94, no. 1, pp. 20–
kxx 32, 2006.
kyy
kxy
[6] A. Ajoudani, N. Tsagarakis, and A. Bicchi, “Tele-impedance: Teleoper-
ation with impedance regulation using a body–machine interface,” The
International Journal of Robotics Research, vol. 31, no. 13, pp. 1642–
1656, 2012.
Fig. 10: External intervention happened during the period covered [7] A. Ajoudani, C. Fang, N. Tsagarakis, and A. Bicchi, “Reduced-
by yellow area, stiffness synchronises with task performing phase. complexity representation of the human arm active endpoint stiffness for
supervisory control of remote manipulation,” The International Journal
When the tracking error reached 0.05m, the trajectory planner stopped
of Robotics Research, vol. 37, no. 1, pp. 155–167, 2018.
updating to avoid large interaction forces in the free-motion direction.
[8] Y. Wu, P. Balatti, M. Lorenzini, F. Zhao, W. Kim, and A. Ajoudani,
“A teleoperation interface for loco-manipulation control of mobile
shown in Fig. 10. In the plots, when an external force was collaborative robotic assistant,” IEEE Robotics and Automation Letters,
applied and the end-effector stopped moving, GMR stopped vol. 4, no. 4, pp. 3593–3600, 2019.
updating as well to avoid task stiffness geometry mismatch [9] A. J. Ijspeert, J. Nakanishi, H. Hoffmann, P. Pastor, and S. Schaal,
“Dynamical movement primitives: learning attractor models for motor
which would cause large interaction forces in the constrained- behaviors,” Neural computation, vol. 25, no. 2, pp. 328–373, 2013.
motion direction. This highlights the dependency of the GMR [10] S. Calinon and D. Lee, “Learning control,” in Humanoid Robotics: a
on task states and not on task timing, guaranteeing the adapta- Reference, P. Vadakkepat and A. Goswami, Eds. Springer, 2019, pp.
1261–1312.
tion of the stiffness profile according to the task requirements. [11] S. Calinon, I. Sardellitti, and D. G. Caldwell, “Learning-based control
Although the method proposed in this work outperforms the strategy for safe human-robot interaction exploiting task and robot re-
other two, one may argue that the resulted stiffness profile dundancies,” in 2010 IEEE/RSJ International Conference on Intelligent
Robots and Systems. IEEE, 2010, pp. 249–254.
doesn’t perfectly match the task requirements, e.g., in the
[12] S. Calinon, D. Bruno, and D. G. Caldwell, “A task-parameterized
middle of the task, where the major principle axis of the probabilistic model with minimal intervention control,” in 2014 IEEE
stiffness ellipse was in the constrained-motion direction. If International Conference on Robotics and Automation (ICRA). IEEE,
we consider the problem from the view of a moving task 2014, pp. 3339–3344.
[13] L. D. Rozo, S. Calinon, D. Caldwell, P. Jiménez, and C. Torras,
frame attached to the door, a more intuitive and easier solution “Learning collaborative impedance-based robot behaviors,” in Twenty-
for the stiffness regulation problem can be a constant task Seventh AAAI Conference on Artificial Intelligence, 2013.
stiffness geometry ellipse in the task frame, whose major [14] F. J. Abu-Dakka, L. Rozo, and D. G. Caldwell, “Force-based variable
impedance learning for robotic manipulation,” Robotics and Autonomous
and minor principle axes are respectively aligned with the Systems, vol. 109, pp. 156–167, 2018.
constrained-motion and free-motion directions. However, this [15] M. Matinfar and K. Hashtrudi-Zaad, “Optimization-based robot com-
approach is rather heuristic based on prior assumptions and pliance control: Geometric and linear quadratic approaches,” The In-
ternational Journal of Robotics Research, vol. 24, no. 8, pp. 645–656,
knowledge of the task. Instead, the proposed framework has 2005.
the capacity to encode more complex interaction behaviours [16] S. S. Ge, Y. Li, and C. Wang, “Impedance adaptation for optimal robot–
from human demonstrations and reproduce them via strong environment interaction,” International Journal of Control, vol. 87, no. 2,
pp. 249–263, 2014.
machine learning and optimal control algorithms.
[17] F. L. Lewis and K. G. Vamvoudakis, “Reinforcement learning for
partially observable dynamic processes: Adaptive dynamic programming
IV. C ONCLUSIONS using measured output data,” IEEE Transactions on Systems, Man, and
Cybernetics, Part B (Cybernetics), vol. 41, no. 1, pp. 14–25, 2010.
In this work, we proposed a framework for autonomous [18] A. Ajoudani, M. Gabiccini, N. Tsagarakis, A. Albu-Schäffer, and
impedance regulation of robots based on imitation learning and A. Bicchi, “Teleimpedance: Exploring the role of common-mode and
configuration-dependant stiffness,” in 2012 12th IEEE-RAS International
optimal control, which resulted in task- and dynamics-aware Conference on Humanoid Robots (Humanoids 2012). IEEE, 2012, pp.
impedance planning without an explicit time dependency. The 363–369.
effectiveness of the framework was evaluated by conducting [19] Y. Wu, F. Zhao, W. Kim, and A. Ajoudani, “An intuitive formulation of
a comparative experiment including three different cases for the human arm active endpoint stiffness,” Sensors, vol. 20, no. 18, p.
5357, 2020.
a door opening task, as an illustrative example of constrained [20] F. A. Mussa-Ivaldi, N. Hogan, and E. Bizzi, “Neural, mechanical,
and continuously interactive actions. Results provide evidence and geometric factors subserving arm posture in humans,” Journal of
on the superior performance of the proposed framework in Neuroscience, vol. 5, no. 10, pp. 2732–2743, 1985.
[21] V. M. Zatsiorsky and V. M. Zaciorskij, Kinetics of human motion.
terms of tracking error and interaction forces, through optimal Human Kinetics, 2002.
and adaptive planning of robot interaction parameters. [22] Y. Huang, L. Rozo, J. Silvério, and D. G. Caldwell, “Kernelized
movement primitives,” The International Journal of Robotics Research,
vol. 38, no. 7, pp. 833–852, 2019.
R EFERENCES [23] F. Caccavale, C. Natale, B. Siciliano, and L. Villani, “Six-dof impedance
[1] C. Ott, Cartesian impedance control of redundant and flexible-joint control based on angle/axis representations,” IEEE Transactions on
robots. Springer, 2008. Robotics and Automation, vol. 15, no. 2, pp. 289–300, 1999.
[2] J. Buchli, F. Stulp, E. Theodorou, and S. Schaal, “Learning variable [24] F. Ferraguti, C. Secchi, and C. Fantuzzi, “A tank-based approach to
impedance control,” The International Journal of Robotics Research, impedance control with variable stiffness,” in 2013 IEEE International
vol. 30, no. 7, pp. 820–833, 2011. Conference on Robotics and Automation. IEEE, 2013, pp. 4948–4953.

You might also like