Humanoids2020 A Framework For Autonomous Impedance Regulation of Robots Based On Imitation Learning and Optimal Control
Humanoids2020 A Framework For Autonomous Impedance Regulation of Robots Based On Imitation Learning and Optimal Control
Model Parameters
Fig. 2: The architecture of the proposed framework. The full impedance is regulated by an imitation learning algorithm (GMM/GMR) and
an optimal control method (LQR), which is then realised by the robot Cartesian impedance controller.
only predict the mean value but also covariance of the out- Improvement with Path Integrals (PI2 ) algorithm is proposed
put variables. In [11], the robot’s movement was encoded to accomplish variable impedance control for a given task. In
and reproduced by GMM/GMR. They consider the predicted [17], an approach called OutPut FeedBack Adaptive Dynamic
covariance as uncertainty in the environment and then use Programming (OPFB ADP) is proposed to solve the problem
its inverse as the stiffness in Cartesian impedance control by only using measured output data, considering that in
of robots. The same idea but with a minimal intervention most robot-environment interaction applications the dynamic
control framework was established in [12]. However, the processes are partially observable. A detailed discussion of the
inverse of the covariance embedded in multiple movement progress in RL for variable impedance control is beyond the
demonstrations doesn’t correspond to the physical meaning of scope of this article.
stiffness, and this framework will not work in the constrained In this work, we propose a framework that merges the
motion scenarios, such as door opening and valve turning. advantages of imitation learning and optimal control to achieve
Instead of extracting stiffness information from variability an adaptive and optimal interaction performance with a gen-
in multiple demonstrations, there are some works about es- eralization potential. Our aim is to transfer task stiffness
timating it based on interaction forces and tracking errors, geometry, provided by human demonstrations, to robots in
followed by imitation learning algorithms to reproduce the an autonomous and optimal way. The door opening has been
learned stiffness profiles online. The authors in [13] proposed chosen as a representative example of tasks with constrained
an approach to collaboratively assemble a wooden table by a and continuously interactive motion as shown in Fig. 1. Due
human subject and a robot. By taking into account the sensed to a potential instability of the system acting under variable
force information, a Weighted Least-Squares (WLS) method is impedance, we developed a tank-based passivity observer to
employed to estimate the stiffness. In [14], the authors used a monitor the stability of the system. In addition, to verify the
similar framework but adopted a sliding window technique to synergistic contribution of the imitation learning and optimal
carry out local stiffness estimation instead of WLS. However, control in this framework, two additional impedance settings
in this way, the position and force data are usually demon- were tested: one used GMR to reproduce the varying task
strated via kinesthetic teaching, which may cause discontinuity stiffness geometry but with a constant volume coefficient
in a continuous interaction process. It is not very reliable instead of LQR, the other one was the opposite, i.e., a constant
also due to the fact that the first estimation result of stiffness task stiffness geometry with LQR. The performances of the
usually does not comply with Symmetric Positive-Definite controllers under different impedance settings were compared
(SPD) constraints. A more direct way to teach robot compliant in terms of stiffness profiles, tracking error, and interaction
manipulation is through human-robot interaction proposed forces.
in [3]. However, for a task requiring continuously varying
impedance, this way may bring inconvenience to the teacher. II. F RAMEWORK
Some works consider impedance regulation as an optimal The overall framework is depicted in Fig. 2. Inspired by
control problem [15], [16]. In this field, most of the work the imitation learning approach, we attempt to learn the task
are based on Linear Quadratic Regulator (LQR) formulation, stiffness geometry from human demonstrations. In contrast
however, the choices of Q and R matrices play an important with the work in [12]–[14], we propose to utilize physically
role which are usually hard to design for individual tasks. In meaningful stiffness information, which is extracted directly
most cases, Q and R are fixed which results in a hard-coded from a human arm endpoint stiffness model. In this model, the
behavior. To achieve an online adaptation, adaptive optimal arm endpoint stiffness is decomposed into two components,
control or Reinforcement Learning (RL) can be employed. An i.e., the Configuration Dependent Stiffness (CDS) and the
example of this approach is presented in [2], where a Policy Common Mode Stiffness (CMS), as proposed in [18], [19].
3
where ξ ∈ Rd and p(ξ) ∈ R, respectively, represent the vector CDS into the product of a lower triangular matrix and its
of variables and joint probability distribution. πk ∈ R, µk ∈ transpose, and then we arrange its elements in a vector form.
Rd and Σk ∈ Rd×d represent the prior probability, mean and Next, the offline training phase takes the compact pose and
covariance of the k-th Gaussian component respectively, while the CDS vectors as the vector of variables ξ. Subsequently,
K ∈ R is the total number of Gaussian components. the GMR reproduces the CDS vector online by taking robot
The goal of the offline training phase is to maximize the end-effector pose as input, followed by a reverse process of
log likelihood function Eq. (6) with respect to the model Cholesky decomposition to construct CDS matrix whose SPD
parameters (πk , µk and Σk ), which results in an EM procedure property is guaranteed.
(E-step in (7) and M-step in (8) and (9)) to iteratively update
the model parameters until convergence B. Constructing full impedance through optimal control
N K
For the control of robots, we need to derive the full stiffness,
X X
ln p([ξ 1 , ξ 2 , . . . , ξ N ]) = ln πk N (ξ n |µk , Σk ) , and eventually the full impedance from CDS information. To
n=1 k=1
(6) take environment dynamics into account and construct the
full impedance from an optimal point of view, [12] proposed
πk N (ξ n |µk , Σk ) to formulate it as a LQR problem in which the predicted
γn,k = PK , (7)
covariance extracted from multiple movement demonstrations
j=1 πj N (ξ n |µj , Σj )
was adopted as the weighting matrix Q. This part follows
PN PN the same idea with the difference that instead of using the
γn,k γn,k ξ n predicted covariance, we are using the physically meaningful
πknew = n=1
, µnew
k
n=1
= PN
, (8)
N n=1 γn,k CDS formulated by the human arm endpoint stiffness model.
PN In the following, a brief description of LQR problem is
γn,k (ξ n − µnew )(ξ n − µnew )T
Σnew
k = n=1
PN
k k
. (9) given. Consider the discrete state space equation of a linear
n=1 γn,k environment system as:
Here, N ∈ R is the total number of data points in a training ζ n+1 = Aζ n + Bun , (14)
dataset and ξ n ∈ Rd is the n-th training data point.
> >
After modeling the joint probability distribution of the where ζ n = [x>n , ẋn ] ∈ R
2×p
is the system state, xn ∈ Rp
p
training data offline, to derive GMR, we use superscripts I and and ẋn ∈ R respectively represent the pose and velocity.
O to denote the dimensions of the input and output variables. un ∈ Rr is the system input, and A ∈ R(2×p)×(2×p) and B ∈
At iteration n, η In and η O
n respectively represent the input and
R(2×p)×r are the discrete state and input matrices respectively.
output variables. With this notation, the data point η n ∈ Rd , For the subsequent form of control input:
mean µk and covariance Σk of the k-th Gaussian component
un = K̂ n (x̂n − xn ) − D̂ n ẋn , (15)
can be decomposed as
II where x̂n ∈ Rp is the desired pose, K̂ n ∈ Rr×p and
ΣIO
I I
ηn µk Σk
ηn = O , µk = , Σ k = k
. (10) D̂ n ∈ Rr×p are complete stiffness and damping matrices esti-
ηn µO k ΣOI
k ΣOO
k mated by LQR with time-varying CDS. For an infinite-horizon,
In the online reproduction phase, the best estimation of discrete-time LQR, the solution is obtained by minimizing the
output η̂ O I
n for a given input η n is the mean µ̂n of the following cost function:
conditional probability distribution η̂ O I
n |η n ∼ N (µ̂n , Σ̂n ), ∞
X
with parameters which was reported in [22]: J= (x̂n − xn )> Qn (x̂n − xn ) + u>
n Run , (16)
K n=0
X
µ̂n = E(η̂ O I
n |η n ) = hk (η In )µk (η In ), (11) where Qn ∈ Rp×p and R ∈ Rr×r are the weighting matrix
k=1 and the time-varying Qn will be set to CDS in this framework.
where The feedback gain is updated by
πk N (η In |µIk , ΣII
k )
hk (η In ) = PK II
, (12) [K̂ n , D̂ n ] = (R + B > P B)−1 B > P A, (17)
I I
j=1 πj N (η n |µj , Σj )
where P ∈ R(2×p)×(2×p) is the unique positive definite
OI II −1
µk (η In ) = µO
k + Σk (Σk ) (η In − µIk ). (13) solution to the following Discrete time Algebraic Riccati
In this framework, only the mean value is used although Equation (DARE):
covariance can be derived as well. human arm endpoint pose P = A> A − A> P B(R + B > P B)−1 B > P A + Qn , (18)
and CDS are respectively considered as input and output of
GMM. Pose data can be represented in vector form while the (2×p)×(2×p) Qn 0p×p
where Qn ∈ R and Qn = .
CDS profile is a series of SPD matrices, it’s not possible to 0p×p 0p×p
apply GMM to encode it directly considering GMM can only In this framework, compared with the heuristic way to con-
deal with independent variables in vector form. In [14], two struct impedance in [11], LQR is a more systematic method,
different ways to represent stiffness matrix are provided. To which takes the environment dynamics into account and holds
make it simple, we use Cholesky decomposition to decompose a good balance between the tracking error and the control
5
input (which equals to the interaction force in the point view where ṽ ed ∈ R6 is the velocity error and it equals to v d − v e ,
of Cartesian impedance control). The environment dynamics v d ∈ R6 is the desired end-effector velocity, ped ∈ R3 is
is set to a double integrator system similar to [12]. In the the position error and the orientation error is represented in
following, we discuss how to implement the full impedance axis/angle oed ∈ R3 and oed = r sin(θ), R(θ, r) = Rd RTe .
profile on robots by using a task geometry consistent Cartesian Rd ∈ R3×3 and Re ∈ R3×3 are respectively the rotation
impedance controller whose choice of parameters is further matrices from the desired and actual end-effector frames to
discussed in Sec. III. base frame. D̂ n and K̂ n are the impedance parameters derived
from Eq. (17), with r = p = 6 in Cartesian space.
C. Robot controller By adding Eq. (19) and Eq. (22) to eliminate v̇ e as in [1],
the following control law is derived:
1) Task geometry consistent Cartesian impedance control:
At present, the most widely used Cartesian impedance con- p
F τ = Λ(q)v̇ d + µ(q, q̇)v d + D̂ n ṽ ed + K̂ n ed . (23)
troller for the new generation light-weight robots (such as oed
KUKA LWR and Franka Emika Panda) follows the Jacobian The joint torque command can be computed by following
transpose scheme, in which the measurement of external the Jacobian transpose scheme.
wrench is not necessary by avoiding the inertia shaping. A
systematic description of this scheme is reported in [1], where τ = J (q)T F τ
the analytic Jacobian is used due to the minimal representation T p
= J (q) (Λ(q)v̇ d + µ(q, q̇)v d + D̂ n ṽ ed + K̂ n ed ).
of the orientation. Such a representation is known to suffer oed
from singularity and the task geometry consistency problem, (24)
i.e., the equivalent rotational stiffness between the orientation If v d = 0, then the simplified task geometry consistent
displacement and the elastic moment, which depends on the Cartesian impedance controller is
desired end-effector orientation. This issue was first discussed
T p
in [23], in which an inverse dynamics strategy is employed τ = J (q) (−D̂ n J (q)q̇ + K̂ n ed ). (25)
oed
to implement Cartesian impedance control, resulting in the re-
quirement of measuring contact forces and moments. Although 2) Tank-based system passivity observer: Considering that
only the translational stiffness is modeled in Sec. II-A and the overall framework assumes a variable impedance control,
is dominant in most applications, the rotational stiffness still the passivity of the system and hence the stability may be lost
needs to be set when the robot executes a task. To eliminate if the impedance values change too fast. In this framework, a
the effects brought by task geometry consistency, a derivation tank-based approach proposed in [24] is used to monitor the
of task geometry consistent Cartesian impedance control is stability of the system. Formally, the extended dynamics is
performed by following the Jacobian transpose scheme. given by:
In contrast with [1], instead of representing the operational c p
space dynamic equation by using analytic Jacobian, here the Λ(q)ṽ˙ ed + (µ(q, q̇) + D̂ n )ṽ ed + K̂ n ed − ωxt = F ext ,
oed
geometric Jacobian J (q) ∈ R6×n is adopted to address σ > >
the task geometry consistency problem. Without an explicit ẋt = (ṽ ed D̂ n ṽ ed ) − ω ṽ ed .
xt
representation of Cartesian coordinates by using v e = J (q)q̇ (26)
c v c
and v̇ e = J˙(q)q̇ + J (q)q̈ and neglecting the gravity term for where K̂ n = K̂ n + K̂ n , K̂ n ∈ R6×6 is the constant stiffness
v
simplicity, the dynamic model for the manipulator in Cartesian term and K̂ n ∈ R6×6 is the time-varying stiffness. The scalar
space is xt ∈ R is the state associated with the tank and the tank energy
T ∈ R+ and T = 12 x2t . σ ∈ R and ω ∈ R6 respectively are
Λ(q)v̇ e + µ(q, q̇)v e = F τ − F ext , (19) v
K̂ [p> ,o> >
( (
1 if T ≤ T̄u ed ]
− n ed if T ≥ T̄l
where v e ∈ R6 is the end-effector velocity, q ∈ Rn is the σ= , ω= x t .
joint coordinates. F τ ∈ R6 and F ext ∈ R6 are respectively 0 otherwise 0 otherwise
the Cartesian control input and external wrench. Λ(q) ∈ R6×6 (27)
and µ(q, q̇) ∈ R6×6 are the Cartesian inertia and Corio- Here, T̄u ∈ R+ is a suitable, application dependent, upper
lis/centrifugal matrices respectively bound that avoid excessive energy storing in the tank while
T̄l ∈ R+ is a lower bound below, which energy cannot be
extracted by the tank for avoiding singularities in Eq. (26)
Λ(q) = J (q)−T M (q)J (q)−1 , (20) v
and thus the time-varying stiffness K̂ n will be removed. For
µ(q, q̇) = J (q)−T (C(q, q̇) − M (q)J (q)−1 J˙(q))J (q)−1 , a detailed analysis of the system passivity, please refer to [24].
(21)
where M (q) and C(q, q̇) are respectively the joint space III. E XPERIMENTS AND RESULTS
inertia and Coriolis/centrifugal matrices. The desired Cartesian The experimental setup is shown in Fig. 1. On the human
impedance relationship derived from an energy-based argu- side of this setup, the MVN Biomech suit (Xsens Tech)
ment when considering task geometry consistency is [23]: was used for collecting the 3D position information required
to construct CDS patterns and to train the GMM. On the
p
Λ(q)ṽ˙ ed + (µ(q, q̇) + D̂ n )ṽ ed + K̂ n ed = F ext , (22) robot side of this setup, the torque-controlled Franka Emika
oed
6
Panda arm was used, which was equipped with the Pisa/IIT
SoftHand for grasping of the door handle. For the door opening Case 1
Case 2
task, the significant motion and interaction happened on X-Y
Case 3
plane. Therefore, in the following, all the stiffness profiles,
trajectories and forces data are illustrated on X-Y plane.
As shown in Fig. 4, the subject repeated the door opening
task for five times. The black lines represent human arm
endpoint trajectories, while the green ellipses represent the
demonstrated human arm endpoint stiffness profile. The CDS
part was constructed according to the collected 3D position
data of human hand, elbow, and shoulder at 100Hz with model
parameters (α1 = 0.2450, α2 = 2.7118), which was reported
Fig. 6: Tank energy variations in Case 1 (blue), Case 2 (red) and
in [19]. The CMS value was selected experimentally to match Case 3 (yellow). The lower bound was set T̄l = 0.2J.
the range of trajectories, since it was neither used in the
training of the GMM nor for reproduction by GMR. Since the main components of the proposed framework (i.e., imitation
motion was constrained, the demonstrated trajectories didn’t learning and optimal control), a comparative experiment was
show much variability, which means that the method proposed conducted under three different cases.
in [12] would fail to provide meaningful stiffness profiles Case 1: This case implements the proposed framework: The
along the trajectory. CDS extracted from human demonstrations was encoded by
After constructing the input (endpoint trajectory) and output GMM offline and reproduced by GMR online (current end-
(endpoint stiffness) data of the GMM, an EM training process effector position as the input). For LQR, the weighting matrix
was performed to estimate the model parameters. The repro- Q was set to CDS and R was set to 1 × 10−6 I (I is identity
duction of CDS by GMR was simulated by taking a planned matrix), which was decided by limiting the resulted largest
circular trajectory as input. The results are depicted in Fig. 5. stiffness within the stiffness range of Franka Emika Panda
robot.
Case 2: In this case, the process of CDS information was
the same as in Case 1. But instead of using LQR, a constant
CMS was set to 1040.37 which is decided by keeping average
area of stiffness ellipses on X-Y plane along the trajectory is
the same in Case 1 and Case 2. The damping term was set to
guarantee a damping ratio of √12 .
Case 3: Here, instead of encoding and reproducing the time-
varying CDS by GMM/GMR, a constant CDS was used, which
equals to the value of the generated CDS at the starting point
in Case 1, and then a LQR procedure was followed.
In all above cases, the desired path was generated by
the same circular trajectory planner, and the environment
dynamics was set to a double integrator system. The tank-
Fig. 4: Human arm endpoint trajectories (black lines) and stiffness
profiles (green ellipses) during the demonstration phase. The task based stability observer was used to monitor system passivity.
stiffness was calculated by the model (resulting by Eq. (2)) and only As shown in Fig. 6, the tank energy was above the lower
CDS was extracted from human demonstrations. bound (T̄l = 0.2J) in all three cases, which means that the
full stiffness including the constant (i.e. the average stiffness
along the whole path was considered as the constant stiffness
in each case) and varying parts was realised. The resulted
trajectories and the stiffness profiles of Franka Emika Panda
in three cases are shown in Fig. 7. Compared with Case
1, the stiffness ellipses in Case 2 without the LQR process
were more elongated, which may cause large interaction force
when the door is in the middle. The stiffness ellipses in
Case 3, which used a constant CDS with the LQR process
demonstrated a more isotropic shape in comparison to those
in Case 2. However, due to the use of a constant CDS, although
the resulted stiffness matched the task requirements in the
beginning of the trajectory, however, it didn’t yield a desired
stiffness ellipse geometry in the end.
Fig. 5: CDS reproduction (green ellipses) based on a planned circular Due to the difference in impedance behaviors in the above
trajectory (black line) as input. three cases, the trajectory tracking and the interacting perfor-
To reveal the effectiveness and the added value of the two mance of the robot were different. The resulting tracking errors
7
Case 1
Case 2
Case 3
Case 1
Case 1
Case 2
Case 3
Case 2