Event-Triggered Optimal Control With Performance Guarantees Using Adaptive Dynamic Programming
Abstract—This paper studies the problem of event-triggered optimal control (ETOC) for continuous-time nonlinear systems and proposes a novel event-triggering condition that enables designing ETOC methods directly based on the solution of the Hamilton–Jacobi–Bellman (HJB) equation. We provide formal performance guarantees by proving a predetermined upper bound. Moreover, we also prove the existence of a lower bound for the interexecution time. For implementation purposes, an adaptive dynamic programming (ADP) method is developed to realize the ETOC using a critic neural network (NN) to approximate the value function of the HJB equation. Subsequently, we prove that semiglobal uniform ultimate boundedness can be guaranteed for states and NN weight errors with the ADP-based ETOC. Simulation results demonstrate the effectiveness of the developed ADP-based ETOC method.

Index Terms—Adaptive dynamic programming (ADP), event-triggered, neural network (NN), optimal control, performance guarantee.

I. INTRODUCTION

…with a sequence of Lyapunov equations in [2]. This solution requires knowledge of the complete system dynamics. In [9], this requirement is avoided with an online policy iteration algorithm, which can be implemented with an actor–critic neural network (NN) structure. In [11], an off-policy learning method was developed, where system data can be generated by arbitrary control policies. In [13], an ADP method was proposed for nonaffine nonlinear systems with unknown dynamics. To relax the persistence-of-excitation condition, a model-based reinforcement learning method was suggested in [16], using a concurrent-learning-based system identifier to simulate experience. Note that these works are mainly time-triggered approaches, where the controller needs to stay active at all time instants, which can be expensive in terms of computational and communication overhead.

In many practical applications, it is often unnecessary and/or infeasible to update the controller at every time instant. Event-triggered control [19]–[21] is a promising
LUO et al.: ETOC WITH PERFORMANCE GUARANTEES USING ADAPTIVE DYNAMIC PROGRAMMING 77
the optimal performance will degrade to some extent. Hence, ETOC essentially provides a tradeoff between performance and resource usage. A natural question, then, is how much performance degradation an ETOC method causes. In [22] and [30], the real performance index was analyzed for event-triggered control. However, the obtained real performance index contains an integral term, and its boundedness was not analyzed. In this paper, an ETOC method is developed by proposing a novel event-triggering condition, which guarantees a predetermined upper bound for the performance index. Moreover, the stability and the lower bound of the interexecution times of the ETOC are analyzed theoretically. For realization purposes, ADP is employed to implement the ETOC by using a critic NN to estimate the solution of the HJB equation. It is proven that semiglobal uniform ultimate boundedness (SGUUB) can be guaranteed for the states and NN weight errors with the ADP-based ETOC. Furthermore, the bounds for the performance and the interexecution time are also analyzed for the ADP-based ETOC.

The rest of this paper is arranged as follows. The problem description for the ETOC is given in Section II. An ETOC method is proposed, and its stability, the upper bound of the performance index, and the lower bound of the interexecution times are analyzed theoretically in Section III. An ADP-based ETOC method is developed with theoretical analysis in Section IV. Simulation results are presented in Section V, and brief conclusions are given in Section VI.

Notations: Rⁿ is the n-dimensional Euclidean space and ‖·‖ denotes its norm. N denotes the set of nonnegative integers. The superscript ᵀ is used for the transpose of a matrix or vector. ∇_x ≜ ∂/∂x denotes the gradient operator. For a symmetric matrix M, M > (≥) 0 means that it is a positive definite (semidefinite) matrix. ‖v‖²_M ≜ vᵀMv for a real vector v and a symmetric matrix M > (≥) 0 with appropriate dimensions. X and U represent two compact sets. C¹(X) is the space of functions on X whose first derivatives are continuous. σ̄(·) and σ̲(·) denote the maximum and minimum singular values. For a vector x(t), x⁻(t) ≜ lim_{ε→0} x(t − ε).

II. PROBLEM DESCRIPTION

Consider the following continuous-time nonlinear system:

ẋ(t) = f(x(t)) + g(x(t))u(t),  x(0) = x0   (1)

where x = [x1, …, xn]ᵀ ∈ X ⊂ Rⁿ is the state, x0 is the initial state, and u = [u1, …, um]ᵀ ∈ U ⊂ Rᵐ is the control input. Assume that f(x) + g(x)u(t) is Lipschitz continuous on X, which contains the origin, that f(0) = 0, and that the system is stabilizable on X, i.e., there exists a continuous control function such that the system is asymptotically stable on X.

Consider the following infinite-horizon performance index:

J(x0, u) = ∫₀^∞ [Q(x(t)) + ‖u(t)‖²_R] dt   (2)

where R > 0 and Q(x) is a positive definite function, i.e., Q(x) > 0 for all x ≠ 0 and Q(0) = 0. The time-triggered optimal control problem aims to design a control u(t) = u∗(x) such that the system (1) is closed-loop asymptotically stable and the performance index (2) is minimized. The time-triggered optimal control is then given by

u∗(x) = arg min_u J(x0, u).   (3)

In this paper, the ETOC problem is considered, where the system information is transmitted only when the event-triggering condition is violated. The triggering instants form a monotonically increasing sequence {t_j} determined by the event-triggering condition, where t_0 = 0, t_j < t_{j+1}, j ∈ N. Thus, the interexecution time of the event-triggered control is defined as

h_j ≜ t_{j+1} − t_j.   (4)

For the ETOC problem, the system information is transmitted to the controller only at the event-triggered instants, and the control signal is held constant in a zero-order-hold scheme during the time interval [t_j, t_{j+1}).

In this paper, the study of the ETOC problem aims to achieve the following three goals: first, a predetermined upper bound can be guaranteed for the real performance index; second, there exists a lower bound for the interexecution time; and third, the stability of the closed-loop system with the ETOC can be guaranteed.

III. EVENT-TRIGGERED OPTIMAL CONTROL WITH PERFORMANCE GUARANTEE

In this section, the ETOC method is developed and its theoretical analysis is provided. First, the ETOC is designed directly based on the HJB equation, and a novel event-triggering condition is proposed. Based on the developed ETOC, the stability and the performance bound are proved for the closed-loop system. Subsequently, it is proven that there exists a lower bound for the interexecution times.

A. Event-Triggered Optimal Control Design

First, let us consider the time-triggered optimal control of the system (1) with the performance index (2). For an admissible control u(x), its value function is defined as

V_u(x) = ∫_t^∞ [Q(x(τ)) + ‖u(x(τ))‖²_R] dτ   (5)

for all x(t) = x ∈ X, where V_u(x) ∈ C¹(X), V_u(x) ≥ 0, and V_u(0) = 0. For a value function V_u(x) ∈ C¹(X), its Hamiltonian is defined as

H(x, u(x), ∇_x V_u(x)) = [∇_x V_u(x)]ᵀ[f(x) + g(x)u(x)] + Q(x) + ‖u(x)‖²_R.   (6)

By using (6), differentiating (5) with respect to t yields

H(x, u(x), ∇_x V_u(x)) = 0.   (7)

For the optimal control u∗(x) in (3), denote its optimal value function as V∗(x) ≜ V_{u∗}(x). Then, it follows from (2) that the optimal performance index is given by

J∗(x0) ≜ min_u J(x0, u) = V∗(x0).   (8)
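For concreteness, here is a worked scalar instance of (1)–(3) and (8); this is an illustrative example added here, not one taken from the paper. With f(x) = ax, g(x) = b, Q(x) = qx², and R = r > 0, the optimal value function is quadratic and everything is available in closed form:

```latex
% Illustrative scalar example (assumed here): f(x)=ax, g(x)=b, Q(x)=qx^2, R=r.
\begin{align*}
\dot x &= a x + b u, \qquad
  J(x_0,u) = \int_0^{\infty} \bigl(q x^2 + r u^2\bigr)\,dt,\\
V^*(x) &= p\,x^2, \qquad \text{where } \tfrac{b^2}{r}\,p^2 - 2 a p - q = 0,
  \quad p = \frac{r\bigl(a + \sqrt{a^2 + b^2 q / r}\bigr)}{b^2} > 0,\\
u^*(x) &= -\tfrac{1}{2} R^{-1} g^{\mathsf T}\,\nabla_x V^*(x) = -\frac{p b}{r}\,x,
  \qquad J^*(x_0) = V^*(x_0) = p\,x_0^{2}.
\end{align*}
```

Substituting V∗(x) = px² into the HJB equation that appears later as (12) reduces it term by term to the quadratic (Riccati) equation above, which is how the positive root p is obtained.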
78 IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, VOL. 31, NO. 1, JANUARY 2020
From optimal control theory [1], V∗(x) satisfies the following HJB equation:

min_u H(x, u(x), ∇_x V∗(x)) = 0.   (9)

Then, ∇_u H(x, u(x), ∇_x V∗(x)) = 0, that is,

u∗(x) = −(1/2) R⁻¹ gᵀ(x) ∇_x V∗(x).   (10)

Substituting (10) into (9) gives the HJB equation

H(x, u∗(x), ∇_x V∗(x)) = 0   (11)

which can be rewritten as

∇_xᵀ V∗(x) f(x) + Q(x) − (1/4) ∇_xᵀ V∗(x) g(x) R⁻¹ gᵀ(x) ∇_x V∗(x) = 0   (12)

where V∗(x) ∈ C¹(X), V∗(x) ≥ 0, and V∗(0) = 0.

Now we are ready to give the ETOC. Based on (10) and using V∗(x), the following ETOC is proposed:

μ(x̂) = u∗(x̂) = −(1/2) R⁻¹ gᵀ(x̂) ∇_x V∗(x̂)   (13)

where x̂(t) denotes the state available to the controller, defined as

x̂(t) = x(t) for t = t_j,  and  x̂(t) = x(t_j) for t ∈ (t_j, t_{j+1})   (14)

for all j ∈ N. Note that the ETOC (13) depends on the solution V∗(x) of the HJB equation (12).

Remark 1: The time-triggered optimal control u∗(x) requires the system state x at all time instants. In contrast, in the ETOC (13), the system state is transmitted to the controller only at the triggering instants and is held constant during the execution interval, which greatly reduces computational and communication resources.

The error between the available state x̂(t) and the true state x(t) is defined as

e(t) ≜ x̂(t) − x(t).   (15)

With the ETOC (13), it follows from (1) and (15) that the closed-loop system is given by

ẋ = f(x) + g(x)μ(x̂)
  = f(x) − (1/2) g(x) R⁻¹ gᵀ(x̂) ∇_x V∗(x̂)
  = f(x) − (1/2) g(x) R⁻¹ gᵀ(x + e) ∇_x V∗(x + e).   (16)

To determine the release instant t_j, a novel event-triggering condition is proposed as follows:

Cα(x, x̂) < 0   (17)

where

Cα(x, x̂) ≜ (1 + α) ∇_xᵀ V∗(x)[f(x) + g(x)μ(x̂)] + Q(x) + ‖μ(x̂)‖²_R   (18)

with α > 0 a pregiven constant that determines an upper bound on the performance index with the ETOC (13). Once the triggering condition (17) is violated, the current state x(t) is transmitted to update the controller. Thus, it follows from (17) that the next release instant t_{j+1} is given by

t_{j+1} = inf{t | Cα(x(t), x̂(t)) ≥ 0, t ≥ t_j}   (19)

where t_0 = 0.

B. Stability and Performance Guarantee

With the time-triggered optimal control (10), the performance index (2) is minimized, i.e., the optimal performance index J∗(x0) is obtained. In the ETOC (13), only the states at the event-triggered instants {t_j} are available for control updates. That is, the performance index is bound to degrade to some extent: the optimal performance index cannot be achieved whenever an event-triggering scheme is employed. Thus, it is necessary to analyze how much the performance index degrades for an event-triggered control method. For the proposed event-triggering condition (17), we will show in Theorem 1 that an upper bound on the performance index with the ETOC (13) can be predetermined by choosing the parameter α. Moreover, the stability of the closed-loop system (16) will also be proved in Theorem 1.

Theorem 1: Consider the closed-loop system (16) with the triggering condition (17), where the triggering instant sequence {t_j} is determined by (19). Then:
1) the closed-loop system (16) is asymptotically stable;
2) there exists an upper bound on the real performance index J(x0, μ), i.e., J(x0, μ) ≤ (1 + α)J∗(x0).

Proof: 1) Choose V∗(x) as the Lyapunov function. Based on (17) and (19), taking the derivative along (16) yields

V̇∗(x) = ∇_xᵀ V∗(x)[f(x) + g(x)μ(x̂)]
      = (1/(1 + α)) [Cα(x, x̂) − Q(x) − ‖μ(x̂)‖²_R]
      ≤ −(1/(1 + α)) [Q(x) + ‖μ(x̂)‖²_R]
      ≤ 0   (20)

which means that the closed-loop system (16) is asymptotically stable.

2) According to (20), we have

Q(x) + ‖μ(x̂)‖²_R ≤ −(1 + α) V̇∗(x)   (21)

for t ∈ [t_j, t_{j+1}) and all j. From part 1), the closed-loop system (16) is asymptotically stable, which means that lim_{t→∞} x(t) = 0. Then, based on (2), (8), and (21), we have

J(x0, μ) = ∫₀^∞ [Q(x(t)) + ‖μ(x̂(t))‖²_R] dt
        ≤ −(1 + α) ∫₀^∞ dV∗(x(t))
        = −(1 + α) V∗(x(t))|₀^∞
        = (1 + α) V∗(x0)
        = (1 + α) J∗(x0).

This completes the proof.

From Theorem 1, it is observed that, given the parameter α, the real performance index J(x0, μ) is upper
bounded by the predetermined value (1 + α)J∗(x0). In Corollaries 1 and 2, we analyze the influence of the parameter α in the event-triggering condition (17) on the performance.

Corollary 1: Consider the closed-loop system (16) with the triggering condition (17), where the sequence of interexecution times {h_j} is implicitly determined by (19). If α = 0, then h_j = 0 for all j and J(x0, μ) = J∗(x0).

Proof: Based on the HJB equation (12), we have

∇_xᵀ V∗(x) f(x) + Q(x) = ‖u∗(x)‖²_R.   (22)

With the condition α = 0, it follows from (18) and (22) that

Cα(x, x̂) = ∇_xᵀ V∗(x)[f(x) + g(x)μ(x̂)] + Q(x) + ‖μ(x̂)‖²_R
         = ‖u∗(x)‖²_R − 2[u∗(x)]ᵀ R μ(x̂) + ‖μ(x̂)‖²_R
         = ‖u∗(x) − μ(x̂)‖²_R ≥ 0

for all t ≥ t_j and all j. Thus, it follows from (19) that t_{j+1} = t_j, i.e., h_j = 0 for all j.

On the one hand, it follows from α = 0 and part 2) of Theorem 1 that J(x0, μ) ≤ J∗(x0). On the other hand, J∗(x0) is the minimum performance, which means that J(x0, μ) ≥ J∗(x0). Thus, we have J(x0, μ) = J∗(x0).

Corollary 2: Let α1 ≥ α2 > 0. Consider the closed-loop system (16) and the triggering condition (17) with parameters α1 and α2, and let h_{α1} and h_{α2} be the interexecution times associated with α1 and α2, respectively. Then h_{α1} ≥ h_{α2}.

Proof: For t ∈ [0, h_{α2}], it follows from (18) and (19) that

C_{α1}(x, x̂) − C_{α2}(x, x̂) = (α1 − α2) ∇_xᵀ V∗(x)[f(x) + g(x)μ(x̂)]
 = ((α1 − α2)/(1 + α2)) [C_{α2}(x, x̂) − Q(x) − ‖μ(x̂)‖²_R]
 ≤ −((α1 − α2)/(1 + α2)) [Q(x) + ‖μ(x̂)‖²_R] ≤ 0

which means C_{α1}(x, x̂) ≤ C_{α2}(x, x̂). Thus, h_{α1} ≥ h_{α2}.

With the parameter α = 0, it is observed from Corollary 1 that the ETOC degrades into traditional time-triggered optimal control, i.e., h_j = 0, and the optimal performance J∗(x) is achieved. From Corollary 2, a larger α results in a larger interexecution time, so more computational and communication resources can be saved. That is, α is a tuning parameter between the ETOC and the time-triggered optimal control, which achieves a tradeoff between the optimal performance index and the reduction of resource usage. The choice of α depends on practical requirements: if designers emphasize optimizing the performance index, a small α can be used; otherwise, a large α can be applied if designers emphasize reducing computational and communication resources.

C. Lower Bound of Interexecution Times

In this section, it is proven that there exists a lower bound for the interexecution time h_j. Before starting, the following assumptions are required.

Assumption 1: Assume that l1‖x‖² ≤ Q(x) ≤ l2‖x‖² and l3‖x‖ ≤ ‖u∗(x)‖ ≤ l4‖x‖, where l1, l2, l3, l4 > 0.

Assumption 2: Assume that u∗(x) is Lipschitz continuous, i.e., for all x1, x2 ∈ X, there exists l_u > 0 such that ‖u∗(x1) − u∗(x2)‖ ≤ l_u‖x1 − x2‖.

Theorem 2: Consider the closed-loop system (16) with the triggering condition (17), where the triggering instant t_j is determined by (19) and the interexecution time h_j is defined by (4). Let Assumptions 1 and 2 hold. Then, there exists a lower bound h̲ > 0 for h_j, i.e., h_j ≥ h̲ for all j.

Proof: Based on the HJB equation (12),

∇_xᵀ V∗(x) f(x) = −Q(x) + ‖u∗(x)‖²_R.   (23)

According to Assumption 2, it follows from (10) and (13) that

‖μ(x̂) − u∗(x)‖ = ‖u∗(x̂) − u∗(x)‖ ≤ l_u‖x̂ − x‖ = l_u‖e‖.   (24)

Based on (23), (24), and Assumption 1, rewrite (18) as

Cα(x, x̂) = (1 + α) ∇_xᵀ V∗(x)[f(x) + g(x)μ(x̂)] + Q(x) + ‖μ(x̂)‖²_R
 = (1 + α)[−Q(x) + ‖u∗(x)‖²_R − 2μᵀ(x̂) R u∗(x)] + Q(x) + ‖μ(x̂)‖²_R
 = −α Q(x) + (1 + α)‖μ(x̂) − u∗(x)‖²_R − α‖μ(x̂)‖²_R
 ≤ −α Q(x) + (1 + α)‖μ(x̂) − u∗(x)‖²_R
 ≤ −α l1‖x‖² + l_u²(1 + α) σ̄(R) ‖e‖².   (25)

Let β > t_j satisfy the following equation:

−α l1‖x(β)‖² + l_u²(1 + α) σ̄(R) ‖e(β)‖² = 0.   (26)

According to (19), (25), and (26), we have

t_{j+1} ≥ β.   (27)

Define the notation

s(t) ≜ ‖e(t)‖ / ‖x(t)‖.   (28)

By using (28), dividing both sides of (26) by ‖x(β)‖² yields

−α l1 + l_u²(1 + α) σ̄(R) s²(β) = 0.   (29)

The quadratic equation (29) has two distinct solutions, s1(β) = λ1 and s2(β) = λ2, where λ1 = (α l1 / (l_u²(1 + α) σ̄(R)))^{1/2} and λ2 = −(α l1 / (l_u²(1 + α) σ̄(R)))^{1/2}. Noting that s1(β) > 0 and s2(β) < 0, we take

s(β) = λ1.   (30)

Based on (28), let us consider

ṡ(t) = d/dt (‖e(t)‖ / ‖x(t)‖)
 = (‖x(t)‖ ‖e(t)‖⁻¹ eᵀ(t) ė(t)) / ‖x(t)‖² − (‖e(t)‖ ‖x(t)‖⁻¹ xᵀ(t) ẋ(t)) / ‖x(t)‖²
 ≤ (‖x(t)‖ ‖e(t)‖⁻¹ ‖e(t)‖ ‖ė(t)‖) / ‖x(t)‖² + (‖e(t)‖ ‖x(t)‖⁻¹ ‖x(t)‖ ‖ẋ(t)‖) / ‖x(t)‖²
 = (‖x(t)‖ ‖ė(t)‖ + ‖e(t)‖ ‖ẋ(t)‖) / ‖x(t)‖².   (31)
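From (31), one standard route to the claimed lower bound is a comparison-lemma argument. What follows is a hedged sketch, not the paper's own completion of the proof: the constants k1 and k2 are introduced here for illustration, under the additional assumptions that ‖f(x)‖ ≤ l_f‖x‖ and ‖g(x)‖ ≤ g_M on X.

```latex
% Sketch under the assumed bounds \|f(x)\| \le l_f\|x\| and \|g(x)\| \le g_M on X.
\begin{align*}
\dot s &\le \frac{\|x\|\,\|\dot e\| + \|e\|\,\|\dot x\|}{\|x\|^{2}}
        = (1+s)\,\frac{\|\dot x\|}{\|x\|}
        \qquad \bigl(\dot e = -\dot x \text{ on } [t_j, t_{j+1})\bigr),\\
\|\dot x\| &\le l_f\|x\| + g_M\,l_4\|\hat x\|
            \le (l_f + g_M l_4)\,\|x\| + g_M l_4\,\|e\|,\\
\Rightarrow\quad \dot s &\le (1+s)\,(k_1 + k_2\, s),
   \qquad k_1 \triangleq l_f + g_M l_4,\quad k_2 \triangleq g_M l_4 .
\end{align*}
```

Since s(t_j) = 0 after each transmission, the comparison lemma then gives s(t) ≤ φ(t − t_j), where φ̇ = (1 + φ)(k1 + k2 φ) with φ(0) = 0, so every interexecution time satisfies h_j ≥ φ⁻¹(λ1) > 0, a bound independent of j.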
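To make the α tradeoff of Corollaries 1 and 2 and the performance bound of Theorem 1 concrete, here is a self-contained numerical sketch on the scalar linear–quadratic problem ẋ = x + u with Q(x) = x² and R = 1, where V∗(x) = px² is available in closed form. All numerical choices below are illustrative assumptions made here; they are not the paper's simulation cases.

```python
import math

def simulate_etoc(alpha, a=1.0, b=1.0, q=1.0, r=1.0, x0=1.0, dt=1e-3, T=5.0):
    """Event-triggered LQR for the scalar system dx/dt = a*x + b*u.

    For this case the HJB equation (12) reduces to a Riccati equation,
    so V*(x) = p*x**2 and u*(x) = -(p*b/r)*x are exact.
    """
    p = r * (a + math.sqrt(a * a + b * b * q / r)) / (b * b)  # positive root
    k = p * b / r
    x, xh = x0, x0            # xh is the zero-order-held state \hat{x}
    events, cost = 0, 0.0
    for _ in range(int(T / dt)):
        u = -k * xh           # ETOC (13), computed from the held state
        # Triggering function (18): (1+alpha)*dV*/dx*(f+g*u) + Q + ||u||_R^2
        c = (1 + alpha) * 2 * p * x * (a * x + b * u) + q * x * x + r * u * u
        if c >= 0:            # release rule (19): transmit and update
            xh, events = x, events + 1
            u = -k * xh
        cost += (q * x * x + r * u * u) * dt   # running performance index (2)
        x += (a * x + b * u) * dt              # forward-Euler step
    return x, events, cost, p

# A larger alpha triggers less often (Corollary 2) but loosens the guaranteed
# bound (1+alpha)*J*(x0) of Theorem 1.
for alpha in (0.1, 1.0):
    xT, n, J, p = simulate_etoc(alpha)
    print(f"alpha={alpha}: events={n}, J={J:.3f}, bound={(1 + alpha) * p:.3f}")
```

In this sketch the realized cost stays below (1 + α)J∗(x0) for both values of α, while the number of transmissions drops as α grows, matching the tradeoff discussed above.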
Then, it is desired to tune θ such that the squared residual error (42) is minimized and the system stability can be guaranteed. Therefore, the following tuning rule is developed:

θ̇ = −β σ(x) ξ(x, θ) γ(x) + κ ∇_x L(x) g(x) R⁻¹ gᵀ(x) ∇_x P(x)   (43)

where β > 0 is an adaptive gain and

γ(x) ≜ ∇_x L(x)[f(x) + g(x)μ(x)]   (44)
σ(x) ≜ 1 / (1 + γᵀ(x)γ(x))   (45)
κ ≜ 0 if ∇_xᵀ P(x)[f(x) + g(x)μ(x̂)] ≤ 0, and κ ≜ 0.5 otherwise.   (46)

The key advantage of (46) is that it avoids requiring an initial admissible control policy, and thus the initial NN weights θ(0) of (43) can be selected randomly. Similar techniques have also been applied and analyzed in related ADP works [33], [34].

Remark 3: It is necessary to give a brief description of the implementation of the ADP-based ETOC. First, the critic NN weight θ is computed with (43) using x, and the event-triggering condition (39) is checked using x and x̂. Then, when the event-triggering condition is violated, i.e., Cα(x, x̂) ≥ 0, x̂ and θ̂ are transmitted to compute the control with (37), which is held constant in a zero-order-hold scheme.

B. Theoretical Analysis

In this section, theoretical analyses are provided for the ADP-based ETOC method (37), including the stability, the performance bound, and the interexecution time bound. The proof of system stability in Theorem 3 is in part inspired by the works [32], [34], [36]. Before starting, the definition of SGUUB and Assumption 4 are given as follows.

Definition 1 [37]: Consider the system (16). The solution x(t) is SGUUB if, for all x(t_0) = x0 ∈ X, there exist positive constants μ1 and T(μ1, x0) such that ‖x(t)‖ < μ1 for all t > t_0 + T.

Assumption 4: Assume that:
1) θ∗ is bounded, i.e., ‖θ∗‖ ≤ θ_M, where θ_M > 0;
2) f(x) is Lipschitz continuous, i.e., for all x1, x2 ∈ X, there exists l_f > 0 such that ‖f(x1) − f(x2)‖ ≤ l_f‖x1 − x2‖; f(x) and g(x) are bounded on the compact set X, i.e., ‖f‖ ≤ f_M and ‖g‖ ≤ g_M, where f_M, g_M > 0;
3) L(x) and ∇_x L(x) are bounded on the compact set X, i.e., ‖L‖ ≤ ℓ_M and ‖∇_x L‖ ≤ d_M, where ℓ_M, d_M > 0;
4) ∇_x ε(x) is Lipschitz continuous, i.e., for all x1, x2 ∈ X, there exists l_d > 0 such that ‖∇_x ε(x1) − ∇_x ε(x2)‖ ≤ l_d‖x1 − x2‖; ε(x) and ∇_x ε(x) are bounded on the compact set X, i.e., ‖ε‖ ≤ ε_M and ‖∇_x ε‖ ≤ ε_{xM}, where ε_M, ε_{xM} > 0;
5) γ(x) is bounded on the compact set X, i.e., γ_m ≤ ‖γ(x)‖ ≤ γ_M, where γ_m, γ_M > 0.

Define the critic NN weight error as θ̃(t) ≜ θ(t) − θ∗. Then, it follows from (43) that

θ̃̇(t) = θ̇(t).   (47)

Theorem 3: Consider the system (1) with the control (37), where the triggering condition is given by (17) with (39). Let Assumptions 1–4 hold. Then, the signals x(t), x̂(t), and θ̃(t) are SGUUB.

Proof: Choose the following Lyapunov function candidate:

L(t) = V∗(x) + P(x) + V∗(x̂) + V_θ̃(θ̃)   (48)

where V_θ̃(θ̃) ≜ (1/2) θ̃ᵀθ̃. First, let us consider the stability of the flow dynamics on t ∈ [t_j, t_{j+1}) for all j. Taking the derivative along (16) and (47), we have

L̇(t) = V̇∗(x) + Ṗ(x) + V̇∗(x̂) + θ̃ᵀθ̃̇.   (49)

We consider each part of (49) separately as follows. From (35), we have

V∗(x) = Lᵀ(x)θ∗ + ε(x) = Lᵀ(x)(θ − θ̃) + ε(x) = V̂(x) − Lᵀ(x)θ̃ + ε(x).   (50)

Then, by using Assumptions 1 and 4, it follows from (50) that

V̇∗(x) = ∇_xᵀ V∗(x)[f(x) + g(x)μ(x̂)]
 = ∇_xᵀ V̂(x)[f(x) + g(x)μ(x̂)] − [θ̃ᵀ ∇_x L(x) − ∇_xᵀ ε(x)][f(x) + g(x)μ(x̂)]
 ≤ −(1/(1 + α))[Q(x) + ‖μ(x̂)‖²_R] + ‖θ̃ᵀ ∇_x L(x) − ∇_xᵀ ε(x)‖ ‖f(x) + g(x)μ(x̂)‖
 ≤ −(1/(1 + α))(l1‖x‖² + l3²‖x̂‖²) + (d_M‖θ̃‖ + ε_{xM})(f_M + g_M l4‖x̂‖).   (51)

The derivative of P(x) is given by

Ṗ(x) = ∇_xᵀ P(x)[f(x) + g(x)μ(x̂)]
     = ∇_xᵀ P(x)[f(x) + g(x)μ(x)] + ∇_xᵀ P(x) g(x)[μ(x̂) − μ(x)].   (52)

According to the definition of x̂(t) in (14), x̂(t) remains invariant on t ∈ [t_j, t_{j+1}). Thus,

V̇∗(x̂) = 0.   (53)

Based on (43) and (47), we get

V̇_θ̃(θ̃) = θ̃ᵀθ̃̇ = −β σ(x) ξ(x, θ) θ̃ᵀ γ(x) + κ θ̃ᵀ ∇_x L(x) g(x) R⁻¹ gᵀ(x) ∇_x P(x).   (54)

With Assumptions 1 and 4, the first term in (54) satisfies

−β σ(x) ξ(x, θ) θ̃ᵀ γ(x)
 = −β [γᵀ(x)γ(x) θ̃ᵀ(θ∗ + θ̃)] / (1 + γᵀ(x)γ(x)) − β [(Q(x) + ‖μ(x)‖²_R) θ̃ᵀγ(x)] / (1 + γᵀ(x)γ(x))
 ≤ −(β γ_m² ‖θ̃‖(θ_M + ‖θ̃‖)) / (1 + γ_M²) − (β γ_m (l1 + l3)‖x‖²) / (1 + γ_M²).   (55)

Based on the definition of κ in (46), there are two cases.
Case 1: If ∇_xᵀ P(x)[f(x) + g(x)μ(x̂)] ≤ 0, then κ = 0 and

Ṗ(x) + κ θ̃ᵀ ∇_x L(x) g(x) R⁻¹ gᵀ(x) ∇_x P(x) = ∇_xᵀ P(x)[f(x) + g(x)μ(x̂)] ≤ 0.   (56)

Case 2: If ∇_xᵀ P(x)[f(x) + g(x)μ(x̂)] > 0, then κ = 0.5 and it follows from (52) that

Ṗ(x) + κ θ̃ᵀ ∇_x L(x) g(x) R⁻¹ gᵀ(x) ∇_x P(x)
 = ∇_xᵀ P(x)[f(x) + g(x)μ(x)] + 0.5 θ̃ᵀ ∇_x L(x) g(x) R⁻¹ gᵀ(x) ∇_x P(x) + ∇_xᵀ P(x) g(x)[μ(x̂) − μ(x)]
 = ∇_xᵀ P(x)[f(x) − 0.5 g(x) R⁻¹ gᵀ(x) ∇_xᵀ Lᵀ(x)(θ − θ̃)]

where z = [‖x‖, ‖x̂‖, ‖θ̃‖]ᵀ, υ = ε_{xM} f_M + k1, and

M = [ l1/(1 + α) + β γ_m (l1 + l3)/(1 + γ_M²)    0                     0
      0                                          l3²/(1 + α)           −d_M g_M l4 / 2
      0                                          −d_M g_M l4 / 2       β γ_m² / (1 + γ_M²) ]

N = [ k2
      ε_{xM} g_M l4 + k2
      d_M f_M − β γ_m² θ_M / (1 + γ_M²) ].

Let the parameters be chosen such that M > 0. Then, it follows from (60) that
control (37), we have

V̇∗(x) = V̂̇(x) + ε̇(x)
 = ∇_xᵀ V̂(x)[f(x) + g(x)μ(x̂)] + ε̇(x)
 = (1/(1 + α))[Cα(x, x̂) − Q(x) − ‖μ(x̂)‖²_R] + ε̇(x)
 ≤ −(1/(1 + α))[Q(x) + ‖μ(x̂)‖²_R] + ε̇(x)

that is,

Moreover, the upper bound is also analyzed under consideration of the NN estimation error introduced by the ADP implementation. Second, in this paper, the ADP method is developed to solve the original HJB equation (12) directly, whereas in [22] ADP was employed to solve the event-triggered HJB equation, which can be viewed as an approximation of the original HJB equation. Third, because the event-triggering condition in this paper differs from those in [22] and [30], the corresponding theoretical analyses are substantially different.
Fig. 1. For Case 1, the trajectories of x and x̂.
Fig. 4. For Case 1, the trajectories of J∗(x0), J(x0, μ), and the bound (1 + α)J∗(x0).
Fig. 9. For Case 2, the trajectories of J∗(x0), J(x0, μ), and the bound (1 + α)J∗(x0).
Fig. 12. For Case 3, the trajectory of the ETOC μ.

in Figs. 5–9. Figs. 5–7 show the trajectories of the states and the control. The interexecution times h_j are shown in Fig. 8, where the achieved minimum h_j is 0.1201 s. In Fig. 9, it is observed that the achieved real performance is J(x0, μ) = 15.6072, which is below the upper bound (1 + α)J∗(x0).

C. Case 3: Torsional Pendulum System

The nonlinear torsional pendulum system [41] is given by

dϑ/dt = ω
J dω/dt = u − Mgl sin(ϑ) − f_d ω   (65)

where u is the control input, M = 1 kg is the mass, and l = 3 m is the length of the pendulum bar. Denote the system state as x = [ϑ, ω]ᵀ, where ϑ and ω are the current angle and angular velocity, with initial values ϑ(0) = 0.2 and ω(0) = −0.2, respectively. g = 9.8 m/s² is the gravitational acceleration, J = 4Ml²/3 is the rotary inertia, and f_d is the frictional factor. For the performance index (2), R = 1 and Q(x) = xᵀSx with S an identity matrix.

To approximate the value function of the HJB equation, select the NN activation functions as L(x) = [x1², x1x2, x2², x1³, x1²x2, x1x2², x2³, x1⁴, x1³x2, x1²x2², x1x2³, x2⁴]ᵀ. For the event-triggering condition (18), select the parameter α = 0.8. Then,
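The pendulum dynamics (65) can be turned into a small simulation sketch. The frictional factor f_d is not specified in this excerpt, so the value below (f_d = 1.0) is a placeholder assumption, as is the choice of a classical Runge–Kutta integrator:

```python
import math

M_MASS, L_BAR, G = 1.0, 3.0, 9.8            # mass, bar length, gravity from the text
J_INERTIA = 4.0 * M_MASS * L_BAR**2 / 3.0   # rotary inertia J = 4*M*l^2/3

def dyn(state, u, fd=1.0):
    """Torsional pendulum dynamics (65); fd = 1.0 is an assumed placeholder."""
    theta, omega = state
    domega = (u - M_MASS * G * L_BAR * math.sin(theta) - fd * omega) / J_INERTIA
    return (omega, domega)

def rk4_step(state, u, dt):
    """One classical Runge-Kutta step of (65) with the control held at u."""
    k1 = dyn(state, u)
    k2 = dyn((state[0] + 0.5 * dt * k1[0], state[1] + 0.5 * dt * k1[1]), u)
    k3 = dyn((state[0] + 0.5 * dt * k2[0], state[1] + 0.5 * dt * k2[1]), u)
    k4 = dyn((state[0] + dt * k3[0], state[1] + dt * k3[1]), u)
    return (state[0] + dt * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6.0,
            state[1] + dt * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6.0)

state = (0.2, -0.2)                  # initial values from the text
for _ in range(2000):                # 20 s of uncontrolled (u = 0) motion
    state = rk4_step(state, 0.0, 0.01)
print(state)
```

With u = 0, the mechanical energy (1/2)Jω² + Mgl(1 − cos ϑ) can only be dissipated by the friction term, which gives a simple sanity check on the integrator before any controller is attached.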
Fig. 13. For Case 3, the interexecution time h_j.
Fig. 14. For Case 4, the trajectories of x1, x2, x̂1, and x̂2.
Fig. 15. For Case 4, the trajectories of x3, x4, x̂3, and x̂4.
Fig. 16. For Case 4, the trajectory of the ETOC μ.
Fig. 17. For Case 4, the interexecution time h_j.
Fig. 18. For Case 4, the trajectories of J∗(x0), J(x0, μ), and the bound (1 + α)J∗(x0).
[36] K. G. Vamvoudakis and F. L. Lewis, "Online actor–critic algorithm to solve the continuous-time infinite horizon optimal control problem," Automatica, vol. 46, no. 5, pp. 878–888, 2010.
[37] S. S. Ge, C. C. Hang, T. H. Lee, and T. Zhang, Stable Adaptive Neural Network Control. Norwell, MA, USA: Kluwer, 2001.
[38] H. K. Khalil and J. Grizzle, Nonlinear Systems, vol. 3. Upper Saddle River, NJ, USA: Prentice-Hall, 2002.
[39] X.-M. Zhang and Q.-L. Han, "Event-triggered dynamic output feedback control for networked control systems," IET Control Theory Appl., vol. 8, no. 4, pp. 226–234, Mar. 2014.
[40] G. F. Franklin, J. D. Powell, and A. Emami-Naeini, Feedback Control of Dynamic Systems. Boston, MA, USA: Addison-Wesley, 1986.
[41] D. Liu and Q. Wei, "Policy iteration adaptive dynamic programming algorithm for discrete-time nonlinear systems," IEEE Trans. Neural Netw. Learn. Syst., vol. 25, no. 3, pp. 621–634, Mar. 2014.

Biao Luo (M'15) received the Ph.D. degree from Beihang University, Beijing, China, in 2014. From 2014 to 2018, he was an Associate Professor and an Assistant Professor with the Institute of Automation, Chinese Academy of Sciences, Beijing, China. He is currently a Professor with the School of Automation, Central South University, Changsha, China. His current research interests include distributed parameter systems, intelligent control, reinforcement learning, deep learning, and computational intelligence. Dr. Luo was a recipient of the Chinese Association of Automation Outstanding Ph.D. Dissertation Award in 2015. He serves as an Associate Editor for the IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTATIONAL INTELLIGENCE, Artificial Intelligence Review, Neurocomputing, and the Journal of Industrial and Management Optimization. He is the Secretariat of the Adaptive Dynamic Programming and Reinforcement Learning Technical Committee, Chinese Association of Automation.

Derong Liu (S'91–M'94–SM'96–F'05) received the Ph.D. degree in electrical engineering from the University of Notre Dame, Notre Dame, IN, USA, in 1994. In 2006, he joined the University of Illinois at Chicago, Chicago, IL, USA, as a Full Professor of electrical and computer engineering and of computer science. In 2008, he was selected for the 100 Talents Program by the Chinese Academy of Sciences. From 2010 to 2015, he was the Associate Director of the State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China. He has authored or co-authored 18 books. Dr. Liu is a Fellow of the International Neural Network Society and the International Association of Pattern Recognition. He was the Editor-in-Chief of the IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS from 2010 to 2015. He is currently the Editor-in-Chief of Artificial Intelligence Review (Springer).