
Neurocomputing 601 (2024) 128176

Contents lists available at ScienceDirect

Neurocomputing
journal homepage: www.elsevier.com/locate/neucom

Data-driven-based sliding-mode dynamic event-triggered control of unknown nonlinear systems via reinforcement learning
Tengda Wang a, Guangdeng Zong b, Xudong Zhao a,∗, Ning Xu c

a College of Control Science and Engineering, Bohai University, Jinzhou 121013, Liaoning, China
b School of Control Science and Engineering, Tiangong University, 399 Binshuixi Road, Tianjin 300387, China
c College of Information Science and Technology, Bohai University, Jinzhou 121013, Liaoning, China

ARTICLE INFO

Communicated by B. Zhao

Keywords: Unknown nonlinear systems; Dynamic event-triggered mechanism; Reinforcement learning; Data-driven control; Sliding mode surface

ABSTRACT

This paper investigates a data-driven-based dynamic event-triggered control problem for continuous-time unknown nonlinear systems using the reinforcement learning method and the sliding-mode surface technique. Initially, by constructing a cost function associated with sliding-mode surface variables for the nominal system, the original control problem is equivalently transformed into the problem of designing a dynamic event-triggered optimal control policy. To handle the unknown system dynamics, a data-driven model is established to reconstruct the system dynamics. Then, under the framework of reinforcement learning, a critic network is employed to solve the event-triggered Hamilton–Jacobi–Bellman equation. The weight vector in the critic network is updated using both current data and historical data, so that the persistence of excitation condition is no longer needed. After that, it is strictly proven via Lyapunov stability theory that all the signals of the considered system are uniformly ultimately bounded. Finally, the effectiveness of the developed control method is demonstrated by two simulation examples.

1. Introduction

Over the last few decades, optimal control [1–3] has received widespread attention from various fields and has been applied to many practical scenarios, such as tracking optimal control for wheeled mobile robots [4], robust optimal control for missile autopilots [5], and energy optimal control for wave energy converters [6]. Optimal control aims at finding a stable control policy by solving the Hamilton–Jacobi–Bellman equation (HJBE), which is a nonlinear partial differential equation. Since the HJBE is quite complicated, it is hard or even impossible to solve through analytical approaches, which becomes a major obstacle in designing optimal control policies for nonlinear systems. To overcome this obstacle, adaptive dynamic programming (ADP), also known as reinforcement learning (RL), was proposed in [7], which avoided the curse-of-dimensionality issue that cannot be handled by the traditional dynamic programming method. By using RL algorithms within an actor-critic framework, plenty of results [7–9] on optimal control were reported. Although these control strategies can achieve satisfactory performance, the actor network generates approximation errors during implementation. To reduce approximation errors, some optimal control schemes using single-critic RL algorithms were presented for nonlinear systems in [10–12]. For example, a self-learning robust optimal control method was proposed for nonlinear systems affected by mismatched disturbances in [10]. In [11], a robust control problem was addressed for nonlinear systems subject to unmatched uncertainties, which utilized an adaptation method to learn the critic network weight. Afterwards, an optimal control approach was presented for constrained interconnected nonlinear systems with unknown actuator faults in [12], which constructed a single-critic network instead of an actor-critic network to approximate the optimal performance function. It is worth noting that a priori knowledge of the system dynamics is assumed to be available, which limits applications of the above-mentioned control methods to unknown nonlinear systems.

To relax these limitations, a robust adaptive optimal control method using an identifier-critic network architecture was presented for partially unknown nonlinear systems with unmatched uncertainties in [13]. Using the same neural network architecture, an ADP-based resilient control problem was studied for partially unknown nonlinear systems with malicious injections in [14], where an identifier network was used to estimate the unknown internal dynamics. All the listed optimal control schemes using identifier-critic networks can deal with partially unknown system dynamics; however, they can hardly remove the identifier errors and cannot be implemented when a priori knowledge of the system dynamics is completely unknown. To overcome

∗ Corresponding author.
E-mail addresses: [email protected] (T. Wang), [email protected] (G. Zong), [email protected] (X. Zhao), [email protected] (N. Xu).

https://doi.org/10.1016/j.neucom.2024.128176
Received 27 October 2023; Received in revised form 12 April 2024; Accepted 4 July 2024
Available online 8 July 2024
0925-2312/© 2024 Elsevier B.V. All rights are reserved, including those for text and data mining, AI training, and similar technologies.

these drawbacks of the identifier-critic networks, data-driven control was proposed in [15], which avoided the requirement for dynamic system information and made the modeling errors zero. Furthermore, the data-driven control approach requires the collection of available input–output data and the indirect incorporation of the data into a data-driven model. Generally, the data-driven model [16,17] can be classified into three categories, namely the neural network model, the wavelet model, and the Markov model. In this article, a recurrent neural network is employed to construct the data-driven model. Thus, data-driven RL schemes for completely unknown nonlinear systems will be a key development of RL for its real-world applications.

Note that the above systems all operated in a time-triggered manner, where control inputs were periodically transmitted to the system even if the system did not need them. This may lead to inefficient use of communication resources, including communication channel bandwidth [18,19] and computation abilities [20–22]. Recently, the event-triggered mechanism has provided a way to improve the utilization of communication resources. Unlike the time-triggered mechanism, the actuator communicates with the system only when a predetermined triggering condition is broken, such that less information [23,24] is transmitted. With the help of the event-triggered mechanism, an event-triggered control (ETC) strategy using RL was first presented for nonlinear continuous-time systems in [25]. In [26], an optimal ETC problem was addressed for nonlinear continuous-time systems affected by asymmetric control constraints. Then, Yang et al. [27] designed a decentralized ETC method for interconnected nonlinear dynamical systems subject to asymmetric input constraints, where the control signal and the auxiliary control signal were updated simultaneously in the event-triggered manner. It is worth noting that these schemes adopted static ETC strategies, whose threshold parameters are fixed. Different from static ETC strategies, dynamic ETC (DETC) strategies can dynamically adjust the threshold parameters by introducing a monotonically decreasing dynamic variable, such that fewer triggers are generated and the resource utilization is further enhanced. Recently, numerous outstanding achievements on optimal DETC have been made in the works [28,29]. To mention a few, a stochastic optimal DETC problem was addressed for nonlinear systems in [28], where an actor network was aperiodically tuned in a dynamic event-triggered manner. In [29], a distributed fault-tolerant DETC approach was presented for nonlinear interconnected systems with actuator faults, which introduced the dynamic triggering mechanism into the ADP process. Unfortunately, all the aforementioned control results were obtained without taking robustness into account.

To improve the robustness, sliding mode control (SMC) was proposed in [30], which achieved various control effects based on the changeable sliding-mode surface (SMS) variables composed of system states. In addition, SMC possesses other advantages, such as fast response [31], insensitivity to system uncertainty, and invariance to external perturbations [32]. To date, many categories of SMC approaches have been developed, such as the terminal SMC approach, the fractional SMC approach, the integral SMC approach, etc. In recent years, numerous SMC problems have been intensively investigated. In [33], a disturbance compensation-based adaptive nonsingular terminal SMC method was proposed for magnetic levitation systems. In [34], an approximation-based adaptive fractional SMC method was developed for microgyroscope systems, which had the merits of fractional calculus and SMC. In [35], a fixed-time tracking control method was designed for high-order nonlinear systems with matched disturbances, which employed a sliding mode disturbance observer to estimate the external disturbances. Unfortunately, these control methods did not consider the situation where the systems are required to operate under a minimum cost function. To minimize the cost function, a nearly optimal fault-tolerant terminal SMC method was designed for multi-axis servo systems in [36]. An online SMC problem was studied for space circumnavigation missions in [37], which used a learning-based ADP technique to learn the terminal SMC policy. The authors in [38] proposed an optimal control strategy using the SMS technique for switched nonlinear systems, which adopted a switched actor-critic framework instead of the non-switched framework. Although these approaches possessed stronger robustness, they were not adept at handling communication resource constraints and model uncertainties that widely exist in practical applications, which inspires our present work.

In this paper, a data-driven-based DETC problem for unknown nonlinear systems affected by uncertain perturbations is investigated using the single-critic RL algorithm and the SMS technique. Compared with existing results, the contributions of this article are threefold.

1. On the basis of the RL algorithm and the SMS technique, a data-driven-based DETC approach for unknown nonlinear systems is proposed for the first time, which can relax the restrictions on the system dynamics and force the modeling errors to zero.
2. Instead of relying on the system states, our designed dynamic triggering condition depends on the SMS and the triggering dynamic variable. Under this SMS-based dynamic triggering condition, our proposed DETC strategy can further reduce the communication burden compared with the static ETC method [39].
3. Compared with the optimal DETC schemes in [28,29], the SMS-based control method has stronger robustness and improves the response speed of the system. Furthermore, different from the actor-critic networks in [38,40], the single-critic network can reduce the approximation error arising in the implementation of the actor network.

2. Problem formulation and preliminaries

2.1. System model

Consider the following general nonlinear continuous-time system:

𝑥̇(𝑡) = 𝑓(𝑥(𝑡)) + 𝜙(𝑥(𝑡))𝑢(𝑡) + 𝑑(𝑥(𝑡)), (1)

where 𝑥 ∈ 𝑅ⁿ is the system state, 𝑢(𝑡) ∈ 𝑅ᵐ represents the input vector, 𝑓(𝑥) ∈ 𝑅ⁿ indicates the unknown drift dynamics, 𝜙(𝑥) ∈ 𝑅ⁿˣᵐ indicates the unknown input gain matrix, and 𝑑(𝑥(𝑡)) ∈ 𝑅ⁿ represents the uncertain perturbation with 𝑑(𝑥) = 𝜙(𝑥)𝛥(𝑥). For the sake of clarity and ease of comprehension, the independent variable 𝑡 is omitted in certain formulae and equations. When 𝑑(𝑥) = 0, the nominal system can be defined as

𝑥̇ = 𝑓(𝑥) + 𝜙(𝑥)𝑢. (2)

In accordance with [38], the SMS variable is given by

𝑠 = 𝛯𝑥̄ + 𝑥, (3)

where 𝑥̄ = [∫₀ᵗ 𝑥₁(𝜏)𝑑𝜏, ∫₀ᵗ 𝑥₂(𝜏)𝑑𝜏, …, ∫₀ᵗ 𝑥ₙ(𝜏)𝑑𝜏]ᵀ ∈ 𝑅ⁿ is the integral of the state, and 𝛯 ∈ 𝑅ⁿˣⁿ stands for a diagonal positive-definite matrix. Using (3), the derivative of the SMS variable 𝑠 is obtained as

𝑠̇ = 𝛯𝑥 + 𝑥̇. (4)

According to [41], 𝛥 is assumed to have an upper bound, i.e., ‖𝛥(𝑥)‖ ≤ 𝛥̄(𝑥) with 𝛥̄(0) = 0. To handle the optimal control issue, we design a control policy 𝑢 to make the system (1) stable and minimize the performance index

𝐽(𝑥₀, 𝑠₀) = ∫₀^∞ (𝜂𝛥̄²(𝑥(𝜏)) + 𝑄(𝑠(𝜏)) + 𝑢ᵀ(𝜏)𝑢(𝜏))𝑑𝜏, (5)

where 𝜂 > 0 represents a design parameter and 𝑄(𝑠) represents a positive-definite function.

Prior to carrying out further analysis, an assumption used in [26,42] is presented.
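To make the setup (1)-(5) concrete, the following sketch integrates a toy closed-loop system with a forward-Euler scheme and forms the SMS variable of (3). The dynamics f and phi, the policy u, and every numerical value here are hypothetical stand-ins for illustration only; they are not taken from the paper.

```python
import numpy as np

def simulate_sms(f, phi, u, x0, Xi, dt=1e-3, T=5.0):
    """Integrate x_dot = f(x) + phi(x) u(x) and record the SMS variable
    s = Xi @ xbar + x, where xbar is the running integral of the state, cf. (3)."""
    n = x0.size
    steps = int(round(T / dt))
    x, xbar = x0.copy(), np.zeros(n)
    s_hist = np.empty((steps, n))
    for k in range(steps):
        s_hist[k] = Xi @ xbar + x        # SMS variable (3)
        xdot = f(x) + phi(x) @ u(x)      # closed-loop dynamics, cf. (2)
        xbar += x * dt                   # xbar_i accumulates the integral of x_i
        x += xdot * dt                   # forward-Euler state update
    return s_hist

# hypothetical scalar example: stable drift, unit gain, linear feedback
f = lambda x: -1.0 * x
phi = lambda x: np.array([[1.0]])
u = lambda x: -0.5 * x
s = simulate_sms(f, phi, u, x0=np.array([1.0]), Xi=np.diag([2.0]))
```

At t = 0 the integral term vanishes, so the SMS variable starts at the initial state; its later evolution depends on the chosen Ξ and policy.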


Assumption 1. The mapping 𝑓(𝑥) is locally Lipschitz continuous and 𝑓(0) = 0. Specifically, for 𝑥 ∈ 𝛺, there exists ‖𝑓(𝑥)‖ ≤ 𝑘_𝑓‖𝑥‖ with 𝑘_𝑓 being a positive constant. In addition, 𝜙(𝑥) is bounded over 𝛺, that is, ‖𝜙(𝑥)‖ ≤ 𝑘_𝜙 with 𝑘_𝜙 being a positive constant.

Let 𝐴_𝑢 be an admissible control set [14]. For an admissible control 𝑢 ∈ 𝐴_𝑢, the cost function containing the SMS variables is defined as

𝑉(𝑥, 𝑠) = ∫_𝑡^∞ (𝜂𝛥̄²(𝑥(𝜏)) + 𝑄(𝑠(𝜏)) + 𝑢ᵀ(𝜏)𝑢(𝜏))𝑑𝜏. (6)

Remark 1. Based on (6), 𝜂 is chosen to make the inequality (13) hold, and the function 𝑄(𝑠) determines the proportion of the system state in the cost function.

Thus, the Hamiltonian function is given as

𝐻(𝑥, 𝑠, ∇𝑉, 𝑢) = 𝜂𝛥̄²(𝑥) + 𝑄(𝑠) + 𝑢ᵀ𝑢 + ∇𝑉ᵀ𝑠̇. (7)

Let the optimal cost function 𝑉*(𝑥, 𝑠) be

𝑉*(𝑥, 𝑠) = ∫_𝑡^∞ (𝜂𝛥̄²(𝑥(𝜏)) + 𝑄(𝑠(𝜏)) + 𝑢*ᵀ(𝜏)𝑢*(𝜏))𝑑𝜏. (8)

Define the HJBE as

min_{𝑢∈𝐴_𝑢} 𝐻(𝑥, 𝑠, ∇𝑉*, 𝑢) = 0. (9)

To satisfy the stationarity condition [43], the optimal control policy can be deduced as

𝑢*(𝑥, 𝑠) = −(1∕2)𝜙ᵀ(𝑥)∇𝑉*(𝑥, 𝑠). (10)

Replacing 𝑢 in (9) with 𝑢*(𝑥, 𝑠), the HJBE (9) becomes

∇𝑉*ᵀ(𝑥, 𝑠)(𝛯𝑥 + 𝑓(𝑥) + 𝜙(𝑥)𝑢*(𝑥, 𝑠)) + 𝜂𝛥̄²(𝑥) + 𝑄(𝑠) + 𝑢*ᵀ(𝑥, 𝑠)𝑢*(𝑥, 𝑠) = 0. (11)

Then, the following lemma proves the stability of the unknown system (1).

Lemma 1. There exists a continuously differentiable cost function 𝑉*(𝑥, 𝑠), which satisfies 𝑉*(𝑥, 𝑠) > 0 for all 𝑥 ≠ 0 and 𝑉*(0) = 0. Then, the control policy 𝑢*(𝑥, 𝑠) given by (10) can make the considered system (1) asymptotically stable.

Proof. Observing (10), we obtain

∇𝑉*ᵀ(𝑥, 𝑠)𝜙(𝑥) = −2𝑢*ᵀ(𝑥, 𝑠). (12)

Based on (11) and (12), one has

𝑉̇*(𝑥, 𝑠) = ∇𝑉*ᵀ(𝑥, 𝑠)(𝛯𝑥 + 𝑓(𝑥) + 𝜙(𝑥)(𝑢*(𝑥, 𝑠) + 𝛥(𝑥)))
= −𝜂𝛥̄²(𝑥) − 𝑄(𝑠) − 𝑢*ᵀ(𝑥, 𝑠)𝑢*(𝑥, 𝑠) − 2𝑢*ᵀ(𝑥, 𝑠)𝛥(𝑥)
= −𝜂𝛥̄²(𝑥) − 𝑄(𝑠) + 𝛥²(𝑥) − (𝑢*(𝑥, 𝑠) + 𝛥(𝑥))ᵀ(𝑢*(𝑥, 𝑠) + 𝛥(𝑥))
≤ −𝑄(𝑠) − (𝜂 − 𝜆_max(𝐼))𝛥̄²(𝑥) − (𝑢*(𝑥, 𝑠) + 𝛥(𝑥))ᵀ(𝑢*(𝑥, 𝑠) + 𝛥(𝑥)). (13)

The inequality (13) indicates that 𝑉̇*(𝑥, 𝑠) < 0 holds for 𝑠 ≠ 0 only if the condition 𝜂 ≥ 𝜆_max(𝐼) is satisfied. Hence, the system (1) is guaranteed to be asymptotically stable and the SMS variables are forced to 0. □

3. Data-driven model

Since the system dynamics are assumed to be completely unknown, a data-driven model is given in this section. Then, by introducing an adjustable term associated with the approximation error, the state approximation error can be removed.

Using [44], the system (2) is rewritten as

𝑥̇(𝑡) = 𝐵*ᵀ𝑥 + 𝑊_𝑥*ᵀ𝜃_𝑥(𝑥) + 𝑊_𝑢*ᵀ𝑢 + 𝐵_𝑢*ᵀ + 𝜀_𝑥, (14)

where 𝐵*, 𝑊_𝑥*, 𝑊_𝑢*, and 𝐵_𝑢* represent the unknown ideal weight matrices, 𝜀_𝑥 stands for the reconstruction error, and 𝜃_𝑥(⋅) represents the activation function, which has the monotonically increasing property, i.e.,

0 ≤ 𝜃_𝑥(𝑥₁) − 𝜃_𝑥(𝑥₂) ≤ 𝛤(𝑥₁ − 𝑥₂) (15)

with 𝑥₁ ≥ 𝑥₂ and 𝛤 > 0. In this article, 𝜃_𝑥(⋅) is selected as tanh(⋅).

Using (14), the corresponding data-driven model is given as

𝑥̂̇ = 𝐵̂ᵀ𝑥̂ + 𝑊̂_𝑥ᵀ𝜃_𝑥(𝑥̂) + 𝑊̂_𝑢ᵀ𝑢 + 𝐵̂_𝑢ᵀ − 𝛾, (16)

where 𝑥̂ stands for the estimated system state, 𝐵̂, 𝑊̂_𝑥, 𝑊̂_𝑢, and 𝐵̂_𝑢 represent the estimated weight matrices, and 𝛾 is utilized to eliminate the adverse effects of the reconstruction error; it is formulated as

𝛾 = 𝐷𝑥̃ + 𝜑̂𝑥̃∕(𝑥̃ᵀ𝑥̃ + 1) (17)

with 𝐷 ∈ 𝑅ⁿˣⁿ being a designed matrix, 𝑥̃ = 𝑥 − 𝑥̂ being the modeling error, and 𝜑̂ ∈ 𝑅 being a designed constant.

Assumption 2. For 𝜀_𝑥 and 𝑥̃, there exists 𝜀_𝑥ᵀ𝜀_𝑥 ≤ 𝜑*𝑥̃ᵀ𝑥̃.

Combining (14) and (16) yields

𝑥̃̇ = 𝐵*ᵀ𝑥̃ + 𝐵̃ᵀ𝑥̂ + 𝑊_𝑥*ᵀ𝜃̃_𝑥(𝑥̃) + 𝑊̃_𝑥ᵀ𝜃_𝑥(𝑥̂) + 𝑊̃_𝑢ᵀ𝑢 + 𝐵̃_𝑢ᵀ + 𝜀_𝑥 + 𝐷𝑥̃ − 𝜑̃𝑥̃∕(𝑥̃ᵀ𝑥̃ + 1) + 𝜑*𝑥̃∕(𝑥̃ᵀ𝑥̃ + 1), (18)

where 𝐵̃ = 𝐵* − 𝐵̂, 𝑊̃_𝑥 = 𝑊_𝑥* − 𝑊̂_𝑥, 𝑊̃_𝑢 = 𝑊_𝑢* − 𝑊̂_𝑢, 𝐵̃_𝑢 = 𝐵_𝑢* − 𝐵̂_𝑢, 𝜃̃_𝑥(𝑥̃) = 𝜃_𝑥(𝑥) − 𝜃_𝑥(𝑥̂), and 𝜑̃ = 𝜑* − 𝜑̂.

Theorem 1. Let Assumption 2 hold. The modeling error 𝑥̃ converges asymptotically to 0 if the weight matrices and the parameters in (16) are tuned via the following equations:

𝐵̂̇ = 𝑝₁𝑥̂𝑥̃ᵀ
𝑊̂̇_𝑥 = 𝑝₂𝜃_𝑥(𝑥̂)𝑥̃ᵀ
𝑊̂̇_𝑢 = 𝑝₃𝑢𝑥̃ᵀ
𝐵̂̇_𝑢 = 𝑝₄𝑥̃ᵀ
𝜑̂̇ = −𝑝₅𝑥̃ᵀ𝑥̃∕(𝑥̃ᵀ𝑥̃ + 1), (19)

where 𝑝_𝑖, 𝑖 = 1, 2, …, 5, represent symmetric positive-definite gains.

Proof. Let the Lyapunov function candidate be

𝐿_𝑑 = 𝐿_𝑥̃ + 𝐿_𝑎, (20)

where

𝐿_𝑥̃ = (1∕2)𝑥̃ᵀ𝑥̃, (21)

and

𝐿_𝑎 = (1∕2)𝑡𝑟{𝐵̃ᵀ𝑝₁⁻¹𝐵̃ + 𝑊̃_𝑥ᵀ𝑝₂⁻¹𝑊̃_𝑥 + 𝑊̃_𝑢ᵀ𝑝₃⁻¹𝑊̃_𝑢 + 𝐵̃_𝑢ᵀ𝑝₄⁻¹𝐵̃_𝑢} + (1∕2)𝜑̃ᵀ𝑝₅⁻¹𝜑̃. (22)

Differentiating 𝐿_𝑥̃ along the solution of (18) yields

𝐿̇_𝑥̃ = 𝑥̃ᵀ𝐵*ᵀ𝑥̃ + 𝑥̃ᵀ𝐵̃ᵀ𝑥̂ + 𝑥̃ᵀ𝑊_𝑥*ᵀ𝜃̃_𝑥(𝑥̃) + 𝑥̃ᵀ𝑊̃_𝑥ᵀ𝜃_𝑥(𝑥̂) + 𝑥̃ᵀ𝑊̃_𝑢ᵀ𝑢 + 𝑥̃ᵀ𝐵̃_𝑢ᵀ + 𝑥̃ᵀ𝜀_𝑥 + 𝑥̃ᵀ𝐷𝑥̃ − 𝜑̃𝑥̃ᵀ𝑥̃∕(𝑥̃ᵀ𝑥̃ + 1) + 𝜑*𝑥̃ᵀ𝑥̃∕(𝑥̃ᵀ𝑥̃ + 1). (23)

Using Young's inequality [45,46], we deduce from (15) that

𝑥̃ᵀ𝑊_𝑥*ᵀ𝜃̃_𝑥(𝑥̃) = 𝑥̃ᵀ𝑊_𝑥*ᵀ(𝜃_𝑥(𝑥) − 𝜃_𝑥(𝑥̂))
≤ 𝛤𝑥̃ᵀ𝑊_𝑥*ᵀ𝑥̃
≤ (1∕2)𝑥̃ᵀ𝑊_𝑥*ᵀ𝑊_𝑥*𝑥̃ + (1∕2)𝛤²𝑥̃ᵀ𝑥̃. (24)

According to Assumption 2, we have

𝑥̃ᵀ𝜀_𝑥 ≤ (1∕2)𝑥̃ᵀ𝑥̃ + (1∕2)𝜑*𝑥̃ᵀ𝑥̃. (25)
2 2


Combining (23), (24) and (25) yields

𝐿̇_𝑥̃ ≤ 𝑥̃ᵀ𝐵*ᵀ𝑥̃ + 𝑥̃ᵀ𝐵̃ᵀ𝑥̂ + (1∕2)𝑥̃ᵀ𝑊_𝑥*ᵀ𝑊_𝑥*𝑥̃ + (1∕2)𝛤²𝑥̃ᵀ𝑥̃ + 𝑥̃ᵀ𝑊̃_𝑥ᵀ𝜃_𝑥(𝑥̂) + 𝑥̃ᵀ𝑊̃_𝑢ᵀ𝑢 + 𝑥̃ᵀ𝐵̃_𝑢ᵀ + (1∕2)𝑥̃ᵀ𝑥̃ + (1∕2)𝜑*𝑥̃ᵀ𝑥̃ + 𝜑*𝑥̃ᵀ𝑥̃ + 𝑥̃ᵀ𝐷𝑥̃ − 𝜑̃𝑥̃ᵀ𝑥̃∕(𝑥̃ᵀ𝑥̃ + 1)
≤ 𝑥̃ᵀ𝐸𝑥̃ + 𝑥̃ᵀ𝐵̃ᵀ𝑥̂ + 𝑥̃ᵀ𝑊̃_𝑥ᵀ𝜃_𝑥(𝑥̂) + 𝑥̃ᵀ𝑊̃_𝑢ᵀ𝑢 + 𝑥̃ᵀ𝐵̃_𝑢ᵀ − 𝜑̃𝑥̃ᵀ𝑥̃∕(𝑥̃ᵀ𝑥̃ + 1), (26)

where

𝐸 = 𝐵*ᵀ + (1∕2)𝑊_𝑥*ᵀ𝑊_𝑥* + 𝐷 + ((1 + 𝛤² + 3𝜑*)∕2)𝐼_𝑛.

Then, 𝐿̇_𝑎 becomes

𝐿̇_𝑎 = 𝑡𝑟{𝐵̃ᵀ𝑝₁⁻¹𝐵̃̇ + 𝑊̃_𝑥ᵀ𝑝₂⁻¹𝑊̃̇_𝑥 + 𝑊̃_𝑢ᵀ𝑝₃⁻¹𝑊̃̇_𝑢 + 𝐵̃_𝑢ᵀ𝑝₄⁻¹𝐵̃̇_𝑢} + 𝜑̃ᵀ𝑝₅⁻¹𝜑̃̇
= −𝑡𝑟{𝐵̃ᵀ𝑥̂𝑥̃ᵀ + 𝑊̃_𝑥ᵀ𝜃_𝑥(𝑥̂)𝑥̃ᵀ + 𝑊̃_𝑢ᵀ𝑢𝑥̃ᵀ + 𝐵̃_𝑢ᵀ𝑥̃ᵀ} + 𝜑̃ᵀ𝑥̃ᵀ𝑥̃∕(𝑥̃ᵀ𝑥̃ + 1). (27)

Putting together (26) and (27) gives

𝐿̇_𝑑 ≤ 𝑥̃ᵀ𝐸𝑥̃. (28)

By choosing an appropriate matrix 𝐷, we can obtain 𝐸 < 0, which ensures that 𝐿̇_𝑑 < 0. According to Lyapunov stability theory [47], 𝑥̃ asymptotically converges to 0. □

Remark 2. Based on Theorem 1, the estimated weight matrices 𝐵̂, 𝑊̂_𝑥, 𝑊̂_𝑢 and 𝐵̂_𝑢 in (16) converge to 𝐵*, 𝑊_𝑥*, 𝑊_𝑢* and 𝐵_𝑢*.

Remark 3. In comparison with the identifier-critic method [48], the data-driven method can obtain all information of the system dynamics, including the internal dynamics 𝑓(𝑥) and the input gain matrix 𝜙(𝑥). Furthermore, the data-driven method can force the resulting modeling errors to 0.

Hence, the nominal system (2) becomes

𝑥̇ = 𝐵ᵀ𝑥 + 𝑊_𝑥ᵀ𝜃_𝑥(𝑥) + 𝑊_𝑢ᵀ𝑢 + 𝐵_𝑢ᵀ. (29)

Using (2) and (29), 𝑓(𝑥) and 𝜙(𝑥) can be formulated as

𝑓(𝑥) = 𝐵ᵀ𝑥 + 𝑊_𝑥ᵀ𝜃_𝑥(𝑥) + 𝐵_𝑢ᵀ
𝜙(𝑥) = 𝑊_𝑢ᵀ. (30)

4. Dynamic event-triggered strategy

Firstly, we introduce a static ETC strategy and present the event-triggered HJBE for the nonlinear system (1). In accordance with the static triggering condition (TC), a dynamic TC is constructed. After that, the event-triggered HJBE is solved by using the RL algorithm. Finally, the stability analysis of the nonlinear system is given.

4.1. Static ETC strategy

Let the triggering instants be 𝑡_𝓁 and satisfy 𝑡_𝓁 < 𝑡_{𝓁+1}. Thus, we can obtain the sequence of triggering instants {𝑡_𝓁}₀^∞. Once the time reaches the triggering instant 𝑡_𝓁, the system state and the SMS are sampled as

𝑥̄_𝓁 = 𝑥(𝑡_𝓁), 𝓁 = 0, 1, 2, …
𝑠̄_𝓁 = 𝑠(𝑡_𝓁), 𝓁 = 0, 1, 2, … (31)

Between two consecutive triggering instants, there is a gap between the continuous state 𝑥(𝑡) and the sampled state 𝑥̄_𝓁. The gap is defined as the state error function 𝑒_𝓁(𝑡), represented as

𝑒_𝓁(𝑡) = 𝑥̄_𝓁 − 𝑥(𝑡), 𝑡 ∈ [𝑡_𝓁, 𝑡_{𝓁+1}). (32)

Similarly, an SMS error function 𝑒^𝑠_𝓁(𝑡) is given as

𝑒^𝑠_𝓁(𝑡) = 𝑠̄_𝓁 − 𝑠(𝑡), 𝑡 ∈ [𝑡_𝓁, 𝑡_{𝓁+1}). (33)

From (32) and (33), it can be found that 𝑒_𝓁(𝑡) = 𝑒^𝑠_𝓁(𝑡) = 0 when an event is triggered.

Once an event occurs, the optimal control policy (10) is evaluated at the sampled values, i.e.,

𝑢*(𝑥̄_𝓁, 𝑠̄_𝓁) = −(1∕2)𝜙ᵀ(𝑥̄_𝓁)∇𝑉*(𝑥̄_𝓁, 𝑠̄_𝓁), (34)

where ∇𝑉*(𝑥̄_𝓁, 𝑠̄_𝓁) = ∂𝑉*(𝑠)∕∂𝑠|_{𝑠=𝑠̄_𝓁}.

Due to the discontinuity of the triggering instants 𝑡_𝓁, 𝓁 ∈ 𝑁, the control signal 𝑢*(𝑥̄_𝓁, 𝑠̄_𝓁) is discrete. Through a zero-order hold technique [49], the continuous control signal containing the triggering state and the triggering SMS variable is formulated as

𝑢*(𝑥̄_𝓁, 𝑠̄_𝓁, 𝑡) = 𝑢*(𝑥̄_𝓁, 𝑠̄_𝓁) = −(1∕2)𝜙ᵀ(𝑥̄_𝓁)∇𝑉*(𝑥̄_𝓁, 𝑠̄_𝓁). (35)

Inserting (35) into (11), the event-triggered HJBE is obtained as

𝐻(𝑥, 𝑠, ∇𝑉*, 𝑢*(𝑥̄_𝓁, 𝑠̄_𝓁)) = ∇𝑉*ᵀ(𝛯𝑥 + 𝑓(𝑥) + 𝜙(𝑥)𝑢*(𝑥̄_𝓁, 𝑠̄_𝓁)) + 𝜂𝛥̄²(𝑥) + 𝑄(𝑠) + 𝑢*ᵀ(𝑥̄_𝓁, 𝑠̄_𝓁)𝑢*(𝑥̄_𝓁, 𝑠̄_𝓁) = 0 (36)

with 𝑉*(0) = 0.

4.2. Dynamic TC

Assumption 3. For 𝑥, 𝑠, 𝑥̄_𝓁 and 𝑠̄_𝓁, there exists a positive constant 𝜉 such that the inequality ‖𝑢*(𝑥, 𝑠) − 𝑢*(𝑥̄_𝓁, 𝑠̄_𝓁)‖ ≤ 𝜉‖𝑠 − 𝑠̄_𝓁‖ = 𝜉‖𝑒^𝑠_𝓁‖ holds.

Based on the static ETC strategy, a dynamic TC related to the SMS variable is constructed as

𝑡_{𝓁+1} = inf{𝑡 > 𝑡_𝓁 | 𝑘𝛼(𝑡) + 𝐺(𝑠, 𝑒^𝑠_𝓁) < 0}, (37)

where 𝑘 represents a positive parameter and 𝛼(𝑡) ∈ 𝑅 satisfies

𝛼̇(𝑡) = −𝜌𝛼(𝑡) + 𝐺(𝑠, 𝑒^𝑠_𝓁), 𝛼(0) = 𝛼⁰ ≥ 0 (38)

with 𝜌 being a positive constant, and

𝐺(𝑠, 𝑒^𝑠_𝓁) = (1 − 𝛽²)𝑄(𝑠) − 2𝜉²‖𝑒^𝑠_𝓁‖² (39)

with 0 < 𝛽 < 1 being a parameter, 𝑄(𝑠) being given in (6), and 𝜉 > 0 being the Lipschitz constant described in Assumption 3.

Remark 4. If 𝛼(𝑡) → 0 or 𝑘 → 0, (37) can be written as

𝑡_{𝓁+1} = inf{𝑡 > 𝑡_𝓁 | 𝐺(𝑠, 𝑒^𝑠_𝓁) < 0}. (40)

Obviously, this is the corresponding static TC. Hence, the static TC is a special case of the dynamic TC. Furthermore, by comparing the two conditions, the dynamic TC (37) can save more communication resources, which will be illustrated in Section 6.

As we progress with our exposition, a lemma is introduced to help our subsequent analysis.

Lemma 2. Let the adaptive law of 𝛼(𝑡) be given as (38). Then, 𝛼(𝑡) ≥ 0 for 𝑡 ∈ [0, ∞).

Proof. Utilizing (37), for 𝑡 ∈ [𝑡_𝓁, 𝑡_{𝓁+1}) the following inequality holds:

𝑘𝛼(𝑡) + 𝐺(𝑠, 𝑒^𝑠_𝓁) ≥ 0. (41)


Through (38) and (41), the following relation is guaranteed:

𝛼̇(𝑡) + 𝜌𝛼(𝑡) ≥ −𝑘𝛼(𝑡). (42)

Based on the comparison lemma, it follows that

𝛼(𝑡) ≥ 𝛼⁰𝑒^{−(𝜌+𝑘)𝑡}, 𝑡 ∈ [0, 𝑡_∞). (43)

From (43), 𝛼(𝑡) is bounded below by a decaying positive exponential function, which means that 𝛼(𝑡) ≥ 0. □

4.3. RL for solving the event-triggered HJBE

Using the single-critic RL algorithm, the dynamic event-triggered HJBE (36) will be solved. With the aid of neural networks [50,51], 𝑉*(𝑠) is redescribed as

𝑉*(𝑠) = 𝜔_𝑐ᵀ𝜎_𝑐(𝑠) + 𝛿_𝑐(𝑠), (44)

where 𝜔_𝑐 ∈ 𝑅^{𝑁_𝑐} indicates the ideal weight vector, 𝜎_𝑐(𝑠) = [𝜎_{𝑐1}(𝑠), 𝜎_{𝑐2}(𝑠), …, 𝜎_{𝑁_𝑐}(𝑠)]ᵀ ∈ 𝑅^{𝑁_𝑐} indicates the activation function vector with 𝜎_{𝑐1}(𝑠), 𝜎_{𝑐2}(𝑠), …, 𝜎_{𝑁_𝑐}(𝑠) being linearly independent, and 𝛿_𝑐(𝑠) ∈ 𝑅 stands for the approximation error. The gradient of 𝑉*(𝑠) at 𝑠 = 𝑠̄_𝓁 is

∇𝑉*(𝑠̄_𝓁) = ∂𝑉*(𝑠)∕∂𝑠|_{𝑠=𝑠̄_𝓁} = ∇𝜎_𝑐ᵀ(𝑠̄_𝓁)𝜔_𝑐 + ∇𝛿_𝑐(𝑠̄_𝓁). (45)

Using (35) and (45), we have

𝑢*(𝑥̄_𝓁, 𝑠̄_𝓁) = −(1∕2)𝜙ᵀ(𝑥̄_𝓁)∇𝜎_𝑐ᵀ(𝑠̄_𝓁)𝜔_𝑐 + ∇𝛿_𝑢(𝑥̄_𝓁, 𝑠̄_𝓁), (46)

where ∇𝛿_𝑢(𝑥̄_𝓁, 𝑠̄_𝓁) = −(1∕2)𝜙ᵀ(𝑥̄_𝓁)∇𝛿_𝑐(𝑠̄_𝓁).

Since 𝜔_𝑐 is unavailable, 𝑢*(𝑥̄_𝓁, 𝑠̄_𝓁) in (46) cannot be implemented. To overcome this issue, the corresponding approximate value 𝑉̂(𝑠) is taken into account, formulated as

𝑉̂(𝑠) = 𝜔̂_𝑐ᵀ𝜎_𝑐(𝑠), (47)

where 𝜔̂_𝑐 stands for the estimated weight vector.

Utilizing (47), we have

𝑢̂(𝑥̄_𝓁, 𝑠̄_𝓁) = −(1∕2)𝜙ᵀ(𝑥̄_𝓁)∇𝜎_𝑐ᵀ(𝑠̄_𝓁)𝜔̂_𝑐. (48)

Replacing ∇𝑉* and 𝑢*(𝑥, 𝑠) in (11) with ∇𝑉̂ and 𝑢̂(𝑥̄_𝓁, 𝑠̄_𝓁), respectively, one obtains

𝐻̂(𝑥, 𝑠, ∇𝑉̂, 𝑢̂(𝑥̄_𝓁, 𝑠̄_𝓁)) = ∇𝑉̂ᵀ(𝛯𝑥 + 𝑓(𝑥) + 𝜙(𝑥)𝑢̂(𝑥̄_𝓁, 𝑠̄_𝓁)) + 𝜂𝛥̄²(𝑥) + 𝑄(𝑠) + 𝑢̂ᵀ(𝑥̄_𝓁, 𝑠̄_𝓁)𝑢̂(𝑥̄_𝓁, 𝑠̄_𝓁). (49)

Based on the fact that 𝐻(𝑥, 𝑠, ∇𝑉*, 𝑢*(𝑥̄_𝓁, 𝑠̄_𝓁)) = 0, an error function is defined as

ℵ_𝑐 = 𝐻̂(𝑥, 𝑠, ∇𝑉̂, 𝑢̂(𝑥̄_𝓁, 𝑠̄_𝓁)) − 𝐻(𝑥, 𝑠, ∇𝑉*, 𝑢*(𝑥̄_𝓁, 𝑠̄_𝓁))
= 𝜔̂_𝑐ᵀ𝜍 + 𝜂𝛥̄²(𝑥) + 𝑄(𝑠) + 𝑢̂ᵀ(𝑥̄_𝓁, 𝑠̄_𝓁)𝑢̂(𝑥̄_𝓁, 𝑠̄_𝓁), (50)

where

𝜍 = ∇𝜎_𝑐(𝑠)(𝛯𝑥 + 𝑓(𝑥) + 𝜙(𝑥)𝑢̂(𝑥̄_𝓁, 𝑠̄_𝓁)). (51)

To achieve ℵ_𝑐 → 0, 𝜔̂_𝑐 needs to be tuned to minimize the squared error 𝐸 = 0.5ℵ_𝑐ᵀℵ_𝑐. By utilizing the historical SMS variable data, the squared error becomes

𝐸_ℎ = (1∕2)ℵ_𝑐ᵀℵ_𝑐 + (1∕2)∑_{ℎ=1}^{𝑁₀} ℵ_{𝑐,ℎ}ᵀℵ_{𝑐,ℎ}, (52)

where ℎ ∈ {1, 2, …, 𝑁₀} denotes the index of the marked historical SMS variable 𝑠(𝑡_ℎ), 𝑡_ℎ ∈ [𝑡_𝓁, 𝑡_{𝓁+1}), 𝑁₀ denotes the total number of marked historical SMS variables, and ℵ_{𝑐,ℎ} is formulated as

ℵ_{𝑐,ℎ} = ℵ_𝑐(𝑠(𝑡_ℎ)) = 𝜔̂_𝑐ᵀ𝜍_ℎ + 𝜂𝛥̄²(𝑥(𝑡_ℎ)) + 𝑄(𝑠(𝑡_ℎ)) + 𝑢̂ᵀ𝑢̂, (53)

with

𝜍_ℎ = ∇𝜎_𝑐(𝑠(𝑡_ℎ))(𝛯𝑥(𝑡_ℎ) + 𝑓(𝑥(𝑡_ℎ)) + 𝜙(𝑥(𝑡_ℎ))𝑢̂(𝑥̄_𝓁, 𝑠̄_𝓁)). (54)

With the help of the gradient descent approach and the normalization technique, the tuning rule of 𝜔̂_𝑐 can be derived as

𝜔̂̇_𝑐 = −(𝑘_𝑐𝜍∕(1 + 𝜍ᵀ𝜍)²)ℵ_𝑐 − ∑_{ℎ=1}^{𝑁₀} (𝑘_𝑐𝜍_ℎ∕(1 + 𝜍_ℎᵀ𝜍_ℎ)²)ℵ_{𝑐,ℎ}, (55)

where 𝑘_𝑐 denotes an adjustable parameter, and 𝜍 and 𝜍_ℎ are defined in (51) and (54), respectively.

Define the weight estimation error as 𝜔̃_𝑐 = 𝜔_𝑐 − 𝜔̂_𝑐. To facilitate the later analysis, 𝛾 and 𝛾_ℎ are given by

𝛾 = 𝜍∕(1 + 𝜍ᵀ𝜍), 𝛾_ℎ = 𝜍_ℎ∕(1 + 𝜍_ℎᵀ𝜍_ℎ). (56)

Based on (55), we have

𝜔̃̇_𝑐 = −𝑘_𝑐(𝛾𝛾ᵀ + ∑_{ℎ=1}^{𝑁₀} 𝛾_ℎ𝛾_ℎᵀ)𝜔̃_𝑐 + (𝑘_𝑐𝛾∕(1 + 𝜍ᵀ𝜍))𝛿_𝐻 + ∑_{ℎ=1}^{𝑁₀} (𝑘_𝑐𝛾_ℎ∕(1 + 𝜍_ℎᵀ𝜍_ℎ))𝛿_{𝐻,ℎ}, (57)

where

𝛿_𝐻 = −∇𝛿_𝑐ᵀ(𝑠)(𝛯𝑥 + 𝑓(𝑥) + 𝜙(𝑥)𝑢̂(𝑥̄_𝓁, 𝑠̄_𝓁))
𝛿_{𝐻,ℎ} = −∇𝛿_𝑐ᵀ(𝑠(𝑡_ℎ))(𝛯𝑥(𝑡_ℎ) + 𝑓(𝑥(𝑡_ℎ)) + 𝜙(𝑥(𝑡_ℎ))𝑢̂(𝑥̄_𝓁, 𝑠̄_𝓁)). (58)

Remark 5. By adding the summation term to the update rule (55), we can make 𝜔̂_𝑐 → 𝜔_𝑐 without imposing the persistence of excitation condition. According to [43], the marked historical SMS variable data are required to satisfy

rank[𝜎_𝑐(𝑠(𝑡₁)), 𝜎_𝑐(𝑠(𝑡₂)), …, 𝜎_𝑐(𝑠(𝑡_{𝑁₀}))] = 𝑁_𝑐, 𝑁₀ > 𝑁_𝑐. (59)

Remark 6. In this paper, a single critic network is employed to implement the designed control method. In comparison with the conventional actor-critic network architecture [38], the implementation process is simplified and the computational burden is relatively reduced.

Remark 7. Compared with the previous static ETC schemes [25–27], the dynamic event-triggered control method further reduces the computation burden by introducing the dynamic variable 𝛼(𝑡) associated with the SMS into the static threshold condition.

To make (59) hold, we should select a sufficient number of historical SMS variable data, i.e., 𝑁₀ > 𝑁_𝑐. Moreover, the historical data and the concurrent data are used to tune the weight vector 𝜔̂_𝑐. This technique is referred to as concurrent learning or experience replay in [39].

Based on the preceding discussions, a block diagram is depicted to illustrate the developed data-driven-based DETC method for the closed-loop system (see Fig. 1).

4.4. Stability analysis

Before proceeding, an assumption is provided, which was extensively used in [27,52].
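The tuning rule (55) and the rank condition (59) can be sketched as follows. The function names, the flat replay list, and the scalar gain in place of a tuned learning-rate schedule are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def critic_update(w_hat, varsigma, residual, history, kc, dt):
    """One Euler step of the normalized-gradient tuning rule (55).
    `history` holds (varsigma_h, residual_h) pairs recorded at marked past
    instants; their sum is the concurrent-learning / experience-replay term."""
    def grad_term(v, r):
        # k_c * varsigma * residual / (1 + varsigma' varsigma)^2
        return kc * v * r / (1.0 + v @ v) ** 2
    dw = -grad_term(varsigma, residual)
    for v_h, r_h in history:            # replayed historical data
        dw -= grad_term(v_h, r_h)
    return w_hat + dt * dw

def rank_condition(sigma_stack):
    """Check the rank condition (59): the N_c x N_0 matrix whose columns are
    stored activation vectors sigma_c(s(t_h)) must have full row rank N_c,
    with N_0 > N_c stored samples."""
    n_c, n_0 = sigma_stack.shape
    return bool(n_0 > n_c and np.linalg.matrix_rank(sigma_stack) == n_c)

# toy usage with hypothetical values
w = critic_update(np.zeros(3), np.array([1.0, 0.0, 0.0]), 1.0, [], kc=1.0, dt=1.0)
```

Once `rank_condition` holds for the stored data, the replay sum in (55) substitutes for the persistence of excitation requirement, as noted in Remark 5.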


Fig. 1. Block diagram of the presented control architecture.

Assumption 4. For some positive constants 𝑏_{𝛿𝑢}, 𝑏_𝜎 and 𝑏_{𝛿𝐻}, there exist

‖∇𝛿_𝑢(𝑥̄_𝓁, 𝑠̄_𝓁)‖ ≤ 𝑏_{𝛿𝑢}, ‖∇𝜎_𝑐(𝑠)‖ ≤ 𝑏_𝜎, ‖𝛿_𝐻‖ ≤ 𝑏_{𝛿𝐻}.

Theorem 2. Consider the nominal system (2) and the event-triggered HJBE (49) under Assumptions 1–4. The control policy within the admissible control set 𝛺 is designed as (48), and the critic weight vector is tuned through (55). Then, all signals are guaranteed to be bounded if the TC is given by (37) and the condition (59) holds.

Proof. Choose the Lyapunov function candidate as

𝐿(𝑡) = 𝐿₁(𝑡) + 𝐿₂(𝑡) + 𝐿₃(𝑡) + 𝐿₄(𝑡) = 𝑉*(𝑥̄_𝓁, 𝑠̄_𝓁) + 𝑉*(𝑥, 𝑠) + 𝛼(𝑡) + (1∕2)𝜔̃_𝑐ᵀ𝜔̃_𝑐, (60)

with 𝐿₁(𝑡) = 𝑉*(𝑥̄_𝓁, 𝑠̄_𝓁), 𝐿₂(𝑡) = 𝑉*(𝑥, 𝑠), 𝐿₃(𝑡) = 𝛼(𝑡) and 𝐿₄(𝑡) = (1∕2)𝜔̃_𝑐ᵀ𝜔̃_𝑐. Then, we conduct the stability analysis for the following two cases.

Case 1: No events are triggered, i.e., 𝑡 ∈ [𝑡_𝓁, 𝑡_{𝓁+1}), 𝓁 ∈ 𝑁. It is not difficult to conclude that 𝐿̇₁ = 𝑑𝑉*(𝑥̄_𝓁, 𝑠̄_𝓁)∕𝑑𝑡 = 0.

Using (2), (4) and (38) and taking the time derivative of 𝐿₂(𝑡) + 𝐿₃(𝑡) results in

𝐿̇₂ + 𝐿̇₃ = ∇𝑉*ᵀ(𝑥, 𝑠)(𝛯𝑥 + 𝑓(𝑥) + 𝜙(𝑥)𝑢̂(𝑥̄_𝓁, 𝑠̄_𝓁)) + 𝛼̇(𝑡)
= ∇𝑉*ᵀ(𝑥, 𝑠)(𝛯𝑥 + 𝑓(𝑥)) + ∇𝑉*ᵀ(𝑥, 𝑠)𝜙(𝑥)𝑢̂(𝑥̄_𝓁, 𝑠̄_𝓁) − 𝜌𝛼(𝑡) + (1 − 𝛽²)𝑄(𝑠) − 2𝜉²‖𝑒^𝑠_𝓁‖². (61)

From (9) and (10), we obtain

∇𝑉*ᵀ(𝑥, 𝑠)(𝛯𝑥 + 𝑓(𝑥)) = −𝜂𝛥̄²(𝑥) − 𝑄(𝑠) + 𝑢*ᵀ(𝑥, 𝑠)𝑢*(𝑥, 𝑠)
∇𝑉*ᵀ(𝑥, 𝑠)𝜙(𝑥) = −2𝑢*ᵀ(𝑥, 𝑠). (62)

Using (62), (61) can be rewritten as

𝐿̇₂ + 𝐿̇₃ ≤ −𝜂𝛥̄²(𝑥) − 𝑄(𝑠) − ‖𝑢̂(𝑥̄_𝓁, 𝑠̄_𝓁)‖² + ‖𝑢*(𝑥, 𝑠) − 𝑢̂(𝑥̄_𝓁, 𝑠̄_𝓁)‖² − 𝜌𝛼(𝑡) + (1 − 𝛽²)𝑄(𝑠) − 2𝜉²‖𝑒^𝑠_𝓁‖². (63)

Based on Young's inequality [49], combining (46), (48) and Assumption 3 yields

‖𝑢*(𝑥, 𝑠) − 𝑢̂(𝑥̄_𝓁, 𝑠̄_𝓁)‖² = ‖(𝑢*(𝑥, 𝑠) − 𝑢*(𝑥̄_𝓁, 𝑠̄_𝓁)) + (𝑢*(𝑥̄_𝓁, 𝑠̄_𝓁) − 𝑢̂(𝑥̄_𝓁, 𝑠̄_𝓁))‖²
≤ 2‖𝑢*(𝑥, 𝑠) − 𝑢*(𝑥̄_𝓁, 𝑠̄_𝓁)‖² + 2‖𝑢*(𝑥̄_𝓁, 𝑠̄_𝓁) − 𝑢̂(𝑥̄_𝓁, 𝑠̄_𝓁)‖²
≤ 2𝜉²‖𝑒^𝑠_𝓁‖² + 2‖−(1∕2)𝜙ᵀ(𝑥̄_𝓁)∇𝜎_𝑐ᵀ(𝑠̄_𝓁)𝜔̃_𝑐 + ∇𝛿_𝑢(𝑥̄_𝓁, 𝑠̄_𝓁)‖²
≤ 𝑏_𝜙²𝑏_𝜎²‖𝜔̃_𝑐‖² + 2𝜉²‖𝑒^𝑠_𝓁‖² + 4𝑏_{𝛿𝑢}². (64)

Substituting (64) into (63), the expression (63) becomes

𝐿̇₂ + 𝐿̇₃ ≤ −𝜂𝛥̄²(𝑥) − 𝑄(𝑠) − ‖𝑢̂(𝑥̄_𝓁, 𝑠̄_𝓁)‖² + 𝑏_𝜙²𝑏_𝜎²‖𝜔̃_𝑐‖² + 2𝜉²‖𝑒^𝑠_𝓁‖² + 4𝑏_{𝛿𝑢}² − 𝜌𝛼(𝑡) + (1 − 𝛽²)𝑄(𝑠) − 2𝜉²‖𝑒^𝑠_𝓁‖²
≤ −𝜂𝛥̄²(𝑥) − ‖𝑢̂(𝑥̄_𝓁, 𝑠̄_𝓁)‖² + 𝑏_𝜙²𝑏_𝜎²‖𝜔̃_𝑐‖² + 4𝑏_{𝛿𝑢}² − 𝜌𝛼(𝑡) − 𝛽²𝑄(𝑠). (65)

Since 𝐿̇₁ = 0 on [𝑡_𝓁, 𝑡_{𝓁+1}), combining (65) gives

𝐿̇₁ + 𝐿̇₂ + 𝐿̇₃ ≤ −𝜂𝛥̄²(𝑥) − ‖𝑢̂(𝑥̄_𝓁, 𝑠̄_𝓁)‖² + 𝑏_𝜙²𝑏_𝜎²‖𝜔̃_𝑐‖² + 4𝑏_{𝛿𝑢}² − 𝜌𝛼(𝑡) − 𝛽²𝑄(𝑠). (66)

Using (57), the time derivative of 𝐿₄ in (60) is formulated as

𝐿̇₄ = −𝑘_𝑐𝜔̃_𝑐ᵀℑ(𝛾, 𝛾_ℎ)𝜔̃_𝑐 + (𝑘_𝑐𝜔̃_𝑐ᵀ𝛾∕(1 + 𝜍ᵀ𝜍))𝛿_𝐻 + ∑_{ℎ=1}^{𝑁₀} (𝑘_𝑐𝜔̃_𝑐ᵀ𝛾_ℎ∕(1 + 𝜍_ℎᵀ𝜍_ℎ))𝛿_{𝐻,ℎ}, (67)

where ℑ(𝛾, 𝛾_ℎ) = 𝛾𝛾ᵀ + ∑_{ℎ=1}^{𝑁₀} 𝛾_ℎ𝛾_ℎᵀ.

Applying Young's inequality to the second term on the right-hand side of (67), one has

(𝑘_𝑐𝜔̃_𝑐ᵀ𝛾∕(1 + 𝜍ᵀ𝜍))𝛿_𝐻 ≤ (𝑘_𝑐∕(2(1 + 𝜍ᵀ𝜍)))(𝜔̃_𝑐ᵀ𝛾𝛾ᵀ𝜔̃_𝑐 + 𝛿_𝐻ᵀ𝛿_𝐻)
≤ (𝑘_𝑐∕2)𝜔̃_𝑐ᵀ(𝛾𝛾ᵀ)𝜔̃_𝑐 + (𝑘_𝑐∕2)𝛿_𝐻ᵀ𝛿_𝐻. (68)

Similarly, there holds

∑_{ℎ=1}^{𝑁₀} (𝑘_𝑐𝜔̃_𝑐ᵀ𝛾_ℎ∕(1 + 𝜍_ℎᵀ𝜍_ℎ))𝛿_{𝐻,ℎ} ≤ (𝑘_𝑐∕2)∑_{ℎ=1}^{𝑁₀} 𝜔̃_𝑐ᵀ(𝛾_ℎ𝛾_ℎᵀ)𝜔̃_𝑐 + (𝑘_𝑐∕2)∑_{ℎ=1}^{𝑁₀} 𝛿_{𝐻,ℎ}ᵀ𝛿_{𝐻,ℎ}. (69)

Inserting (68) and (69) into (67) and using Assumption 4, we obtain

𝐿̇₄ ≤ −(𝑘_𝑐∕2)𝜔̃_𝑐ᵀℑ(𝛾, 𝛾_ℎ)𝜔̃_𝑐 + (𝑘_𝑐∕2)𝛿_𝐻ᵀ𝛿_𝐻 + (𝑘_𝑐∕2)∑_{ℎ=1}^{𝑁₀} 𝛿_{𝐻,ℎ}ᵀ𝛿_{𝐻,ℎ}
≤ −(𝑘_𝑐∕2)𝜔̃_𝑐ᵀℑ(𝛾, 𝛾_ℎ)𝜔̃_𝑐 + (𝑘_𝑐(𝑁₀ + 1)∕2)𝑏_{𝛿𝐻}². (70)

Using (66) and (70), the time derivative of (60) becomes

𝐿̇ ≤ −𝜂𝛥̄²(𝑥) − ‖𝑢̂(𝑥̄_𝓁, 𝑠̄_𝓁)‖² + 𝑏_𝜙²𝑏_𝜎²‖𝜔̃_𝑐‖² + 4𝑏_{𝛿𝑢}² − 𝜌𝛼(𝑡) − 𝛽²𝑄(𝑠) − (𝑘_𝑐∕2)𝜔̃_𝑐ᵀℑ(𝛾, 𝛾_ℎ)𝜔̃_𝑐 + (𝑘_𝑐(𝑁₀ + 1)∕2)𝑏_{𝛿𝐻}²
≤ −𝛽²𝑄(𝑠) − 𝜂𝛥̄²(𝑥) − 𝜌𝛼(𝑡) − ‖𝑢̂(𝑥̄_𝓁, 𝑠̄_𝓁)‖² − ((𝑘_𝑐∕2)𝜆_min(ℑ(𝛾, 𝛾_ℎ)) − 𝑏_𝜙²𝑏_𝜎²)‖𝜔̃_𝑐‖² + (𝑘_𝑐(𝑁₀ + 1)∕2)𝑏_{𝛿𝐻}² + 4𝑏_{𝛿𝑢}². (71)

The inequality (71) shows that 𝐿̇ < 0 whenever the following inequality is violated:

‖𝜔̃_𝑐‖ ≤ √((𝑘_𝑐(𝑁₀ + 1)𝑏_{𝛿𝐻}² + 8𝑏_{𝛿𝑢}²)∕(𝑘_𝑐𝜆_min(ℑ(𝛾, 𝛾_ℎ)) − 2𝑏_𝜙²𝑏_𝜎²)). (72)

Hence, the boundedness of all signals is guaranteed according to the Lyapunov extension theorem, and the ultimate bound of 𝜔̃_𝑐 is given by (72).

Case 2: Events are triggered, i.e., 𝑡 = 𝑡_{𝓁+1}, 𝓁 ∈ 𝑁. The corresponding difference expression is given as

𝛥𝐿(𝑡) = 𝑉*(𝑥̄_{𝓁+1}, 𝑠̄_{𝓁+1}) − 𝑉*(𝑥̄_𝓁, 𝑠̄_𝓁) + 𝛥℘, (73)

where

𝛥℘ = 𝑉*(𝑥(𝑡_{𝓁+1}), 𝑠(𝑡_{𝓁+1})) − 𝑉*(𝑥(𝑡⁻_{𝓁+1}), 𝑠(𝑡⁻_{𝓁+1})) + 𝛼(𝑡_{𝓁+1})


1 𝑇 1
− 𝛼(𝑠(𝑡− )) + 𝜔̃ (𝑡 )𝜔̃ (𝑡 ) − 𝜔̃ 𝑇𝑐 (𝑡− )𝜔̃ (𝑡− ) Table 1
𝓁+1 2 𝑐 𝓁+1 𝑐 𝓁+1 2 𝓁+1 𝑐 𝓁+1
Parameters used in the pendulum system.
with 𝜆(𝑡 ̄ −
𝓁+1 ̄ 𝓁+1 − 𝜗) [note that 𝜆(⋅)
) = lim𝜗→0+ 𝜆(𝑡 ̄ represents 𝛼(⋅) and Parameter Meaning Value
𝜔̃ 𝑐 (⋅)]. 𝐽 Rotary inertia 4 kg m2
From case 1, 𝑑(𝐿1 (𝑡) + 𝐿2 (𝑡) + 𝐿3 (𝑡))∕𝑑𝑡 < 0 when ‖𝜒‖ satisfies 𝐿̄ Length of the pendulum 1.5 m
the condition (72). It is easy to conclude that 𝐿2 (𝑡) + 𝐿3 (𝑡) is strictly 𝑀 ̄ Mass of the pendulum 0.75 kg
monotonically decreasing for 𝑡 ∈ [𝑡𝓁 , 𝑡𝓁+1 ). 𝑔 Acceleration of gravity 9.8 m∕s2
𝑓𝑟 Frictional factor 0.5 N m s∕rad
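The ultimate bound in (72) is a simple closed-form expression that can be evaluated numerically once the constants are fixed. In the sketch below, all constant values are hypothetical placeholders (they are not taken from the paper); the only structural requirement checked is that the denominator 𝑘𝑐ℑ − 2𝑏𝜙²𝑏𝜎² remains positive.

```python
import math

def critic_error_bound(k_c, excitation, N0, b_dH, b_dU, b_phi, b_sigma):
    """Ultimate bound on ||omega_tilde_c|| from inequality (72).

    All arguments are positive constants; the bound is only meaningful
    when the denominator k_c*excitation - 2*b_phi**2*b_sigma**2 > 0.
    """
    num = k_c * (N0 + 1) * b_dH**2 + 8.0 * b_dU**2
    den = k_c * excitation - 2.0 * b_phi**2 * b_sigma**2
    if den <= 0.0:
        raise ValueError("excitation condition violated: denominator not positive")
    return math.sqrt(num / den)

# Hypothetical constants, chosen only so that the denominator is positive.
bound = critic_error_bound(k_c=10.0, excitation=1.0, N0=5, b_dH=0.1,
                           b_dU=0.05, b_phi=1.0, b_sigma=1.0)
```

Shrinking the approximation residuals 𝑏𝛿𝐻 and 𝑏𝛿𝑢, or increasing the excitation level, tightens the bound, which matches the qualitative reading of (72).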
Since 𝑡𝓁+1 > 𝑡𝓁+1 − 𝜗 for 𝜗 ∈ (0, 𝑡𝓁+1 − 𝑡𝓁), we obtain

𝐿2(𝑡𝓁+1) + 𝐿3(𝑡𝓁+1) < 𝐿2(𝑡𝓁+1 − 𝜗) + 𝐿3(𝑡𝓁+1 − 𝜗).  (74)

Utilizing the property of the limit, (74) can be written as

𝐿2(𝑡𝓁+1) + 𝐿3(𝑡𝓁+1) ≤ lim𝜗→0⁺ [𝐿2(𝑡𝓁+1 − 𝜗) + 𝐿3(𝑡𝓁+1 − 𝜗)] = 𝐿2(𝑡⁻𝓁+1) + 𝐿3(𝑡⁻𝓁+1).  (75)

According to the expressions of 𝐿2 and 𝐿3 in (60) and using (75), we have

𝑉∗(𝑥(𝑡𝓁+1), 𝑠(𝑡𝓁+1)) + 𝛼(𝑡𝓁+1) + (1/2)𝜔̃𝑇𝑐(𝑡𝓁+1)𝜔̃𝑐(𝑡𝓁+1) ≤ 𝑉∗(𝑥(𝑡⁻𝓁+1), 𝑠(𝑡⁻𝓁+1)) + 𝛼(𝑡⁻𝓁+1) + (1/2)𝜔̃𝑇𝑐(𝑡⁻𝓁+1)𝜔̃𝑐(𝑡⁻𝓁+1).  (76)

This indicates that 𝛥℘ ≤ 0. Since 𝑥(𝑡) and 𝑠(𝑡) are convergent, we obtain

𝑉∗(𝑥̄𝓁+1, 𝑠̄𝓁+1) ≤ 𝑉∗(𝑥̄𝓁, 𝑠̄𝓁).

Hence, 𝛥𝐿(𝑡𝓁) ≤ 0 once ‖𝜔̃𝑐‖ satisfies the condition given by (72). Through the Lyapunov extension theorem [53], all signals are guaranteed to be bounded. □

5. Zeno behavior exclusion

Assumption 5 ([42]). For 𝑠𝛼, 𝑠𝛽 ∈ 𝛺, there exists

‖∇𝜎𝑐(𝑠𝛼) − ∇𝜎𝑐(𝑠𝛽)‖ ≤ 𝑘𝜎‖𝑠𝛼 − 𝑠𝛽‖,

where 𝑘𝜎 is a positive constant.

Theorem 3. Define the time interval 𝑇𝓁 = 𝑡𝓁+1 − 𝑡𝓁 and assume that Assumptions 1 and 5 hold. When the dynamic TC is given by (37), the Zeno behavior can be excluded, i.e., min{𝑇𝓁} > 0.

Proof. With the help of Assumptions 1 and 5, we have

‖𝑢̂(𝑥̄𝓁, 𝑠̄𝓁)‖ = ‖−(1/2)𝜙𝑇(𝑥̄𝓁)∇𝜎𝑇𝑐(𝑠̄𝓁)𝜔̂𝑐‖ ≤ (1/2)𝑘𝜙𝑘𝜎𝑐‖𝑠̄𝓁‖‖𝜔̂𝑐‖.

Using (48), we obtain

‖𝑠̇‖ ≤ ‖𝛯‖‖𝑥‖ + 𝑘𝑓‖𝑥‖ + ‖𝜙(𝑥)𝑢̂(𝑥̄𝓁, 𝑠̄𝓁)‖
    ≤ (‖𝛯‖ + 𝑘𝑓)‖𝑥‖ + (1/2)𝑘𝜙²𝑘𝜎𝑐‖𝑠̄𝓁‖‖𝜔̂𝑐‖
    ≤ (‖𝛯‖ + 𝑘𝑓)‖𝑠‖ + (1/2)𝑘𝜙²𝑘𝜎𝑐‖𝑠̄𝓁‖‖𝜔̂𝑐‖.  (77)

Based on Theorem 2 and using the fact that 𝜔𝑐 is often assumed to be bounded, we know that 𝜔̂𝑐 is bounded. Thus, we have ‖𝜔̂𝑐‖ ≤ 𝑏𝜔̂𝑐 with 𝑏𝜔̂𝑐 > 0 being a constant. Then, (77) can be rewritten as

‖𝑠̇‖ ≤ (‖𝛯‖ + 𝑘𝑓)‖𝑠‖ + (1/2)𝑏𝜔̂𝑐𝑘𝜙²𝑘𝜎𝑐‖𝑠̄𝓁‖.  (78)

Then, using (33) yields 𝑠(𝑡) = 𝑠̄𝓁 − 𝑒ˢ𝓁(𝑡) and 𝑠̇(𝑡) = −𝑒̇ˢ𝓁(𝑡) for 𝑡 ∈ [𝑡𝓁, 𝑡𝓁+1). Defining 𝑘0 = ‖𝛯‖ + 𝑘𝑓, we obtain

‖𝑒̇ˢ𝓁‖ ≤ 𝑘0‖𝑒ˢ𝓁‖ + 𝑘1‖𝑠̄𝓁‖,  (79)

where 𝑘1 = 𝑘0 + (1/2)𝑏𝜔̂𝑐𝑘𝜙²𝑘𝜎𝑐.

Due to 𝑒ˢ𝓁(𝑡𝓁) = 0 and invoking the Comparison Lemma [26], the solution of (79) satisfies

‖𝑒ˢ𝓁‖ ≤ (𝑘1‖𝑠̄𝓁‖/𝑘0)(𝑒^{𝑘0(𝑡−𝑡𝓁)} − 1), 𝑡 ∈ [𝑡𝓁, 𝑡𝓁+1).  (80)

Based on (37) and (39), we have

𝑘𝛼(𝑡) + (1 − 𝛽²)𝑄(𝑠) − 2𝜉²‖𝑒ˢ𝓁‖² ≥ 0.  (81)

From (81), we obtain

‖𝑒ˢ𝓁‖² ≤ (𝑘𝛼(𝑡) + (1 − 𝛽²)𝑄(𝑠))/(2𝜉²) = 𝑒𝑇.  (82)

The next triggering instant 𝑡𝓁+1 is released once the value on the right-hand side of (80) is greater than √𝑒𝑇 in (82). That is,

(𝑘1‖𝑠̄𝓁‖/𝑘0)(𝑒^{𝑘0(𝑡𝓁+1−𝑡𝓁)} − 1) > √(𝑒𝑇(𝑡𝓁+1)) = √((𝑘𝛼(𝑡) + (1 − 𝛽²)𝑄(𝑠))/(2𝜉²)).  (83)

Since Lemma 2 can guarantee 𝛼(𝑡) > 0, it is clear that 𝑒𝑇^{1/2} > 0. Using (83), we have

𝑇𝓁 > (1/𝑘0) ln(1 + 𝐻𝓁), 𝓁 ∈ {0, 1, 2, …},  (84)

where

𝐻𝓁 = 𝑘0√(𝑒𝑇(𝑡𝓁+1))/(𝑘1‖𝑠̄𝓁‖), 𝓁 ∈ {0, 1, 2, …}.  (85)

Define 𝐻𝑚𝑖𝑛 = min{𝐻𝓁}; since √𝑒𝑇 > 0, there exists 𝐻𝑚𝑖𝑛 > 0. Thus, we obtain

min{𝑇𝓁} > (1/𝑘0) ln(1 + 𝐻𝑚𝑖𝑛) > 0.  (86)

This ensures that Zeno behavior cannot occur. □

6. Simulation results

In this section, two simulation examples are used to demonstrate the effectiveness of the data-driven-based DETC approach.

6.1. Example 1

Consider a pendulum system [26], which is denoted as

𝑑𝛼̄/𝑑𝑡 = 𝜐̄ + 𝑑(𝑥),
𝐽 𝑑𝜐̄/𝑑𝑡 = 𝑢 − 𝑀̄𝑔𝐿̄ sin(𝛼̄) − 𝑓𝑟 𝑑𝛼̄/𝑑𝑡,  (87)

where 𝛼̄ ∈ 𝑅 and 𝜐̄ ∈ 𝑅 represent the angle position and the angular velocity, 𝑢 ∈ 𝑅 represents the control signal, 𝑑(𝑥) ∈ 𝑅 stands for the perturbation with 𝑑(𝑥) = 0.4 sin(𝑥₂) + 0.2(𝑥₁ + 𝑥₂)𝑥₂, and the other parameters are given in Table 1.

Let 𝑥 = [𝑥₁, 𝑥₂]𝑇 = [𝛼̄, 𝜐̄]𝑇, and its initial state is 𝑥₀ = [1.2, −1.5]𝑇. Then, the cost function of system (87) is provided as

𝑉∗(𝑥, 𝑠) = ∫𝑡^∞ (𝜂‖𝑥‖² + 𝑄(𝑠) + 𝑢²) 𝑑𝜏,  (88)

where 𝜂 = 6 and 𝑄(𝑠) = 𝑠₁² + 𝑠₂². The parameters in the dynamic TC (37) are given as 𝑘 = 𝜌 = 0.8, 𝛽 = 0.9 and 𝜉 = 2.5. Similar to [3], the activation function is chosen as 𝜎𝑐(𝑠) = [𝑠₁², 𝑠₁𝑠₂, 𝑠₂²]𝑇, and the weight vector is chosen as 𝜔̂𝑐 = [𝜔̂𝑐,1, 𝜔̂𝑐,2, 𝜔̂𝑐,3]𝑇 with its initial value 𝜔̂𝑐(0) = [6.1, 2, 4.23]𝑇. Specifically, the initial value 𝜔̂𝑐(0) is given via the trial-and-error method [5], such that we can obtain the initial admissible control [54]. Besides, the parameter related to the SMS variable is given by 𝛯 = diag{0.01, 0.02}.

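To make the triggering logic concrete, the threshold test (81)–(82) and the inter-event-time bound (84)–(85) can be sketched as follows. The parameters k = 0.8, β = 0.9, ξ = 2.5 match Example 1, while k0, k1 and the sampled values of α(t), Q(s), and ‖s̄𝓁‖ are hypothetical placeholders used only for illustration.

```python
import math

def threshold_sq(alpha_t, Q_s, k=0.8, beta=0.9, xi=2.5):
    """e_T in (82): the squared threshold on the gap ||e_l^s||."""
    return (k * alpha_t + (1.0 - beta**2) * Q_s) / (2.0 * xi**2)

def should_trigger(gap_norm_sq, alpha_t, Q_s, **kw):
    """An event is released once ||e_l^s||^2 exceeds e_T, i.e. (81) fails."""
    return gap_norm_sq > threshold_sq(alpha_t, Q_s, **kw)

def min_inter_event_time(alpha_t, Q_s, s_bar_norm, k0, k1, **kw):
    """Lower bound on T_l from (84)-(85): T_l > (1/k0) ln(1 + H_l)."""
    H_l = k0 * math.sqrt(threshold_sq(alpha_t, Q_s, **kw)) / (k1 * s_bar_norm)
    return math.log(1.0 + H_l) / k0

# Hypothetical operating point (illustrative values only).
alpha_t, Q_s, s_bar = 0.5, 1.0, 1.0
k0, k1 = 1.0, 2.0
fired = should_trigger(0.05, alpha_t, Q_s)   # gap already exceeds e_T here
T_min = min_inter_event_time(alpha_t, Q_s, s_bar, k0, k1)
```

Because α(t) > 0 keeps the threshold strictly positive, T_min is strictly positive as well, which is exactly the Zeno-exclusion argument of Theorem 3 in executable form.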

Fig. 2. Curves of the modeling errors.

Fig. 3. Curves of the system states.

Fig. 4. Evolution of approximate DETC input 𝑢̂(𝑥̄, 𝑠̄).

Fig. 5. Evolution of the weight vector 𝜔̂𝑐.

Fig. 6. Inter-event time 𝑇𝓁.

Fig. 7. Curves of the modeling errors.
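For orientation, the pendulum plant (87) with the parameters of Table 1 can be integrated with a simple fixed-step Euler scheme. The sketch below uses a zero control input as a placeholder for the learned DETC law, so it only illustrates the open-loop model, not the closed-loop results reported in Figs. 2–6.

```python
import math

# Parameters from Table 1.
J, L_bar, M_bar, g, f_r = 4.0, 1.5, 0.75, 9.8, 0.5

def d(x1, x2):
    # Perturbation used in Example 1.
    return 0.4 * math.sin(x2) + 0.2 * (x1 + x2) * x2

def step(x1, x2, u, dt=1e-3):
    """One Euler step of (87): x1 = angle position, x2 = angular velocity."""
    dx1 = x2 + d(x1, x2)
    dx2 = (u - M_bar * g * L_bar * math.sin(x1) - f_r * dx1) / J
    return x1 + dt * dx1, x2 + dt * dx2

# Open-loop rollout from the initial state of Example 1 (u = 0 placeholder).
x1, x2 = 1.2, -1.5
for _ in range(1000):
    x1, x2 = step(x1, x2, u=0.0)
```

Replacing the placeholder input with the event-held critic-based control would reproduce the aperiodic update pattern discussed below.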

After simulation, the curves of the modeling errors are shown in Fig. 2. The curves of the system states are described in Fig. 3, where the system states are convergent after 15 s. Fig. 4 displays the evolution of the approximate DETC input 𝑢̂(𝑥̄, 𝑠̄), which means that the controller updates its value aperiodically. The evolution of the weight vector 𝜔̂𝑐 is provided in Fig. 5, where 𝜔̂𝑐 converges to [5.305, 3.866, 0.9179]𝑇 after the first 10 s. Fig. 6 describes the inter-event time 𝑇𝓁. Obviously, min{𝑇𝓁} = 0.06 > 0, which proves that the Zeno behavior is avoided. Furthermore, the triggering numbers are up to 6000, 1003, and 250 when we implement the time-triggered control method [31], the static event-triggered method [39], and our proposed method, respectively.

6.2. Example 2

The dynamics of a robot arm [41] are given as follows:

𝑥̇ = [𝑥₂, −4.905 sin(𝑥₁) − 0.2𝑥₂]𝑇 + [0, 0.1]𝑇𝑢 + [𝑑(𝑥), 0]𝑇,  (89)

where 𝑥 = [𝑥₁, 𝑥₂]𝑇 ∈ 𝑅² with the initial value 𝑥₀ = [1.6, 1.75]𝑇, and 𝑑(𝑥) = sin²(𝑥₁ + 𝑥₂) + 2𝑥₂. In addition, the parameter related to the SMS variable is given by 𝛯 = diag{0.001, 0.002}. Associated with the cost function, 𝜂 = 2.3 and 𝜔̂𝑐(0) = [6, 11, 2]𝑇, where the initial value 𝜔̂𝑐(0) is given via the trial-and-error method. The other parameters are the same as in Example 1.

After simulation, the curves of the modeling errors are depicted in Fig. 7, which means that a priori knowledge of the system dynamics is obtained via the data-driven control technique. The curves of the system states are described in Fig. 8, where the system states are convergent after 12 s. Fig. 9 depicts the evolution of the approximate DETC input 𝑢̂(𝑥̄, 𝑠̄), which means that the controller updates its value aperiodically. We provide the evolution of the weight vector 𝜔̂𝑐 in Fig. 10. It can be observed from Fig. 10 that 𝜔̂𝑐 converges to [4.374, 8.096, 2.101]𝑇 after the first 8 s. Fig. 11 describes the inter-event time 𝑇𝓁. Obviously, min{𝑇𝓁} = 0.1 > 0, which proves that the Zeno behavior is avoided. Furthermore, the triggering numbers are up to 4000, 259, and 171 when we implement the time-triggered control method [31], the static event-triggered method [39], and our proposed method, respectively.

7. Conclusion

A feasible data-driven-based DETC method using the single-critic RL method and the SMS technique is proposed for unknown nonlinear


systems. Utilizing the SMS technique, the developed control scheme improves the response speed of the system and has stronger robustness. Then, by constructing an SMS-based cost function for the nominal system, the original problem is transformed into the optimal control problem. In addition, this paper adopts the single-critic network framework rather than the actor-critic one, such that the approximation error brought by an action network is eliminated. Different from the static ETC method, the DETC method obtains a larger triggering interval, which stabilizes the considered system with very few communication resources. It is worth noting that our proposed DETC scheme needs to monitor the triggering threshold continuously, which still leads to some unnecessary computations. On the basis of the DETC method, we will present a periodic DETC method to control the unknown nonlinear system in the future, such that the requirement to continuously monitor the triggering condition is avoided.

Fig. 8. Curves of the system states.

Fig. 9. Evolution of approximate DETC input 𝑢̂(𝑥̄, 𝑠̄).

Fig. 10. Evolution of the weight vector 𝜔̂𝑐.

Fig. 11. Inter-event time 𝑇𝓁.

CRediT authorship contribution statement

Tengda Wang: Conceptualization, Methodology, Writing – original draft. Guangdeng Zong: Conceptualization, Visualization. Xudong Zhao: Conceptualization, Visualization. Ning Xu: Conceptualization, Visualization.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

The authors do not have permission to share data.

Acknowledgments

This work was partially supported by the National Natural Science Foundation of China (62203064).

References

[1] J.-G. Zhao, F.-F. Chen, Off-policy integral reinforcement learning-based optimal tracking control for a class of nonzero-sum game systems with unknown dynamics, Optim. Control Appl. Methods 43 (6) (2022) 1623–1644.
[2] B. Zhao, G. Shi, D. Liu, Event-triggered local control for nonlinear interconnected systems through particle swarm optimization-based adaptive dynamic programming, IEEE Trans. Syst. Man Cybern. Syst. 53 (12) (2023) 7342–7353.
[3] S. Liu, B. Niu, N. Xu, X. Zhao, Zero-sum game-based decentralized optimal control for saturated nonlinear interconnected systems via a data and event driven approach, IEEE Syst. J. 18 (1) (2024) 758–769.
[4] S. Li, L. Ding, H. Gao, Y.-J. Liu, L. Huang, Z. Deng, ADP-based online tracking control of partially uncertain time-delayed nonlinear system and application to wheeled mobile robots, IEEE Trans. Cybern. 50 (7) (2020) 3182–3194.
[5] J. Sun, C. Liu, Disturbance observer-based robust missile autopilot design with full-state constraints via adaptive dynamic programming, J. Franklin Inst. 355 (5) (2018) 2344–2368.
[6] J. Na, B. Wang, G. Li, S. Zhan, W. He, Nonlinear constrained optimal control of wave energy converters with adaptive dynamic programming, IEEE Trans. Ind. Electron. 66 (10) (2019) 7904–7915.
[7] J.-G. Zhao, Adaptive dynamic programming-based adaptive optimal tracking control of a class of strict-feedback nonlinear system, Int. J. Control Autom. 21 (4) (2023) 1349–1360.
[8] J. Duan, J. Li, Q. Ge, S.E. Li, M. Bujarbaruah, F. Ma, D. Zhang, Relaxed actor-critic with convergence guarantees for continuous-time optimal control of nonlinear systems, IEEE Trans. Intell. Veh. 8 (5) (2023) 3299–3311.
[9] H.-J. Ma, L.-X. Xu, G.-H. Yang, Multiple environment integral reinforcement learning-based fault-tolerant control for affine nonlinear systems, IEEE Trans. Cybern. 51 (4) (2021) 1913–1928.
[10] X. Yang, H. He, Self-learning robust optimal control for continuous-time nonlinear systems with mismatched disturbances, Neural Netw. 99 (2018) 19–30.
[11] J. Zhao, J. Na, G. Gao, Adaptive dynamic programming based robust control of nonlinear systems with unmatched uncertainties, Neurocomputing 395 (2020) 56–65.
[12] Y. Zhao, H. Wang, N. Xu, G. Zong, X. Zhao, Reinforcement learning-based decentralized fault tolerant control for constrained interconnected nonlinear systems, Chaos Solitons Fractals 167 (2023) 113034.
[13] X. Yang, H. He, Q. Wei, B. Luo, Reinforcement learning for robust adaptive control of partially unknown nonlinear systems subject to unmatched uncertainties, Inform. Sci. 463–464 (2018) 307–322.
[14] X. Huang, J. Dong, ADP-based robust resilient control of partially unknown nonlinear systems via cooperative interaction design, IEEE Trans. Syst. Man Cybern. Syst. 51 (12) (2021) 7466–7474.


[15] J.M. Lee, J.H. Lee, Approximate dynamic programming-based approaches for input-output data-driven control of nonlinear processes, Automatica 41 (7) (2005) 1281–1288.
[16] H. Zhang, L. Cui, X. Zhang, Y. Luo, Data-driven robust approximate optimal tracking control for unknown general nonlinear systems using adaptive dynamic programming method, IEEE Trans. Neural Netw. 22 (12) (2011) 2226–2236.
[17] Q. Wei, X. Wang, Y. Liu, G. Xiong, Data-driven adaptive-critic optimal output regulation towards water level control of boiler-turbine systems, Expert Syst. Appl. 207 (2022) 117883.
[18] N. Zhao, Y. Tian, H. Zhang, E. Herrera-Viedma, Fuzzy-based adaptive event-triggered control for nonlinear cyber-physical systems against deception attacks via a single parameter learning method, Inform. Sci. 657 (2024) 119948.
[19] X. Wu, S. Ding, B. Niu, N. Xu, X. Zhao, Predefined-time event-triggered adaptive tracking control for strict-feedback nonlinear systems with full-state constraints, Int. J. Gen. Syst. 53 (3) (2024) 352–380.
[20] X. Wu, N. Zhao, S. Ding, H. Wang, X. Zhao, Distributed event-triggered output-feedback time-varying formation fault-tolerant control for nonlinear multi-agent systems, IEEE Trans. Autom. Sci. Eng. (2024) 1–12, http://dx.doi.org/10.1109/TASE.2024.3400325.
[21] N. Zhao, X. Zhao, G. Zong, N. Xu, Resilient event-triggered filtering for networked switched T-S fuzzy systems under denial-of-service attacks, IEEE Trans. Fuzzy Syst. 32 (4) (2024) 2140–2152.
[22] Y. Cao, B. Niu, H. Wang, X. Zhao, Event-based adaptive resilient control for networked nonlinear systems against unknown deception attacks and actuator saturation, Internat. J. Robust Nonlinear Control 34 (7) (2024) 4769–4786.
[23] H. Zhang, Y. Zhang, X. Zhao, Event-triggered adaptive dynamic programming for hierarchical sliding-mode surface-based optimal control of switched nonlinear systems, IEEE Trans. Autom. Sci. Eng. (2023) 1–13, http://dx.doi.org/10.1109/TASE.2023.3303359.
[24] X. Wu, S. Ding, N. Xu, B. Niu, X. Zhao, Periodic event-triggered bipartite containment control for nonlinear multi-agent systems with input delay, Int. J. Syst. Sci. 55 (10) (2024) 2008–2022, http://dx.doi.org/10.1080/00207721.2024.2328780.
[25] K.G. Vamvoudakis, Event-triggered optimal adaptive control algorithm for continuous-time nonlinear systems, IEEE/CAA J. Autom. Sin. 1 (3) (2014) 282–293.
[26] X. Yang, Q. Wei, Adaptive critic learning for constrained optimal event-triggered control with discounted cost, IEEE Trans. Neural Netw. Learn. Syst. 32 (1) (2021) 91–104.
[27] X. Yang, Y. Zhu, N. Dong, Q. Wei, Decentralized event-driven constrained control using adaptive critic designs, IEEE Trans. Neural Netw. Learn. Syst. 33 (10) (2022) 5830–5844.
[28] C. Mu, K. Wang, T. Qiu, Dynamic event-triggering neural learning control for partially unknown nonlinear systems, IEEE Trans. Cybern. 52 (4) (2022) 2200–2213.
[29] L. Cui, X. Xie, H. Guo, Y. Luo, Dynamic event-triggered distributed guaranteed cost FTC scheme for nonlinear interconnected systems via ADP approach, Appl. Math. Comput. 425 (2022) 127082.
[30] V. Utkin, Variable structure systems with sliding modes, IEEE Trans. Autom. Control 22 (2) (1977) 212–222.
[31] B. Zhao, D. Liu, C. Alippi, Sliding-mode surface-based approximate optimal control for uncertain nonlinear systems with asymptotically stable critic structure, IEEE Trans. Cybern. 51 (6) (2021) 2858–2869.
[32] S. Yue, B. Niu, H. Wang, L. Zhang, A.M. Ahmad, Hierarchical sliding mode-based adaptive fuzzy control for uncertain switched under-actuated nonlinear systems with input saturation and dead-zone, Robotic Intell. Autom. 43 (5) (2023) 523–536.
[33] J. Wang, L. Zhao, L. Yu, Adaptive terminal sliding mode control for magnetic levitation systems with enhanced disturbance compensation, IEEE Trans. Ind. Electron. 68 (1) (2020) 756–766.
[34] J. Fei, Z. Wang, X. Liang, Z. Feng, Y. Xue, Fractional sliding-mode control for microgyroscope based on multilayer recurrent fuzzy neural network, IEEE Trans. Fuzzy Syst. 30 (6) (2022) 1712–1721.
[35] B. Li, H. Zhang, B. Xiao, C. Wang, Y. Yang, Fixed-time integral sliding mode control of a high-order nonlinear system, Nonlinear Dynam. 107 (2022) 909–920.
[36] H. Dong, Z. Ning, Z. Ma, Nearly optimal fault-tolerant constrained tracking for multi-axis servo system via practical terminal sliding mode and adaptive dynamic programming, ISA Trans. 144 (2024) 308–318.
[37] H. Dong, X. Yang, Learning-based online optimal sliding-mode control for space circumnavigation missions with input constraints and mismatched uncertainties, Neurocomputing 484 (2022) 13–25.
[38] H. Zhang, H. Wang, B. Niu, L. Zhang, A.M. Ahmad, Sliding-mode surface-based adaptive actor-critic optimal control for switched nonlinear systems with average dwell time, Inform. Sci. 580 (2021) 756–774.
[39] S. Liu, B. Niu, G. Zong, X. Zhao, N. Xu, Data-driven-based event-triggered optimal control of unknown nonlinear systems with input constraints, Nonlinear Dynam. 109 (2) (2022) 891–909.
[40] G. Wen, C.L.P. Chen, S.S. Ge, H. Yang, X. Liu, Optimized adaptive nonlinear tracking control using actor-critic reinforcement learning strategy, IEEE Trans. Ind. Inform. 15 (9) (2019) 4969–4977.
[41] D. Wang, D. Liu, Learning and guaranteed cost control with event-based adaptive critic implementation, IEEE Trans. Neural Netw. Learn. Syst. 29 (12) (2018) 6004–6014.
[42] Q. Zhao, J. Sun, G. Wang, J. Chen, Event-triggered ADP for nonzero-sum games of unknown nonlinear systems, IEEE Trans. Neural Netw. Learn. Syst. 33 (5) (2022) 1905–1913.
[43] X. Yang, H. He, Adaptive critic learning and experience replay for decentralized event-triggered control of nonlinear interconnected systems, IEEE Trans. Syst. Man Cybern. Syst. 50 (11) (2020) 4043–4055.
[44] N. Wang, Y. Gao, X. Zhang, Data-driven performance-prescribed reinforcement learning control of an unmanned surface vehicle, IEEE Trans. Neural Netw. Learn. Syst. 32 (12) (2021) 5456–5467.
[45] Z. Gao, N. Zhao, X. Zhao, B. Niu, N. Xu, Event-triggered prescribed performance adaptive secure control for nonlinear cyber physical systems under denial-of-service attacks, Commun. Nonlinear Sci. Numer. Simul. 131 (2024) 107793.
[46] L. Zhang, J. Liang, Z. Feng, N. Zhao, Improved results of asynchronous mixed H∞ and passive control for discrete-time linear switched system with mode-dependent average dwell time, Chaos Solitons Fractals 178 (2024) 114401.
[47] S. Huang, G. Zong, N. Xu, H. Wang, X. Zhao, Adaptive dynamic surface control of MIMO nonlinear systems: a hybrid event triggering mechanism, Int. J. Adapt. Control Signal Process. 38 (2) (2024) 437–454.
[48] S. Liu, H. Wang, Y. Liu, N. Xu, X. Zhao, Sliding-mode surface-based adaptive optimal nonzero-sum games for saturated nonlinear multi-player systems with identifier-critic networks, Neurocomputing 584 (2024) 127575.
[49] Z. Gao, Z. Wang, X. Wu, W. Wang, Y. Liu, Finite-time stability analysis of a class of discrete-time switched nonlinear systems with partial finite-time unstable modes, Asian J. Control 24 (1) (2022) 309–319.
[50] F. Cheng, B. Niu, N. Xu, X. Zhao, Resilient distributed secure consensus control for uncertain networked agent systems under hybrid DoS attacks, Commun. Nonlinear Sci. Numer. Simul. 129 (2024) 107689.
[51] S. Huang, G. Zong, N. Zhao, X. Zhao, A.M. Ahmad, Performance recovery-based fuzzy robust control of networked nonlinear systems against actuator fault: A deferred actuator-switching method, Fuzzy Sets and Systems 480 (2024) 108858.
[52] X. Yang, Z. Zeng, Z. Gao, Decentralized neurocontroller design with critic learning for nonlinear-interconnected systems, IEEE Trans. Cybern. 52 (11) (2022) 11672–11685.
[53] L. Zhang, J. Wang, X. Zhao, N. Zhao, S. Sharaf, Event-based reachable set synthesis for continuous delayed fuzzy singularly perturbed systems, IEEE Trans. Circuits Syst. II 71 (1) (2024) 246–250.
[54] H. Zhao, H. Wang, B. Niu, X. Zhao, N. Xu, Adaptive fuzzy decentralized optimal control for interconnected nonlinear systems with unmodeled dynamics via mixed data and event driven method, Fuzzy Sets and Systems 474 (2024) 108735.

Tengda Wang received the B.S. degree in electrical engineering and automation from Zhengzhou University of Light Industry, Zhengzhou, China, in 2019, and the M.S. degree in control science and engineering from Bohai University, Jinzhou, China, in 2024. He is currently pursuing the Ph.D. degree in control science and engineering with Central South University, Changsha, China. His research interests include optimal control, sliding mode control, and their applications.

Guangdeng Zong received the M.S. degree in mathematics from Qufu Normal University, Qufu, China, in 2002, and the Ph.D. degree in control theory and control engineering from the Control Science and Engineering Department, School of Automation, Southeast University, Nanjing, China, in 2005. He is a Full Professor with Tiangong University, Rizhao, China. In 2010, he was a Visiting Professor with the Department of Electrical and Computer Engineering, Utah State University, Logan, UT, USA. In 2012, he was a Visiting Fellow with the School of Computing, Engineering and Mathematics, University of Western Sydney, Penrith, NSW, Australia. In 2016, he was a Visiting Professor with the Institute of Information Science, Academia Sinica, Taipei, Taiwan. From 2018 to 2019, he visited the Department of Mechanical Engineering, The University of Hong Kong, Hong Kong. From 2019 to 2020, he visited the Department of Mechanical Engineering, University of Victoria, Victoria, BC, Canada. Prof. Zong is currently an Editorial Board Member of some international journals, such as the International Journal of Systems Science, IEEE Access, and the International Journal of Control, Automation and Systems.


Xudong Zhao received the B.S. degree in automation from the Harbin Institute of Technology, Harbin, China, in 2005, and the Ph.D. degree in control science and engineering from the Space Control and Inertial Technology Center, Harbin Institute of Technology, in 2010. Since December 2015, he has been with the Dalian University of Technology, Dalian, China, where he is currently a Professor. His research interests include hybrid systems, positive systems, multiagent systems, and control of aero engine. Dr. Zhao was awarded as the Web of Science Highly Cited Researcher in engineering from 2017 to 2020. He is an Associate Editor of the IEEE Transactions on Systems, Man, and Cybernetics: Systems, Nonlinear Analysis: Hybrid Systems, Neurocomputing, International Journal of General Systems, Acta Automatica Sinica, Assembly Automation, and Journal of Aeronautics.

Ning Xu received the B.S. degree in public administration from Harbin Engineering University, Harbin, China, in 2005, the M.S. degree in business administration from the Harbin Institute of Technology, Harbin, in 2012, and the Ph.D. degree in control science and engineering from the Institute of Information and Control, Hangzhou Dianzi University, Hangzhou, in 2023. Since 2013, she has been with the College of Information Science and Technology, Bohai University, Jinzhou, China, where she is currently a Lecturer. Her research interests include switched systems, nonlinear systems, and sampled data systems.
