
Mechanical Systems and Signal Processing 218 (2024) 111551


Aircraft engine remaining useful life prediction: A comparison study of Kernel Adaptive Filtering architectures

Georgios D. Karatzinis a,∗, Yiannis S. Boutalis a, Steven Van Vaerenbergh b
a Department of Electrical and Computer Engineering, Democritus University of Thrace, Xanthi, Greece
b Department of Mathematics, Statistics and Computing, University of Cantabria, Santander, Spain

ARTICLE INFO

Communicated by E. Chatzi

Keywords: Remaining Useful Life (RUL) prediction; Kernel Adaptive Filtering (KAF); Prognostics and Health Management (PHM); C-MAPSS

ABSTRACT

Predicting the Remaining Useful Life (RUL) of mechanical systems poses significant challenges in Prognostics and Health Management (PHM), impacting safety and maintenance strategies. This study evaluates Kernel Adaptive Filtering (KAF) architectures for predicting the RUL of aircraft engines, using NASA's C-MAPSS dataset for an in-depth intra-comparison. We investigate the effectiveness of KAF algorithms, focusing on their performance dynamics in RUL prediction. By examining their behavior across different pre-processing scenarios and metrics, we aim to pinpoint the most reliable and efficient KAF models for aircraft engine prognostics. Further, our study extends to an inter-comparison with approximately 60 neural network approaches, revealing that KAFs outperform more than half of these models, highlighting the potential and viability of KAFs in scenarios where computational efficiency and fewer trainable parameters are both crucial. Although KAFs do not always surpass the most advanced neural networks in performance metrics, they demonstrate resilience and efficiency, particularly underscored by the ANS-QKRLS algorithm. This evaluation study offers valuable insights into KAFs for RUL prediction, highlighting their operational behavior and setting a foundation for future machine learning innovations. It also paves the way for research into hybrid models and deep-learning-inspired KAF structures, potentially enhancing prognostic tools in mechanical systems.

1. Introduction

Kernel methods are a class of algorithms often used in machine learning [1], nonlinear signal processing and pattern analysis [2].
The core idea behind kernel methods is that, in the context of reproducing kernel Hilbert spaces (RKHS), input data are transformed
into a high dimensional feature space (Hilbert space 𝐻) using a positive-definite function named reproducing kernel [3]. This way,
the inner product operation in the feature space can be computed explicitly through a kernel evaluation. Conventional kernel-
based implementations involve a wide range of batch formulations such as regularization network [4], support vector machine
(SVM) [5], Gaussian process regression (GPR) [6], kernel principal component analysis (KPCA) [7], relevance vector machine
(RVM) [8]. However, batch algorithms operate in an offline manner and need to be retrained once new data become available, which imposes restrictions for real-time applications. Sequential learning algorithms are suitable for real-time applications, as they update their parameters and produce predictions at each iteration within a constant stream of incoming data observations.
Kernel adaptive filters (KAFs) [9], as a class of kernel sequential learning approaches, have received increased research interest
over the past decades, mainly because of their online adaptation scheme and the universal approximation capabilities they offer.

∗ Corresponding author.
E-mail address: [email protected] (G.D. Karatzinis).

https://doi.org/10.1016/j.ymssp.2024.111551
Received 15 December 2023; Received in revised form 17 May 2024; Accepted 19 May 2024
Available online 5 June 2024

They can be considered as efficient extensions of the well-established linear adaptive filtering algorithms. Indicative applications
that have been tackled using KAF variants include: stock returns prediction [10], nonlinear time series contaminated by impulsive
noise [11], aero-engine degradation prediction [12], machine condition prediction [13], noisy chaotic time series prediction [14],
suppressing noise and artifact interference from ECG signals [15].
Key challenges that emerge in KAFs are: (i) the selection of the kernel type and its hyperparameters; and (ii) an inherently linearly growing network structure that leads to a constant increase in memory and computational demands. The Gaussian kernel is considered a default choice due to its universal approximation capability, while it also provides an infinite-dimensional nonlinear mapping. In order to obtain a desirable approximation performance, proper specification of the kernel parameters is crucial for the application under examination. Optimal kernel parameter selection can be achieved using appropriate methods such as cross validation [16], but such methods are ill-suited to online learning due to their high computational cost. More efficient online techniques for optimizing the kernel size include adaptive kernel size [17], multikernel adaptive filtering [18] and Gaussian KAFs with adaptive kernel bandwidth [19].
Online kernel-based learning using adaptive projection algorithms [20] serves as an efficient and simple structured alternative for
training the model in a recursive manner. Various sparse methods have been proposed in order to address the continuously increased
size and the computational cost. Widely used sparse methods in KAF context are: novelty criterion [21], surprise criterion [22],
coherence criterion [23], variance criterion [24], approximate linear dependence (ALD) criterion [25] and others. Alternative
approaches that curb the growth of the network are: quantization methods [26,27], budget-based ones [28,29], algorithms that
combine established mechanisms [30,31] and algorithmic extensions of them like nearest-instance-centroid-estimation kernel
least-mean-square (NICE-KLMS) [32] and KRLS Tracker (KRLS-T) [33]. Other recent approaches that belong in the KAF family
include multikernel adaptive filtering [34], Nyström kernel recursive least squares [35], kernel recursive algorithms that include
M-estimate [36].
In this work, an evaluation study of KAF-based algorithms is presented which is dedicated to aircraft engine remaining useful
life prediction. Remaining useful life (RUL) prediction is a real-world problem that is directly connected with the productivity
and maintenance of industrial systems [37,38] under the prognostic and health management (PHM) context. This is an imperative
process to determine the time for maintenance and component replacement in industrial applications. RUL prediction spans among
a plethora of problems such as rolling bearings [39,40], Lithium-ion batteries [41,42], supercapacitors [43], power electronics [44]
and aircraft engines [45]. The dataset examined in this paper, C-MAPSS [46], was created by NASA and simulates the degradation of aircraft (turbofan) engines. A set of established KAF algorithms is evaluated on this RUL prediction problem in terms of diverse metrics. An analysis is also conducted to interpret the inner behavior of the mechanism encapsulated in each KAF algorithm. Finally, an extensive comparison is presented between KAF algorithms and neural network approaches reported in the literature in terms of performance and number of trainable weights. Summarizing, the contribution of this work is three-fold:
1. KAF Review and Algorithmic Insight — A detailed review of a subset of KAF algorithms is presented, with an emphasis on
clarifying the underlying mechanisms that govern their performance. This analysis is not merely descriptive but analytical, delving
into the operational details of each algorithm to understand its functional strengths and limitations within the RUL prediction
problem.
2. Extensive Intra-comparison within KAF Domain — An extensive evaluation study of the adopted KAF architectures dedicated to a well-known RUL application, using different preprocessing and normalization scenarios. This intra-comparison within the KAF family evaluates the behavior of each KAF mechanism in the aircraft engine RUL problem and subsequently identifies the most dominant models in terms of performance and reliability.
3. Comprehensive Inter-comparison with Neural Networks — Comparative results with well-known neural network approaches
reported in the literature (around 60 approaches) regarding the aircraft engine RUL prediction problem. This is an inter-comparison analysis that ranks the prediction behavior of KAFs against other network-based approaches in terms of performance, training time
and number of trainable parameters. The results demonstrate that KAF algorithms achieve decent performance with significantly
fewer trainable parameters, highlighting their efficiency.
In the trade-off between computational burden and prediction accuracy, KAFs emerge as a preferable option, particularly in
resource-constrained environments. Their efficiency, combined with competitive performance, paves the way for future research
to explore deep learning-inspired KAF architectures. The rest of this paper is organized as follows: Section 2 details the theoretical
background of Linear Adaptive Filters. Section 3 expands to Kernel Adaptive Filters, where the nonlinear extension of linear methods
is explored. In Section 4, the focus shifts to sparsification with ALD, detailing the mechanisms for reducing computational complexity
in kernel methods. Section 5 discusses the growing and pruning strategies, which address the challenge of managing the growth of
the model size over time. Section 6 presents the quantization approaches, while candidates of combined approaches are presented
in Section 7. The Remaining Useful Life problem is described in Section 8 covering the adopted data preprocessing cases and the
utilized evaluation metrics. The extensive evaluation part of KAF approaches is presented in Section 9. Section 10 is devoted to the
comparison of KAF algorithms with nearly 60 approaches reported in the literature, providing also a discussion analysis regarding
the research outcomes of this work. Section 11 summarizes the results, draws conclusions and outlines future work.

2. Linear adaptive filters

2.1. Least mean square

Least Mean Square (LMS) [47,48] is the simplest adaptive filtering algorithm. Suppose there is a sequence of training input–output pairs {x_i, y_i}_{i=1}^{N}, with N being the size of the data; the objective is to learn a continuous input–output mapping f : U → R. The input is assumed to be an L-dimensional vector and the input domain is a subspace of R^L, i.e. x ∈ U ⊆ R^L, while the output y is assumed to be one dimensional, y ∈ R. The LMS algorithm minimizes the cost function J_i:

J_i = \frac{1}{2} e_i^2    (1)

where e_i is the estimation error. The optimal weight vector at the ith iteration is approximately obtained by calculating:

e_i = y_i − w_{i-1}^T x_i,    w_i = w_{i-1} + η e_i x_i    (2)

where η is the learning rate. The initial weight vector is set to zero. The LMS algorithm is derived by using the instantaneous gradient of the cost function in Eq. (1) with respect to the weight vector; thus it is formulated similarly to the stochastic gradient descent algorithm (LMS is a special case of stochastic gradient descent).
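To make the recursion concrete, the following minimal NumPy sketch implements Eqs. (1)–(2); the function name and the default learning-rate value are illustrative assumptions rather than the authors' implementation.

    import numpy as np

    def lms(X, y, eta=0.1):
        """Least Mean Square: a single pass over the data stream, Eqs. (1)-(2)."""
        N, L = X.shape
        w = np.zeros(L)                   # zero-initialized weight vector
        for i in range(N):
            e = y[i] - w @ X[i]           # instantaneous estimation error
            w = w + eta * e * X[i]        # stochastic-gradient weight update
        return w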

2.2. Recursive least squares

In contrast with the LMS minimization procedure, which calculates the instantaneous value of the squared estimation error as
presented in Eq. (1), the Recursive Least Squares (RLS) algorithm [49,50] minimizes the sum of all squared estimation errors up
to the current time step. Following a sequence of input–output pairs up to time i − 1, {x_j, y_j}_{j=1}^{i-1}, the objective is to minimize the corresponding cost:

\min_{w_{i-1}} \sum_{j=1}^{i-1} |y_j − x_j^T w_{i-1}|^2 + λ ||w_{i-1}||^2    (3)

where λ is the regularization parameter. The recursive nature of this algorithm is reflected in the estimation of the weight w_i, when a new pair {x_i, y_i} becomes available, from the previous weight w_{i-1}. Hence, the RLS algorithm iterates following the procedure:

r_i = 1 + x_i^T P_{i-1} x_i
k_i = P_{i-1} x_i / r_i
e_i = y_i − x_i^T w_{i-1}    (4)
w_i = w_{i-1} + k_i e_i
P_i = P_{i-1} − P_{i-1} x_i x_i^T P_{i-1} / r_i

where k_i is the gain vector (L × 1), the input x_i is also of size (L × 1), e_i is the prediction error and P_i is the correlation matrix (L × L). The initial weight vector is zero (w_0 = 0), while the initial correlation matrix is P_0 = λ^{-1} I.
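A corresponding NumPy sketch of the RLS recursion in Eq. (4) follows; the default regularization value is an illustrative assumption.

    import numpy as np

    def rls(X, y, lam=1.0):
        """Recursive Least Squares, Eq. (4), with P_0 = lam^{-1} I and w_0 = 0."""
        N, L = X.shape
        w = np.zeros(L)
        P = np.eye(L) / lam
        for i in range(N):
            x = X[i]
            r = 1.0 + x @ P @ x
            k = P @ x / r                 # gain vector
            e = y[i] - x @ w              # a-priori prediction error
            w = w + k * e
            P = P - np.outer(P @ x, x @ P) / r
        return w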

3. Kernel adaptive filters

Kernel methods provide non-linear and non-parametric versions of conventional learning algorithms, which translate data into
higher dimensional space. Then, linear adaptive algorithms are applied to transformed data, where these algorithms correspond to
non-linear implementations in the original input space. The core idea behind kernel methods is that, in the context of reproducing
kernel Hilbert spaces (RKHS), input data are transformed into a high dimensional feature space (Hilbert space 𝐻) using a positive-
definite function named reproducing kernel [3]. This way, the inner product in the feature space can be computed through kernel
evaluations. More specifically, there is no need to execute calculations in the high dimensional feature space as long as an approach
can be expressed in terms of inner products (or kernel evaluations). In case that algorithmic operations in RKHS can be expressed by
inner products, then these operations can be calculated by kernel evaluations in the input space without making any direct reference
to feature vectors. This methodological aspect is commonly known as the ‘‘kernel trick’’ and based on this idea a wide variety of
adaptive filtering algorithms has been introduced in RKHS [9].
The ‘‘kernel trick’’ describes that for a given algorithm that can be expressed in terms of inner products, an alternative algorithm
can be constructed by replacing the inner products with a positive definite kernel function, i.e., the algorithm can be extended
to RKHS. Reproducing Kernel Hilbert Spaces are Hilbert spaces that satisfy certain additional properties. RKHS theory is normally
described as a transform theory between RKHSs and positive semi-definite functions called kernels. RKHS is a space of functions
that: (a) each point in space is a particular function; (b) functions are smooth and continuous; (c) linear functions in RKHS provide
useful non-linear results; (d) each RKHS has a unique kernel function. Each kernel induces exactly one RKHS, and each RKHS
has a unique kernel function and certain problems posed in RKHSs are more easily solved by involving the kernel [51]. Based on
the aforementioned properties, Kernel adaptive filters (KAF) are generated in RKHS by implementing well-known linear adaptive
filtering techniques that correspond to nonlinear filters in the original input space, utilizing the linear structure and inner product
of this space [17]. Therefore, the linear adaptive algorithms described in Section 2 can be extended in Kernel Least Mean Square
(KLMS) [52], Kernel Recursive Least Squares (KRLS) [25] and Extended Kernel Recursive Least Squares (EX-KRLS) [53] respectively.
Based on the above rationale, suppose the goal is to learn a continuous input–output mapping f : U → R based on a sequence of training data {x_i, y_i}_{i=1}^{N}, where U ⊆ R^L is the input domain and y ∈ R is the corresponding desired output. A kernel (Mercer kernel) [3] is a continuous, symmetric and positive-definite function κ : U × U → R, where a nonlinear mapping φ(·) is associated
with this kernel to transform input data 𝒙 into a potentially infinite-dimensional feature space RKHS H. Common examples of
symmetric and positive definite kernels are the linear, polynomial and Gaussian kernels:

κ(x, x′) = x^T x′    (5)

κ(x, x′) = (x^T x′ + 1)^p    (6)

κ(x, x′) = exp(−ξ ||x − x′||^2)    (7)

or, equivalently for the Gaussian kernel:

κ(x, x′) = exp( −||x − x′||^2 / (2h^2) )    (8)

where p is the order of the polynomial kernel, h is called the kernel bandwidth or kernel size and ξ is the kernel parameter. From the Mercer theorem [3], any reproducing kernel κ(x, x′) can be expressed as:

κ(x, x′) = \sum_{i=1}^{∞} ζ_i ϕ_i(x) ϕ_i(x′)    (9)

where ζ_i and ϕ_i are the (non-negative) eigenvalues and eigenfunctions respectively. Therefore, the kernel-induced mapping φ(·), where φ : U → H, is constructed as:

φ(x) = [ \sqrt{ζ_1} ϕ_1(x), \sqrt{ζ_2} ϕ_2(x), … ]    (10)

such that:

κ(x, x′) = φ(x)^T φ(x′)    (11)
Note that the dimensionality of H is determined by the number of strictly positive eigenvalues, which can be infinite in the case of
the Gaussian kernel. Also, note that there is a unique isometric isomorphism in RKHS, which means that the linear structure and the
inner product are both preserved. A linear model is constructed using the nonlinear mapping 𝝋(⋅), i.e., the kernel-induced mapping
of Eq. (10) transforms the input x_i into the high dimensional feature space H as φ(x_i). In the feature space:

f(·) = ω_i^T φ(·)    (12)

where ω_i is the weight vector in the feature space. Then, the learning task is to find (recursively) a weight vector at each iteration that minimizes the regularized least squares regression in H:

\min_{ω_i} \sum_{j=1}^{i} |y_j − ω_i^T φ(x_j)|^2 + λ ||ω_i||^2    (13)

Finally, by the representer theorem [54], for a new input x′ the optimal solution can be expressed as:

f̂(x′) = \sum_{j=1}^{i} α_i^j κ(x_j, x′)    (14)

where α_i^j is the jth component of the coefficient vector α_i, and the form of Eq. (14) is reminiscent of a radial-basis function (RBF) network. KAF
architectures create a growing RBF network which aims to [9]: (i) learn the network topology, and (ii) adapt the free parameters
directly from data at the same time. However, two main challenges emerge in the KAF context [9]: (i) growing architecture as,
at each step, new input points are getting involved to the estimation leading to constant increase in memory and computational
demands; (ii) proper selection of Mercer kernel type and its parameters is needed. In the first case, sparsification approaches are
applied to limit the expansion size of dictionary considering only the important/informative data. Such approaches are presented
in Platt’s novelty criterion [21] and approximate linear dependence (ALD) condition described in [25]. This way, sparsity allows
the solution to be kept in memory in a compact form and to be easily assessed later rather than storing information pertaining to
the entire history of training instances. Regarding the second challenge, various methods are used for kernel size selection such as
cross-validation [16], adaptive width Gaussian kernel [55] and a sequential optimization strategy with adaptive kernel size that
have been proposed in [17].
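As a small illustration of Eqs. (8) and (14), the sketch below evaluates a Gaussian kernel against a set of stored centers and forms the RBF-type prediction; the function names and the default bandwidth value are illustrative assumptions.

    import numpy as np

    def gaussian_kernel(x, centers, h=1.0):
        """kappa(x, c) = exp(-||x - c||^2 / (2 h^2)) for every center c (Eq. (8))."""
        d2 = np.sum((np.asarray(centers) - x) ** 2, axis=1)
        return np.exp(-d2 / (2.0 * h ** 2))

    def predict(x, centers, alpha, h=1.0):
        """Representer-theorem expansion f(x) = sum_j alpha_j kappa(x_j, x) (Eq. (14))."""
        return np.dot(alpha, gaussian_kernel(x, centers, h))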

3.1. Kernel least mean square

The implementation of Kernel Least Mean Square (KLMS) algorithm [52] starts from the fact that the output of LMS model
(Eq. (2)) can be expressed in terms of inner products. Thus the incorporation of kernel function is feasible, by the kernel trick,
resulting in the form of Eq. (14). More specifically, the output of Eq. (2), for a given input x′, can be expressed as an inner product as follows:

f^{LMS}(x′) = w_i^T φ(x′) = η \sum_{j=1}^{i} e_j [ φ(x_j)^T φ(x′) ]    (15)

Therefore, the output of the KLMS algorithm, using the kernel trick and Eq. (11), becomes:

f^{KLMS}(x′) = η \sum_{j=1}^{i} e_j κ(x_j, x′)    (16)

As can be seen from Eq. (16), the solution to the unknown nonlinear mapping is computed step-by-step, leading to a growing
RBF topological network. This way, the algorithm allocates a new kernel, at each time step, with center 𝒙′ and fitting parameter as
the measured error scaled by learning step 𝜂. Thus, a dictionary 𝐷𝑖 is formed that increases constantly as new samples arrive. The
KLMS algorithm is summarized in Algorithm 1.

Algorithm 1 Kernel Least Mean Square [52].

Input: {x_i, y_i} with i = 1 … N
Initialization: choose kernel type κ, kernel parameter ξ and learning step η; f_1 = 0, α_1^1 = 0, center list D_1 = {}
1: for i = 2 … N do  ⊳ Iterate over input–output pairs
2:   f_{i-1}(x_i) = \sum_{j=1}^{i-1} α_{i-1}^j κ(x_i, x_j)
3:   e_i = y_i − f_{i-1}(x_i)  ⊳ Prediction error calculation
4:   D_i = {D_{i-1}, x_i}  ⊳ Register new center
5:   α_i^i = η e_i  ⊳ Store new coefficient
6: end for
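A minimal NumPy sketch of Algorithm 1 is shown below; the Gaussian kernel of Eq. (8) is assumed, the step-size and bandwidth values are illustrative, and the first coefficient is initialized to η y_1 (a common convention that differs slightly from the zero initialization stated in Algorithm 1).

    import numpy as np

    def klms(X, y, eta=0.2, h=1.0):
        """Kernel LMS (Algorithm 1): every new input is registered as a center."""
        centers, alpha = [X[0]], [eta * y[0]]
        for i in range(1, len(X)):
            k = np.exp(-np.sum((np.asarray(centers) - X[i]) ** 2, axis=1) / (2 * h ** 2))
            e = y[i] - np.dot(alpha, k)    # prediction error, Eq. (16)
            centers.append(X[i])           # register new center
            alpha.append(eta * e)          # store new coefficient
        return np.asarray(centers), np.asarray(alpha)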

3.2. Kernel recursive least squares

Using the Mercer theorem, the input x_i can be transformed into the feature space H as φ(x_i) and the RLS algorithm can be derived in RKHS, leading to the Kernel Recursive Least Squares (KRLS) algorithm [25]. Thus, for the example sequence {φ(x_i), y_i}_{i=1}^{N}, Eq. (3) becomes similar to Eq. (13). Using y_i = [y_1, …, y_i]^T and Φ_i = [φ(x_1), …, φ(x_i)], the optimal weight vector is generally calculated as:

ω_i = Φ_i [λI + Φ_i^T Φ_i]^{-1} y_i    (17)

The weight ω can also be expressed explicitly as a linear combination of the input data, ω_i = Φ_i α_i, with α_i = [λI + Φ_i^T Φ_i]^{-1} y_i. Denoting K̃_i = λI + Φ_i^T Φ_i, K̃_i can be expressed in the recursive form:

K̃_i = [ K̃_{i-1}   k̃_i ;  k̃_i^T   λ + k_{ii} ]    (18)

where k̃_i = Φ_{i-1}^T φ(x_i) = [κ(x_i, x_1), …, κ(x_i, x_{i-1})]^T is the kernel vector and k_{ii} = φ(x_i)^T φ(x_i) = κ(x_i, x_i). Thus, the inverse of K̃_i is updated using:

K̃_i^{-1} = (1/δ_i) [ K̃_{i-1}^{-1} δ_i + ã_i ã_i^T   −ã_i ;  −ã_i^T   1 ]    (19)

where ã_i = K̃_{i-1}^{-1} k̃_i and δ_i = λ + κ(x_i, x_i) − k̃_i^T ã_i. Then, the approximated function can be expressed similarly to Eq. (14), where the coefficient vector is calculated as follows:

α_i = [ α_{i-1} − ã_i δ_i^{-1} e_i ;  δ_i^{-1} e_i ]    (20)

In this way, at every time step a new unit is registered in D_i with center x_i and α_i as the coefficient. More specifically, the current coefficient is calculated using δ_i^{-1} e_i (second row of Eq. (20)), while the previous coefficients are updated by −ã_i δ_i^{-1} e_i (first row of Eq. (20)). The KRLS algorithm is summarized in Algorithm 2.
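The recursions of Eqs. (19)–(20) can be sketched in NumPy as follows; the Gaussian kernel (for which κ(x, x) = 1) and the parameter values are illustrative assumptions.

    import numpy as np

    def krls(X, y, lam=1e-2, h=1.0):
        """Kernel RLS (Algorithm 2) without sparsification: the dictionary grows at every step."""
        kern = lambda A, b: np.exp(-np.sum((A - b) ** 2, axis=1) / (2 * h ** 2))
        D = [X[0]]
        Kinv = np.array([[1.0 / (lam + 1.0)]])       # (lam + kappa(x1, x1))^{-1}
        alpha = np.array([Kinv[0, 0] * y[0]])
        for i in range(1, len(X)):
            k = kern(np.asarray(D), X[i])            # kernel vector
            a = Kinv @ k                             # a_i = K_{i-1}^{-1} k_i
            delta = lam + 1.0 - k @ a                # delta_i with kappa(x_i, x_i) = 1
            e = y[i] - k @ alpha                     # prediction error
            Kinv = np.block([[Kinv * delta + np.outer(a, a), -a[:, None]],
                             [-a[None, :], np.ones((1, 1))]]) / delta     # Eq. (19)
            alpha = np.concatenate([alpha - a * e / delta, [e / delta]])  # Eq. (20)
            D.append(X[i])
        return np.asarray(D), alpha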

4. Sparsification with ALD

As described in Section 3, at each iteration, a new feature input is registered into the current state of dictionary 𝐷𝑖 . Adopting
Gaussian kernels, a new kernel unit is registered for every new input sample as a RBF center leading to a constantly growing
RBF network. In this way, the network size depends linearly on the number of training samples. Sparsification techniques have been
proposed in the literature to limit the expansion size of dictionary, alleviating the computational load and memory requirements.
In order to curb the growth of the networks, a wide set of sparsification techniques has been proposed in the literature including
novelty criterion [21], surprise criterion [22], coherence criterion [23], variance criterion [24] and others. Approximate Linear
Dependence (ALD) [25] criterion is such an efficient sparsification procedure which is usually preferred. ALD selects a set of basis
vectors (dictionary vectors) in the feature space resulting to a simplified network by discarding the redundant information.


Algorithm 2 Kernel Recursive Least Squares [25].

Input: {x_i, y_i} with i = 1 … N
Initialization: choose kernel type κ, kernel parameter ξ and regularization parameter λ; K̃_1^{-1} = 1/(λ + κ(x_1, x_1)), α_1 = K̃_1^{-1} y_1, center list D_1 = {}
1: for i = 2 … N do  ⊳ Iterate over input–output pairs
2:   k̃_i = [κ(x_i, x_1), …, κ(x_i, x_{i-1})]  ⊳ Kernel vector
3:   ã_i = K̃_{i-1}^{-1} k̃_i
4:   δ_i = λ + κ(x_i, x_i) − k̃_i^T ã_i
5:   Compute the inverse kernel matrix K̃_i^{-1} using Eq. (19)
6:   e_i = y_i − k̃_i^T α_{i-1}  ⊳ Prediction error calculation
7:   Compute α_i using Eq. (20)
8:   D_i = {D_{i-1}, x_i}  ⊳ Register new center
9: end for

More specifically, after a set of training samples {x_j, y_j}_{j=1}^{i-1} has been observed, ALD participates in the creation of a dictionary consisting of a subset of these samples, i.e. D_{i-1} = {x̃_j}_{j=1}^{m_{i-1}}, where the feature inputs {φ(x̃_j)}_{j=1}^{m_{i-1}} are linearly independent. Thus, every new feature input φ(x_i) is tested on whether it breaches (is added to the dictionary as an informative sample) or passes (is discarded as a redundant sample) the ALD condition. The ALD condition is expressed as [25]:

δ_i := \min_{α_i} || \sum_{j=1}^{m_{i-1}} α_j φ(x̃_j) − φ(x_i) ||^2 ≤ ν    (21)

where α_i = (α_1, …, α_{m_{i-1}})^T are the coefficients that satisfy the ALD condition, ν indicates the ALD threshold parameter or level of sparsity (a small positive constant), x̃_j is the jth support vector in the dictionary up to time i − 1 and m_{i-1} defines the number of breaches. δ_i represents the squared Euclidean distance between the new feature input φ(x_i) and the subspace spanned by the support feature vector bases that have already been selected up to time i − 1. If Eq. (21) holds, then φ(x_i) can be expressed, within a squared error ν, as a linear combination of the current dictionary centers. In such a case, this training sample is discarded and the expansion coefficients are not updated. δ_i can be expressed in terms of inner products (in H) and subsequently as:

δ_i = \min_{α} { α^T K̃_{i-1} α − 2 α^T k̃_{i-1}(x_i) + k_{ii} }    (22)

where k̃_{i-1}(x_i) is the kernel vector with (k̃_{i-1}(x_i))_l = κ(x̃_l, x_i), the kernel matrix is [K̃_{i-1}]_{l,j} = κ(x̃_l, x̃_j), and k_{ii} = κ(x_i, x_i), with l, j = 1, …, m_{i-1}. The optimal ã_i and the ALD threshold condition are given by solving Eq. (22) as:

ã_i = K̃_{i-1}^{-1} k̃_{i-1}(x_i),    δ_i = k_{ii} − k̃_{i-1}(x_i)^T ã_i ≤ ν    (23)

Therefore, if δ_i > ν then ALD breaches and the current input sample x_i is added to the dictionary, D_i = D_{i-1} ∪ {x_i}, and subsequently the number of breaches is increased by one, m_i = m_{i-1} + 1. Finally, the corresponding approximation in terms of kernel matrices is K_i = A_i K̃_i A_i^T, where [K_i]_{l,j} = κ(x_l, x_j), with l, j = 1, …, i, is the full kernel matrix.
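The ALD test of Eqs. (21)–(23) reduces to a few matrix–vector products; a hedged sketch follows, assuming the inverse dictionary kernel matrix is maintained elsewhere.

    import numpy as np

    def ald_test(Kinv, k_vec, kxx, nu):
        """ALD condition, Eqs. (21)-(23): returns (breach, a_tilde, delta)."""
        a = Kinv @ k_vec              # optimal expansion coefficients a_i
        delta = kxx - k_vec @ a       # squared distance to the span of the dictionary
        return delta > nu, a, delta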

4.1. The ALD-KRLS algorithm

The ALD criterion has been initially proposed in conjunction with KRLS in [25]. The concept of this sparsification procedure
defines two operational paths that are encapsulated in ALD-KRLS algorithm [25]:

• Breach Path (δ_i > ν): The input sample x_i is not approximately linearly dependent on D_{i-1}, thus it is added to the dictionary, D_i = D_{i-1} ∪ {x_i}, and m_i = m_{i-1} + 1. Here, K_i ≠ K_{i-1}. Then, the recursive formula for the inverse kernel matrix K̃_i^{-1} and the coefficient vector α_i are computed by Eqs. (19) and (20) respectively, as in Section 3.2, and P_i (covariance matrix) is expressed as:

P_i = [ P_{i-1}   0 ;  0^T   1 ]    (24)

• Pass Path (δ_i ≤ ν): The dictionary remains unchanged, thus D_i = D_{i-1}, K̃_i^{-1} = K̃_{i-1}^{-1} and m_i = m_{i-1}. Defining:

q_i = P_{i-1} ã_i / ( 1 + ã_i^T P_{i-1} ã_i )    (25)

while also using the recursive formula for P_i:

P_i = P_{i-1} − P_{i-1} ã_i ã_i^T P_{i-1} / ( 1 + ã_i^T P_{i-1} ã_i )    (26)

the coefficient vector α_i is updated as:

α_i = α_{i-1} + K̃_i^{-1} q_i ( y_i − k̃_{i-1}^T α_{i-1} )    (27)

Note that the kernel vector k̃_{i-1}(x_i) = k̃_{i-1} = κ(D_{i-1}, x_i) is calculated using the support vectors x̃_j stored in the dictionary at each time step, with the updated dictionary length m (Eq. (21)). Thus, for the Gaussian kernel the kernel vector is given by:

k̃_{i-1} = κ(D_{i-1}, x_i) = exp( −(x_i − x̃_{1:m_{i-1}})^T (x_i − x̃_{1:m_{i-1}}) / (2h^2) )    (28)

Also, the prediction is computed by an inner product as:

f̂(x_i) = k̃_{i-1}^T α_{i-1}    (29)

Therefore, the resulting ALD-KRLS algorithm is summarized in Algorithm 3.

Algorithm 3 ALD-Kernel Recursive Least Squares [25].

Input: {x_i, y_i} with i = 1 … N
Initialization: choose kernel type κ, kernel parameter ξ and regularization parameter λ; K̃_1^{-1} = 1/(λ + κ(x_1, x_1)), α_1 = K̃_1^{-1} y_1, center list D_1 = {}
1: for i = 2 … N do  ⊳ Iterate over input–output pairs
2:   Compute the kernel vector k̃_i using Eq. (28)
3:   ã_i = K̃_{i-1}^{-1} k̃_i
4:   δ_i = λ + κ(x_i, x_i) − k̃_i^T ã_i
5:   if δ_i > ν then  ⊳ ALD Breach Path
6:     D_i = {D_{i-1}, x_i}  ⊳ Register x_i to dictionary
7:     m_i = m_{i-1} + 1  ⊳ Update number of breaches
8:     Compute the inverse kernel matrix K̃_i^{-1} using Eq. (19)
9:     Compute P_i using Eq. (24)
10:    Compute α_i using Eq. (20)
11:  else  ⊳ ALD Pass Path (δ_i ≤ ν)
12:    D_i = D_{i-1}, m_i = m_{i-1}  ⊳ Dictionary unchanged
13:    Compute q_i using Eq. (25)
14:    Compute P_i using Eq. (26)
15:    Compute α_i using Eq. (27) with K̃_i^{-1} = K̃_{i-1}^{-1}
16:  end if
17: end for

5. Sliding-window and fixed-budget kernel algorithms

This class of kernel algorithms provides an alternative solution to sparsification procedures introducing a combination strategy
of growing and pruning. This way the growing number of support vectors (directly connected with the size of kernel matrix) is
curbed either by pruning the oldest input sample (sliding-window algorithm) or by pruning the least significant one (fixed-budget
algorithm) at each time step.

5.1. Sliding-window kernel recursive least squares

The Sliding Window KRLS (SW-KRLS) algorithm was proposed in [28], combining a sliding window of fixed length with L2 regularization. This technique presents low computational complexity and provides a solution against overfitting that tracks time variations. SW-KRLS also performs better than RLS, KRLS and ALD-KRLS when tracking abrupt changes and operating in non-stationary scenarios, although more recent extensions achieve better tracking ability in non-stationary environments [56].
The main idea behind the sliding window is that the regression depends only on the latest m observations. At every iteration the algorithm stores the new sample in the dictionary, while discarding the most outdated (oldest) training sample, keeping the dictionary at a fixed pre-defined size. In the general kernel-based least squares form, as in Eq. (17), the coefficient vector can be expressed as α = [λI + Φ^T Φ]^{-1} y; here α = (λI + K)^{-1} y, with K being the m × m kernel matrix. From now on, in this section K̃_i denotes the regularized version of the kernel matrix:

K̃_i = λI + Φ_i^T Φ_i = [ K̃_{i-1}   k̃_{i-1}(x_i) ;  k̃_{i-1}(x_i)^T   λ + k_{ii} ]    (30)

where k_{ii} = κ(x_i, x_i), λ is still the regularization factor and k̃_{i-1}(x_i) = k̃_{i-1} = [κ(x_{i-m+1}, x_i), …, κ(x_{i-1}, x_i)]^T = κ(D_{i-1}, x_i). The size of the kernel matrix remains unchanged using the aforementioned operations of growing and pruning. At time step i the new data sample is used to add one row and one column at the last row and column of the kernel matrix, leading to an upsized matrix K̆_i. The inverse matrix K̆_i^{-1} can be computed from the previous inverse kernel matrix K̃_{i-1}^{-1} (the inverse kernel matrix before the expansion-upsizing) by calculating:

K̆_i = [ K̃_{i-1}   b ;  b^T   d ]   ⇒   K̆_i^{-1} = [ K̃_{i-1}^{-1} + g e e^T   −g e ;  −g e^T   g ]    (31)

where b = k̃_{i-1}(x_i), d = k_{ii}, e = K̃_{i-1}^{-1} b and g = (d − b^T e)^{-1}. After the expansion is made, the kernel matrix is compressed by removing the first row and column (most outdated sample) of the upsized kernel matrix K̆_i. Thus, the upsized kernel matrix K̆_i is downsized back to the matrix K̃_i, and its inverse can be obtained from the knowledge of K̆_i^{-1} by the following calculations:

K̆_i^{-1} = [ e   f^T ;  f   G ]   ⇒   K̃_i^{-1} = G − f f^T / e    (32)

where K̃_i^{-1} can be implemented by fast matrix operations, as practically the first row and column are pruned, i.e., G ← K̆_i^{-1}(2:|K̆_i^{-1}|, 2:|K̆_i^{-1}|), f ← K̆_i^{-1}(2:|K̆_i^{-1}|, 1) and e ← K̆_i^{-1}(1, 1). The pruning criterion is applied when the size of the upsized kernel matrix |K̆_i| exceeds the memory size (dictionary size) m. Note that in this algorithm the concept of dictionary or memory also applies to y. If the memory size exceeds m, then pruning is applied to the oldest sample pair (x_1, y_1) in memory (the first row and column of K̆_i are removed). The SW-KRLS approach is summarized in Algorithm 4.

Algorithm 4 Sliding-Window Kernel Recursive Least Squares [28].

Input: {x_i, y_i} with i = 1 … N
Initialization: choose kernel type κ, kernel parameter ξ and regularization parameter λ; K̃_1^{-1} = 1/(λ + κ(x_1, x_1)), α_1 = K̃_1^{-1} y_1, memory = {(x_1, y_1)}
1: for i = 2 … N do  ⊳ Iterate over input–output pairs
2:   Add the pair (x_i, y_i) to memory and calculate K̆_i^{-1} using Eq. (31)  ⊳ Inverse of upsized kernel matrix
3:   if memory size > m then
4:     Prune the oldest pair of data from memory and calculate K̃_i^{-1} using Eq. (32)  ⊳ The upsized kernel matrix K̆_i is downsized back to K̃_i and its inverse is obtained
5:   end if
6:   The updated solution is calculated as α = K̃_i^{-1} y
7: end for
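The two matrix operations at the core of SW-KRLS (Eqs. (31) and (32)) can be sketched directly in NumPy; the helper names are assumptions and follow the block formulas above.

    import numpy as np

    def upsize_inverse(Kinv, b, d):
        """Grow the inverse of the regularized kernel matrix by one row/column (Eq. (31))."""
        e = Kinv @ b
        g = 1.0 / (d - b @ e)
        return np.block([[Kinv + g * np.outer(e, e), -g * e[:, None]],
                         [-g * e[None, :], np.array([[g]])]])

    def downsize_inverse(Kinv_up):
        """Remove the first (oldest) row/column and return the inverse of the remainder (Eq. (32))."""
        e = Kinv_up[0, 0]
        f = Kinv_up[1:, 0]
        G = Kinv_up[1:, 1:]
        return G - np.outer(f, f) / e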

5.2. Fixed-budget kernel recursive least squares

Fixed Budget KRLS (FB-KRLS) [29], in contrast with SW-KRLS, does not prune the most outdated sample. Instead, this algorithm omits the least significant center (data sample) at each time step. It also introduces a label updating procedure, enhancing the tracking ability of the overall approach. More specifically, in FB-KRLS a pruning criterion is applied to determine the index j* of the least significant stored sample among all stored samples j in memory (the memory is still of size m). The utilized criterion is calculated by:

cr(x_j, y_j) = |α_i^j| / [K̆_i^{-1}]_{j,j}    (33)

where the index j* is obtained as j* = arg min_{1 ≤ j ≤ m} cr. Then, Eq. (32) is utilized as in standard SW-KRLS. Regarding the label updating procedure adopted in FB-KRLS, this phase is applied right before the upsizing operation (as presented in Eq. (31)) to adjust the outputs stored in memory and therefore achieve an enhanced tracking capability. The updating procedure of all stored labels, at each step, is given by:

y_j ← y_j − η κ(x_j, x_i)(y_j − y_i),  ∀j    (34)

where η is a learning parameter, j indexes the stored samples and i refers to the step when a new input–output pair (x_i, y_i) is received. The FB-KRLS method is summarized in Algorithm 5.
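The pruning criterion of Eq. (33) and the label update of Eq. (34) admit a compact sketch; the Gaussian kernel and the argument names are illustrative assumptions.

    import numpy as np

    def least_significant_index(alpha, Kinv_up):
        """Eq. (33): index of the center with the smallest |alpha_j| / [K^{-1}]_{jj}."""
        return int(np.argmin(np.abs(alpha) / np.diag(Kinv_up)))

    def update_labels(y_mem, X_mem, x_new, y_new, eta, h=1.0):
        """Eq. (34): shift every stored label towards the newly received output."""
        k = np.exp(-np.sum((X_mem - x_new) ** 2, axis=1) / (2 * h ** 2))
        return y_mem - eta * k * (y_mem - y_new)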

6. Quantization in kernel adaptive filtering

As discussed in Section 4, sparsification techniques prune the growth of networks by discarding the redundant input data and
accepting only the most important/informative inputs as new centers. The idea behind quantization approaches is that, even if the input data are identified as redundant, they can still be utilized to update the coefficients of the network before being discarded. Such data are not important enough to update the structure of the network by adding new centers, but they can yield a compact network form by being incorporated into the coefficient update. More specifically, the quantization approach partitions the input space into smaller regions. If the quantized input has already been assigned to a center, the input is identified as redundant, i.e. discarded (a new

Algorithm 5 Fixed-Budget Kernel Recursive Least Squares [29].

Input: {x_i, y_i} with i = 1 … N
Initialization: choose kernel type κ, kernel parameter ξ, learning step η and regularization parameter λ; K̃_1^{-1} = 1/(λ + κ(x_1, x_1)), α_1 = K̃_1^{-1} y_1, memory = {(x_1, y_1)}
1: for i = 2 … N do  ⊳ Iterate over input–output pairs
2:   Update all stored labels y_j using Eq. (34)
3:   Add the pair (x_i, y_i) to memory and calculate K̆_i^{-1} using Eq. (31)  ⊳ Inverse of upsized kernel matrix
4:   if memory size > m then
5:     Determine the least significant sample j* using Eq. (33)
6:     Prune the j*-th pair of data from memory and calculate K̃_i^{-1} using Eq. (32)  ⊳ The upsized kernel matrix K̆_i is downsized back to K̃_i and its inverse is obtained
7:   end if
8:   The updated solution is calculated as α = K̃_i^{-1} y
9: end for

center is not registered), but the coefficient of the already registered center is updated. In this way, input data, irrespective of their information quality, are utilized to update the coefficients of the network, improving its performance. The main quantization versions in the kernel adaptive filtering context are the quantized KLMS (QKLMS) [26] and quantized KRLS (QKRLS) [27] algorithms.

6.1. Quantized kernel least mean square

The QKLMS algorithm is derived directly from KLMS by applying an online vector quantization (VQ) method [26]. It is recalled that a nonlinear mapping φ(·) transforms the input signal vector x_i into the RKHS by the Mercer theorem (Eq. (11)) and Eq. (12); the connection between the transformed input φ(x_i) (in the high dimensional feature space H) and the kernel is defined in Eq. (11) as the usual dot product. The weight update equation in KLMS can be expressed as:

ω_i = ω_{i-1} + η e_i φ(x_i)    (35)

where ω_i denotes the weight vector, η is the learning step and e_i is the prediction error written as:

e_i = y_i − ω_{i-1}^T φ(x_i)    (36)

By quantizing the feature vector φ(x_i), the weight update equation of the QKLMS algorithm is expressed as:

ω_0 = 0
e_i = y_i − ω_{i-1}^T φ(x_i)    (37)
ω_i = ω_{i-1} + η e_i Q̃[φ(x_i)]

where Q̃[·] is the quantization operator in the RKHS H. Due to the high dimensionality of the feature space, the quantization is applied in the original input space U instead of H. Therefore, denoting f_i = ω_i^T φ(·) as in Eq. (12), the QKLMS algorithm yields:

f_0 = 0
e_i = y_i − f_{i-1}(x_i)    (38)
f_i = f_{i-1} + η e_i κ(Q[x_i], ·)

where η is the step size, x_i is the input vector, f is the input–output mapping and Q[·] denotes a quantization operator in U. The notation of the two quantization operators (in the feature and input spaces) can be simplified respectively as φ_i^q = Q̃[φ(x_i)] and x_i^q = Q[x_i]. The network size is pruned by the quantization codebook size, where the codebook C_i is equal to the dictionary D_i and the partition depends on the Euclidean distance. Note that the vector quantizer Q[x_i] maps the input into one of the m code (support) vectors in D, thus D_{i-1} = {c_n}_{n=1}^{m_{i-1}}. The QKLMS algorithm is described in Algorithm 6.
More specifically, the dictionary D_1 and the quantization size ε_U > 0 are initialized. At each iteration the filter output and the error are calculated. Next, the distance between the current input sample x_i and the dictionary D_{i-1} is computed, producing two operational cases. In the first case, if dis(x_i, D_{i-1}) ≤ ε_U, the dictionary remains unchanged (D_i = D_{i-1}) and x_i is quantized to the closest center of the dictionary, i.e., x_i^q = D_{i-1}^{j*} with j* = arg min_{1 ≤ j ≤ size(D_{i-1})} ||x_i − D_{i-1}^j||. The coefficient of the closest center is updated as:

α_i^{j*} = α_{i-1}^{j*} + η e_i    (39)

Otherwise, in the second case where dis(x_i, D_{i-1}) > ε_U, the dictionary is updated and a new coefficient is assigned. Finally, the output of the QKLMS can be expressed, similarly to Eq. (14), as:

f̂(x_i) = \sum_{j=1}^{m} α_{i-1}^j κ(D_{i-1}^j, x_i) = α_{i-1}^T k̃_i    (40)

Algorithm 6 Quantized Kernel Least Mean Square [26].

Input: {x_i, y_i} with i = 1 … N
Initialization: choose kernel type κ, kernel parameter ξ, learning step η and quantization threshold ε_U > 0; center dictionary D_1 = {x_1} and coefficient vector α_1 = [η y_1]
1: for i = 2 … N do  ⊳ Iterate over input–output pairs
2:   f_i = \sum_{j=1}^{size(D_{i-1})} α_{i-1}^j κ(D_{i-1}^j, x_i)  ⊳ Filter output
3:   e_i = y_i − f_i  ⊳ Compute error
4:   dis(x_i, D_{i-1}) = min_{1 ≤ j ≤ size(D_{i-1})} ||x_i − D_{i-1}^j||
5:   j* = arg min_{1 ≤ j ≤ size(D_{i-1})} ||x_i − D_{i-1}^j||
6:   if dis(x_i, D_{i-1}) ≤ ε_U then
7:     D_i = D_{i-1}  ⊳ D_i unchanged
8:     Compute α_i^{j*} using Eq. (39)  ⊳ Quantize x_i to the closest center by updating the coefficient of that center
9:   else
10:    D_i = {D_{i-1}, x_i}  ⊳ Register new center
11:    α_i = [α_{i-1}, η e_i]  ⊳ Assign new coefficient
12:    m_i = m_{i-1} + 1  ⊳ Update size of dictionary
13:  end if
14: end for

where m is the size of the dictionary, i.e., m = size(D_{i-1}), the coefficient vector in RKHS is α_{i-1} = [α_{i-1}^1, α_{i-1}^2, …, α_{i-1}^m] and k̃_i = [κ(D_{i-1}^1, x_i), κ(D_{i-1}^2, x_i), …, κ(D_{i-1}^m, x_i)], or simply k̃_i = κ(D_{i-1}, x_i), as D_{i-1} contains the support vectors (a subset of the input samples) x̃_j up to time i − 1, is composed of m quantization regions, and D_{i-1}^j is the jth member of the dictionary D_{i-1}.
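A minimal NumPy sketch of Algorithm 6 follows; the Gaussian kernel and the step-size, bandwidth and quantization-threshold values are illustrative assumptions.

    import numpy as np

    def qklms(X, y, eta=0.2, h=1.0, eps=0.5):
        """Quantized KLMS (Algorithm 6): redundant inputs only update the nearest coefficient."""
        D, alpha = [X[0]], [eta * y[0]]
        for i in range(1, len(X)):
            C = np.asarray(D)
            k = np.exp(-np.sum((C - X[i]) ** 2, axis=1) / (2 * h ** 2))
            e = y[i] - np.dot(alpha, k)                  # filter error
            dists = np.linalg.norm(C - X[i], axis=1)
            j = int(np.argmin(dists))
            if dists[j] <= eps:
                alpha[j] += eta * e                      # Eq. (39): quantize to the closest center
            else:
                D.append(X[i]); alpha.append(eta * e)    # register a new center
        return np.asarray(D), np.asarray(alpha)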

6.2. Quantized kernel recursive least squares

The Quantized Kernel Recursive Least Squares (QKRLS) algorithm was proposed in [27], deriving a quantized version of KRLS by replacing the original input with the quantized one. As in the QKLMS case, the dictionary D_i at time step i is composed of m code (support) vectors, i.e., D_i = {c_n ∈ U}_{n=1}^{m_{i-1}}. The cost function can be expressed as:

\sum_{n=1}^{m} ( \sum_{j=1}^{L_n} [ y_{nj} − ω^T φ(c_n) ]^2 ) + λ ||ω||^2    (41)

where λ is still the regularization parameter, L_n denotes the number of input data that lie in the same quantization region as its center c_n and y_{nj} is the jth desired output within the nth quantization region. The optimal solution, as a quantized version of Eq. (17), is given by:

ω_i = [ Φ̃_i Λ_i Φ̃_i^T + λI ]^{-1} Φ̃_i ỹ_i    (42)

where Φ̃_i = [φ(c_1), …, φ(c_m)] denotes the centers in the RKHS, Λ_i = diag[L_1, …, L_m] is a diagonal matrix and ỹ_i is the vector of quantized desired outputs, ỹ_i = [\sum_{j=1}^{L_1} y_{1j}, …, \sum_{j=1}^{L_m} y_{mj}]^T. Then, denoting P_i = [Λ_i K̃_i + λI]^{-1} with K̃_i = Φ̃_i^T Φ̃_i, the optimal solution becomes ω_i = Φ̃_i α_i. Similar to the QKLMS algorithm, two operational cases emerge regarding the distance between x_i and the dictionary D_{i-1}:

• If the distance meets the condition dis(x_i, D_{i-1}) ≤ ε_U, the dictionary remains unchanged (D_i = D_{i-1} and K̃_i = K̃_{i-1}) and the input x_i is quantized to the j*th element (closest center) of the dictionary, i.e., D_{i-1}^{j*} = Q[x_i], where j* = arg min_{1 ≤ j ≤ size(D_{i-1})} ||x_i − D_{i-1}^j||; therefore, in this case the distance is dis(x_i, D_{i-1}) = ||x_i − D_{i-1}^{j*}||. The updated forms of Λ_i and ỹ_i are given by:

Λ_i = Λ_{i-1} + θ_{j*} θ_{j*}^T,    ỹ_i = ỹ_{i-1} − y_i θ_{j*}    (43)

where θ_{j*} is a column vector of length equal to the dictionary size (|D_{i-1}|) with the j*th element being 1 and all other entries being 0. The matrix P_i can be expressed as P_i = [ P_{i-1}^{-1} + θ_{j*} θ_{j*}^T K̃_{i-1} ]^{-1}, which can be rewritten, by the matrix inversion lemma, as:

P_i = P_{i-1} − P_{i-1}^{j*} (K̃_{i-1}^{j*})^T P_{i-1} / ( 1 + (K̃_{i-1}^{j*})^T P_{i-1}^{j*} )    (44)

where P_{i-1}^{j*} and K̃_{i-1}^{j*} denote the j*th columns of P_{i-1} and K̃_{i-1} respectively. Then, the coefficient vector can be expressed as:

α_i = P_i ỹ_i = α_{i-1} + P_{i-1}^{j*} ( y_i − (K̃_{i-1}^{j*})^T α_{i-1} ) / ( 1 + (K̃_{i-1}^{j*})^T P_{i-1}^{j*} )    (45)

• If the distance meets the condition dis(x_i, D_{i-1}) > ε_U, the dictionary is updated, D_i = {D_{i-1}, x_i}. In this case, Λ_i and ỹ_i are given by:

Λ_i = [ Λ_{i-1}   0 ;  0^T   1 ],    ỹ_i = [ ỹ_{i-1} ;  y_i ]    (46)

The matrix P_i can be expressed as:

P_i = r_i^{-1} [ P_{i-1} r_i + z_i^Λ z_i^T   −z_i^Λ ;  −z_i^T   1 ]    (47)

where r_i = λ + κ(x_i, x_i) − h_i^T z_i^Λ, z_i^Λ = P_{i-1} Λ_{i-1} h_i, h_i = [κ(D_{i-1}^1, x_i), …, κ(D_{i-1}^m, x_i)]^T with m being the size of the dictionary at time i − 1, and z_i = P_{i-1}^T h_i. Thus, the coefficient vector is obtained by:

α_i = P_i ỹ_i = [ α_{i-1} − z_i^Λ r_i^{-1} e_i ;  r_i^{-1} e_i ]    (48)

where the prediction error is e_i = y_i − h_i^T α_{i-1}.

The QKRLS algorithm is summarized in Algorithm 7.

Algorithm 7 Quantized Kernel Recursive Least Squares [27].

Input: {x_i, y_i} with i = 1 … N
Initialization: choose kernel type κ, kernel parameter ξ, learning step η and quantization threshold ε_U > 0; center dictionary D_1 = {x_1}, Λ_1 = 1, P_1 = [λ + κ(x_1, x_1)]^{-1} and coefficient vector α_1 = P_1 y_1
1: for i = 2 … N do  ⊳ Iterate over input–output pairs
2:   Compute the distance between x_i and D_{i-1}: dis(x_i, D_{i-1}) = min_{1 ≤ j ≤ size(D_{i-1})} ||x_i − D_{i-1}^j||
3:   j* = arg min_{1 ≤ j ≤ size(D_{i-1})} ||x_i − D_{i-1}^j||
4:   if dis(x_i, D_{i-1}) ≤ ε_U then
5:     D_i = D_{i-1}, m_i = m_{i-1}  ⊳ D_i unchanged
6:     Update Λ_i using Eq. (43)
7:     Update P_i using Eq. (44)
8:     Compute α_i^{j*} using Eq. (45)  ⊳ Quantize x_i to the closest center by updating the coefficient of that center
9:   else
10:    D_i = {D_{i-1}, x_i}  ⊳ Register new center
11:    Update Λ_i using Eq. (46)
12:    Update P_i using Eq. (47)
13:    Update α_i using Eq. (48)  ⊳ Quantize x_i to itself
14:    m_i = m_{i-1} + 1  ⊳ Update size of dictionary
15:  end if
16: end for

7. Combined approaches

The combination of already established mechanisms in the KAF domain has been motivated by the increasing computational complexity and memory requirements. The growth of the data size increases the computational complexity of calculating the inverse kernel matrix. This challenge has led to the development of combined algorithmic extensions that embody sparsification approaches together with the quantization method. In this section, two of these implementations are described.

7.1. QALD-KRLS

This algorithm, QALD-KRLS [30], combines the ALD sparsification criterion with the quantization technique, reducing the kernel structure size. The combination is implemented by the reset that occurs in line 12 of Algorithm 8. In [30], the authors highlight that this reset has no effect on the operation of the previous mapping procedure. The parameter θ_{j*} is a column vector of length equal to the dictionary size (|D_{i-1}|) with the j*th element being 1 and all other entries being 0 (as presented in the QKRLS case). In the rest of the algorithmic procedure the principal matrices and vectors are calculated as in the ALD-KRLS case.

7.2. ANS-QKRLS

The adaptive normalized sparse QKRLS algorithm (ANS-QKRLS) [31] is a combination of four main components in order to
bring computational efficiency up and enhance the tracking capability of KRLS in time-varying environments. These combined
components are: (a) ALD sparsification criterion; (b) coherence sparsification criterion; (c) quantization, and; (d) dynamic adjustment


Algorithm 8 QALD-KRLS [30].

Input: {x_i, y_i} with i = 1 … N
Initialization: choose kernel type κ, kernel parameter ξ and regularization parameter λ; K̃_1^{-1} = 1/(λ + κ(x_1, x_1)), α_1 = K̃_1^{-1} y_1, center list D_1 = {}
1: for i = 2 … N do  ⊳ Iterate over input–output pairs
2:   Compute the kernel vector k̃_i using Eq. (28)
3:   ã_i = K̃_{i-1}^{-1} k̃_i
4:   δ_i = λ + κ(x_i, x_i) − k̃_i^T ã_i
5:   Compute the distance between x_i and D_{i-1}: dis(x_i, D_{i-1}) = min_{1 ≤ j ≤ size(D_{i-1})} ||x_i − D_{i-1}^j||
6:   if δ_i > ν and dis(x_i, D_{i-1}) > ε_U then
7:     D_i = {D_{i-1}, x_i}  ⊳ Register x_i to dictionary
8:     m_i = m_{i-1} + 1  ⊳ Update number of breaches
9:     Compute K̃_i^{-1}, P_i and α_i as in the ALD-KRLS case, using Eqs. (19), (24) and (20)
10:  else
11:    if dis(x_i, D_{i-1}) ≤ ε_U then
12:      reset: ã_i = θ_{j*}
13:    end if
14:    D_i = D_{i-1}, m_i = m_{i-1}  ⊳ Dictionary unchanged
15:    Compute q_i, P_i and α_i as in ALD-KRLS, using Eqs. (25), (26) and (27) with K̃_i^{-1} = K̃_{i-1}^{-1}
16:  end if
17: end for

of weights. More specifically, the first two components are combined to detect the contribution of the input to the current dictionary at each time step. This way, informative data are registered when the contribution is higher than specific thresholds, reducing the redundant information, while at the same time the dimension of the kernel matrix is effectively reduced. The quantization component serves as a mechanism that makes full use of the information, exploiting the redundant information to update the algorithm's parameters (see Section 6). The last component, dynamic adjustment of the weights, enhances the capability of KRLS to track time-varying characteristics, making the algorithm insensitive to outliers or mutations in environments with noise or time-varying characteristics. The ALD criterion and the quantization follow the procedures described in Sections 4 and 6 respectively. The coherence criterion [23] introduces the coherence coefficient μ:

μ = max_{j=1,2,…,i−1} |κ(x̃_j, x_i)| ≤ μ_0    (49)

where x̃_j is the jth support vector in the existing dictionary up to time i − 1 and μ_0 belongs to the interval [0, 1]. If the coefficient μ is no more than μ_0, then the current input x_i is added to the dictionary. The dynamic adjustment of the weights is a normalization extension of Eq. (27), leading to the following form:

α_i = α_{i-1} + K̃_i^{-1} q_i ( y_i − k̃_{i-1}^T α_{i-1} ) / ( τ + ||k̃_{i-1}||^2 )    (50)

where τ is a small parameter so that the denominator does not become zero.
The pseudo-code of ANS-QKRLS is presented in Algorithm 9. Practically, the algorithm is divided into three operational cases.
In the first case, if the ALD criterion breaches (𝛿𝑖 > 𝜈), the coherence coefficient 𝜇𝑖 of the current input is no more than 𝜇0 and
𝑑𝑖𝑠(𝒙𝑖 , 𝐷𝑖−1 ) > 𝜀U , then the input is added to the dictionary expanding its size. In the second case, if the ALD criterion holds or the
current coherence coefficient μ_i exceeds μ_0, but dis(x_i, D_{i-1}) ≤ ε_U, then the dictionary remains unchanged and x_i is quantized to
the closest center. Finally, if both of the above cases are not satisfied, then the current example can be expressed by the combinations
of existing mutually independent components in the dictionary. Therefore, this input sample is discarded but the coefficient weights
are adjusted accordingly.

8. Remaining useful life problem description

Over the last decades, there has been an intensified research interest in the field of remaining useful life (RUL) prediction [38].
This term describes the progression of faults in Prognostics and Health Management (PHM) applications. In the fault detection
concept, when a defect has occurred the objective is to identify it accurately from a list of potential failures. On the other hand, in
the prognostics area, the objective is to predict the available time (life-cycle) before a failure occurs aiming to perform informed
maintenance actions that minimize downtime and prevent critical failures.


Algorithm 9 ANS-QKRLS [31].

Input: {x_i, y_i} with i = 1 … N
Initialization: choose kernel type κ, kernel parameter ξ and regularization parameter λ; K̃_1^{-1} = 1/(λ + κ(x_1, x_1)), α_1 = K̃_1^{-1} y_1, P_1 = 1, center list D_1 = {}
1: for i = 2 … N do  ⊳ Iterate over input–output pairs
2:   Compute the kernel vector k̃_i using Eq. (28)
3:   ã_i = K̃_{i-1}^{-1} k̃_i
4:   δ_i = λ + κ(x_i, x_i) − k̃_i^T ã_i
5:   Compute the distance between x_i and D_{i-1}: dis(x_i, D_{i-1}) = min_{1 ≤ j ≤ size(D_{i-1})} ||x_i − D_{i-1}^j||
6:   Compute the coherence coefficient μ using Eq. (49)
7:   if δ_i > ν, μ_i ≤ μ_0 and dis(x_i, D_{i-1}) > ε_U then
8:     D_i = {D_{i-1}, x_i}  ⊳ Register x_i to dictionary
9:     m_i = m_{i-1} + 1  ⊳ Update number of breaches
10:    Compute the kernel matrix K̃_i^{-1} as in Eq. (19), A_i = [ A_{i-1}  0 ;  0^T  1 ], P_i = [ P_{i-1}  0 ;  0^T  1 ]
11:    Compute α_i as in Eq. (20)  ⊳ Coefficient vector is calculated as in the standard KRLS
12:  else if δ_i ≤ ν or μ_i > μ_0 then
13:    if dis(x_i, D_{i-1}) ≤ ε_U then
14:      D_i = D_{i-1},  q_i = [K_{i-1}^{j*}]^{-1} ã_i^{j*} / ( 1 + (ã_i^{j*})^T [K_{i-1}^{j*}]^{-1} ã_i^{j*} )  ⊳ Dictionary unchanged
15:      [K_i]^{-1} = [K_{i-1}]^{-1} − q_i (ã_i^{j*})^T [K_{i-1}]^{-1}
16:      α_i = α_{i-1} + q_i ( y_i − (ã_i^{j*})^T α_{i-1} )  ⊳ Quantize x_i to the closest center
17:    else
18:      D_i = D_{i-1}, m_i = m_{i-1}  ⊳ Dictionary unchanged
19:      K̃_i = K̃_{i-1}, A_i = [A_{i-1}^T, ã_i]^T
20:      Compute q_i as in Eq. (25)
21:      Compute α_i as in Eq. (50)  ⊳ Dynamic adjustment of coefficients using the normalized version of the ALD Pass Path, i.e., the normalized form of Eq. (27)
22:    end if
23:  end if
24: end for

The most widely studied dataset in the RUL domain is called C-MAPSS [46]. This dataset has been created by NASA simulating
the degradation of aircraft engines (turbofan engines) utilizing the simulation tool Commercial Modular Aero-Propulsion System
Simulation. A set of multivariate signals is provided including operational settings, temperature, pressure and mainly sensory data
generated under open-loop system configurations, as depicted in Table 1. In this work, 17 of those measurements will be utilized,
as indicated in green color. The C-MAPSS involves four sub-sets (FD001, FD002, FD003, FD004) that concern different operating
conditions and fault modes. Each sub-dataset includes 21 sensory data and 3 operating settings among a training set and a testing
set. These measurements practically start at a degradation level that is considered as healthy and stop when failure is reached. In
this work, evaluation tests will be performed using FD001 which includes 100 training and 100 testing units with 20631 and 13096
total instances across all engines respectively. Instances are distributed unevenly across the 100 engines, meaning that the behavior
of each engine is represented by a different number of instances. This discrepancy is due to the varied number of operational cycles
each engine undergoes from its healthy state to failure, reflecting the distinct operational life spans and conditions specific to each
engine, thereby showcasing their unique degradation patterns. The objective is to predict the RUL of each turbofan engine at the
end of its testing record. For the RUL target label, a piece-wise linear degradation function is adopted, limiting the RUL value to a maximum threshold. The maximum constant value of RUL is chosen as 125 time cycles and, after a certain point, the engine starts to degrade, as illustrated in Fig. 1.
The utilization of a piece-wise linear model for predicting RUL reflects the degradation characteristics typical of turbofan engines. The initial stage of operation corresponds to a healthy period for the system, while degradation increases towards the end-of-life phase. This
justifies setting a piece-wise linear degradation model, aligning with findings from other works in the literature [57–60], emphasizing
the model’s fidelity in mirroring the engines’ life stages. Thus, a constant value of RUL is used at 125 cycles for healthy stage based
on observed data. This approach mitigates the risk of RUL overestimation, crucial for ensuring reliable maintenance schedules
and operational safety. Moreover, the linear degradation model represents the most natural choice in scenarios where specific prior
knowledge of an appropriate degradation curve is lacking. By assuming a piece-wise linear degradation, the model offers a pragmatic
balance between simplicity and the ability to reflect real-world engine behavior, particularly when detailed degradation patterns
are unknown or hard to predict accurately.
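For illustration, the piece-wise linear target of Fig. 1 can be generated as follows; the helper name and the example failure time are hypothetical.

    import numpy as np

    def piecewise_rul(cycles, failure_cycle, cap=125):
        """RUL target: capped at `cap` during the healthy stage, then decreasing linearly to zero."""
        return np.minimum(failure_cycle - cycles, cap)

    # e.g. an engine whose record ends in failure after 200 cycles
    cycles = np.arange(1, 201)
    rul_target = piecewise_rul(cycles, failure_cycle=200)   # 125, ..., 125, 124, ..., 1, 0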


Table 1
C-MAPSS dataset overview.
No. Symbol Description
1 op_setting_1 Operational setting 1 (environment variable)
2 op_setting_2 Operational setting 2 (environment variable)
3 op_setting_3 Operational setting 3 (environment variable)
4 T2 Total temperature at fan inlet
5 T24 Total temperature at LPC outlet
6 T30 Total temperature at HPC outlet
7 T50 Total temperature at LPT outlet
8 P2 Pressure at fan inlet
9 P15 Total pressure in bypass-duct
10 P30 Total pressure at HPC outlet
11 Nf Physical fan speed
12 Nc Physical core speed
13 Epr Engine pressure ratio (P50/P2)
14 Ps30 Static pressure at HPC outlet
15 Phi Ratio of fuel flow to Ps30
16 NRf Corrected fan speed
17 NRc Corrected core speed
18 BPR Bypass Ratio
19 farB Burner fuel-air ratio
20 htBleed Bleed Enthalpy
21 Nf_dmd Demanded fan speed
22 PCNfR_dmd Demanded corrected fan speed
23 W31 HPT coolant bleed
24 W32 LPT coolant bleed

Fig. 1. Piece-wise linear RUL function.

Fig. 2. Three sub-scenarios of handling training data.

8.1. Data pre-processing

In the training stage, we consider two distinct cases where features are scaled using conventional normalization and standardization (z-score normalization) operations. For data normalization the input data are scaled to [−1, 1], whereas data standardization follows the classic zero-mean, unit-variance transformation. In both scaling cases, we distinguish three sub-scenarios of handling data based on their sequence length during the training phase. Recall that 100 engine cases constitute the training data, but each turbofan engine


Fig. 3. Visual representation of Score, RMSE and State (shaded areas) evaluation metrics.

includes a different number of records. Thus, the three sub-scenarios are: (a) natural handling of the data, with the engines unsorted; (b) engine instances sorted by their sequence length in descending order; (c) engine instances sorted by their sequence length in ascending order. The latter two scenarios differ only in the order in which the algorithms receive the input data (see Fig. 2). The objective is to investigate whether the performance of the KAF-based algorithms is affected when more importance is given during training to engines with either a larger or a smaller sequence length. It is important to evaluate the resilience of the KAF-based models under these cases, which may lead them to local minima, and also to study their behavior during the process of storing support vectors into their dictionaries. The idea of evaluating the performance of KAF-based algorithms under differently sorted engine data emerged from Convolutional Neural Network (CNN) implementations, which may sort the training data by engine sequence length in this application. Sorting the data by sequence length is a way for CNNs to reduce the amount of padding by choosing a mini-batch size that divides the training dataset evenly. Therefore, we also study the performance of the KAF algorithms under a technique recognized to enhance the training procedure of CNNs, even though, given the nature of the training mechanism in KAF-based algorithms, it intuitively appears to be an extreme case that will degrade prognostic performance.
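As an illustration of these pre-processing steps, the sketch below (Python with NumPy; names are illustrative, and in practice the scaling statistics would be computed on the training set and reused for the test set) implements the two scaling options and the ordering of engines by sequence length.

import numpy as np

def normalize(X, lo=-1.0, hi=1.0):
    # Min-max scaling of each feature (column) to the interval [lo, hi]
    xmin, xmax = X.min(axis=0), X.max(axis=0)
    return lo + (hi - lo) * (X - xmin) / (xmax - xmin + 1e-12)

def standardize(X):
    # Classic z-score scaling: zero mean and unit variance per feature
    return (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)

def order_engines(engines, mode="natural"):
    # engines: list of per-engine arrays of shape (n_cycles_i, n_features)
    if mode == "natural":
        return list(engines)                          # sub-scenario (a): unsorted
    if mode == "descending":
        return sorted(engines, key=len, reverse=True)  # sub-scenario (b)
    if mode == "ascending":
        return sorted(engines, key=len)                # sub-scenario (c)
    raise ValueError("unknown ordering mode")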

8.2. Evaluation metrics

The most widely used evaluation metrics for this problem are the root mean square error (RMSE) and an asymmetric scoring function (Score) [46]:

\[
RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n} d_i^{2}} \qquad (51)
\]
\[
Score = \begin{cases}
\sum_{i=1}^{n} \left(\exp\!\left(\dfrac{-d_i}{13}\right) - 1\right), & \text{for } d_i < 0 \\[6pt]
\sum_{i=1}^{n} \left(\exp\!\left(\dfrac{+d_i}{10}\right) - 1\right), & \text{for } d_i \geq 0
\end{cases} \qquad (52)
\]

where $d_i$ stands for the prediction error, with $d_i = RUL_{predicted} - RUL_{true}$, and $n$ is the total number of instances. A positive prediction error means a late prediction, while a negative error corresponds to an early prediction. The RMSE induces equal penalization of early and late predictions for the same absolute value of $d_i$. On the contrary, the asymmetric scoring function penalizes late predictions more, introducing a fairer evaluation metric (see Fig. 3).
Indeed, from the maintenance point of view, and in line with the risk-averse attitude of the aerospace industry, late forecasts do not allow maintenance to take place (hence the larger penalty), while very early forecasts may not be associated with major damage although they can waste maintenance resources (hence the smaller penalization compared with late ones). Although the scoring function is the most widely used evaluation metric in this application, some drawbacks have been identified [57]. First and foremost, a few very late predictions may dominate the value of this metric. Also, it lacks a prognostic horizon that would enable algorithms to be evaluated at a specific confidence level. Lastly, the scoring function favors algorithms that artificially improve the performance (lowering the Score) by underestimating the remaining useful life. For this reason, we utilize a supplementary metric that assesses RUL predictions as {early}, {on time} or {late}, as originally presented in [61] and later in [62,63]. This metric is used in conjunction with RMSE and Score to provide an oversight and explainable view of the behavior of the algorithms under examination. Thus, this metric serves as a measurement of the state of a prediction, and it is given by:

\[
State = \begin{cases}
On\ time, & \text{for } -13 < d_i < 10 \\
Late, & \text{for } d_i \geq 10 \\
Early, & \text{for } d_i \leq -13
\end{cases} \qquad (53)
\]
It should be noted that one could use stricter bounds or, alternatively, create more levels to discretize State. However, in this work we do not rely solely on this metric to evaluate the performance of the KAF-based implementations. We use State as a supplementary


metric to explain, to some extent, the performance difference between RMSE and Score, if any, and to interpret the behavior of the KAF-based algorithms. It is an important indicator of whether an algorithm underestimates RUL (an increased number of early predictions) or whether late predictions dramatically increase the Score metric. Note also that even a few heavily late predictions may dominate the Score, leading to a degraded performance. Fig. 3 illustrates the levels of State as shaded areas for a given interval of prediction error, i.e. [−50, 50].
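For reproducibility, the three evaluation metrics of Eqs. (51)-(53) can be computed with a few lines of code; the sketch below (Python/NumPy, with illustrative names) takes the vector of prediction errors d_i = RUL_predicted − RUL_true as input.

import numpy as np

def rmse(d):
    # Root mean square error, Eq. (51)
    d = np.asarray(d, dtype=float)
    return float(np.sqrt(np.mean(d ** 2)))

def score(d):
    # Asymmetric scoring function, Eq. (52): late predictions (d >= 0) are penalized more
    d = np.asarray(d, dtype=float)
    return float(np.sum(np.where(d < 0, np.exp(-d / 13.0), np.exp(d / 10.0)) - 1.0))

def state_counts(d):
    # State metric, Eq. (53): early / on time / late classification of each prediction
    d = np.asarray(d, dtype=float)
    return {"early": int(np.sum(d <= -13)),
            "on time": int(np.sum((d > -13) & (d < 10))),
            "late": int(np.sum(d >= 10))}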
However, there are more evaluation metrics that have been reported in the literature. Prediction interval coverage probability
(PICP) [64] measures whether the observed target RUL lies within the prediction interval with a probability (1 − 𝛼) including lower
and upper bounds of prediction interval for every test sample. NMPIW [64] is the normalized version of mean prediction interval
width (MPIW) which gives a percentage of a range of expected RUL. PICP and NMPIW are conflicting, and a higher PICP will result
in a wider NMPIW. The continuous ranked probability score (CRPS) and its weighted extension (CRPS$^{W}$) [65] are metrics
that evaluate the accuracy and sharpness of the estimated RUL distributions. The 𝛼-coverage and reliability score (RS) [65] evaluate
the reliability of the RUL prognostics by quantifying overestimation and underestimation.

9. Evaluating KAF approaches in remaining useful life: intra-comparison analysis

In this work, we distinguish two evaluation cases based on feature scaling operations of normalization and standardization,
as mentioned above. Three sub-scenarios stem under each case based on the method of treating the order of turbofan engines
towards constructing the training data. More specifically, in both scaling cases we handle data either as they are given in the
dataset (unsorted-natural order of engines) or we sort training examples by their sequence length in a descending and ascending
order as described in Section 8.1. In all scenarios, 17 features are selected (colored features in Table 1). Regarding the selected KAF algorithms for comparison purposes, the most dominant sparsification candidate is chosen (ALD-KRLS), together with both strategies of growing and pruning (SW-KRLS and FB-KRLS), both quantization approaches (QKRLS and QKLMS) and, lastly, the two combined approaches that encapsulate sparsification and quantization (QALD-KRLS and ANS-QKRLS). For each algorithm, we vary the parameters that control the dictionary size. This choice is informed by the direct impact these parameters have on the size of the KAF networks and their overall performance. Recall that the term dictionary size typically refers to the number of basis functions (kernels) that are used in the model. For KAFs, each entry in the dictionary corresponds to the center of an RBF unit.
The dictionary size therefore determines the complexity of the model: a larger dictionary can capture more complex functions
but may also be more prone to overfitting and will require more computational resources to update and evaluate. Our evaluation
explores these parameters, offering a comprehensive understanding of each algorithm’s operational details and performance under
various configurations. The performance of each KAF-based algorithm is evaluated using three metrics at different network sizes,
with training time also considered as an additional indicative factor. This multi-metric assessment provides a holistic view of the
algorithms’ capabilities, necessary for understanding their behavior in practical RUL prediction scenarios. Our extensive evaluation,
conducted on a conventional laptop with an AMD Ryzen 9 4900HS, 16 GB RAM, and GeForce RTX 2060 Max-Q, aims not only to
benchmark the current capabilities of KAF architectures in RUL prediction but also to guide future developments in the domain of
predictive maintenance, particularly in scenarios where computational resources are a limiting factor.
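To give a flavor of how compact such models are, the following sketch outlines a QKLMS-style training loop (Python/NumPy; the Gaussian kernel parameterization, default parameter values and variable names are illustrative, while the actual experiments follow the formulations given in the earlier sections): each incoming sample either refines the coefficient of its nearest dictionary center, when it lies within the quantization radius ε_U, or is added as a new center, growing the network size.

import numpy as np

def gaussian_kernel(x, C, h):
    # Gaussian kernel between sample x and every center stored in C (one per row)
    return np.exp(-np.sum((C - x) ** 2, axis=1) / (2.0 * h ** 2))

def qklms(X, y, h=1.0, eta=0.01, eps_u=0.6):
    C = X[:1].copy()                  # dictionary initialized with the first sample
    alpha = np.array([eta * y[0]])    # one coefficient (weight) per dictionary center
    for x, t in zip(X[1:], y[1:]):
        e = t - alpha @ gaussian_kernel(x, C, h)   # prediction error for the new sample
        dists = np.linalg.norm(C - x, axis=1)
        j = int(np.argmin(dists))                  # nearest existing center
        if dists[j] <= eps_u:
            alpha[j] += eta * e       # quantization: merge the update into center j
        else:
            C = np.vstack([C, x])     # otherwise the dictionary (network size) grows
            alpha = np.append(alpha, eta * e)
    return C, alpha                   # final dictionary and associated weights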

9.1. Normalization scaling case

Table A.5 provides a comparison study for the case of normalization scaling and natural handling (unsorted data) of the training engine data. The best-performing network configurations with Score below 650 are indicated in bold. The budget-based approaches (SW-KRLS with 𝑚 = 1000 and FB-KRLS with 𝑚 = 450), as well as the ANS-QKRLS method with 𝑚 = 1140, produced the best results. Among these three, FB-KRLS produced fewer {late} and {on time} predictions, leading to an increased underestimation of RUL (37 {early} predictions). Note that QALD-KRLS balances between two sub-operations (ALD side and Quantization side) based on its parameter configuration. Fig. 4 illustrates the main Score results of Table A.5 against the network size for all algorithms. As presented, the algorithms ANS-QKRLS, QKLMS, ALD-KRLS and QALD-KRLS (ALD side) produce fewer oscillations as the dictionary (network size) increases. Moreover, the quantization approaches verify the property of reduced numerical complexity, as they present better results for a smaller network size than the baseline ALD-KRLS algorithm. QKLMS exhibits the fastest training time, as well as the ability to curb dictionary growth for the same testing performance.
Table A.6 summarizes the performance analysis of all the aforementioned algorithms for network sizes similar to those in Table A.5. The specificity here is that the training set has been composed by sorting the engines, based on their sequence length, in a descending (results in parentheses) and an ascending manner (results in brackets). Keep in mind that, for the same amount of data samples considered during the training phase, more sub-sets of engine behaviors have been included in the ascending sorted case (see Fig. 2). A general observation is that when descending sorting is adopted the algorithms tend to provide more {early} predictions, whereas when training data are sorted in ascending order more {late} predictions emerge. More specifically, the ALD-KRLS method slightly enhances its performance when trained on descending-sorted data compared with the unsorted case, and it is strongly affected by the ascending sorting case, producing its worst results. QKRLS seems to be negatively influenced by both sorting strategies compared with the unsorted-case performance depicted in Table A.5. QKLMS shows a slight enhancement in the descending case, while the opposite is observed for the ascending case. Both budget-based approaches (SW-KRLS, FB-KRLS) produce worse results in the data sorting cases. Although the ANS-QKRLS algorithm presented its worst results when using descending-sorted training data, in the ascending case it showed a degree of resilience and a slight enhancement (with a bigger network size) compared with the standard natural


Fig. 4. Score vs Network size for the main results provided in Table A.5 (normalization scaling with unsorted data handling).

Fig. 5. Network size evolution curves for the normalization scaling case. (a) ALD-KRLS with 𝑚 = 326, while QALD-KRLS (ALD side) produces similar network
growth behavior; (b) QKLMS and QKRLS with 𝑚 = 660, while QALD-KRLS (Quantization side) follows similar behavior; (c) ANS-QKRLS with m = 770.

(unsorted) data handling case. Finally, QALD-KRLS (ALD side) produced a slightly better performance when using descending-sorted training data, whereas the Quantization side of QALD-KRLS was negatively affected in both data sorting cases.
Regarding the behavior of the KAF algorithms from the network size evolution perspective, Fig. 5 illustrates the network growth curves for indicative configurations under the different preprocessing scenarios (Unsorted, Sorted-Descending, Sorted-Ascending) within the normalization scaling case. As presented in Fig. 5, irrespective of the KAF-based algorithm, in both sorting preprocessing operations fewer support vectors are added to the dictionary than in the unsorted-natural case up to 10000 training samples; then a ''speed-up'' is observed, ending up in similar final network sizes. Also, around 6000 training samples an abrupt increase in network size is observed in the descending sorting scenario. It should be kept in mind that at each iteration the number of samples processed by the algorithms is the same in all scenarios, but in the ascending sorting case more engines have been included as the number of iterations increases (see Fig. 2). Note that the SW-KRLS and FB-KRLS algorithms have not been taken into account in Fig. 5, as their mechanisms rely on a pre-defined fixed dictionary size in which the oldest input sample and the least significant one are pruned, respectively.
The two best performances are selected in order to inspect their outcomes in more detail. The first candidate for this comparison duel is SW-KRLS, utilizing natural handling of training data with a dictionary size of 1000, which produced Score = 545.82. The second approach is ANS-QKRLS, using training data sorted in ascending order with a network size of 780 and Score = 537.79. Fig. 6 depicts, in an unfolded way, the results of both approaches, providing insights about their performance from

Fig. 6. Concentration of individual prediction errors for the two best reported KAF algorithms in the normalization case. ANS-QKRLS (sorted training data in
ascending order with 780 dictionary size and 𝑆𝑐𝑜𝑟𝑒 = 537.79) vs SW-KRLS (natural handling of training data with dictionary size of 1000 and 𝑆𝑐𝑜𝑟𝑒 = 545.82).

the Score evaluation perspective along the 100 test engines. Although ANS-QKRLS presents a worse RMSE (18.36) than SW-KRLS (RMSE = 17.83), it practically produces a better concentration of individual predictions, with four fewer {late} predictions. On the contrary, in the SW-KRLS case there are 2 {late} predictions that contribute more than 150 to the total Score (the two upper-right points contribute 152.85 to the total of 545.82, i.e. these two {late} predictions account for around 28% of the Score).

9.2. Standardization scaling case

The same rationale as before is followed, aiming to report the performance of the KAF-based algorithms under different parameter settings for each preprocessing scenario (Unsorted, Sorted-Descending, Sorted-Ascending). The main reason for adopting this scaling case, apart from an exploratory point of view, is to verify whether the KAF algorithms exhibit similar behavior with respect to the training data sorting scenarios as in the normalization scaling case. Table A.7 presents the KAF algorithms' performance for the conventional scenario of handling training data (Unsorted-natural). Again, the best Score below the level of 650 is given in bold. The best-performing method is ANS-QKRLS utilizing 𝑚 = 160, providing Score = 468.58. This model also exhibits fewer oscillations in performance, sustaining dominant results across all network size configurations.
Compared with Table A.5, the ALD-KRLS algorithm provides similar results, QKRLS provides a slightly enhanced performance, QKLMS similar results, SW-KRLS and FB-KRLS produce worse results, ANS-QKRLS yields an increased performance and QALD-KRLS acts similarly. Table A.8 summarizes the performance of the KAF-based approaches under the two sorting strategies. More specifically, ALD-KRLS provided an increased performance using descending sorting compared with the unsorted scenario, while in the ascending scenario no significant change is observed. QKRLS presented resilience for some network sizes in both sorting scenarios, while an enhanced performance is also produced for small network sizes (Score = 601.22 and Score = 633.32 for 𝑚 = 22 and 𝑚 = 23, respectively). In the ascending scenario, QKLMS is strongly affected compared to the unsorted one, while in the descending scenario an enhanced performance is observed for a small network size, achieving Score = 556.1 for 𝑚 = 22. SW-KRLS and FB-KRLS are both influenced negatively by the sorting scenarios, but mainly by the ascending one. ANS-QKRLS provides a degraded model in the descending scenario, while in the ascending case it showed a low level of resilience for small network sizes but no better performance than the baseline (Unsorted-natural). Finally, QALD-KRLS follows similar behavior to ALD-KRLS and QKLMS for the corresponding sides (ALD side and Quantization side), respectively.
The two best models here, which formulate the comparison duel, are ANS-QKRLS utilizing unsorted training data with dictionary size 𝑚 = 160, producing Score = 468.58, and QKLMS using data sorted in descending order with dictionary size 𝑚 = 22, leading to Score = 556.1. Fig. 7 illustrates, in an analytical way, the contribution of each test engine prediction to the Score evaluation metric for the best reported models. The QKLMS algorithm slightly promotes {early} predictions with respect to ANS-QKRLS, while the latter presents a RUL error concentration in a narrower band around zero. Also, QKLMS is dominated by one severe {early} prediction (upper-left point in Fig. 7), which corresponds to engine #45 for this algorithm.

9.3. Performance comparison within KAF domain

The predicted RUL for each engine in the testing set, utilizing the best four KAF outcomes, is illustrated in Fig. 8(a). An indicative comparative example between the predicted and the actual RUL values is depicted in Fig. 8(b). For this engine, the SW-KRLS method exhibits the most fluctuating values during life-cycle evolution compared with the other approaches. On the contrary, the ANS-QKRLS (standardization) case presents the best tracking performance, providing the smoothest curve. In all approaches, the prediction error during early-stage operation is larger than in the late life phase of the engine. This is due to the fact that, as the engine operates closer to failure, more degradation information becomes available, resulting in high late-stage prediction accuracy. Summarizing, Fig. 8(b) shows in a higher level of detail the RUL tracking performance produced for testing engine #24, while Fig. 8(a) depicts, from an oversight perspective, the overall performance for all testing engines by unfolding the cumulative results of the evaluation metrics.


Fig. 7. Concentration of individual prediction errors for the two best reported KAF algorithms in the standardization case. ANS-QKRLS (natural handling of
training data with 160 dictionary size and 𝑆𝑐𝑜𝑟𝑒 = 468.58) vs QKLMS (sorted training data in descending order with dictionary size of 22 and 𝑆𝑐𝑜𝑟𝑒 = 556.1).

Fig. 8. Performance visualization for the best reported KAF approaches. (a) Predicted RUL for each testing engine against actual cycles; (b) Comparison between
predicted and actual RUL for testing engine #24; Curve markers: ANS-QKRLS (normalization), SW-KRLS (normalization), ANS-QKRLS (standardization),
QKLMS (standardization).


9.4. Hyperparameter tuning in KAF

Within the framework of KAF algorithms, a typical challenge relates to the proper selection of hyperparameters. One critical parameter is the bandwidth of the Gaussian kernels, which significantly affects the smoothness of the model's output and its ability to accurately represent complex, non-linear trends in the data. Beyond this, each KAF variant operates with intrinsic mechanisms that further influence the model's dictionary size, essentially the number of kernels or corresponding weights, given their RBF-like nature. This dictionary size, or network size, is directly impacted by the adjustment of specific hyperparameters, notably 𝜈 for algorithms leaning towards sparsity and 𝜀U for those inclined towards a quantization approach. Indicatively, Fig. 9 demonstrates the primary hyperparameter's effect on each algorithm under normalization scaling and natural handling conditions. Through this visual analysis, convergence towards lower Score values is observed as the dictionary size increases, underlining the influence of the primary hyperparameters on the performance and complexity of KAF models.
In contexts where KAF algorithms are treated with a batch learning approach, techniques such as grid search, random search, and Bayesian optimization with cross-validation serve as standard methods for hyperparameter tuning. Beyond these, Gaussian process (GP) regression offers a more efficient and principled method, where optimal hyperparameters are identified by maximizing the log marginal likelihood. In particular, type-II maximum likelihood (marginal likelihood or evidence maximization) is a powerful generic way of adjusting hyperparameters via nonlinear optimization that scales linearly in the number of parameters. Adaptive methods, including adaptive kernel size [17], multikernel adaptive filtering [18], and Gaussian KAFs with adaptive kernel bandwidth [19], present tailored solutions in online learning scenarios. These techniques allow for real-time adjustments of the kernel size or bandwidth, ensuring the KAF model remains optimally tuned to the evolving data stream, thus enhancing both predictive performance and computational efficiency.
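As an example of the batch, type-II maximum-likelihood route, the sketch below (assuming scikit-learn is available; the subsample size and names are illustrative) fits a GP with an RBF-plus-noise kernel on a subsample of the training data and returns the length-scale that maximizes the log marginal likelihood. The result can serve as an initial estimate of the Gaussian bandwidth of a KAF before any validation-based refinement.

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def estimate_bandwidth(X, y, subsample=2000, seed=0):
    # Subsample the training data to keep the cubic-cost GP fit tractable
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=min(subsample, len(X)), replace=False)
    # RBF + noise kernel; fitting maximizes the log marginal likelihood (type-II ML)
    gpr = GaussianProcessRegressor(kernel=RBF(length_scale=1.0) + WhiteKernel(noise_level=1.0),
                                   normalize_y=True)
    gpr.fit(X[idx], y[idx])
    rbf_part = gpr.kernel_.k1   # optimized RBF component of the sum kernel
    return float(rbf_part.length_scale), float(gpr.log_marginal_likelihood_value_)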

9.5. Impact of preprocessing operations on KAF algorithms performance

This section examines the influence of preprocessing operations on the performance of KAF algorithms in the context of RUL
prediction. We explore the effects of two feature scaling methods, normalization and standardization, alongside the three distinct
data handling strategies: unsorted (natural handling), descending, and ascending order sorting of training data. Performance
metrics, such as the Score distribution and the distribution of {early}, {on-time}, and {late} predictions, are evaluated to establish a comprehensive understanding of the effect of these preprocessing operations on the efficacy of KAF algorithms. More specifically, Fig. 10
illustrates the spread and central tendency of final RUL prediction scores for each KAF algorithm, categorized by natural, descending
order, and ascending order data handling methods, highlighting the influence of preprocessing strategies on algorithmic performance
consistency. The stacked bar charts in Fig. 11 present the proportional distribution of {early}, {on-time}, and {late} RUL predictions
for various KAF algorithms under different data handling strategies.

9.5.1. Feature scaling and data sorting strategy impact


Normalization and standardization preprocessing operations have been utilized to reshape the data landscape, impacting the
performance trajectory of KAF algorithms. From Fig. 10, it is evident that normalization contributes to a broader spread in
final scores for algorithms like SW-KRLS and FB-KRLS. In contrast, ANS-QKRLS and ALD-KRLS exhibit a more condensed score
distribution, which may indicate a more consistent performance across different dictionary sizes. The Score distribution unveils a
more complex narrative: although a similar behavior between normalization and standardization can be observed in Fig. 11, the
impact of {late} predictions on the final score is more pronounced under normalization (Fig. 10). This is manifested in the wider
interquartile ranges and the presence of more outliers, indicative of certain late predictions being markedly later in the normalized
scenario, thereby imposing a steeper penalty on the final score. Standardization generally compresses the score distribution, as
observed for all algorithms, particularly in the context of natural handling. This finding suggests that standardization might be providing a more discriminative feature space, which is especially beneficial for these algorithms even when operating with smaller dictionary sizes. The stacked bar charts (Fig. 11) do not exhibit a stark contrast in the proportion of {early}, {on-time}, and {late}
predictions between normalization and standardization. However, it can be observed that descending sorting tends to provide more
{early} predictions, whereas when training data are sorted in ascending order more {late} predictions emerge.

9.5.2. Algorithm-specific observations, insights and implications


The conducted analysis reveals algorithm-specific behaviors under different preprocessing regimes. ANS-QKRLS demonstrates
an enhanced robustness under normalization across varied network sizes, suggesting an adaptive-related behavior in capturing
degradation signals with minimal oscillation. SW-KRLS and FB-KRLS, while exhibiting a wider variability in scores, could potentially
benefit from tailored preprocessing operations to leverage their algorithmic strengths. ALD-KRLS and QALD-KRLS (ALD side)
maintain a tighter score distribution, which might indicate a resilience to preprocessing variations and an inherent stability across
different operational conditions.
The insights produced from this analysis are multifaceted. The choice of feature scaling method has a profound impact on the KAF
algorithms’ ability to generalize across different operational behaviors and to accurately detect degradation patterns. Normalization,
while beneficial for stability, may lead to greater Score variability due to the magnified impact of late predictions. Standardization,
conversely, reduces score dispersion, potentially offering a more consistent performance benchmark but requiring careful calibration
to optimize early degradation detection capabilities.


Fig. 9. Score vs. Dictionary size under Natural handling — Normalization. (a) ALD-KRLS; (b) QKRLS; (c) QKLMS; (d) ANS-QKRLS; (e) QALD-KRLS (ALD side); (f) QALD-KRLS (Quantization side); (g) SW-KRLS; (h) FB-KRLS.

Descending order sorting indeed starts with training data sequences that are longer, meaning a larger number of cycles before
reaching the failure point. This approach effectively exposes algorithms to more extensive historical data right from the start, which
can include stages closer to failure towards the end of each sequence but not necessarily in the initial stages of training. As training
progresses with descending order sorting, algorithms are then exposed to shorter sequences. These shorter sequences, appearing
later in the training process, represent engines with fewer cycles to failure from the outset of their data. This transition from longer
to shorter sequences essentially means that, towards the later stages of training, the algorithms are operating with data that are
closer to the failure events in terms of operational cycles remaining, which explains the increased number of {early} predictions.
This approach contrasts with ascending order sorting, where algorithms start with shorter sequences, possibly making it harder


Fig. 10. Boxplot distribution of final Scores across KAF algorithms. (a) Normalization; (b) Standardization.

Fig. 11. Distribution of prediction timings for KAF algorithms. (a) Normalization; (b) Standardization.


Fig. 12. Training time vs. Score comparison for KAF algorithms. (a) Natural handling — Normalization; (b) Natural handling — Standardization; (c) Descending — Normalization; (d) Descending — Standardization; (e) Ascending — Normalization; (f) Ascending — Standardization.

initially to learn from extended historical data, since they begin with instances closer to failure, potentially making the early detection of degradation signs more challenging (hence more {late} predictions overall in this case).
In sum, preprocessing operations – feature scaling and data sorting – are not merely data manipulation techniques but pivotal factors that shape the very foundation of algorithmic performance in RUL prediction. They influence the algorithms' prediction timing, score variability and, in general, their behavior. The insights from this study serve both as lessons learnt regarding the importance of an informed selection of preprocessing techniques to harness the full potential of KAF algorithms in RUL prediction, and as an analysis that assesses the performance of the KAF algorithms under the increased difficulty introduced by the data handling methods.

9.6. Training time vs. score analysis

This analysis aims to unravel the efficiency and efficacy trade-offs inherent in these algorithms, from the perspective of
training time (Fig. 12), under various preprocessing schemes (natural, descending, and ascending orders coupled with normalization
and standardization techniques). More specifically, ALD-KRLS showcases a balanced profile between accuracy and computational


demand across preprocessing configurations. This algorithm maintains robust performance, particularly in scenarios where data normalization is applied, signifying its adeptness at handling scaled data with minimal compromise on training speed.
QKRLS demonstrates a noteworthy proficiency in speed, but this comes at the cost of a slight dip in predictive accuracy. This
trade-off is more pronounced under descending order preprocessing, suggesting that while QKRLS accelerates learning, it might
miss finer details in rapidly changing data sequences. QKLMS emerges as the most time-efficient among the assessed algorithms,
but with varying results. Its performance peaks under standardization procedures, indicating a preference for data with consistent
variance, enabling faster convergence without substantial loss in accuracy. SW-KRLS and FB-KRLS both exhibit a steady increase
in performance with increased training time, underscoring their suitability for applications where longer training periods are
permissible for achieving higher accuracy. Specifically, SW-KRLS appears to leverage the sequential nature of data more effectively
in descending order sorting, aligning with its inherent design. ANS-QKRLS presents an interesting dynamic where it excels in
environments with ascending order preprocessing, suggesting an intrinsic capability to adapt to evolving data trends, which could
be pivotal for real-time applications requiring timely updates. QALD-KRLS (ALD side) and QALD-KRLS (Quantization side) both
illustrate distinct advantages in terms of precision and training duration. The ALD aspect enhances adaptability, making it well-suited
for datasets exhibiting gradual degradation patterns, whereas the Quantization facet emphasizes speed, making it ideally suited to scenarios with stringent time constraints.
The insights from this analysis underscore the balance between training time and prediction accuracy across different KAF
algorithms and preprocessing methods. The observed trends advocate for a tailored approach in selecting the appropriate KAF
algorithm based on the specific requirements of the task at hand, such as urgency (reflected in training time) versus precision
(mirrored in the Score). Furthermore, this examination aids in establishing a more informed framework for future research and
practical applications, especially in areas demanding quick and reliable predictions, like Remaining Useful Life (RUL) estimation.

9.7. Advantages and limitations of KAF algorithms in RUL prediction

An extensive evaluation study has been conducted regarding the predictive properties and learning capabilities of KAF-based algorithms in a well-known RUL application. Seven candidates have been selected for intra-comparison purposes within the KAF universe in terms of performance, behavior, training time, smoothness and reliability. Also, the impact of different feature scaling scenarios and diverse preprocessing methods has been assessed, revealing ANS-QKRLS as the most resilient approach.
Generally, online learning algorithms operate by adapting their weights, in real time, per training sample. This leads to requirements regarding computational and memory resources, especially in large time-series applications. While KRLS is a straightforward extension of the RLS method, it faces scalability issues due to its computational and memory burden, which scale linearly with the dataset size. To address these challenges, the ALD-KRLS variant enhances KRLS by introducing a sparsification mechanism that limits the growth of the dictionary. Several research efforts led to sparsification, quantization, dictionary-budget and combined approaches, as presented in previous sections, overcoming scaling limitations.
For instance, quantized algorithms streamline parameter count and training duration at a slight accuracy cost. Budget-based
methods behave similarly, and have the advantage that they may be suited for tracking changes in the model underlying the
observed data. Notably, SW-KRLS has demonstrated strong performance in this context, though its reliance on a fixed dictionary
size increases its sensitivity to data preprocessing techniques. In general, ALD-KRLS outperforms the quantized variants by optimizing a squared cost function directly on the input data, leading to more judicious center selection. However, quantized versions still deliver
adequate outcomes with reduced training time. The hybrid models, QALD-KRLS and ANS-QKRLS, amalgamate ALD-KRLS’s efficient
center selection with the reduced parameterization of quantized methods, offering an improved balance between computational
efficiency and performance. A summary table (Table 2) and the following discussion outline the advantages, limitations, and
situational suitability of each KAF algorithm:

• ALD-KRLS is recognized for its stability and adaptability across diverse operational conditions, effectively capturing degra-
dation patterns in RUL prediction tasks. Despite its robustness, ALD-KRLS may not always achieve the highest performance
levels compared to other algorithms and requires more computational resources, which could be restrictive in time-sensitive or
resource-limited settings. This algorithm is ideally suited for scenarios where consistent and reliable performance is prioritized
over computational efficiency, particularly in complex environments where accurate degradation tracking is crucial.
• QKRLS offers good approximation abilities with reduced computational complexity through its quantization feature, which
controls dictionary size growth. However, quantization can lead to a loss of information, impacting the precision and reliability
of predictions. While QKRLS provides some computational advantages, its operational complexity is higher than that of QKLMS,
as its computational demands scale quadratically with the dictionary size. This characteristic makes it less ideal for extremely
resource-limited applications. QKRLS is suitable for applications that can handle its computational requirements.
• QKLMS excels in computational efficiency and simplicity of implementation, characterized by rapid update rules and no
need for matrix inversion, making it highly scalable and suitable for non-stationary environments and incremental learning.
However, it tends to exhibit lower prediction accuracy and is more sensitive to outliers compared to more complex KAF models.
Furthermore, its quantization process can compromise data integrity, especially in noisy settings. QKLMS is most effective in
large-scale systems requiring swift processing and real-time updates, where its operational speed and efficiency are crucial
advantages.


Table 2
Summary of advantages, limitations, and contextual fit for each KAF algorithm.
Algorithm | Advantages | Limitations | Contextual fit
ALD-KRLS | Robustness, adaptability | Computational demand | Complex environments needing reliability
QKRLS | Good approximation, low computational complexity | Higher computational demand than QKLMS, potential information loss | Suited for systems that can manage higher computational needs
QKLMS | Computational efficiency, scalability | May produce lower accuracy, outlier sensitivity | Effective in large-scale systems requiring fast processing
SW-KRLS | Adapts quickly to changes, prioritizes recent data | May obscure long-term trends, less historical precision | Best for environments needing immediate responsiveness with limited memory
FB-KRLS | Stable computational demands, optimized vector selection | Limited flexibility in adapting to new data trends | Ideal for stable environments where predictability is crucial
ANS-QKRLS | Robustness, balance between accuracy and computational efficiency | Requires careful tuning | Varied operational conditions, complex degradation patterns
QALD-KRLS | Robustness, computational efficiency | Tuning complexity, potential information loss | Dynamic scenarios, efficiency-required scenarios

• SW-KRLS effectively adapts to rapid changes by using a sliding-window mechanism that prioritizes recent data, making it
ideal for environments with time-varying data models. This focus on new information, however, can reduce consistency and
obscure long-term trends, potentially compromising overall accuracy. SW-KRLS is best suited for applications with limited
memory where immediate responsiveness is more critical than historical precision.
• FB-KRLS maintains effective dictionary management with a fixed size, ensuring stable computational demands and optimizing
vector selection over time, unlike the potentially still unbounded dictionary growth in ALD-KRLS. Its fixed-budget nature,
however, may limit flexibility in adapting to new and diverse data trends. FB-KRLS is best suited for stable environments
where computational predictability and maintaining performance with known overheads are crucial, without the need to
handle highly dynamic data shifts.
• ANS-QKRLS balances computational efficiency with prediction accuracy and is equipped with sparsification and quantization
techniques suitable for time-varying environments. It maintains stable performance across various network sizes and adapts
effectively to new data, making it practical for RUL prediction tasks. The algorithm keeps a manageable dictionary size,
optimizing efficiency without significantly impacting accuracy. However, ANS-QKRLS requires careful tuning of its adaptive
techniques to maintain performance. It is well-suited for environments with changing operational conditions and complex
degradation patterns, where stability and adaptability are necessary.
• QALD-KRLS combines quantization and ALD to balance efficiency with accuracy, suitable for dynamic environments with
variable data. However, tuning complexity and potential information loss from quantization may impact precision. This model
is ideal for large-scale systems where adaptability must align with limited computational resources.

10. Comparisons with other approaches and discussion: inter-comparison analysis

An inter-comparison analysis is needed to rank KAF performance against other machine learning (ML) and deep learning (DL) approaches reported in the literature. For this reason, Table 3 summarizes KAF-based and other, mostly network-based approaches, providing a comprehensive performance report in terms of the RMSE and Score evaluation metrics. It is evident from Table 3 that KAFs outperform more than half of the neural network models in terms of Score. The average Score of the 12 KAF-based approaches presented in Table 3 is around 579. In particular, the ANS-QKRLS algorithm ranks 21st among the 58 neural network approaches in terms of the Score metric. It should be underlined that in this work a plain window size of just 1 is used (no time window), while most approaches in the literature reported in Table 3 use a sliding window of length equal to 30. Moreover, Fig. 13 illustrates a visualization mapping of the different methods applied to the RUL problem across the Score metric. Note that the training time is not included in Table 3 due to the different simulation equipment adopted in each reported case. Moreover, most approaches reported in the literature do not provide information regarding training time. Indicatively, the best-performing KAF algorithm (ANS-QKRLS with network size 160) required 34.53 s for training on a conventional laptop with an AMD Ryzen 9 4900HS and 16 GB RAM. At the same time, AGCNN [66], one of the most dominant models in terms of Score, required 475.47 s for training on the same C-MAPSS sub-dataset (FD001), utilizing a much better desktop machine with an Intel Xeon W-2155 CPU and 64 GB RAM. Indeed, a performance difference between these two models exists, but so does a difference in complexity, which results in the training-time gap. Other approaches adopt more exhaustive training mechanisms that apply a genetic algorithm to tune the diverse set of hyper-parameters, resulting in 60 h of training time [67]. KAF architectures follow a simple RBF network topology, while also being suitable for online applications.
Compared with deep learning, kernel-based methods do not utilize a large number of weight parameters. Their trainable parameters are mainly related to the dictionary length 𝑚, i.e., the network size reported in Table 3. Note that, in the context of KAFs, the dictionary size means the number of kernel centers or basis functions that are actively used in the model. In RBF-like networks, each of these centers corresponds to the center of an RBF kernel. At the same time, the coefficients or weights are


Table 3
Comparison of best performed KAF-based algorithms with other approaches reported in the literature regarding FD001 dataset.
KAF-based approaches
Algorithm Scaling mode Data handling Network size TW RMSE Score
SW-KRLS Normalization Unsorted-Natural 1000 1 17.83 545.82
FB-KRLS Normalization Unsorted-Natural 450 1 19.14 622.99
ANS-QKRLS Normalization Unsorted-Natural 1140 1 18.62 627.55
ALD-KRLS Normalization Sorted-Descending 341 1 17.18 571.81
ANS-QKRLS Normalization Sorted-Ascending 780 1 18.36 537.79
QALD-KRLS Normalization Sorted-Descending 341 1 17.18 571.72
ANS-QKRLS Standardization Unsorted-Natural 160 1 17.58 468.58
QKRLS Standardization Sorted-Descending 22 1 19.02 601.22
QKLMS Standardization Sorted-Descending 22 1 19.28 556.1
QKRLS Standardization Sorted-Ascending 23 1 18.95 633.32
ANS-QKRLS Standardization Sorted-Ascending 172 1 18.22 626.45
QALD-KRLS Standardization Sorted-Descending 22 1 18.23 586.5
Machine learning and Deep learning approaches reported in literature
Algorithm TW RMSE Score
SVM [68] 30 40.72 7703.33
Echo State Network with Kalman Filter [69] – 63.46 –
DW-RNN [70] 20 22.52 –
ESN-FCN [71] 1 21.67 3555
SVR [72] 1 21.74 2394
LR [72] 1 23.45 2200
MTL-RNN [70] 20 21.47 –
ETR [68] 30 23.76 1667.86
RVR [57] 1 23.79 1502.9
SVR [57] 1 20.96 1381.5
CNN [72] – 19.7 1372
ETR [68] 1 22.05 1359.38
First CNN attempt [57] 15 18.45 1286.7
AE-FCN [71] 1 19.28 1014
DBN [68] 1 18.48 1001.44
LSTM [72] – 18.98 983
MLP [68] 1 18.48 959.63
LASSO [68] 1 22.43 894.21
SVM [68] 1 20.58 852.07
CNN+RNN [73] 31 16.89 820.67
Random Forest [68] 1 20.23 802.23
Multi-Stage-RUL GB [72] – 17.92 772
Multi-Stage-RUL SVM [72] – 17.12 765
SKF [68] 1 19.24 762.85
Multi-Stage-RUL LSTM [72] – 17.26 748
Extreme Learning Machine [68] 1 19.40 740.52
Multi-Stage-RUL CNN [72] – 16.89 732
KNR [68] 30 20.46 729.32
CNN-FCN [71] 1 16.35 706
LASSO [68] 30 19.74 653.85
MODBNE [68] 1 17.96 640.27
KNR [68] 1 19.73 604.26
GB [68] 1 18.80 575.04
MLP [68] 30 16.78 560.59
Extreme Learning Machine [68] 30 17.27 523
LSTMBS [74] 31 14.89 481.1
Random Forest [68] 30 17.91 479.75
GB [68] 30 15.67 474.01
DBN [68] 30 15.21 417.59
2-layer LSTM [75] 31 16.74 388.68
Deep LSTM [76] – 16.14 338
Trend attention Fully Convolutional Network [77] 31 13.99 336
MODBNE [68] 30 15.04 334.23
Attention-Based LSTM [78] 30 14.53 322.44
BiLSTM [79] 30 13.65 295
CNN-LSTM [80] 32 14.40 290
GHDR-FL [81] 30 11.58 281.65
CapsNet [82] 30 12.58 276.34
DCNN [58] 30 12.61 273.7
GHDR+LSTM+FC [81] 30 11.45 268.72
LSTM-MLSA [83] – 11.57 252.86
Embedded Attention-based Parallel Network-ResLSTMa [84] 40 12.11 245.32
HDNN [60] 30 13.02 245
LSTM-RBM [67] – 12.56 231
Distributed Attention-Based TCN [59] 40 11.78 229.48
AGCNN [66] 30 12.42 225.51
HAGCN [85] 30 11.93 222.3
BiGRU-TSAM [86] 30 12.56 213.35

Fig. 13. Visualization of Table 3 in terms of Score in logarithmic scale (KAF family vs. ML and DL approaches).

Table 4
Total number of trainable parameters for a sub-set of DL approaches reported in Table 3.
Neural Network approaches reported in literature Trainable weights
First CNN attempt [57] ≈15,000
CNN-FCN [71] ≈566,000
LSTMBS [74] ≈28,000
2-layer LSTM [75] ≈20,000
Deep LSTM [76] ≈55,000
BiLSTM [79] ≈75,000
CNN-LSTM [80] ≈60,000
GHDR-FL [81] ≈1,205,000
DCNN [58] ≈45,000
GHDR+LSTM+FC [81] ≈1,208,000
LSTM-MLSA [83] ≈ 42,000
Embedded Attention-based Parallel Network-ResLSTMa [84] ≈ 90,000
HDNN [60] ≈51,000
LSTM-RBM [67] ≈67,000
Distributed Attention-Based TCN [59] ≈105,000
AGCNN [66] ≈17,000
BiGRU-TSAM [86] ≈2,825,000

multiplied by the kernel functions' outputs before being summed to form the final output of the KAF. There is typically one weight per kernel function, so while ''dictionary size'' technically refers to the number of kernels, it also dictates the number of weights, since each kernel has one associated weight. In deep neural network implementations, the trainable parameters typically range from thousands to even millions of weights for prediction purposes. For this reason, Table 4 indicatively lists the number of trainable parameters for a set of neural network-based approaches applied to the FD001 set of C-MAPSS. Also, a qualitative comparison in terms of the number of trainable weights is depicted in Fig. 14, in logarithmic scale, serving as a visualized form of Table 4. Note that, if Fig. 14 were in linear scale with points proportional to the number of trainable weights, a few of the orange rectangles would be larger than the page itself. Indeed, the number of trainable weights in KAF approaches is significantly lower. In online learning problems, KAF-related algorithms can therefore attain a higher training throughput due to the lower number of trainable weights.
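For clarity, and consistent with the description above, the KAF output takes the form of a kernel expansion over the $m$ stored centers, with exactly one weight per center (here $\boldsymbol{c}_j$ denotes the $j$th dictionary center, $\alpha_j$ its weight and $\kappa(\cdot,\cdot)$ the kernel function, following the notation of the earlier sections):
\[
f(\boldsymbol{x}) = \sum_{j=1}^{m} \alpha_j \, \kappa(\boldsymbol{x}, \boldsymbol{c}_j),
\]
so the number of trainable weights coincides with the dictionary size $m$ reported for each KAF configuration.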


Fig. 14. Qualitative representation of the total number of trainable parameters versus Score in logarithmic scale (KAF family vs. DL approaches).

It is important for future KAF implementations to exploit more weights in order to provide enhanced performance, as a consequence of a more balanced trade-off between the competing objectives of computational burden and performance. Before examining the potential future directions for KAF architectures, it is pertinent to highlight their key advantages over neural network approaches, which are fundamental to understanding their role in prognostic engineering. The merits of KAFs in comparison to neural networks are encapsulated in several key aspects:

1. Computational Simplicity: KAFs typically require fewer parameters, simplifying model complexity and computational de-
mands.
2. Adaptability in Online Learning: KAFs are inherently suited for online learning environments, providing efficient real-time
data processing and model adaptability.
3. Efficient Nonlinear Modeling: The kernel trick in KAFs facilitates effective handling of nonlinear relationships without the
need for deep or complex architectures.
4. Reduced Overfitting Risks: The lower number of parameters in KAFs can lead to less overfitting, particularly in limited data
scenarios.
5. Efficient Use of Training Data: KAFs can often generalize effectively from smaller datasets, an advantage in cases where data
collection is challenging or limited.
6. Rapid Training and Robustness: KAFs offer quicker training times and robustness to data variability, crucial in non-stationary
environments.

These advantages not only underscore the potential of KAFs in applications where computational efficiency and adaptability are
crucial but also open exciting avenues for future research.
However, while the comparative efficiency and simplicity of KAF architectures are apparent, it is critical to identify inherent
challenges and limitations when juxtaposed against more complex neural network models. The simplicity of KAFs, primarily
reflected through their straightforward RBF network topology and lower number of trainable weights, while advantageous for
online applications and computational demands, may also constrain the depth of data representation and the produced performance
compared to deep learning approaches. This limitation is particularly pronounced in scenarios requiring the capture of detailed,
multi-dimensional relationships within large datasets, where neural networks, with their extensive trainable parameters, can produce
higher performance results. While KAFs can scale to a degree by increasing their dictionary size, this scalability does not fully
bridge the substantial gap in model complexity when compared to neural networks. Essentially, even as KAFs grow in size and
capability, they may still fall short of achieving the same level of detailed data representation and processing power that is inherent
to neural network models, particularly those with deep architectures and vast numbers of trainable parameters. Furthermore, the
training mechanisms of KAFs, though generally less resource-intensive, might not afford the same level of optimization granularity
provided by the extensive hyperparameter tuning and deep architectural configurations characteristic of advanced neural network
models. Therefore, the merits of KAF models, in conjunction with their decent performance, may be further exploited in hybrid implementations with neural networks, and the exploration of deep-learning-inspired hierarchical KAF structures naturally arises.
Such approaches could leverage the strengths of both KAFs and neural networks, potentially leading to more sophisticated and
efficient prognostic tools. For instance, the conceptual idea of reorganizing the typical single-layer kernel-based model into a deep
hierarchical structure has been implemented in [87] proposing deep kernel recursive least squares for two- and three-dimensional
problems. In summary, the evaluation results pave the way for establishing KAF-based implementations in prognostic engineering
applications, promoting an intensified interest in formulating KAF algorithms in hybrid structures or exploring deep learning-inspired
KAF architectures. These developments could enhance the performance of KAFs while maintaining their advantageous trade-off
between computational efficiency and prediction accuracy, potentially leading to more dominant models in the prognostics area.


11. Conclusion

This study has presented a comprehensive evaluation of Kernel Adaptive Filtering (KAF) algorithms in the context of Remaining
Useful Life (RUL) prediction for aircraft engines, juxtaposed with an extensive range of neural network approaches, encompassing
around 60 different models. The experiments are performed on the well-known, and publicly available, C-MAPSS dataset. Our
findings reveal that KAF algorithms outperform more than half of the neural network models reported in the literature, with
ANS-QKRLS outperforming two-thirds of those models in terms of the Score metric. However, it is crucial to acknowledge the inherent limitations associated with KAFs, particularly when faced with the requirement to capture complex, multidimensional data relationships, a domain where deep learning models often exhibit superior performance due to their extensive parameter sets and deep architectures. Although KAF architectures may not surpass the most advanced neural networks in performance metrics, they demonstrate decent prognostic accuracy coupled with important merits in terms of computational efficiency and training time.
The evaluation of seven KAF variants highlighted the resilience and adaptability of these algorithms, with ANS-QKRLS emerging as a
notably robust approach within the KAF family. The study’s comparative analysis underscores the trade-offs between computational
burden and predictive accuracy, showcasing KAFs as a viable option in applications where model simplicity and rapid training are
advantageous.
Our research enriches the field with a detailed analysis of the operational spectrum of KAF algorithms, shedding light on
their efficacy and applicability. The inherent characteristic of KAFs, having fewer trainable parameters, in conjunction with
their commendable performance, renders them particularly suitable for prediction contexts where computational resources are
constrained. Looking ahead, the results from this study pave the way for future research into hybrid models that blend the strengths
of KAFs with the depth and complexity of neural networks. Such explorations could potentially lead to the development of more
sophisticated prognostic tools, enhancing performance while maintaining computational efficiency. The possibility of deep-learning-
inspired hierarchical KAF structures presents an intriguing avenue for further investigation, promising advancements in the field
of prognostic engineering. In conclusion, this study not only benchmarks the current state of KAF architectures in RUL prediction
but also opens up new horizons for their application in prognostic engineering, advocating for their consideration in hybrid and
advanced algorithmic structures. The balance between computational efficiency and prediction accuracy that KAFs offer is likely to
make them an increasingly relevant choice in the evolving landscape of machine learning applications in mechanical systems.

CRediT authorship contribution statement

Georgios D. Karatzinis: Writing – review & editing, Writing – original draft, Visualization, Validation, Software, Methodology,
Formal analysis, Conceptualization. Yiannis S. Boutalis: Writing – review & editing, Supervision. Steven Van Vaerenbergh:
Writing – review & editing, Supervision.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared
to influence the work reported in this paper.

Data availability

The data that has been used is confidential.

Appendix. Comparative results of feature scaling techniques and engine data ordering by sequence length across various
KAF algorithms

Tables A.5–A.8 summarize the performance of the evaluated KAF algorithms under different feature scaling and data ordering strategies. The first pair, Tables A.5 and A.6, reports results for the normalization scaling case, while Tables A.7 and A.8 cover the standardization scaling case. With respect to data ordering, Tables A.5 and A.7 correspond to natural handling (unsorted) of the training engine data, whereas Tables A.6 and A.8 correspond to training engines sorted by sequence length in descending and ascending order.
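For readers reproducing these scenarios, the sketch below outlines the two scaling options and the three engine orderings compared in the tables; the per-channel treatment, the handling of constant sensor columns, and the function names are illustrative assumptions rather than the exact pipeline used in this study.

```python
import numpy as np

def normalize(X, lo=None, hi=None):
    """Min-max normalization to [0, 1]; fit lo/hi on training data and reuse them for test data."""
    lo = X.min(axis=0) if lo is None else lo
    hi = X.max(axis=0) if hi is None else hi
    span = np.where(hi - lo == 0, 1.0, hi - lo)   # guard against constant sensor channels
    return (X - lo) / span, lo, hi

def standardize(X, mu=None, sigma=None):
    """Z-score standardization (zero mean, unit variance per sensor channel)."""
    mu = X.mean(axis=0) if mu is None else mu
    sigma = X.std(axis=0) if sigma is None else sigma
    sigma = np.where(sigma == 0, 1.0, sigma)
    return (X - mu) / sigma, mu, sigma

def order_engines(engines, mode="natural"):
    """Return training engines in natural, descending, or ascending sequence-length order.
    `engines` is assumed to be a list of per-engine arrays of shape (cycles, features)."""
    if mode == "natural":
        return list(engines)
    return sorted(engines, key=len, reverse=(mode == "descending"))
```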
Table A.5
Performance comparison of the KAF-based algorithms using the normalization scaling method and natural handling of engine order in the training phase.
Algorithm ℎ 𝜆 𝜈 𝜂 𝜀U 𝜇0 𝜏 𝑚 RMSE Score Early - On time - Late Training time (s)
ALD-KRLS 1 0.01 0.5 – – – – 73 18.56 1080 13-59-28 2.29
1 0.01 0.4 – – – – 122 17.32 979.89 10-65-25 4.46
1 0.01 0.3 – – – – 187 17.36 1090 10-65-25 9.36
1 0.01 0.2 – – – – 326 17.73 1178 9-65-26 33.79
1 0.01 0.1 – – – – 776 17.6 1052 8-63-29 304.78
1 0.01 0.08 – – – – 993 17.34 910.7 10-62-28 616.84
QKRLS 1 0.01 – – 2 – – 21 20.89 1154 18-38-44 0.64
1 0.01 – – 1 – – 129 21.08 1742 8-47-45 4.75
1 0.01 – – 0.8 – – 262 18.78 985.43 9-53-38 14.53
1 0.01 – – 0.6 – – 660 17.62 779.83 8-62-30 165.6
1 0.01 – – 0.5 – – 1127 17.43 894.3 10-60-30 672.35
QKLMS 1 – – 0.01 2 – – 21 19.85 803.67 28-53-19 0.35
1 – – 0.01 1 – – 129 19.81 884.27 30-53-17 0.84
1 – – 0.01 0.8 – – 262 18.86 811.3 29-53-18 1.19
1 – – 0.01 0.6 – – 660 18.6 758.19 25-57-18 2.1
1 – – 0.01 0.5 – – 1127 18.59 807.77 26-58-16 2.87
1 – – 0.01 0.45 – – 1576 18.58 744.9 25-58-17 3.63
SW-KRLS 1 0.1 – – – – – 100 31.49 8693 47-37-16 2.44
1 0.1 – – – – – 200 22.64 1354 25-41-34 6.6
1 0.1 – – – – – 600 23.88 1308 42-44-14 90.3
1 0.1 – – – – – 770 23.06 1125 43-43-14 144.5
1 0.1 – – – – – 1000 17.83 545.82 23-54-23 242.8
1 0.1 – – – – – 1500 17.02 930.19 13-58-29 525.1
FB-KRLS 1 0.1 – 0.00001 – – – 100 21.79 1737 31-43-26 3.24
1 0.1 – 0.00001 – – – 200 22.34 1420 30-42-28 7.33
1 0.1 – 0.00001 – – – 450 19.14 622.99 37-47-16 55.63
1 0.1 – 0.00001 – – – 600 20.51 644.22 41-45-14 95.8
1 0.1 – 0.00001 – – – 1000 21.71 660.09 49-41-10 252.5
1 0.1 – 0.00001 – – – 1500 23.5 765.83 55-37-8 536
ANS-QKRLS 1 0 0.4 – 0.05 0.95 0.00001 115 17.71 753.62 17-61-22 17.94
1 0 0.2 – 0.05 0.95 0.00001 293 18.25 890.71 19-60-21 65.83
1 0 0.1 – 0.05 0.95 0.00001 624 18.62 629.68 25-51-24 253.92
1 0 0.08 – 0.05 0.95 0.00001 770 18.9 634.4 28-53-19 400.88
1 0 0.065 – 0.05 0.95 0.00001 926 18.78 630.38 26-55-19 628.33
1 0 0.05 – 0.05 0.95 0.00001 1140 18.98 627.55 31-52-17 1046.1
QALD-KRLS 1 0.01 0.4 – 0.1 – – 122 17.32 979.89 10-65-25 5.41
1 0.01 0.2 – 0.1 – – 326 17.73 1178 9-65-26 35.73
1 0.01 0.1 – 0.1 – – 776 17.6 1052 8-63-29 309.77
1 0.01 0.04 – 2 – – 21 19.86 1168 17-55-28 0.95
1 0.01 0.04 – 1 – – 129 19.46 1518 10-60-30 4.82
1 0.01 0.04 – 0.8 – – 262 18.5 1047 11-59-30 16.69
1 0.01 0.04 – 0.6 – – 659 18.1 1487 12-57-31 159.87
1 0.01 0.04 – 0.5 – – 1107 18.12 901.83 13-54-33 642.37
Table A.6
Performance comparison of the KAF-based algorithms using the normalization scaling method and training examples sorted by sequence length in descending (⋅) and ascending [⋅] order.
Algorithm ℎ 𝜆 𝜈 𝜂 𝜀U 𝜇0 𝜏 𝑚 RMSE Score Early - On time - Late Training time (s)
ALD-KRLS 1 0.01 0.5 – – – – (75) [77] (18.39) [18.5] (1000) [1099.2] (17-59-24) [16-55-29] (1.96) [2.06]
1 0.01 0.4 – – – – (113) [123] (17.36) [18.16] (707.95) [1131.4] (19-56-25) [14-59-27] (3.67) [4.22]
1 0.01 0.3 – – – – (189) [194] (17.49) [18.45] (900.57) [1284.5] (13-62-25) [11-56-33] (8.59) [9.17]
1 0.01 0.2 – – – – (341) [343] (17.18) [18.02] (571.81) [963.67] (16-60-24) [7-58-35] (33.42) [32.9]
1 0.01 0.1 – – – – (793) [786] (17.38) [18.41] (792.51) [1461] (16-60-24) [10-56-34] (317.5) [294.75]
1 0.01 0.08 – – – – (1023) [1021] (17.47) [18.61] (729.24) [1552.5] (13-64-23) [11-57-32] (631.13) [588.53]
QKRLS 1 0.01 – – 2 – – (18) [20] (23.75) [23.13] (1483.5) [1325.4] (33-25-42) [24-43-33] (0.58) [0.54]
1 0.01 – – 1 – – (128) [140] (21.61) [21.42] (2898.2) [2450.4] (7-49-44) [11-46-43] (3.99) [4.45]
1 0.01 – – 0.8 – – (288) [271] (19.74) [19.69] (1707.6) [1670.3] (9-49-42) [8-56-36] (16.52) [15.44]
1 0.01 – – 0.6 – – (655) [656] (17.94) [17.94] (1025.7) [878.1] (10-60-30) [11-61-28] (156.08) [146.38]
1 0.01 – – 0.5 – – (1131) [1159] (17.33) [17.26] (787.32) [747.23] (10-62-28) [11-63-26] (663.86) [683.77]
QKLMS 1 – – 0.01 2 – – (18) [20] (22.24) [26] (755.99) [5120.3] (41-42-17) [11-33-56] (0.31) [0.31]
1 – – 0.01 1 – – (128) [140] (20.66) [24.94] (744.81) [4405.7] (39-46-15) [8-35-57] (0.84) [0.86]
1 – – 0.01 0.8 – – (288) [271] (20.67) [24.84] (737.21) [4951.8] (37-51-12) [5-38-57] (1.2) [1.22]
1 – – 0.01 0.6 – – (655) [656] (20.41) [25.06] (707.91) [5257.9] (39-47-14) [5-39-56] (2.02) [2]
1 – – 0.01 0.5 – – (1131) [1159] (20.54) [25.02] (725.23) [5479.2] (40-48-12) [4-40-56] (2.87) [2.95]
1 – – 0.01 0.45 – – (1564) [1560] (20.37) [25.1] (699.16) [5681.7] (39-50-11) [4-40-56] (3.54) [3.75]
SW-KRLS 1 0.1 – – – – – 100 (28.46) [33.81] (8962.8) [12996] (44-40-16) [40-27-33] (2.49) [2.46]
1 0.1 – – – – – 200 (27.32) [30.25] (7100.4) [12572] (21-35-44) [10-36-54] (5.76) [5.64]
1 0.1 – – – – – 600 (26.16) [24.74] (1299.9) [2998.7] (48-40-12) [4-44-52] (90.04) [89.58]
1 0.1 – – – – – 770 (26.81) [24.28] (1421) [2827.5] (51-37-12) [3-42-55] (145.78) [144.38]
1 0.1 – – – – – 1000 (24.86) [23.81] (1270) [2557] (50-36-14) [3-43-54] (258.62) [244.66]
1 0.1 – – – – – 1500 (25.48) [22.98] (1437.6) [2029.4] (52-38-10) [14-45-41] (533.57) [524.07]
FB-KRLS 1 0.1 – 0.00001 – – – 100 (23.6) [22.45] (1372.4) [2643.6] (42-38-20) [13-35-52] (3.2) [3.12]
1 0.1 – 0.00001 – – – 200 (23.87) [23.5] (868.4) [4935.9] (55-30-15) [11-44-45] (6.31) [6.55]
1 0.1 – 0.00001 – – – 450 (24.75) [23.77] (870.54) [4930.7] (58-30-12) [10-35-55] (53.36) [53.78]
1 0.1 – 0.00001 – – – 600 (25.72) [20.91] (1051.7) [2772.4] (56-35-9) [11-46-43] (92.89) [92.51]
1 0.1 – 0.00001 – – – 1000 (29.36) [21.43] (1391.3) [2510.8] (70-24-6) [11-44-45] (249.77) [257.69]
1 0.1 – 0.00001 – – – 1500 (31.87) [20.03] (1825.2) [1128.3] (74-22-4) [22-42-36] (534.13) [543.08]
ANS-QKRLS 1 0 0.4 – 0.05 0.95 0.00001 (108) [111] (20.21) [17.85] (2212.9) [647.52] (7-55-38) [20-60-20] (16.65) [16.9]
1 0 0.2 – 0.05 0.95 0.00001 (300) [298] (20.67) [17.66] (1856.2) [608.82] (6-50-44) [20-58-22] (62.29) [60.43]
1 0 0.1 – 0.05 0.95 0.00001 (642) [624] (22.44) [18.55] (2800.5) [685.86] (5-44-51) [23-59-18] (265.29) [243.22]
1 0 0.08 – 0.05 0.95 0.00001 (782) [780] (23.01) [18.36] (3018.5) [537.79] (6-39-55) [26-55-19] (405.46) [388.7]
1 0 0.065 – 0.05 0.95 0.00001 (945) [940] (22.93) [19.6] (2531.4) [776.55] (5-42-53) [25-55-20] (640.65) [602.82]
1 0 0.05 – 0.05 0.95 0.00001 (1159) [1162] (23.52) [20.24] (2902.7) [814.92] (5-42-53) [28-56-16] (1051.03) [999.44]
QALD-KRLS 1 0.01 0.4 – 0.1 – – (113) [123] (17.36) [18.16] (707.99) [1131.6] (19-56-25) [14-59-27] (4.3) [4.98]
1 0.01 0.2 – 0.1 – – (341) [343] (17.18) [18.02] (571.72) [963.59] (16-60-24) [7-58-35] (36.33) [34.92]
1 0.01 0.1 – 0.1 – – (793) [786] (17.39) [18.4] (791.24) [1461.2] (16-60-24) [10-56-34] (320.17) [298.88]
1 0.01 0.04 – 2 – – (18) [20] (21.2) [20.54] (1072.1) [1371.7] (23-49-28) [19-56-25] (0.79) [0.73]
1 0.01 0.04 – 1 – – (128) [140] (20.62) [21.84] (2284.9) [7863.3] (19-55-26) [11-54-35] (4.67) [5.02]
1 0.01 0.04 – 0.8 – – (288) [271] (19.46) [20.16] (1558.6) [2233.9] (13-59-28) [13-56-31] (20.06) [16.84]
1 0.01 0.04 – 0.6 – – (655) [656] (19.43) [19.1] (1105.1) [1369] (16-55-29) [14-58-28] (151.69) [139.6]
1 0.01 0.04 – 0.5 – – (1111) [1126] (18.2) [18.74] (957.95) [977.42] (14-62-24) [14-55-31] (618.73) [623.46]
Table A.7
Performance comparison of the KAF-based algorithms using the standardization scaling method and natural handling of engine order in the training phase.
Algorithm ℎ 𝜆 𝜈 𝜂 𝜀U 𝜇0 𝜏 𝑚 RMSE Score Early - On time - Late Training time (s)
ALD-KRLS 7 0.01 0.1 – – – – 77 18.3 1145.2 13-61-26 2.41
7 0.01 0.07 – – – – 120 17.92 1132.4 9-64-27 4.46
7 0.01 0.05 – – – – 180 17.68 1092.9 10-65-25 9.3
7 0.01 0.03 – – – – 355 17.88 1146.6 8-63-29 52.26
7 0.01 0.02 – – – – 736 17.88 1177.3 10-62-28 306.84
7 0.01 0.017 – – – – 1052 17.79 1121 9-65-26 767.9
QKRLS 7 0.01 – – 25 – – 21 19.5 861.21 19-42-39 1.1
7 0.01 – – 13 – – 127 17.83 719.08 12-52-36 5.95
7 0.01 – – 10 – – 307 17.77 948.69 9-65-26 25
7 0.01 – – 8 – – 635 17.64 917.81 10-59-31 146.35
7 0.01 – – 7 – – 966 17.64 861.9 10-66-24 468.57
7 0.01 – – 6.8 – – 1059 17.51 778.89 11-66-23 574.52
7 0.01 – – 6.5 – – 1190 17.1 677.49 9-67-24 778.3
QKLMS 7 – – 0.01 25 – – 21 19.24 779.78 17-59-24 0.31
7 – – 0.01 13 – – 127 18.4 758.38 25-55-20 0.79
7 – – 0.01 10 – – 307 18.52 794.75 22-58-20 1.31
7 – – 0.01 8 – – 635 18.3 772.9 23-57-20 1.92
7 – – 0.01 7 – – 966 18.27 772.3 22-58-20 2.6
7 – – 0.01 6.8 – – 1059 18.28 756.06 23-58-19 2.78
7 – – 0.01 6.5 – – 1190 18.28 751.04 23-58-19 3
7 – – 0.01 6 – – 1504 18.33 795.72 23-57-20 3.65
SW-KRLS 7 0.1 – – – – – 100 24.38 993.47 45-31-24 4.07
7 0.1 – – – – – 200 23.2 1706 20-43-37 8.77
7 0.1 – – – – – 600 21.79 818.6 39-48-13 94.8
7 0.1 – – – – – 770 21.26 736.8 37-49-14 151
7 0.1 – – – – – 1000 17.85 685.44 22-59-19 248.09
7 0.1 – – – – – 1500 17.51 949.43 10-64-26 540.8
FB-KRLS 7 0.1 – 0.00001 – – – 100 20.11 1500.2 20-42-38 5.35
7 0.1 – 0.00001 – – – 200 20.3 1133.4 28-38-34 9.81
7 0.1 – 0.00001 – – – 450 22.16 756.05 43-43-14 56.37
7 0.1 – 0.00001 – – – 600 24.22 846.5 48-38-14 98.3
7 0.1 – 0.00001 – – – 1000 28.39 1268.3 65-28-7 257.6
7 0.1 – 0.00001 – – – 1500 35.91 2713.6 86-11-3 568.84
ANS-QKRLS 7 0 0.05 – 0.005 0.99 0.00001 114 18.19 562.17 27-56-17 23.91
7 0 0.03 – 0.005 0.99 0.00001 160 17.58 468.58 24-57-19 34.53
7 0 0.004 – 0.005 0.99 0.00001 598 20.59 644.7 35-46-19 283.77
7 0 0.003 – 0.005 0.99 0.00001 700 20.91 664.74 37-45-18 398.71
7 0 0.0025 – 0.005 0.99 0.00001 773 20.73 673.43 39-41-20 472
7 0 0.0015 – 0.005 0.99 0.00001 999 20.65 652.37 37-45-18 828.46
QALD-KRLS 7 0.01 0.07 – 0.1 – – 120 17.92 1132.4 9-64-27 5.32
7 0.01 0.03 – 0.1 – – 355 17.88 1146.6 8-63-29 47.81
7 0.01 0.02 – 0.1 – – 736 17.88 1177.3 10-62-28 296.87
7 0.01 0.017 – 0.1 – – 1058 17.79 1121 9-65-26 764.5
7 0.01 0.01 – 25 – – 21 18.35 828.42 20-53-27 0.8
7 0.01 0.01 – 13 – – 127 19.32 1533.6 12-54-34 4.51
7 0.01 0.01 – 8 – – 635 19.22 1754 11-57-32 141.47
7 0.01 0.01 – 6.8 – – 1059 17.73 1111.3 11-62-27 559.62
Table A.8
Performance comparison of the KAF-based algorithms using the standardization scaling method and training examples sorted by sequence length in descending (⋅) and ascending [⋅] order.
Algorithm ℎ 𝜆 𝜈 𝜂 𝜀U 𝜇0 𝜏 𝑚 RMSE Score Early - On time - Late Training time (s)
ALD-KRLS 7 0.01 0.1 – – – – (76) [79] (18.37) [19.16] (1321.6) [1395.5] (13-63-24) [9-55-36] (2.26) [2.21]
7 0.01 0.07 – – – – (116) [122] (17.97) [19.08] (1018.9) [1212.7] (11-61-28) [13-54-33] (4.13) [4.12]
7 0.01 0.05 – – – – (181) [186] (17.28) [17.28] (691.07) [1259.1] (14-63-23) [8-58-34] (8.33) [8.52]
7 0.01 0.03 – – – – (369) [370] (17.07) [18.3] (677.58) [1145] (14-66-20) [7-61-32] (48.26) [45.13]
7 0.01 0.02 – – – – (777) [758] (17.19) [18.12] (663.46) [1205] (14-63-23) [8-62-30] (329.57) [294.38]
7 0.01 0.017 – – – – (1110) [1083] (17.39) [18.15] (710.83) [1160.5] (15-63-22) [8-60-32] (842.03) [742.84]
QKRLS 7 0.01 – – 25 – – (22) [23] (19.02) [18.95] (601.22) [633.32] (23-43-34) [25-42-33] (0.59) [0.59]
7 0.01 – – 13 – – (128) [131] (18.08) [18.46] (1017.2) [950] (11-61-28) [10-57-33] (4.09) [4.16]
7 0.01 – – 10 – – (297) [302] (18.95) [18.25] (1341.1) [901.95] (13-54-33) [12-56-32] (19.62) [19.38]
7 0.01 – – 8 – – (621) [623] (17.69) [18.01] (890.87) [880.09] (11-61-28) [12-57-31] (136.74) [135.05]
7 0.01 – – 7 – – (956) [943] (17.88) [17.71] (831.6) [801.52] (14-59-27) [11-57-32] (425.66) [393.35]
7 0.01 – – 6.8 – – (1040) [1038] (17.98) [18.27] (944.04) [896.04] (13-58-29) [9-63-28] (548.68) [520.42]
7 0.01 – – 6.5 – – (1203) [1197] (17.6) [17.94] (822.87) [952.84] (11-65-24) [11-61-28] (806.08) [722.42]
QKLMS 7 – – 0.01 25 – – (22) [23] (19.28) [25.36] (556.1) [3098.4] (31-52-17) [3-38-59] (0.3) [0.3]
7 – – 0.01 13 – – (128) [131] (20.25) [24.77] (686.23) [3737] (37-48-15) [6-34-60] (0.81) [0.8]
7 – – 0.01 10 – – (297) [302] (20.16) [24.57] (644.3) [3705] (36-49-15) [6-33-61] (1.25) [1.27]
7 – – 0.01 8 – – (621) [623] (20.31) [24.8] (664.55) [3825.3] (39-46-15) [6-32-62] (1.9) [1.99]
7 – – 0.01 7 – – (956) [943] (20.39) [24.78] (674.41) [3751.7] (38-48-14) [6-31-63] (2.5) [2.58]
7 – – 0.01 6.8 – – (1040) [1038] (20.3) [24.73] (671.01) [3842.6] (37-48-15) [6-31-63] (2.63) [2.65]
7 – – 0.01 6.5 – – (1203) [1197] (20.31) [24.71] (679.96) [3948.1] (35-50-15) [6-31-63] (3.1) [3.02]
7 – – 0.01 6 – – (1521) [1485] (20.21) [24.65] (663.97) [3818.2] (38-47-15) [6-31-63] (3.58) [3.48]
SW-KRLS 7 0.1 – – – – – 100 (23.32) [23.34] (1039.2) [1687.4] (32-38-30) [19-34-47] (2.52) [2.53]
7 0.1 – – – – – 200 (32.06) [31.31] (13580) [8916.8] (16-31-53) [3-26-71] (5.77) [5.74]
7 0.1 – – – – – 600 (22.61) [25.55] (771.55) [3567.3] (42-44-14) [4-38-58] (100.61) [91.88]
7 0.1 – – – – – 770 (23.66) [25.05] (909.5) [3310.4] (47-39-14) [4-39-57] (142.85) [147.82]
7 0.1 – – – – – 1000 (22.2) [24.6] (874.63) [3037.1] (46-42-12) [4-39-57] (242.83) [253]
7 0.1 – – – – – 1500 (23.88) [23.69] (1164.2) [2904.6] (53-37-10) [12-46-42] (537.33) [535.02]
FB-KRLS 7 0.1 – 0.00001 – – – 100 (24.32) [24.12] (864.3) [5074.3] (48-29-23) [7-40-53] (3.22) [3.24]
7 0.1 – 0.00001 – – – 200 (26.2) [23.13] (1014.5) [2767.2] (52-35-13) [8-38-54] (6.28) [6.46]
7 0.1 – 0.00001 – – – 450 (28.9) [21.12] (1393.6) [1520.5] (65-28-7) [24-38-38] (53.61) [56.3]
7 0.1 – 0.00001 – – – 600 (30.82) [20.92] (1651.5) [1194.8] (73-22-5) [27-40-33] (98.43) [96.17]
7 0.1 – 0.00001 – – – 1000 (39.12) [22.52] (3676) [825.2] (88-10-2) [41-41-18] (267.93) [254.94]
7 0.1 – 0.00001 – – – 1500 (43.53) [29.27] (5755.3) [1464.8] (95-3-2) [74-21-5] (532.2) [618.98]
ANS-QKRLS 7 0 0.05 – 0.005 0.99 0.00001 (107) [120] (22.39) [18.94] (2411.5) [645.43] (6-41-53) [24-56-20] (17.31) [18.44]
7 0 0.03 – 0.005 0.99 0.00001 (166) [172] (22.62) [18.22] (2670.6) [626.45] (5-38-57) [20-59-21] (27.79) [28.88]
7 0 0.004 – 0.005 0.99 0.00001 (621) [617] (24.67) [21.6] (2888.5) [1242.7] (4-35-61) [29-49-22] (267.46) [255.58]
7 0 0.003 – 0.005 0.99 0.00001 (727) [722] (24.71) [21.56] (3213.6) [1451.9] (4-37-59) [29-50-21] (373.62) [373.44]
7 0 0.0025 – 0.005 0.99 0.00001 (794) [800] (24.66) [20.75] (3096.9) [846.05] (4-41-55) [27-52-21] (461.2) [447.04]
7 0 0.0015 – 0.005 0.99 0.00001 (1025) [1027] (24.89) [22.64] (3461.9) [1490.7] (4-43-53) [24-51-25] (836.54) [797.43]
QALD-KRLS 7 0.01 0.07 – 0.1 – – (116) [122] (17.97) [19.08] (1018.9) [1212.7] (11-61-28) [13-54-33] (5.13) [5.39]
7 0.01 0.03 – 0.1 – – (369) [370] (17.07) [18.3] (677.58) [1145] (14-66-20) [7-61-32] (50.39) [46.62]
7 0.01 0.02 – 0.1 – – (777) [758] (17.19) [18.12] (663.45) [1205] (14-63-23) [8-62-30] (328.79) [294.48]
7 0.01 0.017 – 0.1 – – (1110) [1083] (17.39) [18.15] (710.83) [1160.5] (15-63-22) [8-60-32] (843.54) [752.04]
7 0.01 0.01 – 25 – – (22) [23] (18.23) [19.25] (586.5) [728.46] (22-52-26) [24-46-30] (0.82) [0.8]
7 0.01 0.01 – 13 – – (128) [131] (18.01) [20.43] (1060.4) [1350] (12-64-24) [11-56-33] (4.77) [4.83]
7 0.01 0.01 – 8 – – (621) [623] (17.84) [18.58] (753.7) [1148.9] (20-60-20) [11-55-34] (131.78) [131.2]
7 0.01 0.01 – 6.8 – – (1040) [1038] (17.81) [19.22] (857.55) [1506.6] (18-59-23) [8-56-36] (515.63) [502.6]