A Comprehensive Overview and Comparative Analysis On Deep Learning Models
REVIEW
Farhad Mortezapour Shiri 1, *, Thinagaran Perumal1, Norwati Mustapha1, and Raihani
Mohamed1
1 Faculty of Computer Science and Information Technology, Universiti Putra Malaysia (UPM), Serdang, 43400,
Malaysia
*Corresponding Author: Farhad Mortezapour Shiri. Email: [email protected]
Received: 24 May 2024 Accepted: 23 October 2024 Published: 20 November 2024
ABSTRACT
Deep learning (DL) has emerged as a powerful subset of machine learning (ML) and artificial
intelligence (AI), outperforming traditional ML methods, especially in handling unstructured and large
datasets. Its impact spans across various domains, including speech recognition, healthcare,
autonomous vehicles, cybersecurity, predictive analytics, and more. However, the complexity and
dynamic nature of real-world problems present challenges in designing effective deep learning models.
Consequently, several deep learning models have been developed to address different problems and
applications. In this article, we conduct a comprehensive survey of various deep learning models,
including Convolutional Neural Network (CNN), Recurrent Neural Network (RNN), Temporal
Convolutional Networks (TCN), Transformer, Kolmogorov-Arnold Networks (KAN), Generative
Models, Deep Reinforcement Learning (DRL), and Deep Transfer Learning. We examine the structure,
applications, benefits, and limitations of each model. Furthermore, we perform an analysis using three
publicly available datasets: IMDB, ARAS, and Fruit-360. We compare the performance of six renowned
deep learning models: CNN, RNN, Long Short-Term Memory (LSTM), Bidirectional LSTM, Gated
Recurrent Unit (GRU), and Bidirectional GRU, alongside two newer models, TCN and Transformer, using
the IMDB and ARAS datasets. Additionally, we evaluate the performance of eight CNN-based models,
including VGG (Visual Geometry Group), Inception, ResNet (Residual Network), InceptionResNet,
Xception (Extreme Inception), MobileNet, DenseNet (Dense Convolutional Network), and NASNet
(Neural Architecture Search Network), for image classification tasks using the Fruit-360 dataset.
KEYWORDS
Deep Learning, Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), Gated
Recurrent Unit (GRU), Temporal Convolutional Network (TCN), Transformer, Kolmogorov-Arnold
Networks (KAN), Deep Reinforcement Learning (DRL), Deep Transfer Learning (DTL).
1 Introduction
Artificial intelligence (AI) aims to emulate human-level intelligence in machines. In computer
science, AI refers to the study of "intelligent agents," which are objects capable of perceiving their
environment and taking actions to maximize their chances of achieving specific goals [1]. Machine
learning (ML) is a field that focuses on the development and application of methods capable of
learning from datasets [2]. ML finds extensive use in various domains, such as speech recognition,
computer vision, text analysis, video games, medical sciences, and cybersecurity.
In recent years, deep learning (DL) techniques, a subset of machine learning (ML), have
outperformed traditional ML approaches across numerous tasks, driven by several critical
advancements [3]. The proliferation of large datasets has been pivotal in enabling models to learn
intricate patterns and relationships, thereby significantly enhancing their performance [4].
Concurrently, advancements in hardware acceleration technologies, notably Graphics Processing
Units (GPUs) and Field-Programmable Gate Arrays (FPGAs) [5], have markedly reduced model
training times by enabling rapid computation and parallel processing. Moreover, enhancements in
algorithmic techniques for optimization and training have further augmented the speed and
efficiency of deep learning models, leading to quicker convergence and superior generalization
capabilities [4]. Deep learning techniques have demonstrated remarkable success across a wide
range of applications, including computer vision (CV), natural language processing (NLP), and
speech recognition. These applications underscore the transformative impact of DL in various
domains, where it continues to set new performance benchmarks [6, 7].
Deep learning models draw inspiration from the structure and functionality of the human
nervous system and brain. These models employ input, hidden, and output layers to organize
processing units. Within each layer, the nodes or units are interconnected with those in the layer
below, and each connection is assigned a weight value. The units sum the inputs after multiplying
them by their corresponding weights [8]. Fig. 1 illustrates the relationship between AI, ML, and
DL, highlighting that machine learning and deep learning are subfields of artificial intelligence.
The objective of this research is to provide a comprehensive overview of various deep learning
models and compare their performance across different applications. In Section 2, we introduce a
fundamental definition of deep learning. Section 3 covers supervised deep learning models,
including Multi-Layer Perceptron (MLP), Convolutional Neural Networks (CNN), Recurrent
Neural Networks (RNN), Temporal Convolutional Networks (TCN), and Kolmogorov-Arnold
Networks (KAN). Section 4 reviews generative models such as Autoencoders, Generative
Adversarial Networks (GANs), and Deep Belief Networks (DBNs). Section 5 presents a
comprehensive survey of the Transformer architecture. Deep Reinforcement Learning (DRL) is
discussed in Section 6, while Section 7 addresses Deep Transfer Learning (DTL). The principles
of hybrid deep learning are explored in Section 8, followed by a discussion of deep learning
applications in Section 9. Section 10 surveys the challenges in deep learning and potential
alternative solutions. In Section 11, we conduct experiments and analyze the performance of
different deep learning models using three datasets. Research directions and future aspects are
covered in Section 12. Finally, Section 13 concludes the paper.
Figure 1. Relationship between artificial intelligence (AI), machine learning (ML), and deep
learning (DL).
2 Deep Learning
Deep learning (DL) involves the process of learning hierarchical representations of data by
utilizing architectures with multiple hidden layers. With the advancement of high-performance
computing facilities, deep learning techniques using deep neural networks have gained increasing
popularity [9]. In a deep learning algorithm, data is passed through multiple layers, with each layer
progressively extracting features and transmitting information to the subsequent layer. The initial
layers extract low-level characteristics, which are then combined by later layers to form a
comprehensive representation [6].
In traditional machine learning techniques, the classification task typically involves a
sequential process that includes pre-processing, feature extraction, meticulous feature selection,
learning, and classification. The effectiveness of machine learning methods heavily relies on
accurate feature selection, as biased feature selection can lead to incorrect class classification. In
contrast, deep learning models enable simultaneous learning and classification, eliminating the
need for separate steps. This capability makes deep learning particularly advantageous for
automating feature learning across diverse tasks [10]. Fig. 2 visually illustrates the distinction
between deep learning and traditional machine learning in terms of feature extraction and learning.
In the era of deep learning, a wide array of methods and architectures have been developed.
These models can be broadly categorized into two main groups: discriminative (supervised) and
generative (unsupervised) approaches. Among the discriminative models, two prominent groups
are Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs). Additionally,
generative approaches encompass various models such as Generative Adversarial Networks (GANs)
and Auto-Encoders (AEs) [11]. In the following sections, we provide a comprehensive survey of
different types of deep learning models.
Figure 2. Visual illustration of the distinction between deep learning and traditional machine
learning in terms of feature extraction and learning [10].
The Multi-Layer Perceptron (MLP) is a type of feedforward artificial neural network (ANN) that serves as a foundational architecture for deep learning or deep neural networks (DNNs)
[11]. It operates as a supervised learning approach. The MLP consists of three layers: the input
layer, the output layer, and one or more hidden layers [12]. It is a fully connected network, meaning
each neuron in one layer is connected to all neurons in the subsequent layer.
In an MLP, the input layer receives the input data and performs feature normalization. The
hidden layers, which can vary in number, process the input signals. The output layer makes
decisions or predictions based on the processed information [13]. Fig. 3 (a) depicts a single-neuron
perceptron model, where the activation function φ (Eq. (1)) is a non-linear function used to map
the summation function (𝑥𝑤 + 𝑏) to the output value 𝑦.
𝑦 = 𝜑(𝑥𝑤 + 𝑏) (1)
In Eq. (1), the terms 𝑥, 𝑤, 𝑏, and 𝑦 represent the input vector, weighting vector, bias, and
output value, respectively [14]. Fig. 3 (b) illustrates the structure of the multilayer perceptron (MLP)
model.
Figure 3. (a) Single-neuron perceptron model. (b) Structure of the MLP [14].
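To make Eq. (1) concrete, the single-neuron computation can be sketched in a few lines of Python; the input values and the choice of a tanh activation below are illustrative assumptions, not taken from the original text.

```python
import numpy as np

def perceptron(x, w, b, phi=np.tanh):
    """Single-neuron forward pass of Eq. (1): y = phi(x . w + b)."""
    return phi(np.dot(x, w) + b)

x = np.array([0.5, -1.2, 0.3])   # input vector
w = np.array([0.8, 0.1, -0.4])   # weight vector
b = 0.05                         # bias
y = perceptron(x, w, b)          # scalar output of the neuron
```

An MLP simply stacks such units into fully connected layers, feeding each layer's outputs as inputs to the next.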
Fully Connected (FC) Layer: The FC layer is typically located at the end of a CNN
architecture. In this layer, every neuron is connected to all neurons in the preceding layer, adhering
to the principles of a conventional multi-layer perceptron neural network. The FC layer receives
input from the last pooling or convolutional layer, which is a vector created by flattening the feature
maps. The FC layer serves as the classifier in the CNN, enabling the network to make predictions
[10].
Activation Functions: Activation functions are fundamental components in convolutional
neural networks (CNNs), indispensable for introducing non-linearity into the network. This non-
linearity is crucial for CNN’s ability to model complex patterns and relationships within the data,
allowing it to perform tasks beyond simple linear classification or regression. Without non-linear
activation functions, a CNN would be limited to linear operations, significantly constraining its
capacity to accurately represent the intricate, non-linear behaviors typical of many real-world
phenomena [32].
Fig. 7 illustrates how these activation functions modulate input signals to produce
output, highlighting the non-linear transformations applied to the input data across different regions
of the function curve. In this figure, 𝑥𝑖 represents the input feature, while 𝑤𝑖𝑗 denotes the weight
associated with the connection between the input feature 𝑥𝑖 and neuron 𝑗. The figure shows that
neuron 𝑗 receives 𝑛 features simultaneously. The output from neuron 𝑗 is labeled by 𝑦𝑗 , and its
internal state, or bias, is indicated by 𝑏𝑗 . The activation function, depicted as 𝑓(. ), could be any
one of several types such as the Rectified Linear Unit (ReLU), hyperbolic tangent (Tanh), Sigmoid
function, or others [33, 34].
These various activation functions are shown in Fig. 8, with emphasis on their distinct
characteristics and profiles. These activation functions are essential for convolutional neural
networks (CNNs) to be more effective in a variety of applications by allowing them to recognize
intricate patterns and provide accurate predictions. Sigmoid and Tanh functions are frequently
referred to as saturating nonlinearities due to the way they act when inputs are very large or small.
For very large or very small inputs, the Sigmoid function approaches 0 or 1, whereas the Tanh function
approaches -1 or 1 [17]. Various alternative nonlinearities have been proposed to reduce the
problems associated with these saturating effects, including the Rectified Linear Unit (ReLU) [35],
Leaky ReLU [36], Parametric Rectified Linear Units (PReLU) [37], Randomized Leaky ReLU
(RReLU) [38], S-shaped ReLU (SReLU) [39], Exponential Linear Units (ELUs) [40], and Gaussian
Error Linear Units (GELUs) [41].
ReLU (Rectified Linear Unit) is one of the most widely used activation functions in modern
CNNs because it mitigates the vanishing gradient problem during training. ReLU is defined
mathematically in Eq. (2), where x represents the input to the neuron [34].
$$f(x) = \max(0, x) = \begin{cases} x, & \text{if } x \geq 0 \\ 0, & \text{if } x < 0 \end{cases}$$ (2)
This behavior helps the CNN learn complex features more efficiently by effectively "turning
off" negative input values while preserving positive ones. It also keeps neurons from saturating
during training.
As an alternative, the Sigmoid function is defined by Eq. (3), where x denotes the input to the neuron.

$$f(x) = \frac{1}{1 + e^{-x}}$$ (3)
Although the sigmoid's distinctive S-shape and its capacity to condense real numbers into the range
between 0 and 1 make it useful for binary classification, its propensity to saturate can hinder
training by causing the vanishing gradient problem in deep neural networks.
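The saturating and non-saturating behaviors described above are easy to verify numerically; the following minimal NumPy sketch implements Eqs. (2) and (3) (the sample inputs are arbitrary):

```python
import numpy as np

def relu(x):
    """Eq. (2): passes positive inputs unchanged and zeroes out negatives."""
    return np.maximum(0.0, x)

def sigmoid(x):
    """Eq. (3): squashes real inputs into (0, 1); saturates for large |x|."""
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([-5.0, -0.5, 0.0, 0.5, 5.0])
print(relu(x))      # [0.  0.  0.  0.5 5. ]  -- no saturation for positive inputs
print(sigmoid(x))   # values near 0 and 1 at the extremes illustrate saturation
```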
Convolutional Neural Networks (CNNs) are extensively used in various fields, including
natural language processing, image segmentation, image analysis, video analysis, and more.
Several CNN variations have been developed, such as AlexNet [42], VGG (Visual Geometry Group)
[43], Inception [44, 45], ResNet (Residual Networks) [46, 47], WideResNet [48], FractalNet [49],
SqueezeNet [50], InceptionResNet [51], Xception (Extreme Inception) [52], MobileNet [53, 54],
DenseNet (Dense Convolutional Network) [55], SENet (Squeeze-and-Excitation Network) [56],
EfficientNet [57, 58], among others. These variants are applied in different application areas based
on their learning capabilities and performance.
Fig. 9 depicts a simple recurrent neural network, where the internal memory (ℎ𝑡 ) is computed
using Eq. (4) [70]:
$$h_t = g(W x_t + U h_{t-1} + b)$$ (4)

In this equation, g(·) represents the activation function (typically Tanh), W and U are adjustable weight matrices applied to the input and the previous hidden state, respectively, b is the bias, and x denotes the input vector.
RNNs have proven to be powerful models for processing sequential data, leveraging their
ability to capture dependencies over time. The various types of RNN models, such as LSTM,
bidirectional LSTM, GRU, and bidirectional GRU, have been developed to address specific
challenges in different applications.
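As an illustration of Eq. (4), one step of a vanilla RNN can be sketched in NumPy as follows; the dimensions, random initialization, and tanh activation are illustrative assumptions:

```python
import numpy as np

def rnn_step(x_t, h_prev, W, U, b):
    """One step of the vanilla RNN recurrence in Eq. (4):
    h_t = g(W x_t + U h_{t-1} + b), with g = tanh."""
    return np.tanh(W @ x_t + U @ h_prev + b)

rng = np.random.default_rng(0)
input_dim, hidden_dim = 8, 16
W = rng.normal(scale=0.1, size=(hidden_dim, input_dim))   # input-to-hidden weights
U = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))  # hidden-to-hidden weights
b = np.zeros(hidden_dim)

h = np.zeros(hidden_dim)                      # initial hidden state
for x_t in rng.normal(size=(5, input_dim)):   # a toy sequence of 5 time steps
    h = rnn_step(x_t, h, W, U, b)             # hidden state carries context forward
```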
Figure 10. (a) The high-level architecture of LSTM. (b) The inner structure of LSTM unit [60].
Fig. 10 (b) illustrates the update mechanism within the inner structure of an LSTM. The update
for the LSTM unit is expressed by Eq. (5):
$$\begin{cases} h^{(t)} = g_o^{(t)}\, f_h\left(s^{(t)}\right) \\ s^{(t)} = g_f^{(t)}\, s^{(t-1)} + g_i^{(t)}\, f_s\left(w\, h^{(t-1)} + u\, X^{(t)} + b\right) \\ g_i^{(t)} = \mathrm{sigmoid}\left(w_i\, h^{(t-1)} + u_i\, X^{(t)} + b_i\right) \\ g_f^{(t)} = \mathrm{sigmoid}\left(w_f\, h^{(t-1)} + u_f\, X^{(t)} + b_f\right) \\ g_o^{(t)} = \mathrm{sigmoid}\left(w_o\, h^{(t-1)} + u_o\, X^{(t)} + b_o\right) \end{cases}$$ (5)
where 𝑓ℎ and 𝑓𝑠 represent the activation functions of the system state and internal state,
typically utilizing the hyperbolic tangent function. The gating operation, denoted as g, is a
feedforward neural network with a sigmoid activation function, ensuring output values within the
range of [0, 1], which are interpreted as a set of weights. The subscripts 𝑖, 𝑜, and 𝑓 correspond to
the input gate, output gate, and forget gate, respectively.
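A minimal NumPy sketch of the update in Eq. (5) is given below, assuming f_h = f_s = tanh; the dimensions and random initialization are illustrative:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(X_t, h_prev, s_prev, p):
    """One LSTM update following Eq. (5); p holds the gate weights
    (w_*, u_*, b_*) and the candidate-input weights (w, u, b)."""
    g_i = sigmoid(p["w_i"] @ h_prev + p["u_i"] @ X_t + p["b_i"])   # input gate
    g_f = sigmoid(p["w_f"] @ h_prev + p["u_f"] @ X_t + p["b_f"])   # forget gate
    g_o = sigmoid(p["w_o"] @ h_prev + p["u_o"] @ X_t + p["b_o"])   # output gate
    s_t = g_f * s_prev + g_i * np.tanh(p["w"] @ h_prev + p["u"] @ X_t + p["b"])
    h_t = g_o * np.tanh(s_t)    # system state activation f_h = tanh
    return h_t, s_t

d_in, d_h = 4, 8
rng = np.random.default_rng(1)
p = {k: rng.normal(scale=0.1, size=(d_h, d_h) if k.startswith("w") else (d_h, d_in))
     for k in ["w", "w_i", "w_f", "w_o", "u", "u_i", "u_f", "u_o"]}
p.update({k: np.zeros(d_h) for k in ["b", "b_i", "b_f", "b_o"]})
h, s = np.zeros(d_h), np.zeros(d_h)
h, s = lstm_step(rng.normal(size=d_in), h, s, p)   # one time step
```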
While standard LSTM has demonstrated promising performance in various tasks, it may
struggle to comprehend input structures that are more complex than a sequential format. To address
this limitation, a tree-structured LSTM network, known as S-LSTM, was proposed by [74]. S-
LSTM consists of memory blocks comprising an input gate, two forget gates, a cell gate, and an
output gate. While S-LSTM exhibits superior performance in challenging sequential modeling
problems, it comes with higher computational complexity compared to standard LSTM [75].
During the training phase, the forward and backward LSTM layers independently extract
features and update their internal states based on the input sequence. The output of each LSTM
layer at each time step is a prediction score. These prediction scores are then combined using a
weighted sum to generate the final output result [78]. By incorporating information from both
directions, Bi-LSTM models can capture a broader context and improve the model's ability to
model temporal dependencies in sequential data.
Bi-LSTM has been widely applied in various sequence modeling tasks such as natural
language processing, speech recognition, and sentiment analysis. It has shown promising results in
capturing complex patterns and dependencies in sequential data, making it a popular choice for
tasks that require an understanding of both past and future context.
Finally, the final memory state ℎ𝑡 is determined by a combination of the previous hidden
state and the candidate activation (Eq. (9)). The update gate determines the balance between the
previous hidden state and the candidate activation. Additionally, an output gate 𝑜𝑡 can be
introduced to control the information flow from the current memory content to the output (Eq. (10)).
The output gate is computed using the current memory state ℎ𝑡 and is typically followed by an
activation function, such as the sigmoid function.
$$h_t = (1 - z_t)\, h_{t-1} + z_t\, \tilde{h}_t$$ (9)

$$o_t = \sigma_o\left(W_o h_t + b_o\right)$$ (10)
where the weight matrix of the output layer is 𝑊𝑜 and the bias vector of the output layer is
𝑏𝑜 .
GRU offers a simpler alternative to LSTM with fewer tensor operations, allowing for faster
training. However, the choice between GRU and LSTM depends on the specific use case and
problem at hand. Both architectures have their advantages and disadvantages, and their
performance may vary depending on the nature of the task [59].
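At the layer level, the two cells are drop-in replacements for each other, which makes such task-specific comparison straightforward. The following hedged Keras sketch (vocabulary size, embedding width, and unit count are illustrative assumptions) builds otherwise identical models that differ only in the recurrent cell:

```python
from tensorflow.keras import layers, models

def make_model(cell="gru", units=64, vocab=10000, classes=2):
    """Identical pipelines differing only in the recurrent cell, for
    benchmarking GRU against LSTM on a given task."""
    rnn = layers.GRU(units) if cell == "gru" else layers.LSTM(units)
    return models.Sequential([
        layers.Embedding(vocab, 128),   # tokens -> dense vectors
        rnn,                            # the only difference between variants
        layers.Dense(classes, activation="softmax"),
    ])

gru_model = make_model("gru")    # fewer parameters, usually faster per epoch
lstm_model = make_model("lstm")  # separate cell state, extra gating capacity
```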
Causal Convolution:
The TCN architecture is built upon two foundational principles: the output sequence has the same length as the input sequence, and no information leaks from the future into the past. To adhere to the first principle,
the initial layer of a TCN is a one-dimensional fully convolutional network, wherein each hidden
layer maintains the same length as the input layer, achieved through zero-padding. This padding
ensures that each successive layer remains the same length as the preceding one. To satisfy the
second principle, TCN employs causal convolutions. A causal convolution is a specialized one-
dimensional convolutional network where only elements from time 𝑡 and earlier are convolved to
produce the output at time 𝑡. Fig. 15 demonstrates the structure of a causal convolutional network.
Dilated Convolution:
TCN aims to effectively capture long-range dependencies in sequential data. A simple causal
convolution can only consider a history that scales linearly with the depth of the network. This
limitation would necessitate the use of large filters or an exceptionally deep network structure,
which could hinder performance, particularly for tasks requiring a longer history.
The depth of the network could lead to issues such as vanishing gradients, ultimately degrading
network performance or causing it to plateau. To address these challenges, TCN employs dilated
convolutions [90], which exponentially expand the receptive field, allowing the network to process
large time series efficiently without a proportional increase in computational complexity. The
architecture of a dilated convolutional network is depicted in Fig. 16.
By inserting gaps between the weights of the convolutional kernel, dilated convolutions
effectively increase the network's receptive field while maintaining computational efficiency. The
mathematical formulation of a dilated convolution is given by Eq. (11).
$$F(s) = (x *_d f)(s) = \sum_{i=0}^{k-1} f(i) \cdot x_{s - d \cdot i}$$ (11)

where d is the dilation factor, k is the filter size, and s − d·i indexes elements from the past of the input sequence x.
Residual Connections:
To construct a more expressive TCN model, it is essential to use small filter sizes and stack
multiple layers. However, stacking dilated and causal convolutional layers increases the depth of
the network, potentially leading to problems such as gradient decay or vanishing gradients during
training. To mitigate these issues, TCN incorporates residual connections into the output layer.
Residual connections facilitate the flow of data across layers by adding a shortcut path, allowing
the network to learn residual functions, which are modifications to the identity mapping, rather than
learning a full transformation. This approach has been shown to be highly effective in very deep
networks.
A residual block [46] has a branch that leads to a set of transformations F, whose output is added to the block's input x, as shown in Eq. (12):

$$o = \mathrm{Activation}\left(x + F(x)\right)$$ (12)
This method enables the network to focus on learning residual functions rather than the entire
mapping. The TCN residual block typically consists of two layers of dilated causal convolutions
followed by a non-linear activation function, such as Rectified Linear Unit (ReLU). The
convolutional filters within the TCN are normalized using weight normalization [91], and dropout
[92] is applied to each dilated convolution layer for regularization, where an entire channel is zeroed
out at each training step. In contrast to a conventional ResNet, where the input is directly added to
the output of the residual function, TCN adjusts for differing input-output widths by performing an
additional 1 × 1 convolution to ensure that the element-wise addition ⊕ operates on tensors of
matching dimensions.
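A hedged Keras sketch of such a residual block is shown below. Weight normalization is omitted because it is not part of core Keras; apart from that, the block follows the description above (two dilated causal convolutions with ReLU and channel-wise dropout, plus a 1 × 1 convolution on the shortcut when widths differ):

```python
from tensorflow.keras import layers

def tcn_residual_block(x, filters, kernel_size, dilation_rate, dropout=0.2):
    """TCN residual block: two dilated causal convolutions with ReLU and
    spatial dropout, added to a (possibly 1x1-convolved) shortcut path."""
    shortcut = x
    for _ in range(2):
        x = layers.Conv1D(filters, kernel_size, padding="causal",
                          dilation_rate=dilation_rate)(x)
        x = layers.Activation("relu")(x)
        x = layers.SpatialDropout1D(dropout)(x)   # zeroes out whole channels
    if shortcut.shape[-1] != filters:             # width mismatch -> 1x1 conv
        shortcut = layers.Conv1D(filters, 1)(shortcut)
    return layers.Activation("relu")(layers.add([x, shortcut]))

inputs = layers.Input(shape=(None, 8))            # (time steps, channels)
outputs = tcn_residual_block(inputs, filters=16, kernel_size=5, dilation_rate=2)
```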
4.1 Autoencoder
The concept of an autoencoder originated as a neural network designed to reconstruct its input
data. Its fundamental objective is to learn a meaningful representation of the data in an unsupervised
manner, which can have various applications, including clustering [104].
An autoencoder is a neural network that aims to replicate its input at its output. It consists of
an internal hidden layer that defines a code representing the input data. The autoencoder network
is comprised of two main components: an encoder function, denoted as 𝑧 = 𝑓(𝑥), and a decoder
function that generates a reconstruction, denoted as 𝑟 = 𝑔(𝑧) [108]. The function 𝑓(𝑥)
transforms a data point 𝑥 from the data space to the feature space, while the function 𝑔(𝑧)
transforms 𝑧 from the feature space back to the data space to reconstruct the original data point
x. In modern autoencoders, the functions z = f(x) and r = g(z) are treated as stochastic mappings, represented as p_encoder(z|x) and p_decoder(r|z), respectively, where r denotes the reconstruction of x [109]. Fig. 18 illustrates an autoencoder model.
Autoencoder models find utility in various unsupervised learning tasks, such as generative
modeling [110], dimensionality reduction [111], feature extraction [112], anomaly or outlier
detection [113], and denoising [114].
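A minimal Keras sketch of this encoder-decoder pair is shown below; the layer sizes and the mean-squared-error reconstruction loss are illustrative assumptions:

```python
from tensorflow.keras import layers, models

data_dim, code_dim = 784, 32                      # illustrative sizes

encoder = models.Sequential([                     # z = f(x): data -> feature space
    layers.Input(shape=(data_dim,)),
    layers.Dense(128, activation="relu"),
    layers.Dense(code_dim, activation="relu"),    # compressed representation
])
decoder = models.Sequential([                     # r = g(z): feature -> data space
    layers.Input(shape=(code_dim,)),
    layers.Dense(128, activation="relu"),
    layers.Dense(data_dim, activation="sigmoid"),
])

autoencoder = models.Sequential([encoder, decoder])
autoencoder.compile(optimizer="adam", loss="mse") # reconstruction objective
```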
In general, autoencoder models can be categorized into two major groups: Regularized
Autoencoders, which are valuable for learning representations for subsequent classification tasks,
and Variational Autoencoders [115], which can function as generative models. Examples of
regularized autoencoder models include Sparse Autoencoder (SAE) [116], Contractive
Autoencoder (CAE) [117], and Denoising Autoencoder (DAE) [118].
Variational Autoencoder (VAE) is a generative model that employs probabilistic distributions,
such as the mean and variance of a Gaussian distribution, for data generation [104]. VAEs provide
a principled framework for learning deep latent-variable models and their associated inference
models. The VAE consists of two coupled but independently parameterized models: the encoder
or recognition model and the decoder or generative model. During "expectation maximization"
learning iterations, the generative model receives an approximate posterior estimation of its latent
random variables from the recognition model, which it uses to update its parameters. Conversely,
the generative model acts as a scaffold for the recognition model, enabling it to learn meaningful
representations of the data, such as potential class labels. In terms of Bayes' rule, the recognition
model is roughly the inverse of the generative model [119].
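A key implementation detail of the VAE is the reparameterization trick, which keeps the sampling of latent variables differentiable with respect to the encoder outputs. A minimal sketch, assuming the encoder emits a mean and a log-variance per latent dimension:

```python
import tensorflow as tf

def reparameterize(mu, log_var):
    """Sample z ~ N(mu, sigma^2) as z = mu + sigma * eps with eps ~ N(0, I),
    so gradients can flow through mu and log_var during training."""
    eps = tf.random.normal(tf.shape(mu))
    return mu + tf.exp(0.5 * log_var) * eps
```

The VAE training objective then combines a reconstruction term with a KL-divergence term that keeps the approximate posterior close to the prior.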
As previously mentioned, GANs operate based on principles derived from neural networks,
utilizing a training set as input to generate new data that resembles the training set. In the case of
GANs trained on image data, they can generate new images exhibiting human-like characteristics.
The following outlines the step-by-step operation of a GAN [122]:
1. The generator, a generative network, produces synthetic content intended to match the real data
distribution.
2. The system undergoes training to increase the discriminator's ability to distinguish between
synthesized and real candidates, allowing the generator to better fool the discriminator.
3. The discriminator initially trains using a dataset as the training data.
4. Training sample datasets are repeatedly presented until the desired accuracy is achieved.
5. The generator is trained to process random input and generate candidates that deceive the
discriminator.
6. Backpropagation is employed to update both the discriminator and the generator, with the
former improving its ability to identify real images and the latter becoming more adept at
producing realistic synthetic images.
7. Convolutional Neural Networks (CNNs) are commonly used as discriminators, while
deconvolutional neural networks are utilized as generative networks.
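The alternating updates described in the list above can be sketched as a minimal training step; the network architectures, sizes, and learning rates below are illustrative placeholders rather than the configuration of any particular GAN:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

latent_dim, data_dim = 100, 784     # e.g. flattened 28x28 images (illustrative)

generator = models.Sequential([
    layers.Input(shape=(latent_dim,)),
    layers.Dense(256, activation="relu"),
    layers.Dense(data_dim, activation="tanh"),
])
discriminator = models.Sequential([
    layers.Input(shape=(data_dim,)),
    layers.Dense(256, activation="relu"),
    layers.Dense(1, activation="sigmoid"),        # P(input is real)
])

bce = tf.keras.losses.BinaryCrossentropy()
g_opt = tf.keras.optimizers.Adam(1e-4)
d_opt = tf.keras.optimizers.Adam(1e-4)

@tf.function
def train_step(real_batch):
    noise = tf.random.normal([tf.shape(real_batch)[0], latent_dim])
    with tf.GradientTape() as d_tape, tf.GradientTape() as g_tape:
        fake = generator(noise, training=True)
        real_pred = discriminator(real_batch, training=True)
        fake_pred = discriminator(fake, training=True)
        # Discriminator: separate real (label 1) from synthesized (label 0)
        d_loss = bce(tf.ones_like(real_pred), real_pred) + \
                 bce(tf.zeros_like(fake_pred), fake_pred)
        # Generator: fool the discriminator into outputting 1 on fakes
        g_loss = bce(tf.ones_like(fake_pred), fake_pred)
    d_opt.apply_gradients(zip(d_tape.gradient(d_loss, discriminator.trainable_variables),
                              discriminator.trainable_variables))
    g_opt.apply_gradients(zip(g_tape.gradient(g_loss, generator.trainable_variables),
                              generator.trainable_variables))
```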
Generative Adversarial Networks (GANs) have introduced numerous applications across
various domains, including image blending [123], 3D object generation [124], face aging [125],
medicine [126, 127], steganography [128], image manipulation [129], text transfer [130], language
and speech synthesis [131], traffic control [132], and video generation [133].
Furthermore, several models have been developed based on the Generative Adversarial
Network (GAN) framework to address specific tasks. These models include Laplacian GAN (LapGAN) [134], Coupled GAN (Co-GAN) [120], Markovian GAN [135], Unrolled GAN [136], Wasserstein GAN (WGAN) [137], Boundary Equilibrium GAN (BEGAN) [138], CycleGAN [139], DiscoGAN [140], Relativistic GAN [141], StyleGAN [142], Evolutionary GAN (E-GAN) [121], Bayesian Conditional GAN [143], and Graph Embedding GAN (GE-GAN) [132].
5 Transformer Architecture
The Transformer architecture was originally introduced by Vaswani et al. [148] in 2017 for
machine translation and has since become a foundational model in deep learning, especially for
natural language processing (NLP). The Transformer is a self-attention-based encoder-decoder
structure. The encoder consists of a stack of identical layers, each containing two sublayers: a
multi-head self-attention mechanism followed by a position-wise fully connected feed-forward
network. A normalization layer [149] and residual connections [46] surround the inputs and outputs
of each sublayer. The decoder, likewise a stack of identical layers, uses the representation produced
by the encoder to generate an output sequence. In addition to the two sublayers found in each
encoder layer, each decoder layer inserts a third sublayer that performs multi-head attention over
the encoder stack's output. As in the encoder, residual connections and a normalization layer
surround each sublayer. The overall Transformer design, with the encoder and decoder shown in the
left and right halves respectively, is depicted in Fig. 21 [150, 151].
Attention layers can replace the recurrent components of traditional RNN-based Seq2Seq models.
In the self-attention layer, the query, key, and value vectors are all produced from the same
sequence using different projection matrices [152]. RNN training is slow because it is sequential
and iterative; Transformer training, by contrast, is parallel and enables all features to be learned
concurrently, significantly improving computational efficiency and reducing the time needed for
model training [153].
Multi-Head Attention: In the Transformer model, a multi-headed self-attention mechanism
is employed to enhance the model's ability to capture dependencies between elements in a sequence.
The core principle of the attention mechanism is that every token in the sequence can aggregate
information from other tokens, allowing the model to understand contextual relationships more
effectively. The attention function maps a query and a set of key-value pairs to an output, where
the queries, keys, values, and output are all vectors. The output is computed as a weighted sum of
the values, where the weight assigned to each value is determined by a compatibility function
between the query and the corresponding key [148].
Figure 22. (a) Scaled Dot-Product Attention, (b) Multi-Head Attention.
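The scaled dot-product attention of Fig. 22 (a) computes Attention(Q, K, V) = softmax(QKᵀ/√d_k)V [148]. A minimal NumPy sketch, with toy dimensions, is:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V  [148]."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)          # query-key compatibility
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V                        # weighted sum of the values

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))   # 4 tokens, d_k = 8
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))
out = scaled_dot_product_attention(Q, K, V)   # one attention "head"
```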
Positional encodings in transformer architecture were achieved by using sine and cosine
functions of various frequencies:
$$\begin{cases} PE_{(pos,\,2i)} = \sin\left(pos / 10000^{2i/d_{model}}\right) \\ PE_{(pos,\,2i+1)} = \cos\left(pos / 10000^{2i/d_{model}}\right) \end{cases}$$ (20)
where pos is the position and i is the dimension. Each dimension of the positional encoding
corresponds to a sinusoid, with wavelengths forming a geometric progression from 2π to 10000 · 2π.
This function was selected because it makes it simple for the model to learn to attend to relative
positions, since for any fixed offset k, PE_{pos+k} can be expressed as a linear function of PE_{pos}.
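Eq. (20) can be implemented directly; the sketch below fills even dimensions with sines and odd dimensions with cosines (d_model is assumed even for simplicity):

```python
import numpy as np

def positional_encoding(max_len, d_model):
    """Sinusoidal positional encodings of Eq. (20): even dimensions use
    sine, odd dimensions use cosine, with geometrically spaced frequencies."""
    pos = np.arange(max_len)[:, None]                  # (max_len, 1)
    i = np.arange(d_model // 2)[None, :]               # (1, d_model/2)
    angles = pos / np.power(10000.0, 2 * i / d_model)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

pe = positional_encoding(max_len=50, d_model=64)   # added to token embeddings
```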
One of the most renowned deep reinforcement learning models is the Deep Q-learning
Network (DQN) [176], which directly learns policies from high-dimensional inputs using
Convolutional Neural Network (CNN). Other common models in deep reinforcement learning
include Double DQN [177], Dueling DQN [178], and Monte Carlo Tree Search (MCTS) [179].
Deep reinforcement learning (DRL) models find applications in various domains, such as
video game playing [180, 181], robotic manipulation [182, 183], image segmentation [184, 185],
video analysis [186, 187], energy management [188, 189], and more.
Instance-based deep transfer learning involves selecting a subset of instances from the source
domain and assigning appropriate weight values to these selected instances to supplement the
training set in the target domain. Algorithms such as TaskTrAdaBoost [194] and TrAdaBoost.R2
[195] are well-known approaches based on this strategy.
Mapping-based deep transfer learning focuses on mapping instances from both the source and
target domains into a new data space, where instances from the two domains exhibit similarity and
are suitable for training a unified deep neural network. Successful methods based on this approach
include Extend MMD (Maximum Mean Discrepancy) [196], and MK-MMD (Multiple Kernel
variant of MMD) [197].
Network-based (model-based) deep transfer learning involves reusing a segment of a pre-
trained network from the source domain, including its architecture and connection parameters, and
applying it to a deep neural network in the target domain. These model-based approaches are highly
effective for domain adaptation between source and target data by adjusting the network (model),
making them the most widely adopted strategies in deep transfer learning (DTL). Remarkably,
these methods can even adapt target data that is significantly different from the source data [198].
Network-based (model-based) approaches in deep transfer learning typically involve pre-
training, freezing, fine-tuning, and adding new layers. Pre-trained models consist of layers from a
deep learning network (DL model) that have been trained using source data. Two key methods for
training a model with target data are freezing and fine-tuning. These methods involve using some
or all layers of a pre-defined model. When layers are frozen, they retain fixed parameters/weights
from the pre-trained model. In contrast, fine-tuning involves initializing parameters and weights
with pre-trained values instead of starting with random values, either for the entire network or
specific layers [198].
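A hedged Keras sketch of this freeze-then-fine-tune workflow is shown below, using MobileNet with ImageNet weights as the pre-trained source model; the input size, classification head, and learning rate are illustrative choices:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Reuse a network pre-trained on the source domain (here, ImageNet weights).
base = tf.keras.applications.MobileNet(weights="imagenet", include_top=False,
                                       input_shape=(224, 224, 3))
base.trainable = False                       # freezing: keep pre-trained weights fixed

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(10, activation="softmax"),  # new head for the target task
])
# ... train the new head on target data, then optionally fine-tune:
base.trainable = True                        # fine-tuning: start from pre-trained values
model.compile(optimizer=tf.keras.optimizers.Adam(1e-5),  # small LR for fine-tuning
              loss="categorical_crossentropy")
```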
A recent advancement in model-based deep transfer learning is Progressive Neural Networks
(PNNs). This strategy involves the freezing of a pre-trained model and integrating new layers
specifically for training on target data [199]. The concept behind progressive learning is grounded
in the idea that acquiring a new skill necessitates leveraging existing knowledge. This mirrors the
way humans learn new abilities. For instance, a child learns to run by employing all the skills
acquired during crawling and walking. PNN constructs a new model for each task it encounters.
Each freshly generated model is interconnected with all others, aiming to learn a new task by
applying the knowledge accumulated from preceding models.
Adversarial-based methods focus on extracting transferable features from both the source and
target data, typically by leveraging adversarial techniques inspired by generative adversarial
networks (GANs) [200].
These deep transfer learning techniques have proven to be effective in overcoming the
challenge of limited training data, enabling knowledge transfer across domains, and facilitating
improved performance in various applications such as image classification [201, 202], speech
recognition [203, 204], video analysis [205, 206], signal processing [207, 208], and others.
In transfer learning, several popular pre-trained deep learning models are frequently used,
including Xception [52], MobileNet [53], DenseNet [55], EfficientNet [57], NasNet [209], and
among others. These models are initially trained on large-scale datasets like ImageNet, and their
learned weights are then transferred to a target domain. The architectures of these networks reflect
a broader trend in deep learning design, transitioning from manually crafted by human experts to
automatically optimized patterns. This evolution focuses on striking a balance between model
accuracy and computational complexity [210].
Figure 26. Numerous possible domains for deep learning applications in the real world.
However, each real-world application area has its own specific goals and requires particular
tasks and deep learning techniques. Table 1 provides a summary of various deep learning tasks and
methods applied across multiple real-world application domains.
Table 1: A summary of the practical applications of deep learning models in real-world domains.
| Application Setting | Tasks | Models | Reference |
|---|---|---|---|
| Smart Homes & Smart Cities | Human Activity Recognition | CNN+LSTM | [211] |
| | Smart Energy Management | Reinforcement learning | [212] |
| | Traffic Management | GRU based | [213] |
| | Waste Management | CNN based | [214] |
| | Smart Parking System | Stacked GRU+LSTM | [215] |
| Education | Student Engagement Detection | DenseNet self-attention | [216] |
| | Student Affective States Recognition | ConvNeXt + GRU | [82] |
| | Automatic Attendance System | CNN+LSTM | [217] |
| | Automated Exam Control | CNN based (VGG) | [218] |
| Healthcare | Medical Image Analysis | Vision transformer | [219] |
| | Early Disease Detection | InceptionV3 | [220] |
The third approach consists of hybrid methods, which combine algorithm-level and data-level
techniques in a complementary manner. Hybridization is needed to address the shortcomings of
purely algorithm-level or data-level approaches and to improve classification accuracy [285].
10.3 Overfitting
Overfitting occurs when a deep learning model learns the systematic and noise components of
the training data to the point that it adversely affects the model's performance on new data. In fact,
overfitting occurs as a result of noise, the small size of the training set, and the complexity of the
classifiers. Overfitted models tend to memorize all the data, including the inevitable noise in the
training set, rather than understanding the underlying patterns in the data [24]. Overfitting is
addressed with methods including dropout [92], weight decay [286], batch normalization [287,
288], regularization [289], data augmentation, and others, although determining the ideal balance
is still difficult.
them [293].
Several strategies have been proposed to address catastrophic forgetting. One such approach
is Elastic Weight Consolidation (EWC) [294], which penalizes changes to the weights that are
important for previous tasks, thereby preserving learned knowledge while allowing the model to
adapt to new tasks. Incremental Moment Matching (IMM) [295] is another technique that merges
models trained on different tasks into a single model, balancing the performance across all tasks.
The iCaRL (incremental Classifier and Representation Learning) [296] method combines
classification with representation learning, enabling the model to learn new classes without
forgetting previously learned ones. Additionally, the Hard Attention to the Task (HAT) [293]
approach employs task-specific masks that prevent interference between tasks, reducing the
likelihood of forgetting.
10.6 Underspecification
Underspecification is an emerging challenge in the deployment of machine learning (ML)
models, particularly deep learning (DL) models, in real-world applications. It refers to the
phenomenon where an ML pipeline can produce a multitude of models that all perform well on the
validation set but exhibit unpredictable behavior in deployment. This issue arises because the
pipeline's design does not fully specify which model characteristics are critical for generalization
in real-world scenarios. The underspecification problem is often linked to the high degrees of
freedom inherent in ML pipelines. Factors such as random seed initialization, hyperparameter
selection, and the stochastic nature of training can lead to the creation of models with similar
validation performance but divergent behaviors in production. These differences can manifest as
inconsistent predictions when the model is exposed to new data or deployed in environments
different from the training conditions [297].
Addressing underspecification requires rigorous testing and validation beyond standard
metrics. Stress tests, as proposed by D’Amour et al. [297], are designed to evaluate a model's
robustness under various real-world conditions, identifying potential failure points that may not be
apparent during standard validation. These tests simulate different deployment scenarios, such as
varying input distributions or environmental changes, to assess how the model's predictions might
vary. Moreover, several studies have been conducted to analyze and mitigate underspecification
across different ML tasks [298, 299].
Bidirectional LSTM and Bidirectional GRU models provide an additional advantage by processing
information in both forward and backward directions.
Additionally, we evaluated eight different CNN-based models: VGG, Inception, ResNet,
InceptionResNet, Xception, MobileNet, DenseNet, and NASNet for the classification of fruit
images using the Fruit-360 dataset. Given that image data is not sequential or time-dependent,
recurrent models were not suitable for this task. CNN-based models are particularly effective for
image analysis because of their ability to capture spatial dependencies. Moreover, the faster training
time of CNN models is due to their parallel processing capabilities, which allow for efficient
computation on GPU (Graphics Processing Unit), thereby accelerating the training process.
To evaluate the performance of these models, we employed assessment metrics such as
accuracy, precision, recall, and F1-measure. Accuracy measures the overall correctness of the
model's predictions, while precision evaluates the proportion of correctly predicted positive
instances. Recall assesses the model's ability to correctly identify positive instances, and F1-
measure provides a balanced measure of precision and recall.
$$Accuracy = \frac{T_p + T_n}{T_p + T_n + F_p + F_n}$$ (21)

$$Precision = \frac{T_p}{T_p + F_p}$$ (22)

$$Recall = \frac{T_p}{T_p + F_n}$$ (23)

$$F1\text{-}Score = 2 \times \frac{Recall \times Precision}{Recall + Precision}$$ (24)

where T_p = True Positive, T_n = True Negative, F_p = False Positive, and F_n = False Negative.
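For reference, Eqs. (21)-(24) correspond directly to standard library routines; the labels below are a toy example:

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_true = [1, 0, 1, 1, 0, 1, 0, 0]   # toy ground-truth labels
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]   # toy model predictions

print("Accuracy :", accuracy_score(y_true, y_pred))    # Eq. (21)
print("Precision:", precision_score(y_true, y_pred))   # Eq. (22)
print("Recall   :", recall_score(y_true, y_pred))      # Eq. (23)
print("F1-Score :", f1_score(y_true, y_pred))          # Eq. (24)
```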
By conducting a comprehensive analysis using these metrics, we can gain insights into the
strengths and weaknesses of each deep learning model. This comparative evaluation enables us to
identify the most effective model for specific datasets and applications, ultimately advancing the
field of deep learning and its practical applications.
All experiments were conducted on a GeForce RTX 3050 GPU (Graphics Processing Unit) with
4 GB of memory.
Figure 27. The structure used for the analysis of different deep learning models on the IMDB dataset.
In this architecture, text data is first passed through an embedding layer, which transforms the
high-dimensional, sparse input into dense, lower-dimensional vectors of real numbers. This allows
the model to capture semantic relationships within the data. In the second layer, one of eight models
(CNN, RNN, LSTM, Bi-LSTM, GRU, Bi-GRU, TCN, or Transformer) is employed for feature
extraction and data training. This layer is crucial for capturing patterns and dependencies in the
data. Following this, a dropout layer is included to address the issue of overfitting by randomly
deactivating a portion of the neurons during training, which helps improve the model's
generalization. Subsequently, a flatten layer converts the multi-dimensional output into a
one-dimensional vector, making it compatible with fully connected layers. Finally, the output is passed
through a fully connected (Dense) layer, which uses a Softmax function for classification,
converting the model's predictions into probabilities for each class.
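A hedged Keras sketch of this pipeline, with LSTM standing in for the interchangeable second layer, is shown below; the vocabulary size and sequence length are illustrative assumptions, while the dropout rate, loss, and SGD optimizer with a learning rate of 0.2 mirror the settings reported below:

```python
from tensorflow.keras import layers, models, optimizers

vocab_size, embed_dim, maxlen = 10000, 128, 200   # illustrative settings

model = models.Sequential([
    layers.Input(shape=(maxlen,)),
    layers.Embedding(vocab_size, embed_dim),      # sparse tokens -> dense vectors
    layers.LSTM(64, return_sequences=True),       # interchangeable: CNN/RNN/GRU/TCN/...
    layers.Dropout(0.2),                          # mitigates overfitting
    layers.Flatten(),                             # multi-dim -> one-dim vector
    layers.Dense(2, activation="softmax"),        # class probabilities (one-hot labels)
])
model.compile(optimizer=optimizers.SGD(learning_rate=0.2),
              loss="binary_crossentropy", metrics=["accuracy"])
```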
Building a neural network with high accuracy necessitates careful attention to hyperparameter
selection, as these adjustments significantly influence the network's performance. For example,
setting the number of training iterations too high can lead to overfitting, where the model performs
well on the training data but poorly on unseen data. Another critical hyperparameter is the learning
rate, which affects the rate of convergence during training. If the learning rate is too high, the
network may converge too quickly, potentially overshooting the global minimum of the loss
function. Conversely, if the learning rate is too low, the convergence process may become
excessively slow, prolonging training. Therefore, finding the optimal balance of hyperparameters
is essential for maximizing the network's performance and ensuring effective learning.
In the experiment phase, consistent parameters were applied across all models to ensure a
standardized comparison. The parameters were set as follows: epochs = 30, batch size = 64, dropout
= 0.2, with the loss function set to "Binary Crossentropy," and the optimizer function set to
Stochastic Gradient Descent (SGD) with a learning rate of 0.2. For the CNN model, 100 filters
were used with a kernel size of 3, along with the Rectified Linear Unit (ReLU) activation function.
The RNN, LSTM, Bi-LSTM, GRU, and Bi-GRU models each employed 64 units. The TCN model
was configured with 16 filters, a kernel size of 5, and dilation rates of [1, 2, 4, 8]. The Transformer
model was set up with 2 attention heads, a hidden layer size of 64 in the feed-forward network, and
the ReLU activation function. These parameter settings and architectural choices were designed to
allow for a standardized comparison of the deep learning models on the IMDB dataset. This
standardization facilitates an accurate analysis of each model's performance, enabling a comparison
of their accuracy and loss values.
Table 2 shows the results of the different deep learning models on the IMDB review dataset based
on various metrics, including Accuracy, Precision, Recall, F1-Score, and training time.
Figure 28. Accuracy and validation-accuracy of deep learning models on IMDB dataset.
Figure 29. Loss and validation-loss diagrams of deep learning models on IMDB dataset.
Fig. 29 illustrates the loss and validation-loss diagrams, where the loss diagram is a visual
representation of loss values during the training process for the eight models, and the validation-
loss diagram depicts the variation in loss values on the testing set during the evaluation process for
the different models. The loss function measures the discrepancy between the predicted sentiment
labels and the actual labels.
Furthermore, the confusion matrices for the various deep learning models are displayed in Fig.
30. These matrices provide a detailed breakdown of each model's performance, highlighting how
well the models classify different classes. By closely examining these confusion matrices, we can
gain insights into the precision of the models and identify patterns of misclassification for each
class. This analysis helps in understanding the strengths and weaknesses of the models' predictions.
Figure 30. Confusion matrix for different deep learning models on IMDB dataset.
removed from the dataset. Next, a time-based static sliding window technique is applied for
segmenting sensor events. This method groups sequences of sensor events into intervals of equal
duration. Optimizing the time interval is crucial for effective segmentation; after evaluating
intervals ranging from 30 to 360 seconds, a 90-second interval was determined to be optimal for
the ARAS dataset. The segmentation task aids in decreasing training time and increasing accuracy
for the deep learning models.
Figure 32. The structure used for the analysis of different deep learning models on the ARAS dataset.
After preprocessing, the data is passed through an input layer. In the second layer, one of eight
models (CNN, RNN, LSTM, Bi-LSTM, GRU, Bi-GRU, TCN, or Transformer) is employed for
feature extraction and training. This layer plays a vital role in capturing patterns and dependencies
within the data. To mitigate overfitting, a dropout layer follows, which randomly deactivates a
portion of the neurons during training, thereby improving the model's generalization. Subsequently,
a flatten layer is used to convert the multi-dimensional vector into a one-dimensional vector,
making it compatible with fully connected layers. Finally, the output passes through a fully
connected (Dense) layer, which uses a Softmax function for classification, transforming the
model’s predictions into probability distributions across the classes.
In the experimental phase, we split the data from the first resident of house B, allocating 70%
for training and 30% for testing, using a random split. Additionally, 20% of the training data was
set aside for validation. The models were trained with a fixed set of parameters: 30 epochs, a batch
size of 64, a dropout rate of 0.2, the "Categorical Crossentropy" loss function, and the Adam
optimizer. For the CNN model, we used 100 filters with a kernel size of 3 and the rectified linear
unit (ReLU) activation function. The RNN, LSTM, Bi-LSTM, GRU, and Bi-GRU models were
configured with 64 units each. The TCN model was set with 16 filters, a kernel size of 5, and
dilation rates of [1, 2, 4, 8]. The Transformer model utilized 2 attention heads, a hidden layer size
of 64 in the feedforward network, and the ReLU activation function.
Table 3 illustrates the results of the experiments on the ARAS dataset with various metrics,
including Accuracy, Precision, Recall, F1-Score, and training time.
Also, Fig. 33 presents the accuracy diagram and validation-accuracy diagram for the deep
learning models, while Fig. 34 shows the loss diagram and validation-loss diagram for deep
learning models.
Figure 34. Loss and validation-loss diagrams of deep learning models on ARAS dataset.
Since we performed preprocessing tasks like data cleaning and segmentation, the data is nearly
normalized and balanced, leading to consistent and closely grouped results across all models.
However, the results indicate that the Transformer and TCN models outperformed the others on
the ARAS dataset. This outcome aligns with the dataset's nature, which comprises spatial and
temporal sequences of sensor events. Among the models, the Transformer exhibited the highest
performance in terms of accuracy, recall, and F1-score, while the Bi-LSTM model excelled in the
precision metric. Moreover, the Transformer model demonstrated a notable advantage in training
time, second only to the CNN model, underscoring its efficiency in processing and learning from
time-series data. Additionally, when examining the accuracy and loss curves, it is evident that the
Transformer, TCN, and CNN models stabilized earlier than the others. Overall, the Transformer
model proved to be the most effective for working with the ARAS dataset, striking a balance
between accuracy, training time, and consistency throughout the training phases, making it the
optimal choice for recognizing human activities based on sensor data.
Figure 35. The structure used for the analysis of different CNN-based models on the Fruit-360 dataset.
First, the fruit images are passed through an input layer. In the second layer, one of eight
models (VGG, Inception, ResNet, InceptionResNet, Xception, MobileNet, DenseNet, or NASNet)
is employed for feature extraction and training. Next, a Global Average Pooling 2D (GAP) layer is
applied, which significantly reduces the spatial dimensions of the data by collapsing each feature
map into a single value. To combat overfitting, a dropout layer is then introduced, randomly
deactivating a portion of the neurons during training, which enhances the model's ability to
generalize. Finally, the output is passed through a fully connected (Dense) layer, where a Softmax
function is used to classify the fruit images.
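A hedged Keras sketch of this pipeline, with DenseNet121 standing in for the interchangeable backbone, follows; the input size matches the dataset's 100 × 100 images and the head mirrors the description above:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

base = tf.keras.applications.DenseNet121(weights="imagenet", include_top=False,
                                         input_shape=(100, 100, 3))

model = models.Sequential([
    base,                                    # interchangeable pre-trained backbone
    layers.GlobalAveragePooling2D(),         # collapses each feature map to one value
    layers.Dropout(0.2),
    layers.Dense(60, activation="softmax"),  # 60 fruit classes
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
```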
The dataset comprises 55,244 images of 81 different fruit classes, each with a resolution of
100 × 100 pixels. For the experiments, a subset of 60 fruit classes was selected, containing 28,484
images for training and 9,558 images for testing. Non-fruit items such as chestnuts and ginger root
were removed from the dataset.
All models were trained with a consistent set of parameters: 20 epochs, a batch size of 512, a
dropout rate of 0.2, the "Categorical Crossentropy" loss function, and the Adam optimizer.
Additionally, all models utilized the “ImageNet” dataset for pre-training.
Table 4 presents the experimental results for various models on the Fruit-360 dataset,
including VGG16, InceptionV3, ResNet50, InceptionResNetV2, Xception, MobileNet,
DenseNet121, and NASNetLarge. The table includes metrics such as Accuracy, Precision, Recall,
F1-Score, and training time.
Furthermore, the accuracy, validation-accuracy, loss, and validation-loss diagrams were used
to compare the performance of the various models. For the fruit image classification task, these
graphs offer valuable insights into how effectively the models learn from the data. Fig. 36 shows
the accuracy and validation-
accuracy diagram of the deep learning models, while Fig. 37 illustrates the loss diagram and
validation-loss diagram of the deep learning models.
Based on the results, it can be concluded that the DenseNet and MobileNet models achieved
the best performance for fruit image classification on the Fruit-360 dataset. Both models
demonstrated high accuracy in classifying fruit images. Notably, MobileNet had a significantly
shorter training time compared to DenseNet, indicating that it was faster to train while still
delivering performance close to that of DenseNet. Additionally, the Xception model also showed
good accuracy and required less training time than DenseNet. Overall, the MobileNet model stands
out as a favorable choice due to its balance between accuracy and training efficiency.
Figure 37. Loss and validation-loss diagrams of different CNN-based deep learning models on
Fruit-360 dataset.
particularly noteworthy, as they excel in leveraging unlabeled image data for deep representation
learning and training highly non-linear mappings between latent and data spaces. The GAN
framework offers the flexibility to formulate new theories and methods tailored to emerging deep
learning applications, positioning it as a pivotal area for future exploration.
- Hybrid/Ensemble Modeling: Hybrid deep learning architectures have shown great potential in
enhancing model performance by combining components from multiple models. For instance, the
integration of Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs)
can capture both temporal and spatial dependencies in data, leveraging the strengths of each model.
Hybrid models also benefit from combining generative and supervised learning, offering superior
performance and improved uncertainty handling in high-risk scenarios. Developing effective
hybrid models, whether supervised or unsupervised, presents a significant research opportunity to
address a wide range of real-world problems, including semi-supervised learning tasks and model
uncertainty. This approach moves beyond conventional, isolated models, emphasizing the need for
sophisticated methods that can handle the complexity of various data types and applications.
- Hyperparameter Optimization for Efficient Deep Learning: As deep learning models have
evolved, the number of parameters, computational latency, and resource requirements have
increased substantially [152]. Selecting the appropriate hyperparameters is critical to building a
neural network with high accuracy. Key hyperparameters include learning rate, loss function, batch
size, number of training iterations, and dropout rate, among others. The challenge lies in finding an
optimal balance of these parameters, as they significantly influence network performance. However,
iterating through all possible combinations of hyperparameters is computationally expensive. To
address this, metaheuristic optimization techniques, such as Genetic Algorithm (GA) [303], Particle
Swarm Optimization (PSO) [304], and others, can be employed to explore the search space more
efficiently than exhaustive methods. Future research should focus on optimizing hyperparameters
tailored to specific data types and contexts. For example, the learning rate plays a crucial role in
training, where a rate too high may cause the model to converge prematurely, while a rate too low
can lead to slow convergence and prolonged training times. Adaptive learning rate techniques, such
as Adaptive Moment Estimation (Adam) [305], Stochastic Gradient Descent (SGD) [306], the
Adaptive Gradient Algorithm (AdaGrad) [307], and Nesterov-accelerated Adaptive Moment
Estimation (Nadam) [308], as well as more recent innovations like Evolved Sign Momentum (Lion) [309],
offer promising avenues for improving network performance and minimizing loss functions. Future
research could further explore these optimizers, focusing on their comparative effectiveness in
enhancing model performance through iterative weight and bias adjustments.
- Federated Learning: Federated learning is an emerging deep learning paradigm that enables
collaborative model training across multiple organizations or teams without the need to share raw
data. This approach is particularly relevant in contexts where data privacy is paramount. However,
federated learning introduces new challenges, especially with the advent of data fusion technologies
that combine data from multiple sources with varying formats. As data diversity and volume
continue to grow, optimizing data and model utilization in federated learning becomes increasingly
important. Addressing challenges such as safeguarding user privacy, developing universal models,
and ensuring the stability of data fusion outcomes will be crucial for the future application of
federated learning across multiple domains [310].
- Quantum Deep Learning: Quantum computing and deep learning have both seen significant
advancements over the past few decades. Quantum computing, which leverages the principles of
quantum mechanics to store and process information, has the potential to outperform classical
supercomputers on certain tasks, making it a powerful tool for complex problem-solving. The
intersection of quantum computing and deep learning has led to the emergence of quantum deep
learning and quantum-inspired deep learning algorithms. Future research directions in this area
include investigating and developing quantum deep learning models, such as Quantum
Convolutional Neural Network (Quantum CNN) [311], Quantum Recurrent Neural Network
(Quantum RNN) [312], Quantum Generative Adversarial Network (Quantum GAN) [313], and
others. Additionally, exploring the application of these models across various domains and creating
novel quantum deep learning architectures represents a cutting-edge frontier in the field [314, 315].
In conclusion, the research directions outlined above underscore the dynamic and evolving
nature of deep learning. By addressing these challenges and exploring new avenues, the field can
continue to advance, driving innovation and enabling the development of more powerful and
efficient models for a wide range of applications.
13 Conclusion
This article provides an extensive overview of deep learning technology and its applications
in machine learning and artificial intelligence. The article covers various aspects of deep learning,
including neural networks, MLP models, and different types of deep learning models such as CNN,
RNN, TCN, Transformer, KAN, generative models, DRL, and transfer learning. The classification of deep
learning models allows for a better understanding of their specific applications and characteristics.
The RNN models, including LSTM, Bi-LSTM, GRU, and Bi-GRU, are particularly suited for time
series data due to their ability to capture temporal dependencies. On the other hand, CNN-based
models excel in image data analysis by effectively capturing spatial features.
The experiments conducted on three public datasets, namely IMDB, ARAS, and Fruit-360,
further reinforce the suitability of specific deep learning models for different data types. The results
demonstrate that CNN-based models such as DenseNet and MobileNet perform exceptionally
well in image classification tasks. The RNN models, such as LSTM and GRU, show strong
performance in time series analysis. However, the Transformer model outperforms classical RNN-
based models, particularly in text analysis, due to its use of the attention mechanism.
Overall, this article highlights the diverse applications and effectiveness of deep learning
models in various domains. It emphasizes the importance of selecting the appropriate deep learning
model based on the nature of the data and the task at hand. The insights gained from the experiments
contribute to a better understanding of the strengths and weaknesses of different deep learning
models, facilitating informed decision-making in practical applications.
Acknowledgement: The authors would like to express sincere gratitude to all the individuals who
have contributed to the completion of this research paper. Their unwavering support, valuable
insights, and encouragement have been instrumental in making this endeavor a success.
Funding Statement: The authors received no specific funding for this study.
Author Contributions: The authors confirm contribution to the paper as follows: Study
conception and design: F. M. Shiri, T. Perumal; data collection: F. M. Shiri; analysis and
interpretation of results: F. M. Shiri, T. Perumal, N. Mustapha, R. Mohamed; draft manuscript
preparation: F. M. Shiri, T. Perumal, N. Mustapha, R. Mohamed. All authors reviewed the results
and approved the final version of the manuscript.
Availability of Data and Materials: The code used and/or analyzed during this research is
available from the corresponding author upon reasonable request. Data used in this study can be
accessed via the following links:
Conflicts of Interest: The authors declare that they have no conflicts of interest to report regarding
the present study.
References
[1] P. P. Shinde and S. Shah, "A review of machine learning and deep learning applications," in 4th Int.
Conf. Comput. Commun. Ctrl. Autom. (ICCUBEA), Pune, India, 16-18 Aug 2018: IEEE, pp. 1-6, doi:
https://doi.org/10.1109/ICCUBEA.2018.8697857.
[2] C. Janiesch, P. Zschech, and K. Heinrich, "Machine learning and deep learning," Electron. Mark., vol.
31, no. 3, pp. 685-695, 2021, doi: https://doi.org/10.1007/s12525-021-00475-2.
[3] W. Han et al., "A survey of machine learning and deep learning in remote sensing of geological
environment: Challenges, advances, and opportunities," ISPRS J. Photogramm. Remote. Sens., vol. 202,
pp. 87-113, 2023, doi: https://doi.org/10.1016/j.cogr.2023.04.001.
[4] S. Zhang et al., "Deep Learning in Human Activity Recognition with Wearable Sensors: A Review on
Advances," Sens., vol. 22, no. 4, Feb 14 2022, doi: https://doi.org/10.3390/s22041476.
[5] S. Li, Y. Tao, E. Tang, T. Xie, and R. Chen, "A survey of field programmable gate array (FPGA)-based
graph convolutional neural network accelerators: challenges and opportunities," PeerJ Comput. Sci.,
vol. 8, pp. e1166, 2022.
[6] A. Mathew, P. Amudha, and S. Sivakumari, "Deep learning techniques: an overview," in Adv. Mach.
Learn. Technol. App.: AMLTA 2020, 2021, pp. 599-608.
[7] J. Liu and Y. Jin, "A comprehensive survey of robust deep learning in computer vision," J. Autom.
Intell., 2023, doi: https://doi.org/10.1016/j.jai.2023.10.002.
[8] A. Shrestha and A. Mahmood, "Review of deep learning algorithms and architectures," IEEE Access.,
vol. 7, pp. 53040-53065, 2019, doi: https://doi.org/10.1109/ACCESS.2019.2912200.
[9] M. A. Wani, F. A. Bhat, S. Afzal, and A. I. Khan, Advances in deep learning. Singapore: Springer, 2020.
[10] L. Alzubaidi et al., "Review of deep learning: Concepts, CNN architectures, challenges, applications,
future directions," J. Big. Data., vol. 8, pp. 1-74, 2021, doi: https://doi.org/10.1186/s40537-021-00444-
8.
[11] I. H. Sarker, "Deep learning: a comprehensive overview on techniques, taxonomy, applications and
research directions," SN Comput. Sci., vol. 2, no. 6, pp. 420, 2021, doi: https://doi.org/10.1007/s42979-
021-00815-1.
[12] M. N. Hasan, T. Ahmed, M. Ashik, M. J. Hasan, T. Azmin, and J. Uddin, "An Analysis of Covid-19
Pandemic Outbreak on Economy using Neural Network and Random Forest," J. Inf. Syst. Telecommun.
(JIST), vol. 2, no. 42, pp. 163, 2023, doi: https://doi.org/10.52547/jist.34246.11.42.163.
[13] N. B. Gaikwad, V. Tiwari, A. Keskar, and N. Shivaprakash, "Efficient FPGA implementation of
multilayer perceptron for real-time human activity classification," IEEE Access., vol. 7, pp. 26696-
26706, 2019, doi: https://doi.org/10.1109/ACCESS.2019.2900084.
[14] K.-C. Ke and M.-S. Huang, "Quality prediction for injection molding by using a multilayer perceptron
neural network," Polym., vol. 12, no. 8, pp. 1812, 2020, doi: https://doi.org/10.3390/polym12081812.
[15] A. Tasdelen and B. Sen, "A hybrid CNN-LSTM model for pre-miRNA classification," Sci. Rep., vol.
11, no. 1, pp. 1-9, 2021, doi: https://doi.org/10.1038/s41598-021-93656-0.
[16] L. Qin, N. Yu, and D. Zhao, "Applying the convolutional neural network deep learning technology to
behavioural recognition in intelligent video," Tehnički vjesnik, vol. 25, no. 2, pp. 528-535, 2018, doi:
https://doi.org/10.17559/TV-20171229024444.
[17] Z. Li, F. Liu, W. Yang, S. Peng, and J. Zhou, "A Survey of Convolutional Neural Networks: Analysis,
Applications, and Prospects," IEEE Trans. Neural Netw. Learn. Syst., vol. 33, no. 12, pp. 6999-7019,
Dec 2022, doi: https://doi.org/10.1109/TNNLS.2021.3084827.
[18] B. P. Babu and S. J. Narayanan, "One-vs-All Convolutional Neural Networks for Synthetic Aperture
Radar Target Recognition," Cybern. Inf. Technol, vol. 22, pp. 179-197, 2022, doi:
https://doi.org/10.2478/cait-2022-0035.
[19] S. Mekruksavanich and A. Jitpattanakul, "Deep convolutional neural network with rnns for complex
activity recognition using wrist-worn wearable sensor data," Electron., vol. 10, no. 14, pp. 1685, 2021,
doi: https://doi.org/10.3390/electronics10141685.
[20] W. Lu, J. Li, J. Wang, and L. Qin, "A CNN-BiLSTM-AM method for stock price prediction," Neural
Comput. Appl., vol. 33, pp. 4741-4753, 2021, doi: https://doi.org/10.1007/s00521-020-05532-z.
[21] W. Rawat and Z. Wang, "Deep convolutional neural networks for image classification: A
comprehensive review," Neural Comput., vol. 29, no. 9, pp. 2352-2449, 2017, doi:
https://doi.org/10.1162/NECO_a_00990.
[22] L. Chen, S. Li, Q. Bai, J. Yang, S. Jiang, and Y. Miao, "Review of image classification algorithms based
on convolutional neural networks," Remote Sens., vol. 13, no. 22, pp. 4712, 2021, doi:
https://doi.org/10.3390/rs13224712.
[23] J. Gu et al., "Recent advances in convolutional neural networks," Pattern. Recognit., vol. 77, pp. 354-
377, 2018, doi: https://doi.org/10.1016/j.patcog.2017.10.013.
[24] S. Salman and X. Liu, "Overfitting mechanism and avoidance in deep neural networks," arXiv preprint
arXiv:1901.06566, 2019.
[25] A. Ajit, K. Acharya, and A. Samanta, "A review of convolutional neural networks," in 2020 Int. Conf.
Emerg. Tren. Inf. Technol. Engr. (ic-ETITE), 2020: IEEE, pp. 1-5.
[26] W. Liu, Z. Wang, X. Liu, N. Zeng, Y. Liu, and F. E. Alsaadi, "A survey of deep neural network
architectures and their applications," Neurocomputing., vol. 234, pp. 11-26, 2017, doi:
https://doi.org/10.1016/j.neucom.2016.12.038.
[27] K. He, X. Zhang, S. Ren, and J. Sun, "Spatial pyramid pooling in deep convolutional networks for visual
recognition," IEEE Trans. Pattern. Anal. Mach. Intell., vol. 37, no. 9, pp. 1904-1916, 2015, doi:
https://doi.org/10.1109/TPAMI.2015.2389824.
[28] D. Yu, H. Wang, P. Chen, and Z. Wei, "Mixed pooling for convolutional neural networks," in Rough.
Sets. Knwl. Technol.: 9th Int. Conf., RSKT, Shanghai, China, October 24-26 2014: Springer, pp. 364-
375, doi: https://doi.org/10.1007/978-3-319-11740-9_34.
[29] Y. Gong, L. Wang, R. Guo, and S. Lazebnik, "Multi-scale orderless pooling of deep convolutional
activation features," in Comput. Vis. (ECCV): 13th Europ. Conf., Zurich, Switzerland, September 6-12
2014: Springer, pp. 392-407.
[30] M. D. Zeiler and R. Fergus, "Stochastic pooling for regularization of deep convolutional neural
networks," arXiv preprint arXiv:1301.3557, 2013.
[31] V. Dumoulin and F. Visin, "A guide to convolution arithmetic for deep learning," arXiv preprint
arXiv:1603.07285, 2016.
[32] M. Krichen, "Convolutional neural networks: A survey," Comput., vol. 12, no. 8, pp. 151, 2023, doi:
https://doi.org/10.3390/computers12080151.
[33] S. Kılıçarslan, K. Adem, and M. Çelik, "An overview of the activation functions used in deep learning
algorithms," J. New Results Sci., vol. 10, no. 3, pp. 75-88, 2021, doi:
https://doi.org/10.54187/jnrs.1011739.
[34] C. Nwankpa, W. Ijomah, A. Gachagan, and S. Marshall, "Activation functions: Comparison of trends
in practice and research for deep learning," arXiv preprint arXiv:1811.03378, 2018.
[35] K. Hara, D. Saito, and H. Shouno, "Analysis of function of rectified linear unit used in deep learning,"
in Int. Jt. Conf. Neural. Netw. (IJCNN), Killarney, Ireland, 2015, pp. 1-8, doi:
https://doi.org/10.1109/IJCNN.2015.7280578.
[36] A. L. Maas, A. Y. Hannun, and A. Y. Ng, "Rectifier nonlinearities improve neural network acoustic
models," in Proc. icml, 2013, vol. 30, no. 1: Atlanta, GA, p. 3.
[37] K. He, X. Zhang, S. Ren, and J. Sun, "Delving deep into rectifiers: Surpassing human-level performance
on imagenet classification," in Proc. IEEE Int. Conf. Comput. Vis., 2015, pp. 1026-1034.
[38] B. Xu, N. Wang, T. Chen, and M. Li, "Empirical evaluation of rectified activations in convolutional
network," arXiv preprint arXiv:1505.00853, 2015.
[39] X. Jin, C. Xu, J. Feng, Y. Wei, J. Xiong, and S. Yan, "Deep learning with s-shaped rectified linear
activation units," in Proceedings of the AAAI Conference on Artificial Intelligence, 2016, vol. 30, no. 1.
[40] D.-A. Clevert, T. Unterthiner, and S. Hochreiter, "Fast and accurate deep network learning by
exponential linear units (elus)," arXiv preprint arXiv:1511.07289, 2015.
[41] D. Hendrycks and K. Gimpel, "Gaussian error linear units (gelus)," arXiv preprint arXiv:1606.08415,
2016.
[42] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "Imagenet classification with deep convolutional neural
networks," in 25th Int. Conf. Neural Inf. Process. Syst., Lake Tahoe, NV, Dec. 2012, pp. 1097-1105.
[43] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition,"
arXiv preprint arXiv:1409.1556, 2014.
[44] C. Szegedy et al., "Going deeper with convolutions," in Proc. IEEE Conf. Comput. Vis. Pattern.
Recognit., 2015, pp. 1-9.
[45] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, "Rethinking the inception architecture for
computer vision," in Proc. IEEE Conf. Comput. Vis. Pattern. Recognit., 2016, pp. 2818-2826.
[46] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proc. IEEE/CVF
Conf. Comput. Vis. Pattern. Recognit., 2016, pp. 770-778.
[47] K. He, X. Zhang, S. Ren, and J. Sun, "Identity mappings in deep residual networks," in Comput. Vis.
(ECCV): 14th Europ. Conf., Amsterdam, Netherlands, October 11–14 2016: Springer, pp. 630-645.
[48] S. Zagoruyko and N. Komodakis, "Wide residual networks," arXiv preprint arXiv:1605.07146, 2016.
[49] G. Larsson, M. Maire, and G. Shakhnarovich, "Fractalnet: Ultra-deep neural networks without
residuals," arXiv preprint arXiv:1605.07648, 2016.
[66] A. Graves, M. Liwicki, S. Fernández, R. Bertolami, H. Bunke, and J. Schmidhuber, "A novel
connectionist system for unconstrained handwriting recognition," IEEE Trans. Pattern. Anal. Mach.
Intell., vol. 31, no. 5, pp. 855-868, 2008, doi: https://doi.org/10.1109/TPAMI.2008.137.
[67] J. Chung, C. Gulcehre, K. Cho, and Y. Bengio, "Empirical evaluation of gated recurrent neural networks
on sequence modeling," arXiv preprint arXiv:1412.3555, 2014.
[68] J. Chen, D. Jiang, and Y. Zhang, "A hierarchical bidirectional GRU model with attention for EEG-based
emotion classification," IEEE Access., vol. 7, pp. 118530-118540, 2019, doi:
https://doi.org/10.1109/ACCESS.2019.2936817.
[69] M. Fortunato, C. Blundell, and O. Vinyals, "Bayesian recurrent neural networks," arXiv preprint
arXiv:1704.02798, 2017.
[70] F. Kratzert, D. Klotz, C. Brenner, K. Schulz, and M. Herrnegger, "Rainfall–runoff modelling using long
short-term memory (LSTM) networks," Hydrol. Earth Syst. Sci., vol. 22, no. 11, pp. 6005-6022, 2018,
doi: https://doi.org/10.5194/hess-22-6005-2018.
[71] A. Graves, "Generating sequences with recurrent neural networks," arXiv preprint arXiv:1308.0850,
2013.
[72] S. Minaee, E. Azimi, and A. Abdolrashidi, "Deep-sentiment: Sentiment analysis using ensemble of cnn
and bi-lstm models," arXiv preprint arXiv:1904.04206, 2019.
[73] D. Gaur and S. Kumar Dubey, "Development of Activity Recognition Model using LSTM-RNN Deep
Learning Algorithm," J. Inf. Organ. Sci., vol. 46, no. 2, pp. 277-291, 2022, doi:
https://doi.org/10.31341/jios.46.2.1.
[74] X. Zhu, P. Sobihani, and H. Guo, "Long short-term memory over recursive structures," in Int. Conf.
Mach. Learn., 2015: PMLR, pp. 1604-1612.
[75] F. Gu, M.-H. Chung, M. Chignell, S. Valaee, B. Zhou, and X. Liu, "A survey on deep learning for
human activity recognition," ACM Comput. Surv., vol. 54, no. 8, pp. 1-34, 2021, doi:
https://doi.org/10.1145/3472290.
[76] T. H. Aldhyani and H. Alkahtani, "A bidirectional long short-term memory model algorithm for
predicting COVID-19 in gulf countries," Life., vol. 11, no. 11, pp. 1118, 2021, doi:
https://doi.org/10.3390/life11111118.
[77] F. M. Shiri, E. Ahmadi, M. Rezaee, and T. Perumal, "Detection of Student Engagement in E-Learning
Environments Using EfficientnetV2-L Together with RNN-Based Models," J. Artif. Intell., vol. 6, no.
1, pp. 85-103, 2024, doi: https://doi.org/10.32604/jai.2024.048911.
[78] D. Liciotti, M. Bernardini, L. Romeo, and E. Frontoni, "A sequential deep learning application for
recognising human activities in smart homes," Neurocomputing., vol. 396, pp. 501-513, 2020, doi:
https://doi.org/10.1016/j.neucom.2018.10.104.
[79] A. Dutta, S. Kumar, and M. Basu, "A gated recurrent unit approach to bitcoin price prediction," J. Risk
Financial Manag., vol. 13, no. 2, pp. 23, 2020, doi: https://doi.org/10.3390/jrfm13020023.
[80] A. Gumaei, M. M. Hassan, A. Alelaiwi, and H. Alsalman, "A Hybrid Deep Learning Model for Human
Activity Recognition Using Multimodal Body Sensing Data," IEEE Access., vol. 7, pp. 99152-99160,
2019, doi: https://doi.org/10.1109/access.2019.2927134.
[81] D. Bahdanau, K. Cho, and Y. Bengio, "Neural machine translation by jointly learning to align and
translate," arXiv preprint arXiv:1409.0473, 2014.
[99] K. Xu, L. Chen, and S. Wang, "Kolmogorov-Arnold Networks for Time Series: Bridging Predictive
Power and Interpretability," arXiv preprint arXiv:2406.02496, 2024.
[100] A. A. Aghaei, "fKAN: Fractional Kolmogorov-Arnold Networks with trainable Jacobi basis
functions," arXiv preprint arXiv:2406.07456, 2024.
[101] Z. Bozorgasl and H. Chen, "Wav-kan: Wavelet kolmogorov-arnold networks," arXiv preprint
arXiv:2405.12832, 2024.
[102] F. Zhang and X. Zhang, "GraphKAN: Enhancing Feature Extraction with Graph Kolmogorov
Arnold Networks," arXiv preprint arXiv:2406.13597, 2024.
[103] A. Jabbar, X. Li, and B. Omar, "A survey on generative adversarial networks: Variants,
applications, and training," ACM Comput. Surv., vol. 54, no. 8, pp. 1-49, 2021, doi:
https://doi.org/10.1145/3463475.
[104] D. Bank, N. Koenigstein, and R. Giryes, "Autoencoders," in Mach. Learn. Data Sci. Handb.: Data
Mining. Knwl. Discov. Handb., Cham: Springer, 2023, pp. 353-374.
[105] I. Goodfellow et al., "Generative adversarial nets," Adv. Neural. Inf. Process. Syst., vol. 27, pp.
2672–2680, 2014.
[106] N. Zhang, S. Ding, J. Zhang, and Y. Xue, "An overview on restricted Boltzmann machines,"
Neurocomputing., vol. 275, pp. 1186-1199, 2018, doi: https://doi.org/10.1016/j.neucom.2017.09.065.
[107] G. E. Hinton, "Deep belief networks," Scholarpedia, vol. 4, no. 5, pp. 5947, 2009, doi:
https://doi.org/10.4249/scholarpedia.5947.
[108] I. Goodfellow, Y. Bengio, and A. Courville, Deep learning. Cambridge: MIT Press, 2016.
[109] J. Zhai, S. Zhang, J. Chen, and Q. He, "Autoencoder and its various variants," in 2018 IEEE Int.
Conf. Syst. Man. Cybern. (SMC), Miyazaki, Japan, 7-10 Oct 2018: IEEE, pp. 415-419, doi:
https://doi.org/10.1109/SMC.2018.00080.
[110] A. Makhzani, J. Shlens, N. Jaitly, I. Goodfellow, and B. Frey, "Adversarial autoencoders," arXiv
preprint arXiv:1511.05644, 2015.
[111] Y. Wang, H. Yao, and S. Zhao, "Auto-encoder based dimensionality reduction,"
Neurocomputing., vol. 184, pp. 232-242, 2016, doi: https://doi.org/10.1016/j.neucom.2015.08.104.
[112] Y. N. Kunang, S. Nurmaini, D. Stiawan, and A. Zarkasi, "Automatic features extraction using
autoencoder in intrusion detection system," in 2018 Int. Conf. Electr. Engr. Comput. Sci. (ICECOS),
Pangkal, Indonesia, 2-4 Oct 2018: IEEE, pp. 219-224, doi:
https://doi.org/10.1109/ICECOS.2018.8605181.
[113] C. Zhou and R. C. Paffenroth, "Anomaly detection with robust deep autoencoders," in Proc. 23rd
ACM SIGKDD Int. Conf. Knwl. Discov. Data Mining., 2017, pp. 665-674, doi:
https://doi.org/10.1145/3097983.3098052.
[114] A. Creswell and A. A. Bharath, "Denoising adversarial autoencoders," IEEE Trans. Neural Netw.
Learn. Syst., vol. 30, no. 4, pp. 968-984, 2018, doi: https://doi.org/10.1109/TNNLS.2018.2852738.
[115] D. P. Kingma and M. Welling, "Auto-encoding variational bayes," arXiv preprint
arXiv:1312.6114, 2013.
[116] A. Ng, "Sparse autoencoder," CS294A Lecture notes, vol. 72, no. 2011, pp. 1-19, 2011.
[117] S. Rifai et al., "Higher order contractive auto-encoder," in Mach. Learn. Knwl. Discov. DB.: Europ.
Conf. ECML PKDD, Athens, Greece, September 5-9 2011: Springer, pp. 645-660.
[118] P. Vincent, H. Larochelle, Y. Bengio, and P.-A. Manzagol, "Extracting and composing robust
features with denoising autoencoders," in Proc. 25th Int. Conf. Mach. Learn., 2008, pp. 1096-1103,
doi: https://doi.org/10.1145/1390156.1390294.
[119] D. P. Kingma and M. Welling, "An introduction to variational autoencoders," Found. Trends
Mach. Learn., vol. 12, no. 4, pp. 307-392, 2019.
[120] M.-Y. Liu and O. Tuzel, "Coupled generative adversarial networks," in 30th Int. Conf. Neural Inf.
Process. Syst., Dec. 2016, pp. 469-477.
[121] C. Wang, C. Xu, X. Yao, and D. Tao, "Evolutionary generative adversarial networks," IEEE Trans.
Evol. Comput., vol. 23, no. 6, pp. 921-934, 2019, doi: https://doi.org/10.1109/TEVC.2019.2895748.
[122] A. Aggarwal, M. Mittal, and G. Battineni, "Generative adversarial network: An overview of theory
and applications," Int. J. Inf. Manag. Data Insights., vol. 1, no. 1, pp. 100004, 2021, doi:
https://doi.org/10.1016/j.jjimei.2020.100004.
[123] B.-C. Chen and A. Kae, "Toward realistic image compositing with adversarial learning," in Proc.
IEEE/CVF Conf. Comput. Vis. Pattern. Recognit., 2019, pp. 8415-8424.
[124] D. P. Jaiswal, S. Kumar, and Y. Badr, "Towards an artificial intelligence aided design approach:
application to anime faces with generative adversarial networks," Procedia Comput. Sci., vol. 168, pp.
57-64, 2020, doi: https://doi.org/10.1016/j.procs.2020.02.257.
[125] Y. Liu, Q. Li, and Z. Sun, "Attribute-aware face aging with wavelet-based generative adversarial
networks," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern. Recognit., 2019, pp. 11877-11886.
[126] J. Islam and Y. Zhang, "GAN-based synthetic brain PET image generation," Brain Inform., vol. 7,
pp. 1-12, 2020, doi: https://doi.org/10.1186/s40708-020-00104-2.
[127] H. Lan, A. D. N. Initiative, A. W. Toga, and F. Sepehrband, "SC-GAN: 3D self-attention
conditional GAN with spectral normalization for multi-modal neuroimaging synthesis," bioRxiv,
2020.06.09.143297, 2020, doi: https://doi.org/10.1101/2020.06.09.143297.
[128] K. A. Zhang, A. Cuesta-Infante, L. Xu, and K. Veeramachaneni, "SteganoGAN: High capacity
image steganography with GANs," arXiv preprint arXiv:1901.03892, 2019.
[129] S. Nam, Y. Kim, and S. J. Kim, "Text-adaptive generative adversarial networks: manipulating
images with natural language," in 32nd Int. Conf. Neural Inf. Process. Syst., Dec. 2018, pp. 42-51.
[130] L. Sixt, B. Wild, and T. Landgraf, "Rendergan: Generating realistic labeled data," Front. Robot.
AI., vol. 5, pp. 66, 2018, doi: https://doi.org/10.3389/frobt.2018.00066.
[131] K. Lin, D. Li, X. He, Z. Zhang, and M.-T. Sun, "Adversarial ranking for language generation," in
31st Int. Conf. Neural Inf. Process. Syst., Dec. 2017, pp. 3158-3168.
[132] D. Xu, C. Wei, P. Peng, Q. Xuan, and H. Guo, "GE-GAN: A novel deep learning framework for
road traffic state estimation," Transp. Res. Part C Emerg., vol. 117, pp. 102635, 2020, doi:
https://doi.org/10.1016/j.trc.2020.102635.
[133] A. Clark, J. Donahue, and K. Simonyan, "Adversarial video generation on complex datasets,"
arXiv preprint arXiv:1907.06571, 2019.
[134] E. L. Denton, S. Chintala, and R. Fergus, "Deep generative image models using a laplacian
pyramid of adversarial networks," in 28th Int. Conf. Neural Inf. Process. Syst., Dec. 2015, pp. 1486-
1494.
[135] C. Li and M. Wand, "Precomputed real-time texture synthesis with markovian generative
adversarial networks," in Comput. Vis. (ECCV): 14th Europ. Conf., Amsterdam, Netherlands, October
11-14 2016: Springer, pp. 702-716, doi: https://doi.org/10.1007/978-3-319-46487-9_43.
[136] L. Metz, B. Poole, D. Pfau, and J. Sohl-Dickstein, "Unrolled generative adversarial networks,"
arXiv preprint arXiv:1611.02163, 2016.
[137] M. Arjovsky, S. Chintala, and L. Bottou, "Wasserstein generative adversarial networks," in Int.
Conf. Mach. Learn., 2017: PMLR, pp. 214-223.
[138] D. Berthelot, T. Schumm, and L. Metz, "Began: Boundary equilibrium generative adversarial
networks," arXiv preprint arXiv:1703.10717, 2017.
[139] J.-Y. Zhu, T. Park, P. Isola, and A. A. Efros, "Unpaired image-to-image translation using cycle-
consistent adversarial networks," in Proc. IEEE Int. Conf. Comput. Vis., 2017, pp. 2223-2232.
[140] T. Kim, M. Cha, H. Kim, J. K. Lee, and J. Kim, "Learning to discover cross-domain relations with
generative adversarial networks," in Int. Conf. Mach. Learn., 2017: PMLR, pp. 1857-1865.
[141] A. Jolicoeur-Martineau, "The relativistic discriminator: a key element missing from standard
GAN," arXiv preprint arXiv:1807.00734, 2018.
[142] T. Karras, S. Laine, and T. Aila, "A style-based generator architecture for generative adversarial
networks," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern. Recognit., 2019, pp. 4401-4410.
[143] G. Zhao, M. E. Meyerand, and R. M. Birn, "Bayesian conditional GAN for MRI brain image
synthesis," arXiv preprint arXiv:2005.11875, 2020.
[144] K. Chen, D. Zhang, L. Yao, B. Guo, Z. Yu, and Y. Liu, "Deep learning for sensor-based human
activity recognition: Overview, challenges, and opportunities," ACM Comput. Surv., vol. 54, no. 4, pp.
1-40, 2021, doi: https://doi.org/10.1145/3447744.
[145] N. Alqahtani et al., "Deep belief networks (DBN) with IoT-based alzheimer’s disease detection
and classification," Appl. Sci., vol. 13, no. 13, pp. 7833, 2023, doi: https://doi.org/10.3390/app13137833.
[146] A. P. Kale, R. M. Wahul, A. D. Patange, R. Soman, and W. Ostachowicz, "Development of Deep
belief network for tool faults recognition," Sens., vol. 23, no. 4, pp. 1872, 2023, doi:
https://doi.org/10.3390/s23041872.
[147] E. Sansano, R. Montoliu, and O. Belmonte Fernandez, "A study of deep neural networks for human
activity recognition," Comput. Intell., vol. 36, no. 3, pp. 1113-1139, 2020, doi:
https://doi.org/10.1111/coin.12318.
[148] A. Vaswani et al., "Attention is all you need," in 31st Int. Conf. Neural Inf. Process. Syst., Long
Beach, CA, USA, Dec. 2017, pp. 5998–6008.
[149] J. L. Ba, J. R. Kiros, and G. E. Hinton, "Layer normalization," arXiv preprint arXiv:1607.06450,
2016.
[150] K. Gavrilyuk, R. Sanford, M. Javan, and C. G. Snoek, "Actor-transformers for group activity
recognition," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern. Recognit., 2020, pp. 839-848.
[151] Y. Tay, M. Dehghani, D. Bahri, and D. Metzler, "Efficient transformers: A survey," ACM Comput.
Surv., vol. 55, no. 6, pp. 1-28, 2022, doi: https://doi.org/10.24963/ijcai.2023/764.
[152] G. Menghani, "Efficient deep learning: A survey on making deep learning models smaller, faster,
and better," ACM Comput. Surv., vol. 55, no. 12, pp. 1-37, 2023, doi: https://doi.org/10.1145/3578938.
[153] Y. Liu and L. Wu, "Intrusion Detection Model Based on Improved Transformer," Appl. Sci.,
vol. 13, no. 10, pp. 6251, 2023.
[154] D. Chen, S. Yongchareon, E. M. K. Lai, J. Yu, Q. Z. Sheng, and Y. Li, "Transformer With
Bidirectional GRU for Nonintrusive, Sensor-Based Activity Recognition in a Multiresident
Environment," IEEE Internet Things J., vol. 9, no. 23, pp. 23716-23727, 2022, doi:
https://doi.org/10.1109/jiot.2022.3190307.
[155] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, "Bert: Pre-training of deep bidirectional
transformers for language understanding," arXiv preprint arXiv:1810.04805, 2018.
[156] A. Radford, K. Narasimhan, T. Salimans, and I. Sutskever, "Improving language understanding
by generative pre-training," 2018.
[157] A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, and I. Sutskever, "Language models are
unsupervised multitask learners," OpenAI blog, vol. 1, no. 8, pp. 9, 2019.
[158] Z. Dai, Z. Yang, Y. Yang, J. Carbonell, Q. V. Le, and R. Salakhutdinov, "Transformer-xl:
Attentive language models beyond a fixed-length context," arXiv preprint arXiv:1901.02860, 2019.
[159] Z. Yang, Z. Dai, Y. Yang, J. Carbonell, R. R. Salakhutdinov, and Q. V. Le, "Xlnet: Generalized
autoregressive pretraining for language understanding," in 33rd Conf. Neural Inf. Process. Syst.,
Vancouver, Canada, Dec. 2019, pp. 5754–5764.
[160] N. Shazeer, "Fast transformer decoding: One write-head is all you need," arXiv preprint
arXiv:1911.02150, 2019.
[161] Y.-H. H. Tsai, S. Bai, P. P. Liang, J. Z. Kolter, L.-P. Morency, and R. Salakhutdinov, "Multimodal
transformer for unaligned multimodal language sequences," in Proc. Conf. Assoc. Comput. Linguist.
Mtg., 2019, vol. 2019: NIH Public Access, p. 6558.
[162] A. Dosovitskiy et al., "An image is worth 16x16 words: Transformers for image recognition at
scale," arXiv preprint arXiv:2010.11929, 2020.
[163] W. Wang et al., "Pyramid vision transformer: A versatile backbone for dense prediction without
convolutions," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern. Recognit., 2021, pp. 568-578.
[164] Z. Liu et al., "Swin transformer: Hierarchical vision transformer using shifted windows," in Proc.
IEEE/CVF Int. Conf. Comput. Vis., 2021, pp. 10012-10022.
[165] L. Yuan et al., "Tokens-to-token vit: Training vision transformers from scratch on imagenet," in
Proc. IEEE/CVF Int. Conf. Comput. Vis., 2021, pp. 558-567.
[166] K. Han, A. Xiao, E. Wu, J. Guo, C. Xu, and Y. Wang, "Transformer in transformer," Adv. Neural.
Inf. Process. Syst., vol. 34, pp. 15908-15919, 2021.
[167] K. Han, J. Guo, Y. Tang, and Y. Wang, "Pyramidtnt: Improved transformer-in-transformer
baselines with pyramid architecture," arXiv preprint arXiv:2201.00978, 2022.
[168] W. Fedus, B. Zoph, and N. Shazeer, "Switch transformers: Scaling to trillion parameter models
with simple and efficient sparsity," J. Mach. Learn. Res., vol. 23, no. 120, pp. 1-39, 2022.
[169] Z. Liu, H. Mao, C.-Y. Wu, C. Feichtenhofer, T. Darrell, and S. Xie, "A convnet for the 2020s," in
Proc. IEEE/CVF Conf. Comput. Vis. Pattern. Recognit., 2022, pp. 11976-11986.
[170] J. Zhang et al., "Eatformer: Improving vision transformer inspired by evolutionary algorithm," Int.
J. Comput. Vis., pp. 1-28, 2024, doi: https://doi.org/10.1007/s11263-024-02034-6.
[171] N. Vithayathil Varghese and Q. H. Mahmoud, "A survey of multi-task deep reinforcement
learning," Electron., vol. 9, no. 9, pp. 1363, 2020, doi: https://doi.org/10.3390/electronics9091363.
[172] N. Le, V. S. Rathour, K. Yamazaki, K. Luu, and M. Savvides, "Deep reinforcement learning in
computer vision: a comprehensive survey," Artif. Intell. Rev., pp. 1-87, 2022, doi:
https://doi.org/10.1007/s10462-021-10061-9.
[173] M. L. Puterman, Markov decision processes: discrete stochastic dynamic programming. Hoboken:
John Wiley & Sons, 2014.
[174] Z. Zhang, D. Zhang, and R. C. Qiu, "Deep reinforcement learning for power system applications:
An overview," CSEE J. Power Energy Syst., vol. 6, no. 1, pp. 213-225, 2019, doi:
https://doi.org/10.17775/CSEEJPES.2019.00920.
[175] S. E. Li, "Deep reinforcement learning," in Reinforcement learning for sequential decision and
optimal control. Singapore: Springer, 2023, pp. 365-402.
[176] V. Mnih et al., "Human-level control through deep reinforcement learning," Nature, vol. 518,
no. 7540, pp. 529-533, 2015, doi: https://doi.org/10.1038/nature14236.
[177] H. Van Hasselt, A. Guez, and D. Silver, "Deep reinforcement learning with double q-learning," in
Proc. AAAI Conf. Artif. Intell., 2016, vol. 30, no. 1, doi: https://doi.org/10.1609/aaai.v30i1.10295.
[178] Z. Wang, T. Schaul, M. Hessel, H. Hasselt, M. Lanctot, and N. Freitas, "Dueling network
architectures for deep reinforcement learning," in Int. Conf. Mach. Learn., 2016: PMLR, pp. 1995-
2003.
[179] R. Coulom, "Efficient selectivity and backup operators in Monte-Carlo tree search," in Comput.
Gam.: 5th Int. Conf., Turin, Italy, 2007: Springer, pp. 72-83.
[180] N. Justesen, P. Bontrager, J. Togelius, and S. Risi, "Deep learning for video game playing," IEEE
Trans. Games., vol. 12, no. 1, pp. 1-20, 2019, doi: https://doi.org/10.1109/TG.2019.2896986.
[181] K. Souchleris, G. K. Sidiropoulos, and G. A. Papakostas, "Reinforcement learning in game
industry—Review, prospects and challenges," Appl. Sci., vol. 13, no. 4, pp. 2443, 2023, doi:
https://doi.org/10.3390/app13042443.
[182] S. Gu, E. Holly, T. Lillicrap, and S. Levine, "Deep reinforcement learning for robotic manipulation
with asynchronous off-policy updates," in 2017 IEEE Int. Conf. robot. autom. (ICRA), 2017: IEEE, pp.
3389-3396.
[183] D. Han, B. Mulyana, V. Stankovic, and S. Cheng, "A survey on deep reinforcement learning
algorithms for robotic manipulation," Sens., vol. 23, no. 7, pp. 3762, 2023, doi:
https://doi.org/10.3390/s23073762.
[184] K. M. Lee, H. Myeong, and G. Song, "SeedNet: Automatic Seed Generation with Deep
Reinforcement Learning for Robust Interactive Segmentation," in IEEE/CVF Conf. Comput. Vis.
Pattern. Recognit. (CVPR), Salt Lake City, UT, USA, 18-23 June 2018: IEEE Computer Society, pp.
1760-1768, doi: https://doi.org/10.1109/cvpr.2018.00189.
[185] H. Allioui et al., "A multi-agent deep reinforcement learning approach for enhancement of
COVID-19 CT image segmentation," J. Pers. Med., vol. 12, no. 2, pp. 309, 2022, doi:
https://doi.org/10.3390/jpm12020309.
[186] F. Sahba, "Deep reinforcement learning for object segmentation in video sequences," in 2016 Int.
Conf. Comput. Sci. Comput. Intell. (CSCI), Las Vegas, NV, USA, 15-17 Dec 2016: IEEE, pp. 857-860,
doi: https://doi.org/10.1109/CSCI.2016.0166.
[187] H. Liu et al., "Learning to identify critical states for reinforcement learning from videos," in Proc.
IEEE/CVF Conf. Comput. Vis. Pattern. Recognit., 2023, pp. 1955-1965.
[188] A. Shojaeighadikolaei, A. Ghasemi, A. G. Bardas, R. Ahmadi, and M. Hashemi, "Weather-Aware
Data-Driven Microgrid Energy Management Using Deep Reinforcement Learning," in 2021 North.
American. Power. Symp. (NAPS), College Station, TX, USA, 14-16 Nov 2021: IEEE, pp. 1-6, doi:
https://doi.org/10.1109/NAPS52732.2021.9654550.
[189] B. Zhang, W. Hu, A. M. Ghias, X. Xu, and Z. Chen, "Multi-agent deep reinforcement learning
based distributed control architecture for interconnected multi-energy microgrid energy management
and optimization," Energy Conv. Manag., vol. 277, pp. 116647, 2023, doi:
https://doi.org/10.1016/j.enconman.2022.116647.
[190] M. Long, H. Zhu, J. Wang, and M. I. Jordan, "Deep transfer learning with joint adaptation
networks," in Int. Conf. Mach. Learn., 2017: PMLR, pp. 2208-2217.
[191] C. Tan, F. Sun, T. Kong, W. Zhang, C. Yang, and C. Liu, "A survey on deep transfer learning," in
Artif. Neural Netw. Mach. Learn. ICANN 2018: 27th Int. Conf. Artif. Neural Netw., Rhodes, Greece,
October 4-7 2018: Springer, pp. 270-279, doi: https://doi.org/10.1007/978-3-030-01424-7_27.
[192] F. Zhuang et al., "A comprehensive survey on transfer learning," Proc. IEEE, vol. 109, no. 1, pp. 43-
76, 2020.
[193] M. K. Rusia and D. K. Singh, "A Color-Texture-Based Deep Neural Network Technique to Detect
Face Spoofing Attacks," Cybern. Inf. Technol., vol. 22, no. 3, pp. 127-145, 2022, doi:
https://doi.org/10.2478/cait-2022-0032.
[194] Y. Yao and G. Doretto, "Boosting for transfer learning with multiple sources," in 2010 IEEE
Comput. Soc. Conf. Comput. Vis. Pattern. Recognit., San Francisco, CA, USA, 13-18 June 2010: IEEE,
pp. 1855-1862, doi: https://doi.org/10.1109/CVPR.2010.5539857.
[195] D. Pardoe and P. Stone, "Boosting for regression transfer," in Proc. 27th Int. Conf. Mach. Learn.,
2010, pp. 863-870.
[196] E. Tzeng, J. Hoffman, N. Zhang, K. Saenko, and T. Darrell, "Deep domain confusion: Maximizing
for domain invariance," arXiv preprint arXiv:1412.3474, 2014.
[197] M. Long, Y. Cao, J. Wang, and M. Jordan, "Learning transferable features with deep adaptation
networks," in Int. Conf. Mach. Learn., 2015: PMLR, pp. 97-105.
[198] M. Iman, H. R. Arabnia, and K. Rasheed, "A review of deep transfer learning and recent
advancements," Technol., vol. 11, no. 2, pp. 40, 2023, doi:
https://doi.org/10.3390/technologies11020040.
[199] A. A. Rusu et al., "Progressive neural networks," arXiv preprint arXiv:1606.04671, 2016.
[200] Y. Guo, J. Zhang, B. Sun, and Y. Wang, "Adversarial Deep Transfer Learning in Fault Diagnosis:
Progress, Challenges, and Future Prospects," Sens., vol. 23, no. 16, pp. 7263, 2023, doi:
https://doi.org/10.3390/s23167263.
[201] Y. Gulzar, "Fruit image classification model based on MobileNetV2 with deep transfer learning
technique," Sustain., vol. 15, no. 3, pp. 1906, 2023, doi: https://doi.org/10.3390/su15031906.
[202] N. Kumar, M. Gupta, D. Gupta, and S. Tiwari, "Novel deep transfer learning model for COVID-
19 patient detection using X-ray chest images," J. Ambient Intell. Humaniz. Comput., vol. 14, no. 1, pp.
469-478, 2023, doi: https://doi.org/10.1007/s12652-021-03306-6.
[203] H. Kheddar, Y. Himeur, S. Al-Maadeed, A. Amira, and F. Bensaali, "Deep transfer learning for
automatic speech recognition: Towards better generalization," Knowl.-Based Syst., vol. 277, pp. 110851,
2023, doi: https://doi.org/10.1016/j.knosys.2023.110851.
[204] L. Yuan, T. Wang, G. Ferraro, H. Suominen, and M.-A. Rizoiu, "Transfer learning for hate speech
detection in social media," J. Comput. Soc. Sci., vol. 6, no. 2, pp. 1081-1101, 2023.
[205] A. Ray, M. H. Kolekar, R. Balasubramanian, and A. Hafiane, "Transfer learning enhanced vision-
based human activity recognition: A decade-long analysis," Int. J. Inf. Manag. Data Insights., vol. 3,
no. 1, pp. 100142, 2023, doi: https://doi.org/10.1016/j.jjimei.2022.100142.
[206] T. Kujani and V. D. Kumar, "Head movements for behavior recognition from real time video based
on deep learning ConvNet transfer learning," J. Ambient Intell. Humaniz. Comput., vol. 14, no. 6, pp.
7047-7061, 2023, doi: https://doi.org/10.1007/s12652-021-03558-2.
[207] A. Maity, A. Pathak, and G. Saha, "Transfer learning based heart valve disease classification from
Phonocardiogram signal," Biomed. Signal Process. Control., vol. 85, pp. 104805, 2023, doi:
https://doi.org/10.1016/j.bspc.2023.104805.
[208] K. Rezaee, S. Savarkar, X. Yu, and J. Zhang, "A hybrid deep transfer learning-based approach for
Parkinson's disease classification in surface electromyography signals," Biomed. Signal Process.
Control., vol. 71, pp. 103161, 2022, doi: https://doi.org/10.1016/j.bspc.2021.103161.
[209] B. Zoph, V. Vasudevan, J. Shlens, and Q. V. Le, "Learning transferable architectures for scalable
image recognition," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern. Recognit., 2018, pp. 8697-8710.
[210] Y. Zhang et al., "Deep learning in food category recognition," Inf. Fusion., vol. 98, pp. 101859,
2023, doi: https://doi.org/10.1016/j.inffus.2023.101859.
[211] E. Ramanujam and T. Perumal, "MLMO-HSM: Multi-label Multi-output Hybrid Sequential
Model for multi-resident smart home activity recognition," J. Ambient Intell. Humaniz. Comput., vol.
14, no. 3, pp. 2313-2325, 2023, doi: https://doi.org/10.1007/s12652-022-04487-4.
[212] M. Ren, X. Liu, Z. Yang, J. Zhang, Y. Guo, and Y. Jia, "A novel forecasting based scheduling
method for household energy management system based on deep reinforcement learning," Sustain.
Cities Soc., vol. 76, pp. 103207, 2022, doi: https://doi.org/10.1016/j.scs.2021.103207.
[213] S. M. Abdullah et al., "Optimizing traffic flow in smart cities: Soft GRU-based recurrent neural
networks for enhanced congestion prediction using deep learning," Sustain., vol. 15, no. 7, pp. 5949,
2023, doi: https://doi.org/10.3390/su15075949.
[214] M. I. B. Ahmed et al., "Deep learning approach to recyclable products classification: Towards
sustainable waste management," Sustain., vol. 15, no. 14, pp. 11138, 2023, doi:
https://doi.org/10.3390/su151411138.
[215] C. Zeng, C. Ma, K. Wang, and Z. Cui, "Parking occupancy prediction method based on multi
factors and stacked GRU-LSTM," IEEE Access., vol. 10, pp. 47361-47370, 2022, doi:
https://doi.org/10.1109/ACCESS.2022.3171330.
[216] N. K. Mehta, S. S. Prasad, S. Saurav, R. Saini, and S. Singh, "Three-dimensional DenseNet self-
attention neural network for automatic detection of student’s engagement," Appl. Intell., vol. 52, no. 12,
pp. 13803-13823, 2022, doi: https://doi.org/10.1007/s10489-022-03200-4.
[217] A. K. Shukla, A. Shukla, and R. Singh, "Automatic attendance system based on CNN–LSTM and
face recognition," Int. J. Inf. Technol., vol. 16, no. 3, pp. 1293-1301, 2024, doi:
https://doi.org/10.1007/s41870-023-01495-1.
[218] B. Rajalakshmi, V. K. Dandu, S. L. Tallapalli, and H. Karanwal, "ACE: Automated Exam Control
and E-Proctoring System Using Deep Face Recognition," in 2023 Int. Conf. Circuit. Power. Comput.
Technol. (ICCPCT), Kollam, India, 10-11 Aug 2023: IEEE, pp. 301-306, doi:
https://doi.org/10.1109/ICCPCT58313.2023.10245126.
[219] I. Pacal, "MaxCerVixT: A novel lightweight vision transformer-based Approach for precise
cervical cancer detection," Knowl.-Based Syst., vol. 289, pp. 111482, 2024, doi:
https://doi.org/10.1016/j.knosys.2024.111482.
[220] M. M. Rana et al., "A robust and clinically applicable deep learning model for early detection of
Alzheimer's," IET Image Process., vol. 17, no. 14, pp. 3959-3975, 2023, doi:
https://doi.org/10.1049/ipr2.12910.
[221] S. Vimal, Y. H. Robinson, S. Kadry, H. V. Long, and Y. Nam, "IoT based smart health monitoring
with CNN using edge computing," J. Internet Technol., vol. 22, no. 1, pp. 173-185, 2021, doi:
https://doi.org/10.3966/160792642021012201017.
[222] T. S. Johnson et al., "Diagnostic Evidence GAuge of Single cells (DEGAS): a flexible deep
transfer learning framework for prioritizing cells in relation to disease," Genome Med., vol. 14, no. 1,
pp. 11, 2022, doi: https://doi.org/10.1186/s13073-022-01012-2.
[223] W. Zheng, S. Lu, Z. Cai, R. Wang, L. Wang, and L. Yin, "PAL-BERT: an improved question
answering model," Comput. Model. Engr. Sci., pp. 1-10, 2023, doi:
https://doi.org/10.32604/cmes.2023.046692.
[224] F. Wang et al., "TEDT: transformer-based encoding–decoding translation network for multimodal
sentiment analysis," Cogn. Comput., vol. 15, no. 1, pp. 289-303, 2023, doi:
https://doi.org/10.1007/s12559-022-10073-9.
[225] M. Nafees Muneera and P. Sriramya, "An enhanced optimized abstractive text summarization
traditional approach employing multi-layered attentional stacked LSTM with the attention RNN," in
Comput. Vis. Mach. Intell. Paradigm., 2023: Springer, pp. 303-318, doi: https://doi.org/10.1007/978-
981-19-7169-3_28.
[226] M. A. Uddin, M. S. Uddin Chowdury, M. U. Khandaker, N. Tamam, and A. Sulieman, "The
Efficacy of Deep Learning-Based Mixed Model for Speech Emotion Recognition," Comput. Mater.
Contin., vol. 74, no. 1, 2023, doi: https://doi.org/10.32604/cmc.2023.031177.
[227] M. De Silva and D. Brown, "Multispectral Plant Disease Detection with Vision Transformer–
Convolutional Neural Network Hybrid Approaches," Sens., vol. 23, no. 20, pp. 8531, 2023, doi:
https://doi.org/10.3390/s23208531.
[228] T. Akilan and K. Baalamurugan, "Automated weather forecasting and field monitoring using
GRU-CNN model along with IoT to support precision agriculture," Expert Syst. Appl., vol. 249, pp.
123468, 2024, doi: https://doi.org/10.1016/j.eswa.2024.123468.
[229] R. Benameur, A. Dahane, B. Kechar, and A. E. H. Benyamina, "An Innovative Smart and
Sustainable Low-Cost Irrigation System for Anomaly Detection Using Deep Learning," Sens., vol. 24,
no. 4, pp. 1162, 2024, doi: https://doi.org/10.3390/s24041162.
[230] M. Hosseinpour-Zarnaq, M. Omid, F. Sarmadian, and H. Ghasemi-Mobtaker, "A CNN model for
predicting soil properties using VIS–NIR spectral data," Environ. Earth. Sci., vol. 82, no. 16, pp. 382,
2023, doi: https://doi.org/10.1007/s12665-023-11073-0.
[231] M. Shakeel, K. Itoyama, K. Nishida, and K. Nakadai, "Detecting earthquakes: a novel deep
learning-based approach for effective disaster response," Appl. Intell., vol. 51, no. 11, pp. 8305-8315,
2021, doi: https://doi.org/10.1007/s10489-021-02285-7.
[232] Y. Zhang, Z. Zhou, J. Van Griensven Thé, S. X. Yang, and B. Gharabaghi, "Flood Forecasting
Using Hybrid LSTM and GRU Models with Lag Time Preprocessing," Water, vol. 15, no. 22, pp. 3982,
2023, doi: https://doi.org/10.3390/w15223982.
[233] H. Xu and H. Wu, "Accurate tsunami wave prediction using long short-term memory based neural
networks," Ocean Model., vol. 186, pp. 102259, 2023, doi:
https://doi.org/10.1016/j.ocemod.2023.102259.
[234] J. Yao, B. Zhang, C. Li, D. Hong, and J. Chanussot, "Extended vision transformer (ExViT) for
land use and land cover classification: A multimodal deep learning framework," IEEE Trans. Geosci.
Remote Sens., vol. 61, pp. 1-15, 2023, doi: https://doi.org/10.1109/TGRS.2023.3284671.
[235] A. Y. Cho, S.-e. Park, D.-j. Kim, J. Kim, C. Li, and J. Song, "Burned area mapping using
Unitemporal Planetscope imagery with a deep learning based approach," IEEE J. Sel. Top. Appl. Earth
Obs. Remote Sens., vol. 16, pp. 242-253, 2022, doi: https://doi.org/10.1109/JSTARS.2022.3225070.
[236] M. Alshehri, A. Ouadou, and G. J. Scott, "Deep Transformer-based Network Deforestation
Detection in the Brazilian Amazon Using Sentinel-2 Imagery," IEEE Geosci. Remote Sens. Lett., 2024,
doi: https://doi.org/10.1109/LGRS.2024.3355104.
[237] V. Hnamte and J. Hussain, "DCNNBiLSTM: An efficient hybrid deep learning-based intrusion
detection system," Telemat. Inform. Rep., vol. 10, pp. 100053, 2023, doi:
https://doi.org/10.1016/j.teler.2023.100053.
[238] E. S. Alomari et al., "Malware detection using deep learning and correlation-based feature
selection," Symmetry., vol. 15, no. 1, pp. 123, 2023, doi: https://doi.org/10.3390/sym15010123.
[239] Z. Alshingiti, R. Alaqel, J. Al-Muhtadi, Q. E. U. Haq, K. Saleem, and M. H. Faheem, "A deep
learning-based phishing detection system using CNN, LSTM, and LSTM-CNN," Electron., vol. 12, no.
1, pp. 232, 2023, doi: https://doi.org/10.3390/electronics12010232.
[240] H. Fanai and H. Abbasimehr, "A novel combined approach based on deep Autoencoder and deep
classifiers for credit card fraud detection," Expert Syst. Appl., vol. 217, pp. 119562, 2023, doi:
https://doi.org/10.1016/j.eswa.2023.119562.
[241] R. A. Joshi and N. Sambre, "Personalized CNN Architecture for Advanced Multi-Modal Biometric
Authentication," in 2024 Int. Conf. Invent. Comput. Technol. (ICICT), Lalitpur, Nepal, 24-26 April 2024:
IEEE, pp. 890-894, doi: https://doi.org/10.1109/ICICT60155.2024.10544987.
[242] J. Sohafi-Bonab, M. H. Aghdam, and K. Majidzadeh, "DCARS: Deep context-aware
recommendation system based on session latent context," Appl. Soft Comput., vol. 143, pp. 110416,
2023, doi: https://doi.org/10.1016/j.asoc.2023.110416.
[243] J. Duan, P.-F. Zhang, R. Qiu, and Z. Huang, "Long short-term enhanced memory for sequential
recommendation," World. Wide. Web., vol. 26, no. 2, pp. 561-583, 2023, doi:
https://doi.org/10.1007/s11280-022-01056-9.
[244] P. Mondal, D. Chakder, S. Raj, S. Saha, and N. Onoe, "Graph convolutional neural network for
multimodal movie recommendation," in Proc. 38th ACM/SIGAPP Symp. Appl. Comput., 2023, pp.
1633-1640, doi: https://doi.org/10.1145/3555776.3577853.
[245] Z. Liu, "Prediction Model of E-commerce Users' Purchase Behavior Based on Deep Learning,"
Front. Bus. Econ. Manag., vol. 15, no. 2, pp. 147-149, 2024, doi: https://doi.org/10.54097/p22ags78.
[246] S. Deng, R. Li, Y. Jin, and H. He, "CNN-based feature cross and classifier for loan default
prediction," in 2020 Int. Conf. Image. video. Process. Artif. Intell., 2020, vol. 11584: SPIE, pp. 368-
373.
[247] C. Han and X. Fu, "Challenge and opportunity: deep learning-based stock price prediction by using
Bi-directional LSTM model," Front. Bus. Econ. Manag., vol. 8, no. 2, pp. 51-54, 2023, doi:
https://doi.org/10.54097/fbem.v8i2.6616.
[248] Y. Cao, C. Li, Y. Peng, and H. Ru, "MCS-YOLO: A multiscale object detection method for
autonomous driving road environment recognition," IEEE Access., vol. 11, pp. 22342-22354, 2023, doi:
https://doi.org/10.1109/ACCESS.2023.3252021.
[249] D. K. Jain, X. Zhao, G. González-Almagro, C. Gan, and K. Kotecha, "Multimodal pedestrian
detection using metaheuristics with deep convolutional neural network in crowded scenes," Inf. Fusion.,
vol. 95, pp. 401-414, 2023, doi: https://doi.org/10.1016/j.inffus.2023.02.014.
[250] S. Sindhu and M. Saravanan, "An optimised extreme learning machine (OELM) for simultaneous
localisation and mapping in autonomous vehicles," Int. J. Syst. Syst. Eng., vol. 13, no. 2, pp. 140-159,
2023, doi: https://doi.org/10.1504/IJSSE.2023.131231.
[251] G. Singal, H. Singhal, R. Kushwaha, V. Veeramsetty, T. Badal, and S. Lamba, "RoadWay: lane
detection for autonomous driving vehicles via deep learning," Multimed. Tools Appl., vol. 82, no. 4, pp.
4965-4978, 2023, doi: https://doi.org/10.1007/s11042-022-12171-0.
[252] H. Shang, C. Sun, J. Liu, X. Chen, and R. Yan, "Defect-aware transformer network for intelligent
visual surface defect detection," Adv. Eng. Inform., vol. 55, pp. 101882, 2023, doi:
https://doi.org/10.1016/j.aei.2023.101882.
[253] T. Zonta, C. A. Da Costa, F. A. Zeiser, G. de Oliveira Ramos, R. Kunst, and R. da Rosa Righi, "A
predictive maintenance model for optimizing production schedule using deep neural networks," J.
Manuf. Syst., vol. 62, pp. 450-462, 2022, doi: https://doi.org/10.1016/j.jmsy.2021.12.013.
[254] Z. He, K.-P. Tran, S. Thomassey, X. Zeng, J. Xu, and C. Yi, "A deep reinforcement learning based
multi-criteria decision support system for optimizing textile chemical process," Comput. Ind., vol. 125,
pp. 103373, 2021, doi: https://doi.org/10.1016/j.compind.2020.103373.
[255] M. Pacella and G. Papadia, "Evaluation of deep learning with long short-term memory networks
for time series forecasting in supply chain management," Procedia CIRP, vol. 99, pp. 604-609, 2021, doi:
https://doi.org/10.1016/j.procir.2021.03.081.
[256] P. Shukla, H. Kumar, and G. C. Nandi, "Robotic grasp manipulation using evolutionary computing
and deep reinforcement learning," Intell. Serv. Robot., vol. 14, no. 1, pp. 61-77, 2021, doi:
https://doi.org/10.1007/s11370-020-00342-7.
[257] K. Kamali, I. A. Bonev, and C. Desrosiers, "Real-time motion planning for robotic teleoperation
using dynamic-goal deep reinforcement learning," in 2020 17th Conf. Comput. Robot. Vis. (CRV), 13-
15 May 2020: IEEE, pp. 182-189, doi: https://doi.org/10.1109/CRV50864.2020.00032.
[258] J. Zhang, H. Liu, Q. Chang, L. Wang, and R. X. Gao, "Recurrent neural network for motion
trajectory prediction in human-robot collaborative assembly," CIRP Ann., vol. 69, no. 1, pp. 9-12,
2020, doi: https://doi.org/10.1016/j.cirp.2020.04.077.
[259] B. K. Iwana and S. Uchida, "An empirical survey of data augmentation for time series
classification with neural networks," PLOS ONE, vol. 16, no. 7, pp. e0254841, 2021, doi:
https://doi.org/10.1371/journal.pone.0254841.
[260] C. Khosla and B. S. Saini, "Enhancing performance of deep learning models with different data
augmentation techniques: A survey," in 2020 Int. Conf. Intell. Engr. Mgmt. (ICIEM), London, UK, 17-
19 June 2020: IEEE, pp. 79-85, doi: https://doi.org/10.1109/ICIEM48762.2020.9160048.
[261] M. Paschali, W. Simson, A. G. Roy, R. Göbl, C. Wachinger, and N. Navab, "Manifold exploring
data augmentation with geometric transformations for increased performance and robustness," in Inf.
Process. Medical. Image.: 26th Int. Conf., IPMI 2019, Hong Kong, China, June 2–7 2019: Springer, pp.
517-529.
[262] H. Guo, Y. Mao, and R. Zhang, "Augmenting data with mixup for sentence classification: An
empirical study," arXiv preprint arXiv:1905.08941, 2019.
[263] O. O. Abayomi-Alli, R. Damaševičius, A. Qazi, M. Adedoyin-Olowe, and S. Misra, "Data
augmentation and deep learning methods in sound classification: A systematic review," Electronics, vol.
11, no. 22, pp. 3795, 2022, doi: https://doi.org/10.3390/electronics11223795.
[264] T.-H. Cheung and D.-Y. Yeung, "Modals: Modality-agnostic automated data augmentation in the
latent space," in Int. Conf. Learn. Represent., 2020.
[265] C. Shorten, T. M. Khoshgoftaar, and B. Furht, "Text data augmentation for deep learning," J. Big
Data, vol. 8, no. 1, pp. 101, 2021, doi: https://doi.org/10.1186/s40537-021-00492-0.
[266] F. Wang, H. Wang, H. Wang, G. Li, and G. Situ, "Learning from simulation: An end-to-end deep-
learning approach for computational ghost imaging," Opt. Express, vol. 27, no. 18, pp. 25560-25572,
2019, doi: https://doi.org/10.1364/OE.27.025560.
[267] K. Ghosh, C. Bellinger, R. Corizzo, P. Branco, B. Krawczyk, and N. Japkowicz, "The class
imbalance problem in deep learning," Mach. Learn., vol. 113, no. 7, pp. 4845-4901, 2024, doi:
https://doi.org/10.1007/s10994-022-06268-8.
[268] D. Singh, E. Merdivan, J. Kropf, and A. Holzinger, "Class imbalance in multi-resident activity
recognition: an evaluative study on explainability of deep learning approaches," Univers. Access. Inf.
Soc., pp. 1-19, 2024, doi: https://doi.org/10.1007/s10209-024-01123-0.
[269] A. S. Tarawneh, A. B. Hassanat, G. A. Altarawneh, and A. Almuhaimeed, "Stop oversampling for
class imbalance learning: A review," IEEE ACCESS, vol. 10, pp. 47643-47660, 2022, doi:
https://doi.org/10.1109/ACCESS.2022.3169512.
[270] N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, "SMOTE: synthetic minority
over-sampling technique," J. Artif. Intell. Res., vol. 16, pp. 321-357, 2002, doi:
https://doi.org/10.1613/jair.953.
[271] H. Han, W.-Y. Wang, and B.-H. Mao, "Borderline-SMOTE: a new over-sampling method in
imbalanced data sets learning," in Int. Conf. Intell. Comput., 2005: Springer, pp. 878-887.
[272] H. He, Y. Bai, E. A. Garcia, and S. Li, "ADASYN: Adaptive synthetic sampling approach for
imbalanced learning," in 2008 Int. Jt. Conf. Neural. Netw., 2008: IEEE, pp. 1322-1328.
[273] Y. Tang, Y.-Q. Zhang, N. V. Chawla, and S. Krasser, "SVMs modeling for highly imbalanced
classification," IEEE Trans. Syst. Man. Cybern., Part B (Cybernetics), vol. 39, no. 1, pp. 281-288, 2008,
doi: https://doi.org/10.1109/TSMCB.2008.2002909.
[274] S. Barua, M. M. Islam, X. Yao, and K. Murase, "MWMOTE--majority weighted minority
oversampling technique for imbalanced data set learning," IEEE Trans. Knowl. Data Eng., vol. 26, no.
2, pp. 405-425, 2014, doi: https://doi.org/10.1109/TKDE.2012.232.
[275] C. Bellinger, S. Sharma, N. Japkowicz, and O. R. Zaïane, "Framework for extreme imbalance
classification: SWIM—sampling with the majority class," Knowl. Inf. Syst., vol. 62, pp. 841-866, 2020,
doi: https://doi.org/10.1007/s10115-019-01380-z.
[276] R. Das, S. K. Biswas, D. Devi, and B. Sarma, "An oversampling technique by integrating reverse
nearest neighbor in SMOTE: Reverse-SMOTE," in 2020 Int. Conf. Smart. Electron. Commun.
(ICOSEC), 2020: IEEE, pp. 1239-1244.
[277] C. Liu et al., "Constrained oversampling: An oversampling approach to reduce noise generation
in imbalanced datasets with class overlapping," IEEE ACCESS, vol. 10, pp. 91452-91465, 2020, doi:
https://doi.org/10.1109/ACCESS.2020.3018911.
[278] A. S. Tarawneh, A. B. Hassanat, K. Almohammadi, D. Chetverikov, and C. Bellinger, "SMOTEFUNA:
Synthetic minority over-sampling technique based on furthest neighbour algorithm," IEEE ACCESS,
vol. 8, pp. 59069-59082, 2020, doi: https://doi.org/10.1109/ACCESS.2020.2983003.
[279] X.-Y. Liu, J. Wu, and Z.-H. Zhou, "Exploratory undersampling for class-imbalance learning,"
IEEE Trans. Syst. Man. Cybern., Part B (Cybernetics), vol. 39, no. 2, pp. 539-550, 2008, doi:
https://doi.org/10.1109/TSMCB.2008.2007853.
[280] M. A. Tahir, J. Kittler, and F. Yan, "Inverse random under sampling for class imbalance problem
and its application to multi-label classification," Pattern. Recognit., vol. 45, no. 10, pp. 3738-3750, 2012,
doi: https://doi.org/10.1016/j.patcog.2012.03.014.
[281] V. Babar and R. Ade, "A novel approach for handling imbalanced data in medical diagnosis using
undersampling technique," Commun. Appl. Electron., vol. 5, no. 7, pp. 36-42, 2016.
[282] Z. H. Zhou and X. Y. Liu, "On multi-class cost-sensitive learning," Comput. Intell., vol. 26, no.
3, pp. 232-257, 2010, doi: https://doi.org/10.1111/j.1467-8640.2010.00358.x.
[283] C. X. Ling and V. S. Sheng, "Cost-sensitive learning and the class imbalance problem," in
Encyclopedia of Machine Learning. Boston, MA: Springer, 2011, pp. 231-235.
[284] N. Seliya, A. Abdollah Zadeh, and T. M. Khoshgoftaar, "A literature review on one-class
classification and its potential applications in big data," J. Big Data, vol. 8, pp. 1-31, 2021, doi:
https://doi.org/10.1186/s40537-021-00514-x.
[285] V. S. Spelmen and R. Porkodi, "A review on handling imbalanced data," in 2018 Int. Conf. Curr.
Trends Towards Converg. Technol. (ICCTCT), Coimbatore, India, 1-3 March 2018: IEEE, pp. 1-11, doi:
https://doi.org/10.1109/ICCTCT.2018.8551020.
[286] G. Zhang, C. Wang, B. Xu, and R. Grosse, "Three mechanisms of weight decay regularization,"
arXiv preprint arXiv:1810.12281, 2018.
[287] C. Laurent, G. Pereyra, P. Brakel, Y. Zhang, and Y. Bengio, "Batch normalized recurrent neural
networks," in 2016 IEEE Int. Conf. Acoust. Speech. Signal. Process. (ICASSP), Shanghai, China, 20-25
March 2016: IEEE, pp. 2657-2661, doi: https://doi.org/10.1109/ICASSP.2016.7472159.
[288] S. Ioffe and C. Szegedy, "Batch normalization: Accelerating deep network training by reducing
internal covariate shift," in Int. Conf. Mach. Learn., 2015: PMLR, pp. 448-456.
[289] G. Pereyra, G. Tucker, J. Chorowski, Ł. Kaiser, and G. Hinton, "Regularizing neural networks by
penalizing confident output distributions," arXiv preprint arXiv:1701.06548, 2017.
[290] G. E. Dahl, T. N. Sainath, and G. E. Hinton, "Improving deep neural networks for LVCSR using
rectified linear units and dropout," in IEEE Int. Conf. Acoust. Speech. Signal. Process., 2013: IEEE, pp.
8609-8613.
[291] X. Glorot and Y. Bengio, "Understanding the difficulty of training deep feedforward neural
networks," in Proc. 13 Int. Conf. Artif. Intell. Stats., 2010: JMLR Workshop and Conference
Proceedings, pp. 249-256.
[292] G. Srivastava, S. Vashisth, I. Dhall, and S. Saraswat, "Behavior analysis of a deep feedforward
neural network by varying the weight initialization methods," in Smart Innov. Commun. Comput. Sci.:
Proc. ICSICCS 2020, 2021: Springer, pp. 167-175, doi: https://doi.org/10.1007/978-981-15-5345-5_15.
[293] J. Serra, D. Suris, M. Miron, and A. Karatzoglou, "Overcoming catastrophic forgetting with hard
attention to the task," in Int. Conf. Mach. Learn., 2018: PMLR, pp. 4548-4557.
[294] J. Kirkpatrick et al., "Overcoming catastrophic forgetting in neural networks," Proc. Natl. Acad.
Sci., vol. 114, no. 13, pp. 3521-3526, 2017, doi: https://doi.org/10.1073/pnas.1611835114.
[295] S.-W. Lee, J.-H. Kim, J. Jun, J.-W. Ha, and B.-T. Zhang, "Overcoming catastrophic forgetting by
incremental moment matching," in Proc. 31st Int. Conf. Neural Inf. Process. Syst. (NIPS), Dec. 2017, pp. 4655-4665.
[296] S.-A. Rebuffi, A. Kolesnikov, G. Sperl, and C. H. Lampert, "iCaRL: Incremental classifier and
representation learning," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2017, pp. 2001-2010.
[297] A. D'Amour et al., "Underspecification presents challenges for credibility in modern machine
learning," J. Mach. Learn. Res., vol. 23, no. 226, pp. 1-61, 2022.
[298] D. Teney, M. Peyrard, and E. Abbasnejad, "Predicting is not understanding: Recognizing and
addressing underspecification in machine learning," in Eur. Conf. Comput. Vis. (ECCV), 2022: Springer, pp.
458-476.
[299] N. Chotisarn, W. Pimanmassuriya, and S. Gulyanon, "Deep learning visualization for
underspecification analysis in product design matching model development," IEEE ACCESS, vol. 9, pp.
108049-108061, 2021, doi: https://doi.org/10.1109/ACCESS.2021.3102174.
[300] A. Maas, R. E. Daly, P. T. Pham, D. Huang, A. Y. Ng, and C. Potts, "Learning word vectors for
sentiment analysis," in Proc. 49th Annu. Meeting Assoc. Comput. Linguist.: Hum. Lang. Technol.,
Portland, Oregon, June 19-24, 2011, pp. 142-150.
[301] H. Alemdar, H. Ertan, O. D. Incel, and C. Ersoy, "ARAS human activity datasets in multiple homes
with multiple residents," in 2013 7th Int. Conf. Pervasive Comput. Technol. Healthc. Workshops, 2013:
IEEE, pp. 232-235.
[302] H. Mureşan and M. Oltean, "Fruit recognition from images using deep learning," arXiv preprint
arXiv:1712.00580, 2017.
[303] X. Xiao, M. Yan, S. Basodi, C. Ji, and Y. Pan, "Efficient hyperparameter optimization in deep
learning using a variable length genetic algorithm," arXiv preprint arXiv:2006.12703, 2020.
[304] H. J. Escalante, M. Montes, and L. E. Sucar, "Particle swarm model selection," J. Mach. Learn.
Res., vol. 10, no. 2, pp. 405–440, 2009.
[305] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," arXiv preprint
arXiv:1412.6980, 2014.
[306] L. Bottou, "Stochastic gradient descent tricks," in Neural Networks: Tricks of the Trade: Second
Edition. Berlin, Heidelberg: Springer, 2012, pp. 421-436.
[307] J. Duchi, E. Hazan, and Y. Singer, "Adaptive subgradient methods for online learning and
stochastic optimization," J. Mach. Learn. Res., vol. 12, no. 7, pp. 2121–2159, 2011.
[308] T. Dozat, "Incorporating nesterov momentum into adam," in Proc. 4th Int. Conf. Learn. Represent.
(ICLR) Workshop Track., San Juan, Puerto Rico, 2016, pp. 1-4.
[309] X. Chen et al., "Symbolic discovery of optimization algorithms," in Proc. 37th Int. Conf. Neural
Inf. Process. Syst. (NeurIPS), Dec. 2023, pp. 49205-49233.
[310] L. Alzubaidi et al., "A survey on deep learning tools dealing with data scarcity: definitions,
challenges, solutions, tips, and applications," J. Big Data, vol. 10, no. 1, pp. 46, 2023, doi:
https://doi.org/10.1186/s40537-023-00727-2.
[311] I. Cong, S. Choi, and M. D. Lukin, "Quantum convolutional neural networks," Nat. Phys., vol. 15,
no. 12, pp. 1273-1278, 2019, doi: https://doi.org/10.1038/s41567-019-0648-8.
[312] Y. Takaki, K. Mitarai, M. Negoro, K. Fujii, and M. Kitagawa, "Learning temporal data with a
variational quantum recurrent neural network," Phys. Rev. A, vol. 103, no. 5, pp. 052414, 2021, doi:
https://doi.org/10.1103/PhysRevA.103.052414.
[313] S. Lloyd and C. Weedbrook, "Quantum generative adversarial learning," Phys. Rev. Lett., vol. 121,
no. 4, pp. 040502, 2018, doi: https://doi.org/10.1103/PhysRevLett.121.040502.
[314] S. Garg and G. Ramakrishnan, "Advances in quantum deep learning: An overview," arXiv preprint
arXiv:2005.04316, 2020.
[315] F. Valdez and P. Melin, "A review on quantum computing and deep learning algorithms and their
applications," Soft Comput., vol. 27, no. 18, pp. 13217-13236, 2023, doi:
https://doi.org/10.1007/s00500-022-07037-4.