A Metaheuristics-Based Hyperparameter
Optimization Approach to Beamforming Design
KIEU-XUAN THUC (MEMBER, IEEE), HOANG MANH KHA (MEMBER, IEEE),
NGUYEN VAN CUONG, AND TONG VAN LUYEN (MEMBER, IEEE)
Hanoi University of Industry, Hanoi 100000, Vietnam
ABSTRACT The paradigm shift from “connected things” to “connected intelligence” is anticipated to be
made possible by the sixth-generation wireless systems, which typically use millimeter wave beamforming
to mitigate the significant propagation loss. However, beamforming design in millimeter wave
communications poses many different challenges owing to the large antenna arrays with the limitation of
radio frequency chains and analog beamforming architectures. To circumvent this problem, deep learning
models have recently been utilized as a disruptive method for solving difficult optimization problems in sixth-
generation mobile systems, such as maximizing spectral efficiency. However, it is still unclear how to
produce high-performance deep learning models which require considering appropriate hyperparameters.
This study proposes a metaheuristics-based approach for optimizing hyperparameters that are used to build
optimized deep learning models to maximize spectral efficiency. The results demonstrate that models built with
the proposed approach achieve higher spectral efficiency than both models built with a state-of-the-art approach
and the reference model, whose hyperparameters are based on empirical trials.
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4.0/
This article has been accepted for publication in IEEE Access. This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2023.3277625
optimization algorithms. Deep learning (DL) has recently been utilized as a disruptive method to solve difficult optimization problems in 6G and to support a number of artificial intelligence services and the Internet of Everything applications [4]. It has also been proven to be a useful tool for dealing with difficult non-convex problems and high-computability concerns owing to its excellent recognition and representation capabilities [5]. To enable a paradigm shift from traditional optimization theory-based approaches to more promising DL architectures, DL-based optimization algorithm design aims to achieve near-optimal performance with excellent computing efficiency for challenging large-scale optimization problems in 6G systems [4]. In particular, superior performance, scalability and generalizability, computational efficiency, and robustness are some benefits of using DL for large-scale optimization. The performance of a DL approach, however, depends strongly on its hyperparameters. The values of these parameters must be carefully chosen in order to get the best performance because they typically have a significant impact on the learner's complexity, behavior, speed, and other aspects. Human trial-and-error selection of these values is time-consuming, prone to error, frequently biased, and difficult to reproduce. As the mathematical formulation of hyperparameter optimization (HPO) is basically black-box optimization in higher-dimensional spaces, it is preferable to transfer this task to suitable algorithms in order to improve efficiency and guarantee reproducibility [6]. Over the past 20 to 30 years, numerous HPO strategies have been developed to facilitate and automate the search for hyperparameter combinations with optimal performance. However, more advanced HPO techniques are not utilized as frequently as they could (or should) be. This may be due to a combination of the following reasons [6]: (i) a lack of understanding of HPO techniques by prospective users, who could consider them as complicated "black boxes"; (ii) low confidence among prospective users in the superiority of HPO processes over rudimentary methods, resulting in doubt over the anticipated return on investment (time); (iii) the absence of guidance on the selection and configuration of pertinent HPO approaches to the issue at hand; (iv) difficulty in accurately defining the search space of HPO approaches. The primary objective of HPO is to automate the search for hyperparameters and enable users to utilize optimized DL models for real-world problems. An optimal DL model architecture is expected to be attained using an HPO procedure. To effectively utilize HPO approaches, it is essential to choose an appropriate optimization strategy to identify optimal hyperparameters. Numerous HPO problems are non-convex or non-differentiable optimization problems. Therefore, traditional optimization approaches applied to these HPO problems might lead to a local solution rather than a global solution [7].
Though traditional optimization algorithms can be effective for the local search, metaheuristic algorithms, also known as metaheuristics, have significant advantages for global optimization due to the fact that they typically treat the problem as a black box and are therefore flexible and easy to implement. In addition, these optimizers have no stringent mathematical requirements (e.g., differentiability, smoothness), making them suitable for problems with various features, such as discontinuities and nonlinearity [8]. A metaheuristic is considered a potential solution to optimization problems if it can strike a tradeoff between exploration (diversification) and exploitation (intensification). Exploration is required to find regions of the search space that contain solutions of high quality. Exploitation is necessary to intensify the search in some prospective regions based on gathered search knowledge [9], [10]. Metaheuristics are aimed at obtaining acceptable solutions in a realistic running time and providing practical solutions to a variety of problems [11], [12]. Metaheuristics have also gained appeal over exact methods for addressing optimization issues due to the ease and robustness of the solutions they give in a range of sectors, including engineering, business, transportation, and even the social sciences. The metaheuristic community has also conducted substantial research, which includes the development of novel methods, applications, and performance evaluations [13], [14].
It can be seen that DL and metaheuristics both provide their own distinct advantages, but what is missing from the past studies is a comprehensive approach to utilizing DL in the context of beamforming design. Our study contributes to finding solutions for beamforming design based on the combination of metaheuristics and DL in a manner that facilitates synergy between the two approaches. Specifically, we propose an HPO approach utilizing metaheuristics for designing beamforming in mmWave communication systems. By applying this approach, the obtained hyperparameters can be used to build DL models with high performance. The proposed approach-based model is shown to outperform the state-of-the-art approach-based model [15] and the reference model in [16] with respect to spectral efficiency, convergence characteristics, and computational time.
The structure of this study is as follows. Section II discusses related studies on HPO and DL-based beamforming design for mmWave systems. Section III sheds light on DL-based beamforming design in mmWave communication systems. Section IV formulates the HPO problem based on metaheuristics and introduces an algorithm for optimizing hyperparameters. Results and comparative analysis are shown in Section V, and the discussion is presented in Section VI. Section VII concludes the study.

II. RELATED WORK
Recently, some HPO techniques have been developed with their own merits and demerits. Grid search (GS) is a straightforward approach, but it suffers from the curse of dimensionality and takes a long time [17], [18]. In comparison to GS, random search (RS) is more effective and supports all kinds of hyperparameters. In real-world
applications, RS evaluation of hyperparameter values chosen at random enables analysts to search a wide area. However, as RS does not take the outcomes of earlier tests into account, it may include numerous pointless evaluations, which reduces its effectiveness [7], [18]. The iterative Bayesian optimization (BO) algorithm is a popular solution to HPO problems. In contrast to GS and RS, BO bases the next hyperparameter value on the outcomes of prior evaluations in order to cut down on pointless assessments and increase efficiency. As a result, BO needs fewer iterations to find the ideal set of hyperparameters than GS and RS. However, it is challenging to parallelize BO models since they operate sequentially to balance the search for unexplored areas and the utilization of currently tested regions [7]. Although GS, RS, and BO are frequently used to configure hyperparameters, they are unworkable when the complexity of the problem and the number of parameters are high. Both Hyperband and RS offer simultaneous executions, but Hyperband can be considered an enhanced form of RS. Hyperband is more effective than RS, especially when time and resources are at a premium. It balances model performance with resource utilization. GS, RS, BO, and Hyperband treat each hyperparameter independently and do not take into account hyperparameter correlations. This is a significant limitation for any of these approaches. They will therefore be ineffective for logistic regression, support vector machines, and density-based spatial clustering of applications with noise, which are all machine learning algorithms [7].
In addition, to automate the search for DL designs and settings, researchers have also presented new studies based on metaheuristic optimization techniques. The differential evolution approach was used in the work [17] to give a framework for automating the search for long short-term memory hyperparameters, such as the number of hidden neurons and batch size. The experimental findings demonstrated that the system's average accuracy, which was based on an optimized long short-term memory network using differential evolution and particle swarm optimization algorithms, improved dramatically over time. Besides, the work [19] tuned the hyperparameters of a DL model for a vehicle logo recognition system. The learning rate, the number of filters, and the size of the filters in each convolutional layer were all optimized hyperparameters. They claimed that, compared to existing manual feature extraction techniques, the DL framework optimized by particle swarm optimization achieved higher accuracy. A hyper-heuristic parameter optimization approach was proposed in the work [20] for configuring deep belief network parameters. On the MNIST, CalTech 101 Silhouettes, and Semeion datasets, this approach was contrasted with various metaheuristic algorithms such as particle swarm optimization. In almost all datasets, the hyper-heuristic parameter optimization had the lowest test mean square error.
In the context of mmWave communication systems, DL has found productive applications, namely designing beamforming for weighted sum-rate maximization [22], predicting the optimal transmit/receive beam pairs by using DL models in the role of hybrid precoding [23], using an autoencoder DL model to improve hybrid precoding [24], and leveraging deep reinforcement learning for beamforming [25]. A technique based on convolutional neural networks for joint antenna selection and beamforming is proposed in [26]. Works [16], [27] have shown that, in comparison to conventional approaches, DL approaches are computationally more efficient in their search for optimum beamformers and tolerant of imperfect channel inputs. Based on compressive channel data learned by deep auto-encoders, the work [23] has designed beamformer vectors. BSs that collect the mobile user's omni-beampatterns for codebook-based beamforming have been taken into account for the DL-based wideband beamforming in [28]. Moreover, assuming perfect channel covariance matrix knowledge at the transmitter, DL-based statistical hybrid beamforming is studied in [29]. As evidenced by these works, mmWave multiple-input multiple-output systems can considerably benefit from the application of DL approaches to their essential components. However, hyperparameters in these DL models are all determined experimentally or not based on any principles at all. Therefore, HPO approaches for DL models in mmWave communication problems are imperative.

III. DL-BASED BEAMFORMING
A. SYSTEM MODEL
The downlink of narrowband multiple-input single-output mmWave systems using analog beamforming architectures in Fig. 1 is studied, in which base stations with a single RF chain and $N_t$ antennas transmit one data stream to a user equipped with a single antenna [16]. Let $s$ represent the symbol with normalized average symbol energy throughout transmission. The symbol is multiplied by a scalar digital precoder $D$ ($D$ is a scalar because there is only one RF chain) before being multiplied by an $N_t \times 1$ analog precoder vector $\mathbf{v}_{RF}$ that is implemented by phase shifters. The final signal after precoding is $\mathbf{x} = \mathbf{v}_{RF} D s$.
The received signal through the mmWave channel is denoted as $r = \mathbf{h}_{channel}^{\dagger} \mathbf{v}_{RF} D s + n$, where $n$ is the additive white Gaussian noise following the circularly-symmetric complex normal distribution with zero mean and variance $\sigma^2$, $\mathbf{h}_{channel}^{\dagger}$ is the mmWave channel vector between the base station and the user, and $\dagger$ denotes the Hermitian transpose. With one line-of-sight path and $L - 1$ non-line-of-sight paths, the widely employed Saleh-Valenzuela mmWave channel is expressed as [30]:

$$\mathbf{h}_{channel}^{\dagger} = \sqrt{\frac{N_t}{L}} \sum_{\ell=1}^{L} \alpha_{\ell}\, \mathbf{a}_t^{\dagger}(\phi_t^{\ell}), \tag{1}$$
where $\alpha_{\ell}$ represents the complex gain of the $\ell$th path, $\phi_t^{\ell}$ is the azimuth angle of departure of the $\ell$th path, and $\mathbf{a}_t^{\dagger}(\phi_t^{\ell})$ is the antenna array response vector at the base station. The term with $\ell = 1$ corresponds to the line-of-sight path.
The optimization objective of the problem is the spectral efficiency, which is widely utilized in current beamforming design works. This function is given as [16]:

$$R = \log_2\left(1 + \frac{\gamma}{N_t}\left|\mathbf{h}_{channel}^{\dagger}\mathbf{v}_{RF}\right|^2\right), \tag{2}$$

where $\gamma$ represents the signal-to-noise ratio (SNR). The beamformer aims to generate the optimized analog beamforming vectors $\mathbf{v}_{RF}$ so that the spectral efficiency is maximized. Then, the beamforming optimization problem with the constant modulus constraint on $\mathbf{v}_{RF}$ can be given by [16]:

$$\begin{aligned} \underset{\mathbf{v}_{RF}}{\text{minimize}} \quad & -\log_2\left(1 + \frac{\gamma}{N_t}\left|\mathbf{h}_{channel}^{\dagger}\mathbf{v}_{RF}\right|^2\right) \\ \text{subject to} \quad & \left|[\mathbf{v}_{RF}]_{n_t}\right|^2 = 1, \quad \text{for } n_t = 1, \ldots, N_t. \end{aligned} \tag{3}$$

As the SNR is often regarded as being more accurately measured than the channel, the SNR $\gamma$ and the estimated SNR $\gamma_{est}$ are assumed to be equal, i.e., $\gamma_{est} = \gamma$.

B. DL-BASED BEAMFORMING DESIGN
In this study, we take the DL-based beamformer designed in [16] as the reference one to verify our proposed approach. This beamformer, whose two stages are illustrated in Fig. 2, directly outputs $\mathbf{v}_{RF}$ to solve (3). During the offline training stage, random channel samples are generated via simulation on the system model. The base station then applies a practical channel estimator to obtain partial channel state information. The mmWave channel estimator in [30] is adopted, where the mmWave channels are estimated by sending pilot symbols in a hierarchical codebook and then receiving the user's decision feedback based on the received signal $r_p$. The estimated channel $\mathbf{h}_{channel\_est}^{\dagger}$ and the estimated SNR $\gamma_{est}$ are inputs for the DL-based beamformer with $\gamma_{est} = \gamma$. By minimizing a loss function, the beamformer can then generate optimized beamforming vectors $\mathbf{v}_{RF}$. As the SNR values and channel samples are produced randomly by the simulation (called generated channels in this study), they can be used directly in the loss computation. By utilizing the estimated channels as the input and the generated channels in the loss function, the beamformer can be trained to get as close as possible to the ideal spectral efficiency with the estimated channels and to become robust to channel estimation errors. During the online deployment phase, the base station uses the same mmWave channel estimator. The estimated channels are then fed to the trained beamformer to obtain optimized beamforming vectors for maximizing spectral efficiency. It is important to note that generated channels are only necessary during the offline training stage to compute the loss. This is because all the parameters of the trained beamformer have already been fixed, and the trained beamformer is ready to accept practical mmWave channels as inputs to directly output beamforming vectors. Multiple offline training samples are necessary to ensure the generalizability of DL models, so 1e5 samples to train and 5e3 samples to test are used in this study.

FIGURE 2. The illustration of offline and online stages for DL-based beamformer [16].

The architecture of the reference model for the beamformer in [16] in the offline stage consists of six main layers, which are shown in Fig. 3. The inputs are the generated channels $\mathbf{h}_{channel}$, the estimated SNRs $\gamma_{est}$ (random in the training stage), and the estimated channels $\mathbf{h}_{channel\_est}$, where the complex-valued $\mathbf{h}_{channel\_est}$ with the size of $N_t = 64$ is separated into real and imaginary parts, and then these parts are concatenated into a vector with the size of 128.

FIGURE 3. The architecture of the reference DL model. (Block labels: FC Layer 1 (256), FC Layer 2 (128), FC Layer 3 (64), ReLU, Loss.)
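As a rough illustration of (1) to (3), the following Python sketch draws one channel, evaluates the spectral efficiency for a constant-modulus beamformer, and stacks a complex channel into the real-valued input of size 128. It is a minimal sketch under stated assumptions, not the reference implementation of [16]: the half-wavelength uniform linear array, the CN(0, 1) path gains, the uniform departure angles, and every function name are assumptions introduced here. The matched-phase beamformer is only a perfect-channel baseline (by the triangle inequality, no unit-modulus vector yields a larger $|\mathbf{h}_{channel}^{\dagger}\mathbf{v}_{RF}|$); it is not the DL-based beamformer.

```python
import cmath
import math
import random


def array_response(n_t, phi):
    """Unit-norm response of a half-wavelength uniform linear array (assumed geometry)."""
    return [cmath.exp(1j * math.pi * n * math.sin(phi)) / math.sqrt(n_t)
            for n in range(n_t)]


def generate_channel(n_t, n_paths, rng):
    """Return the row vector h^dagger of (1) as a list of n_t complex numbers."""
    h_dagger = [0j] * n_t
    for _ in range(n_paths):
        # Assumed statistics: CN(0, 1) gains and uniform departure angles.
        alpha = complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) / math.sqrt(2.0)
        phi = rng.uniform(-math.pi / 2, math.pi / 2)
        a = array_response(n_t, phi)
        h_dagger = [h + alpha * ak.conjugate() for h, ak in zip(h_dagger, a)]
    scale = math.sqrt(n_t / n_paths)
    return [scale * h for h in h_dagger]


def spectral_efficiency(h_dagger, v_rf, snr_linear):
    """Evaluate (2): log2(1 + snr / N_t * |h^dagger v_RF|^2)."""
    inner = sum(h * v for h, v in zip(h_dagger, v_rf))
    return math.log2(1.0 + snr_linear / len(v_rf) * abs(inner) ** 2)


def matched_phase_beamformer(h_dagger):
    """Constant-modulus vector whose phases cancel those of h^dagger."""
    return [cmath.exp(-1j * cmath.phase(h)) for h in h_dagger]


def to_real_input(h_complex):
    """Stack real parts, then imaginary parts (64 complex values -> 128 reals)."""
    return [h.real for h in h_complex] + [h.imag for h in h_complex]


rng = random.Random(0)
h_dagger = generate_channel(n_t=64, n_paths=3, rng=rng)
v_rf = matched_phase_beamformer(h_dagger)
snr = 10 ** (5 / 10)  # 5 dB converted to linear scale
print(round(spectral_efficiency(h_dagger, v_rf, snr), 3))
print(len(to_real_input(h_dagger)))
```

The negative of `spectral_efficiency` is the quantity minimized in (3); in the DL setting the network output would take the place of `matched_phase_beamformer`, with the estimated channel as input and the generated channel inside the loss.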
Algorithm 1 The proposed algorithm for HPO.
1: Determine: datasets; the search space $\mathbf{H}$; performance metrics; fitness function $Fitness$; number of populations ($numPop$); number of iterations; and dimension of solutions.
2: Initialize populations and then obtain $\mathbf{h}$ from solutions in the initialized populations; train and test DL models with $\mathbf{h}$; and find the current best solution (the current $\mathbf{h}^*$).
3: Repeat
4:   Adjust frequency and update velocities; compute transfer function; and then update positions.
5:   if rand > pulse rate then

where the rounding operator and int( ) denote rounding to the nearest number and converting to integer numbers, respectively. Next, the hyperparameter vector $\mathbf{h}$ can be obtained by mapping $\mathbf{k}$ into $\mathbf{H}$. At this point, DL models can be built with $\mathbf{h}$, then trained and tested to find the current best hyperparameter vector based on performance metrics.
Finding the best hyperparameters: The search operation of BBA is implemented. For the $p$-th bat with $p = 1, 2, \ldots, numPop$, the frequency $Q_p$ and the velocity $V_p^{iter}$ at the $iter$-th iteration are updated as follows:

$$Q_p = Q_{min} + (Q_{max} - Q_{min})\, rand, \tag{9}$$
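The mapping from a binary solution to a hyperparameter vector and the frequency update in (9) can be sketched in a few lines of Python. This is an illustrative sketch only: the encoding of two bits per hyperparameter (most significant bit first), the helper names, and the use of the activation row once per FC layer are assumptions chosen so that six hyperparameters with four choices each give a 12-dimensional solution; the search-space values are those listed in (13), and the frequency bounds 0 and 2 are the values reported in the parameter setup.

```python
import math
import random

# Rows of the search space as read from (13). Using the activation row twice,
# once per FC layer, is an assumption (6 hyperparameters x 2 bits = 12 bits).
SEARCH_SPACE = [
    [128, 192, 256, 320],                     # neurons, FC layer 1
    [64, 96, 128, 192],                       # neurons, FC layer 2
    ["ELU", "ReLU", "Sigmoid", "Tanh"],       # activation after FC layer 1
    ["ELU", "ReLU", "Sigmoid", "Tanh"],       # activation after FC layer 2
    ["AdaMax", "Adam", "RMSprop", "Nadam"],   # optimizer
    [1e-4, 5e-4, 1e-3, 5e-3],                 # initial learning rate
]


def bits_per_choice(n_choices):
    """Number of bits needed to index n_choices options."""
    return max(1, math.ceil(math.log2(n_choices)))


def decode_solution(bits, search_space):
    """Map a binary solution to one value per hyperparameter (hypothetical encoding)."""
    widths = [bits_per_choice(len(row)) for row in search_space]
    if len(bits) != sum(widths):
        raise ValueError("solution length does not match the search space")
    values = []
    pos = 0
    for row, width in zip(search_space, widths):
        chunk = bits[pos:pos + width]
        index = int("".join(str(b) for b in chunk), 2)  # MSB first
        values.append(row[min(index, len(row) - 1)])    # clamp if bits overshoot
        pos += width
    return values


def update_frequency(q_min, q_max, rng):
    """Frequency update of (9): Q_p = Q_min + (Q_max - Q_min) * rand."""
    return q_min + (q_max - q_min) * rng.random()


rng = random.Random(0)
solution = [rng.randint(0, 1) for _ in range(12)]
print(decode_solution(solution, SEARCH_SPACE))
print(update_frequency(0.0, 2.0, rng))
```

Each decoded vector would then be used to build, train, and test one DL model, and its fitness would decide whether it replaces the current best solution.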
convergence ability is demonstrated. Finally, the proposed approach-based model is compared to the reference model and the Hyperband approach-based model in terms of maximizing spectral efficiency. In all figures, Reference, Hyperband, and Proposed refer to the reference model, the Hyperband approach-based model, and the proposed approach-based model, respectively.

A. PARAMETER SETUP
This study focuses on verifying the proposed approach, so we use the same datasets as used for the reference model. Datasets, source code, and trained weights for the reference model are publicly provided by the authors in [16]. The total number of paths ($L$) is 3, and channel samples are estimated with a pilot-to-noise power ratio of 20 dB.
BBA is a metaheuristic, and the maximum number of iterations and the population size are two factors that have a close relationship with a metaheuristic's performance [8]. Based on experiments, we have determined that the population size and the maximum number of iterations should be 20 and 15, respectively, for this problem. The algorithm terminates when all iterations have been completed. Other parameters are set as suggested by [31]: pulse rate = 0.5; $Q_{min} = 0$; $Q_{max} = 2$. The illustrated results are the average value of 20 independent runs.
This study verifies the proposed approach by optimizing hyperparameters of DL models that have six main layers, the same as the model of the reference beamformer. The search space $\mathbf{H}$, which is expressed in (13), includes the number of neurons in the first two FC layers (corresponding to the first two rows), the activation functions after the first two FC layers (the third and fourth rows), optimizers (the fifth row), and the initial learning rate (the last row). The order of hyperparameters in the search space is not required to be in the order of each layer in the reference DL model. These hyperparameters were determined by empirical trials in [16], so they are optimized by our proposed approach to approach the ideal spectral efficiency. Assuming that each hyperparameter has 4 choices, the dimension of one solution $d$ is 12, calculated by (7).

$$\mathbf{H} = \begin{bmatrix} 128 & 192 & 256 & 320 \\ 64 & 96 & 128 & 192 \\ \text{ELU} & \text{ReLU} & \text{Sigmoid} & \text{Tanh} \\ \text{ELU} & \text{ReLU} & \text{Sigmoid} & \text{Tanh} \\ \text{AdaMax} & \text{Adam} & \text{RMSprop} & \text{Nadam} \\ 10^{-4} & 5 \times 10^{-4} & 10^{-3} & 5 \times 10^{-3} \end{bmatrix}. \tag{13}$$

The network complexity of DL models increases proportionally with the number of neurons, so the search space for the number of neurons in the first two FC layers is set to values in a range that includes the number of neurons set in the reference model. Exponential Linear Unit (ELU), Sigmoid, Rectified Linear Unit (ReLU), and Tanh are the most prevalent and widespread non-linearity layers and are proven to be effective solutions to non-zero mean and zero gradient problems, as well as the accuracy versus training time tradeoff [32]. AdaMax, Adam, RMSprop, and Nadam are the most efficient and widely used optimization algorithms in DL [33]-[35]. A large learning rate helps the model to learn quicker at the expense of arriving at a suboptimal final set of weights. A smaller learning rate may enable the model to acquire a more optimum or even globally optimal set of weights, but it may require much more time to train [36]. The learning rate range taken into consideration is from $10^{-4}$ to $5 \times 10^{-3}$, including the learning rate which is set in the reference model.
The fitness function for the proposed algorithm is built based on the spectral efficiency function in (2) as follows:

$$Fitness = \frac{1}{\sum_{snr=-20}^{20} R_{snr}}. \tag{14}$$

The spectral efficiency is evaluated on test datasets with SNRs from −20 dB to 20 dB with a step of 5 dB. Note that the spectral efficiency increases as the fitness function decreases.

B. CONVERGENCE CHARACTERISTICS
In this subsection, the convergence ability and the training loss produced by DL models on test datasets are evaluated. The values of the fitness function in Fig. 5 indicate that the proposed approach nearly converges after the 6th iteration with the value of −16.728 dB and decreases only insignificantly from the 7th iteration onwards. This means that at the 6th iteration, the proposed approach can find the optimized hyperparameters that are listed in Table 1. Fig. 6 compares the training loss between the reference model, the proposed approach-based model, and the Hyperband approach-based model, where the optimized hyperparameters of these DL models are in Table 1. Both HPO approach-based models achieve lower loss values and converge faster than the reference model even though both have more trainable parameters. However, the proposed approach-based model achieves the lowest loss of −5.302, while the Hyperband approach-based model achieves −5.252 and the reference model achieves −5.136.

FIGURE 5. The fitness function over 15 iterations.

TABLE 1. Hyperparameters for three DL models.

FIGURE 6. The training loss versus epochs.

C. SPECTRAL EFFICIENCY CHARACTERISTICS
This subsection compares the achievable spectral efficiency between the reference model, the Hyperband approach-based model, and the proposed approach-based model. The spectral efficiency versus SNR performance in Fig. 7 shows that the proposed approach-based model produces higher spectral efficiency than the reference model. To obtain 9.72 bits/s/Hz, for example, the optimized model achieves a gain of around 1 dB in SNR over the reference model. Besides, the proposed approach-based model is also slightly better than the Hyperband approach-based model in terms of spectral efficiency.

FIGURE 7. The spectral efficiency versus SNR.

There are estimation errors in estimating $L$ in practical systems. Owing to the estimation complexity and the sparsity of mmWave channels, the estimated number of channel paths should be set to a small value [30]. Moreover, $L$ in practice often differs from that in training, so the consideration of the mismatch between training and deployment plays an important role. Assume that the online deployment stage's channel model has three paths ($L = 3$), but the DL-based models are trained with $L_{Tr}$ paths. The impact of the channel model's mismatch between training and deployment stages is depicted in Fig. 8. This figure demonstrates the achievable spectral efficiency with the output of the DL-based models which have been trained with $L_{Tr} = 2, 3$, respectively. Even though there is a model mismatch when $L_{Tr} = 2$, the losses between the training and deployment stages are limited, which indicates the robustness and generalizability of DL-based models to the model mismatch issue. In these models, the proposed approach-based model produces higher spectral efficiency than the reference model by about 0.041 to 0.304 bits/s/Hz.

FIGURE 9. The distribution of the spectral efficiency.
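Returning to the fitness function in (14) and the decibel values quoted for Fig. 5, a short Python sketch can make the relation between fitness and summed spectral efficiency concrete. It is a sketch under assumptions, not the authors' code: the function names are introduced here, the sample spectral efficiencies are hypothetical, and reading the quoted fitness as $10\log_{10}(Fitness)$ is an assumption about how the decibel value was obtained.

```python
import math

# SNR grid used for the evaluation: -20, -15, ..., 20 dB (nine points).
SNR_GRID_DB = list(range(-20, 21, 5))


def fitness(se_per_snr):
    """Fitness of (14): reciprocal of the spectral efficiency summed over the SNR grid."""
    total = sum(se_per_snr)
    if total <= 0:
        raise ValueError("summed spectral efficiency must be positive")
    return 1.0 / total


def to_db(value):
    """Convert a positive ratio to decibels (assumed 10*log10 convention)."""
    return 10.0 * math.log10(value)


def total_se_from_fitness_db(fitness_db):
    """Invert the assumed dB conversion to recover the summed spectral efficiency."""
    return 1.0 / (10.0 ** (fitness_db / 10.0))


# Hypothetical spectral efficiencies in bits/s/Hz, one per SNR point.
example_se = [0.5, 1.2, 2.3, 3.6, 5.0, 6.5, 8.1, 9.7, 11.3]
f = fitness(example_se)
print(f, to_db(f))
print(total_se_from_fitness_db(-16.728))
```

Because the fitness is a reciprocal, a model with uniformly higher spectral efficiency always has a lower fitness, which is why the search minimizes it.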
TABLE 2. The median, the first and the third quartiles produced by three DL models.

Parameter            Reference model   Hyperband-based model   Proposed-based model
Median               7.340             7.593                   7.616
The first quartile   4.826             5.442                   5.538
The third quartile   7.596             7.661                   7.663

Moreover, for SNR = 5 dB, the proposed approach-based model is better than both the Hyperband approach-based model and the reference model with respect to spectral efficiency. With $q = 3$, for example, the proposed approach-based model achieves 6.412 bits/s/Hz, while the Hyperband approach-based model and the reference model only achieve 6.365 and 6.104 bits/s/Hz, respectively.
Once the DL model is trained in the offline stage, this model will be adopted to output beamforming vectors. Therefore, the computational time for yielding these vectors should be carefully considered in the online stage. Table 3 shows the average computational time in milliseconds to output beamforming vectors on 5000 test samples over 1000 independent runs on computers equipped with an NVIDIA T4 Tensor Core GPU. The proposed approach-based model not only takes less time than the other two models but also achieves higher spectral efficiency.
VII. CONCLUSION
This study has proposed an HPO approach based on
metaheuristics for DL models. The proposed approach was
applied to optimizing hyperparameters in DL models that
aim to output optimized beamforming coefficients to
approach the ideal spectral efficiency in mmWave
communication systems with large-scale antenna arrays.
Results have shown the ability to optimize hyperparameters
and provided an insightful solution to forthcoming HPO
problems. Comparative analysis has also indicated that the
proposed approach-based models can produce higher
spectral efficiency than the Hyperband approach-based
models and the reference model. As for future work, it would
be interesting to apply the proposed approach to more
complex DL models and beamforming problems using
hybrid beamforming architectures for reconfigurable
intelligent surfaces, and integrated sensing and
communication in 6G wireless communication systems.
HOANG MANH KHA (Member, IEEE) received the B.E. and M.E. degrees in
Electronics and Telecommunications Engineering, both from Hanoi
University of Science and Technology, in 2002 and 2004, respectively.
He obtained his Ph.D. degree in Communications Engineering from the
University of Paderborn, Germany, in 2016. His research interests
include digital signal processing, wireless communication, positioning
engineering, deep learning, pattern classification, and metaheuristics.

TONG VAN LUYEN (Member, IEEE) received the B.S. and M.S. degrees from
the Hanoi University of Science and Technology, in 2002 and 2004,
respectively, and the Ph.D. degree from VNU University of Engineering
and Technology in 2019. His research interests are in beamforming and
beam-steering for antenna arrays, smart antennas, optimum array
processing, nature-inspired optimization algorithms, and artificial
intelligence.