
Expert Systems With Applications 178 (2021) 114931

Contents lists available at ScienceDirect

Expert Systems With Applications


journal homepage: www.elsevier.com/locate/eswa

Automatic modulation classification using different neural network and PCA combinations

Ahmed K. Ali a,b,*, Ergun Erçelebi a

a Department of Electric and Electronic Engineering, Gaziantep University, Gaziantep 27310, Turkey
b Al-Mustansiriyah University, Baghdad, Iraq

A R T I C L E  I N F O

Keywords:
Modulation classification
Neural networks
Statistical features
RBF-PCA combination
DVB-S2 APSK

A B S T R A C T

This paper highlights one of the most promising research directions for automatic modulation recognition algorithms, although it does not provide a final solution. We study the design of a high-precision classifier for recognizing PSK, QAM and DVB-S2 APSK modulation signals. First, an efficient pattern recognition model that includes three main modules for feature extraction, feature optimization and classification is presented. The feature extraction module extracts the most useful combinations of up to six high-order cumulants that embed sixth-order moments and uses logarithmic function properties to improve the distribution curve of the sixth-order cumulants. To the best of our knowledge, this is the first time that these combinations and the improved feature criteria have been applied in this area. The optimizer module selects optimal features via principal component analysis (PCA). Then, in the classifier module, we study two important supervised neural network classifiers (i.e., multilayer perceptron (MLP)- and radial basis function (RBF)-based classifiers). Through an experiment, we determine the best classifier for recognizing the considered modulations. Then, we propose an RBF-PCA combined recognition system in which an optimization module is added to enhance the overall classifier performance. This module optimizes the classifier performance by searching for the best subset of features to use as the classifier input. The simulation results illustrate that the RBF-PCA classifier combination achieves high recognition accuracy even at a low signal-to-noise ratio (SNR) and with limited training samples.

1. Introduction

The recent explosion in communications systems due to rapid growth in radio communications technology is considered a critical aspect in the development of a new generation of radio systems capable of recognizing the modulation type of unknown communications signals. The ability to deliver sophisticated information services and systems to military applications even in congested electromagnetic spectra is a high-concern communications issue for engineers. Friendly signals must be safely transmitted and received, while enemy signals should be identified, analyzed and jammed (Dobre, Abdi, Bar-Ness, & Su, 2007). In the last seven years, tremendous developments have been made in the field of signal classification algorithms, especially in terms of automatic modulation classification (AMC). AMC has become a hot and well-established research area. The underlying concept of AMC is based on automatically identifying the modulation type of a transmitted signal. Such identification plays an important role in new intelligent communication systems, such as cognitive radio and radio spectrum monitoring (Weber, Peter, & Felhauer, 2015). Furthermore, blind recognition of the modulation type of the received signal is a significant problem in commercial radio systems, specifically in software-defined radio (SDR), which must cooperate with an assortment of other communications systems. Generally, supplementary information is transmitted to allow the reconstruction of SDR signals. Blind techniques can be utilized with an intelligent receiver, which leads to enhanced transmission efficiency by reducing the overhead. Such applications require flexible intelligent communication systems in which automatically recognizing the modulation of received signals is a major issue.

A simplified block diagram of the AMC system model is shown in Fig. 1. Modulation classifier design typically includes two procedures: appropriate selection of a signal type, and a preprocessing and classification algorithm. The preprocessing operations may include (but are not confined to) noise reduction, carrier frequency estimation and symbol period determination, while determining the signal power, equalization, etc., relies on the classification algorithm selected for the next step. Various levels of accuracy are required in the preprocessing tasks; some

* Corresponding author at: Department of Electric and Electronic Engineering, Gaziantep University, Gaziantep 27310, Turkey.
E-mail addresses: [email protected] (A.K. Ali), [email protected] (E. Erçelebi).

https://doi.org/10.1016/j.eswa.2021.114931
Received 22 May 2019; Received in revised form 27 February 2021; Accepted 21 March 2021
Available online 30 March 2021
0957-4174/© 2021 Elsevier Ltd. All rights reserved.

classification methods require specific estimates, while others are less sensitive to unknown parameters.

In the past few years, the increased demand for high data rates in communications systems has resulted in extensive use of high-order modulation schemes, such as 64QAM, 256QAM, 32APSK and 64APSK. Classifying the MPSK, MAPSK and MQAM high-order modulation schemes is more challenging than classifying the 16QAM, 32QAM, 16APSK, BPSK, QPSK, and 8PSK schemes due to the smaller distance between the constellation points. Moreover, a larger input dataset means that a larger number of features must be considered by the neural network classifier (NNC)-based algorithm, which involves increased computation complexity and time. To overcome this challenge, we believe it is imperative to use principal component analysis (PCA) combined with an NNC to obtain fewer row elements in both the radial basis function (RBF) and multilayer perceptron (MLP) neural network structures and thus achieve suitable deep learning results. This task represents a critical direction for future classification systems. Deep learning is a branch of machine learning that takes large amounts of data as training input and then processes the input data through multilayered distributed networks with hidden nodes in each layer. Finally, an output classifier layer provides the predicted target classes (A. Ali & Yangyu, 2017a). The remainder of this paper is organized as follows. Section 2 discusses the main issue. Section 3 presents the feature extraction method based on the PCA optimizer. Section 4 explains the MLP and RBFN structures based on PCA optimization. Section 5 presents the simulation results. Finally, Section 6 concludes the paper.

2. Background

The last four years have witnessed tremendous development in satellite communication systems, which has encouraged communication engineers to develop new generations of digital modulation schemes. Recently, the DVB-S2 standard proposed a new set of modulation techniques named MAPSK. The DVB-S2 standard includes four modulation types: QPSK and 8PSK for nonlinearized transponders, and 16APSK and 32APSK for linearized transponders (Puengnim, Thomas, Tourneret, & Vidal, 2010). In satellite communications, APSK is a more attractive modulation scheme than QPSK or QAM (ETSI, 2009).

Previous studies in this field have identified two main categories of AMC: likelihood-based (LB) and feature-based (FB) (Dobre et al., 2007). LB AMC is a multiple compound-hypothesis testing method that calculates the likelihood ratios of a selected received signal and known signals. Then, it makes a decision by comparing this ratio with a threshold. This generally gives an optimal decision in a Bayesian sense but involves considerable complexity. Further details regarding LB AMC are documented in (Chugg, Long, & Polydoros, 1996) and (Fontes, De M. Martins, Silveira, & Principe, 2015).

An LB classifier can provide an upper bound on the classification accuracy rate under a given status for accurate channel estimation. According to (Wei & Mendel, 2000), a maximum LB method provides optimal accuracy for estimating the correct channel type. Additional literature regarding LB classifiers includes (Tang, Pattipati, & Kleinman, 1991), (Puengnim, Thomas, Tourneret, & Vidal, 2010) and (Häring & Chen, 2010); these approaches can adjust to various modulation formats and channel circumstances. However, their computational complexity becomes the main concern, which has resulted in a number of sophisticated second-best measures designed to reduce complexity. In (Dennis Wong & Nandi, 2008), the authors used the minimum distance (MD) classifier to reduce the intricacy. From this viewpoint, in (Xu, Su, & Zhou, 2010), Xu, Su and Zhou addressed the problem of reducing intricacy by storing precomputed values in a quantization database to obviate complicated operations. All these methods succeeded in limiting the complexity to various degrees, but at the cost of some performance degradation. The FB method has been extensively used in practical applications. It comprises two main subsystem modules: 1) feature extraction, which extracts parameters from a signal, and 2) classification, where a prediction is made. Several features have been adopted in previous conventional AMC methods. For example, in (Fehske, Gaeddert, & Reed, 2005), different slices were taken from the spectral coherence function as features for AMC. The most commonly used features are statistical high-order cumulants, as documented in (An, Li, & Huang, 2010), (Ebrahimzadeh & Ghazalian, 2011), (Zhu & Nandi, 2015), and (Abdelmutalab, Assaleh, & El-Tarhuni, 2016b), and also high-order moments (HOMs), which process received signals differently, as addressed in (Swami & Sadler, 2000) and (Zhu, Waqar, & Nandi, 2014). Moreover, features are extracted from the time–frequency distribution, including instantaneous amplitude, phase, and frequency (Nandi & Azzouz, 1998), as well as transform features such as wavelets (Lei Zhou & Man, 2013) and fast Fourier transforms (Muller, Cardoso, & Klautau, 2011). In (Abdelmutalab, Assaleh, & El-Tarhuni, 2016a), cumulants up to the sixth order were utilized for classifying MPSK and MQAM modulated signals. Additionally, the fourth-order cumulant was used as a feature vector for classifying digital modulation types in (Boiteau & Martret, 1996) and (Norouzi, Jamshidi, & Zolghadrasli, 2016). In (Sarieddeen & Dawy, 2016), the authors utilized high-order cyclic cumulants (CCs) extracted from received signals as the features for modulation classification. In recent years, researchers have used multicumulant-based classification to achieve better classification accuracy for multiple-input multiple-output (MIMO) communications (Afan Ali & Yangyu, 2017). Numerous cumulants of various orders are joined to form multicumulant vectors, which are then used to classify the received signals from multiple sender sites. The authors of these studies showed that utilizing a multicumulant-based feature vector yields a performance gain compared with a single-cumulant-based vector. A new wavelet cyclic feature (WCF) was presented in (Lei Zhou & Man, 2013) to classify BPSK, QPSK, MSK and 2FSK. In addition, the continuous wavelet transform (CWT) was utilized in (Ho, Vaz, & Daut, 2010) for template matching; this method achieved

Fig. 1. Block diagram of the AMC system model.


better classification accuracy at a signal-to-noise ratio (SNR) below 5 dB.

PCA is a helpful algorithm in statistical signal processing because it can reduce the dimensionality of datasets for purposes such as compression, pattern recognition and data interpretation (Yuan, Zhao, & Wang, 2016). Similarly, PCA has been adopted to optimize the original features extracted from signals (Ma & Qiu, 2018). Analytically, n correlated random variables are transformed into a set of d ≤ n uncorrelated variables. These uncorrelated variables are linear combinations of the original variables and can be utilized to compress the datasets. PCA is capable of reducing the feature vector dimension in high-dimensional data and restraining the influence of noise, which further helps to effectively solve signal classification and recognition problems involving high-dimensional features (Calvo, Partridge, & Jabri, 1998). The PCA method provides an efficient way to create an alternative, smaller set of variables, allowing the raw data to be represented by a smaller variable set. Thus, in this paper, we adopt the PCA method to extract small-dimensional features from a set of high-order cumulants and logarithmic cumulants that are improved by the characteristics of the natural logarithm (A. Ali & Erçelebi, 2019). These extracted features form an eight-dimensional vector. The goal is to retain most of the raw data information in the form of these major components but with reduced dimensions. In addition, by properly selecting the main components, PCA can eliminate interference caused by additive white Gaussian noise (AWGN) channels and environmental noise. Moreover, it can not only extract small-dimensional eigenvectors but also support signal and noise separation (Soares-Filho et al., n.d.). In (H. Zhang, Bi, Razul, & See, 2014), the discriminative power of ambiguity function (AF) images of various modulation signals led to the proposal of a novel algorithm. In this algorithm, a low-dimensional invariant-moment key feature vector of the AF image, obtained with PCA, was combined with a support vector machine (SVM) classifier. Despite accomplishing the desired results at low SNR, this method suffers from computational complexity. Some features have commonly been suggested in the literature for supervised classification. However, some unsupervised clustering algorithms have also been suggested to investigate AMC issues. Generally, the signal classification subsystem of the FB method recognizes the correct target classes through the distinctive input features extracted from the signal. Different approaches have been utilized for the classification subsystems in AMC. According to (Ho et al., 2010), (Hazza, Shoaib, Alshebeili, & Fahad, 2013) and (W. Zhang, 2014), hierarchical tree, apportionment test and machine learning-based classifiers are often utilized as classification decision-making subsystems. Among these, machine learning has been a well-known option for many years due to its efficient performance. In (Ali & Yangyu, 2017b), (W. Zhang, 2014) and (Sengur, 2009), the researchers used an SVM as a classifier to distinguish various modulation types. The K-nearest neighbor (KNN) imputation method was used to address lost data in radar signal classification in (Jordanov, Petrov, & Petrozziello, 2016) and for classification of analogue signals in (Guldemir & Sengur, 2006). Additionally, the use of an artificial neural network (ANN) was suggested in (Mobasseri, 2000) and (Kharbech, Dayoub, Zwingelstein-Colin, & Simon, 2016) to achieve modulated signal classification. More recently, deep neural networks (DNNs) such as sparse autoencoders (Alain & Bengio, 2014) and convolutional neural networks have been applied to classify datasets (Ting, Tan, & Sim, 2019). From these previously published studies, it is evident that when designing a system to automatically recognize digital modulation signal types, there exist some substantial issues that, when properly considered, can lead to the development of a more robust and efficient classifier. One of these problems is related to selecting the classification method to be adopted.

A literature review revealed that although appropriate classifier selection has great potential, the application of different supervised classifiers has not received sufficient attention, especially in the modulation classification field. Additionally, despite the large number of works regarding the performance of NNCs in modulation recognition, such as (Ahmadi, 2010), (Hossen, Al-Wadahi, & Jervase, 2007), (El-Khamy, Elsayed, & Rizk, 2012) and (Cheng & Liu, 2013), few works have addressed the concept of combining PCA with an NNC. Therefore, in this paper, we explore the performance of a) an RBF neural network (RBFN) (Mashor, 1999) and b) an MLP neural network (Ye, Cao, Wang, & Li, 2018). We combine these neural network structures with PCA to form classifiers. Selecting an appropriate feature set remains a major concern. In this paper, we use PCA to extract small-dimensional features from a suitable set of higher-order cumulants (HOCs; up to the sixth order) along with higher-order logarithmic cumulants, which are improved by the natural logarithm function (A. Ali & Erçelebi, 2019); these together are proposed as the most efficient features. We use PCA to optimize these original features and reduce their dimensionality. Furthermore, through appropriate selection of the main components, PCA is capable of removing the interference caused by AWGN channels and environmental noise. Overall, PCA not only yields a small-dimensional feature vector but also achieves signal and noise separation. By using the PCA optimizer, we are able to select the most appropriate parameters of the classifier and the best feature subset.

3. Feature extraction based on the PCA optimizer

3.1. Signal model

A typical frame is assumed to be a sampled signal with no timing error from a rectangular pulse, which satisfies the Nyquist intersymbol interference criterion. As mentioned in (Zhu & Nandi, 2014), timing error correction is always conducted using suitable timing error estimation and compensation algorithms before modulation classification. Channel attenuation, carrier frequency offset and carrier phase offset (fast and slow) are considered. Additionally, AWGN is assumed to be part of the channel environment; such signals are typically used in the operationally responsive space (ORS) satellite communication model. Therefore, the received signal containing N signal samples can be represented as follows:

r(n) = A e^{j(2\pi f_o nT + \varphi)} \sum_{l=-\infty}^{\infty} \xi(l)\, g(nT - lT + \varepsilon_T T) + S(n)    (1)

where ξ(l) are the transmitted signal symbols drawn randomly with equal probability from the alphabet set r_M ∈ {MPSK, MQAM, MAPSK} of modulation M; A is an unknown amplitude factor, which includes the signal energy (assumed unity), the attenuation factor and the channel attenuation factor; T is the symbol spacing; ε_T represents the timing error; φ is the channel phase; f_o is the carrier frequency offset; g(·) is the combined impulse response of the transmit and receive filters in cascade; and S(n) is the additive Gaussian noise sequence. Here, we consider the symbol timing error to be trivial.

It is worth noting that the noise level is expressed as a relation between P_noise, the power of the additive noise S(·), and P_signal, the power of the attenuated signal A·ξ(·). Under AWGN channels, with S(·) ~ N(0, σ_n²), we assume that the transmitted symbols have unit power and that the SNR is

SNR = \frac{P_{signal}}{P_{noise}} = 10 \log_{10} \frac{A^2}{\sigma_n^2}\ dB    (2)

3.2. Feature extraction

Note that the modulation classification discussed in this paper is based on the pattern recognition approach. The feature extraction subsystem is responsible for extracting the attributes from the received data. This extraction is intended to reduce the dimensionality of the received raw data (Zhu & Nandi, 2015), (Abdelmutalab et al., 2016b). In this study, the features extracted from the received signal are HOC features that extract parameters of the amplitude and phase distributions of

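As a concrete illustration of the signal model in Eqs. (1) and (2), the following Python sketch generates a frame of unit-power QPSK symbols with amplitude factor A and complex AWGN scaled to a target SNR. It is a deliberately simplified baseband version of Eq. (1) (rectangular pulse, one sample per symbol, no carrier or timing offsets); the function and variable names are ours, not the paper's.

```python
import numpy as np

def awgn_frame(n_symbols=3000, snr_db=10.0, A=1.0, seed=None):
    """Simplified baseband version of Eq. (1): unit-power QPSK symbols,
    amplitude factor A, rectangular pulse (one sample per symbol), no
    carrier/timing offsets, plus complex AWGN at the requested SNR."""
    rng = np.random.default_rng(seed)
    # Unit-power QPSK alphabet (pi/4-rotated constellation points).
    alphabet = np.exp(1j * (np.pi / 4 + np.pi / 2 * np.arange(4)))
    xi = alphabet[rng.integers(0, 4, n_symbols)]
    # Eq. (2): SNR = A^2 / sigma_n^2, so sigma_n^2 = A^2 / 10^(SNR/10).
    sigma2 = A**2 / 10 ** (snr_db / 10)
    noise = np.sqrt(sigma2 / 2) * (rng.standard_normal(n_symbols)
                                   + 1j * rng.standard_normal(n_symbols))
    return A * xi + noise

# Sanity check: the empirical SNR of a long frame should match the target.
r = awgn_frame(n_symbols=100_000, snr_db=5.0, seed=0)
sigma2 = 1 / 10 ** 0.5                      # noise power for 5 dB, A = 1
est_snr_db = 10 * np.log10((np.abs(r) ** 2).mean() / sigma2 - 1)
```

With 100,000 samples the estimate lands close to the 5 dB target, which is a quick way to confirm that the noise scaling implements Eq. (2) correctly.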

modulations and are not affected by AWGN, especially cumulants with orders greater than two. Cumulants are composed of moments; therefore, moments have usually not been used as features. For a complex-valued stationary signal r(n), the cumulants are listed in (Appendix 1, Table 1). Additionally, cumulants and moments define the following feature parameters:

f_1 = \left[ \sqrt[3]{|C_{33}|^2} \right]_{1\times\tau}    (3)

f_2 = \left[ \frac{\sqrt{|C_{40}|}}{\sqrt{|C_{42}|}} \right]_{1\times\tau}    (3a)

f_3 = \left[ \frac{\sqrt{|C_{41}|}}{\sqrt{|C_{42}|}} \right]_{1\times\tau}    (3b)

f_4 = \left[ \frac{|C_{40}|}{|C_{21}|^2} \right]_{1\times\tau}    (3c)

f_5 = \left[ \log_{10} \frac{|C_{63}|^2}{|C_{42}|^3} \right]_{1\times\tau}    (3d)

f_6 = \left[ \sqrt{|C_{40}|} \right]_{1\times\tau}    (3e)

f_7 = \left[ \sqrt{|C_{41}|} \right]_{1\times\tau}    (3f)

f_8 = \left[ \sqrt{|C_{80}|} \right]_{1\times\tau}    (3g)

Here, the training data points of the signals of interest are indexed by τ and then combined as the vector [Ψ]_{8×1}:

[\Psi]_{8\times 1} = [f_1, f_2, f_3, f_4, f_5, f_6, f_7, f_8]^T

(collected over the τ training frames to form an 8×τ dataset), where Ψ is the statistical feature vector. For a complex-valued stationary random process y, the pth-order moment with q conjugations, M_pq, is defined as shown in (Swami & Sadler, 2000):

M_{pq} = E[\, y^{p-q} (y^*)^q \,]    (4)

where y* is the complex conjugate of y and q is the power of the conjugate signal y*.

3.3. Optimization features

As explained in Section 3.2, during feature extraction, the HOCs are rescaled by using the root, as described in (Abdelmutalab et al., 2016a), and all cumulants are transformed using \sqrt[p]{|C_{pq}|^2}, where p refers to the order of the cumulant; this is conducive to creating a lower-dimensional space and contributes to reducing the classifier complexity. The absolute values of the cumulants are taken to mitigate the influence of phase rotation, as mentioned in (Farhang, Dehghani, & Bahramgiri, 2011).

The need for feature extraction comes from the inefficiency of using the large received datasets directly. The suitability of these features not only enables the classifier to recognize more and higher-order digital signals but also helps reduce classifier complexity. Additionally, different feature parameters can be used to differentiate between different digital signals (Wong & Nandi, 2004). Fig. 2 illustrates a discrimination test of two parameters, f_1 and f_6. When the symbol length is N = 3000 and the compression ratio is 0%, as shown in Figs. 3, 4 and 5, f_2, f_3, f_4, f_5, f_6, f_7 and f_8 vary according to the SNR for different modulation types.

Fig. 2 shows that the feature parameters f_1 and f_6 have large fluctuations and differentiate the class patterns when the SNR is less than zero, and the values clearly intersect, especially for the APSK and QAM digital signals. However, as the SNR gradually increases, the values tend to deviate, which can be observed through the distribution characteristics of f_1 and f_6 for the PSK signals in Fig. 2. The same applies to Fig. 4 (f_4) and Fig. 5 (f_7 and f_8).

As depicted in Figs. 2, 3, 4 and 5, which show the distribution characteristics of digital signals, it can be concluded that separating the classes linearly is unfeasible; therefore, a linear classifier is not suitable for identifying these patterns due to the intersections between classes. Therefore, we used a pattern recognition approach based on an ANN. The ANN modulation classifier used in this research involves three main blocks (Fig. 7). The first block is the dataset normalization stage, in which the 8 input features are extracted from the simulated digital signal. The second block is the optimizer, which fully implements the PCA algorithm and is used for dimensionality compression and to select the features that are highly distinct between classes to improve signal recognition performance. The third block is a classifier that conducts the training or learning stage to appropriately adjust the structure of the developed AMC and executes the test stage to evaluate the developed classifier. The internal structure of the proposed classifier is described in Section 4.

4. Features optimized using PCA

Information compression is induced through a mapping operation in which all the beneficial information involved in the original dataset vector (Ψ in this study) is transformed into a smaller number of composite features in a new vector space (Ψ̂), and redundant and irrelevant information is discarded. PCA is an extensively used statistical technique in the field of information compression, specifically in the pattern recognition literature (Ma & Qiu, 2018). Linear mapping is much more typical and has the advantage of being less computationally intensive. The mapping function involves matrix multiplication:

\Psi = [y_1, y_2, \cdots, y_i]^T    (5)

\hat{\Psi} = \xi(\Psi)    (6)

\hat{\Psi} = [x_1, x_2, \cdots, x_m]^T    (7)

i > m    (8)

\hat{\Psi} = T^T \Psi, where T = [t_1, t_2, \cdots, t_i] is orthogonal (T^T T = I) with column vectors t_η, so that \hat{\Psi}_\eta = t_\eta^T \Psi for η = 1, 2, ⋯, i,

where I is the identity matrix and the matrix T is built from the eigenvectors of the covariance (or autocorrelation) matrix of the dataset distribution Ψ. The principal components are selected to achieve

Table 1
Specification of the ANN-based classifier structures.

                              Numbers of neurons in each layer
Network structure             Input    Hidden 1    Hidden 2     Output
RBF                           8        24          15           10
MLP                           8        24          15           10
RBF activation functions               RBF         hard limit   purelin
MLP activation functions               tanh        logistic     purelin
Performance function          MSE
Maximum epochs                100
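The moment estimator of Eq. (4) and a subset of the cumulant-based features of Eq. (3) can be sketched as follows. The cumulant-to-moment identities used here follow Swami and Sadler (2000), which the paper cites; the paper's full cumulant list is in its Appendix 1, and the helper names below are ours.

```python
import numpy as np

def moment(y, p, q):
    """Eq. (4): M_pq = E[y^(p-q) * conj(y)^q], estimated from samples."""
    return np.mean(y ** (p - q) * np.conj(y) ** q)

def cumulant_features(y):
    """Features f4 (Eq. 3c) and f6 (Eq. 3e) from sample cumulants.
    Cumulant-to-moment identities per Swami & Sadler (2000)."""
    M20, M21 = moment(y, 2, 0), moment(y, 2, 1)
    M40, M42 = moment(y, 4, 0), moment(y, 4, 2)
    C21 = M21                                      # signal power
    C40 = M40 - 3 * M20 ** 2                       # 4th order, 0 conjugations
    C42 = M42 - np.abs(M20) ** 2 - 2 * M21 ** 2    # 4th order, 2 conjugations
    f4 = np.abs(C40) / np.abs(C21) ** 2            # Eq. (3c)
    f6 = np.sqrt(np.abs(C40))                      # Eq. (3e)
    return f4, f6, C42

# Noiseless unit-power QPSK has C21 = 1 and C40 = -1, so f4 and f6 ~ 1.
rng = np.random.default_rng(0)
qpsk = np.exp(1j * (np.pi / 4 + np.pi / 2 * rng.integers(0, 4, 50_000)))
f4, f6, C42 = cumulant_features(qpsk)
```

The QPSK check works because every rotated QPSK symbol satisfies y⁴ = −1, making M40 exactly −1, while |y|² = 1 gives C21 = 1; these theoretical values are a standard sanity test for cumulant estimators.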

Fig. 2. The values of the feature parameters f 1 and f 6 before PCA with a symbol length of N = 3000.

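A minimal sketch of the PCA mapping in Eqs. (5)–(8): the feature vectors are mean-centred, T is taken as the eigenvector matrix of the sample covariance, and only the m strongest components are kept. The synthetic data and the function name are ours, not the paper's.

```python
import numpy as np

def pca_compress(Psi, m):
    """Eqs. (5)-(8): map i-dimensional feature vectors (columns of Psi)
    onto m < i principal directions. T holds the eigenvectors of the
    covariance matrix; column t_eta gives component t_eta^T Psi."""
    centred = Psi - Psi.mean(axis=1, keepdims=True)
    cov = np.cov(centred)                     # i x i covariance matrix
    eigvals, T = np.linalg.eigh(cov)          # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]         # strongest components first
    T_m = T[:, order[:m]]                     # keep m principal directions
    return T_m.T @ centred                    # m x tau compressed features

# Toy check with i = 8 features over tau = 500 frames (synthetic data; in
# the paper Psi would hold the cumulant features f1-f8 of Eq. (3)).
rng = np.random.default_rng(1)
Psi = rng.standard_normal((8, 500))
Psi_hat = pca_compress(Psi, m=3)
```

The compressed rows are mutually uncorrelated by construction, which is the decorrelation property the text attributes to PCA.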

Fig. 3. The values of the feature parameters f 2 and f 3 before PCA with a symbol length of N = 3000.


Fig. 4. The values of the feature parameters f 4 and f 5 before PCA with a symbol length of N = 3000.

the minimal mean-square error between Ψ and Ψ_Es and are expressed as follows:

\Psi_{Es} = \sum_{\eta=1}^{m} \hat{\Psi}_\eta t_\eta + \sum_{\eta=m+1}^{i} b_\eta t_\eta    (9)

where Ψ_Es is an estimate of Ψ, and b_η is a preselected constant.

PCA optimization is applied to the cumulant features to obtain more distinctive features. The Ψ̂ vector is expressed as

\hat{\Psi} = [f_1, f_2, f_3, f_4, f_5, f_6, f_7, f_8]^T_{8\times\tau}    (10)

4.1. The RBFN classifier

After feature optimization and compression, we used a multilayer RBF network as the classifier, which has achieved superior performance in AMC techniques (Mashor, 1999). The dataset Ψ̂ is employed as the input, and the output φ(Ψ̂) is written as follows:

\varphi(\hat{\Psi}) = \sum_{i=1}^{q} W_i\, \rho(\hat{\Psi}, C_i)    (11)

where q is the number of hidden-layer neurons and C_j and W_j are the centre and weight of the jth hidden-layer neuron, respectively. Here, ρ(Ψ̂, C_j) denotes the RBF, which is computed from the Euclidean distance between Ψ̂ and C_j and is given by

\rho(\hat{\Psi}, C_j) = e^{-\mu_j \|\hat{\Psi} - C_j\|^2}    (12)

The RBF neural network is trained in two phases. First, the random sampling method is used to compute C_j. Subsequently, the back-propagation (BP) algorithm is used to evaluate W_j and μ_j.

4.2. MLP classifier

The MLP neural network consists of an input layer that accepts the signal features, one or more hidden layers (invisible compute nodes) and an output layer; here, the output layer produces the class labels (the modulation signals). The learning algorithm and its speed play significant roles in MLP performance, specifically in pattern recognition. One commonly applied learning method is the BP algorithm. However, under certain circumstances, the BP network classifier results in poor classification accuracy and can easily become trapped in local minima. In addition, BP requires considerable time during the training phase. Fig. 6 shows the details of
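The RBFN forward pass of Eqs. (11) and (12) can be sketched as follows. Here the centres C_j, widths μ_j and weights W_j are supplied directly rather than fitted by the random-sampling and BP procedure described above, and all names are ours.

```python
import numpy as np

def rbf_forward(x, centres, widths, weights):
    """Eqs. (11)-(12): Gaussian RBF hidden layer followed by a linear
    output layer. centres[j] is C_j, widths[j] is mu_j, and weights maps
    hidden activations to one score per output class."""
    # rho(x, C_j) = exp(-mu_j * ||x - C_j||^2) for each hidden neuron j
    d2 = ((x[None, :] - centres) ** 2).sum(axis=1)
    rho = np.exp(-widths * d2)
    return weights @ rho          # Eq. (11): weighted sum per class

# Tiny illustration: 8-dim input, 2 hidden neurons, 2 output classes.
centres = np.vstack([np.zeros(8), np.ones(8)])
widths = np.array([1.0, 1.0])
weights = np.eye(2)               # class k reads hidden neuron k directly
scores = rbf_forward(np.zeros(8), centres, widths, weights)
```

An input at the first centre maximally activates the first hidden neuron, so the first class score dominates; this locality is what makes RBF hidden layers behave like soft nearest-centre matching.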

Fig. 5. The values of the feature parameters f 7 and f 8 before PCA with a symbol length of N = 3000.

the constrained pattern recognition model. In recent years, newer learning algorithms for training neural networks have been suggested. These new-generation algorithms have advantages relative to the earlier ones in terms of learning speed and result accuracy, and they are based on modern optimization methods for minimizing the error. In this study, we used the Levenberg–Marquardt optimization (LMO) algorithm as the learning algorithm for the neural network; it was developed to update the weight and bias values. This algorithm was adapted by performing the following actions (Jordanov et al., 2016):

1. Compute the mean square error (MSE) using initial randomly generated weights:

\vartheta(x, W) = \frac{1}{r} \sum_{n=1}^{r} \sum_{m=1}^{M} e_m^2    (13)

where x is the dataset vector, r is the length of the training dataset, M is the number of network outputs, and W is the weight vector. Here, e_m is the training error, which is expressed as e_m = T_m − Z_m, where T_m is the desired target output and Z_m is the network output at output m for the applied pattern n.

Fig. 6. Structure of the proposed pattern recognition MLP.

2. Calculate the adaptive, updated weights using the following equations:

W_{k+1} = W_k - H^{-1} J_k e_k    (14)

where W_{k+1} is the new weight computed for neuron k of the hidden layer and W_k is the current weight of neuron k in the hidden layer. H is the Hessian matrix, expressed by

H = J^T J + \zeta I    (15)

where ζ is a combination coefficient and J is the Jacobian matrix. According to Equation (14), the LMO training algorithm is a combination of the steepest descent algorithm and the Gauss–Newton algorithm: when ζ is very small, the Gauss–Newton process is used for training; likewise, when ζ is large, the steepest descent method is used.

3. Compute the MSE for the new weights.

4. If the MSE has increased, reset the weight vector to the previous value and augment ζ; then, go to step two to try updating again. Otherwise, when the MSE has been reduced, accept the step and decrease ζ.

5. Return to step two with the new weights until the MSE is less than a threshold value.

The Levenberg–Marquardt (LM)-BP supervised learning method is used to train both classifiers in the experiments in subsection 5.1.

4.3. Proposed classifier

The goal of this study is to develop an AMC scheme that offers the receiver the option of recognizing which of the possible modulation techniques the transmitter uses to send a row of datasets. In this work, AMC is considered a pattern recognition issue, as mentioned in the
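The five-step LM procedure described above can be sketched on a toy linear least-squares problem. This is only an illustration of Eqs. (13)–(15) under our own naming, not the paper's implementation; for a linear model the Jacobian is constant, which keeps the example short.

```python
import numpy as np

def lm_step(W, residual_fn, jacobian_fn, zeta):
    """One Levenberg-Marquardt update, Eqs. (14)-(15):
    W_new = W - (J^T J + zeta*I)^(-1) J^T e."""
    e = residual_fn(W)
    J = jacobian_fn(W)
    H = J.T @ J + zeta * np.eye(W.size)      # Eq. (15)
    return W - np.linalg.solve(H, J.T @ e)   # Eq. (14)

# Toy fit: residuals e = A W - b, so the Jacobian is simply A.
A = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
b = np.array([2.0, 3.0, 4.0])                # exact solution W = [1, 1]
res = lambda W: A @ W - b
jac = lambda W: A

W, zeta = np.zeros(2), 1e-2
for _ in range(50):                          # steps 2-5 of the listing
    W_new = lm_step(W, res, jac, zeta)
    if (res(W_new) ** 2).sum() < (res(W) ** 2).sum():
        W, zeta = W_new, zeta / 2            # step 4: accept, decrease zeta
    else:
        zeta *= 2                            # step 4: reject, increase zeta
mse = (res(W) ** 2).mean()                   # steps 1/3: MSE of Eq. (13)
```

The ζ schedule is exactly the blend the text describes: small ζ recovers a Gauss–Newton step, large ζ approaches a small steepest-descent step.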

previous section. The features are extracted from the received signal and optimized using PCA. Next, the compressed feature vector is used by a classifier to determine the modulation type used by the transmitter. We note that any pattern recognition scheme achieved on the chosen features applies to the classifier and is characteristic of the classifier itself. We evaluate the performance of the classifiers by their correct recognition rate probabilities. In this subsection, we present the proposed classifier combination of an ANN with PCA. In this study, we used the ANN-PCA combination to create a simple and effective modulation classification system. The received signal undergoes a staged procedure. First, the received signal is fed into the feature extraction and data normalization stage, which extracts features from the signal after normalization to create the feature vector. The optimizer stage then produces the optimized feature vector that is input to the ANN classifier stage, as shown in Fig. 7, which depicts the ANN block combined with the PCA optimizer. One advantage of this structure is that it achieves higher classification accuracy than a common pattern recognition system based on feature extraction alone.

4.4. Reliability algorithm

The reliability algorithm presented in this work is intended to obtain the maximum pattern recognition model precision before determining the pattern. The reliability R is the difference between the calculated output corresponding to the data used during training and the actual target output and is computed as follows:

P_{cc} = \sum_{k=1}^{j} P(\omega_k | \omega_k) P(\omega_k)    (17)

where j is the number of candidate modulation signals to be classified, drawn from the modulation scheme pool r_M ∈ {ω_1, ω_2, ω_3, ⋯, ω_j}; P(ω_k) is the probability that modulation scheme ω_k occurs; and P(ω_k|ω_k) represents the conditional probability of correct classification when an ω_k modulation constellation has been used for the transmitted data. In our work, the number of candidate modulation signals to be classified is j = 10.

5.1. Network training

The architecture of the proposed classifier is shown in Fig. 7, which uses the cumulant-statistical key features discussed earlier as input. The neural network-based classifier used is a feedforward network (FNN) generally referred to as an MLP. Table 1 lists the neural network architectural specifications. All neurons are fully connected, as shown in Fig. 6. The neurons in the input layer do not perform computation; they simply distribute the input features to the computing neurons in the hidden layer. In contrast, the neurons in the hidden layer perform computations on the inputs from the input layer and pass their results to the neurons in the output layer. The eight neurons in the input layer correspond to the number of input features. There are 24 neurons in the first hidden layer and 15 in the second hidden layer. The network has 10
neurons in the output layer, which correspond to the number of targets

n
R= (Tn − An )2 (16) (the 10 signal schemes: (3) MPSK, (3) MAPSK, and (4) MQAM). Totals of
n=1 400 and 8000 data elements with 8-feature set inputs and 10 target
outputs were used in this paper. The adopted training phase steps are as
where n is the length of the training dataset and E is a reliability
follows:
coefficient, which is determined based on the threshold (set here to
1e, 2.5e and e = 10− 6 ). The adopted reliability algorithm follows the
1) The generated data consisted of an input matrix and a target matrix.
pseudo-code steps shown in Algorithm 1.
2) The input data were compressed via PCA at a constant ratio and
Algorithm 1: Reliability algorithm
Input: n ∈ index{PSK, APSK, QAM}i ; input boundary with index
randomly sorted.
Output: PCCi ; output probability of classification correct rate 3) The generated data were divided into training, validation and testing
Step1 If {Ri ≤ E }PCC Index SNR = PCC + 1;End datasets, and 70% of the total data were used for network training.
Step2 Return Pcc; The training dataset was used to update the weights of the network.
Training continued until the MSE used as the performance metric
was ≤ 1e-7 . Of the total data, 15% were used to validate the net­
5. Simulations and classifier results work’s ability to generalize and to halt training before overfitting
occurred. The remaining 15% of the total data were used as
Due to the complexity of emerging developments in digital systems completely independent testing data to assess the generalization
and the trend toward digital telecommunications as opposed to ability of the network.
analogue telecommunication, most modern communication uses digital
signals. Considering the changes in message parameters, there are four 5.2. Reliability test
general digital signal schemes, ASK, PSK, FSK and QAM, which are often
used in M− ary format (Alharbi, Mobien, Alshebeili, & Alturki, 2012). In this subsection, we investigate the performance of different clas­
Furthermore, the DVB-S2 standard includes 16APSK, 32APSK and sifiers that are combined with a PCA optimizer. In this step, we used
64APSK signals (ETSI, 2009): these are typical modulation schemes for compression ratios of 1:4 and 0 to extract statistical features. First, we
the modern generation of satellite communications. In this work, we evaluated the performance of MLP neural networks with an LM learning
considered the following set of input digital signal types: BPSK, QPSK, algorithm. Table 2 depicts the classification accuracy of this neural
8PSK, 16APSK, 32APSK, 64APSK, 16QAM 32QAM, 64QAM and 256 network. The training was conducted using 400 elements of data. Based
QAM. To evaluate the performance of the proposed classifier under on the trial-and-error method, we used a constant reliability threshold
noisy channel or variable data length situations, the performance of our (2.5e). Likewise, Table 3 reports the classification accuracy rate for MLP
AMC method under AWGN is evaluated in this section using the invoked when training on 8000 elements of data. Table 2 indicates that the
signal tools available in MATLAB simulations. We created and stored a neural network combined with the compression ratio 1:4 achieved a
total of 2048 samples of each signal type. The simulated signals included good classification accuracy rate with few training data elements. It was
AWGN, which was added to the symbols at various SNRs (-10 to 10 dB). precise at lower SNR values. This could be an important aspect of the
Each signal type was realized 50 and 1000 times, in which the digital proposed classifier. The average accuracy was significantly degraded
information (message) was generated randomly for each 100 trials to with increased training elements of data, as observed in Table 3.
ensure independent results. Classification accuracy assessments of the Nevertheless, the performance was frail at a comparison ratio of 1:4.
different classes were provided by the classification matrix, and two Then, we evaluated the performance of the recognizer with the RBF
parameters are defined: the first is average accuracy Av , which shows the classifier. Tables 4–5 report the results. From these results, it can be
analysis of different classes in percentages; the second is the probability observed that the performance of the PCA neural network combined
of correct classification Pcc , which can be formulated as follows: with the RBF neural network was generally very good even at very low
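The reliability gate of Algorithm 1 and the two evaluation quantities in Eqs. (16) and (17) can be sketched in a few lines of Python (our own reconstruction for illustration, not the authors' MATLAB code; the 3-class confusion counts below are hypothetical):

```python
import numpy as np

def reliability(targets, outputs):
    """Eq. (16): R = sum over the training set of (T_i - A_i)^2."""
    t = np.asarray(targets, dtype=float)
    a = np.asarray(outputs, dtype=float)
    return float(np.sum((t - a) ** 2))

def prob_correct_classification(confusion, priors=None):
    """Eq. (17): Pcc = sum_k P(w_k | w_k) P(w_k); the conditional terms
    are the diagonal of a row-normalized confusion matrix."""
    c = np.asarray(confusion, dtype=float)
    cond = np.diag(c) / c.sum(axis=1)        # P(correct | class k)
    j = c.shape[0]
    p = np.full(j, 1.0 / j) if priors is None else np.asarray(priors, dtype=float)
    return float(np.sum(cond * p))

# Algorithm 1 as a gate: count the model's Pcc only if R <= E.
E = 2.5e-6                                   # threshold 2.5e with e = 1e-6
targets = np.eye(3)                          # one-hot targets (toy data)
outputs = targets + 1e-4                     # near-perfect network outputs
R = reliability(targets, outputs)
if R <= E:
    confusion = np.array([[98, 1, 1],        # hypothetical trial counts
                          [2, 97, 1],
                          [1, 2, 97]])
    print(f"R = {R:.2e}, Pcc = {prob_correct_classification(confusion):.3f}")
```

With equal priors (j = 10 in the paper), Pcc reduces to the mean of the per-class correct-classification rates.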


Table 2
Performance of the MLP-PCA-combined recognizer at different SNRs. Training data elements: 400; constant reliability threshold: 2.5e.
ZIP SNR (dB) Signal classification PCC %
S1 S2 S3 S4 S5 S6 S7 S8 S9 S10

0 − 10 86 82 84 92 84 88 89 84 93 86
− 5 88 87 87 84 87 85 88 83 79 88
5 90 90 87 83 88 90 96 90 87 89
10 91 91 94 88 88 89 88 91 91 88
1:4 − 10 97 94 96 95 94 95 96 96 94 97
− 5 89 92 89 93 91 99 93 93 97 95
5 94 94 93 93 87 91 94 90 86 88
10 89 89 95 92 89 87 89 94 95 94
Signal syntax: S1: BPSK, S2: QPSK, S3: 8PSK, S4: 16APSK, S5: 32APSK, S6: 64APSK, S7: 16QAM, S8: 32QAM, S9: 64QAM, S10: 256QAM
ZIP: PCA compression ratio

Table 3
Performance of the MLP-PCA-combined recognizer at different SNRs. Training data elements: 8000; constant reliability threshold: 2.5e.
ZIP SNR (dB) Signal classification PCC %
S1 S2 S3 S4 S5 S6 S7 S8 S9 S10

0 − 10 100 99 100 99 99 99 100 100 99 100


− 5 98 100 99 100 98 98 98 99 99 100
5 99 99 100 100 98 100 99 99 100 99
10 100 100 99 99 100 99 100 100 98 100
1:4 − 10 95 94 96 94 98 96 99 100 96 95
− 5 100 100 99 97 97 97 99 96 99 96
5 99 99 100 99 99 99 100 99 100 100
10 99 99 99 100 99 99 98 98 99 100
Signal syntax: S1: BPSK, S2: QPSK, S3: 8PSK, S4: 16APSK, S5: 32APSK, S6: 64APSK, S7: 16QAM, S8: 32QAM, S9: 64QAM, S10: 256QAM

Table 4
Performance of the RBF-PCA combined recognizer at different SNRs. Training data elements: 400; constant reliability threshold: 2.5e.
ZIP SNR (dB) Signal classification PCC %
S1 S2 S3 S4 S5 S6 S7 S8 S9 S10

0 − 10 98 98 97 96 97 97 98 96 97 98
− 5 96 99 99 95 99 95 97 95 95 97
5 98 98 97 99 99 98 99 99 97 95
10 96 96 96 98 96 77 98 91 95 95
1:4 − 10 98 97 98 97 94 97 97 96 96 98
− 5 97 95 97 97 98 97 96 97 98 97
5 96 96 98 93 97 98 96 98 97 99
10 98 98 97 98 97 99 99 95 93 97
Signal syntax: S1: BPSK, S2: QPSK, S3: 8PSK, S4: 16APSK, S5: 32APSK, S6: 64APSK, S7: 16QAM, S8: 32QAM, S9: 64QAM, S10: 256QAM

Table 5
Performance of the RBF-PCA combined recognizer at different SNRs. Training data elements: 8000; constant reliability threshold: 2.5e.
ZIP SNR (dB) Signal classification PCC %
S1 S2 S3 S4 S5 S6 S7 S8 S9 S10

0 − 10 100 100 100 100 100 100 100 100 100 100
− 5 100 100 100 100 100 100 99 100 100 100
5 100 100 100 100 100 100 100 100 100 100
10 100 100 100 100 100 100 100 100 100 100
1:4 − 10 99 99 98 100 98 99 99 100 99 99
− 5 99 98 99 99 98 99 99 100 100 100
5 98 98 100 100 100 100 99 100 97 99
10 100 100 98 96 96 100 99 100 98 100
Signal syntax: S1: BPSK, S2: QPSK, S3: 8PSK, S4: 16APSK, S5: 32APSK, S6: 64APSK, S7: 16QAM, S8: 32QAM, S9: 64QAM, S10: 256QAM

SNRs and with fewer training samples due to two factors: the proposed statistical features and the proposed PCA-combined neural network classifier. The chosen features have effective properties for signal representation. Moreover, the RBF-PCA-based classifier has a high generalization ability for the recognition of radio signals even at low SNRs and with few samples used to train the network.
Figs. 8 and 9 present a performance comparison of the average signal classification of the RBF-PCA combined classifier with different reliability thresholds and compression ratios (set to 0 and 1:4, respectively). The classifier with the RBF combination achieved higher classification accuracy than the others. Thus, we chose the RBF classifier with the combined PCA optimizer as the main classifier for the proposed AMC technique. Note that the training samples for each network were 50 independent signal realizations.
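The PCA optimizer stage behind these comparisons can be sketched as follows (Python, a minimal reconstruction under the stated setup — 8 cumulant features compressed at a 1:4 ratio down to 2 principal components; this is not the paper's MATLAB implementation):

```python
import numpy as np

def pca_compress(features, ratio=4):
    """Project feature rows onto the leading principal components,
    keeping dim/ratio of the original dimensions (1:4 maps 8 features to 2)."""
    x = np.asarray(features, dtype=float)
    k = max(1, x.shape[1] // ratio)
    mu = x.mean(axis=0)
    # SVD of the centered data matrix yields the principal directions in vt.
    _, _, vt = np.linalg.svd(x - mu, full_matrices=False)
    return (x - mu) @ vt[:k].T

rng = np.random.default_rng(0)
feats = rng.normal(size=(400, 8))     # 400 training elements, 8 features
compressed = pca_compress(feats)
print(compressed.shape)               # (400, 2)
```

The first retained component carries the largest variance, so truncating to dim/ratio components discards the least informative directions first.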


Fig. 8. Average accuracy rate of MLP-PCA and RBF-PCA combined classifier structures for different SNR values and a reliability coefficient of E = 1e, 2.5e at a
compression ratio of 0, N = 2048 symbols, τ = 50 for each modulation scheme.

Fig. 9. Average accuracy rate of MLP-PCA and RBF-PCA-combined classifier structures for different SNR values and a reliability coefficient of E = 1e, 2.5e at a
compression ratio of 1:4, N = 2048 symbols, τ = 50 for each modulation scheme.

5.3. Performance comparison and evaluations

As shown, the nonlinear RBF based on the PCA optimizer generally achieved good performance. Thus, we designed the proposed classifier system with an RBF neural network. Table 6 reports the performance of the RBF-PCA combined recognizer at each SNR on high-order modulation signals. Note that the combined RBF-PCA model achieved a recognition accuracy of almost 98% for all radio signals despite few training elements. Fig. 10 presents the average classification rates for different signal lengths.

As mentioned above, we adopted an RBF-PCA-based classifier. One advantage of this structure is that the number of training samples is less than was used in the MLP-network study performed by (Longmei Zhou, Sun, & Wang, 2017). We also compared the performance of the proposed RBF-PCA-combined classifier with the deep learning classifier proposed by (Afan Ali & Yangyu, 2017). Fig. 11a and b present these comparisons. The proposed RBF-PCA combined structure achieved higher performance at all SNRs than the methods proposed by Z. Zhu et al. (2017) and A. Ali et al. (2017). It is worth noting that the number of training samples used was 400 elements.

Table 6
Performance of the RBF-PCA combined recognizer on high-order modulation schemes. The reliability coefficient is E = 2.5e at different signal lengths (250, 750, 2500 and 3500 symbols) and training with 400 data elements.
Data length  Signals  SNR (dB): −10  −5  5  10
250   8PSK    97  100  99  93
      64APSK  99  96   92  96
      64QAM   98  100  96  97
      256QAM  98  96   98  97
750   8PSK    98  98   95  98
      64APSK  96  98   97  98
      64QAM   95  98   97  100
      256QAM  95  97   99  97
2500  8PSK    96  99   96  96
      64APSK  98  98   94  98
      64QAM   100 100  99  100
      256QAM  97  98   96  96
3500  8PSK    96  99   99  99
      64APSK  97  97   97  98
      64QAM   100 99   96  96
      256QAM  100 98   99  96
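The per-length averages plotted in Fig. 10 follow from simple arithmetic on the Table 6 values; for example:

```python
import numpy as np

# Table 6 PCC values; rows: 8PSK, 64APSK, 64QAM, 256QAM;
# columns: SNR = -10, -5, 5, 10 dB.
table6 = {
    250:  [[97, 100, 99, 93], [99, 96, 92, 96], [98, 100, 96, 97], [98, 96, 98, 97]],
    750:  [[98, 98, 95, 98], [96, 98, 97, 98], [95, 98, 97, 100], [95, 97, 99, 97]],
    2500: [[96, 99, 96, 96], [98, 98, 94, 98], [100, 100, 99, 100], [97, 98, 96, 96]],
    3500: [[96, 99, 99, 99], [97, 97, 97, 98], [100, 99, 96, 96], [100, 98, 99, 96]],
}
averages = {n: float(np.mean(v)) for n, v in table6.items()}
overall = float(np.mean([np.mean(v) for v in table6.values()]))
print(averages)            # accuracy stays near 98% at every signal length
print(round(overall, 2))   # 97.42
```

The averages vary by less than one percentage point across the four signal lengths, which is the flatness visible in Fig. 10.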


Fig. 10. Average accuracy rate of the RBF-PCA combined classifier structure at different SNR values, a reliability coefficient of E = 2.5e and a compression ratio of
1:4, with different signal lengths (250, 750, 2500 and 3500) symbols, τ = 50 for each modulation scheme.

5.4. Robustness comparison

A direct comparison with other studies is difficult because, in addition to discrepancies in the number of iterations, the training data elements used and the modulation signals, this classifier architecture has not been previously proposed. Nevertheless, Table 7 presents a comparison between methods proposed in previous studies and the proposed RBF-PCA combined system. As shown, compared with the previous methods, the RBF-PCA combined classifier has a considerable advantage because it is suitable for a variety of high-order digital modulation signal types (e.g., 64QAM, 256QAM) and for signals from second-generation digital video broadcasting satellites (e.g., DVB-S2 and 64APSK). Due to the proposed feature vector and classifier framework, the performance of the system remains high even when SNR < -6 dB. Note that using 400 training elements helps reduce the training time and increase the recognition speed compared to similar signal classifiers because the use of the PCA feature compression technique enhances the overall classifier performance. The RBF-PCA combined classifier achieves an average classification rate of no less than 94% when the SNR = -10 dB. The classification rate is greater than 97% when SNR > 0 dB, and this performance is achieved with fewer samples.

In addition to the above comparison, the proposed classifier was compared with maximum likelihood-based (MLB) classifiers from the literature. The performance comparison with several MLB classifiers is listed in Table 8. The experiment was conducted for the proposed RBF-PCA combined classifier using the same conditions as each existing classifier. As expected, the results demonstrate that the proposed classifier outperformed the MLB classifiers of (Chugg et al., 1996), (Wei & Mendel, 2000) and (Zhu et al., 2014). The superior performance of the proposed RBF-PCA combined classifier can be attributed to the powerful discriminating quality of the extracted features between modulation schemes, even when the signal model has low accuracy. However, the proposed RBF-PCA combined classifier's performance is slightly decreased compared with the MLB-based power estimation technique when discriminating between MPSK signals. In this case, the AWGN channel model was used to degrade the signal model quality at an SNR of 4 dB. Apart from that, the proposed RBF-PCA combined classifier shows higher performance than the other MLB classifiers in the literature.

5.5. Complexity analysis

The complexity of mathematical operations is a crucial part of modulation classifiers and plays an essential role in practical implementation. In some applications, considerable complexity can be accepted in exchange for classification accuracy. However, for some critical applications, lower-complexity classifiers are preferred. This section compares the computational complexity of the proposed classifier to five different classifiers: the MLB classifier (Zhu et al., 2014), the minimum distance (MD) classifier (Dennis Wong & Nandi, 2008), the cumulant-based classifier (Swami & Sadler, 2000), and the DNN and FNN classifiers (Afan Ali & Yangyu, 2017). The computational complexity of each classifier was measured by counting the mathematical operations required to classify one piece of signal completely. The results are shown in Table 9. Two complexities are involved in the FB method: at the preprocessing and decision classifier stages. In the preprocessing stage, the feature vector must be estimated by applying signal processing operations, which require the most mathematical operations. The decision classifier involves two complexities: the training phase and the testing phase. In the training phase, the weight vectors are estimated by solving the activation functions, which requires an exponential operation in addition to summation and multiplication. In the testing phase, the prediction of the pattern vector is linear with respect to the number of features, and the size is consistent with the training data. However, the overall complexity of the proposed RBF-PCA combined classifier includes two complexities and can be expressed as follows:

Complexity = Complexity_Training + Complexity_Testing (18)

5.5.1. Complexity included in the training phase
The proposed RBF network comprises 2 hidden layers, as described in Table 1. The computational complexity in the training phase needs MNNw Σ_{n=1}^{M} L_n multiplications to calculate the weight matrix of all neurons in each network layer. In addition to MNNw Σ_{n=1}^{M} L_n addition operations, it requires MN Σ_{n=1}^{M} L_n exponential operations in the RBF function. However, there is rarely a concern that these computational complexities will increase classifier complexity because the training phase is an offline step.

5.5.2. Complexity included in the testing phase
In the testing phase, the feature vector is used as an input to the RBF-PCA combined classifier. In this case, the computational complexity involves only the multiplication of the feature vector with the weights estimated in the training phase. Thus, the RBF network requires MNNw multiplications to compute the input class in the testing phase.

5.5.3. Complexity comparison with other classifiers
As depicted in Table 9, if we exclude the operations needed in


Fig. 11. Comparison of the average classification performance between the proposed RBF-PCA combined classifier and other methods.

Table 7
Comparison of different methods for different modulation types and SNR levels in terms of the recognition accuracy.
References | M-PSK | M-APSK | M-QAM | Others | Studied SNR (dB) | Recognition Accuracy (%) | Simulation tool
(Mobasseri, 2000) | 4, 8 | No | 16 | No | −5 ~ 10 | 90 | Not specified
(Swami & Sadler, 2000) | 2, 8 | No | 16 | PAM | −5 ~ 20 | 96 | Not specified
(Abrahamzadeh, Seyedin, & Dehghan, 2007) | 2, 4, 8 | No | 8, V.29, 32, 64 | ASK | −3 ~ 20 | 98.5 | MATLAB signal processing tools
(Ebrahimzadeh & Ghazalian, 2011) | 2, 4, 8 | No | 8, 16, 32, 64, V.32 | ASK, FSK | −3 ~ 6 | 98 |
Proposed module | 2, 4, 8 | 16, 32, 64 | 16, 32, 64, 256 | Easy to develop | −10 ~ 10 | 98 |
preprocessing, it is obvious that the implementation of the MLB classifier requires logarithmic and exponential operations, while the others do not. The MD classifier does not require a logarithmic or exponential operation; therefore, the MD classifier's complexities are considerably reduced. However, numerous multiplications and additions are still required, similar to the operations required to calculate the cumulants of the received signal. For the FNN and DNN classifiers, the complexity is approximately identical to that of the proposed RBF-PCA combined method, excluding the operations needed in the preprocessing step. Based on the above discussion, it can be noted that the cumulant-based classifier has lower complexity, which depends directly on the number of cumulants used for classification. In contrast, the MLB classifier has the highest


Table 8
Performance comparison with MLB classifiers in the literature.
References | Modulation schemes | Classifiers | Noise | Accuracy (%) | RBF-PCA-combined (%)
(Chugg et al., 1996) | BPSK, QPSK, OQPSK | MLB-based power estimation technique | AWGN, SNR = 4 dB | 99 | 98.6
(Wei & Mendel, 2000) | 16QAM | MLB | AWGN, SNR = 5 dB | 64 | 99
(Zhu et al., 2014) | 4QAM, 16QAM | Phase-based MLB | AWGN, SNR = 0 dB | 73.5 | 96.5

Table 9
Operations needed for the proposed classifier and the existing classifiers in other publications.
Classifiers | Multiplication | Additions | Exponential | Logarithm | x^n | Matrix operation
MLB | 5MN Σ_{n=1}^{M} L_n | 6MN Σ_{n=1}^{M} L_n | MN Σ_{n=1}^{M} L_n | MN Σ_{n=1}^{M} L_n | 0 | {MN Σ_{n=1}^{M} L_n}
MD | 2MN Σ_{n=1}^{M} (3L_n + 1) | MN Σ_{n=1}^{M} L_n | 0 | 0 | 0 | {MN Σ_{n=1}^{M} L_n}*
Cumulants | R MN | R MN | 0 | 0 | 0 | {R MN}*
FNN | MNNw Σ_{n=1}^{M} L_n | MNNw Σ_{n=1}^{M} L_n | MN Σ_{n=1}^{M} L_n | 0 | {R MN}* | {R MN Σ_{n=1}^{M} L_n}*
DNN | MNNw Σ_{n=1}^{M} L_n | MNNw Σ_{n=1}^{M} L_n | MN Σ_{n=1}^{M} L_n | 0 | 0 | {MN Σ_{n=1}^{M} L_n}*
RBF-PCA-combined | MNNw Σ_{n=1}^{M} L_n | MNNw Σ_{n=1}^{M} L_n | MN Σ_{n=1}^{M} L_n | 0 | {R MN}* | {R MN Σ_{n=1}^{M} L_n}*
Abbreviations: {}* denotes that the operation is in the preprocessing stage; M is the number of modulation scheme candidates for the classification; L_n is the modulation type; N is the total number of signal samples; Nw is the total number of weights in the neural networks; R is the total number of features.

complexity because it employs logarithmic and exponential operations, but it also has higher classification accuracy (Zhu et al., 2014). The complexity of the proposed RBF-PCA classifier is essentially dependent on the number of additions, multiplications, and exponential operations required to calculate the weights and all neuron activations in each layer, which are essentially used during the network training phase. Therefore, the complexity of the proposed RBF-PCA combined classifier is influenced by how often training is needed. However, it is worth mentioning that in the proposed RBF-PCA combined classifier, the weight matrix is computed offline during the training phase and is not repeated in the testing phase. In a word, classifiers with more complex computations can generally achieve higher classification accuracy.

6. Conclusion

While this paper does not provide a final solution, it highlights one of the most promising lines of research for automatic modulation recognition algorithms. Due to the complexity of this issue, there is an increasing demand for the use of automatic digital radio signals in different applications. Nevertheless, most of the previously proposed methods are able to identify only low-order modulation signals. Such methods must also be robust to high SNR levels when identifying the considered modulation signals. In this work, we proposed using a suitable combination of higher-order moments and higher-order cumulants (up to order six) as effective features. These features are compressed and optimized using PCA. In the classifier module, we first tested an MLP-PCA classifier combination using the Levenberg–Marquardt learning algorithm. This neural network has good average recognition accuracy. However, its performance is weak at low SNR. Then, we tested the performance of an RBF-PCA classifier combination with a 1:4 ratio of dimensionality reduction using PCA. Most of our results focused on the performance of the RBF-PCA combined classifier under different SNRs. The results show that the performance of the RBF-PCA combined classifier is generally very good even at very low SNR. This result occurs for two reasons: the proposed features with the optimizer module, and the network structure of the proposed supervised pattern recognition classifier. The selected features possess effective properties for signal representation, and the RBF-based classifier has a high generalization ability for classifying radio signals even at low SNRs; it achieves better results than other models. Additionally, the results show that the RBF-PCA combined module optimizes the classifier design and obtains very high classification accuracy even at very low SNR values and with limited numbers of training elements. Overall, we can draw the following conclusions from the results:

• The performance of the MLP-PCA combined structure degrades slightly when the feature dimensions are reduced by a ratio of 1:4. We expect that this degradation occurs because of the loss of the temporal correlations that are strongly relevant to the classification task.
• The RBF neural network architecture is the most powerful when combined with the PCA feature compression technique, particularly when the dimensions are reduced by a ratio of 1:4. This combination achieves an accuracy of no less than 95% at an SNR < -5 dB. Moreover, our results indicate that the use of PCA to compress the feature dimensions results in faster training with fewer training elements even at low SNRs.
• Most strikingly, the classification accuracy curves are not necessarily monotonic with the SNR when using the PCA feature optimizer. This effect is most pronounced when the dimensions are reduced by 1:4 and used as input to the MLP neural network architecture. In this case, the average accuracy is significantly reduced at SNR levels above 0 dB.

CRediT authorship contribution statement

Ahmed K. Ali: Conceptualization, Methodology, Software, Investigation, Writing - review & editing. Ergun Erçelebi: Supervision, Validation.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.


Appendix A

Table A. High-order cumulant expressions.
u | Expression | Expression in moment form
2nd | C21 = cum(y(n), y(n)*) | C21 = M21
3rd | C33 = cum(y(n)*, y(n)*, y(n)*) | C33 = M33 − 6M20M31 − 9M22M11 + 18M20²M11 + 12M11³
4th | C40 = cum(y(n), y(n), y(n), y(n)) | C40 = M40 − 3M20²
4th | C41 = cum(y(n), y(n), y(n), y(n)*) | C41 = M41 − 3M20M21
4th | C42 = cum(y(n), y(n), y(n)*, y(n)*) | C42 = M42 − |M20|² − 2M21²
6th | C63 = cum(y(n), y(n), y(n), y(n)*, y(n)*, y(n)*) | C63 = M63 − 9M21M42 + 12M21³ − 3M20M43 − 3M20M41 + 18M20M21M22
8th | C80 = cum(y(n), y(n), y(n), y(n), y(n), y(n), y(n), y(n)) | C80 = M80 − 35M40² − 630M20⁴ + 420M40M20²
u: order of the cumulant expression.
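The moment forms in Table A can be checked numerically. The sketch below (Python, our own illustration; it assumes the common convention M_pq = E[y^(p−q) (y*)^q], which is consistent with the C40–C42 rows above) estimates sample moments and the fourth-order cumulants for a QPSK stream drawn from the constellation {1, j, −1, −j}:

```python
import numpy as np

def moment(y, p, q):
    """Sample estimate of M_pq = E[y^(p-q) * conj(y)^q]."""
    return np.mean(y ** (p - q) * np.conj(y) ** q)

def c40(y):
    return moment(y, 4, 0) - 3 * moment(y, 2, 0) ** 2

def c41(y):
    return moment(y, 4, 1) - 3 * moment(y, 2, 0) * moment(y, 2, 1)

def c42(y):
    return (moment(y, 4, 2) - abs(moment(y, 2, 0)) ** 2
            - 2 * moment(y, 2, 1) ** 2)

# Noise-free QPSK stream of 2048 symbols from {1, j, -1, -j}.
rng = np.random.default_rng(0)
y = np.array([1, 1j, -1, -1j])[rng.integers(0, 4, 2048)]
print(round(c40(y).real, 3), round(c42(y).real, 3))   # ≈ 1.0 and -1.0
```

For this (unrotated) QPSK constellation the theoretical values are C40 = 1 and C42 = −1, which the sample estimates approach as the stream lengthens; adding AWGN biases the raw moments, which is why the paper normalizes the features before classification.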

References Farhang, M., Dehghani, H., & Bahramgiri, H. (2011). Multi-receiver modulation
classification for satellite communications signals. In 2011 IEEE International
Conference on Signal and Image Processing Applications. https://doi.org/10.1109/
Abdelmutalab, A., Assaleh, K., & El-Tarhuni, M. (2016a). Automatic modulation
ICSIPA.2011.6144156
classification based on high order cumulants and hierarchical polynomial classifiers.
Fehske, A., Gaeddert, J., & Reed, J. H. (2005). A new approach to signal classification
Physical Communication, 21, 10–18. https://doi.org/10.1016/j.phycom.2016.08.001
using spectral correlation and neural networks. In First IEEE International
Abdelmutalab, A., Assaleh, K., & El-Tarhuni, M. (2016b). Automatic modulation
Symposium on New Frontiers in Dynamic Spectrum Access Networks, 2005. DySPAN
classification using hierarchical polynomial classifier and stepwise regression (Vol. 2016-
2005. (pp. 144–150). https://doi.org/10.1109/DYSPAN.2005.1542629.
Septe). https://doi.org/10.1109/WCNC.2016.7565127
Fontes, A. I. R., de M. Martins, A., Silveira, L. F. Q., & Principe, J. C. (2015). Performance
Abrahamzadeh, A., Seyedin, S. A., & Dehghan, M. (2007). Digital-Signal-Type
evaluation of the correntropy coefficient in automatic modulation classification.
Identification Using an Efficient Identifier. EURASIP J. Adv. Signal Process., 2007(1).
Expert Systems with Applications, 42(1), 1–8. https://doi.org/10.1016/j.
https://doi.org/10.1155/2007/37690
eswa.2014.07.023
Ahmadi, N. (2010). Using fuzzy clustering and TTSAS algorithm for modulation
GULDEMIR, H., & SENGUR, A. (2006). Comparison of clustering algorithms for analog
classification based on constellation diagram. Engineering Applications of Artificial
modulation classification. Expert Systems with Applications, 30(4), 642–649. https://
Intelligence, 23(3), 357–370. https://doi.org/10.1016/j.engappai.2009.05.006
doi.org/10.1016/j.eswa.2005.07.014
Alain, G., & Bengio, Y. (2014). What Regularized Auto-Encoders Learn from the Data-
Haring, L., Chen, Y., & Czylwik, A. (2010). Automatic Modulation Classification Methods
Generating Distribution Guillaume. Journal of Machine Learning Research, 15(1),
for Wireless OFDM Systems in TDD Mode. IEEE Trans. Commun., 58(9), 2480–2485.
3563–3593.
https://doi.org/10.1109/TCOMM.2010.080310.090228
Alharbi, H., Mobien, S., Alshebeili, S., & Alturki, F. (2012). Automatic modulation
Hazza, A., Shoaib, M., Alshebeili, S. A., & Fahad, A. (2013). An overview of feature-based
classification of digital modulations in presence of HF noise. Eurasip Journal on
methods for digital modulation classification (Vol. 1,, 1–6. https://doi.org/10.1109/
Advances in Signal Processing, 2012(1), 1–14. https://doi.org/10.1186/1687-6180-
ICCSPA.2013.6487244
2012-238
Ho, K. M., Vaz, C., & Daut, D. G. (2010). Automatic classification of amplitude,
Ali, A. K., & Erçelebi, E. (2019). An M-QAM Signal Modulation Recognition Algorithm in
frequency, and phase shift keyed signals in the wavelet domain. In In IEEE Sarnoff
AWGN Channel. Scientific Programming, 2019, 1–17. https://doi.org/10.1155/2019/
Symposium (pp. 1–6). https://doi.org/10.1109/SARNOF.2010.5469784
6752694
Ali, A., & Yangyu, F. (2017a). Automatic Modulation Classification Using Deep Learning Based on Sparse Autoencoders With Nonnegativity Constraints. IEEE Signal Processing Letters, 24(11), 1626–1630. https://doi.org/10.1109/LSP.2017.2752459
Ali, A., & Yangyu, F. (2017b). Unsupervised feature learning and automatic modulation classification using deep learning model. Physical Communication, 25, 75–84. https://doi.org/10.1016/j.phycom.2017.09.004
An, N., Li, B., & Huang, M. (2010). Modulation Classification of Higher Order MQAM Signals using Mixed-Order Moments and Fisher Criterion. In International Conference on Computer and Automation Engineering (ICCAE) (pp. 150–153). IEEE. https://doi.org/10.1109/ICCAE.2010.5451214
Boiteau, D., & Martret, C. L. (1996). Classification of linear modulations by mean of a fourth-order cumulant. In 8th European Signal Processing Conference (EUSIPCO 1996) (pp. 1–4). Trieste, Italy: IEEE.
Calvo, R. A., Partridge, M., & Jabri, M. A. (1998). A comparative study of principal component analysis techniques. In Proc. Ninth Australian Conf. on Neural Networks, 276–281.
Cheng, L., & Liu, J. (2013). Automatic Modulation Classifier Using Artificial Neural Network Trained by PSO Algorithm. Journal of Communications, 8(5), 322–329. https://doi.org/10.12720/jcm.8.5.322-329
Chugg, K. M., Long, C., & Polydoros, A. (1996). Combined Likelihood Power Estimation and Multiple Hypothesis Modulation Classification, 1137–1141. https://doi.org/10.1109/ACSSC.1995.540877
Dobre, O. A., Abdi, A., Bar-Ness, Y., & Su, W. (2007). Survey of automatic modulation classification techniques: Classical approaches and new trends. IET Communications, 1(2), 137–156. https://doi.org/10.1049/iet-com:20050176
Ebrahimzadeh, A., & Ghazalian, R. (2011). Blind digital modulation classification in software radio using the optimized classifier and feature subset selection. Engineering Applications of Artificial Intelligence, 24(1), 50–59. https://doi.org/10.1016/j.engappai.2010.08.008
El-Khamy, S. E., Elsayed, H. A., & Rizk, M. R. M. (2012). Neural Network for Classification of Multi-User Chirp Modulation Signals using Wavelet Higher Order Statistics. International Journal of Emerging Technology and Advanced Engineering, 2(8).
ETSI. (2009). Digital Video Broadcasting (DVB); modulation systems for Broadcasting, other broadband satellite applications (DVB-S2). Intellectual Property (Vol. 1).
Hossen, A., Al-Wadahi, F., & Jervase, J. A. (2007). Classification of modulation signals using statistical signal characterization and artificial neural networks. Engineering Applications of Artificial Intelligence, 20(4), 463–472. https://doi.org/10.1016/j.engappai.2006.08.004
Jordanov, I., Petrov, N., & Petrozziello, A. (2016). Supervised radar signal classification. In International Joint Conference on Neural Networks (IJCNN) (pp. 1464–1471). https://doi.org/10.1109/IJCNN.2016.7727371
Kharbech, S., Dayoub, I., Zwingelstein-Colin, M., & Simon, E. P. (2016). On classifiers for blind feature-based automatic modulation classification over multiple-input–multiple-output channels. IET Communications, 10(7), 790–795. https://doi.org/10.1049/iet-com.2015.1124
Ma, J., & Qiu, T. (2018). Automatic Modulation Classification Using Cyclic Correntropy Spectrum in Impulsive Noise. IEEE Wireless Communications Letters, 8(2), 440–443. https://doi.org/10.1109/LWC.2018.2875001
Mashor, M. Y. (1999). Some Properties of RBF Network with Applications to System Identification, 7(1), 1–37.
Mobasseri, B. G. (2000). Digital modulation classification using constellation shape. Signal Processing, 80(2), 251–277. https://doi.org/10.1016/S0165-1684(99)00127-9
Muller, F. C. B. F., Cardoso, C., & Klautau, A. (2011). A Front End for Discriminative Learning in Automatic Modulation Classification. IEEE Communications Letters, 15(4), 443–445. https://doi.org/10.1109/LCOMM.2011.022411.101637
Nandi, A. K., & Azzouz, E. E. (1998). Algorithms for automatic modulation recognition of communication signals. IEEE Transactions on Communications, 46(4), 431–436.
Norouzi, S., Jamshidi, A., & Zolghadrasli, A. R. (2016). Adaptive modulation recognition based on the evolutionary algorithms. Applied Soft Computing, 43, 312–319. https://doi.org/10.1016/j.asoc.2016.02.028
Puengnim, A., Thomas, N., Tourneret, J.-Y., & Vidal, J. (2010). Classification of linear and non-linear modulations using the Baum–Welch algorithm and MCMC methods. Signal Processing, 90(12), 3242–3255. https://doi.org/10.1016/j.sigpro.2010.05.030
Sarieddeen, H., & Dawy, Z. (2016). On the blind classification of parametric quadrature amplitude modulations (pp. 190–194). Beirut, Lebanon: IEEE. https://doi.org/10.1109/ACTEA.2016.7560137
Sengur, A. (2009). Multiclass least-squares support vector machines for analog modulation classification. Expert Systems with Applications, 36(3), 6681–6685. https://doi.org/10.1016/j.eswa.2008.08.066
Soares-Filho, W., Manoel de Seixas, J., & Pereira Caloba, L. (n.d.). Principal component analysis for classifying passive sonar signals. International Symposium on Circuits and Systems, 592–595. https://doi.org/10.1109/ISCAS.2001.921380
Wong, M. L. D., & Nandi, A. K. (2008). Semi-blind algorithms for automatic classification of digital modulation schemes. Digital Signal Processing, 18(2), 209–227. https://doi.org/10.1016/j.dsp.2007.02.007

Swami, A., & Sadler, B. M. (2000). Hierarchical digital modulation classification using cumulants. IEEE Transactions on Communications, 48(3), 416–429. https://doi.org/10.1109/26.837045
Tang, Z. B., Pattipati, K. R., & Kleinman, D. L. (1991). An Algorithm for Determining the Decision Thresholds in a Distributed Detection Problem. IEEE Transactions on Systems, Man and Cybernetics, 21(1), 231–237. https://doi.org/10.1109/21.101153
Ting, F. F., Tan, Y. J., & Sim, K. S. (2019). Convolutional neural network improvement for breast cancer classification. Expert Systems with Applications, 120, 103–115. https://doi.org/10.1016/j.eswa.2018.11.008
Weber, C., Peter, M., & Felhauer, T. (2015). Automatic modulation classification technique for radio monitoring. Electronics Letters, 51(10), 794–796. https://doi.org/10.1049/el.2015.0610
Wei, W., & Mendel, J. M. (2000). Maximum-Likelihood Classification for Digital Amplitude-Phase Modulations, 48(2), 189–193. https://doi.org/10.1109/ACSSC.1995.540876
Wong, M. L. D., & Nandi, A. K. (2004). Automatic digital modulation recognition using artificial neural network and genetic algorithm. Signal Processing, 84(2), 351–365. https://doi.org/10.1016/j.sigpro.2003.10.019
Xu, J. L., Su, W., & Zhou, M. (2010). Software-Defined Radio Equipped With Rapid Modulation Recognition. IEEE Transactions on Vehicular Technology, 59(4), 1659–1667. https://doi.org/10.1109/TVT.2010.2041805
Ye, H., Cao, F., Wang, D., & Li, H. (2018). Building feedforward neural networks with random weights for large scale datasets. Expert Systems with Applications, 106, 233–243. https://doi.org/10.1016/j.eswa.2018.04.007
Yuan, Y., Zhao, P., Wang, B., & Wu, B. (2016). Hybrid Maximum Likelihood Modulation Classification for Continuous Phase Modulations. IEEE Communications Letters, 20(3), 450–453. https://doi.org/10.1109/LCOMM.2016.2517007
Zhang, H., Bi, G., Razul, S. G., & See, C. M. S. (2014). Supervised modulation classification based on ambiguity function image and invariant moments. In Proceedings of the 2014 9th IEEE Conference on Industrial Electronics and Applications. https://doi.org/10.1109/ICIEA.2014.6931399
Zhang, W. (2014). Automatic Modulation Classification Based on Statistical Features and Support Vector Machine. In URSI General Assembly and Scientific Symposium (URSI GASS) (pp. 1–4). Beijing, China: IEEE. https://doi.org/10.1109/URSIGASS.2014.6929232
Zhou, L., & Man, H. (2013). Wavelet cyclic feature based automatic modulation recognition using nonuniform compressive samples (pp. 1–6). Las Vegas, NV, USA: IEEE. https://doi.org/10.1109/VTCFall.2013.6692456
Zhou, L., Sun, Z., & Wang, W. (2017). Learning to short-time Fourier transform in spectrum sensing. Physical Communication, 25, 420–425. https://doi.org/10.1016/j.phycom.2017.08.007
Zhu, Z., & Nandi, A. K. (2014). Blind Digital Modulation Classification Using Minimum Distance Centroid Estimator and Non-Parametric Likelihood Function. IEEE Transactions on Wireless Communications, 13(8), 4483–4494. https://doi.org/10.1109/TWC.2014.2320724
Zhu, Z., & Nandi, A. K. (2015). Automatic Modulation Classification: Principles, Algorithms and Applications (First edition). United Kingdom: John Wiley & Sons. https://doi.org/10.1007/978-1-4757-2469-1
Zhu, Z., Waqar, M. A., & Nandi, A. K. (2014). Genetic algorithm optimized distribution sampling test for M-QAM modulation classification. Signal Processing, 94(1), 264–277. https://doi.org/10.1016/j.sigpro.2013.05.024
