0% found this document useful (0 votes)

10 views

Machine Learning Based Models For Spectrum Sensing in Cooperative Radio Networks

Uploaded by

joao pedro pereira

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

10 views

Machine Learning Based Models For Spectrum Sensing in Cooperative Radio Networks

Uploaded by

joao pedro pereira

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 8

IET Communications

Special Section: Cognitive and AI-enabled Wireless and Mobile

Communications

Machine learning-based models for spectrum ISSN 1751-8628

Received on 14th September 2019

sensing in cooperative radio networks

Revised 2nd February 2020
Accepted on 12th March 2020
E-First on 15th May 2020
doi: 10.1049/iet-com.2019.0941
www.ietdl.org

Caio Henrique Azolini Tavares1, Jose Carlos Marinello1, Mario Lemes Proenca Jr1, Taufik Abrao1
1Department of Electrical Engineering, Londrina State University, Rod. Celso Garcia Cid - PR445, PO Box 10.011, CEP: 86057-970, Londrina,

PR, Brazil
E-mail: [email protected]

Abstract: In this study, the authors consider the application of machine learning (ML) models in cooperative spectrum sensing
of cognitive radio networks (CRNs). Based on a statistical analysis of the classic energy detection scheme, the probability of
detection and false alarm is derived, which depends solely on the number of samples and signal-to-noise ratio of the secondary
users. The channel occupancy detection obtained from the established analytical techniques such as maximum ratio combining
and AND/OR rules is compared to different ML techniques, including multilayer perceptron (MLP), support vector machine, and
Naive Bayes, based on receiver operating characteristic and area under the curve metrics. By using standard profiling tools,
they obtain the computational performance of the analysed models during the training phase, a critical step for operating in
CRNs. Ultimately, the results demonstrate that the MLP ML technique presents a better trade-off between training time and
channel detection performance.

1 Introduction Specifically, in the context of CR networks (CRNs), several

research papers related to ML for SS have been published. These
Cognitive radio (CR) systems are a proposed solution to the ML-based sensing techniques aim at detecting the availability of
spectrum scarcity problem in a radio frequency environment that frequency channels by formulating the process as a classification
aims to improve the overall spectrum utilisation. Several studies problem in which the classifier, supervised or unsupervised, has to
showed [1–3] that licensed spectrum bands are often not occupied decide between two states of each frequency channel: free or
by the licensed users, thus creating the opportunity for other occupied.
devices to access the unoccupied spectrum opportunistically. These Moreover, cooperative SS–CRNs have been performed by ML.
opportunistic devices denoted as secondary users (SUs) need to be The existing works can be classified into two main categories. The
able to sense the spectrum to assess the presence or absence of technique in the first category uses two steps. In the first step,
licensed users, denoted as primary users (PUs), either individually unsupervised ML techniques are used to analyse data and discover
or cooperatively. The idea of CRs was first introduced by Joseph the PU's patterns. In the second step, supervised ML techniques are
Mitola III in 1999 [4] but has been given much attention recently used to train the model with the data labelled in the first step [11].
due to the proposed heterogeneous nature of 5G networks [5–8]. For instance, a two-step ML model for SS can be constructed. In
Spectrum sensing in (SS–CRs) still poses a challenge for high- the first step, for instance, the K-means algorithm could be used to
performance and high-energy efficient systems since SS identify the state of the PU's presence. In the second step, support
performance is often proportional to the spectrum sensing period. vector machine (SVM) or other types of classifiers can be used to
In turn, SS–CR is an energy-consuming task that also degrades the attribute the new input data into one of the classes specified by the
spectral efficiency of the SUs since they need to spend time and K-means method used in the first step. Techniques of the second
energy on a task that does not result in transmitted bits. category assume that the classes are known, and they are based on
Machine learning (ML) has received increasing interest and supervised ML to train models. In the current work, we follow the
found application in many fields recently. ML is a way of second category approach. For example, existing works in the
programming computers to optimise a performance criterion using literature that use only one step in which supervised ML classifiers,
example data or experience [9]. Such interest is due to its ability to such as K-nearest neighbour, SVM, Naive Bayes (NB), and
apply complex calculations to evaluate and interpret patterns and to decision tree, are applied [11].
its ability to interpret patterns and structures in data, enabling Since the task of determining the channel status based on SS is
learning, reasoning, and decision making. This apparent self- due to its nature a classification task, several authors have
learning characteristic associated with ML techniques is by itself considered the use of ML models as inference tools. In [12], the
mostly based on applied statistics, whereas the training and authors propose and compare the performance of several
inference capabilities owe their efficiency to great computer supervised and unsupervised ML techniques for cooperative SS
science algorithms. purpose, such as SVMs, the K-means clustering algorithm and
There are several networking open problems being treated Gaussian mixture model, but do not provide a comparison of
under the ML perspective, including (a) traffic prediction; (b) detection performance over different scenarios of interest, such as
traffic classification; (c) traffic routing; (d) congestion control, considering distinct training set sizes or different channel scenarios
including important issues such as queue management, congestion of practical interest. In [13], the authors study the use of ML
inference, and packet loss classification; (e) resource management, algorithms for spectrum occupancy in CRNs, which include the
which comprises admission control and resource allocation Naive Bayesian classifier. In [14], the authors propose user
policies; (f) fault management; (g) quality of service and quality of grouping algorithms to improve SS results and SVM training time
experience management; (h) network security, aggregating and in [15] the authors enumerate the pros and cons of several
anomaly, and intrusion detection; among others. A comprehensive unsupervised and supervised ML techniques applied to SS, such as
survey on ML applied to networking is discussed in [10]. the requirement of data labelling for supervised models and the risk
of overfitting. Shah and Koo [16] proposed a centralised SS–CRN

IET Commun., 2020, Vol. 14 Iss. 18, pp. 3102-3109 3102

© The Institution of Engineering and Technology 2020
17518636, 2020, 18, Downloaded from https://ietresearch.onlinelibrary.wiley.com/doi/10.1049/iet-com.2019.0941 by UFF - Universidade Federal Fluminense, Wiley Online Library on [10/09/2024]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
scheme based on K-nearest neighbour. In the training phase, each where gi is the fading component, Di is the Euclidean distance
CR user produces a sensing report under varying conditions and between the PU and the ith SU and α is the path-loss exponent,
based on a global decision either transmit or stays silent. The local which we consider as 4, indicating a non-line-of-sight channel
decisions of CR users are combined through a majority voting at environment.
the fusion centre and a global decision is returned to each CR user, At the end of the SS period, the estimated normalised energy
implying in a spectral overhead. level at the ith SU is given by
The SVM-based cooperative SS model with the user grouping
method is discussed in [14]. User grouping procedures reduce K
1
cooperation overhead and effectively improve detection yi = ∑ zi(k)2
σn2 k = 1
(3)
performance. Hence, users in CRN are grouped before the
cooperative sensing process using energy data samples and a
proper ML model. The authors compare three grouping methods, Under hypothesis H0, zi(k) = ni(k) ∼ N(0, σn2 ), thus, yi will
the first divides normal and abnormal users into two groups, while follow a central Chi-squared distribution with K degrees of
the second grouping algorithm distinguishes redundant and non- freedom
redundant users, and the third grouping algorithm selects users
within a subset that minimise average correlation. The K
ni(k) 2 K
performances of the three algorithms were quantified in terms of yi = ∑ σn
= ∑ zi(k)^ 2
where z^i(k) ∼ N(0, 1)
the average training time, classification speed, and classification k=1 k=1 (4)
accuracy. ∴ yi ∼ XK 2

Finally, in [17], a low-dimensional probability vector is

proposed as the feature vector for ML-based classification, instead On the other hand, under hypothesis H1, the signal at the ith SU
of the high-dimensional energy vector in a CRN with a single PU
will be the composition of the PU signal scaled by the channel gain
and N SUs. Such a method down-converts a high-dimensional
plus the Gaussian receiver noise. If we assume the transmitted PU
feature vector to a constant two-dimensional feature vector for ML
techniques while keeping the same SS performance. Owing to its signal as zero-mean Gaussian random variable with variance σs2,
lower dimension, the probability vector-based classification then the estimated energy level will follow a Gamma distribution
requires a smaller training duration and a shorter classification with shape K /2 and scale 2(1 + γi)
time.
K 2
hi x(k) + ni(k)
1.1 Contributions
yi = ∑ σn
k=1
This work considers the application of supervised ML models to K
hi2σs2
the task of channel status inference based on cooperative energy = ∑ zi(k)
^ 2
, where z^i(k) ∼ N 0, 1 + (5)
σn2
detection SS. By considering the a posteriori probability of k=1
channel occupancy, we aim to obtain a clear trade-off K
characterisation between computational complexity and ∴ yi ∼ Γ , 2(1 + γi)
2
classification performance for each model when compared to
traditional analytical cooperative SS–CRN techniques, while also where γi is the signal-to-noise (SNR) ratio given by γi = hiσs /σn 2 .
raising relevant issues on the system implementability. On the conventional energy detection scheme, each SU infers
the spectrum occupancy status by comparing the normalised sensed
1.2 Paper organisation energy level to a given threshold λ
The remainder of this paper is organised as follows. Section 2
discusses the system model for the CRN and the statistics of the H1, if yi ≥ λ
si = (6)
PU signal for energy detectors. Section 3 includes the modelling of H0, if yi < λ
well-known analytical models for cooperative SS. Section 4
explores the application of supervised ML techniques. Section 5 The probability of false alarm is defined as Pfa = P(y ≥ λ H0).
shows comparative results of the techniques through Monte–Carlo Likewise, the probability of detection is defined as
simulations. Finally, Section 6 offers remarks and main Pd = P(y ≥ λ H1). Hence, given the statistics of hypothesis H0, we
conclusions. can write Pfa for the ith SU as the right-tail probability of a central
Chi-squared random variable
2 System model
∞
We consider a CRN with one PU and n SUs, where each SU
employs energy detection for SS with sensing period τ and sensing
Pfai = ∫λ
f (yi H0) dy ≜ QX2N (λ) (7)
bandwidth w. The sampling frequency is considered to be at the
Nyquist rate, f s = 2w. Thus, the SU acquires K = 2wτ samples at In turn, the threshold parameter λ can be obtained from (7) by
each sensing period. fixing a target false alarm probability (Pfa∗) as
We can write the signal at the ith SU according to two
hypotheses −1
λ = QX2N (Pfa∗) (8)
H1 : zi(k) = hi x(k) + ni(k), if the PU is active
(1) Using the definition of the incomplete Gamma function, (8) can be
H0 : zi(k) = ni(k), otherwise rewritten as [18]

where hi is the channel coefficient from the PU to the ith SU, x(k) K
λ = 2Γ−1 (9)
∗
u Pfa,
is the transmitted PU signal, and ni(k) is the noise at the ith SU 2
receiver considered to be a zero-mean Gaussian random variable
with variance σn2 . where Γu(x, n) is the upper incomplete Gamma function, defined as
The channel coefficient hi is described by the path-loss and ∞
fading components Γu(x, n) =
1
Γ(x) ∫ n
t x − 1e− tdt (10)
α
hi = giDi
−
2 (2)

IET Commun., 2020, Vol. 14 Iss. 18, pp. 3102-3109 3103

∞
Pdi = ∫λ
f (yi H1) dy ≜ QΓ λ; K /2, 2(1 + γi) (11)

which, in turn, can be rewritten as

λ K
Pdi = Γu ,
2(1 + γi) 2 (12)

3 Conventional SS–CRN techniques Fig. 1 Example of a CR scenario training phase. The SUs provide the
fusion center with y ∈ ℝM × 1 energy samples, while the PU provides the
So far, the discussed methodology for the ith SU to assess the
corresponding channel status vector d
presence or absence of a PU involves only the estimated energy
level on the ith SU. Now, we shall enumerate and briefly analyse
classical analytical methods of cooperative SS, where the estimated
energy level on all SUs are transmitted through a service layer to a
fusion centre, which will apply deterministic decision rules aiming
at deciding cooperatively whether a PU is present or not by
combining the SU individual decisions. In the following, the three
main deterministic decision rules for SS cooperative networks,
namely AND, OR, and maximum ratio combining (MRC) rules are
revisited.

3.1 AND rule

In the AND rule, the fusion centre decides the occupancy state of
the channel by comparing the sensed energy level of each SU to the
threshold λ from (9) and then applying the AND logical operation
on the result, as shown below

^ 1, if (s^1 ⊙ s^2 ⊙ …s^N ) = 1

S= (13)
0, otherwise

where ⊙ denotes the logical AND operator.

Fig. 2 Example of non-separable ℝ3 input space. Each dimension is the
3.2 OR rule received energy level on a SU
The OR rule is similar to the AND rule, in which it compares the
energy level from each SU to λ and then applies the logical OR The primary motivation for using ML models in SS is the ability to
operation on the result operate in the absence of knowledge of the CR network
parameters, such as the signal-to-noise ratio γi in SUs and the a
^ 1, if (s^1 ⊕ s^2 ⊕ …s^N ) = 1 priori hypothesis probabilities P(H0) and P(H1). The only
S= (14) requirement in the case of supervised learning methods is a set of
0, otherwise
labelled energy samples in order to train the models. In practical
terms, this means that the PUs need to cooperate with the training
where ⊕ denotes the logical OR operator.
phase of the SUs by providing the channel status regularly. As
shown in Fig. 1, a simple CRN configuration with N = 2 SUs, one
3.3 Maximum ratio combining (MRC) PU, and one fusion centre is sketched.
The MRC technique estimates the channel status by combining the The general objective of a supervised ML model is to devise a
weighted energy levels obtained at each SU and communicated system that given a set of training inputs and the corresponding
with the fusion centre labelled output, such a system is able to infer on the output of an
unseen example or scenario with a certain level of confidence.
N In the context of channel status inference, this means that given
^
S=
H1, if ∑ wiyi ≥ λ (15)
a set Ytrain ∈ ℝM × N of M energy samples detected by N cooperative
i=1
SUs and a corresponding set d ∈ ℝM × 1 where dm ∈ {0; 1} of
H0, otherwise channel status, we want to obtain a hypothesis function
^
h(y): ℝN → ℝ, which can estimate the channel status S, i.e. for a
where wi is the weight for the ith SU energy level defined as the given ym, we want to be able to predict dm. For clarification
normalised average SNR over N cooperate nodes γ¯i, seen on each purpose, Fig. 2 shows the received energy level on three SUs and
SU, i.e. wi = γ¯i / ∑N γ¯i and λ is given by (8). the corresponding channel status for M = 104 samples. Clearly, the
This technique achieves optimum performance at the cost of classes are non-separable in input space.
increasing complexity and overall spectral efficiency reduction, Concretely, the model training procedure can be seen as
due to the requirement of estimating and transmitting the SNR and obtaining a hypothesis function such as the linear
exact energy level of each SU to a fusion centre (centralised
decision). h(y) = yw + b (16)

4 ML techniques
3104 IET Commun., 2020, Vol. 14 Iss. 18, pp. 3102-3109
© The Institution of Engineering and Technology 2020
17518636, 2020, 18, Downloaded from https://ietresearch.onlinelibrary.wiley.com/doi/10.1049/iet-com.2019.0941 by UFF - Universidade Federal Fluminense, Wiley Online Library on [10/09/2024]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
where w ∈ ℝN × 1 optimally separates the two classes or channel
status by minimising an error function E(h(Ytrain), d). The decision
region of (16) can also be interpreted as the hyperplane h(y) = 0.
With this in mind, we can write down the general goal of a
supervised ML problem:
M
1
M i∑
minimise E(h(yi), di) (17)
w, b =1

where w and b are the parameters of the hypothesis function in

(16), and E( ⋅ ) is the error function.
In simple terms, most supervised ML models differ only in
three aspects Fig. 3 General representation of a feed-forward neural network with one
hidden layer
(a) implementation of the error function E( ⋅ );
(b) formulation of the hypothesis function h(y); probability p(H1 y), whereas the probability p(H0 y) is given by
(c) optimisation methodology applied on E( ⋅ ). 1 − o. With this in mind, we can write the conditional distribution
of desired outputs as a Bernoulli distribution [19]
4.1 Multilayer perceptron (MLP)
p(d y, w) = od 1 − o 1−d
(23)
Neural networks are known as universal function approximators
and perhaps the most well-known ML technique. The neural where the set of all weights and biases have been combined into
network maps the input vector y to the output o through a set of vector w.
weighted non-linear functions. Given its relative simplicity and By taking the negative log likelihood of (23) and by considering
success in the context of pattern recognition [19], we consider a that the training set is composed of independent observations, the
MLP with one hidden layer, which is based on Rosenblatt error function becomes the cross-entropy error function for the
perceptron developed in [20]. input vector ym [19]
In the forward phase, an input signal flows from left to right of
the network. The signal value at the input of any given neuron j at M
layer ℓ can be written as E(w) = − ∑ Em(w)
m=1
(24)
a(jℓ) = ∑ wiℓj oiℓ( ) ( − 1)
(18) M
i = − ∑ dmln om + (1 − dm)ln(1 − om)
m=1
where oi0 = yi .
Similarly, the output of any neuron j is given by Since it is not possible to arrive at the optimum weight vector
analytically [19], we can resort to a numerical approach such as
o(jℓ) = σ a(jℓ) (19) stochastic gradient descent (25) in order to find w that minimises
(24)
where σ( ⋅ ) is the activation function, which has only two
w[τ + 1] = w[τ] − η∇Em(w[τ]) (25)
requirements: to be differentiable and non-linear. Often chosen to
be the logistic sigmoid function or hyperbolic tangent. In this
study, we chose the former where η > 0 is known as the learning rate parameter and controls
the step size of the update to the weight vector.
1 A prevalent method for obtaining the error function gradient on
σ(x) = (20) each neuron is known as back-propagation. Putting simply, we
1 + e− x
propagate an error signal from the output nodes to the hidden nodes
and apply a penalty to the associated weight according to the cost
Being specific, we consider a neural network with one hidden
function. For a network with Θ outputs and two layers (same as
layer, one output unit, i.e. Θ = 1 (since we are interested in binary
depicted in Fig. 3), the error on the output node θ is trivial
classification) and with the number of inputs equal to the number
of SUs, i.e. I = N. Also, we consider the number of neurons of the
δθ = oθ − dθ (26)
hidden layer to be equal to the number of inputs N, so J = N.
Therefore, we can write the output of any neuron in the hidden
layer as whereas on the jth hidden neuron

N δ j = σ ′ a(1)
j ∑ wθ j δθ
(2)
(27)
(1)
oj = σ ∑ wi j yi(1)
(21) θ
i=0
where σ ′( ⋅ ) is the derivative of the activation function σ( ⋅ ).
and similarly for the output neurons Once we have the errors for every neuron, we can easily obtain
the derivative
N
oθ(2) = σ ∑ w jθ o j
(2) (1)
(22) ∂Em
j=0 = δ joi (28)
∂wi j
in both cases, y0 and o0(1) are known as bias inputs and are equal to Finally, once the neural network is trained, the channel status
1. They are necessary to shift the activation function away from the inference for an unseen example is made based on
origin.
By considering a training vector of desired binary outputs H1, if o ≥ 1 − Pfa∗
^
d ∈ ℝM × 1 with dm ∈ {0; 1} for M input vectors y, we can interpret Smlp = (29)
H0, otherwise
the output o of the neural network (22) as the conditional

IET Commun., 2020, Vol. 14 Iss. 18, pp. 3102-3109 3105

Furthermore, by considering that f (yi H j) is Gaussian, f (y H j)

becomes a multivariate Gaussian distribution with a diagonal
covariance matrix. Indeed, as the number of samples K increases,
zi(k) under hypothesis H0 can be well approximated by a Gaussian
distribution with mean K and variance 2K. Likewise, zi(k) under Fig. 4 Illustration of a SVM decision plane. The optimal hyperplane
hypothesis H1 can be well approximated by a Gaussian distribution separates the two classes (filled circles and empty circles) at half distance
with mean K(1 + γi) and variance 2K(1 + γi)2. of the maximum margin defined by the support vectors
Considering a training set with M samples of y and d, the NB
model can estimate the pdf parameters of each hypothesis by dm = − 1. Moreover, one can apply (36) to maximise the decision
applying the maximum likelihood principle margin of the hyperplane h(y) = 0. By taking into account that the
distance of any point y to the decision surface is given by
{H1 0}
M
1
μ^ i
{H1 0}
= ∑ yi[m]{H1 0} dm wTϕ(ym) + b
(37)
M {H1 0} m=1 ∥w∥
{H }
(31)
^ {H1 0}
M 10
1 {H1 0} 2
σi2 = ∑ yi[m]{H1 0} − μ^ i we choose to define that for the closest point to the decision
M {H1 0} − 1 m=1 surface, dm(wTϕ(ym) + b) = 1. Thus, the following optimisation
problem can be formulated [21]:
where superscript {H1 0} refers to the section of the training set,
which comprises hypothesis H1 or H0. 1
After the training phase, the NB classifier can estimate the minimise ∥ w ∥2
w, b 2 (38)
channel occupancy a posteriori probability given the detected
energy y through Bayes theorem s.t. dm(wTϕ(ym) + b) ≥ 1, m = 1, …, M

f (y H1)P(H1) where the constraint in (38) guarantees that every sample is

P(H1 y) = (32) correctly classified. The points ym for which this constraint is met
f (y H1)P(H1) + f (y H0)P(H0)
with the equality sign are called support vectors. The support
where P(H1) and P(H0) are the a priori probabilities of each vectors completely characterise the SVM model, while the rest of
hypothesis, estimated as follows: the training samples are entirely irrelevant.
Notwithstanding, (38) assumes that the classes can be perfectly
M {H1} separated in feature space, which is not always true in our SS
P(H1) = (33) context. To address the case of overlapping classes, we can modify
M
the constraint by introducing the slack variable δm and an
additional overlap budget ξ. Therefore, (38) can be rewritten as
M {H0} (34)
P(H0) = [12, 19]
M
1
where M {Hi}, i = 0, 1 is the number of the ith hypothesis minimise ∥ w ∥2
w, b 2
occurrences.
Finally, the NB channel status is inferred based on the s.t. (c.1) dm(wTϕ(ym) + b) ≥ 1 − δm, m = 1, …, M
(39)
evaluation (c.2) δm ≥ 0, m = 1, …, M
M
^
Snb =
H1, if P(H1 y) ≥ 1 − Pfa∗
(35)
(c.3) ∑ δm ≤ ξ
m=1
H0, otherwise
where ξ is a constant used to control the trade-off between
4.3 Support vector machine (SVM) minimising training errors and controlling the model complexity
The SVM is a classification technique aiming at finding a linearly (which can be used as a heuristic to avoid overfitting).
separable hyperplane with a maximum margin between the classes The problem in (39) is quadratic with linear constraints and the
by applying a kernel function κ(x, x′) to the input vector in order to following Lagrange primal function [21]:
increase its dimension from input space to feature space (see Fig. M
4). It is worth noting that for the SVM, the desired channel status ℒ(w, b, δ, α, μ) =
1
∥ w ∥2 + ξ ∑ δm
needs to be dm ∈ { − 1; 1}, representing hypotheses H0 and H1, 2 m=1
respectively. M M
(40)
Therefore, given the following linear model: − ∑ αm dmh(ym) − 1 + δm − ∑ μmδm
m=1 m=1
h(y) = wTϕ(y) + b (36)
where αm and μm are Lagrange multipliers.
where ϕ( ⋅ ) is a feature-space transformation function, our goal is By setting the derivatives to zero, we have
to find w and b such that h(ym) > 0 if dm = 1 and h(ym) < 0 if

3106 IET Commun., 2020, Vol. 14 Iss. 18, pp. 3102-3109

M
where
∂ℒ
∂b
=0→ ∑ αmdm = 0 (42)
1 dm + 1
m=1 pm = , tm = .
1+e Ah( ym) + B 2
∂ℒ
= 0 → αm = ξ − μm (43)
∂δm Finally, the channel status inference using the SVM approach
results
From (40)–(43) we have the following set of Karush–Kuhn–
Tucker conditions:
^
^ H1, if P(h(y)) ≥ 1 − Pfa∗
Ssvm = (50)
H0, otherwise
αm ≥ 0 (44a)

dmh(ym) − 1 + δm ≥ 0 (44b) 5 Numerical results

To assess the performance of the models mentioned above on
αm dmh(ym) − 1 + δm = 0 (44c) channel status inference, we ran Monte–Carlo simulations with
5 × 104 realisations, considering a scenario with one PU and 3 SUs
μm ≥ 0 (44d) depicted at Fig. 5, as well as the parameters on Table 1 under
additive white Gaussian network (AWGN) and Rayleigh flat fading
δm ≥ 0 (44e) channels.
We compared the ML methods with traditional analytical
μmδm = 0 (44f) methods AND, OR, and MRC. For the SVM, we have considered both
the linear and Gaussian kernel functions. As for the MLP, we
By substituting (41)–(44a)–(44f) into (40) we can obtain the considered a network of one hidden layer with a size equal to the
dual problem w.r.t. the support vector number of inputs and only one output.

M M M 5.1 SS performance under AWGN channels

∑ αm − 12 ∑ ∑ αmαndmdnκ(ym, yn)
~
ℒ(α) = (45)
m=1 m=1n=1 For each model, we evaluated the receiver operating characteristic
(ROC) curve, depicted in Figs. 6a and b. By visual inspection, we
where κ(ym, yn) = ϕ(ym)Tϕ(yn) is a kernel function such as the can notice the upper bound on performance defined by the MRC
technique, followed closely by the SVM with a linear kernel.
linear ymT yn. A kernel function, in turn, can be formally defined as
Alongside the cooperative techniques, we have the plot of the
in [22]: a function that computes the inner product of the images ROC curves obtained by individual energy detection on each SU.
produced in the feature space under the embedding ϕ of two data Owing to the PU distance differences, the average SNR level
points in the input space. obtained were γ¯1 ≃ − 2 dB, γ¯2 ≃ − 9 dB, and γ¯3 ≃ − 14 dB. This
Finally, one can formulate the following optimisation problem difference becomes apparent on the channel detection performance
[12, 21, 23]: displayed by each SU.
~
maximise ℒ(α)
α
s.t. (c.1) 0 ≤ αm ≤ ξ
(46)
M
(c.2) ∑ αmdm = 0
m=1

which can be solved using standard quadratic programming

techniques.
By defining α⋆ as the solution to the dual problem (46) and b⋆
as the solution of the primal problem (39), as well as modifying
(36) to be expressed in terms of α⋆, we can obtain the output of the Fig. 5 CRN scenario for evaluation purpose
SVM to an unseen example according to
M Table 1 Adopted system parameters in the Monte–Carlo
h(y) = ∑ αmdmκ(y, ym) + b
⋆ ⋆
(47) simulations
m=1
Parameters Value
It is worth noting that in order to predict the output for an bandwidth w = 5 MHz
unseen example, the SVM retains only the support vectors of the sampling frequency f s = 10 MHz
training data set, i.e. where αm is non-zero. noise power spectral density η0 = − 152 dBm/Hz
Once we obtain the SVM output h(y), we can decide the PU active probability P(H1) = 0.5
channel status by first converting the output to the estimated a
^ PU transmission power σs2 = 0.1 mW
posteriori probability P(H1 y) as [24]
SU1 → PU distance 500 m
^ 1 SU2 → PU distance 750 m
P(h(y)) = Ah( y) + B (48) SU3 → PU distance 1000 m
1+e
sensing time-interval τ = 5 μs
where parameters A and B can be found by minimising the negative number of samples K = 2wτ = 50
log-likelihood function of the training data training data set size κ ∈ [50; 100; 250; 500; 1000]

IET Commun., 2020, Vol. 14 Iss. 18, pp. 3102-3109 3107

Table 2 AUC results for AWGN and Rayleigh channels Fig. 7 ROC curves for the different techniques under Rayleigh fading
Technique AUC AWGN AUC Rayl. channel
(a) ROC curve, (b) Zoom into the ROC 0.9Pd / 0.1Pfa interest region
AND 0.7186 0.7370
OR 0.9302 0.9256
MRC 0.9616 0.9240
NB 0.9594 0.9244
SVM-linear 0.9613 0.9360
SVM-Gaussian 0.9604 0.9327
MLP 0.9609 0.9359
Bold values indicate the high performance for the SS method, i.e. AUC > 0.96
(AWGN channels) or AUC > 0.93 (Rayleigh channels).

To better evaluate the results of each model, we also obtained

the area under the curve (AuC) metric in Table 2. Hence, from Fig.
6 and Table 2 we notice the ML methods perform much better than
the more straightforward AND or OR techniques with results very
close to the MRC centralised SS technique.

5.2 SS performance under Rayleigh fading channels

Fig. 8 Variation of AUC results for different κ training samples for the SS–
On the other hand, when considering a flat Rayleigh fading CRNs operating under Rayleigh channels
channel, the ML techniques were able to attain better performance
than MRC, as can be seen in Fig. 7 and Table 2, because MRC
5.3 Complexity
considers the average SNR level over each SU, which, under a flat
Rayleigh fading channel, varies for each sensing period. Indeed, To compare the computational burden complexity of each
from Table 2, one can notice that the MLP and linear-SVM have technique, we obtained the time spent during the training and
achieved the highest AUC values, similarly from the Gaussian inference phase for a data set of κ = 500 training samples,
channel results. averaged over 20 rounds. Considering that all models output a
To assess the effects of the number of training samples on the posteriori probability of channel occupancy, while the inference
final AUC metric under Rayleigh channel, we variated training set phase has significantly less computation time compared to the
size from 50 to 1000 samples. Fig. 8 shows the resulting for the training phase. Table 3 summarises the time spent for each SS
AUC variation. Clearly, all analysed ML techniques benefit from the technique in both training and inference phases of the analysed
increase of the training set size, with the sharpest difference CRN scenario of Fig. 5.
between 50 and 500 samples. It is apparent that the suitable As a drawback of increasing the size of the training set, Fig. 9
performance-complexity trade-off occurs for κ = 100 training shows the increase in time spent during the training phase for the
samples for the SVM Linear and Gaussian and κ = 250 training techniques. It is worth noting that only the NB model remains with
samples for the MLP and NB ML techniques. almost constant training time for increased training sets.

3108 IET Commun., 2020, Vol. 14 Iss. 18, pp. 3102-3109

8 References
[1] Chen, Y., Oh, H.-S.: ‘A survey of measurement-based spectrum occupancy
modeling for cognitive radios’, IEEE Commun. Surv. Tutor., 2014, 18, (1), pp.
848–859
[2] Sun, Z., Bradford, G.J., Laneman, J.N.: ‘Sequence detection algorithms for
PHY-layer sensing in dynamic spectrum access networks’, IEEE J. Sel. Top.
Signal Process., 2010, 5, (1), pp. 97–109
[3] FCC: ‘Spectrum policy task force’. ET Docket No. 02-155, November 2002
[4] Mitola, J., Maguire, G.Q.: ‘Cognitive radio: making software radios more
personal’, IEEE Pers. Commun., 1999, 6, (4), pp. 13–18
[5] Tseng, F.-h., Chou, L.-d., Chao, H.-c., et al.: ‘Ultra-dense small cell planning
using cognitive radio network toward 5G’, IEEE Wirel. Commun., 2015, 22,
(6), pp. 76–83
[6] Jia, M., Gu, X., Guo, Q., et al.: ‘Broadband hybrid satellite-terrestrial
communication systems based on cognitive radio toward 5G’, IEEE Wirel.
Commun., 2016, 23, (6), pp. 96–106
[7] Chae, S.H., Jeong, C., Lee, K.: ‘Cooperative communication for cognitive
Fig. 9 Time spent during training phase for different sizes of training sets satellite networks’, IEEE Trans. Commun., 2018, 66, (11), pp. 5140–5154
[8] Li, B., Fei, Z., Chu, Z., et al.: ‘Robust chance-constrained secure transmission
The computational metrics were evaluated on a consumer for cognitive satellite–terrestrial networks’, IEEE Trans. Veh. Technol., 2018,
67, (5), pp. 4208–4219
laptop with Intel i7 5500U @2.4 GHz and 16 GB DDR3 @1600 [9] Alpaydin, E.: ‘Introduction to machine learning’ (The MIT Press, USA, 2014,
MHz RAM. Based solely on the results from Table 3 and the AUC 3rd edn.)
metrics from Table 2, the MLP achieves the best performance- [10] Boutaba, R., Salahuddin, M.A., Limam, N., et al.: ‘A comprehensive survey
computational complexity trade-off operating in both AWGN or on machine learning for networking: evolution, applications and research
opportunities’, J. Internet Serv. Appl., 2018, 9, (1), p. 16. Available at https://
Rayleigh channels when compared to the other ML models. doi.org/10.1186/s13174-018-0087-2
Nonetheless, it is worth noting that besides time spent training, [11] Arjoune, Y., Kaabouch, N.: ‘A comprehensive survey on spectrum sensing in
the ML techniques have the requirement of cooperation from the cognitive radio networks: recent advances, new challenges, and future
PU to provide the fusion centre with the channel status vector, research directions’, Sensors, 2019, 19, (126), pp. 1–32. Available at https://
doi.org/10.3390/s19010126
which must be taken into account, a system complexity of [12] Thilina, K.M., Choi, K.W., Saquib, N., et al.: ‘Machine learning techniques
implementing such models. for cooperative spectrum sensing in cognitive radio networks’, IEEE J. Sel.
On the other hand, while the MRC technique provides almost Areas Commun., 2013, 31, (11), pp. 2209–2221
instantaneous channel status inference based on energy samples, it [13] Azmat, F., Chen, Y., Stocks, N.: ‘Analysis of spectrum occupancy using
machine learning algorithms’, IEEE Trans. Veh. Technol., 2015, 65, (9), pp.
requires the SUs transmit to the fusion centre in a completely 6853–6860
centralised way their estimated SNR, which can also pose a further [14] Li, Z., Wu, W., Liu, X., et al.: ‘Improved cooperative spectrum sensing model
challenge for implementation while reducing the overall spectral based on machine learning for cognitive radio networks’, IET Commun.,
efficiency of the CRN. 2018, 12, (19), pp. 2485–2492
[15] Bkassiny, M., Li, Y., Jayaweera, S.K.: ‘A survey on machine-learning
techniques in cognitive radios’, IEEE Commun. Surv. Tutor., 2012, 15, (3), pp.
6 Discussion and final remarks 1136–1159
[16] Shah, H.A., Koo, I.: ‘Reliable machine learning based spectrum sensing in
In this study, we compared the use of ML models, namely the cognitive radio networks’, Wirel. Commun. Mob. Comput., 2018, 2018, p. 17,
SVM with linear and Gaussian kernel functions, a feed-forward no. ID 5906097. Available at https://doi.org/10.1155/2018/5906097
[17] Lu, Y., Zhu, P., Wang, D., et al.: ‘Machine learning techniques with
neural network with one hidden layer and NB method with those probability vector for cooperative spectrum sensing in cognitive radio
well-established analytical models in the context of SS problem for networks’. 2016 IEEE Wireless Communications and Networking Conf.,
CRs. Doha, Qatar, April 2016, pp. 1–6
Numerical results demonstrated that the studied models are [18] Umar, R., Sheikh, A.U.H., Deriche, M.: ‘Unveiling the hidden assumptions of
energy detector based spectrum sensing for cognitive radios’, IEEE Commun.
proved suitable for the task of channel status inference from energy Surv. Tutor., 2013, 16, (2), pp. 713–728
samples obtained from multiple SUs. Based on the ROC curve and [19] Bishop, C.M.: ‘Pattern recognition and machine learning’ (Springer, USA,
AUC metrics, we conclude that all ML models performed closely to 2006)
the optimum MRC analytical technique under AWGN channels. [20] Rosenblatt, F.: ‘The perceptron: a probabilistic model for information storage
and organization in the brain’, Psychol. Rev., 1958, 65, (6), pp. 386–408
Interestingly enough, under Rayleigh flat fading channels, all [21] Hastie, T., Tibshirani, R., Friedman, J.: ‘The elements of statistical learning’
ML techniques outperformed the MRC due to varying SNR levels (Springer Inc., New York, 2001, 2nd edn.)
on SUs over each sensing period. It becomes clear that MRC needs [22] Shawe-Taylor, J., Cristianini, N.: ‘Kernel methods for pattern analysis’
to perform channel response estimation in order to be able to (Cambridge University Press, New York, NY, USA, 2004)
[23] Haykin, S.S.: ‘Neural networks and learning machines’ (Pearson Education,
perform well on fading channels, which could be a challenge under USA, 2009, 3rd edn.)
CRNs. [24] Platt, J.C.: ‘Probabilistic outputs for support vector machines and
By using standard profiling tools, we were able to obtain comparisons to regularized likelihood methods’, in Alexander, J., Smola, P.B.,
computational performance metrics for each ML model evaluated Bernhard, S., et al. (Eds.): ‘Advances in large margin classifiers’ (MIT Press,
USA, 1999), pp. 61–74
during the training and inference phases. For small CRNs context,
the results demonstrated an advantage of MLP technique followed