Deep Learning for Synchronization and Channel Estimation in NB-IoT Random Access Channel
Author: Mads Helge Jespersen
Supervisor: Petar Popovski
Industry supervisor: Milutin Pajovic (Mitsubishi Electric Research Laboratories)
Copyright © Group 1050, Wireless Communication Systems 4th semester, Aalborg University 2019
This report is compiled in LATEX. Figures are made using Inkscape.
Department of Electronic Systems
Wireless Communication Systems
Aalborg University
http://www.aau.dk
Title: Deep Learning for Synchronization and Channel Estimation in NB-IoT Random Access Channel
Project Period: Spring Semester 2019
Project Group: Group 1050
Participant: Mads Helge Jespersen
Supervisor: Petar Popovski
Industry supervisor: Milutin Pajovic (Mitsubishi Electric Research Laboratories)
Number of Pages: 51
Date of Completion: June 6th, 2019
Abstract:
Effective decoding of wireless signals requires various parameter acquisition techniques
including user activity detection, synchronization, channel estimation, and channel equalization.
In traditional systems, these unknown, underlying parameters of the communication channel
are individually estimated. This work proposes a novel joint estimation process applying deep
learning. The proposed method shows superior performance to traditional methods and
is further able to find the multiplicity of collisions, handle synchronization and channel
estimation in the case of colliding non-orthogonal transmissions, and discover superior
preamble sequences using an auto-encoder structure. The proposed method is intended for
decoding transmissions at the base-station in a massive connectivity scenario with many
low-complexity devices operating concurrently. Excellent performance is demonstrated in
estimating Time-of-Arrival (ToA), Carrier-Frequency Offset (CFO), channel gain and collision
multiplicity from a received mixture of transmissions using the random access preamble
structure of the NB-IoT standard. The proposed estimation scheme, employing a convolutional
neural network (CNN), achieves a ToA Root-Mean-Square Error (RMSE) of 2.88 µs
and a CFO RMSE of 3.44 Hz at 10 dB Signal-to-Noise Ratio (SNR), whereas a conventional
estimator using two cascaded stages has RMSEs of 16.20 µs and 7.98 Hz, respectively.
The content of this report is freely available, but publication may only be pursued with reference.
PREFACE
This thesis is written as a part of the Master of Science in Engineering - Wireless
Communication Systems at Aalborg University, Denmark.
The author would like to thank Milutin Pajovic, Toshiaki Koike-Akino and Ye Wang
from Mitsubishi Electric Research Laboratories for their collaboration and supervision
during the work of this thesis.
For citations, the report employs the IEEE referencing method. Figures and tables
without a citation are made by the author of the thesis. All units are indicated
according to the SI system.
TABLE OF CONTENTS
Preface
Abbreviations
1 Introduction
1.1 Project scope
1.2 Machine learning in wireless physical layer
2 Narrowband IoT
2.1 Preamble design
2.2 Baseband model of Narrowband Physical Random Access CHannel
2.3 Connection establishment in NB-IoT
2.4 Traditional synchronization parameter estimation
3 Deep Learning Estimator
3.1 Deep learning basics
3.2 Estimation procedure
3.3 Estimation of the Number of Users
3.4 Parameter Estimation
3.5 Network Implementation
4 Results
5 Auto-encoder
5.1 Encoding
5.2 Decoding
6 Discussion and conclusion
6.1 Future improvements
6.2 The performance of DL in NB-IoT
6.3 Conclusion
Bibliography
Appendices
A Cramér–Rao lower bound
B Literature Study
C GlobeCom article
ABBREVIATIONS
1 INTRODUCTION
An emerging communication paradigm is to provide low-power and low-complexity
devices with wireless Internet connectivity which is commonly referred to as Internet of
Things (IoT). A challenging task in IoT development is to transform the existing human-
centered communication structure into an object-centered communication system. This
challenge involves redesigning many aspects of the protocol such as data representation
and even changing the physical layer.
It is common to divide the communication system into three distinct services: a
basic stable connection, low-power massive connectivity and reliable communication. In
the 5G standard these are classified as Enhanced Mobile BroadBand (eMBB), massive
Machine-Type Communication (mMTC) and Ultra Reliable Low Latency Communication
(URLLC) [1]. In the traditional view of IoT communication, devices are expected to
generate data packets in the order of bits that are transmitted sporadically, which is
consistent with the mMTC scenario. This project focuses only on low-power massive
connectivity. In order to save power, most devices will not be constantly connected
and are therefore required to establish a new connection for every data transmission
[2]. Further, the low complexity of devices means that short-range multi-hop
communication systems may not be suitable. For this reason, a cellular communication
structure is expected to play a significant role in IoT connectivity [3]. The performance
of the entire IoT system relies on which technology is used. The most prominent cellular
IoT technologies using unlicensed frequency bands are Sigfox and LoRa. Unlicensed
spectrum communication suffers from cross-technology interference and very limited
available bandwidth. Narrowband IoT (NB-IoT) is a standard proposed by the 3GPP to operate
in the licensed Long-Term Evolution (LTE) spectrum, meaning it is
backed by already-established infrastructure owners and is expected to impact the IoT
connectivity landscape [2, 4]. The potentially massive number of devices attempting to
gain communication access through the Base Station (BS) simultaneously may become
a system bottleneck [2]. Much attention must therefore be paid to reducing signaling
overhead and improving the efficiency of detection algorithms in the random access phase.
Random access is used to request uplink allocation from the base station and the random
access procedure has a high impact on device battery life and number of devices that
can be supported concurrently [5].
A typical random access procedure is initiated by a user that has a packet to transmit,
which transmits a random access preamble. The random access preamble is designed
such that the base station is able to efficiently detect the transmitting user and estimate
any timing offset between the user and base-station from the received signal. The timing
offset comprises propagation time, downlink synchronization errors and channel delay
spread [6].
NB-IoT is a recent standard proposed by the 3rd Generation Partnership Project (3GPP)
to accommodate the emerging number of wireless devices connected to the Internet and
is chosen as the example application in this project. NB-IoT is designed to co-exist with
LTE and provide low-cost and low-power devices with low throughput connectivity.
In NB-IoT random access there are only a relatively small number of different (i.e.,
orthogonal) random access preambles for users to choose from. Users attempting to
gain channel access at the same time choose preambles in an uncoordinated manner.
It is likely that two or more users choose the same preamble, given a possibly large
number of users and relatively small number of orthogonal preambles. Transmitting
with the same preamble results in colliding packets, in which case either both
transmissions or one of them is discarded. The resulting collision may lead to a user
back-off time of up to almost 9 minutes [7, 8]. The main purpose of the random access
preamble is to make the base station able to detect each transmitting user and estimate
the synchronization parameters, Time of Arrival (ToA) and Carrier Frequency Offset
(CFO), of each user [6, 9].
In order to avoid unnecessary backoff periods and consequently improve channel
utilization and overall capacity of the NB-IoT system, this project seeks to find a method
for multi-user detection and synchronization parameter estimation in the case of packet
collision. The non-coherent addition of colliding signals means that the received signal
is simply a superposition of the individual transmissions and still contains information
from each user. Multi-user detection is a method used to increase capacity by detecting
interference and exploiting it to mitigate its effect on the desired signal [10]. Optimal
multi-user detection methods use the Viterbi algorithm, whose complexity
increases exponentially with the number of users. Further, most methods require
exact channel information at the receiver [11]. The optimal multi-user detector is therefore
not used in practice; approximations such as Successive Interference Cancellation
(SIC) and turbo receivers are used instead [10]. Methods exist for multi-user detection,
but traditional methods have several drawbacks for the scope of this project:
• Optimal decoding has exponentially increasing complexity with the number of users
• The receiver requires channel information of each user
• Multi-user detection methods in literature are not developed for synchronization and channel estimation
• The receiver requires knowledge of the number of interfering signals
Deep Learning (DL)-based methods are well-suited for tackling algorithm deficit
problems [12] and can utilize non-linear relationships present in the signal to extract
the desired information. DL shows good results for source separation in speech pro-
cessing and is well suited for classification problems such as classifying the number of
interfering signals. DL algorithms are straightforward to develop and are typically not
computationally complex.
In summary, this project proposes a DL-based method for separating colliding users,
detecting their number and estimating their respective ToAs and CFOs. The pro-
posed method is validated using simulations which demonstrate significantly improved
performance compared to the conventional approaches.
The ToA and CFO are estimated using the residual phase difference between symbol groups and channel
hops in a two-stage procedure in [9]. With the goal to improve the ToA estimation, [13]
suggests a novel hopping pattern that renders more accurate ToA estimation compared
to that achieved with the already defined NB-IoT preamble.
The application of machine learning has shown promise in the physical layer where
optimal algorithms, e.g., in multi-user networks, tend to be computationally complex.
Neural networks have been previously employed to perform detection and successive
interference cancellation in multi-user CDMA systems [12]. For an overview of deep
learning applied to the wireless physical layer see Appendix B.
2 NARROWBAND IOT
NB-IoT is one of the most prominent cellular technologies to accommodate the emerging
number of wireless devices connected to the internet.
The system bandwidth for NB-IoT is only 180 kHz for both downlink and uplink.
The uplink supports both single tone transmission and multi-tone transmission. Multi-
tone transmission uses SC-FDMA with a sub-carrier spacing of 15 kHz and single-tone
transmission uses a subcarrier spacing of either 15 kHz or 3.75 kHz [6]. The following
description and implementations are limited to the case of single-tone transmission with
a 3.75 kHz subcarrier spacing yielding a total of 48 subcarriers and this project will focus
only on uplink.
NB-IoT specifies three physical layer channels: Narrowband Physical Uplink Shared
CHannel (NPUSCH), Narrowband Physical Downlink Shared CHannel (NPDSCH) and
Narrowband Physical Random Access CHannel (NPRACH). Particularly interesting for
low-complexity IoT devices is NPRACH, which
is used to request uplink allocation from the base station. Establishing a connection
using random access is a four-step procedure: first the user initiates a connection by
transmitting a random access preamble, then the base station transmits a response with
allocated radio resources, the user transmits its identity, and finally the base station
transmits a contention resolution to resolve potential colliding users using the same
preamble [13, 14].
The initial preamble sent by a user should provide enough information to the base station
such that the start of a frame can be precisely determined (ToA estimation) and any
CFO can be accounted for to improve symbol demodulation. The ToA estimation is sent
by the base station to the user to achieve uplink synchronization in the OFDMA system
[6].
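[Figure 2.1: NPRACH preamble structure. Each symbol group consists of a CP followed by ε = 5 symbols of 266.7 µs on a 3.75 kHz subcarrier; symbol groups m = 0, ..., 3 (L = 4) hop across 12 sub-carriers according to Ω(m).]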
The preamble format and packet structure are illustrated in Figure 2.1. The preamble
is divided into symbol groups, where each group consists of a Cyclic Prefix (CP) and ε
identical symbols. The value of ε depends on the preamble format. The preamble format
is chosen by the user, which measures the downlink power to estimate its coverage area [14].
The most common preamble is preamble format 1 with preamble frame structure 0 or 1,
which has ε = 5 and a symbol time T_SYM = 266.7 µs. The CP period is T_CP = 66.7 µs for
frame format 0 and T_CP = 266.7 µs for frame format 1 [15]. The CP is designed such
that it is long enough to cover the maximum round trip delay to suppress inter-symbol
interference. Therefore one interpretation of allowing adaptive CP selection is for the
user to use the short CP in the range 0-8 km and the long CP in the range 8-35 km [6].
The full preamble consists of 4 repetitions of the symbol group, which is again repeated
n = 2^J, J = 0, ..., 7 times for a full preamble length of L = 4 × 2^J symbol groups. The
repetition of symbol groups occurs within an uplink slot, and the number of repetitions
is decided by the upper Medium Access Control (MAC) layer depending on the estimated
link quality [15]. For simplicity, this project lets J = 2 for all transmissions, i.e., four
symbol groups are repeated 4 times.
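As a rough check of the numbers above, the following Python sketch computes the number of symbol groups and the resulting preamble duration for a given J and frame format. The function and constant names are illustrative only; the numerical values are those quoted in the text.

```python
# Preamble timing for the NPRACH, using the values quoted in the text.
T_SYM_US = 266.7                 # symbol duration in microseconds
SYMBOLS_PER_GROUP = 5            # epsilon: identical symbols per symbol group
T_CP_US = {0: 66.7, 1: 266.7}    # cyclic prefix duration per frame format


def preamble_length(J, frame_format=1):
    """Return (number of symbol groups L, preamble duration in microseconds)."""
    if not 0 <= J <= 7:
        raise ValueError("J must be in 0..7")
    repetitions = 2 ** J                  # n = 2^J
    num_groups = 4 * repetitions          # L = 4 * 2^J
    group_us = T_CP_US[frame_format] + SYMBOLS_PER_GROUP * T_SYM_US
    return num_groups, num_groups * group_us


groups, duration_us = preamble_length(J=2, frame_format=1)
print(groups, "symbol groups,", round(duration_us / 1000, 2), "ms")
```

With J = 2 this gives the 16 symbol groups used throughout the project.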
The user chooses a contiguous set of K = 12, 24, 36 or 48 subcarriers with 3.75 kHz
spacing out of the available 48 subcarriers. This project focuses on preamble frame
structure type 1 where K = 12.
At the start of the NPRACH preamble transmission, the subcarrier of the first symbol
group is chosen at random. After each symbol group the subcarrier will change using a
deterministic channel hopping sequence so in the duration of a preamble there will be
L subcarrier hops. Since the hopping pattern is deterministic, several users choosing
the same initial subcarrier will thus collide for the entirety of the NPRACH preamble
sequence. The number of orthogonal preamble sequences is therefore the number of
allocated NPRACH subcarriers, K [7].
The channel hopping scheme for frame structure type 1 and preamble format 0 is
defined as follows: [15]
where $\tilde{n}_{\mathrm{sc}}^{\mathrm{RA}}(i)$ is the frequency location of the ith symbol group. In the following the
symbol group index will be denoted m and the function which maps symbol groups to
a frequency channel is denoted Ω(m). The function c(n) is a pseudo random sequence
generator that is initialised with the base station ID. This means that the hopping
pattern is deterministic within a cell, but the subcarrier of every 4th symbol group
appears random to neighbouring cells. The above specification means there are two
“levels” of hopping as also illustrated in Figures 2.1 and 2.2. The hopping distance is
1 between symbol groups at m = 0 and m = 1, and between m = 2 and m = 3. The
hopping distance is always 6 between m = 1 and m = 2.
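The two hopping levels described above can be illustrated with a short Python sketch. It is not the specification's definition: the hop directions are an assumption chosen so that the index stays within the 12 subcarriers, and `random.Random` seeded with a hypothetical cell ID stands in for the pseudo-random sequence c(n).

```python
import random

NUM_SUBCARRIERS = 12  # K = 12 NPRACH subcarriers


def hopping_pattern(initial_subcarrier, num_groups, cell_id=0):
    """Return the subcarrier index Omega(m) for m = 0..num_groups-1.

    Assumptions (not taken from the specification text): hop directions keep
    the index in 0..11, and random.Random(cell_id) replaces c(n).
    """
    rng = random.Random(cell_id)
    omega = []
    current = initial_subcarrier
    for m in range(num_groups):
        if m == 0:
            current = initial_subcarrier
        elif m % 4 == 0:
            # Every 4th symbol group: pseudo-random offset shared by all
            # users of the cell, so the pattern stays deterministic per cell.
            offset = rng.randrange(NUM_SUBCARRIERS)
            current = (initial_subcarrier + offset) % NUM_SUBCARRIERS
        elif m % 4 == 2:
            # Second-level hop of distance 6.
            current = current + 6 if current < 6 else current - 6
        else:
            # First-level hop of distance 1 (m % 4 is 1 or 3).
            current = current + 1 if current % 2 == 0 else current - 1
        omega.append(current)
    return omega


pattern = hopping_pattern(initial_subcarrier=3, num_groups=16, cell_id=7)
print(pattern)
```

Because every step is a one-to-one mapping of the subcarrier index, two users only overlap if they pick the same initial subcarrier, in which case they overlap in every symbol group.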
Figure 2.2: Illustration of active channels and users in the scenario where 4 users all choose different initial subcarriers.
Color indicates user number where no color indicates an inactive channel slot. Horizontal axis is symbol index n
where 5 consecutive symbols correspond to a symbol group. The vertical axis is subcarrier index k.
The channel hopping procedure aids in the estimation of ToA and also reduces inter-
and intra-cell interference [6]. The ToA should be estimated by the base station for
successful uplink signal decoding and it further enables device positioning. Error in the
ToA estimation results in the user not being able to receive the response sent by the base
station. ToA estimation therefore has a great impact on performance in NB-IoT [13].
At the receiver the phase of each symbol depends on ToA, denoted τ , the CFO, denoted
∆f and the frequency of the user’s chosen channel with respect to the receiver’s uplink
carrier which is Ω(m). The time-domain signal at the receiver can be expressed as:
$$ s(t) = e^{-j 2\pi \hat{f}(t)\,(t - \tau(t))} \tag{2.1} $$
The received signal at the base station is a superposition of signals from multiple users,
given by
$$ y[n] = \sum_{k=0}^{K-1} a_k s_k[n] + w[n], \tag{2.3} $$
where K is the maximum number of concurrent users, $a_k \in \{0, 1\}$ indicates whether the
kth user is active or not, and $w[n] \sim \mathcal{CN}(0, 1/\rho_n)$ denotes the additive noise with a per
symbol Signal-to-Noise Ratio (SNR) of $\rho_n$.
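A minimal Python sketch of the superposition in Equation 2.3 is given below. The per-user signals are hypothetical unit-amplitude tones used only as placeholders for the actual preamble signals, and the function names are illustrative.

```python
import cmath
import math
import random


def complex_noise(num_samples, snr_linear, rng):
    """Draw w[n] ~ CN(0, 1/rho): each of the real and imaginary parts
    has variance 1/(2*rho)."""
    sigma = math.sqrt(1.0 / (2.0 * snr_linear))
    return [complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
            for _ in range(num_samples)]


def received_signal(user_signals, active, snr_linear, rng):
    """Compute y[n] = sum_k a_k * s_k[n] + w[n] for equal-length signals."""
    num_samples = len(user_signals[0])
    noise = complex_noise(num_samples, snr_linear, rng)
    return [
        sum(a * s[n] for a, s in zip(active, user_signals)) + noise[n]
        for n in range(num_samples)
    ]


rng = random.Random(0)
# Hypothetical placeholder tones standing in for the per-user signals s_k[n].
signals = [[cmath.exp(2j * math.pi * 0.01 * (k + 1) * n) for n in range(80)]
           for k in range(4)]
y = received_signal(signals, active=[1, 0, 1, 0], snr_linear=10.0, rng=rng)
print(len(y))
```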
The distance from the base station to the users d has the following Probability Density
Function (PDF) [16]:
$$ f_D(d) = \frac{2d}{R^2 - r^2}, \quad r \le d \le R, \tag{2.4} $$
which is used to model the ToA $\tau = d/c$, where c is the propagation speed.
The mean and the variance of the distance can then be found as:
$$ E[D] = \int_r^R d \cdot f_D(d)\,\mathrm{d}d = \frac{2(r^2 + r \cdot R + R^2)}{3(r + R)} \approx 24.326~\text{km} \tag{2.5} $$
$$ \mathrm{Var}(D) = \int_r^R d^2 \cdot f_D(d)\,\mathrm{d}d - E[D]^2 = \frac{(r - R)^2 (r^2 + 4r \cdot R + R^2)}{18(r + R)^2} \approx 52.767~\text{km}^2 \tag{2.6} $$
$$ E[\tau] = \frac{24.326~\text{km}}{3 \times 10^8~\text{m/s}} = 81.1~\mu\text{s} \tag{2.7} $$
The distribution of the ToA represents a realistic cellular system but is only valid
for stationary transmitters. A more complete model will include the dynamics of ToA
originating from channel sampling time offset and delay spread [6] as well.
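The closed-form mean and variance of the distance can be checked numerically with the sketch below. The radii r = 8 km and R = 35 km are assumptions: they are not stated next to the equations, but they reproduce the quoted mean and match the long-CP range mentioned in the preamble description. Samples are drawn by inverting the cumulative distribution function.

```python
import math
import random

R_INNER_KM = 8.0     # assumed inner radius r
R_OUTER_KM = 35.0    # assumed outer radius R
C_M_PER_S = 3e8      # propagation speed


def mean_distance(r, R):
    """Closed-form E[D] for the density f_D(d) = 2d / (R^2 - r^2)."""
    return 2 * (r**2 + r * R + R**2) / (3 * (r + R))


def var_distance(r, R):
    """Closed-form Var(D) for the same density."""
    return (r - R) ** 2 * (r**2 + 4 * r * R + R**2) / (18 * (r + R) ** 2)


def sample_distance(r, R, rng):
    """Inverse-CDF sampling, using F(d) = (d^2 - r^2) / (R^2 - r^2)."""
    u = rng.random()
    return math.sqrt(r**2 + u * (R**2 - r**2))


rng = random.Random(0)
samples = [sample_distance(R_INNER_KM, R_OUTER_KM, rng) for _ in range(100_000)]
empirical_mean = sum(samples) / len(samples)

mean_km = mean_distance(R_INNER_KM, R_OUTER_KM)
var_km2 = var_distance(R_INNER_KM, R_OUTER_KM)
mean_toa_us = mean_km * 1e3 / C_M_PER_S * 1e6
print(round(mean_km, 3), round(var_km2, 2), round(mean_toa_us, 1))
print(round(empirical_mean, 2))
```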
where B is the binomial distribution. The case with K = 4 and p = 0.5 is considered
throughout this project. The probability of exactly k users colliding is then:
$$ \Pr(k) = \binom{K}{k} p^k (1 - p)^{K-k} = \frac{4!}{k!(4 - k)!} p^k (1 - p)^{4-k}. \tag{2.10} $$
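For the case K = 4 and p = 0.5 considered here, the probabilities in Equation 2.10 can be tabulated with a few lines of Python; the function name is illustrative.

```python
from math import comb


def collision_probability(k, num_users=4, p=0.5):
    """Pr(k) = C(K, k) * p^k * (1 - p)^(K - k)."""
    return comb(num_users, k) * p**k * (1 - p) ** (num_users - k)


probabilities = [collision_probability(k) for k in range(5)]
for k, prob in enumerate(probabilities):
    print(k, prob)
```

Two simultaneously active users is the most likely outcome, and the cases of zero and four active users are the least likely.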
When a user is activated it has to select a base station to access the network through
and is considered to be in idle mode. In idle mode the user acquires time and frequency
synchronization using the Narrowband Primary Synchronization Signal (NPSS), which is
a known sequence transmitted in every 6th out of 10 radio subframes. The user can
achieve slot-time synchronization and estimate CFO using correlation between the known
sequence and the received signal [17]. Joint time and frequency estimation is costly for
low-complexity IoT devices and is therefore often implemented as a two-step procedure:
First the timing offset is estimated from the first received NPSS in the presence of CFO,
second the CFO is estimated from subsequent received NPSSs using the estimated timing
offset [17].
Figure 2.4: Random access procedure of NB-IoT [17]. This work focuses on NPRACH (Msg1). [Message sequence between user and BS, in time order: NPSS, NSSS, NPBCH, NPRACH (Msg1), NPDSCH (Msg2), NPUSCH (Msg3), NPDSCH (Msg4), NPUSCH (Msg5).]
Before a user can send the NPRACH it needs to be aware of the system configuration
acquired from the base station through the Narrowband Physical Broadcast CHannel
(NPBCH). The time-domain allocation of the NPRACH (Msg1) transmission is defined
by the number of repetitions, n, chosen by the user, and a specific period starting time.
The frequency allocation of the NPRACH transmission is the subset of 12 subcarriers
chosen by the user [17].
The random access procedure in NB-IoT is a 5-message procedure as illustrated in
Figure 2.4. The first message is the NPRACH (described in Section 2.1).
When the base station successfully detects an NPRACH it will respond on NPDSCH
with Msg2 (Random Access Response). The RA Response window starts on the subframe
containing the end of the preamble repetition plus 4 subframes (for preamble format 0 or 1
and n < 64 repetitions) [15]. The response contains a Random Access Preamble identifier,
a Timing Advance (TA) parameter, and allocated radio resource for transmitting Msg3.
The Random Access Preamble identifier is a number calculated from the subframe index
and initial carrier frequency (Ω(0)) of the received NPRACH [8]. The TA is used to
time-align users’ signals at the base station to account for propagation delay. The TA is
an integer number between 0 and 1282 where each integer step corresponds to a time
correction of 16Ts [18]. Ts is a basic unit of time in LTE and is defined as Ts = 32.55 ns,
From the received NPRACH (Msg1), the BS must detect active users and perform
synchronization parameter estimation.
The Phase-Difference (PD)-based method proposed in [9] utilizes the relationship
between the phase trace of the received signal and the ToA and CFO. Phase differences
between symbols in the received signal are averaged to estimate CFO. The ToA is found
by subtracting the phase contribution due to the estimated CFO from the phase of the
received signal and averaging the phase difference between symbol groups on different
frequencies.
As a benchmark for the detection of the number of users, an amplitude-based estimator
is considered. The mean amplitude of the received signal for different number of colliding
users is compared to the amplitude of the received signal. The closest match then yields
an estimate of the number of colliding users present in the received signal.
matching value of $N_a$:
$$ \hat{N}_a = \arg\min_{N_a} \left| E\left[A_{N_a}\right] - E\left[|y|\right] \right|. \tag{2.12} $$
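The amplitude-based benchmark can be sketched in Python as a nearest-match lookup. The table of expected mean amplitudes below contains made-up values for illustration; in practice it would be obtained from simulation of the system model.

```python
def estimate_num_users(received, expected_amplitude):
    """Pick the N_a whose expected mean amplitude is closest to the observed one.

    expected_amplitude maps each candidate N_a to E[A_Na].
    """
    observed = sum(abs(sample) for sample in received) / len(received)
    return min(expected_amplitude,
               key=lambda n_a: abs(expected_amplitude[n_a] - observed))


# Hypothetical lookup table of mean amplitudes for 0..4 colliding users.
table = {0: 0.3, 1: 1.0, 2: 1.4, 3: 1.7, 4: 2.0}
estimate = estimate_num_users([1 + 0j, 0.9j, -1.1 + 0j], table)
print(estimate)
```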
where C is a random constant phase offset. In practice the phase trace of the received
signal is not straightforward to obtain due to the 2π ambiguity, but applying the unwrap
function to the complex argument of the received signal provides a good approximation [9]:
$$ \beta_k[n] = \mathrm{unwrap}\left(\arg(s_k[n])\right). \tag{2.14} $$
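The phase-trace computation in Equation 2.14 can be sketched in plain Python as follows. The unwrap routine shown is a simple textbook version which assumes that the true phase changes by less than π between consecutive samples; the function names are illustrative.

```python
import cmath
import math


def unwrap(phases):
    """Remove 2*pi jumps so that consecutive phases differ by less than pi."""
    if not phases:
        return []
    unwrapped = [phases[0]]
    offset = 0.0
    for previous, current in zip(phases, phases[1:]):
        step = current - previous
        if step > math.pi:
            offset -= 2 * math.pi
        elif step < -math.pi:
            offset += 2 * math.pi
        unwrapped.append(current + offset)
    return unwrapped


def phase_trace(signal):
    """beta[n] = unwrap(arg(s[n])) for a list of complex samples."""
    return unwrap([cmath.phase(sample) for sample in signal])


# A tone whose phase grows by 0.8 rad per sample wraps around several times.
tone = [cmath.exp(1j * 0.8 * n) for n in range(20)]
trace = phase_trace(tone)
print(round(trace[-1], 3))
```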
Phase differences between symbols with the same subcarrier frequency $f_n$ can be used
to estimate the phase contribution of the CFO. Symbol groups contain five consecutive
identical symbols, whose phase differences are averaged to reduce the noise variance.
The average of all these estimates is then used to estimate the CFO-induced
phase:
$$ \beta_{k,\Delta f} = \frac{1}{N} \frac{1}{4} \sum_{n=0}^{N-1} \sum_{i=1}^{4} \left( \beta_k[5n + i + 1] - \beta_k[5n + i] \right). \tag{2.15} $$
The CFO estimate of the kth user is then simply
$$ \widehat{\Delta f}_k = \frac{1}{2\pi} \beta_{k,\Delta f}. \tag{2.16} $$
This estimate is only valid if the phase-trace of the received signal only contains the
contribution from a single user.
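A minimal Python sketch of the averaging in Equations 2.15 and 2.16 is shown below. It assumes a zero-indexed phase trace, so the within-group index runs over i = 0, ..., 3 rather than 1, ..., 4, and it returns the estimate in cycles per symbol period, as the equations are written without a symbol-time factor. The synthetic trace is a made-up example.

```python
import math

SYMBOLS_PER_GROUP = 5


def cfo_from_phase_trace(beta):
    """Average within-group phase differences and scale by 1/(2*pi).

    beta is a zero-indexed unwrapped phase trace whose length is a multiple
    of 5. Only differences between symbols of the same symbol group are
    used, so phase jumps caused by channel hops are excluded.
    """
    num_groups = len(beta) // SYMBOLS_PER_GROUP
    total = 0.0
    for n in range(num_groups):
        start = SYMBOLS_PER_GROUP * n
        for i in range(SYMBOLS_PER_GROUP - 1):
            total += beta[start + i + 1] - beta[start + i]
    mean_step = total / (num_groups * (SYMBOLS_PER_GROUP - 1))
    return mean_step / (2 * math.pi)


# Synthetic trace: 0.05 rad per symbol plus a 1.0 rad jump at every hop.
beta = []
phase = 0.0
for group in range(4):
    for symbol in range(SYMBOLS_PER_GROUP):
        beta.append(phase)
        phase += 0.05
    phase += 1.0
estimate = cfo_from_phase_trace(beta)
print(estimate)
```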
Differences in this phase trace when the frequency $f_n$ changes are averaged to filter out
noise and are used to estimate the ToA. The phase difference of the same symbol between
channel hops is proportional to the channel hopping distance, and the ToA can be
estimated more accurately using more channel hops.
This two-stage synchronization parameter estimation is computationally efficient, and
the approach decouples detection and estimation [9].
Another approach, which jointly performs user detection and parameter estimation, is
presented in [6]. This
method computes the correlation between the received signal and preambles with different
time and frequency corrections to find the point of highest correlation. This 2-D grid
search approach finds the ToA and CFO simultaneously but is not very computationally
efficient and does not specify a detection threshold.
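The 2-D grid search idea can be sketched as below. The preamble template is a hypothetical simplification (one unit-amplitude sample per symbol, four symbol groups with hops of distance 1, 6 and 1) and is not the signal model of [6]; it only serves to show the exhaustive search over time and frequency corrections.

```python
import cmath
import math

SYMBOL_TIME_S = 266.7e-6
SUBCARRIER_SPACING_HZ = 3750.0
HOPS = [0, 1, 7, 6]           # example subcarrier index per symbol group
SYMBOLS_PER_GROUP = 5


def template(toa_us, cfo_hz):
    """Hypothetical preamble model with one sample per symbol."""
    samples = []
    for m, subcarrier in enumerate(HOPS):
        for i in range(SYMBOLS_PER_GROUP):
            n = m * SYMBOLS_PER_GROUP + i
            phase = 2 * math.pi * (
                cfo_hz * n * SYMBOL_TIME_S
                - subcarrier * SUBCARRIER_SPACING_HZ * toa_us * 1e-6
            )
            samples.append(cmath.exp(1j * phase))
    return samples


def grid_search(received, toa_grid_us, cfo_grid_hz):
    """Return the (ToA, CFO) grid point whose template correlates best."""
    best_point, best_corr = None, -1.0
    for toa_us in toa_grid_us:
        for cfo_hz in cfo_grid_hz:
            reference = template(toa_us, cfo_hz)
            corr = abs(sum(y * r.conjugate() for y, r in zip(received, reference)))
            if corr > best_corr:
                best_point, best_corr = (toa_us, cfo_hz), corr
    return best_point


received = template(80, 30)   # noise-free signal with ToA 80 us and CFO 30 Hz
estimate = grid_search(received, range(0, 201, 20), range(-50, 51, 10))
print(estimate)
```

The cost grows with the product of the two grid sizes, which illustrates why this approach is less efficient than the two-stage estimator.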
3 DEEP LEARNING ESTIMATOR
This section briefly explains the basics of deep learning as used in the estimator described
in the following sections.
3.1.3 Learning
When using machine learning, the minimization of the loss ℓ is not solved analytically but instead
addressed by Stochastic Gradient Descent (SGD). The model is defined as a set of
probability distributions parameterized by a vector θ: p(t|x, θ). The learning task can
be defined as obtaining a parameter vector θ which accurately describes this probability
distribution.
Using a maximum likelihood formulation, this can be written as maximizing the
log-likelihood [12]:
$$ \max_{\theta} \ln p(\mathcal{D} \mid \theta) \tag{3.2} $$
where γ is the learning rate and $\nabla_\theta$ is the nabla operator used to denote the gradient
of $\ln p(t_n \mid x_n, \theta)$ with respect to θ averaged over the training batch. For a multilayered
network the computation of the gradient becomes the backpropagation algorithm [12].
Increasing the number of SGD iterations decreases the loss and iterations are repeated
until sufficient performance is achieved or the loss-function has converged.
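As a toy illustration of the update rule, the sketch below fits a single parameter θ of a hypothetical model p(t|x, θ) = N(θx, 1) by following the batch-averaged gradient of the log-likelihood with learning rate γ. The model, the ground-truth value and all settings are assumptions made for illustration; a fresh batch is simulated in every iteration.

```python
import random


def sgd_fit(num_iterations=2000, batch_size=32, learning_rate=0.05, seed=0):
    """Fit theta in p(t | x, theta) = N(theta * x, 1) by stochastic gradient
    steps on the log-likelihood, drawing a new batch in every iteration."""
    rng = random.Random(seed)
    true_theta = 2.0   # hypothetical ground truth used by the toy simulator
    theta = 0.0
    for _ in range(num_iterations):
        gradient = 0.0
        for _ in range(batch_size):
            x = rng.uniform(-1.0, 1.0)
            t = true_theta * x + rng.gauss(0.0, 1.0)
            # d/dtheta ln N(t; theta * x, 1) = (t - theta * x) * x
            gradient += (t - theta * x) * x
        # Step along the batch-averaged gradient of the log-likelihood.
        theta += learning_rate * gradient / batch_size
    return theta


theta_hat = sgd_fit()
print(theta_hat)
```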
3.1.4 Hyper-parameters
The learning rate γ is an example of a hyper-parameter which typically must be defined
prior to training the network. Other hyper-parameters include the number of layers to
include in the model, the type of layers, the number of weights in each layer and many
more. Hyper-parameters must be fine-tuned in order to obtain the desired performance
which is a time-consuming task. The hyper-parameters define the architecture and
capacity of the network as well as how the network is trained. A network with too little
capacity will not generalize well and too much capacity increases the risk of over-fitting.
To avoid over-fitting, regularization is typically applied, which adds a penalty to the loss
function that discourages a priori unlikely values of θ, e.g., large weights. To test whether
the model is over-fitting to the training data, the performance is often tested using a
separate validation set.
In this project the data is generated by a simulation corresponding to an infinitely
large set of available training data D. For each iteration of the SGD process a new
batch is generated by the simulation which means that no training data is used more
than once. This reduces overfitting and eliminates the need for separate training and
validation datasets.
The goal of the estimator is to use the discrete signal y[n] to estimate the activity
indicator a, ToA τ , CFO ∆f , and channel coefficient h of each user. Since the activity
indicator of each user is a random variable, the total number of active users in the
received signal is unknown. This boils down to a notoriously challenging problem of
source separation with unknown number of users [21]. Deep learning has significantly
improved the field of source separation and the general idea of using deep learning is to
capture non-linear relationships between inputs and corresponding targets that are often
difficult to model with analytically tractable expressions [21]. In this project, estimating
the unknown parameters is dealt with by splitting the problem into:
The two separate tasks are combined such that the synchronization parameters are
accurately estimated for each detected user.
where $[\pi_0, \pi_1, \ldots, \pi_K]$ are the outputs from the last layer of the neural network and
$[q_0, q_1, \ldots, q_K]$ represent the a posteriori class probabilities. A hard class prediction could
then be found as $\arg\max_i \pi_i$.
The simple arg max is not differentiable, and thus the softmax approximation of
argmax is used instead [24]. The softmax function is commonly used together with
the negative log-likelihood loss function where it is equivalent to maximum likelihood
estimation and the ln in the loss-function keeps the exponential in the softmax function
from saturating the output [25].
The loss function for estimating the number of users is written:
$$ \ell_p(k, \pi) = -\ln\left( \frac{\exp(\pi_k)}{\sum_i \exp(\pi_i)} \right). \tag{3.6} $$
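The classification loss in Equation 3.6 can be evaluated as in the Python sketch below. The logits are made-up network outputs for the five classes (zero to four users); subtracting the maximum before exponentiating is a standard numerical safeguard and not something stated in the text.

```python
import math


def softmax(logits):
    """Numerically stable softmax: subtract the maximum before exponentiating."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def class_loss(true_class, logits):
    """l_p(k, pi) = -ln(exp(pi_k) / sum_i exp(pi_i)), computed via log-sum-exp."""
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(v - peak) for v in logits))
    return log_sum - logits[true_class]


logits = [0.1, 2.0, -1.0, 0.5, 0.0]   # made-up outputs for 0..4 users
q = softmax(logits)
predicted = max(range(len(logits)), key=lambda i: logits[i])
print(predicted, class_loss(1, logits))
```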
Note that representing the complex-valued channel coefficient h by
Cartesian coordinates (i.e., real and imaginary parts) was found to show superior performance to
the phasor representation (i.e., amplitude and phase), as seen in Figure 4.5. For K users, the
respective vectors are collected in a matrix
The neural network seeks to find an estimate $\hat{X}$ such that $E\|X - \hat{X}\|_2^2$ is minimal, which
is equivalent to a Minimum Mean-Square Error (MMSE) estimator.
The above formulation is sufficient to derive an estimation procedure. However, X
consists of multiple parameters which have values on different scales. When using a
practical optimization algorithm to find an estimate, any scaling difference between the
parameters will affect the impact each value has on the gradient descent step.
To circumvent possible issues arising from error variations across parameters, we
minimize the reconstruction error instead. The actual received signal without additive
noise, s, with the parameters in matrix X can be reconstructed using Equation 2.2. The
reconstruction is conveniently represented using a function f(·) such that
$$ s = f(X). \tag{3.9} $$
For each estimate X̂, the equivalent noise-free signal ŝ is reconstructed and compared
to the actual noise-free received signal s. The noise-free signal is known during the
training procedure and is used so the output of the neural network does not account for
the distribution of the noise. The data fidelity (i.e., reconstruction loss) is quantified
using the MSE metric such that
$$ \ell_r(X, \hat{X}) = E\left\| f(X) - f(\hat{X}) \right\|_2^2 = E\left\| s - \hat{s} \right\|_2^2. \tag{3.10} $$
The number of concurrent users in each sample is known during training so when
reconstructing the signal ŝ, the contributions from the correct number of users are taken
into account when calculating the reconstruction loss `r for each sample.
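The reconstruction loss can be sketched as follows. The function `reconstruct` is a hypothetical stand-in for f(·): it builds one complex tone per active user and is not the signal model of the thesis. The squared error is averaged over the signal samples, and only the rows of the known number of active users contribute.

```python
import cmath
import math

NUM_SAMPLES = 80
SYMBOL_TIME_S = 266.7e-6


def reconstruct(params, num_active):
    """Hypothetical stand-in for f(X): sum of one complex tone per active user.

    Each row of params is (toa_s, cfo_hz, h_real, h_imag); only the first
    num_active rows contribute to the signal.
    """
    signal = [0j] * NUM_SAMPLES
    for toa, cfo, h_re, h_im in params[:num_active]:
        h = complex(h_re, h_im)
        for n in range(NUM_SAMPLES):
            t = n * SYMBOL_TIME_S
            signal[n] += h * cmath.exp(2j * math.pi * cfo * (t - toa))
    return signal


def reconstruction_loss(true_params, estimated_params, num_active):
    """Squared error between f(X) and f(X_hat), averaged over the samples."""
    s = reconstruct(true_params, num_active)
    s_hat = reconstruct(estimated_params, num_active)
    return sum(abs(a - b) ** 2 for a, b in zip(s, s_hat)) / NUM_SAMPLES


x_true = [(80e-6, 30.0, 0.6, -0.2), (20e-6, -15.0, 0.3, 0.4)]
x_est = [(82e-6, 28.0, 0.6, -0.2), (20e-6, -15.0, 0.3, 0.4)]
print(reconstruction_loss(x_true, x_true, 2), reconstruction_loss(x_true, x_est, 2))
```

Because the loss is computed on the reconstructed signal, parameters on very different scales (seconds, hertz, unitless gains) are weighted by their effect on the signal rather than by their numerical magnitude.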
The loss function which the neural network seeks to minimize is simply the sum
of Equation 3.4 and Equation 3.10
An overview of the neural network that estimates both the number of users and synchro-
nization parameters is illustrated in Figure 3.1.
The output of the network is the flattened matrix X and the probability vector π.
For 4 users there are 4 · 4 = 16 parameters in X and 5 possible classes in the number
of users (including the zero users case). The input to the network is processed so as to
extract common features that are subsequently used for multi-task learning, that is, to
detect the number of users and estimate their parameters. The first layer performs a
1-dimensional convolution over the input signal. Since the number of users, ToA, CFO
and channel coefficient all are assumed to be constant throughout a transmission, a
convolution layer is chosen so as to extract translationally invariant features of the input
time-domain signal.
Following a typical CNN structure, batch normalization, non-linear activation and
max-pooling are employed. The convolution layers, activations and pooling layers are
repeated to form a deep neural network. The features found by the convolution layers
are reshaped to a single vector which is then used as input to two individual feedforward
neural networks. One of the networks performs classification and detects the number of
users based on the output of the feature extraction layers. The other network performs
regression with the goal to yield parameters so that the reconstructed signal is as close
as possible to the received signal in the MSE sense. Each feedforward network has two
fully connected layers followed by the ReLU activation and a linear output layer. The
network and automatic differentiation are implemented using the PyTorch framework
[26] and trained using multiple Graphics Processing Units (GPUs).
[Figure 3.1 layer labels: input 2@96x1; 1-D convolution with batch normalization, ReLU and max-pooling giving 200@91x1 and 200@45x1; a second 1-D convolution with batch normalization, ReLU and max-pooling giving 100@43x1 and 100@21x1; flatten to 1x2100; fully connected ReLU layers labelled 1x1000, 1x1000 and 1x200; a concatenation labelled 1x21; outputs of size 1x16 and 1x5.]
Figure 3.1: Overview of DNN architecture for detecting and estimating synchronization parameters of up to 4 colliding
users.
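The layer sizes printed in Figure 3.1 can be checked with a short calculation. The sketch below is only a shape check, not the network itself: the kernel sizes (6 and 3), the pooling window (2) and the absence of padding are assumptions chosen because they reproduce the sizes in the figure; the report does not state them.

```python
def conv1d_len(length, kernel, stride=1):
    """Output length of an unpadded ("valid") 1-D convolution."""
    return (length - kernel) // stride + 1


def maxpool1d_len(length, window=2):
    """Output length of non-overlapping max-pooling (stride equal to window)."""
    return length // window


# Assumed hyper-parameters (not stated in the text): kernel sizes 6 and 3,
# pooling window 2, no padding.
length, channels = 96, 2          # input labelled 2@96x1 in Figure 3.1
shapes = [(channels, length)]
for out_channels, kernel in [(200, 6), (100, 3)]:
    length = conv1d_len(length, kernel)
    shapes.append((out_channels, length))   # after the convolution
    length = maxpool1d_len(length)
    shapes.append((out_channels, length))   # after max-pooling
    channels = out_channels

flattened = channels * length     # size of the vector fed to both heads
print(shapes, flattened)
```

Under these assumptions the sequence of shapes matches the labels 200@91x1, 200@45x1, 100@43x1, 100@21x1 and the flattened size 1x2100.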
A similar alternative architecture is shown in Figure 3.2, where each network can be
trained either individually or jointly. After comparing the performance of the different
architectures, the network in Figure 3.1 was chosen because it performed adequately.
[Figure 3.2 blocks: the received mixed signal is fed to an active user detection network and to a ToA + CFO estimator network. Each network has an M×2 output: Re(A) and Im(A) per user, with zeros in the unused rows, and ToA and CFO per user, with NaN in the unused rows.]
Figure 3.2: Block diagram of network architecture for joint activity detection, ToA and CFO estimation.
3.5.1 SCALING
In general the convergence of a neural network is faster if all inputs to all layers have
zero-mean and unit covariance between training examples in the case when all examples
are of equal importance [27]. Weights of a network w are updated according to the error
δ by: w + δx. If x has non-zero mean the updates on all the entries in w will be biased
in a particular direction resulting in inefficient training. The unit covariance ensures
that the learning rate stays consistent between each example and that all input examples
are made equally important [27].
The input to the network y and each parameter in the output X are scaled to have
zero mean and unit variance. In the simulation ToA, CFO and channel coefficient are
all drawn according to the distributions given in the system model and Na is drawn
according to Pr(k) for each sample as described in section 2.2. The mean and variance
of each parameter (CFO, ToA and h) are known in advance and are used to normalize
the parameters to have zero mean and unit variance. The standardized ToA is given by
\[ \tau' = \frac{\tau - \mathrm{E}[\tau]}{\sqrt{\mathrm{Var}(\tau)}}. \tag{3.12} \]
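As a minimal sketch of the standardization in Equation 3.12, the snippet below scales ToA samples using a mean and variance that are known in closed form. The uniform distribution on [0, tau_max] and the value of tau_max are hypothetical stand-ins, not the distributions of the system model.

```python
import math
import random


def standardize(values, mean, variance):
    """Subtract the known mean and divide by the known standard deviation."""
    std = math.sqrt(variance)
    return [(v - mean) / std for v in values]


# Hypothetical ToA distribution: uniform on [0, tau_max] (illustration only).
tau_max = 100.0
mean = tau_max / 2            # E[tau] of U(0, tau_max)
variance = tau_max ** 2 / 12  # Var(tau) of U(0, tau_max)

random.seed(0)
toa = [random.uniform(0.0, tau_max) for _ in range(100_000)]
toa_std = standardize(toa, mean, variance)

sample_mean = sum(toa_std) / len(toa_std)
sample_var = sum((t - sample_mean) ** 2 for t in toa_std) / len(toa_std)
print(sample_mean, sample_var)  # close to 0 and 1 respectively
```

Because the statistics are known in advance, the same scaling can be inverted exactly at inference time to recover ToA in physical units.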
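Rows for 2, 3 and 4 users, followed by the column totals. Each cell gives the count and its share of all samples; the last column gives the row total and the share of the row on the diagonal (correct / incorrect).

(a) Proposed method, 10000 samples
Row | 0 | 1 | 2 | 3 | 4 | Row total
2 | 0 (0.0%) | 128 (1.28%) | 3523 (35.23%) | 140 (1.40%) | 1 (0.01%) | 3792 (92.91% / 7.09%)
3 | 0 (0.0%) | 9 (0.09%) | 593 (5.93%) | 1831 (18.31%) | 78 (0.78%) | 2511 (72.92% / 27.08%)
4 | 0 (0.0%) | 0 (0.0%) | 50 (0.50%) | 514 (5.14%) | 52 (0.52%) | 616 (8.44% / 91.56%)
Column total | 648 | 2543 | 4193 | 2485 | 131 | 10000
Correct | 97.07% | 94.57% | 84.02% | 73.68% | 39.69% | 84.40%
Incorrect | 2.93% | 5.43% | 15.98% | 26.32% | 60.31% | 15.60%

(b) Conventional power-detection method, 1000 samples
Row | 0 | 1 | 2 | 3 | 4 | Row total
2 | 17 (1.70%) | 115 (11.50%) | 112 (11.20%) | 47 (4.70%) | 80 (8.00%) | 371 (30.19% / 69.81%)
3 | 2 (0.20%) | 48 (4.80%) | 53 (5.30%) | 45 (4.50%) | 110 (11.00%) | 258 (17.44% / 82.56%)
4 | 0 (0.0%) | 7 (0.70%) | 13 (1.30%) | 7 (0.70%) | 27 (2.70%) | 54 (50.00% / 50.00%)
Column total | 143 | 285 | 223 | 118 | 231 | 1000
Correct | 41.96% | 40.35% | 50.22% | 38.14% | 11.69% | 35.90%
Incorrect | 58.04% | 59.65% | 49.78% | 61.86% | 88.31% | 64.10%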
(a) Proposed method: 84.4% total accuracy at 10 dB SNR. (b) Conventional power-detection method: 35.9% total
accuracy at 10 dB SNR.
Figure 4.1: Confusion matrices for estimating the multiplicity of collisions comparing the neural network classifier
with a simple classifier based on the amplitude of the received signal.
[Figure 4.2: probability of detection versus SNR [dB] (4 to 18 dB) for the NN estimator and the amplitude-based estimator with 1, 2, 3 and 4 users.]
Figure 4.2: Accuracy of estimating the number of colliding users. A signal is deemed correctly detected if the number
of users is estimated correctly. The NN estimator is trained for signals with 10 dB SNR.
The accuracy of the parameter estimation X̂ is found by comparing with the target
X. Since the loss function defined in Equation 3.10 only depends on the reconstruction
error, the estimated parameters in X̂ are arbitrarily ordered across users. To compare
the output with the target X the parameters are chosen to be ordered according to the
estimated amplitudes. In cases where the estimated amplitudes are similar, the ordering
may be wrong which leads to an artificially high error when evaluating performance
for multiple users. Any of the parameters can be used to dictate the ordering but the
amplitude is chosen as it is linked to SNR at the receiver in practice.
The Root-Mean-Square Error (RMSE) of each parameter in X̂ is calculated as:
\[ \mathrm{RMSE}_k = \sqrt{\mathrm{E}\left[ \lVert e_k \rVert_2^2 \right]}, \tag{4.1} \]
where e.g. the estimation error of τ is: e_k = τ_k − τ̂_k. The RMSE of the proposed neural
network-based estimator is the average of all RMSEs up to user k:
\[ \mathrm{RMSE}_{\mathrm{NN},k} = \frac{1}{k} \sum_{i=1}^{k} \mathrm{RMSE}_i. \tag{4.2} \]
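The evaluation procedure described above (order users by amplitude, then apply Equations 4.1 and 4.2) can be sketched as follows. The dictionary layout and all numerical values are made up for illustration, and each per-user list holds a single error here, whereas in practice the expectation is taken over many received signals.

```python
import math


def order_by_amplitude(users):
    """Sort per-user parameter sets by descending channel-gain magnitude."""
    return sorted(users, key=lambda u: abs(u["h"]), reverse=True)


def rmse(errors):
    """Equation 4.1 with the expectation replaced by a sample mean."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def rmse_nn(per_user_errors):
    """Equation 4.2: average of the per-user RMSEs."""
    return sum(rmse(e) for e in per_user_errors) / len(per_user_errors)


# Made-up target and estimate for one received signal with 2 users.
# The estimate lists the users in the opposite order to the target.
target = [{"toa": 10.0, "h": 0.9 + 0.1j}, {"toa": 40.0, "h": 0.3 - 0.2j}]
estimate = [{"toa": 38.0, "h": 0.28 - 0.25j}, {"toa": 11.0, "h": 0.85 + 0.1j}]

t_sorted = order_by_amplitude(target)
e_sorted = order_by_amplitude(estimate)
toa_errors = [[t["toa"] - e["toa"]] for t, e in zip(t_sorted, e_sorted)]
print(rmse_nn(toa_errors))
```

If two users had nearly equal amplitudes, the sort could pair them incorrectly, which is the source of the artificially high error mentioned in the text.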
The performance of the neural network estimator is to be compared with the con-
ventional estimator described in section 2.4. The conventional estimator is only able
to estimate a single set of parameters, regardless of the actual number of users k. The
error of the conventional estimator is therefore measured as the estimate which has the
smallest error over all actual sets of parameters in X, e.g. the estimated ToA error is
[Figure: CFO estimation RMSE [Hz] versus SNR [dB] (4 to 18 dB) for the NN estimator and the PD-based estimator with 1, 2, 3 and 4 users.]
3.44 Hz for a single user compared to 16.20 µs and 7.98 Hz for the conventional estimator.
The relatively high RMSE of the conventional estimator is likely due to the noise which
causes wrong phase unwrapping at low SNRs [9].
To explore the accuracy of the conventional and the proposed estimator, the distributions
of the errors are plotted in Figures 4.6a and 4.6b. The model is trained with the number
of users varying from 0 to 4, but Cumulative Distribution Functions (CDFs) are shown
separately for signals with each number of collisions. There is a clear advantage in
using the NN estimator for 1 user, but the performance advantage is not convincing for
multiple users. This is believed to be caused by the unfair advantage given to the
conventional estimator, whose estimate is matched to the closest value across all users.
ToA and CFO show similar distributions, and better performance may be achievable
by adding capacity to the network and fine-tuning the hyper-parameters.
The convergence of the loss function during training of a network trained for a single-user
transmission is seen in Figure 4.7 along with the CDF of the estimation error. This
shows the excellent performance that can be achieved even at high SNR.
The accuracy in estimating the channel coefficient h is shown in Figure 4.5. The
RMSE is 0.101 for the in-phase part and 0.103 for the quadrature part for a single user.
The RMSE shows a similar trend as in ToA and CFO estimation with deteriorating
performance as the number of concurrent users increases.
[Figure 4.6 panels: CDFs (in %) of the ToA estimation error [us] and of the CFO estimation error [Hz], conditioned on 1, 2, 3 and 4 users, for the NN estimator and the mean adjacent symbol (group) PD-based estimator. RMSE values given in the panel legends:]
Panel | NN estimator | PD-based estimator
ToA, 1 user | 2.88 us | 16.2 us
ToA, 2 users | 13 us | 11.8 us
ToA, 3 users | 19.9 us | 11.6 us
ToA, 4 users | 24.6 us | 12.5 us
CFO, 2 users (by panel position) | 15.7 Hz | 8.42 Hz
CFO, 3 users | 24.3 Hz | 8.68 Hz
CFO, 4 users | 30.9 Hz | 9.06 Hz
Figure 4.6: Distribution of estimation errors for the conventional and proposed estimator.
[Figure 4.7 panels: training loss versus number of training batches (0 to 20000); CDF (in %) of the ToA estimation error [us]; CDF (in %) of the CFO estimation error [Hz]. The CDF panels show the NN estimator, the mean adjacent symbol (group) phase difference estimator and the target.]
Figure 4.7: Loss function and CDF of ToA and CFO estimations using the trained deep learning estimator.
5 AUTO-ENCODER
Current communication systems are optimized block by block for performance in relation
to a model [29]. A novel concept is to consider the communication system in an end-to-end
manner and jointly optimize the blocks within the transmitter and receiver as well as
the transmitter and receiver structure. In [29] an end-to-end auto-encoder is proposed
to derive modulation and coding schemes, showing performance comparable to
state-of-the-art communication methods. The channel is treated as a layer in the neural
network, and the encoder and decoder are both neural networks. Applying an auto-encoder
to the physical layer of a wireless communication system shows great promise [30], and
for this reason it is applied here to preamble encoding and decoding.
The general auto-encoder attempts to reconstruct the given input based on a latent
representation called h. Traditionally it is designed for dimensionality reduction and
feature learning where h is a much lower dimensionality space. In its simplest form the
encoder h = f (x) produces the latent representation and the decoder reconstructs the
input from the latent representation x̂ = g(f (x)). In the noise-free scenario x̂ = x [19].
A modern application for auto-encoders is the Denoising Auto-Encoder (DAE) where
the decoder predicts the original input based on a corrupted sample [19]. In this project
a modified DAE is used to find a preamble sequence and decoding method jointly. The
computational graph for the auto-encoder structure is illustrated in Figure 5.1.
Figure 5.1: Structure of auto-encoder mapping the input x to latent variable y using the function f (encoder). The
corruption process is the simple Additive White Gaussian Noise (AWGN) channel producing y + w. The function g
(decoder) takes the corrupted latent variable to create a reconstruction of the input x̂. The learning process attempts
to minimize the scalar reconstruction error described by the loss L.
The decoder is a deep neural network which takes the corrupted latent representation
ỹ as input to reconstruct the input x with minimum error. The reconstruction loss is
calculated by some function L(·), such as the l2 norm.
The training process attempts to minimize this loss by updating the weights in f (·) and
g(·).
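The data flow of Figure 5.1 (encoder f, AWGN corruption, decoder g, reconstruction loss) can be illustrated with a toy example. In this sketch f and g are fixed hand-written functions (a repetition code and an averaging decoder) standing in for the learned preamble and the neural network decoder; they are assumptions for illustration only, and nothing is trained.

```python
import random


def encoder(x, repeat=8):
    """Toy f(.): repeat every input value (stand-in for a learned encoder)."""
    return [v for v in x for _ in range(repeat)]


def awgn(y, sigma, rng):
    """Corruption process: add white Gaussian noise w to the latent y."""
    return [v + rng.gauss(0.0, sigma) for v in y]


def decoder(y_noisy, repeat=8):
    """Toy g(.): average each group of repeated values."""
    return [sum(y_noisy[i:i + repeat]) / repeat
            for i in range(0, len(y_noisy), repeat)]


def squared_l2_loss(x, x_hat):
    """Reconstruction loss L: squared l2 norm of the reconstruction error."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))


rng = random.Random(0)
x = [1.0, -1.0, 1.0, 1.0]

x_hat_clean = decoder(encoder(x))                            # noise-free path
x_hat_noisy = decoder(awgn(encoder(x), sigma=0.1, rng=rng))  # corrupted path

loss_clean = squared_l2_loss(x, x_hat_clean)
loss_noisy = squared_l2_loss(x, x_hat_noisy)
print(loss_clean, loss_noisy)
```

In the noise-free path the reconstruction equals the input, while the corrupted path leaves a small residual loss; training would adjust f and g to reduce that residual.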
5.1 ENCODING
The frequency pattern f and symbol sequence s are fixed sequences that are valid for all
x and are only adjusted during the training procedure to “discover” a superior preamble
sequence. Once a well-suited preamble sequence is found, the preamble sequence should
remain fixed for all future transmissions. f and s are implemented as learnable parameters
that are updated according to their gradients with a learning rate of γ = 1 × 10⁻⁴.
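How a learnable sequence is adjusted by its gradient can be sketched with a plain gradient-descent step. The quadratic stand-in loss, the example frequencies and the use of plain gradient descent are all assumptions for illustration; in the report the gradient comes from the reconstruction loss through the decoder, and the optimizer used may differ.

```python
GAMMA = 1e-4  # learning rate stated in the text


def gradient_step(params, grads, lr=GAMMA):
    """One plain gradient-descent update of a learnable sequence."""
    return [p - lr * g for p, g in zip(params, grads)]


def toy_loss(freqs, target):
    """Hypothetical stand-in loss: squared distance to an arbitrary target."""
    return sum((f - t) ** 2 for f, t in zip(freqs, target))


def toy_grad(freqs, target):
    """Gradient of toy_loss with respect to each frequency."""
    return [2.0 * (f - t) for f, t in zip(freqs, target)]


freqs = [0.0, 3750.0, 7500.0]       # illustrative frequency pattern [Hz]
target = [1000.0, 3000.0, 8000.0]   # arbitrary values, for illustration only

before = toy_loss(freqs, target)
for _ in range(1000):
    freqs = gradient_step(freqs, toy_grad(freqs, target))
after = toy_loss(freqs, target)
print(before, after)
```

With such a small learning rate each step changes the pattern only slightly, so many iterations are needed before the learned pattern departs visibly from its initialization.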
5.2 DECODING
The decoder is a neural network and the pre-trained network from section 3.5
can be used as initialization. However, to keep the demonstration of the concept
simple, a simple feedforward neural network is used and the input signal consists of
only 1 user. Figure 5.2 shows how the frequency pattern develops throughout the
training process. Figure 5.2a illustrates an example of the NB-IoT NPRACH
frequency pattern, with which the encoding function is initialized.
Figure 5.2b shows the frequency pattern found after 100 iterations. It is seen that the
frequency pattern does not converge towards an orderly sequence and is not restricted
to the 12 frequency channels defined by the NB-IoT standard. This is a limitation in
[Figure 5.2: frequency pattern, plotted as frequency [Hz] versus symbol index (0 to 80), in three panels:]
(a) One example of NB-IoT frequency pattern
(b) Learned frequency pattern using auto-encoder initialized with NB-IoT frequency hopping pattern
(c) Learned frequency pattern using auto-encoder after 100 000 training iterations
In an actual implementation, care should be taken to ensure that the data
used to train the model is representative. The performance guarantees that machine
learning can provide are only numerical and rest on the available data. Either
real-world data or realistic models of the dynamics should be included in the model.
The deep neural network model should be able to estimate ToA and CFO in real time.
Using a Software-Defined Radio (SDR), wideband signals can be captured and processed
in real time. AIR-T is an SDR specifically designed for deep learning deployment that
combines an AD9371 transceiver with an FPGA for signal processing and a GPU for
deep learning [31].
The goal of this project is to find a method to increase system capacity. The increase in
number of supported users is therefore investigated. The number of concurrent users
each cell supports will be limited by the number of allocated channels and user traffic
pattern. In NB-IoT there are 48 available channels for NPRACH for each cell. This
gives an average traffic intensity of 13 erlangs according to the Erlang B loss system at a
required probability of blocking less than 1% [32].
It is assumed that each user has an average holding time of 50 ms, corresponding to
the periodicity of the normal NPRACH transmission. The traffic pattern is assumed to
consist of independent users transmitting once every 10 seconds, which gives a channel
usage per user of:
\[ \frac{50\ \text{ms}}{10\ \text{s}} = 5.0 \times 10^{-3}\ \text{erlangs} \tag{6.1} \]
Since the system can support 13 erlangs in total, the maximum number of users is:
\[ \frac{13\ \text{erlangs}}{5.0 \times 10^{-3}\ \text{erlangs}} = 2600\ \text{users} \tag{6.2} \]
Using a simple model for user activity and only accounting for the normal NPRACH
configuration, a total of 2600 users can be supported simultaneously if NPRACH
transmissions are rejected in the case of collisions.
When using the proposed estimation procedure, not only will synchronization be more
accurate, but simultaneous preamble transmissions can also be detected, which improves
the access probability. Transmissions from 2 users are detected correctly 93 % of the
time, which effectively increases the available number of random access preambles by
93 %. This increase in successfully transmitted preambles can be used to allocate fewer
NPRACH resources, to reduce access delay since collision-induced back-offs will be
reduced, or to increase the number of supported users to 5018. However, this project
does not consider limitations of NPUSCH resources, which may prove to be a bottleneck
for the RA process.
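The capacity figures above follow from simple arithmetic, reproduced in the sketch below. The 13 erlang figure and the 93 % detection rate are taken from the text as given; the variable names are illustrative.

```python
holding_time_s = 50e-3        # average holding time per transmission
transmit_interval_s = 10.0    # each user transmits once every 10 s
supported_load_erlang = 13.0  # total load quoted in the text
two_user_detection = 0.93     # share of 2-user collisions resolved correctly

# Channel usage per user in erlangs.
load_per_user = holding_time_s / transmit_interval_s

# Number of users when colliding transmissions are rejected.
max_users = supported_load_erlang / load_per_user

# Number of users when 93 % more preambles are effectively available.
max_users_with_detection = max_users * (1.0 + two_user_detection)

print(load_per_user, round(max_users), round(max_users_with_detection))
```

The final value is an upper estimate under the stated traffic model, since it treats every resolved 2-user collision as an extra usable preamble.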
The intuition behind detecting and decoding identical simultaneous preamble
transmissions is to exploit diversity in user locations, specific channel conditions and
oscillator imperfections to realize multiplexing.
When the base station detects multiple preambles, users are not separable by any ID
and the RAR (MSG2) cannot be specifically addressed to each user. One procedure to
associate each response with a user is to have each user estimate its distance
from the BS using the reference signal. The user then uses the estimated distance to
approximate a range of TA values, from which it can select the RAR with the closest
matching TA.
6.3 CONCLUSION
This project considers the problem of separating colliding NB-IoT users that choose the
same random access preamble in the NPRACH scheme, and proposes a method to detect
the number of colliding users and estimate their ToA, CFO and channel gains. Motivated
by recent success in leveraging learning-based methods for addressing problems related
to physical layer communications [12], the proposed method builds upon a deep learning
framework. In particular, the method jointly detects the number of active users and
estimates their parameters, with the aim of improving the capacity of the critical random
access phase by not discarding interfering signals, which utilizes channel resources
better and in turn reduces back-off periods. In addition to handling a much richer class
of scenarios, the proposed method outperforms [9] in its own scenario where users
transmit orthogonal preambles and do not collide.
The method is demonstrated in NB-IoT NPRACH, where the number of orthogonal
preambles is limited, ensuring that the proposed method is practical in IoT systems
currently being deployed. The estimation error of a conventional approach in NB-IoT is
compared to the performance of the proposed scheme. Traditional synchronization
methods fail in the case of collisions with high Signal-to-Interference Ratio (SIR),
whereas with the proposed algorithm users can be distinguished and their respective
synchronization parameters can still be estimated with reasonable performance.
Deep learning is a promising tool for developing joint estimation procedures, which
are notoriously difficult in traditional model-based methods, and enables separation
of synchronization parameters even when users transmit using the same preamble. A
deep learning building block, the denoising auto-encoder, is applied in a novel concept
to discover an alternative superior preamble sequence. The found preamble sequence
reflects the distribution of the input data, but further work is required to achieve
an increase in performance compared to the neural network estimator. Although deep
learning-based estimation will lead to sub-optimal estimators compared to an analytically
derived joint estimator, it allows for practical, straightforward development and efficient
computation.
BIBLIOGRAPHY
[15] 3GPP TS, 36.211 - Evolved Universal Terrestrial Radio Access (E-UTRA); Physical
channels and modulation, V. 15.4.0, 2018.
[18] 3GPP, “Evolved Universal Terrestrial Radio Access (E-UTRA); Physical layer
procedures,” 3rd Generation Partnership Project (3GPP), Technical Specification
(TS) 36.213, 03 2019, version 15.4.0.
[19] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. MIT Press, 2016.
[21] J. Chien, Source Separation and Machine Learning. Elsevier Science, 2018.
[22] C. Bishop, Pattern Recognition and Machine Learning, ser. Information Science
and Statistics. Springer New York, 2016.
[27] Y. LeCun, L. Bottou, G. B. Orr, and K.-R. Müller, “Efficient backprop,” in Neural
Networks: Tricks of the Trade. Springer-Verlag, 1998.
[28] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv
e-prints, p. arXiv:1412.6980, Dec 2014.
[29] T. O’Shea and J. Hoydis, “An Introduction to Deep Learning for the Physical
Layer,” IEEE Transactions on Cognitive Communications and Networking, vol. 3,
no. 4, pp. 563–575, 2017.
[30] V. Raj and S. Kalyani, “Backpropagating through the air: Deep learning at physical
layer without channel models,” IEEE Communications Letters, 2018.
[31] I. Deepwave Digital, “Deepwave digital,” Website, Mar. 2019. [Online]. Available:
https://www.deepwavedigital.com/sdr
[33] A. van den Bos, Parameter Estimation for Scientists and Engineers. Wiley, 2007.
A CRAMÉR–RAO LOWER BOUND
The achievable precision of an unbiased estimator is bounded by a lower bound on
the variance of the estimate, called the Cramér–Rao lower Bound (CRB) [33]. The
CRB is used as a reference to compare the performance of the traditional estimator and
the deep learning estimator in chapter 3.
The observed vector r depends on the synchronization parameters to be estimated: the
carrier frequency offset ∆f, the ToA τ and the channel carrier phase θ. For practical
reasons the challenging case of joint estimation of τ, ∆f, θ is not considered, and the
CRB is instead derived for an estimator of a single parameter while treating the other
parameters as nuisance parameters. In the general case a single element of ∆f, τ, θ
is denoted λ and is assumed deterministic, while the other elements are random
variables collected in a vector u. The vector u is assumed to have a known PDF
that does not depend on λ.
The CRB is formulated as:
\[ \mathrm{CRB}(\lambda) = \frac{1}{\mathrm{E}_{r}\left[ \left( \frac{\partial \ln p(r|\lambda)}{\partial \lambda} \right)^{2} \right]} \tag{A.1} \]
Equation A.1 is often difficult to evaluate and therefore a different bound, the Modified
Cramér–Rao Bound (MCRB), is considered instead [34]:
\[ \mathrm{MCRB}(\lambda) = \frac{1}{\mathrm{E}_{r}\left[ \left( \frac{\partial \ln p(r|u,\lambda)}{\partial \lambda} \right)^{2} \right]} \tag{A.2} \]
Generally MCRB(λ) ≤ CRB(λ), meaning it is a looser bound. However, for most
practical applications it is still useful [34].
For the Gaussian channel it is much easier to derive the conditional probability p(r|u, λ)
than to evaluate p(r|λ).
The received complex signal waveform with additive noise can be written:
\[ r(t) = s(t) + w(t) \tag{A.3} \]
where s(t) is the information signal as described in Equation 2.1 and w(t) is additive
noise distributed according to a complex normal distribution.
p(r|u, λ) is replaced by the likelihood function Λ(λ, u) and after some manipulation
Equation A.2 becomes [34]:
\[ \mathrm{MCRB}(\lambda) = \frac{N_0}{\mathrm{E}_{u}\left[ \int_{T_0} \left| \frac{\partial s(t)}{\partial \lambda} \right|^{2} dt \right]} \tag{A.4} \]
According to Equation A.4 the expectation should be calculated over u_{∆f}, that is, over
both θ and τ. However, Equation A.5 does not depend on θ and the expectation is only
calculated over τ.
\[ \mathrm{E}_{\tau}\left[ \int_{T_0} 4\pi^{2} (t - \tau)^{2} \, dt \right] = 4\pi^{2} \, \mathrm{E}_{\tau}\left[ \int_{T_0} (t - \tau)^{2} \, dt \right] \tag{A.7} \]
\[ = 4\pi^{2} \, \mathrm{E}_{\tau}\left[ \int_{0}^{NT} (t - \tau)^{2} \, dt \right] \tag{A.8} \]
where NT is the length of T0. A simple case can be assumed where τ ∼ U(a, b) and the
expectation can be evaluated to:
\[ = 4\pi^{2} \, \mathrm{E}_{\tau}\left[ \frac{(NT)^{3}}{3} - \tau (NT)^{2} + \tau^{2} (NT) \right] \tag{A.9} \]
\[ = \frac{2}{3} \pi^{2} \left( 2a^{2}(NT) + 2ab(NT) - 3a(NT)^{2} + 2b^{2}(NT) - 3b(NT)^{2} + 2(NT)^{3} \right) \tag{A.10} \]
\[ = \frac{2\pi^{2} (NT)^{3}}{3} \tag{A.11} \]
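The steps from Equation A.9 to Equation A.11 can be checked with exact rational arithmetic. The sketch below drops the common factor π² and uses the moments of a uniform distribution, E[τ] = (a + b)/2 and E[τ²] = (a² + ab + b²)/3. Equation A.11 is reproduced by Equation A.10 for a = 0 and b = NT; that choice of limits is an assumption here, since the text does not state a and b.

```python
from fractions import Fraction


def a9_over_pi2(a, b, length):
    """Equation A.9 divided by pi^2, for tau ~ U(a, b) and NT = length."""
    e_tau = Fraction(a + b, 2)
    e_tau2 = Fraction(a * a + a * b + b * b, 3)
    return 4 * (Fraction(length ** 3, 3) - e_tau * length ** 2 + e_tau2 * length)


def a10_over_pi2(a, b, length):
    """Equation A.10 divided by pi^2."""
    return Fraction(2, 3) * (2 * a * a * length + 2 * a * b * length
                             - 3 * a * length ** 2 + 2 * b * b * length
                             - 3 * b * length ** 2 + 2 * length ** 3)


def a11_over_pi2(length):
    """Equation A.11 divided by pi^2."""
    return Fraction(2 * length ** 3, 3)


# A.9 and A.10 agree for arbitrary integer limits.
cases = [(0, 1, 5), (2, 9, 4), (-3, 3, 7)]
agree = all(a9_over_pi2(a, b, n) == a10_over_pi2(a, b, n) for a, b, n in cases)

# Assumed limits a = 0, b = NT reproduce A.11.
matches_a11 = a10_over_pi2(0, 7, 7) == a11_over_pi2(7)
print(agree, matches_a11)
```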
Finally, to summarize, the denominator of the MCRB in Equation A.4 has been calculated:
\[ \mathrm{E}_{u_{\Delta f}}\left[ \int_{T_0} \left| \frac{\partial s(t)}{\partial \Delta f} \right|^{2} dt \right] = \frac{2\pi^{2} (NT)^{3}}{3} \tag{A.12} \]
The SNR, Es/N0, is normalized according to symbol period T, and is therefore omitted.
From Equation A.4 we have:
\[ \mathrm{MCRB}(\Delta f) = \frac{3 N_0}{2\pi^{2} N^{3}} \tag{A.13} \]
This states that a lower bound on the estimation error is a simple analytical expression
which is a function of (normalized) SNR and number of samples. The MCRB can be
similarly derived for MCRB(τ) and MCRB(θ) but the derivation is not included here.
The expression is used to provide perspective on performance in Table A.1.
The derived expression uses a simple model that does not account for the channel
coefficient as used in the model in Equation 2.2. The frequency is normalized to the
interval [−1/2, 1/2] and the expression is used to compare performance between the
PD-based (section 2.4) and the NN estimator (section 3.2). The number of samples is
chosen as N = 96 and the SNR is 10 dB.
Table A.1: Performance comparison with MCRB, traditional estimator and NN estimator.
It is seen that both estimators are far from the achievable performance bound.
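Equation A.13 can be evaluated for the stated setting of N = 96 samples and 10 dB SNR. The sketch below assumes the symbol energy is normalized to 1 so that N0 = 1/SNR; this normalization is an assumption made here, and the result is in squared normalized-frequency units rather than Hz².

```python
import math


def mcrb_cfo(n_samples, snr_db):
    """Equation A.13, assuming Es = 1 so that N0 = 1 / SNR (linear)."""
    n0 = 10.0 ** (-snr_db / 10.0)
    return 3.0 * n0 / (2.0 * math.pi ** 2 * n_samples ** 3)


bound = mcrb_cfo(96, 10.0)   # bound on the variance of the CFO estimate
bound_rms = math.sqrt(bound)  # corresponding bound on the RMSE
print(bound, bound_rms)
```

The bound decreases with the cube of the number of samples and linearly with SNR, which explains why it lies far below the error of both practical estimators.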
4) Results: LSTM based detector shows outstanding performance in a communication system with ISI.
5) Conclusion: A sequence based estimator should be used in case of ISI. DL is promising when there is no model for the physical channel.

B. Paper: [6]
1) Purpose: Deep neural network for channel decoding (NND).
2) Challenges: Should learn a decoding structure rather than learning to classify 2^k different codewords. Block lengths are normally long but complexity of training increases exponentially with k.
3) Methods: Compares neural network decoding performance on polar block codes and random codes.
4) Results: NND for polar codes shows possibility for generalization but not for random codes. The BER for a large NND is worse than MAP decoder for both random codes and polar codes. Neural network is able to generalize from a certain fixed SNR at training to any arbitrary SNR.
5) Conclusion: High decoding complexity and only achieves MAP for very short block lengths. But could possibly be improved with RNN and parallelized computation. Deep learning for channel decoding does not seem too promising.

C. Paper [7]
1) Purpose: Considers a MIMO channel with known channel matrix H. Goal is to apply deep machine learning in the classical MIMO detection. Can be used as an example where deep learning can be used to trade off some of the exactness an existing algorithm provides for faster computation times.
2) Challenges: Maximum likelihood already has really good performance but high computational complexity. A sub-optimal implementation is desired.
3) Methods: Unfolding an iterative projected gradient descent method. Each iteration is represented as a layer in a neural network.
4) Results: Tested against a fixed channel model and a varying channel model with H drawn from a known distribution. Performs promisingly both in fixed channels and a varying channel scenario. Performance is comparable to the advanced detector "semidefinite relaxation (SDR)" but computation is 30x faster.

D. Paper: [8]
1) Purpose: Channel estimation and signal detection use traditional communication solutions as initialization and use DL networks to refine the coarse inputs.
2) Methods: Calls implementation ComNet. CE subnet first estimates OFDM channel from pilot symbols using LS. SD obtains ZF (zero forcing) estimate of the transmitted symbol. The obtained estimate, along with the estimated channel and received signal, is passed to the DL model to further refine the symbol estimates.
3) Results: Performs comparably to traditional decoding; however, OFDM implementation allows symbols to be decoded when the CP is removed. Related works: Equalization and synchronization in OFDM.

E. Idea
Inspired by OFDM detection papers such as [8], an mmWave model-driven receiver can be developed that uses expert knowledge to replace receiver blocks. NN can learn to compensate for non-linearities of mmWave components.

F. Idea
Most neural network detection methods are only created for a stationary channel. Work can be made to extend some of the already developed ML receiver detection methods to many different channel conditions.

III. END-TO-END COMMUNICATION SYSTEM DESIGN

A. Paper [9]:
Rethinking the communication system which is currently optimized block by block for performance in relation to a model, [10] proposes an end-to-end autoencoder to derive modulation and decoding schemes which shows performance comparable to traditional communication systems. The channel is treated as a layer in the neural network and therefore its differentiable functional form is needed. A functional form will always be a simplification which does not factor in all impairments of a real system such as hardware imperfections and varying channel conditions. Since the applications of deep learning in the physical layer show great promise we need a model which can robustly account for these situations [9].
1) Purpose: To make an end-to-end communication approach which does not need a functional description of the channel.
2) Challenges: Needs channel gradients in order to perform backpropagation (optimization).
3) Related work: Previous work has circumvented a functional model using a two-phase training alternative. One approach has trained on a functional model and fine-tuned neural network parameters using a realistic channel [11]. Realizations of a realistic channel have been approximated using a GAN. Another approach applied supervised learning at the receiver and reinforcement learning at the transmitter.
4) Methods: Use a stochastic approximation technique to approximate gradients for the model called Simultaneous perturbation stochastic approximation that does not require knowledge of exact channel model.
5) Results: Achieves the theoretical BER for an AWGN channel without any assumption about the channel model but takes more epochs to converge compared to the case where channel model is known.
6) Conclusion: Successful end-to-end design is created when channel model is not available or gradient calculation is too complex.
V. CDMA/G RANT FREE RANDOM ACCESS /M ASSIVE [2] X. Yang, M. Matthaiou, J. Yang, C.-K. Wen, F. Gao, and
CONNECTIVITY S. Jin, “Hardware-Constrained Millimeter-Wave Systems for 5G:
Challenges, Opportunities, and Solutions,” IEEE Communications
Neural networks were employed in order to perform detec- Magazine, vol. 57, no. 1, pp. 44–50, 2019. [Online]. Available:
https://ieeexplore.ieee.org/document/8613274/
tion and intra-user (and mostly linear) successive interference [3] G. Gui, H. Huang, Y. Song, and H. Sari, “Deep learning for an effective
cancellation in multi-user CDMA systems in: nonorthogonal multiple access scheme,” IEEE Transactions on Vehicular
B. Aazhang, B. P. Paris, and G. C. Orsak, Neural networks for multiuser Technology, vol. 67, no. 9, pp. 8440–8450, Sep. 2018.
detection in code-division multiple-access communications, IEEE Trans. Com- [4] M. Pajovic, T. Koike-Akino, and P. V. Orlik, “Model-Driven Deep
mun., vol. 40, no. 7, pp. 1212-1222, Jul 1992 Learning Method for Jammer Suppression in Massive Connectivity
M.-H. Yang, J.-L. Chen, and P.-Y. Cheng, Successive interference cancel- Systems,” arXiv e-prints, p. arXiv:1903.06266, Mar 2019.
lation receiver with neural network compensation in the CDMA systems, in [5] N. Farsad and A. Goldsmith, “Detection Algorithms for Communication
Asilomar Conference on Signals, Systems and Computers, vol. 2, Oct 2000, Systems Using Deep Learning,” CoRR, vol. abs/1705.0, 2017. [Online].
pp. 1417-1420. Available: http://arxiv.org/abs/1705.08044
B. Geevarghese, J. Thomas, G. Ninan, and A. Francis, CDMA interference cancellation techniques using neural networks in rayleigh channels, in International Conference on Information Communication and Embedded Systems (ICICES), Feb 2013, pp. 856-860.
VI. MODULATION CLASSIFICATION
Algorithm deficit, complex problem, optimal solutions are hard. Has been attempted many times with OK results.
VII. UNSUPERVISED MACHINE LEARNING
Step 1: Model selection. Select a model (family of distributions parameterized by a vector θ).
Step 2: Learning. Data should be used to choose the value for the parameter vector θ.
Step 3: Model is applied to carry out the task of interest, e.g. clustering, dimensionality reduction or generation of new samples.
A. Autoencoders
The transmitted input message x has an intermediate representation z which is the received signal and the output should match the input. ML should only be used if a model or an algorithm deficit exists. Algorithm deficit: Non-linear dynamical models (optical links), multiple access channels with sparse transmission codes and joint source channel coding.
End-to-end communication described in Section III-A has examples of auto-encoder use.
Auto-encoders can also be used to compress Channel State Information (CSI) for Frequency Division Duplex (FDD) links.
B. Generative models
1) Channel realizations [14]: Example: Learn to generate samples from a given channel. Reasonable for scenarios that lack straightforward channel models. Can be used to mimic and identify non-linear channels for satellite communications. Can be generally used to augment a dataset used for training.
2) Detecting anomalies by learning the typical distribution of features: Can be used for spectrum sensing, identifying covert transmissions.
REFERENCES
[1] O. Simeone, “A Very Brief Introduction to Machine Learning With Applications to Communication Systems,” IEEE Transactions on Cognitive Communications and Networking, vol. 4, no. 4, pp. 648–664, 2018. [Online]. Available: http://arxiv.org/abs/1808.02342
[6] T. Gruber, S. Cammerer, J. Hoydis, and S. T. Brink, “On deep learning-based channel decoding,” 2017 51st Annual Conference on Information Sciences and Systems, CISS 2017, pp. 1–6, 2017.
[7] N. Samuel, T. Diskin, and A. Wiesel, “Deep MIMO detection,” IEEE Workshop on Signal Processing Advances in Wireless Communications, SPAWC, vol. 2017-July, pp. 1–5, 2017.
[8] X. Gao, S. Jin, C. K. Wen, and G. Y. Li, “ComNet: Combination of Deep Learning and Expert Knowledge in OFDM Receivers,” pp. 1–11, 2018.
[9] V. Raj and S. Kalyani, “Backpropagating through the air: Deep learning at physical layer without channel models,” IEEE Communications Letters, 2018.
[10] T. O’Shea and J. Hoydis, “An Introduction to Deep Learning for the Physical Layer,” IEEE Transactions on Cognitive Communications and Networking, vol. 3, no. 4, pp. 563–575, 2017. [Online]. Available: http://ieeexplore.ieee.org/document/8054694/
[11] S. Dörner, S. Cammerer, J. Hoydis, and S. Ten Brink, “Deep Learning Based Communication Over the Air,” Conference Record of 51st Asilomar Conference on Signals, Systems and Computers, ACSSC 2017, vol. 2017-Octob, no. 1, pp. 1791–1795, 2018.
[12] T. Matsumine, T. Koike-akino, and Y. Wang, “Deep Learning-Based Constellation Optimization for Physical Network Coding in Two-Way Relay Networks,” -, -.
[13] A. Balatsoukas-Stimming, “Non-Linear Digital Self-Interference Cancellation for In-Band Full-Duplex Radios Using Neural Networks,” in IEEE Workshop on Signal Processing Advances in Wireless Communications, SPAWC, vol. 2018-June, 2018.
[14] T. J. O’Shea, T. Roy, and N. West, “Approximating the Void: Learning Stochastic Channel Models from Observation with Variational Generative Adversarial Networks,” CoRR, vol. abs/1805.0, 2018. [Online]. Available: http://arxiv.org/abs/1805.06350
Deep Learning for Synchronization and Channel Estimation in NB-IoT Random Access Channel
Mads H. Jespersen∗, Milutin Pajovic†, Toshiaki Koike-Akino†, Ye Wang†, Petar Popovski∗, and Philip V. Orlik†
∗Department of Electronic Systems, Aalborg University, Denmark
†Mitsubishi Electric Research Laboratories (MERL), Cambridge, MA 02139, USA
Abstract—The central challenge in supporting massive IoT connectivity is the uncoordinated, random access by sporadically active devices. The random access protocol and activity detection have been widely studied, while the auxiliary procedures, such as synchronization, channel estimation and equalization, have received much less attention. However, once the protocol is fixed, the access performance can only be improved by a more effective receiver, through more accurate execution of the auxiliary procedures. This motivates the pursuit of joint synchronization and channel estimation, rather than the traditional approach of handling them separately. The prohibitive complexity of the conventional analytical solutions leads us to employ the tools of deep learning in this paper. Specifically, the proposed method is applied to the random access protocol of Narrowband IoT (NB-IoT), preserving its standard preamble structure. We obtain excellent performance in estimating Time-of-Arrival (ToA), Carrier-Frequency Offset (CFO), channel gain and collision multiplicity from a received mixture of transmissions. The proposed estimator achieves a ToA Root-Mean-Square Error (RMSE) of 0.99 µs and a CFO RMSE of 1.61 Hz at 10 dB Signal-to-Noise Ratio (SNR), whereas a conventional estimator using two cascaded stages has RMSEs of 15.85 µs and 8.05 Hz, respectively.
Index Terms—Deep learning, IoT standards, massive random access, joint estimation
The first author performed this work as an intern at MERL.
I. INTRODUCTION
A massive number of devices are expected to be connected to the Internet and several standards have been proposed to enable connectivity of low-complexity devices operating over a shared wireless channel. The most prominent technologies are Sigfox, LoRa and Narrowband IoT (NB-IoT) [1]. In Internet of Things (IoT) applications, the random access procedure has a high impact on device battery life and on the number of devices that can be supported concurrently [2]. Random access is used to request uplink allocation from the base station without requiring users to be constantly connected to the base station. Most IoT data packets are on the order of bits and users transmit them sporadically by establishing a new connection for every transmission. Establishing a connection using random access is a four-step procedure [3], [4], which is initiated by a user that has a packet to transmit by sending a random access preamble. The random access preamble is designed such that the base station is able to efficiently detect the transmitting user and estimate any timing offset between the user and base station from the received signal. The timing offset comprises propagation time, downlink synchronization errors and channel delay spread [5].
NB-IoT is a recent standard proposed by the 3rd Generation Partnership Project (3GPP) to accommodate the growing number of wireless devices connected to the Internet. It is designed to co-exist with Long-Term Evolution (LTE) and provide low-cost and low-power devices with low throughput connectivity. The random access procedure in NB-IoT is initiated by the Narrowband Physical Random Access CHannel (NPRACH). NB-IoT has a system bandwidth of 180 kHz that accommodates 48 orthogonal channels from which a user attempting to establish a connection chooses one at random. If NB-IoT users choose different (i.e., orthogonal) preambles, the base station is able to estimate the Time of Arrival (ToA) and Carrier Frequency Offset (CFO) of each user [5], [6]. However, given a possibly large number of users and a relatively small number of orthogonal preambles, it is likely that two or more users choose the same preamble. The resulting collision may lead to a user back-off time of up to almost 9 minutes [7], [8]. In order to avoid unnecessary back-off periods and consequently improve channel utilization and overall capacity of the NB-IoT system, we propose in this paper a Deep Learning (DL)-based method for separating colliding users, detecting their number and estimating their respective ToAs and CFOs. We validate the proposed method using simulations and demonstrate significantly improved performance compared to the conventional approaches.
A. Related Work
Several papers have explored methods for activity detection, ToA and CFO estimation using the NB-IoT NPRACH preamble structure. In particular, [5] estimates the ToA by searching for the highest correlation between the received signal and a delayed and frequency-shifted preamble on a grid of possible delays and frequencies. To reduce the complexity of the algorithm in [5], the ToA and CFO are estimated using the residual phase difference between symbol groups and channel hops in a two-stage procedure in [6]. To improve the ToA estimation, [4] suggests a novel hopping pattern that renders more accurate ToA estimation compared to that achieved with the already defined NB-IoT preamble.
We consider in this paper the problem of separating colliding NB-IoT users that choose the same random access preamble in the NPRACH scheme, and propose a method to detect the number of colliding users and estimate their ToA, CFO
and channel gains. Motivated by recent success in leveraging learning-based methods for addressing problems related to physical layer communications [9], our method builds upon a deep learning framework. In particular, we jointly detect the number of active users and estimate their parameters. The aim is to improve the capacity of the critical random access phase: interfering signals are not discarded, so channel resources are utilized better, which in turn reduces back-off periods. In addition to handling a much richer class of scenarios, the proposed method outperforms [6] in their own scenario where users transmit orthogonal preambles and do not collide. In comparison to [4], the random access preamble in this work is as suggested by the NB-IoT standard, ensuring the proposed method is practical in the NB-IoT systems currently being deployed. Finally, looking outside the NB-IoT scope, we believe that this work is the first application of deep learning techniques for user separation in massive connectivity systems.
II. NB-IOT RANDOM ACCESS PREAMBLE DESIGN
The preamble format and packet structure are illustrated in Fig. 1. The preamble is divided into symbol groups, where each group consists of a Cyclic Prefix (CP) and ε identical symbols. The value of ε depends on preamble format. The preamble format is chosen by the user based on the downlink power measurement to estimate its coverage area [3].
Fig. 1. Overview of NPRACH preamble and packet structure. (Labels in the figure: symbol group, 266.7 µs, ε = 5, 3.75 kHz, m = 1, 2, 3, L = 4, uplink slot n = 1, ..., 128.)
The most common preamble format is format 1 with preamble frame structure 0 or 1, which has ε = 5 and a symbol time T_SYM = 266.7 µs. The CP period for frame format 0 is T_CP = 66.7 µs and T_CP = 266.7 µs for frame format 1 [10]. The CP is designed such that it is long enough to cover the maximum round trip delay to suppress Inter-Symbol Interference (ISI). Therefore one interpretation of allowing adaptive CP selection is for the user to use the short CP in the range 0–8 km and the long CP in the range 8–35 km [5].
The full preamble consists of 4 repetitions of the symbol group which is again repeated n = 2^J, J = 0, ..., 7 times for a full preamble length of L = 4 × 2^J symbol groups. The repetition of the symbol groups occurs within an uplink slot, and the number of repetitions is decided by the upper layers. The NPRACH is allocated N = 12, 24, 36 or 48 subcarriers with 3.75 kHz spacing out of the available 48 subcarriers. This paper focuses on the preamble frame structure type 1 where N = 12. At the start of the NPRACH preamble transmission, the subcarrier of the first symbol group is chosen at random. After each symbol group the subcarrier changes according to a deterministic channel hopping sequence, so over the duration of a preamble there will be L subcarrier hops. Since the hopping pattern is deterministic, several users choosing the same initial subcarrier will collide for the entirety of the NPRACH preamble sequence. The number of orthogonal preamble sequences is therefore the number of allocated NPRACH subcarriers, K [7].
For frame structure type 1 and preamble format 0, two “levels” of hopping are employed as shown in Fig. 1. The hopping pattern is deterministic within a cell, but the subcarrier of every 4th symbol group appears random to neighbouring cells. The hopping procedure aids in the estimation of ToA and also reduces inter- and intra-cell interference [5]. The ToA should be estimated by the base station for successful uplink signal decoding and it further enables device positioning. An error in the ToA estimation results in the user being unable to receive the response sent by the base station. ToA estimation therefore has a great impact on performance in NB-IoT [4].
III. SYSTEM MODEL
The received signal at the base station is a superposition of signals from multiple users, given by
    y[n] = \sum_{k=0}^{K-1} a_k s_k[n] + w[n],    (1)
where K is the maximum number of concurrent users, a_k ∈ {0, 1} indicates whether the kth user is active or not, and w[n] ∼ CN(0, 1/ρ_n) denotes the additive noise with a per symbol Signal-to-Noise Ratio (SNR) of ρ_n.
At the receiver, the phase of each symbol depends on the ToA τ, the CFO ∆f (which gives the frequency of the user’s chosen channel with respect to the receiver’s uplink carrier frequency f), and the channel rotation given by arg(h), where h is the complex-valued channel coefficient. These parameters are assumed to be independent across users and denoted by τ_k, ∆f_k and h_k for each user k.
The signal from the kth user is given by
    s_k[n] = h_k e^{-j 2\pi (f_n + \Delta f_k)(n T_{sym} - \tau_k)},    (2)
where T_sym is the symbol duration. The signal model is limited to only considering a single preamble sequence for the sake of simplicity. This means that the sub-carrier frequency pattern f_n is predetermined and identical for all instances of s.
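The signal model in (1) and (2) can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' simulator: the function names, the zero-based sample index n, the imaginary unit in the exponent, and the toy hopping pattern used in the usage lines are all hypothetical choices made here.

```python
import cmath
import math
import random

T_SYM = 266.7e-6  # symbol duration in seconds (value quoted in Section II)


def user_signal(h, cfo, toa, f_n, t_sym=T_SYM):
    """Noise-free samples s_k[n] of a single user, following (2).

    h   : complex channel coefficient h_k
    cfo : carrier-frequency offset Delta f_k in Hz
    toa : time of arrival tau_k in seconds
    f_n : list holding the subcarrier frequency (Hz) used at each sample n
    """
    return [
        h * cmath.exp(-2j * math.pi * (f + cfo) * (n * t_sym - toa))
        for n, f in enumerate(f_n)  # assumption: n starts at 0
    ]


def received_signal(active, h, cfo, toa, f_n, snr=None, rng=None):
    """Superposition y[n] of (1); `snr` is the linear per-symbol SNR rho_n.

    Passing snr=None skips the noise term w[n].
    """
    rng = rng or random.Random()
    y = [0j] * len(f_n)
    for a_k, h_k, df_k, tau_k in zip(active, h, cfo, toa):
        if a_k:  # a_k in {0, 1}: only active users contribute
            s_k = user_signal(h_k, df_k, tau_k, f_n)
            y = [y_n + s_n for y_n, s_n in zip(y, s_k)]
    if snr is not None:
        # CN(0, 1/snr): variance 1/(2*snr) in each real dimension
        sigma = math.sqrt(0.5 / snr)
        y = [y_n + complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
             for y_n in y]
    return y


if __name__ == "__main__":
    # Hypothetical hopping pattern for illustration only; it is NOT the
    # NPRACH hopping sequence defined by the standard.
    f_n = [3750.0 * ((n // 6) % 4) for n in range(96)]
    y = received_signal(
        active=[1, 1, 0, 0],
        h=[0.8 + 0.1j, -0.3 + 0.5j, 1j, 1 + 0j],
        cfo=[5.0, -12.0, 0.0, 0.0],
        toa=[40e-6, 95e-6, 0.0, 0.0],
        f_n=f_n,
        snr=10.0,
        rng=random.Random(0),
    )
    print(len(y), abs(y[0]))
```

Keeping the noise optional makes it easy to obtain both the noisy observation y and the noise-free mixture from the same parameters.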
The typical FFT length in LTE is 512 [6], but for simplicity we let each sample, n, correspond to one symbol. In this model, the contents of the CP are interpreted as a symbol and therefore no distinction is made between the CP and the ε = 5 repeated symbols in a symbol group. This signal model may be valid only for the long CP which corresponds to distances between the user and base station within a minimum of r = 8 km and a maximum of R = 35 km [5]. The users are assumed to be uniformly distributed in the coverage area of the base station as illustrated in Fig. 2. The distance from the base station to the users d has the following Probability Density Function (PDF) [11]:
    f_D(d) = \frac{2d}{R^2 - r^2}, \quad r \le d \le R,    (3)
which is used to model the ToA τ = d/c, where c is the propagation speed.
Fig. 2. Geometry of users distribution. (Labels in the figure: R, r, d.)
The channel coefficient h of the signal model in (2) is a complex-valued constant which accounts for small scale fading: h ∼ CN(0, 1). This means that the average received signal power is normalized to one. The narrowband channel is modeled as a slowly varying single-tap Rayleigh fading channel and, for this reason, as a single coefficient [6]. Large scale fading is not included in the model since users already have knowledge of the downlink SNR and adjust their transmit power accordingly using power control.
The CFO in (2) is chosen uniformly at random between −20 and +20 Hz [6]. For the sake of simplicity, the CFO and ToA are assumed to be constant throughout an entire NPRACH transmission for each user.
The activity indicator a_k is modeled as a Bernoulli random variable with the probability of transmitting p, and a_1, ..., a_K are i.i.d. The number of concurrent active users is
    N_a = \sum_{k=1}^{K} a_k \sim B(K, p),    (4)
where B is the binomial distribution. We consider the case with K = 4 and p = 0.5 throughout the paper. The probability of exactly k users colliding is then:
    \Pr(k) = \binom{K}{k} p^k (1-p)^{K-k} = \frac{4!}{k!(4-k)!} p^k (1-p)^{4-k}.    (5)
IV. DEEP LEARNING ESTIMATOR
The goal of the estimator is to use the discrete signal y[n] to estimate the activity indicator a, ToA τ, CFO ∆f, and channel coefficient h of each user. Since the activity indicator of each user is a random variable, the total number of active users in the received signal is unknown. This boils down to a notoriously challenging problem of source separation with an unknown number of users [12]. Deep learning has significantly improved the field of source separation and the general idea of using deep learning is to capture the non-linear relationship between inputs and corresponding targets that is often difficult to model with analytically tractable expressions [12]. In this paper, estimating the unknown parameters is dealt with by splitting the problem into:
• Classification of the number of active users; and
• Estimation of ToAs, CFOs and channel coefficients given the number of users.
The two separate tasks are combined such that the synchronization parameters are accurately estimated for each detected user.
A. Estimation of the Number of Users
Finding the number of active users, N_a, is formulated as a classification problem where p = OneHot(N_a) is a categorical random variable encoded as a one-hot vector specifying N_a. With a one-hot encoding, the true target p = [p_0, p_1, ..., p_K] has entry one at index N_a, and zero entries everywhere else. This is different from a typical way of representing active users where users are ordered in a vector and each index indicates the activity of a unique user. In that representation, the number of users N_a would be estimated as the ℓ0 norm of the sparse vector. In this collision scenario users are transmitting using the same spreading sequence and are not uniquely distinguishable. For this reason, only the information on the number of active users is represented in p.
Cross-entropy loss is typically used in classification problems [13], and [14] suggests that the cross-entropy loss in classification problems leads to faster convergence and better generalization compared to the Mean Squared Error (MSE). For nonbinary classification, we typically use softmax cross entropy loss (or negative log-likelihood) expressed as:
    \ell_{NLL}(p, q) = -\sum_{k=0}^{K} p_k \log q_k,    (6)
where q is a continuous differentiable softmax function:
    q_k = \frac{\exp(\pi_k)}{\sum_i \exp(\pi_i)},    (7)
where [π_0, π_1, ..., π_K] are the outputs from the last layer of the neural network and [q_0, q_1, ..., q_K] represent the a posteriori class probabilities. A hard class prediction could then be found as arg max_i π_i.
B. Parameter Estimation
The parameters to be estimated are collected in a vector
    x_k = [\tau_k, \Delta f_k, \Re[h_k], \Im[h_k]]^T.    (8)
Note that it was found that representing the complex-valued channel coefficient h by Cartesian coordinate (i.e., real and
imaginary parts) shows superior performance to phasor representation (i.e., amplitude and phase) as seen in Fig. 7. For K users, the respective vectors are collected in a matrix
    X = [x_0, x_1, \ldots, x_{K-1}].    (9)
The neural network seeks to find an estimate X̂ such that E‖X − X̂‖_2^2 is minimal, which is equivalent to a Minimum Mean-Square Error (MMSE) estimator.
The above formulation is sufficient to derive an estimation procedure. However, X consists of multiple parameters which have values on different scales. When using a practical optimization algorithm to find an estimate, any scaling difference between the parameters will affect the impact each value has on the gradient descent step.
To circumvent possible issues arising from error variations across parameters, we minimize the reconstruction error instead. The actual received signal without additive noise, s, with the parameters in matrix X can be reconstructed using (2). The reconstruction is conveniently represented using function f(·) such that s = f(X).
For each estimate X̂, the equivalent noise-free signal ŝ is reconstructed and compared to the actual noise-free received signal s. The noise-free signal is known during the training procedure and is used so the output of the neural network does not account for the distribution of the noise. The data fidelity (i.e., reconstruction loss) is quantified using the MSE metric such that
    \ell_r(X, \hat{X}) = E\| f(X) - f(\hat{X}) \|_2^2 = E\| s - \hat{s} \|_2^2.    (10)
The number of concurrent users in each sample is known during training so when reconstructing the signal ŝ, the contributions from the correct number of users are taken into account when calculating the reconstruction loss ℓ_r for each sample.
The loss function which the neural network seeks to minimize is simply the sum of (6) and (10)
    loss = \ell_{NLL}(p, q) + \ell_r(X, \hat{X}).    (11)
C. Network Implementation
An overview of the neural network that estimates both the number of users and synchronization parameters is illustrated in Fig. 3. The input to the network is the received signal which consists of 4 NPRACH repetitions each with L(ε + 1) symbol periods. The total number of samples in the received signal is N_rep L(ε + 1) = 4 · 4 · (5 + 1) = 96, where the real and imaginary parts are represented in 2 individual channels. The output of the network is the flattened matrix X and the probability vector π. For 4 users there are 4 · 4 = 16 parameters in X and 5 possible classes in the number of users (including the zero users case). The input to the network is processed so as to extract common features that are subsequently used for multi-task learning, that is, to detect the number of users and estimate their parameters. The first layer performs a 1-dimensional convolution over the input signal. Since the number of users, ToA, CFO and channel coefficient are all assumed to be constant throughout a transmission, a convolution layer is chosen so as to extract translationally invariant features of the input time-domain signal.
Fig. 3. Overview of DNN architecture for estimating synchronization parameters of up to 4 colliding users. (Labels in the figure: feature maps 2@96x1, 200@91x1, 200@45x1, 100@43x1, 100@21x1; 1-D Convolution with Batch-norm, ReLu and Max-Pool; Flatten to 1x2100; layers of size 1x1000, 1x1000 and 1x200 with ReLu; Concatenation to 1x21; outputs 1x16 and 1x5.)
Following a typical CNN structure, batch normalization, non-linear activation and max-pooling are employed. The convolution layers, activations and pooling layers are repeated to form a deep neural network. The features found by the convolution layers are reshaped to a single vector which is then used as input to two individual feedforward neural networks. One of the networks performs classification and detects the number of users based on the output of the feature extraction layers. The other network performs regression with the goal to yield parameters so that the reconstructed signal is as close as possible to the received signal in the MSE sense. Each feedforward network has two fully connected layers followed by the Rectified Linear Unit (ReLU) activation and a linear output layer. The network and automatic differentiation are implemented using the PyTorch framework [15] and trained using multiple Graphics Processing Units (GPUs).
In the simulation, ToA, CFO and channel coefficient are all drawn according to the distributions given in the system model and N_a is drawn according to Pr(k) for each sample. The input to the network y and each parameter in the output X is scaled to have zero mean and unit variance. In general the convergence of a neural network is faster if all inputs to all layers have zero-mean and unit covariance between training examples in the case when all examples are of equal importance [16]. From the system model the variance and mean of each parameter (CFO, ToA and h) are known and used to normalize the parameters to have mean zero and unit variance. The mean and variance of τ can be derived from (3).
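The normalization constants mentioned above can be worked out in closed form. For the distance PDF in (3), E[d] = 2(R^3 − r^3) / (3(R^2 − r^2)) and E[d^2] = (R^2 + r^2)/2, so the moments of τ = d/c follow by scaling with 1/c and 1/c^2. The sketch below computes these values and checks them against samples drawn by inverting the CDF of (3). It is a minimal illustration, assuming that c is the speed of light; the function names are hypothetical and not taken from the paper's implementation.

```python
import math
import random

C = 299_792_458.0          # assumption: propagation speed = speed of light [m/s]
R_MIN, R_MAX = 8e3, 35e3   # r and R from the system model [m]
CFO_MAX = 20.0             # CFO is uniform on [-20, +20] Hz


def toa_moments(r=R_MIN, R=R_MAX, c=C):
    """Closed-form mean and variance of tau = d / c, with d distributed as in (3)."""
    mean_d = 2.0 * (R**3 - r**3) / (3.0 * (R**2 - r**2))   # E[d]
    mean_d2 = (R**2 + r**2) / 2.0                          # E[d^2]
    var_d = mean_d2 - mean_d**2
    return mean_d / c, var_d / c**2


def cfo_moments(cfo_max=CFO_MAX):
    """Mean and variance of a uniform variable on [-cfo_max, +cfo_max]."""
    return 0.0, (2.0 * cfo_max) ** 2 / 12.0


def sample_toa(rng, r=R_MIN, R=R_MAX, c=C):
    """Draw tau by inverting the CDF of (3): F(d) = (d^2 - r^2) / (R^2 - r^2)."""
    u = rng.random()
    return math.sqrt(r**2 + u * (R**2 - r**2)) / c


def standardize(value, mean, var):
    """Scale a parameter to zero mean and unit variance."""
    return (value - mean) / math.sqrt(var)


if __name__ == "__main__":
    mean_tau, var_tau = toa_moments()
    rng = random.Random(0)
    draws = [sample_toa(rng) for _ in range(100_000)]
    empirical_mean = sum(draws) / len(draws)
    print(f"closed-form mean of tau: {mean_tau * 1e6:.2f} us")
    print(f"empirical mean of tau:   {empirical_mean * 1e6:.2f} us")
    print(f"CFO variance:            {cfo_moments()[1]:.2f} Hz^2")
```

For h ∼ CN(0, 1), the real and imaginary parts each have zero mean and variance 1/2, so they can be passed to `standardize` with those constants.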
[Plot residue: legend entries "NN estimator" and "Phase-difference based estimator", each for 1 user and 2 users.]
The learning rate is 0.0001 and each batch consists of 50,000 realizations of y from (1). The stochastic optimization method based on adaptive moment estimation (Adam) [17] is used and a total of 20,000 different batches are used in training.
In Fig. 4, the estimation of collision multiplicity is shown for the proposed classification method compared to a simple amplitude-based method. As colliding signals will add non-coherently, the amplitude of the signal is not a good indicator of collision multiplicity. 1 and 2 users are successfully
This gives the conventional estimator an artificial advantage.
The RMSE of ToA and CFO estimation with a varying number of users is shown in Figures 5 and 6. The neural network-based estimator shows lower estimation error for both ToA and CFO compared to the phase-difference-based estimator even for a single user. For two users the proposed estimator is superior to the conventional estimator when estimating ToA. At 10 dB the proposed estimator has an RMSE of 0.99 µs and 1.61 Hz for a single user compared to 15.85 µs and 8.05 Hz
[Plot residue: CFO estimation RMSE [Hz] for the NN estimator and the PD based estimator, each for 1, 2, 3 and 4 users.]
error of a conventional approach in NB-IoT is compared to the performance of the proposed scheme. Traditional synchronization methods fail in the case of collisions with high Signal-to-Interference Ratio (SIR) whereas, with the proposed algorithm