
Master’s Thesis

Deep Learning for Synchronization and Channel Estimation in NB-IoT Random Access Channel

Author: Mads Helge Jespersen
Supervisor: Petar Popovski
Industry supervisor: Milutin Pajovic (Mitsubishi Electric Research Laboratories)
Copyright © Group 1050, Wireless Communication Systems 4th semester, Aalborg University 2019
This report is compiled in LATEX. Figures are made using Inkscape.
Department of Electronic Systems
Wireless Communication Systems
Aalborg University
http://www.aau.dk

Title: Deep Learning for Synchronization and Channel Estimation in NB-IoT Random Access Channel
Project Period: Spring Semester 2019
Project Group: Group 1050
Participant: Mads Helge Jespersen
Supervisor: Petar Popovski
Industry supervisor: Milutin Pajovic (Mitsubishi Electric Research Laboratories)
Number of Pages: 51
Date of Completion: June 6th, 2019

Abstract:
Effective decoding of wireless signals requires various parameter acquisition techniques, including user activity detection, synchronization, channel estimation, and channel equalization. In traditional systems, these unknown, underlying parameters of the communication channel are estimated individually. This work proposes a novel joint estimation process based on deep learning. The proposed method outperforms traditional methods and is further able to find the multiplicity of collisions, handle synchronization and channel estimation in the case of colliding non-orthogonal transmissions, and discover superior preamble sequences using an auto-encoder structure. The proposed method is intended for decoding transmissions at the base station in a massive connectivity scenario with many low-complexity devices operating concurrently. Excellent performance is demonstrated in estimating Time-of-Arrival (ToA), Carrier-Frequency Offset (CFO), channel gain and collision multiplicity from a received mixture of transmissions using the random access preamble structure of the NB-IoT standard. The proposed estimation scheme, employing a convolutional neural network (CNN), achieves a ToA Root-Mean-Square Error (RMSE) of 2.88 µs and a CFO RMSE of 3.44 Hz at 10 dB Signal-to-Noise Ratio (SNR), whereas a conventional estimator using two cascaded stages has RMSEs of 16.20 µs and 7.98 Hz, respectively.

The content of this report is freely available, but publication may only be pursued with reference.
PREFACE
This thesis is written as part of the Master of Science in Engineering (Wireless Communication Systems) at Aalborg University, Denmark.
The author would like to thank Milutin Pajovic, Toshiaki Koike-Akino and Ye Wang from Mitsubishi Electric Research Laboratories for their collaboration and supervision during the work on this thesis.
Citations follow the IEEE referencing style. Figures and tables without a citation were made by the author of the thesis. All units are given according to the SI system.

Aalborg University, June 5, 2019

Mads Helge Jespersen
TABLE OF CONTENTS

Preface
Abbreviations
1 Introduction
  1.1 Project scope
  1.2 Machine learning in wireless physical layer
2 Narrowband IoT
  2.1 Preamble design
  2.2 Baseband model of Narrowband Physical Random Access CHannel
  2.3 Connection establishment in NB-IoT
  2.4 Traditional synchronization parameter estimation
3 Deep Learning Estimator
  3.1 Deep learning basics
  3.2 Estimation procedure
  3.3 Estimation of the Number of Users
  3.4 Parameter Estimation
  3.5 Network Implementation
4 Results
5 Auto-encoder
  5.1 Encoding
  5.2 Decoding
6 Discussion and conclusion
  6.1 Future improvements
  6.2 The performance of DL in NB-IoT
  6.3 Conclusion
Bibliography
Appendices
  A Cramér–Rao lower bound
  B Literature Study
  C GlobeCom article
ABBREVIATIONS

3GPP: 3rd Generation Partnership Project
AWGN: Additive White Gaussian Noise
BS: Base Station
CDF: Cumulative Distribution Function
CFO: Carrier Frequency Offset
CNN: Convolutional Neural Network
CP: Cyclic Prefix
CRB: Cramér–Rao lower Bound
DAE: Denoising Auto-Encoder
DL: Deep Learning
eMBB: Enhanced Mobile BroadBand
GPU: Graphics Processing Unit
HARQ: Hybrid Automatic Repeat Request
IoT: Internet of Things
LTE: Long-Term Evolution
MAC: Medium Access Control
MCRB: Modified Cramér–Rao Bound
MMSE: Minimum Mean-Square Error
mMTC: massive Machine-Type Communication
MSE: Mean Squared Error
NB-IoT: Narrowband IoT
NPBCH: Narrowband Physical Broadcast CHannel
NPDSCH: Narrowband Physical Downlink Shared CHannel
NPRACH: Narrowband Physical Random Access CHannel
NPSS: Narrowband Primary Synchronization Signal
NPUSCH: Narrowband Physical Uplink Shared CHannel
PAPR: Peak-to-Average Power Ratio
PD: Phase-Difference
PDF: Probability Density Function
ReLU: Rectified Linear Unit
RMSE: Root-Mean-Square Error
SDR: Software-Defined Radio
SGD: Stochastic Gradient Descent
SIC: Successive Interference Cancellation
SIR: Signal-to-Interference Ratio
SNR: Signal-to-Noise Ratio
TA: Timing Advance
ToA: Time of Arrival
URLLC: Ultra Reliable Low Latency Communication
VRAM: Video Random Access Memory
1 INTRODUCTION
An emerging communication paradigm is to provide low-power and low-complexity devices with wireless Internet connectivity, which is commonly referred to as the Internet of Things (IoT). A challenging task in IoT development is to transform the existing human-centered communication structure into an object-centered communication system. This challenge involves redesigning many aspects of the protocol, such as the data representation, and even changing the physical layer.
It is common to divide the communication system into three distinct services: a basic stable connection, low-power massive connectivity and reliable communication. In the 5G standard these are classified as Enhanced Mobile BroadBand (eMBB), massive Machine-Type Communication (mMTC) and Ultra Reliable Low Latency Communication (URLLC) [1]. In the traditional view of IoT communication, devices are expected to sporadically transmit small data packets, on the order of bits, which is consistent with the mMTC scenario. This project focuses only on low-power massive connectivity. In order to save power, most devices will not be constantly connected and are therefore required to establish a new connection for every data transmission [2]. Further, the low complexity of the devices means that short-range multi-hop communication systems may not be suitable. For this reason, a cellular communication structure is expected to play a significant role in IoT connectivity [3]. The performance of the entire IoT system depends on the technology used, and the most prominent wide-area IoT technologies using unlicensed frequency bands are Sigfox and LoRa. Communication in unlicensed spectrum suffers from cross-technology interference and very limited available bandwidth. Narrowband IoT (NB-IoT) is a standard proposed by the 3GPP to operate in the licensed Long-Term Evolution (LTE) spectrum, which means it is backed by established infrastructure owners and is expected to impact the IoT connectivity landscape [2, 4]. The potentially massive number of devices attempting to gain access through the Base Station (BS) simultaneously may become a system bottleneck [2]. Much attention must therefore be paid to reducing signaling overhead and improving the efficiency of detection algorithms in the random access phase. Random access is used to request uplink allocation from the base station, and the random access procedure has a high impact on device battery life and on the number of devices that can be supported concurrently [5].
A typical random access procedure is initiated by a user that has a packet to transmit: the user sends a random access preamble. The random access preamble is designed such that the base station is able to efficiently detect the transmitting user and, from the received signal, estimate any timing offset between the user and the base station. The timing offset comprises the propagation time, downlink synchronization errors and the channel delay spread [6].

1.1 PROJECT SCOPE

NB-IoT is a recent standard proposed by the 3rd Generation Partnership Project (3GPP) to accommodate the growing number of wireless devices connected to the Internet, and it is chosen as the example application in this project. NB-IoT is designed to co-exist with LTE and to provide low-cost, low-power devices with low-throughput connectivity.
In NB-IoT random access there is only a relatively small number of different (i.e., orthogonal) random access preambles for users to choose from. Users attempting to gain channel access at the same time choose preambles in an uncoordinated manner. Given a possibly large number of users and a relatively small number of orthogonal preambles, it is likely that two or more users choose the same preamble. Users transmitting the same preamble collide, which results in either both transmissions or only one of them being discarded. The resulting collision may lead to a user back-off time of up to almost 9 minutes [7, 8]. The main purpose of the random access preamble is to enable the base station to detect each transmitting user and to estimate the synchronization parameters, Time of Arrival (ToA) and Carrier Frequency Offset (CFO), of each user [6, 9].
In order to avoid unnecessary back-off periods and consequently improve channel utilization and the overall capacity of the NB-IoT system, this project seeks a method for multi-user detection and synchronization parameter estimation in the case of packet collision. Because colliding signals add non-coherently, the received signal is simply a superposition of the individual transmissions and still contains information from each user. Multi-user detection is a method used to increase capacity by detecting interference and exploiting it to mitigate its effect on the desired signal [10]. Optimal multi-user detection methods use the Viterbi algorithm, whose complexity increases exponentially with the number of users. Further, most methods require exact channel information at the receiver [11]. The optimal multi-user detector is not used in practice; approximations such as Successive Interference Cancellation (SIC) and turbo receivers are used instead [10]. Although methods for multi-user detection exist, traditional methods have several drawbacks for the scope of this project:
- Optimal decoding has exponentially increasing complexity with the number of users
- The receiver requires channel information for each user
- Multi-user detection methods in the literature are not developed for synchronization and channel estimation
- The receiver requires knowledge of the number of interfering signals
Deep Learning (DL)-based methods are well suited for tackling algorithm deficit problems [12] and can exploit non-linear relationships present in the signal to extract the desired information. DL has shown good results for source separation in speech processing and is well suited for classification problems such as determining the number of interfering signals. DL algorithms are comparatively straightforward to develop and, once trained, are typically not computationally complex to evaluate.
In summary, this project proposes a DL-based method for separating colliding users, detecting their number and estimating their respective ToAs and CFOs. The proposed method is validated using simulations, which demonstrate significantly improved performance compared to conventional approaches.

1.1.1 related work


Several papers have explored methods for activity detection and ToA and CFO estimation using the NB-IoT Narrowband Physical Random Access CHannel (NPRACH) preamble structure. For example, [6] estimates the ToA by searching for the highest correlation between the received signal and delayed/frequency-shifted preambles on a grid of possible delays and frequencies. To reduce the complexity of the algorithm in [6], [9] estimates the ToA and CFO in a two-stage procedure using the residual phase difference between symbol groups and channel hops. With the goal of improving ToA estimation, [13] suggests a novel hopping pattern that yields more accurate ToA estimates than the already defined NB-IoT preamble.

1.2 MACHINE LEARNING IN WIRELESS PHYSICAL LAYER

The application of machine learning has shown promise in the physical layer where
optimal algorithms, e.g., in multi-user networks, tend to be computationally complex.
Neural networks have been previously employed to perform detection and successive
interference cancellation in multi-user CDMA systems [12]. For an overview of deep
learning applied to the wireless physical layer see Appendix B.
2 NARROWBAND IOT
NB-IoT is one of the most prominent cellular technologies for accommodating the growing number of wireless devices connected to the Internet.
The system bandwidth for NB-IoT is only 180 kHz in both downlink and uplink. The uplink supports both single-tone and multi-tone transmission. Multi-tone transmission uses SC-FDMA with a subcarrier spacing of 15 kHz, and single-tone transmission uses a subcarrier spacing of either 15 kHz or 3.75 kHz [6]. The following description and implementation are limited to single-tone transmission with a 3.75 kHz subcarrier spacing, which yields a total of 48 subcarriers (180 kHz / 3.75 kHz), and this project focuses only on the uplink.
Among the physical layer channels specified for NB-IoT are the Narrowband Physical Uplink Shared CHannel (NPUSCH), the Narrowband Physical Downlink Shared CHannel (NPDSCH) and the NPRACH. Particularly interesting for low-complexity IoT devices is the NPRACH, which is used to request uplink allocation from the base station. Establishing a connection using random access is a four-step procedure: first, the user initiates a connection by transmitting a random access preamble; the base station then transmits a response with allocated radio resources; the user transmits its identity; and finally the base station transmits a contention resolution message to resolve potential collisions between users using the same preamble [13, 14].

2.1 PREAMBLE DESIGN

The initial preamble sent by a user should provide enough information to the base station that the start of a frame can be precisely determined (ToA estimation) and any CFO can be accounted for to improve symbol demodulation. The ToA estimate is sent by the base station to the user to achieve uplink synchronization in the OFDMA system [6].

Figure 2.1: Overview of NPRACH preamble and packet structure

The preamble format and packet structure are illustrated in Figure 2.1. The preamble is divided into symbol groups, where each group consists of a Cyclic Prefix (CP) and ε identical symbols. The value of ε depends on the preamble format. The preamble format is chosen by the user based on downlink power measurements, which it uses to estimate its coverage area [14]. The most common preamble formats, 0 and 1, both have ε = 5 and a symbol time $T_{SYM}$ = 266.7 µs. The CP duration is $T_{CP}$ = 66.7 µs for format 0 and $T_{CP}$ = 266.7 µs for format 1 [15]. The CP is designed to be long enough to cover the maximum round-trip delay in order to suppress inter-symbol interference. One interpretation of allowing adaptive CP selection is therefore that users at distances of 0 to 8 km use the short CP and users at 8 to 35 km use the long CP [6].
The full preamble consists of a basic unit of 4 symbol groups, which is repeated $n = 2^J$, J = 0, ..., 7, times, for a full preamble length of $L = 4 \times 2^J$ symbol groups. The repetitions occur within an uplink slot, and the number of repetitions is decided by the upper Medium Access Control (MAC) layer depending on the estimated link quality [15]. For simplicity, this project sets J = 2 for all transmissions, i.e., the four symbol groups are repeated 4 times (L = 16).
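As a worked check of these durations (using the values quoted above and assuming a propagation speed of c = 3 × 10^8 m/s), one symbol group, the J = 2 preamble, and the distance covered by each CP work out as follows:

$$
\begin{aligned}
T_{SG}^{\text{long}} &= T_{CP} + \varepsilon T_{SYM} = 266.7\ \mu\text{s} + 5 \times 266.7\ \mu\text{s} \approx 1.6\ \text{ms}\\
T_{SG}^{\text{short}} &= 66.7\ \mu\text{s} + 5 \times 266.7\ \mu\text{s} \approx 1.4\ \text{ms}\\
T_{\text{preamble}} &= 4 \cdot 2^{J}\, T_{SG}^{\text{long}} = 16 \times 1.6\ \text{ms} = 25.6\ \text{ms} \qquad (J = 2)\\
d_{\max} &= \frac{c\, T_{CP}}{2} \approx \frac{3 \times 10^{8}\ \text{m/s} \times 266.7\ \mu\text{s}}{2} \approx 40\ \text{km} \qquad (\approx 10\ \text{km for } T_{CP} = 66.7\ \mu\text{s})
\end{aligned}
$$

The CP therefore absorbs round-trip delays for users up to about 10 km and 40 km away, respectively, so the 8 km and 35 km limits quoted above stay within these bounds.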
The user chooses a contiguous set of K = 12, 24, 36 or 48 subcarriers with 3.75 kHz
spacing out of the available 48 subcarriers. This project focuses on preamble frame
structure type 1 where K = 12.
At the start of the NPRACH preamble transmission, the subcarrier of the first symbol group is chosen at random. After each symbol group, the subcarrier changes according to a deterministic channel hopping sequence, so a preamble of L symbol groups contains L − 1 subcarrier hops. Since the hopping pattern is deterministic, users choosing the same initial subcarrier collide for the entire NPRACH preamble sequence. The number of orthogonal preamble sequences is therefore the number of allocated NPRACH subcarriers, K [7].
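The channel hopping scheme for frame structure type 1 and preamble format 0 is defined as follows [15]:

$$
\tilde{n}_{sc}^{RA}(i) =
\begin{cases}
\left(\tilde{n}_{sc}^{RA}(0) + f(i/4)\right) \bmod 12, & i \bmod 4 = 0 \text{ and } i > 0\\
\tilde{n}_{sc}^{RA}(i-1) + 1, & i \bmod 4 \in \{1, 3\} \text{ and } \tilde{n}_{sc}^{RA}(i-1) \bmod 2 = 0\\
\tilde{n}_{sc}^{RA}(i-1) - 1, & i \bmod 4 \in \{1, 3\} \text{ and } \tilde{n}_{sc}^{RA}(i-1) \bmod 2 = 1\\
\tilde{n}_{sc}^{RA}(i-1) + 6, & i \bmod 4 = 2 \text{ and } \tilde{n}_{sc}^{RA}(i-1) < 6\\
\tilde{n}_{sc}^{RA}(i-1) - 6, & i \bmod 4 = 2 \text{ and } \tilde{n}_{sc}^{RA}(i-1) \geq 6
\end{cases}
$$

$$
f(t) = \left( f(t-1) + \left( \sum_{n=10t+1}^{10t+9} c(n)\, 2^{\,n-(10t+1)} \right) \bmod 11 + 1 \right) \bmod 12, \qquad f(-1) = 0
$$

where $\tilde{n}_{sc}^{RA}(i)$ is the frequency location of the ith symbol group. In the following, the symbol group index is denoted m, and the function that maps symbol groups to a frequency channel is denoted Ω(m). The function c(n) is a pseudo-random sequence generator initialised with the base station ID. This means that the hopping pattern is deterministic within a cell, but the subcarrier of every 4th symbol group appears random to neighbouring cells. The above specification means there are two “levels” of hopping, as also illustrated in Figures 2.1 and 2.2. The hopping distance is 1 between symbol groups m = 0 and m = 1, and between m = 2 and m = 3. The hopping distance is always 6 between m = 1 and m = 2.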
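The two-level hopping described above can be sketched in Python. This is an illustration under stated assumptions, not the 3GPP reference implementation: the cell-specific offsets f(t) are drawn from a seeded NumPy generator instead of being derived from c(n), and the names `cell_offsets` and `hopping_pattern` are hypothetical. The usage lines also show why users with distinct initial subcarriers never collide.

```python
import numpy as np

N_SC = 12  # subcarriers in one NPRACH resource (K = 12)


def cell_offsets(num_groups: int, seed: int) -> np.ndarray:
    """Hypothetical stand-in for the cell-specific offsets f(t).

    The standard derives f(t) from the pseudo-random sequence c(n) seeded with
    the cell ID. Here a seeded NumPy generator is used instead; only the
    structure is kept: cumulative steps in 1..11, taken modulo 12, and shared
    by all users in the cell.
    """
    rng = np.random.default_rng(seed)
    steps = rng.integers(1, N_SC, size=num_groups // 4 + 1)  # each step in 1..11
    return np.cumsum(steps) % N_SC


def hopping_pattern(n_init: int, offsets: np.ndarray, num_groups: int) -> np.ndarray:
    """Subcarrier index (0..11) of every symbol group for one user."""
    idx = np.empty(num_groups, dtype=int)
    idx[0] = n_init % N_SC
    for i in range(1, num_groups):
        prev = idx[i - 1]
        if i % 4 == 0:  # pseudo-random jump, identical for all users in the cell
            idx[i] = (idx[0] + offsets[i // 4]) % N_SC
        elif i % 4 in (1, 3):  # +/-1 hop inside a pair of subcarriers
            idx[i] = prev + 1 if prev % 2 == 0 else prev - 1
        else:  # i % 4 == 2: +/-6 hop to the other half of the 12 subcarriers
            idx[i] = prev + 6 if prev < 6 else prev - 6
    return idx


# Usage: all 12 possible initial subcarriers in one cell, L = 16 symbol groups.
offsets = cell_offsets(16, seed=1)
patterns = np.array([hopping_pattern(k, offsets, 16) for k in range(N_SC)])
# Each column (symbol group) is a permutation of 0..11, so users with different
# initial subcarriers never share a subcarrier: K = 12 orthogonal preambles.
print(all(np.array_equal(np.sort(col), np.arange(N_SC)) for col in patterns.T))  # True
```

Every hopping step is a one-to-one map of the 12 subcarriers onto themselves (pair swap, half swap, or a common cyclic shift), which is why orthogonality is preserved for the whole preamble.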
Figure 2.2: Illustration of active channels and users in the scenario where 4 users all choose different initial subcarriers. Color indicates user number, where no color indicates an inactive channel slot. The horizontal axis is the symbol index n, where 5 consecutive symbols correspond to a symbol group. The vertical axis is the subcarrier index k.

The channel hopping procedure aids in the estimation of ToA and also reduces inter-
and intra-cell interference [6]. The ToA should be estimated by the base station for
successful uplink signal decoding and it further enables device positioning. Error in the
ToA estimation results in the user not being able to receive the response sent by the base
station. ToA estimation therefore has a great impact on performance in NB-IoT [13].

2.2 BASEBAND MODEL OF NARROWBAND PHYSICAL RANDOM ACCESS CHANNEL

At the receiver, the phase of each symbol depends on the ToA, denoted τ, the CFO, denoted ∆f, and the frequency Ω(m) of the user's chosen channel with respect to the receiver's uplink carrier. The time-domain signal at the receiver can be expressed as:

$$ s(t) = e^{-j2\pi \hat{f}(t)\,(t - \tau(t))} \tag{2.1}$$

Where: $\hat{f}(t) = \Omega(m) + \Delta f(t)$ is the received frequency including CFO [Hz]
$\tau(t)$ is the time-varying ToA, representing a non-stationary transmission [s]

In the baseband formulation, each symbol is represented as a single complex number, and Equation 2.1 is therefore discretized. The total number of symbols in the preamble (counting the CP as a symbol) is $N_{sym} = 4(1 + \varepsilon) \cdot 2^J = 24 \cdot 4 = 96$. For the sake of simplicity, the CFO and ToA are assumed to be constant for each user. The typical FFT length in LTE is 512 [9], but for simplicity this model lets each sample, n, correspond to a symbol. In this model, the contents of the CP are interpreted as a symbol, and therefore no distinction is made between the CP and the ε = 5 repeated symbols in a symbol group.
The nth time-domain sample represents the nth symbol, and the signal from the kth user is given by:

$$ s_k[n] = h_k\, e^{-j2\pi (f_n + \Delta f_k)(nT_{sym} - \tau_k)} \tag{2.2}$$

Where: h is the complex channel coefficient, assumed to be constant throughout a transmission [·]
$f_n$ is the subcarrier frequency, which is a function of the symbol group number m: $f_n = \Omega(m)$ [Hz]
∆f is the CFO, assumed to be constant throughout a transmission [Hz]
$T_{sym}$ is the symbol time [s]
τ is a constant ToA representing a stationary user [s]

The received signal at the base station is a superposition of signals from multiple users, given by

$$ y[n] = \sum_{k=0}^{K-1} a_k s_k[n] + w[n], \tag{2.3}$$

where K is the maximum number of concurrent users, $a_k \in \{0, 1\}$ indicates whether the kth user is active or not, and $w[n] \sim \mathcal{CN}(0, 1/\rho_n)$ denotes the additive noise with a per-symbol Signal-to-Noise Ratio (SNR) of $\rho_n$.
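A minimal NumPy sketch of Equations 2.2 and 2.3 follows. It assumes $T_{sym} = 1/3.75$ kHz (≈ 266.7 µs) so that $f_n T_{sym}$ is an integer, Ω(m) given as a subcarrier index that is scaled by the 3.75 kHz spacing to obtain $f_n$, and a hypothetical simplified hopping pattern (`simple_pattern`) that keeps the ±1/±6 hops but omits the cell-specific pseudo-random jump. Distances are drawn from the PDF of Section 2.2.1 by inverse-CDF sampling. All function names are illustrative.

```python
import numpy as np

SC_SPACING = 3.75e3          # subcarrier spacing [Hz]
T_SYM = 1.0 / SC_SPACING     # symbol time [s], ~266.7 us
SAMPLES_PER_GROUP = 6        # CP counted as a symbol + epsilon = 5 symbols
NUM_GROUPS = 16              # L = 4 * 2**J with J = 2, so N_sym = 96


def simple_pattern(n_init: int) -> list:
    """Hypothetical hopping pattern: the +/-1, +/-6, +/-1 hops of Section 2.1,
    repeated without the cell-specific pseudo-random jump (for brevity)."""
    a = n_init
    b = a + 1 if a % 2 == 0 else a - 1
    c = b + 6 if b < 6 else b - 6
    d = c + 1 if c % 2 == 0 else c - 1
    return [a, b, c, d] * (NUM_GROUPS // 4)


def user_signal(subcarriers, h, delta_f, tau):
    """s_k[n] = h_k exp(-j 2 pi (f_n + delta_f)(n T_sym - tau)), Eq. (2.2)."""
    n = np.arange(NUM_GROUPS * SAMPLES_PER_GROUP)
    f_n = np.repeat(np.asarray(subcarriers), SAMPLES_PER_GROUP) * SC_SPACING
    return h * np.exp(-2j * np.pi * (f_n + delta_f) * (n * T_SYM - tau))


def received_signal(patterns, h, delta_f, tau, active, snr_db, rng):
    """y[n] = sum_k a_k s_k[n] + w[n] with w[n] ~ CN(0, 1/rho), Eq. (2.3)."""
    y = np.zeros(NUM_GROUPS * SAMPLES_PER_GROUP, dtype=complex)
    for k, pattern in enumerate(patterns):
        if active[k]:
            y += user_signal(pattern, h[k], delta_f[k], tau[k])
    rho = 10.0 ** (snr_db / 10.0)
    # Real and imaginary parts each get variance 1/(2 rho): total 1/rho.
    w = (rng.standard_normal(y.shape) + 1j * rng.standard_normal(y.shape)) / np.sqrt(2 * rho)
    return y + w


# Usage: K = 4 potential users with distinct initial subcarriers, p = 0.5.
rng = np.random.default_rng(0)
K = 4
patterns = [simple_pattern(k) for k in (0, 3, 5, 8)]
h = (rng.standard_normal(K) + 1j * rng.standard_normal(K)) / np.sqrt(2)  # CN(0, 1)
delta_f = rng.uniform(-20.0, 20.0, K)                                  # CFO [Hz]
d = np.sqrt(8e3**2 + rng.random(K) * (35e3**2 - 8e3**2))               # Eq. (2.4)
tau = d / 3e8                                                          # ToA [s]
active = rng.random(K) < 0.5                                           # a_k
y = received_signal(patterns, h, delta_f, tau, active, snr_db=10.0, rng=rng)
```

Because every user keeps its own constant h, ∆f and τ, the mixture y is exactly the superposition assumed by the estimators in Section 2.4 and by the DL estimator.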

2.2.1 time of arrival


Only the signal for the long CP is modeled, which corresponds to distances between the user and the base station from a minimum of r = 8 km to a maximum of R = 35 km [6]. The users are assumed to be uniformly distributed in the coverage area of the base station, as illustrated in Figure 2.3.

Figure 2.3: Geometry of users distribution.

The distance d from the base station to a user has the following Probability Density Function (PDF) [16]:

$$ f_D(d) = \frac{2d}{R^2 - r^2}, \quad r \le d \le R, \tag{2.4}$$

which is used to model the ToA $\tau = d/c$, where c is the propagation speed.
The mean and the variance of the distance can then be found as:

$$ E[D] = \int_r^R d \cdot f_D(d)\, \mathrm{d}d = \frac{2(r^2 + r \cdot R + R^2)}{3(r + R)} \approx 24.326\ \text{km} \tag{2.5}$$

$$ \mathrm{Var}(D) = \int_r^R d^2 \cdot f_D(d)\, \mathrm{d}d - E[D]^2 = \frac{(r - R)^2 (r^2 + 4r \cdot R + R^2)}{18(r + R)^2} \approx 52.766\ \text{km}^2 \tag{2.6}$$

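According to Equation 2.5, the mean value of τ becomes

$$ E[\tau] = \frac{24.326\ \text{km}}{3 \times 10^8\ \text{m/s}} \approx 81.1\ \mu\text{s} \tag{2.7}$$

The variance of the ToA is:

$$ \mathrm{Var}(\tau) = \mathrm{Var}(D) \cdot \left(\frac{1}{c}\right)^2 \approx 5.86 \times 10^{-10}\ \text{s}^2 \approx 586\ \mu\text{s}^2 \tag{2.8}$$

which corresponds to a standard deviation of approximately 24.2 µs.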

The distribution of the ToA represents a realistic cellular system but is only valid
for stationary transmitters. A more complete model will include the dynamics of ToA
originating from channel sampling time offset and delay spread [6] as well.
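The moments in Equations 2.5 to 2.8 can be checked numerically. The sketch below (hypothetical function names) draws distances by inverse-CDF sampling: since $F_D(d) = (d^2 - r^2)/(R^2 - r^2)$, a uniform u gives $d = \sqrt{r^2 + u(R^2 - r^2)}$.

```python
import numpy as np

C = 3e8                     # propagation speed [m/s]
R_MIN, R_MAX = 8e3, 35e3    # inner and outer radius r and R [m]


def sample_distance(size, rng):
    """Draw distances from f_D(d) = 2d / (R^2 - r^2) by inverse-CDF sampling.

    F_D(d) = (d^2 - r^2) / (R^2 - r^2), hence d = sqrt(r^2 + u (R^2 - r^2))
    with u ~ U(0, 1).
    """
    u = rng.random(size)
    return np.sqrt(R_MIN**2 + u * (R_MAX**2 - R_MIN**2))


def distance_moments(r=R_MIN, R=R_MAX):
    """Closed-form E[D] and Var(D), Eqs. (2.5) and (2.6)."""
    mean = 2 * (r**2 + r * R + R**2) / (3 * (r + R))
    var = (r - R) ** 2 * (r**2 + 4 * r * R + R**2) / (18 * (r + R) ** 2)
    return mean, var


rng = np.random.default_rng(0)
tau = sample_distance(1_000_000, rng) / C           # ToA samples [s]
mean_d, var_d = distance_moments()
print(mean_d / 1e3, var_d / 1e6)                    # ~24.326 km, ~52.766 km^2
print(tau.mean() * 1e6, tau.std() * 1e6)            # ~81.1 us, ~24.2 us
```

The Monte Carlo estimates agree with the closed forms, which also confirms that the ToA variance is on the order of 10⁻¹⁰ s² (a standard deviation of roughly 24 µs).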

2.2.2 carrier frequency offset


The CFO in Equation 2.2 is drawn uniformly at random between −20 and +20 Hz and does not account for frequency drift [9]. Its mean and variance follow directly from the uniform distribution: E[∆f] = 0 Hz and Var(∆f) = 40²/12 ≈ 133.3 Hz². The means and the variances of τ and ∆f are used for feature scaling of the variables for the neural network, as described in Section 3.5.1.
The CFO should be estimated already in the downlink cell search procedure and this
model should reflect the residual CFO that is due to imperfect estimation or oscillator
drift. Again this model does not include the dynamic aspect of frequency drift and
motion (Doppler) but rather considers it as a constant offset.
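As a sketch of how these moments could be used for feature scaling, the snippet below assumes plain z-score standardization of the targets; the exact scaling of Section 3.5.1 may differ, and the names `scale_targets` and `unscale_targets` are hypothetical. The ToA moments come from Equations 2.5 to 2.8 and the CFO moments from the uniform distribution on [−20, 20] Hz.

```python
import numpy as np

# Analytic moments (Sections 2.2.1 and 2.2.2) used for scaling.
C = 3e8                                  # propagation speed [m/s]
R_MIN, R_MAX = 8e3, 35e3                 # r and R [m]
TAU_MEAN = 2 * (R_MIN**2 + R_MIN * R_MAX + R_MAX**2) / (3 * (R_MIN + R_MAX)) / C
TAU_VAR = ((R_MIN - R_MAX) ** 2 * (R_MIN**2 + 4 * R_MIN * R_MAX + R_MAX**2)
           / (18 * (R_MIN + R_MAX) ** 2)) / C**2
CFO_LO, CFO_HI = -20.0, 20.0             # uniform CFO support [Hz]
CFO_MEAN = (CFO_LO + CFO_HI) / 2         # 0 Hz
CFO_VAR = (CFO_HI - CFO_LO) ** 2 / 12    # ~133.3 Hz^2


def scale_targets(tau, delta_f):
    """Standardize ToA [s] and CFO [Hz] to zero mean and unit variance."""
    z_tau = (np.asarray(tau) - TAU_MEAN) / np.sqrt(TAU_VAR)
    z_cfo = (np.asarray(delta_f) - CFO_MEAN) / np.sqrt(CFO_VAR)
    return z_tau, z_cfo


def unscale_targets(z_tau, z_cfo):
    """Map standardized network outputs back to seconds and hertz."""
    tau = np.asarray(z_tau) * np.sqrt(TAU_VAR) + TAU_MEAN
    delta_f = np.asarray(z_cfo) * np.sqrt(CFO_VAR) + CFO_MEAN
    return tau, delta_f


print(scale_targets(81.1e-6, 20.0))   # ToA near its mean -> ~0; 20 Hz -> ~1.73
```

Standardizing puts the microsecond-scale ToA and the hertz-scale CFO on comparable numeric ranges, so neither target dominates a joint regression loss.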

2.2.3 channel gain


The channel coefficient h of the signal model in Equation 2.2 is a complex-valued constant that accounts for small-scale fading: $h \sim \mathcal{CN}(0, 1)$. This means that the average received signal power is normalized to one. The narrowband channel is modeled as a slowly varying, single-tap Rayleigh fading channel and is therefore represented by a single coefficient [9]. Large-scale fading is not included in the model, since users already know the downlink SNR and adjust their transmit power accordingly using power control.

2.2.4 number of active users


The activity indicator $a_k$ is modeled as a Bernoulli random variable with transmission probability p, and $a_1, \ldots, a_K$ are i.i.d. The number of concurrently active users is

$$ N_a = \sum_{k=1}^{K} a_k \sim \mathcal{B}(K, p), \tag{2.9}$$

where $\mathcal{B}$ is the binomial distribution. The case with K = 4 and p = 0.5 is considered throughout this project. The probability of exactly k users colliding is then:

$$ \Pr(k) = \binom{K}{k} p^k (1 - p)^{K-k} = \frac{4!}{k!\,(4 - k)!}\, p^k (1 - p)^{4-k}. \tag{2.10}$$
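Evaluating Equation 2.10 for K = 4 and p = 0.5 gives the class balance of the user-count problem. A short sketch using only the Python standard library (the function name `collision_pmf` is hypothetical):

```python
from math import comb


def collision_pmf(K: int = 4, p: float = 0.5) -> list:
    """Pr(N_a = k) for k = 0, ..., K, Eq. (2.10)."""
    return [comb(K, k) * p**k * (1 - p) ** (K - k) for k in range(K + 1)]


pmf = collision_pmf()
print(pmf)            # [0.0625, 0.25, 0.375, 0.25, 0.0625]
print(sum(pmf[2:]))   # 0.6875: probability that two or more users are active
```

With these parameters, more than two thirds of the preambles contain a collision, and two simultaneous users is the most likely single outcome.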

2.3 CONNECTION ESTABLISHMENT IN NB-IOT

When a user is activated, it is considered to be in idle mode and has to select a base station through which to access the network. In idle mode, the user acquires time and frequency synchronization using the Narrowband Primary Synchronization Signal (NPSS), a known sequence transmitted in the 6th of every 10 subframes, i.e., once per radio frame. The user can achieve slot-time synchronization and estimate the CFO by correlating the known sequence with the received signal [17]. Joint time and frequency estimation is costly for low-complexity IoT devices and is therefore often implemented as a two-step procedure: first, the timing offset is estimated from the first received NPSS in the presence of CFO; second, the CFO is estimated from subsequently received NPSSs using the estimated timing offset [17].

Figure 2.4: Random access procedure of NB-IoT [17]. Messages exchanged between user and BS over time: NPSS, NSSS, NPBCH, NPRACH (Msg1), NPDSCH (Msg2), NPUSCH (Msg3), NPDSCH (Msg4), NPUSCH HARQ Ack and NPUSCH (Msg5). This work focuses on NPRACH (Msg1).

Before a user can send the NPRACH it needs to be aware of the system configuration
acquired from the base station through the Narrowband Physical Broadcast CHannel
(NPBCH). The time-domain allocation of the NPRACH (Msg1) transmission is defined
by the number of repetitions, n, chosen by the user, and a specific period starting time.
The frequency allocation of the NPRACH transmission is the subset of 12 subcarriers
chosen by the user [17].
The random access procedure in NB-IoT is a five-message procedure, as illustrated in Figure 2.4. The first message is the NPRACH (described in Section 2.1).
When the base station successfully detects an NPRACH it will respond on NPDSCH
with Msg2 (Random Access Response). The RA Response window starts on the subframe
containing the end of the preamble repetition plus 4 subframes (for preamble format 0 or 1
and n < 64 repetitions) [15]. The response contains a Random Access Preamble identifier,
a Timing Advance (TA) parameter, and allocated radio resource for transmitting Msg3.
The Random Access Preamble identifier is a number calculated from the subframe index
and initial carrier frequency (Ω(0)) of the received NPRACH [8]. The TA is used to
time-align users’ signals at the base station to account for propagation delay. The TA is
an integer number between 0 and 1282 where each integer step corresponds to a time
correction of 16Ts [18]. Ts is a basic unit of time in LTE and is defined as Ts = 32.55 ns,
2.4. Traditional synchronization parameter estimation 11

resulting in timing corrections in increments of 0.52 µs.
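A small sketch of the TA quantization follows. It assumes that the base station has measured the round-trip timing offset, i.e., twice the one-way ToA τ of Section 2.2.1; the function name `timing_advance` is hypothetical.

```python
TS = 1.0 / (15_000 * 2048)    # LTE basic time unit [s], ~32.55 ns
TA_STEP = 16 * TS             # one TA increment [s], ~0.52 us
TA_MAX = 1282                 # largest TA value in the random access response


def timing_advance(round_trip_offset_s: float) -> int:
    """Quantize a measured round-trip timing offset [s] to a TA value.

    Assumption of this sketch: the base station measures the round-trip
    offset, i.e. twice the one-way ToA tau of Section 2.2.1.
    """
    ta = round(round_trip_offset_s / TA_STEP)
    return max(0, min(TA_MAX, ta))


# Usage: a user at the mean distance, tau ~ 81.1 us one way.
print(timing_advance(2 * 81.1e-6))                 # 311
print(TA_MAX * TA_STEP * 3e8 / 2 / 1e3)            # ~100 (km, largest correctable distance)
```

Each TA step of about 0.52 µs corresponds to roughly 78 m of one-way distance, so a ToA error of a few microseconds already shifts the TA by several steps.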


Msg3 is the Connection Resume Request in which a user will transmit using the
allocated radio resource in the NPUSCH. Msg3 contains the user identity, a scheduling
request and the user’s power and buffer status [17].
Two users may transmit the same NPRACH preamble in Msg1 and therefore receive the same allocation for Msg3, without either the users or the base station being aware that a packet collision occurred.
To resolve potential contentions, the user starts a “Contention Resolution Timer” in
which it expects a Msg4 response for the user identity. The user transmits incremental
parity bits using Hybrid Automatic Repeat Request (HARQ) until the timer reaches
zero or it receives the expected response [7].
In Msg4, the base station has resolved potential contentions and transmits a connection
setup with allocated radio resources or a connection resume message. Both Msg4 and
Msg3 are transmitted using HARQ [7].
In Msg5, the user responds with a connection setup complete or resume complete
message. The resume procedure is used to reduce the number of message exchanges
between user and base station by resuming configurations from a previously established
connection [17].

2.4 TRADITIONAL SYNCHRONIZATION PARAMETER ESTIMATION

From the received NPRACH (Msg1), the BS must detect the active users and perform synchronization parameter estimation.
The Phase-Difference (PD)-based method proposed in [9] utilizes the relationship between the phase trace of the received signal and the ToA and CFO. Phase differences between symbols in the received signal are averaged to estimate the CFO. The ToA is found by subtracting the phase contribution of the estimated CFO from the phase of the received signal and averaging the phase differences between symbol groups on different frequencies.
As a benchmark for detecting the number of users, an amplitude-based estimator is considered. The expected mean amplitude of the received signal for different numbers of colliding users is compared to the measured amplitude of the received signal. The closest match then yields an estimate of the number of colliding users present in the received signal.

2.4.1 estimating the number of active users


The number of active users $N_a$ in the received signal can be estimated by comparing the amplitude of y from Equation 2.3 to the expected amplitude for varying numbers of users. The amplitude of the received signal is the magnitude of a sum of $N_a$ complex random variables:

$$ A_{N_a} = \left| \sum_{k=1}^{N_a} h_k \right| \tag{2.11}$$

and therefore fluctuates considerably between realizations. The mean amplitude of the received signal is $E[|y|]$, and the mean amplitude of a signal with $N_a$ users is $E[A_{N_a}]$. The estimated number of active users is the closest matching value of $N_a$:

$$ \hat{N}_a = \arg\min_{N_a} \left| E[A_{N_a}] - E[|y|] \right|. \tag{2.12}$$

This estimator is only used as a benchmark for detecting the number of users $N_a$.
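A minimal sketch of this benchmark follows. It relies on a closed form that is not stated in the text but follows from the model: for a fixed n, each term $h_k e^{j\phi_k[n]}$ is again $\mathcal{CN}(0, 1)$, so the sum over $N_a$ independent users is $\mathcal{CN}(0, N_a)$, and its magnitude is Rayleigh distributed with mean $\sqrt{\pi N_a}/2$. The sample mean of $|y[n]|$ over the preamble stands in for $E[|y|]$. The optional `noise_var` argument (adding the noise variance $1/\rho_n$ to the reference) is an extension of this sketch rather than part of Equation 2.12, and the function names are hypothetical.

```python
import numpy as np


def expected_amplitude(n_a, noise_var=0.0):
    """Mean of |X| for X ~ CN(0, n_a + noise_var), i.e. sqrt(pi * var) / 2.

    noise_var = 0 gives the noise-free reference E[A_Na] of Eq. (2.11);
    a positive value is an optional extension of this sketch.
    """
    return np.sqrt(np.pi * (np.asarray(n_a) + noise_var)) / 2.0


def estimate_num_users(y, max_users=4, noise_var=0.0):
    """Amplitude-based benchmark of Eq. (2.12): closest expected amplitude."""
    mean_amp = np.mean(np.abs(y))                 # sample estimate of E[|y|]
    candidates = np.arange(max_users + 1)         # N_a = 0, ..., K
    ref = expected_amplitude(candidates, noise_var)
    return int(candidates[np.argmin(np.abs(ref - mean_amp))])


print(expected_amplitude(np.arange(5)))   # ~[0, 0.886, 1.253, 1.535, 1.772]
```

The reference amplitudes grow only with the square root of $N_a$ while a single realization of the channel coefficients spreads widely around them, which is why this estimator serves as a weak baseline.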

2.4.2 estimating carrier frequency offset


The synchronization parameters ∆f and τ can be determined by realizing that the phase
of the received signal is proportional to both CFO and ToA. The method presented in
[9] is a two-step procedure where first the CFO-induced phase is estimated and then
subtracted from the phase of the received signal to estimate the ToA-induced phase.
The phase trace of a noise-free received signal for user k can be expressed as [9]:

$$ \beta_k[n] = -2\pi\tau_k f_n - 2\pi\Delta f_k\, nT_{sym} + C \tag{2.13}$$

where C is a random constant phase offset. In practice, the phase trace of the received signal is not straightforward to obtain due to the 2π ambiguity, but unwrapping the complex argument of the received signal provides a good approximation [9]:

$$ \beta_k[n] = \mathrm{unwrap}\left(\arg(s_k[n])\right). \tag{2.14}$$

Phase differences between symbols with the same subcarrier frequency fn can be used
to estimate the phase contribution of the CFO. Each symbol group contains five consecutive
identical symbols, whose phase differences are averaged to reduce the noise variance.
The average of these estimates over all N symbol groups is then used as the estimate of
the CFO-induced phase:

$$\beta_{k,\Delta f} = \frac{1}{N} \sum_{n=0}^{N-1} \frac{1}{4} \sum_{i=1}^{4} \left( \beta_k[5n+i+1] - \beta_k[5n+i] \right). \tag{2.15}$$
The CFO estimate of the kth user is then simply

$$\widehat{\Delta f}_k = -\frac{\beta_{k,\Delta f}}{2\pi T_{\mathrm{sym}}}. \tag{2.16}$$

This estimate is only valid if the phase trace of the received signal contains the
contribution from a single user only.

2.4.3 estimating time of arrival


The estimated CFO-induced phase, which according to Equation 2.13 equals
−2π∆f̂k nTsym, is removed from the phase trace of the received signal:

$$\tilde{\beta}_k[n] = \beta_k[n] + 2\pi \widehat{\Delta f}_k \, n T_{\mathrm{sym}}. \tag{2.17}$$

Phase differences of this CFO-compensated phase trace between symbols on different
subcarriers fn are averaged to filter out noise and are used to estimate the ToA. The
phase difference of the same symbol across a frequency hop is proportional to the hopping
distance, so the ToA can be estimated more accurately using more channel hops.
This two-stage synchronization parameter estimation is computationally efficient, and
it decouples detection from estimation [9].
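A noise-free sketch of the two-stage estimator of [9] (Equations 2.13 to 2.17) is given below. It is an illustration under assumptions rather than the implementation used in this work: symbols are indexed from 0 (so the within-group differences of Equation 2.15 run over i = 0, ..., 3), Tsym = 1/3750 s (the 3.75 kHz NPRACH subcarrier spacing), the CFO phase is removed with the sign convention of Equation 2.13, and only hops of exactly one subcarrier are used for the ToA, which keeps the phase jump unambiguous for |τ| < 1/(2 · 3750 Hz) ≈ 133 µs. Larger hops give larger phase differences but wrap sooner, so they are skipped here for simplicity. All function names are hypothetical.

```python
import cmath
import math

T_SYM = 1 / 3750  # assumed symbol duration [s] (3.75 kHz subcarrier spacing)
GROUP_LEN = 5     # identical symbols per symbol group


def phase_trace(tau, cfo, freqs, c=0.3, t_sym=T_SYM):
    """Noise-free phase trace of Eq. (2.13), symbol index m = 0, 1, ..."""
    return [-2 * math.pi * tau * f - 2 * math.pi * cfo * m * t_sym + c
            for m, f in enumerate(freqs)]


def measured_phase_trace(s):
    """Eq. (2.14): unwrap(arg(s[m])), unwrapping consecutive samples."""
    out = [cmath.phase(s[0])]
    for v in s[1:]:
        d = cmath.phase(v) - out[-1]
        d -= 2 * math.pi * round(d / (2 * math.pi))  # remove 2*pi jumps
        out.append(out[-1] + d)
    return out


def wrap(x):
    """Map a phase difference to [-pi, pi)."""
    return (x + math.pi) % (2 * math.pi) - math.pi


def estimate_cfo(beta, t_sym=T_SYM):
    """Eqs. (2.15)-(2.16): mean phase step inside symbol groups -> CFO [Hz]."""
    n_groups = len(beta) // GROUP_LEN
    total = 0.0
    for n in range(n_groups):
        first = GROUP_LEN * n
        steps = [beta[first + i + 1] - beta[first + i] for i in range(GROUP_LEN - 1)]
        total += sum(steps) / len(steps)
    beta_cfo = total / n_groups
    return -beta_cfo / (2 * math.pi * t_sym)  # sign follows Eq. (2.13)


def estimate_toa(beta, freqs, cfo_hat, hop=3750.0, t_sym=T_SYM):
    """Remove the CFO phase (Eq. (2.17)), then average ToA over +-1-subcarrier hops."""
    comp = [phase + 2 * math.pi * cfo_hat * m * t_sym for m, phase in enumerate(beta)]
    estimates = []
    for n in range(len(beta) // GROUP_LEN - 1):
        a, b = GROUP_LEN * n, GROUP_LEN * (n + 1)  # first symbols of two groups
        df = freqs[b] - freqs[a]
        if abs(abs(df) - hop) > 1e-6:
            continue  # simplification: only single-subcarrier hops are used
        mean_diff = sum(comp[b + i] - comp[a + i] for i in range(GROUP_LEN)) / GROUP_LEN
        estimates.append(-wrap(mean_diff) / (2 * math.pi * df))
    return sum(estimates) / len(estimates)
```

For example, with the symbol-group subcarriers (3, 4, 10, 9, 3, 2), each repeated five times, a CFO of 120 Hz and a ToA of 50 µs, calling `estimate_cfo` and then `estimate_toa` on the measured phase trace recovers both values up to floating-point error. With noise, the averaging over symbols and groups is what reduces the variance.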

Another approach, which jointly performs user detection and parameter estimation, is
presented in [6]. This method correlates the received signal with preambles under
different time and frequency corrections to find the point of highest correlation. This
2-D grid search finds the ToA and CFO simultaneously, but it is not very computationally
efficient and does not specify a detection threshold.
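The grid search of [6] can be illustrated by brute-force correlation of the received signal with candidate preambles over a grid of (τ, ∆f) values. This minimal sketch is not the algorithm of [6]: the candidates follow the phase model of Equation 2.13 with Tsym = 1/3750 s, the magnitude of the correlation absorbs the unknown channel phase, the grids are arbitrary, and the names `candidate` and `grid_search` are hypothetical. The cost is proportional to the product of the two grid sizes times the sequence length, which is why this approach is computationally demanding; like the original, it also leaves the detection threshold open.

```python
import cmath
import math

T_SYM = 1 / 3750  # assumed symbol duration [s]


def candidate(tau, cfo, freqs, t_sym=T_SYM):
    """Unit-modulus preamble with the ToA/CFO phase of Eq. (2.13), with C = 0."""
    return [cmath.exp(1j * (-2 * math.pi * tau * f - 2 * math.pi * cfo * m * t_sym))
            for m, f in enumerate(freqs)]


def grid_search(y, freqs, taus, cfos):
    """Return (tau, cfo, |correlation|) of the best-matching grid point."""
    best = (-1.0, None, None)
    for tau in taus:
        for cfo in cfos:
            ref = candidate(tau, cfo, freqs)
            # |<y, ref>| is insensitive to the unknown complex channel gain h
            corr = abs(sum(v * r.conjugate() for v, r in zip(y, ref)))
            if corr > best[0]:
                best = (corr, tau, cfo)
    corr, tau, cfo = best
    return tau, cfo, corr
```

A finer grid improves resolution but multiplies the number of correlations, so in practice the grid spacing trades accuracy against complexity.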
3 DEEP LEARNING ESTIMATOR

3.1 DEEP LEARNING BASICS

This section briefly explains the basics of deep learning as used in the estimator described
in the following sections.

3.1.1 convolutional neural networks


Convolutional Neural Networks (CNNs) are a specialized type of neural network which
employs one or more convolution operations. The convolution is defined as:

$$s[n] = \sum_{a=-\infty}^{\infty} x[a]\, w[n-a] \tag{3.1}$$

where x is the discrete input signal and w is called the kernel. A CNN typically has
multiple cascaded convolutions with different kernels and the training procedure attempts
to find the kernels which minimize the loss function.
CNNs are equivariant to translation, meaning that when the input is shifted in time, the
output of the convolution is shifted in time by the same amount. For this reason, a CNN
can only learn to extract features which are a function of local interactions and are
equivariant to translation. A typical layer in a CNN consists of three operations:
convolution over the input signal, a non-linear activation function (typically the
Rectified Linear Unit (ReLU) function), and a pooling operation. The pooling operation
replaces the output with a lower-dimensional statistical summary of the response over a
small range, such as the average within a neighborhood. This makes the output
representation approximately invariant to small time shifts. The most popular pooling
operation is max-pooling [19]. CNNs are used extensively in image processing and speech
recognition, where they are useful for 1-D or 2-D data in which local features affect a
particular output. The CNN structure reduces the computational complexity and storage
requirements compared to a fully connected feed-forward neural network [20].
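The three operations of a typical CNN layer can be sketched in plain Python as below. This is illustrative only, with hypothetical function names: `convolve` implements Equation 3.1 for finite-length sequences (zero outside their support), and the pooling window of 2 is an arbitrary choice. Deep learning frameworks such as PyTorch (`torch.nn.Conv1d`) actually compute a cross-correlation, i.e., the kernel is not flipped; since the kernel is learned, this makes no practical difference.

```python
def convolve(x, w):
    """Eq. (3.1) for finite sequences: s[n] = sum_a x[a] * w[n - a] ("full" length)."""
    s = [0.0] * (len(x) + len(w) - 1)
    for a, xa in enumerate(x):
        for k, wk in enumerate(w):
            s[a + k] += xa * wk  # term for n = a + k, where w[n - a] = w[k]
    return s


def relu(v):
    """Rectified Linear Unit applied element-wise."""
    return [max(0.0, e) for e in v]


def max_pool(v, size=2):
    """Non-overlapping max-pooling; a trailing partial window is dropped."""
    return [max(v[i:i + size]) for i in range(0, len(v) - size + 1, size)]


def cnn_layer(x, w, pool=2):
    """Convolution -> ReLU -> max-pooling, as in a typical CNN layer."""
    return max_pool(relu(convolve(x, w)), pool)
```

Prepending a zero to x shifts the output of `convolve` by one sample, which is the translation equivariance described above; the pooling step then summarizes each neighborhood so that small shifts change the layer output only slightly.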

3.1.2 loss function


The goal of training a network is to minimize the average loss over training pairs. Each
training pair (xn, tn) consists of an input x and a target t, and all training pairs are
assumed i.i.d. The neural network should be an estimator t̂(x) which uses the input to
predict the target. The general loss function is written ℓ(t, t̂(x)); typical examples are
the squared error ℓ(t, t̂(x)) = (t − t̂(x))² and the cross-entropy for categorical models [12].

3.1.3 learning
When machine learning is used to minimize ℓ, the problem is not solved analytically but is
instead addressed by Stochastic Gradient Descent (SGD). The model is defined as a set of
probability distributions parameterized by a vector θ: p(t|x, θ). The learning task is to
obtain a parameter vector θ for which this probability distribution accurately describes
the data.
Using a maximum likelihood formulation, this can be written as maximizing the
log-likelihood [12]:

$$\max_{\theta} \ \ln p(\mathcal{D}|\theta) \tag{3.2}$$


where D is the set of training pairs.


Each iteration of SGD uses a subset of the training pairs, called a batch, to update θ.
The parameter vector θ is updated in the direction of the gradient of the log-likelihood,
i.e., opposite to the gradient of the loss function [12]:

$$\theta_{\text{new}} \leftarrow \theta_{\text{old}} + \gamma \nabla_{\theta} \ln p(t_n|x_n, \theta)\big|_{\theta=\theta_{\text{old}}} \tag{3.3}$$

where γ is the learning rate and ∇θ is the nabla operator denoting the gradient of
ln p(tn|xn, θ) with respect to θ, averaged over the training batch. For a multilayered
network, the gradient is computed with the backpropagation algorithm [12].
Increasing the number of SGD iterations typically decreases the loss, and iterations are
repeated until sufficient performance is achieved or the loss function has converged.
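To make Equations 3.2 and 3.3 concrete, the sketch below fits a single-parameter model with minibatch SGD. It is a toy example under assumed conditions, not the estimator of this work: the model is p(t|x, θ) = N(t; θx, 1), for which ∇θ ln p(t|x, θ) = (t − θx)x, so ascending the log-likelihood is the same as descending the squared-error loss. The learning rate, batch size, iteration count and the name `sgd_fit` are arbitrary, hypothetical choices.

```python
import random


def sgd_fit(data, gamma=0.1, batch_size=32, iterations=500, seed=0):
    """Maximize sum_n ln N(t_n; theta * x_n, 1) with minibatch SGD, Eq. (3.3)."""
    rng = random.Random(seed)
    theta = 0.0
    for _ in range(iterations):
        batch = rng.sample(data, batch_size)
        # gradient of ln p(t|x, theta) w.r.t. theta, averaged over the batch
        grad = sum((t - theta * x) * x for x, t in batch) / batch_size
        theta = theta + gamma * grad  # step along the log-likelihood gradient
    return theta


if __name__ == "__main__":
    rng = random.Random(1)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(1000)]
    data = [(x, 2.5 * x + rng.gauss(0.0, 0.1)) for x in xs]
    print(round(sgd_fit(data), 2))  # should be close to the true slope 2.5
```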

3.1.4 hyper-parameters
The learning rate γ is an example of a hyper-parameter, which typically must be defined
prior to training the network. Other hyper-parameters include the number of layers in the
model, the type of layers, the number of weights in each layer, and many more.
Hyper-parameters must be fine-tuned to obtain the desired performance, which is a
time-consuming task. The hyper-parameters define the architecture and capacity of the
network as well as how the network is trained. A network with too little capacity will
underfit, while too much capacity increases the risk of over-fitting. To avoid
over-fitting, regularization is typically applied, which adds a penalty to the loss
function that discourages a priori unlikely values of θ, e.g., large weights. To test
whether the model is over-fitting to the training data, the performance is often
evaluated on a separate validation set.
In this project, the data is generated by a simulation, which corresponds to an infinitely
large set of available training data D. For each iteration of the SGD process, a new
batch is generated by the simulation, which means that no training data is used more
than once. This reduces over-fitting and eliminates the need for separate training and
validation datasets.

3.2 ESTIMATION PROCEDURE

The goal of the estimator is to use the discrete signal y[n] to estimate the activity
indicator a, ToA τ, CFO ∆f, and channel coefficient h of each user. Since the activity
indicator of each user is a random variable, the total number of active users in the
received signal is unknown. This boils down to the notoriously challenging problem of
source separation with an unknown number of users [21]. Deep learning has significantly
improved the field of source separation, and the general idea of using deep learning is to
capture non-linear relationships between inputs and corresponding targets that are often
difficult to model with analytically tractable expressions [21]. In this project, estimating
the unknown parameters is dealt with by splitting the problem into:

• Classification of the number of active users; and
• Estimation of ToAs, CFOs and channel coefficients given the number of users.

The two separate tasks are combined such that the synchronization parameters are
accurately estimated for each detected user.

3.3 ESTIMATION OF THE NUMBER OF USERS

Finding the number of active users, Na, is formulated as a classification problem where
p = OneHot(Na) is a categorical random variable encoded as a one-hot vector specifying
Na. With a one-hot encoding, the true target p = [p0, p1, ..., pK] has entry one at index
Na and zero entries everywhere else. This differs from a typical way of representing
active users, where users are ordered in a vector and each index indicates the activity of
a unique user; the number of users Na can then be estimated as the l0 norm of that
sparse vector. In this collision scenario, users transmit using the same spreading
sequence and are not uniquely distinguishable. For this reason, only the information on
the number of active users is represented in p.
Cross-entropy loss is typically used in classification problems [22], and [23] suggests that
the cross-entropy loss in classification problems leads to faster convergence and better
generalization compared to the Mean Squared Error (MSE). For non-binary classification,
the softmax cross-entropy loss (or negative log-likelihood) is typically used:

$$\ell_{\mathrm{NLL}}(\mathbf{p}, \mathbf{q}) = -\sum_{k=0}^{K} p_k \ln q_k, \tag{3.4}$$

where q is given by the continuous, differentiable softmax function:

$$q_k = \frac{\exp(\pi_k)}{\sum_i \exp(\pi_i)}, \tag{3.5}$$

where [π0, π1, ..., πK] are the outputs from the last layer of the neural network and
[q0, q1, ..., qK] represent the a posteriori class probabilities. A hard class prediction
can then be found as arg max_i πi.
The arg max itself is not differentiable, and thus the softmax approximation of the
arg max is used instead [24]. The softmax function is commonly used together with the
negative log-likelihood loss, where minimizing the loss is equivalent to maximum likelihood
estimation, and the logarithm in the loss function undoes the exponential in the softmax,
which keeps the loss from saturating [25].
The loss function for estimating the number of users, where k is the true number of
active users, is written:

$$\ell_p(k, \boldsymbol{\pi}) = -\ln\left( \frac{\exp(\pi_k)}{\sum_i \exp(\pi_i)} \right), \tag{3.6}$$

and the objective of the training procedure is to minimize ℓp.
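Equations 3.4 to 3.6 can be implemented in a numerically stable way by subtracting the largest output before exponentiating (the log-sum-exp trick), which leaves the softmax unchanged. The sketch below is illustrative, with hypothetical function names and 5 classes (0 to 4 users) assumed as the default. In PyTorch, `torch.nn.CrossEntropyLoss` combines the log-softmax and the negative log-likelihood in the same way and takes the raw outputs π together with the class index k.

```python
import math


def one_hot(n_active: int, num_classes: int = 5):
    """Target p = OneHot(N_a): 1 at index N_a, 0 elsewhere (0..4 users -> 5 classes)."""
    return [1.0 if k == n_active else 0.0 for k in range(num_classes)]


def log_sum_exp(pi):
    m = max(pi)  # shift by the maximum so that exp() cannot overflow
    return m + math.log(sum(math.exp(v - m) for v in pi))


def softmax(pi):
    """Eq. (3.5): a posteriori class probabilities q."""
    lse = log_sum_exp(pi)
    return [math.exp(v - lse) for v in pi]


def nll_loss(p, pi):
    """Eq. (3.4) with ln q_k = pi_k - log_sum_exp(pi)."""
    lse = log_sum_exp(pi)
    return -sum(pk * (v - lse) for pk, v in zip(p, pi))


def loss_p(k, pi):
    """Eq. (3.6): the same loss written with the true class index k."""
    return log_sum_exp(pi) - pi[k]


def predict(pi):
    """Hard decision arg max_i pi_i."""
    return max(range(len(pi)), key=lambda i: pi[i])
```

For a one-hot target, `nll_loss(one_hot(k), pi)` and `loss_p(k, pi)` coincide, which is why Equation 3.6 is simply Equation 3.4 evaluated at the true class.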

3.4 PARAMETER ESTIMATION

The parameters to be estimated are collected in a vector

$$\mathbf{x}_k = \left[\tau_k,\ \Delta f_k,\ \Re[h_k],\ \Im[h_k]\right]^T. \tag{3.7}$$

Representing the complex-valued channel coefficient h in Cartesian coordinates (i.e., real
and imaginary parts) was found to give superior performance to a phasor representation
(i.e., amplitude and phase), as seen in Figure 4.5. For K users, the respective vectors are
collected in a matrix

$$\mathbf{X} = [\mathbf{x}_0, \mathbf{x}_1, \ldots, \mathbf{x}_{K-1}]. \tag{3.8}$$



The neural network seeks an estimate X̂ such that E‖X − X̂‖²₂ is minimized, which is
equivalent to a Minimum Mean-Square Error (MMSE) estimator.
The above formulation is sufficient to derive an estimation procedure. However, X
consists of multiple parameters which have values on different scales. When using a
practical optimization algorithm to find an estimate, any scaling difference between the
parameters will affect the impact each value has on the gradient descent step.
To circumvent possible issues arising from these scale differences across parameters, we
minimize the reconstruction error instead. The actual received signal without additive
noise, s, can be reconstructed from the parameters in matrix X using Equation 2.2. The
reconstruction is conveniently represented using a function f(·) such that

$$\mathbf{s} = f(\mathbf{X}). \tag{3.9}$$

For each estimate X̂, the equivalent noise-free signal ŝ is reconstructed and compared
to the actual noise-free received signal s. The noise-free signal is known during the
training procedure and is used so that the output of the neural network does not need to
account for the distribution of the noise. The data fidelity (i.e., reconstruction loss) is
quantified using the MSE metric such that

$$\ell_r(\mathbf{X}, \hat{\mathbf{X}}) = \mathbb{E}\left\| f(\mathbf{X}) - f(\hat{\mathbf{X}}) \right\|_2^2 = \mathbb{E}\left\| \mathbf{s} - \hat{\mathbf{s}} \right\|_2^2. \tag{3.10}$$

The number of concurrent users in each sample is known during training so when
reconstructing the signal ŝ, the contributions from the correct number of users are taken
into account when calculating the reconstruction loss `r for each sample.
The loss function which the neural network seeks to minimize is simply the sum
of Equation 3.6 and Equation 3.10:

$$\text{loss} = \ell_p(k, \boldsymbol{\pi}) + \ell_r(\mathbf{X}, \hat{\mathbf{X}}). \tag{3.11}$$
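A minimal sketch of the reconstruction loss in Equations 3.9 and 3.10 for a single realization follows. Equation 2.2 is not reproduced in this section, so `reconstruct` is a hypothetical stand-in: each user contributes a unit-modulus preamble with the ToA/CFO phase of Equation 2.13, scaled by its channel coefficient h = ℜ[h] + jℑ[h], with Tsym = 1/3750 s assumed. Because the users' contributions are simply summed, the loss does not depend on the order of the users in X̂, which is why the estimated parameters come out arbitrarily ordered.

```python
import cmath
import math

T_SYM = 1 / 3750  # assumed symbol duration [s]


def reconstruct(X, freqs, t_sym=T_SYM):
    """Hypothetical stand-in for s = f(X), Eq. (3.9).

    X is a list of user vectors x_k = (tau, cfo, re_h, im_h), Eq. (3.7).
    """
    s = [0j] * len(freqs)
    for tau, cfo, re_h, im_h in X:
        h = complex(re_h, im_h)
        for m, f in enumerate(freqs):
            s[m] += h * cmath.exp(-2j * math.pi * (tau * f + cfo * m * t_sym))
    return s


def reconstruction_loss(X, X_hat, freqs):
    """Squared l2 error ||f(X) - f(X_hat)||^2 of Eq. (3.10) for one realization."""
    s, s_hat = reconstruct(X, freqs), reconstruct(X_hat, freqs)
    return sum(abs(a - b) ** 2 for a, b in zip(s, s_hat))
```

Averaging `reconstruction_loss` over a batch approximates the expectation in Equation 3.10; during training, only the contributions of the known number of active users enter the sum.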

3.5 NETWORK IMPLEMENTATION

An overview of the neural network that estimates both the number of users and the
synchronization parameters is illustrated in Figure 3.1.
The outputs of the network are the flattened estimate X̂ and the output vector π of the
classification branch. For 4 users there are 4 · 4 = 16 parameters in X and 5 possible
classes for the number of users (including the zero-user case). The input to the network
is processed so as to extract common features that are subsequently used for multi-task
learning, that is, to detect the number of users and to estimate their parameters. The
first layer performs a 1-dimensional convolution over the input signal. Since the number
of users, ToA, CFO and channel coefficient are all assumed to be constant throughout a
transmission, a convolution layer is chosen so as to extract translation-invariant features
of the input time-domain signal.
Following a typical CNN structure, batch normalization, non-linear activation and
max-pooling are employed. The convolution layers, activations and pooling layers are
repeated to form a deep neural network. The features found by the convolution layers
are reshaped to a single vector, which is then used as input to two individual feedforward
neural networks. One of the networks performs classification and detects the number of
users based on the output of the feature extraction layers. The other network performs
regression with the goal of yielding parameters for which the reconstructed signal is as
close as possible to the received signal in the MSE sense. Each feedforward network has
two fully connected layers followed by the ReLU activation and a linear output layer. The
network and automatic differentiation are implemented using the PyTorch framework
[26] and trained using multiple Graphics Processing Units (GPUs).

Figure 3.1: Overview of DNN architecture for detecting and estimating synchronization
parameters of up to 4 colliding users. (Diagram: input 2@96x1, followed by two stages of
1-D convolution, batch-norm, ReLU and max-pool with feature maps 200@91x1, 200@45x1,
100@43x1 and 100@21x1; flatten to 1x2100; fully connected ReLU layers (sizes 1x1000 and
1x200 appear); outputs of size 1x16 (parameters) and 1x5 (number of users).)
Another, similar network architecture is shown in Figure 3.2, where each sub-network can
be trained either individually or jointly. After comparing the performance of different
network architectures, the network in Figure 3.1 was selected as performing adequately.
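The feature-map sizes in Figure 3.1 (2@96x1, 200@91x1, 200@45x1, 100@43x1, 100@21x1, flattened to 1x2100) can be checked with the usual output-length formulas. The kernel sizes below (6 and 3) are inferred from those sizes under the assumption of stride 1, no padding and non-overlapping max-pooling of width 2; they are not stated in the text. The same formulas underlie PyTorch's `torch.nn.Conv1d` and `torch.nn.MaxPool1d` (whose stride defaults to the kernel size). The helper names are hypothetical.

```python
def conv1d_out(length, kernel, stride=1, padding=0):
    """Output length of a 1-D convolution (dilation 1)."""
    return (length + 2 * padding - kernel) // stride + 1


def maxpool1d_out(length, window=2):
    """Output length of non-overlapping max-pooling (stride = window)."""
    return (length - window) // window + 1


def feature_lengths(n_samples=96, kernels=(6, 3), window=2):
    """Track the sequence length through the conv/pool stages of Figure 3.1."""
    lengths = [n_samples]
    for k in kernels:
        lengths.append(conv1d_out(lengths[-1], k))
        lengths.append(maxpool1d_out(lengths[-1], window))
    return lengths


lengths = feature_lengths()                       # [96, 91, 45, 43, 21]
flattened = 100 * lengths[-1]                     # 100 channels x 21 -> 2100
max_users, params_per_user = 4, 4                 # x_k = (tau, cfo, Re h, Im h)
regression_outputs = max_users * params_per_user  # 16
classification_outputs = max_users + 1            # 0..4 users -> 5 classes
```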
Figure 3.2: Block diagram of network architecture for joint activity detection, ToA and
CFO estimation. (Diagram: the received mixed signal is fed to a ToA + CFO estimator
network and an active user detection network, each producing an Mx2 output: per-user
ToA and CFO, and per-user Re(A) and Im(A); the example outputs show NaN and 0 entries in
unused rows.)

3.5.1 scaling
In general, a neural network converges faster if the inputs to all layers have zero mean
and unit variance across training examples, in the case when all examples are of equal
importance [27]. The weights w of a network are updated according to the error δ as
w ← w + δx. If x has non-zero mean, the updates of all the entries in w will be biased
in a particular direction, resulting in inefficient training. Unit variance ensures that
the learning rate stays consistent across examples and that all input examples are made
equally important [27].
The input y to the network and each parameter in the output X are scaled to have zero
mean and unit variance. In the simulation, ToA, CFO and channel coefficient are all drawn
according to the distributions given in the system model, and Na is drawn according to
Pr(k) for each sample as described in section 2.2. The mean and variance of each
parameter (CFO, ToA and h) are known in advance and are used to normalize the
parameters to zero mean and unit variance. The standardized ToA is given by

$$\tau' = \frac{\tau - \mathbb{E}[\tau]}{\sqrt{\mathrm{Var}(\tau)}}. \tag{3.12}$$

The CFO, ∆f, is scaled similarly. No normalization is necessary for the channel
coefficients since h ∼ CN(0, 1), and thus no scaling is necessary for the signal y.
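Equation 3.12 and its inverse, which maps a standardized network output back to physical units, can be sketched as follows. The ToA distribution of the system model (section 2.2) is not repeated in this section, so a uniform ToA on [0, τmax] with a hypothetical τmax = 66.7 µs is assumed purely for illustration; its mean and variance are τmax/2 and τmax²/12. The function names are likewise hypothetical.

```python
import math
import random

TAU_MAX = 66.7e-6  # hypothetical upper bound of a uniform ToA distribution [s]


def uniform_moments(low, high):
    """Mean and variance of U(low, high)."""
    return (low + high) / 2, (high - low) ** 2 / 12


def standardize(value, mean, var):
    """Eq. (3.12): zero mean, unit variance."""
    return (value - mean) / math.sqrt(var)


def destandardize(value, mean, var):
    """Inverse of Eq. (3.12), applied to the network output."""
    return value * math.sqrt(var) + mean


if __name__ == "__main__":
    mean, var = uniform_moments(0.0, TAU_MAX)
    rng = random.Random(0)
    z = [standardize(rng.uniform(0.0, TAU_MAX), mean, var) for _ in range(20000)]
    # sample mean and second moment should be close to 0 and 1
    print(sum(z) / len(z), sum(v * v for v in z) / len(z))
```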
4 RESULTS
The neural network is trained using samples generated with up to K = 4 concurrent
users and at an SNR of 10 dB. New batches are generated for every step in the training
procedure, and for this reason over-fitting is suppressed, as no training instance is ever
used twice. The learning rate was set to 0.0001 through empirical trials, which provides
steady convergence of the loss function. Each batch consists of 50,000 realizations of
y from Equation 2.3. The batch size is chosen as large as the available Video Random
Access Memory (VRAM) on the GPUs permits. Intuitively, the batch size is linked to the
accuracy of each gradient descent step, since a larger batch gives a less noisy gradient
estimate, and it is therefore chosen as large as possible.
The stochastic optimization method Adam (adaptive moment estimation) [28] is
used due to its effectiveness and popularity in recent deep learning research. A total of
20,000 different batches are used in training. In general, the number of batches is
increased until the loss converges or sufficient results are achieved.
In Figures 4.1 and 4.2, the estimation of collision multiplicity is shown for the proposed
classification method compared to the simple amplitude-based method described in
section 2.4.1. As colliding signals add non-coherently, the amplitude of the signal is
not a good indicator of collision multiplicity. Using the proposed estimator, 1 and 2 users
are correctly identified with 98.1 % and 92.9 % accuracy at an SNR of 10 dB, and the
estimation accuracy decreases with the number of concurrent users. The amplitude-based
method correctly identifies 1 and 2 users with 44.8 % and 30.2 % accuracy, but is better
at estimating 0 and 4 users. The proposed method often misclassifies a signal containing
4 colliding users as resulting from transmissions of 3 users. Further, the classification
accuracy depends on the SNR used during training; training and testing at the same
SNR provides better results.
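The accuracies quoted above follow directly from the confusion-matrix counts in Figure 4.1: the per-class accuracy is the diagonal entry divided by its row sum, and the total accuracy is the trace divided by the number of test signals. The counts below are transcribed from Figure 4.1 (rows: true number of users 0 to 4; columns: predicted number of users); the names are hypothetical.

```python
# Counts transcribed from Figure 4.1 (rows: true users 0..4, columns: predicted 0..4).
PROPOSED = [
    [629, 1, 0, 0, 0],
    [19, 2405, 27, 0, 0],
    [0, 128, 3523, 140, 1],
    [0, 9, 593, 1831, 78],
    [0, 0, 50, 514, 52],
]
AMPLITUDE_BASED = [
    [60, 0, 0, 0, 0],
    [64, 115, 45, 19, 14],
    [17, 115, 112, 47, 80],
    [2, 48, 53, 45, 110],
    [0, 7, 13, 7, 27],
]


def per_class_accuracy(cm):
    """Fraction of signals with i true users that are classified as i users."""
    return [row[i] / sum(row) for i, row in enumerate(cm)]


def total_accuracy(cm):
    """Trace of the confusion matrix divided by the number of test signals."""
    return sum(cm[i][i] for i in range(len(cm))) / sum(sum(row) for row in cm)
```

For example, `total_accuracy(PROPOSED)` evaluates to 8440/10000 = 0.844 and `total_accuracy(AMPLITUDE_BASED)` to 359/1000 = 0.359, matching the 84.4 % and 35.9 % in Figure 4.1.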

Confusion matrices (rows: true number of users; columns: predicted number of users;
Accuracy: per-row accuracy; Precision: per-column precision)

True\Pred        0        1        2        3        4    Total   Accuracy
0              629        1        0        0        0      630     99.84%
1               19     2405       27        0        0     2451     98.12%
2                0      128     3523      140        1     3792     92.91%
3                0        9      593     1831       78     2511     72.92%
4                0        0       50      514       52      616      8.44%
Total          648     2543     4193     2485      131    10000
Precision   97.07%   94.57%   84.02%   73.68%   39.69%              84.40%

(a) Proposed method: 84.4% total accuracy at 10 dB SNR.

True\Pred        0        1        2        3        4    Total   Accuracy
0               60        0        0        0        0       60    100.00%
1               64      115       45       19       14      257     44.75%
2               17      115      112       47       80      371     30.19%
3                2       48       53       45      110      258     17.44%
4                0        7       13        7       27       54     50.00%
Total          143      285      223      118      231     1000
Precision   41.96%   40.35%   50.22%   38.14%   11.69%              35.90%

(b) Conventional power-detection method: 35.9% total accuracy at 10 dB SNR.

Figure 4.1: Confusion matrices for estimating the multiplicity of collisions comparing the
neural network classifier with a simple classifier based on the amplitude of the received
signal.

Figure 4.2: Accuracy of estimating the number of colliding users. A signal is deemed
correctly detected if the number of users is estimated correctly. The NN estimator is
trained for signals with 10 dB SNR. (Plot: probability of detection versus SNR [dB] from
4 to 18 dB for the NN estimator and the amplitude-based estimator with 1, 2, 3 and 4
users.)

The accuracy of the parameter estimate X̂ is found by comparing it with the target
X. Since the loss function defined in Equation 3.10 only depends on the reconstruction
error, the estimated parameters in X̂ are arbitrarily ordered across users. To compare
the output with the target X, the users are ordered according to their estimated
amplitudes. In cases where the estimated amplitudes are similar, the ordering may be
wrong, which leads to an artificially high error when evaluating performance for multiple
users. Any of the parameters can be used to dictate the ordering, but the amplitude is
chosen as it is linked to the SNR at the receiver in practice.
The Root-Mean-Square Error (RMSE) of each parameter in X̂ is calculated as:

$$\mathrm{RMSE}_k = \sqrt{\mathbb{E}\left[\|e_k\|_2^2\right]}, \tag{4.1}$$

where, e.g., the estimation error of τ is ek = τk − τ̂k. The RMSE of the proposed neural
network-based estimator is the average of all RMSEs up to user k:

$$\mathrm{RMSE}_{\mathrm{NN},k} = \frac{1}{k} \sum_{i=1}^{k} \mathrm{RMSE}_i. \tag{4.2}$$

The performance of the neural network estimator is to be compared with the con-
ventional estimator described in section 2.4. The conventional estimator is only able
to estimate a single set of parameters, regardless of the actual number of users k. The
error of the conventional estimator is therefore taken as the smallest error over all true
parameter sets in X; e.g., the ToA error is

$$e_{\tau,\mathrm{PD}} = \min_k |\tau_k - \hat{\tau}_{\mathrm{PD}}|. \tag{4.3}$$

This gives the conventional estimator an artificial advantage.
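The evaluation procedure of Equations 4.1 to 4.3 can be sketched as below. Assumptions: each user vector is (τ, ∆f, ℜ[h], ℑ[h]) as in Equation 3.7, the true users are sorted by their true amplitudes and the estimates by their estimated amplitudes, the sort is in decreasing order (the text does not state the direction), the expectation in Equation 4.1 is replaced by an average over trials, and the function names are hypothetical.

```python
import math


def order_by_amplitude(users):
    """Sort user vectors (tau, cfo, re_h, im_h) by decreasing |h|."""
    return sorted(users, key=lambda u: math.hypot(u[2], u[3]), reverse=True)


def per_user_rmse(trials, param=0):
    """Eq. (4.1) for one parameter (0 = ToA, 1 = CFO), per ordered user index.

    trials is a list of (X_true, X_hat) pairs with the same number of users.
    """
    n_users = len(trials[0][0])
    sq = [0.0] * n_users
    for X, X_hat in trials:
        ordered = zip(order_by_amplitude(X), order_by_amplitude(X_hat))
        for k, (x, x_hat) in enumerate(ordered):
            sq[k] += (x[param] - x_hat[param]) ** 2
    return [math.sqrt(v / len(trials)) for v in sq]


def rmse_nn(rmses, k):
    """Eq. (4.2): average of the per-user RMSEs up to user k."""
    return sum(rmses[:k]) / k


def conventional_error(true_values, estimate):
    """Eq. (4.3): error of a single estimate w.r.t. the closest true user."""
    return min(abs(t - estimate) for t in true_values)
```

If two estimated amplitudes are close, `order_by_amplitude` may pair an estimate with the wrong true user, which inflates `per_user_rmse` in exactly the way described above.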


The RMSEs of ToA and CFO estimation with a varying number of users are shown
in Figures 4.3 and 4.4. The neural network-based estimator shows lower estimation
error for both ToA and CFO than the PD-based estimator, even for a single user. For two
users, the proposed estimator is superior to the conventional estimator when estimating
ToA. At 10 dB, the proposed estimator has an RMSE of 2.88 µs and 3.44 Hz for a single
user, compared to 16.20 µs and 7.98 Hz for the conventional estimator. The relatively
high RMSE of the conventional estimator is likely due to noise, which causes wrong phase
unwrapping at low SNRs [9].

Figure 4.3: RMSE of ToA estimation across SNRs. (Plot: ToA estimation RMSE [µs] versus
SNR [dB] from 4 to 18 dB for the NN estimator and the PD-based estimator with 1, 2, 3
and 4 users.)

Figure 4.4: RMSE of CFO estimation across SNRs. (Plot: CFO estimation RMSE [Hz] versus
SNR [dB] from 4 to 18 dB for the NN estimator and the PD-based estimator with 1, 2, 3
and 4 users.)
To explore the accuracy of the conventional and the proposed estimators, the distributions
of the errors are plotted in Figures 4.6a and 4.6b. The model is trained with the number
of users varying from 0 to 4, but to explore the results, Cumulative Distribution Functions
(CDFs) are shown for each number of colliding users individually. There is a clear
advantage in using the NN estimator for 1 user, but the performance advantage is not
convincing for multiple users. This is believed to be attributable to the unfair advantage
of the conventional estimator, whose estimate is compared with the closest matching value
across all users.
ToA and CFO show similar distributions, and it is believed that better performance can be
achieved by adding capacity to the network and fine-tuning the hyper-parameters.
The convergence of the loss function during training of a network for single-user
transmissions is shown in Figure 4.7, along with the CDF of the estimation error. This
shows the excellent performance that can be achieved at high SNR.
The accuracy in estimating the channel coefficient h is shown in Figure 4.5. The
RMSE is 0.101 for the in-phase part and 0.103 for the quadrature part for a single user.

The RMSE shows a similar trend as in ToA and CFO estimation with deteriorating
performance as the number of concurrent users increases.

Figure 4.5: RMSE of channel coefficient estimation across SNRs. (Plot: channel coefficient
estimation RMSE versus SNR [dB] from 4 to 18 dB for the NN estimator with 1, 2, 3 and 4
users, and for the NN estimator using phasor representation with 1 user.)

Overall, the proposed method shows considerably improved performance compared to
the traditional estimator in scenarios with a single user as well as with multiple users.
Despite these numerical results, the performance is still far from the theoretically
achievable bound derived in Appendix A, which implies that there are still opportunities
for further enhancements.
(Plots: CDFs [%] of the ToA estimation error [µs] and the CFO estimation error [Hz],
conditioned on 1, 2, 3 and 4 colliding users, for the NN estimator and the mean adjacent
symbol (group) PD-based estimator. RMSEs reported in the plot legends:)

Users   NN ToA RMSE   PD-based ToA RMSE   NN CFO RMSE   PD-based CFO RMSE
1       2.88 µs       16.2 µs             3.44 Hz       7.98 Hz
2       13 µs         11.8 µs             15.7 Hz       8.42 Hz
3       19.9 µs       11.6 µs             24.3 Hz       8.68 Hz
4       24.6 µs       12.5 µs             30.9 Hz       9.06 Hz

(a) CDFs for ToA estimation conditioned on varying number of collisions.
(b) CDFs for CFO estimation conditioned on varying number of collisions.

Figure 4.6: Distribution of estimation errors for the conventional and proposed estimator.
Figure 4.7: Loss function and CDF of ToA and CFO estimations using the trained deep
learning estimator. (Plots: MSE loss versus number of training batches (0 to 20,000)
during training for 1 user, with batch size 25000, learning rate 0.0001, SNR 20 dB and
dropout rate 0; CDFs [%] of the ToA estimation error [µs] and the CFO estimation error
[Hz] for the NN estimator, the mean adjacent symbol (group) phase difference estimator,
and the target.)
5 AUTO-ENCODER
Current communication systems are optimized block by block for performance in relation
to a model [29]. A novel concept is to consider the communication system in an end-to-end
manner and to jointly optimize the blocks within the transmitter and receiver as well as
the transmitter and receiver structures themselves. In [29], an end-to-end auto-encoder is
proposed to derive modulation and coding schemes, which shows performance comparable to
state-of-the-art communication methods. The channel is treated as a layer in the neural
network, and the encoder and decoder are both neural networks. Auto-encoders applied to
the physical layer of wireless communication systems show great promise [30], and for this
reason they are considered here for preamble encoding and decoding.
The general auto-encoder attempts to reconstruct the given input based on a latent
representation called h. Traditionally, it is designed for dimensionality reduction and
feature learning, where h lies in a space of much lower dimensionality. In its simplest
form, the encoder h = f(x) produces the latent representation and the decoder reconstructs
the input from the latent representation, x̂ = g(f(x)). In the noise-free scenario, ideally
x̂ = x [19].
A modern application for auto-encoders is the Denoising Auto-Encoder (DAE) where
the decoder predicts the original input based on a corrupted sample [19]. In this project
a modified DAE is used to find a preamble sequence and decoding method jointly. The
computational graph for the auto-encoder structure is illustrated in Figure 5.1.

Figure 5.1: Structure of auto-encoder mapping the input x to latent variable y using the function f (encoder). The
corruption process is the simple Additive White Gaussian Noise (AWGN) channel producing y + w. The function g
(decoder) takes the corrupted latent variable to create a reconstruction of the input x̂. The learning process attempts
to minimize the scalar reconstruction error described by the loss L.

The DAE maps the input

$$\mathbf{x}_k = \left[\tau_k,\ \Delta f_k,\ \Re[h_k],\ \Im[h_k]\right]^T \tag{5.1}$$

to a transmitted preamble sequence y = f(x), where f(·) is a parametric function of the
frequency hopping pattern and the symbol sequence. The typical DAE applies the corruption
process to the input space x, but in this context the corruption process is applied to the
latent representation. Only the simple AWGN channel is considered, where the corruption
process is C(ỹ|y) = y + w with w ∼ CN(0, 1/ρ).
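The corruption process C(ỹ|y) = y + w with w ∼ CN(0, 1/ρ) can be sketched as below. Interpreting ρ as the linear SNR of a unit-power preamble (so that 10 dB corresponds to ρ = 10) is an assumption for illustration, as are the function names; each real dimension of w gets variance 1/(2ρ).

```python
import math
import random


def db_to_linear(snr_db):
    """Convert an SNR in dB to a linear power ratio."""
    return 10 ** (snr_db / 10)


def awgn_corrupt(y, rho, rng):
    """Return y + w with w ~ CN(0, 1/rho) drawn independently per sample."""
    sigma = math.sqrt(1 / (2 * rho))  # standard deviation per real dimension
    return [v + complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma)) for v in y]
```

Because w does not depend on the encoder parameters, the gradient of the loss passes through this corruption step unchanged, which is what allows the channel to sit as a layer between f(·) and g(·).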


The decoder is a deep neural network which takes the corrupted latent representation
ỹ as input to reconstruct the input x with minimum error. The reconstruction loss is
calculated by some function L(·), such as the l2 norm.

L(x, g(f (x) + w)) = L(x, g(ỹ)) = L(x, x̂) (5.2)

The training process attempts to minimize this loss by updating the weights in f (·) and
g(·).

5.1 ENCODING

The preamble sequence in NB-IoT uses a deterministic channel hopping pattern determined
by the initial subcarrier, and the QPSK symbol 1 + 0j is continuously repeated. The
encoding seeks to find a different channel hopping pattern and symbol sequence which
enables the decoder to provide a more accurate estimate of ToA, CFO and channel
coefficient.
The function f(·) is used similarly to Equation 3.9, and the transmitted sequence is:

$$\mathbf{y} = f(\mathbf{x}, \mathbf{f}, \mathbf{s}) = h\, e^{-j2\pi(\mathbf{f} + \Delta f)(nT_{\mathrm{sym}} - \tau)} \odot \mathbf{s} \tag{5.3}$$

where f is a vector containing the frequency hopping pattern [Hz], s is a vector containing
the symbol sequence, and ⊙ denotes component-wise multiplication.

The frequency pattern f and symbol sequence s are fixed sequences that are valid for all
x and are only adjusted during the training procedure to "discover" a superior preamble
sequence. Once a well-suited preamble sequence is found, it should remain fixed for all
future transmissions. f and s are implemented as learnable parameters that are updated
according to their gradients with a learning rate of γ = 1 × 10⁻⁴.
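Equation 5.3 can be evaluated for an NB-IoT-like initialization as sketched below. This is a simplified illustration under assumptions: Tsym = 1/3750 s, a subcarrier spacing of 3750 Hz, a hypothetical symbol-group subcarrier pattern expanded to five symbols per group, the repeated symbol s = 1 + 0j, and plain Python lists in place of learnable tensors (in PyTorch, f and s would typically be wrapped in `torch.nn.Parameter` so that the optimizer updates them). The function names are hypothetical.

```python
import cmath
import math

T_SYM = 1 / 3750  # assumed symbol duration [s]
SPACING = 3750.0  # assumed subcarrier spacing [Hz]


def nbiot_like_init(group_subcarriers, symbols_per_group=5):
    """Initial frequency pattern f [Hz] and symbol sequence s (all 1 + 0j)."""
    f = [SPACING * k for k in group_subcarriers for _ in range(symbols_per_group)]
    s = [1 + 0j] * len(f)
    return f, s


def encode(x, f, s, t_sym=T_SYM):
    """Eq. (5.3): y[n] = h * exp(-j 2 pi (f[n] + cfo)(n T_sym - tau)) * s[n]."""
    tau, cfo, re_h, im_h = x  # input vector of Eq. (5.1)
    h = complex(re_h, im_h)
    return [h * cmath.exp(-2j * math.pi * (fn + cfo) * (n * t_sym - tau)) * sn
            for n, (fn, sn) in enumerate(zip(f, s))]
```

With this initialization every output sample has modulus |h|; during training, f and s are the quantities the optimizer would move away from the standard pattern.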

5.2 DECODING

Decoding refers to detection and synchronization parameter estimation. In practice,
decoding happens at the BS, where it is unknown whether a transmission has occurred
or not. In this set-up, the encoder and decoder form a joint operation and are therefore
coordinated such that each transmission is always decoded.
The corrupted received signal ỹ is the input to the decoder function g(·)

g(ỹ) = x̂. (5.4)

The decoder is a neural network, and the same pre-trained network from section 3.5 can be used as initialization. However, to keep the demonstration of the concept simple, a plain feedforward neural network is used and the input signal contains only one user. Figure 5.2 shows how the frequency pattern develops during training. Figure 5.2a illustrates an example of the NB-IoT NPRACH frequency pattern, with which the encoding function is initialized. Figure 5.2b shows the frequency pattern found after 100 iterations, and Figure 5.2c the pattern after 100 000 training iterations. The frequency pattern does not converge towards an orderly sequence and is not restricted to the 12 frequency channels defined by the NB-IoT standard. This is a limitation of the implementation, which requires learnable parameters to be differentiable: here the parameters are continuous, and further development should address how to make the frequency and symbol patterns adhere to the pre-defined discrete subcarriers and symbols.
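One simple workaround, offered here as an assumption rather than the method of this work, is to snap the learned continuous parameters to the nearest allowed values after training. The sketch assumes a grid of 12 subcarriers spaced 3.75 kHz apart starting at 0 Hz (matching the axis of Figure 5.2a) and the QPSK alphabet {1, j, −1, −j}, which contains the baseline symbol 1 + 0j. Being non-differentiable, this projection cannot be used inside training itself.

```python
from typing import List, Sequence

# Assumed NPRACH grid: 3.75 kHz subcarrier spacing, hopping confined to 12
# subcarriers starting at 0 Hz.
SUBCARRIER_SPACING_HZ = 3750.0
N_SUBCARRIERS = 12
# Assumed QPSK alphabet containing the baseline symbol 1 + 0j.
QPSK_ALPHABET = (1 + 0j, 1j, -1 + 0j, -1j)


def project_frequencies(freqs: Sequence[float]) -> List[float]:
    """Snap each learned frequency to the nearest allowed subcarrier."""
    snapped = []
    for f in freqs:
        k = round(f / SUBCARRIER_SPACING_HZ)
        k = min(max(k, 0), N_SUBCARRIERS - 1)  # stay inside the 12 subcarriers
        snapped.append(k * SUBCARRIER_SPACING_HZ)
    return snapped


def project_symbols(symbols: Sequence[complex]) -> List[complex]:
    """Replace each learned symbol with the closest QPSK symbol (Euclidean distance)."""
    return [min(QPSK_ALPHABET, key=lambda q: abs(s - q)) for s in symbols]
```

Snapping after training may undo part of what the auto-encoder learned, so estimation performance would need to be re-evaluated on the projected sequence.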
Note that the current NPRACH preamble is designed to have a low Peak-to-Average Power Ratio (PAPR), lower than that of the Zadoff–Chu sequences used in LTE. Modifying the preamble sequence will therefore worsen the NPRACH PAPR, trading power efficiency for estimation performance. With the auto-encoder, the best estimation performance after 100 000 iterations is 4.38 µs (ToA) and 6.00 Hz (CFO) at an SNR of 10 dB. This is no improvement over the NN estimator using the NPRACH pattern, which is the performance target. Further development is needed, as the method is still believed to have the potential to achieve superior performance.
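To make the PAPR trade-off concrete, a minimal sketch computing PAPR = max|x[n]|² / mean|x[n]|² for two illustrative signals: a constant-envelope single tone, as within one NPRACH symbol, and the sum of two equal-amplitude tones. The signal length of 64 samples and the tone frequencies are arbitrary choices for the example.

```python
import cmath
import math
from typing import Sequence


def papr_db(x: Sequence[complex]) -> float:
    """Peak-to-Average Power Ratio of complex baseband samples, in dB."""
    powers = [abs(v) ** 2 for v in x]
    return 10.0 * math.log10(max(powers) / (sum(powers) / len(powers)))


N = 64
# Constant-envelope single tone versus the sum of two equal-amplitude tones.
single_tone = [cmath.exp(2j * math.pi * 1 * n / N) for n in range(N)]
two_tones = [cmath.exp(2j * math.pi * 1 * n / N) + cmath.exp(2j * math.pi * 5 * n / N)
             for n in range(N)]
```

The single tone gives approximately 0 dB, while the two-tone signal gives about 3 dB (peak power 4 against mean power 2), illustrating why departing from a single-tone structure tends to raise PAPR.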
[Figure 5.2: Development of frequency pattern throughout auto-encoder training. Each panel plots frequency [Hz] against symbol index.
(a) One example of NB-IoT frequency pattern
(b) Learned frequency pattern using auto-encoder initialized with NB-IoT frequency hopping pattern
(c) Learned frequency pattern using auto-encoder after 100 000 training iterations]


6 DISCUSSION AND CONCLUSION

6.1 FUTURE IMPROVEMENTS

In an actual implementation, much attention should be paid to ensuring that the data used to train the model is representative. The performance guarantees that machine learning can provide are only numerical and based on the available data, so either real-world data or realistic models of the dynamics should be included in the model.
The deep neural network model should be able to estimate ToA and CFO in real time. Using a Software-Defined Radio (SDR), wideband signals can be captured and processed in real time. AIR-T is an SDR specifically designed for deep learning deployment that combines an AD9371 transceiver with an FPGA for signal processing and a GPU for deep learning [31].

6.2 THE PERFORMANCE OF DL IN NB-IOT

The goal of this project is to find a method that increases system capacity, so the resulting increase in the number of supported users is investigated here. The number of concurrent users each cell supports is limited by the number of allocated channels and by the user traffic pattern. In NB-IoT there are 48 available NPRACH channels per cell. According to the Erlang B loss system, this gives an average traffic intensity of 13 erlangs at a required blocking probability of less than 1 % [32].
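The Erlang B loss formula [32] gives the blocking probability B for A erlangs offered to N channels and is commonly evaluated with the recursion B(0) = 1, B(n) = A·B(n−1) / (n + A·B(n−1)). A minimal sketch of that recursion, plus a bisection search for the largest offered traffic that meets a blocking target; the function names are illustrative.

```python
def erlang_b(traffic: float, channels: int) -> float:
    """Blocking probability for `traffic` erlangs offered to `channels` servers (Erlang B)."""
    b = 1.0
    for n in range(1, channels + 1):
        b = traffic * b / (n + traffic * b)
    return b


def max_offered_traffic(channels: int, target_blocking: float, tol: float = 1e-9) -> float:
    """Largest offered traffic (erlangs) whose Erlang B blocking does not exceed the target."""
    if channels < 1 or not 0.0 < target_blocking < 1.0:
        raise ValueError("need channels >= 1 and 0 < target_blocking < 1")
    lo, hi = 0.0, float(channels)
    while erlang_b(hi, channels) < target_blocking:  # widen until the target is bracketed
        hi *= 2.0
    while hi - lo > tol:                             # bisection: blocking grows with traffic
        mid = 0.5 * (lo + hi)
        if erlang_b(mid, channels) <= target_blocking:
            lo = mid
        else:
            hi = mid
    return lo
```

For example, `max_offered_traffic(48, 0.01)` returns the load that 48 channels can carry at 1 % blocking under this model.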
It is assumed that each user has an average holding time of 50 ms, corresponding to the periodicity of the normal NPRACH transmission. The traffic pattern is assumed to consist of independent users, each transmitting once every 10 seconds, which gives a channel usage per user of:

$$360\ \text{transmissions/h} \cdot 50\ \text{ms} = \frac{360 \cdot 0.05\ \text{s}}{3600\ \text{s}} = 5.0 \times 10^{-3}\ \text{erlangs} \tag{6.1}$$

Since the system can support 13 erlangs in total, the maximum number of users is:

$$\frac{13\ \text{erlangs}}{5.0 \times 10^{-3}\ \text{erlangs}} = 2600\ \text{users} \tag{6.2}$$

Using a simple model for user activity and accounting only for the normal NPRACH configuration, a total of 2600 users can be supported simultaneously if NPRACH transmissions are rejected in the case of collisions.
With the proposed estimation procedure, not only is synchronization more accurate, but simultaneous preamble transmissions can also be detected, which improves the access probability. Two-user transmissions are detected correctly 93 % of the time, which effectively increases the number of available random access preambles by 93 %. This increase in successfully transmitted preambles can be used to allocate fewer NPRACH resources, to reduce access delay (since collision-induced back-offs are reduced), or to increase the number of supported users to 5018 (2600 × 1.93). However, this project does not consider limitations of NPUSCH resources, which may prove to be a bottleneck for the RA process.
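The capacity arithmetic above, Equations 6.1 and 6.2 followed by the 93 % scaling, can be reproduced in a few lines. The helper name is illustrative; the 13-erlang total, the 10 s period, the 50 ms holding time and the 93 % detection rate are taken directly from the text.

```python
SECONDS_PER_HOUR = 3600.0


def erlangs_per_user(period_s: float, holding_time_s: float) -> float:
    """Eq. (6.1): offered traffic of one user transmitting every period_s seconds."""
    transmissions_per_hour = SECONDS_PER_HOUR / period_s
    return transmissions_per_hour * holding_time_s / SECONDS_PER_HOUR


per_user = erlangs_per_user(period_s=10.0, holding_time_s=0.050)  # 360/h * 50 ms
baseline_users = 13.0 / per_user                                  # Eq. (6.2)
# Scaling by the 93 % two-user detection rate, as done in the text.
users_with_detection = baseline_users * (1.0 + 0.93)

print(f"{per_user:.1e} erlangs/user, {baseline_users:.0f} users, {users_with_detection:.0f} users")
# 5.0e-03 erlangs/user, 2600 users, 5018 users
```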


The intuition behind being able to detect and decode identical simultaneous preamble
transmissions is to exploit diversity in user locations, specific channel conditions and
oscillator imperfections to realize multiplexing.
When the base station detects multiple identical preambles, the users cannot be separated by any ID, so the RAR (MSG2) cannot be addressed specifically to each user. One procedure for associating each response with its user is to have each user estimate its distance from the BS using the reference signal. The user then uses the estimated distance to approximate the range of its TA and selects the RAR with the closest matching TA.
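A minimal sketch of the RAR-selection step just described, assuming the TA carried in each RAR equals the round-trip propagation delay 2d/c and ignoring the quantization of real TA commands. The function names and the example TA values are hypothetical.

```python
from typing import Sequence

SPEED_OF_LIGHT_M_S = 299_792_458.0


def expected_timing_advance(distance_m: float) -> float:
    """Round-trip propagation delay 2d/c that a timing advance compensates."""
    return 2.0 * distance_m / SPEED_OF_LIGHT_M_S


def select_rar(distance_m: float, rar_timing_advances_s: Sequence[float]) -> int:
    """Index of the RAR whose TA is closest to the TA implied by the distance estimate."""
    ta = expected_timing_advance(distance_m)
    return min(range(len(rar_timing_advances_s)),
               key=lambda i: abs(rar_timing_advances_s[i] - ta))


# Hypothetical example: two colliding users, two RARs carrying different TAs.
rar_tas = [6.5e-6, 20.0e-6]
print(select_rar(1_000.0, rar_tas), select_rar(3_000.0, rar_tas))  # 0 1
```

The scheme only separates users whose distances differ enough that their TAs are distinguishable given the accuracy of each user's distance estimate.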

6.3 CONCLUSION

This project considers the problem of separating colliding NB-IoT users that choose the same random access preamble in the NPRACH scheme, and proposes a method to detect the number of colliding users and estimate their ToA, CFO and channel gains. Motivated by recent success in leveraging learning-based methods for problems related to physical layer communications [12], the proposed method builds upon the deep learning framework. In particular, the method jointly detects the number of active users and estimates their parameters, with the aim of improving the capacity of the critical random access phase: interfering signals are not discarded, so channel resources are used better, which in turn reduces back-off periods. In addition to handling a much richer class of scenarios, the proposed method outperforms [9] in its own scenario, where users transmit orthogonal preambles and do not collide.
The method is demonstrated in the NB-IoT NPRACH, where the number of orthogonal preambles is limited, showing that the proposed method is practical for IoT systems currently being deployed. The estimation error of a conventional approach in NB-IoT is compared to that of the proposed scheme. Traditional synchronization methods fail in the case of collisions with high Signal-to-Interference Ratio (SIR), whereas with the proposed algorithm users can be distinguished and their respective synchronization parameters can still be estimated with reasonable performance.
Deep learning is a promising tool for developing joint estimation procedures, which are notoriously difficult with traditional model-based methods, and it enables separation of synchronization parameters even when users transmit the same preamble. A deep learning building block, the denoising auto-encoder, is applied in a novel way to search for a superior alternative preamble sequence. The preamble sequence found reflects the distribution of the input data, but further work is required to achieve an increase in performance compared to the neural network estimator. Although deep learning-based estimation leads to sub-optimal estimators compared to an analytically derived joint estimator, it allows for practical, straightforward development and efficient computation.
BIBLIOGRAPHY

[1] P. Popovski, K. F. Trillingsgaard, O. Simeone, and G. Durisi, “5G wireless network slicing for eMBB, URLLC, and mMTC: A communication-theoretic view,” IEEE Access, vol. 6, pp. 55 765–55 779, 2018.
[2] M. Centenaro, L. Vangelista, A. Zanella, and M. Zorzi, “Long-range communications in unlicensed bands: the rising stars in the IoT and smart city scenarios,” IEEE Wireless Communications, vol. 23, no. 5, pp. 60–67, October 2016.
[3] A. Biral, M. Centenaro, A. Zanella, L. Vangelista, and M. Zorzi, “The challenges of M2M massive access in wireless cellular networks,” Digital Communications and Networks, vol. 1, no. 1, pp. 1–19, 2015.
[4] I. Catherine, R. Tardy, N. Aakvaag, B. Myhre, and R. Bahr, “Comparison of wireless techniques applied to environmental sensor monitoring,” SINTEF Digital, Trondheim Norway, Tech. Rep., 2017.
[5] A. Azari, P. Popovski, G. Miao, and C. Stefanovic, “Grant-free radio access for
short-packet communications over 5G networks,” in GLOBECOM 2017 - 2017
IEEE Global Communications Conference, Dec 2017, pp. 1–7.
[6] X. Lin, A. Adhikary, and Y. P. Eric Wang, “Random access preamble design and
detection for 3GPP narrowband IoT systems,” IEEE Wireless Communications
Letters, vol. 5, no. 6, pp. 640–643, 2016.
[7] L. Feltrin, G. Tsoukaneri, M. Condoluci, C. Buratti, T. Mahmoodi, M. Dohler,
and R. Verdone, “Narrowband IoT: A survey on downlink and uplink perspectives,”
IEEE Wireless Communications, vol. 26, no. 1, pp. 78–86, February 2019.
[8] 3GPP TS, 36.321 - Evolved Universal Terrestrial Radio Access (E-UTRA); Medium
Access Control (MAC) protocol specification, V. 15.4.0, 2018.
[9] J. Hwang, C. Li, and C. Ma, “Efficient detection and synchronization of superimposed NB-IoT NPRACH preambles,” IEEE Internet of Things Journal, vol. 6, no. 1, pp. 1173–1182, Feb 2019.
[10] A. Molisch, Wireless Communications, ser. Wiley - IEEE. Wiley, 2010.
[11] A. Goldsmith, Wireless Communications. Cambridge University Press, 2005.
[12] O. Simeone, “A Very Brief Introduction to Machine Learning With Applications to
Communication Systems,” arXiv e-prints, p. arXiv:1808.02342, Aug 2018.
[13] W. S. Jeon, S. B. Seo, and D. G. Jeong, “Effective frequency hopping pattern for
ToA Estimation in NB-IoT Random Access,” IEEE Transactions on Vehicular
Technology, vol. 67, no. 10, pp. 10 150–10 154, 2018.
[14] Y.-P. E. Wang, X. Lin, A. Adhikary, A. Grovlen, Y. Sui, Y. Blankenship, J. Bergman, and H. S. Razaghi, “A primer on 3GPP narrowband Internet of things,” IEEE Communications Magazine, vol. 55, no. 3, pp. 117–123, March 2017.


[15] 3GPP TS, 36.211 - Evolved Universal Terrestrial Radio Access (E-UTRA); Physical
channels and modulation, V. 15.4.0, 2018.

[16] T. Hien, Z. Wang, S. Kim, J. Nielsen, and P. Popovski, “Preamble detection in NB-IoT random access with limited-capacity backhaul,” in Proceedings of IEEE ICC’19, 2 2019.

[17] O. Liberg, M. Sundberg, E. Wang, J. Bergman, and J. Sachs, Cellular Internet of Things: Technologies, Standards, and Performance. Elsevier Science, 2017.

[18] 3GPP, “Evolved Universal Terrestrial Radio Access (E-UTRA); Physical layer
procedures,” 3rd Generation Partnership Project (3GPP), Technical Specification
(TS) 36.213, 03 2019, version 15.4.0.

[19] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. MIT Press, 2016.

[20] T. Diamandis, “Survey on Deep Learning Techniques for Wireless Communications,” Stanford.Edu, vol. 521, no. July, pp. 3–4, 2017.

[21] J. Chien, Source Separation and Machine Learning. Elsevier Science, 2018.

[22] C. Bishop, Pattern Recognition and Machine Learning, ser. Information Science
and Statistics. Springer New York, 2016.

[23] P. Y. Simard, D. Steinkraus, and J. C. Platt, “Best practices for convolutional neural networks applied to visual document analysis,” in Seventh International Conference on Document Analysis and Recognition, 2003. Proceedings., Aug 2003, pp. 958–963.

[24] E. Jang, S. Gu, and B. Poole, “Categorical reparameterization with Gumbel-softmax,” arXiv e-prints, p. arXiv:1611.01144, Nov 2016.

[25] J. Jagannath, N. Polosky, A. Jagannath, F. Restuccia, and T. Melodia, “Machine Learning for Wireless Communications in the Internet of Things: A Comprehensive Survey,” arXiv e-prints, p. arXiv:1901.07947, Jan 2019.

[26] A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer, “Automatic differentiation in PyTorch,” in NIPS-W, 2017.

[27] Y. LeCun, L. Bottou, G. B. Orr, and K.-R. Müller, “Efficient backprop,” in Neural
Networks: Tricks of the Trade. Springer-Verlag, 1998.

[28] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv
e-prints, p. arXiv:1412.6980, Dec 2014.

[29] T. O’Shea and J. Hoydis, “An Introduction to Deep Learning for the Physical
Layer,” IEEE Transactions on Cognitive Communications and Networking, vol. 3,
no. 4, pp. 563–575, 2017.

[30] V. Raj and S. Kalyani, “Backpropagating through the air: Deep learning at physical
layer without channel models,” IEEE Communications Letters, 2018.

[31] Deepwave Digital, Inc., “Deepwave digital,” Website, Mar. 2019. [Online]. Available: https://www.deepwavedigital.com/sdr

[32] R. Freeman, Fundamentals of Telecommunications, ser. Wiley Series in Telecommunications and Signal Processing. Wiley, 2005.

[33] A. van den Bos, Parameter Estimation for Scientists and Engineers. Wiley, 2007.

[34] A. N. D’Andrea, U. Mengali, and R. Reggiannini, “The modified Cramér–Rao bound and its application to synchronization problems,” IEEE Transactions on Communications, vol. 42, no. 2/3/4, pp. 1391–1399, February 1994.
Appendices

A CRAMÉR–RAO LOWER BOUND
The achievable precision of an unbiased estimator can be characterized by a lower bound on the variance of the estimate, called the Cramér–Rao lower bound (CRB) [33]. The CRB is used as a reference to compare the performance of the traditional estimator and the deep learning estimator of chapter 3.
The observed vector r depends on the synchronization parameters to be estimated: the carrier frequency offset ∆f, the ToA τ and the channel carrier phase θ. For practical reasons, the challenging case of jointly estimating τ, ∆f and θ is not considered; instead, the CRB is derived for an estimator of a single parameter, treating the other parameters as nuisance parameters. In the general case, a single element of {∆f, τ, θ} is denoted λ and is assumed deterministic, while the other elements are random variables collected in a vector u. The vector u is assumed to have a known PDF that does not depend on λ.
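The CRB is formulated as:

$$\mathrm{CRB}(\lambda) = \frac{1}{\mathbb{E}_{\mathbf{r}}\left[\left(\dfrac{\partial \ln p(\mathbf{r} \mid \lambda)}{\partial \lambda}\right)^{2}\right]} \tag{A.1}$$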
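Equation A.1 is often difficult to evaluate, so a different bound, the Modified Cramér–Rao Bound (MCRB), is considered instead [34]:

$$\mathrm{MCRB}(\lambda) = \frac{1}{\mathbb{E}_{\mathbf{r},\mathbf{u}}\left[\left(\dfrac{\partial \ln p(\mathbf{r} \mid \mathbf{u}, \lambda)}{\partial \lambda}\right)^{2}\right]} \tag{A.2}$$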
In general MCRB(λ) ≤ CRB(λ), meaning that the MCRB is a looser bound. However, it is still useful for most practical applications [34].
For the Gaussian channel it is much easier to derive the conditional probability p(r|u, λ) than to evaluate p(r|λ).
The received complex signal waveform with additive noise can be written as:

$$r(t) = s(t) + w(t) \tag{A.3}$$

where s(t) is the information signal as described in Equation 2.1 and w(t) is additive
noise distributed according to a complex normal distribution.
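p(r|u, λ) is replaced by the likelihood function Λ(λ, u), and after some manipulation Equation A.2 becomes [34]:

$$\mathrm{MCRB}(\lambda) = \frac{N_0}{\mathbb{E}_{\mathbf{u}}\left[\displaystyle\int_{T_0} \left|\frac{\partial s(t)}{\partial \lambda}\right|^{2} dt\right]} \tag{A.4}$$

where N0 is the noise variance.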


To go from the general case of estimating λ to the special case of estimating the CFO ∆f, set λ = ∆f and u = u_∆f = {τ, θ}. The integral in the denominator then evaluates to:


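$$\int_{T_0} \left|\frac{\partial s(t)}{\partial \Delta f}\right|^{2} dt = \int_{T_0} \left|\frac{\partial e^{-j2\pi(f+\Delta f)(t-\tau)}}{\partial \Delta f}\right|^{2} dt \tag{A.5}$$

$$= 4\pi^2 \int_{T_0} |t-\tau|^{2}\, dt \tag{A.6}$$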
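A quick numerical sanity check of the step from (A.5) to (A.6): the derivative of the exponential with respect to ∆f is −j2π(t − τ) times the exponential, so its squared magnitude is 4π²(t − τ)². The sketch compares a central finite difference against that closed form; normalized time units, the step size and the test points are assumptions chosen for illustration.

```python
import cmath
import math


def s_unit(delta_f: float, t: float, f: float = 0.0, tau: float = 0.0) -> complex:
    """Unit-amplitude signal from Eq. (A.5): exp(-j 2 pi (f + delta_f)(t - tau))."""
    return cmath.exp(-2j * math.pi * (f + delta_f) * (t - tau))


def dsq_numeric(t: float, tau: float, f: float = 0.0, delta_f: float = 0.0,
                eps: float = 1e-6) -> float:
    """|ds/d(delta_f)|^2 via a central finite difference."""
    d = (s_unit(delta_f + eps, t, f, tau) - s_unit(delta_f - eps, t, f, tau)) / (2.0 * eps)
    return abs(d) ** 2


def dsq_closed_form(t: float, tau: float) -> float:
    """Eq. (A.6) integrand: 4 pi^2 (t - tau)^2."""
    return 4.0 * math.pi ** 2 * (t - tau) ** 2
```

The two agree for any f and ∆f, confirming that the integrand depends only on t − τ.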

According to Equation A.4, the expectation should be taken over u_∆f, that is, over both θ and τ. However, Equation A.5 does not depend on θ, so the expectation is only taken over τ:

$$\mathbb{E}_{\tau}\left[\int_{T_0} 4\pi^2 (t-\tau)^2\, dt\right] = 4\pi^2\, \mathbb{E}_{\tau}\left[\int_{T_0} (t-\tau)^2\, dt\right] \tag{A.7}$$

$$= 4\pi^2\, \mathbb{E}_{\tau}\left[\int_{0}^{NT} (t-\tau)^2\, dt\right] \tag{A.8}$$

where NT is the length of T0. Assuming the simple case τ ∼ U(a, b), the expectation evaluates to:

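$$= 4\pi^2\, \mathbb{E}_{\tau}\left[\frac{(NT)^3}{3} - \tau (NT)^2 + \tau^2 (NT)\right] \tag{A.9}$$

and, using $\mathbb{E}[\tau] = (a+b)/2$ and $\mathbb{E}[\tau^2] = (a^2+ab+b^2)/3$ for the uniform distribution,

$$= \frac{2}{3}\pi^2 \left(2a^2(NT) + 2ab(NT) - 3a(NT)^2 + 2b^2(NT) - 3b(NT)^2 + 2(NT)^3\right) \tag{A.10}$$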

Setting the limits a = 0 and b = NT gives:

$$= \frac{2\pi^2 (NT)^3}{3} \tag{A.11}$$
Finally, to summarize, the denominator of the MCRB in Equation A.4 is:

$$\mathbb{E}_{\mathbf{u}_{\Delta f}}\left[\int_{T_0} \left|\frac{\partial s(t)}{\partial \Delta f}\right|^{2} dt\right] = \frac{2\pi^2 (NT)^3}{3} \tag{A.12}$$
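The closed form in (A.11)–(A.12) can be checked numerically. The sketch approximates E_τ[4π² ∫₀^L (t − τ)² dt] with τ ∼ U(0, L) using a midpoint rule on an m × m grid; the grid size m = 400 is an arbitrary choice, and for this quadratic integrand the midpoint estimate undershoots the exact value by a relative factor of exactly 1/m².

```python
import math


def expected_energy_numeric(length: float, m: int = 400) -> float:
    """Midpoint-rule estimate of E_tau[4 pi^2 * integral_0^L (t - tau)^2 dt], tau ~ U(0, L)."""
    h = length / m
    grid = [(i + 0.5) * h for i in range(m)]
    total = 0.0
    for tau in grid:                                    # average over uniform tau
        total += h * sum((t - tau) ** 2 for t in grid)  # integral over t
    return 4.0 * math.pi ** 2 * total / m


def expected_energy_closed_form(length: float) -> float:
    """Eq. (A.11)/(A.12): 2 pi^2 (NT)^3 / 3."""
    return 2.0 * math.pi ** 2 * length ** 3 / 3.0
```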

The SNR, Es/N0, is normalized to the symbol period T, which is therefore omitted (T = 1). From Equation A.4 we then have:

$$\mathrm{MCRB}(\Delta f) = \frac{3 N_0}{2\pi^2 N^3} \tag{A.13}$$
This states that the lower bound on the estimation error is a simple analytical expression that depends only on the (normalized) SNR and the number of samples. MCRB(τ) and MCRB(θ) can be derived similarly, but these derivations are not included here. The expression is used to put the performance in Table A.1 into perspective.
The derived expression uses a simple model that does not account for the channel coefficient used in the model of Equation 2.2. The frequency is normalized to the interval [−1/2, 1/2], and the expression is used to compare the performance of the PD-based estimator (section 2.4) and the NN estimator (section 3.2). The number of samples is N = 96 and the SNR is 10 dB.
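Evaluating (A.13) for these settings reproduces the MCRB entry of Table A.1. The sketch assumes unit symbol energy, so that N0 = 10^(−SNR/10); the function name is illustrative.

```python
import math


def mcrb_cfo(n_samples: int, snr_db: float) -> float:
    """Eq. (A.13): MCRB(delta_f) = 3 N0 / (2 pi^2 N^3), frequency normalized to [-1/2, 1/2].

    Assumes unit symbol energy, so N0 = 10 ** (-SNR / 10).
    """
    n0 = 10.0 ** (-snr_db / 10.0)
    return 3.0 * n0 / (2.0 * math.pi ** 2 * n_samples ** 3)


print(f"{mcrb_cfo(96, 10.0):.2e}")  # 1.72e-08
```

Because the bound scales as 1/N³, doubling the number of samples lowers it by a factor of eight.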
Table A.1: Performance comparison with MCRB, traditional estimator and NN estimator.

MCRB          Error variance of traditional estimator    Error variance of NN estimator
1.7 × 10⁻⁸    0.78                                       2.8 × 10⁻²

Both estimators are far from the achievable performance bound.
Literature Study on Deep Learning for the Wireless Physical Layer
Mads Helge Jespersen

I. OVERVIEW
This is not a comprehensive review of the state of the art in deep learning for the physical layer, but an overview of the areas where deep learning has been applied to the physical layer, examples of successful implementations, and ideas for research areas to focus on.
The overview paper [1] presents the progress in the field well and provides advice on when to use ML through a set of criteria. Most importantly, the problem at hand should suffer from either a model deficit or an algorithm deficit.
a) Model deficit: No mathematical model exists, so conventional design methods are not applicable. Statistical performance guarantees cannot be provided. The algorithm can only be relied upon insofar as the data used is trusted to be representative of the whole set of possible realizations.
b) Algorithm deficit: A mathematical model is available, but algorithms are either too complex to derive or too computationally complex to implement. Neural networks have the potential to yield lower-complexity solutions. A computer simulation can be carried out to obtain numerical performance guarantees.

A. Possible directions
1) End-to-end Constellation Design and Detection: The motivation can be decoding in a non-coherent scenario (algorithm deficit) or transmission through a non-linear medium or non-linear transceiver chain (algorithm deficit). Research has been conducted to derive maximum likelihood decoders from simulation samples in non-coherent MIMO communication (unknown CSI) and in a two-way relay network. Most simulations assume perfect frequency synchronization between transmitter and receiver but vary the channel coefficient between symbols. Methods are commonly restricted to training for one strict channel scenario and will probably perform poorly in different channel conditions.
2) Machine Learning for Emerging Communication Technologies: The research area to focus on can be identified by finding an application for ML in one of the following emerging technologies: mmWave, Massive Connectivity, Massive MIMO.
Current mmWave front-end design suffers from high cost, poor efficiency and non-linearity [2]. Non-linearity of amplifiers could be accounted for by pre-distortion, but a neural network receiver might also give better performance. The cost of the receiver chain can be alleviated by using low-resolution ADCs and phase shifters (1 or 2 bit). The effects of these can be hard to account for in an algorithm, and an ML algorithm may perform better.
Only a few papers investigate the applicability of machine learning in massive connectivity [3]; for this reason there is a high probability of publication using this as project motivation. MERL work has focused on CDMA decoding in the presence of a structured jammer using deep learning, with good initial results in cases where a matched filter method does not work [4].
There are few ML applications in the emerging LPWAN technologies such as LoRa, SigFox and NB-IoT.
3) Apply the latest development within Machine Learning to a Wireless Problem: Adversarial training is a very popular method for training robust neural networks. The same concept can be used to create robust, jamming-resistant communication. Adversarial neural networks could also be used for encryption and decryption in relay networks.
Two recent trends in deep learning are attention networks and Generative Adversarial Networks (GANs).

II. CHANNEL DETECTION AND DECODING
Design of a channel decoder based on samples is only possible if the channel is stationary over a long period of time, meaning it does not change too rapidly over time (fast fading). This could be made possible for fast fading channels if channel estimation is part of the learning process [1].
Model deficit: Molecular communication [5]. Algorithm deficit: Strong non-linearities: satellite communication, optical communications, modulation schemes such as continuous phase modulation, or multi-user networks [1].

A. Paper [5]
1) Purpose: Detection algorithm for the unexplored molecular communication channel.
2) Challenges: It is a realistic scenario with strong ISI, and the channel is not memoryless, which requires sequence detection.
3) Methods: A molecular communication experimental platform is established to generate an adequate dataset by repeatedly transmitting a consecutive sequence of N symbols from M possible types. Chemical signals, acids (representing bit-0) and bases (representing bit-1), are used to encode pH level information.
The detector is implemented using an LSTM network, a typical RNN algorithm for sequence processing, trained on the acquired experimental samples.
4) Results: The LSTM-based detector shows outstanding performance in a communication system with ISI.
5) Conclusion: A sequence-based estimator should be used in case of ISI. DL is promising when there is no model for the physical channel.

B. Paper [6]
1) Purpose: Deep neural network for channel decoding (NND).
2) Challenges: It should learn a decoding structure rather than learning to classify 2^k different codewords. Block lengths are normally long, but the complexity of training increases exponentially with k.
3) Methods: Compares neural network decoding performance on polar block codes and random codes.
4) Results: NND for polar codes shows potential for generalization, but not for random codes. The BER of a large NND is worse than that of the MAP decoder for both random codes and polar codes. The neural network is able to generalize from a certain fixed SNR at training to any arbitrary SNR.
5) Conclusion: High decoding complexity, and it only achieves MAP for very short block lengths. It could possibly be improved with RNNs and parallelized computation. Deep learning for channel decoding does not seem very promising.

C. Paper [7]
1) Purpose: Considers a MIMO channel with known channel matrix H. The goal is to apply deep learning to classical MIMO detection. It can serve as an example of how deep learning can trade off some of the exactness of an existing algorithm for faster computation times.
2) Challenges: Maximum likelihood already has very good performance but high computational complexity. A sub-optimal implementation is desired.
3) Methods: Unfolding an iterative projected gradient descent method. Each iteration is represented as a layer in a neural network.
4) Results: Tested with a fixed channel model and a varying channel model with H drawn from a known distribution. Performs promisingly in both the fixed and the varying channel scenario. Performance is comparable to the advanced detector “semidefinite relaxation (SDR)”, but computation is 30x faster.

D. Paper [8]
1) Purpose: Channel estimation and signal detection use traditional communication solutions as initialization and DL networks to refine the coarse inputs.
2) Methods: The implementation is called ComNet. The CE subnet first estimates the OFDM channel from pilot symbols using LS. SD then obtains a ZF (zero forcing) estimate of the transmitted symbol. The obtained estimate, along with the estimated channel and the received signal, is fed to the DL model to further refine the symbol estimates.
3) Results: Performs comparably to traditional decoding; however, the OFDM implementation allows symbols to be decoded when the CP is removed. Related works: Equalization and synchronization in OFDM.

E. Idea
Inspired by OFDM detection papers such as [8], an mmWave model-driven receiver can be developed that uses expert knowledge to replace receiver blocks. The NN can learn to compensate for non-linearities of mmWave components.

F. Idea
Most neural network detection methods are only created for a stationary channel. Work can be done to extend some of the already developed ML receiver detection methods to many different channel conditions.

III. END-TO-END COMMUNICATION SYSTEM DESIGN
A. Paper [9]
Rethinking the communication system, which is currently optimized block by block for performance in relation to a model, [10] proposes an end-to-end autoencoder to derive modulation and decoding schemes, which shows performance comparable to traditional communication systems. The channel is treated as a layer in the neural network, and therefore its differentiable functional form is needed. A functional form will always be a simplification that does not factor in all impairments of a real system, such as hardware imperfections and varying channel conditions. Since applications of deep learning in the physical layer show great promise, a model is needed that can robustly account for these situations [9].
1) Purpose: To make an end-to-end communication approach that does not need a functional description of the channel.
2) Challenges: Needs channel gradients in order to perform backpropagation (optimization).
3) Related work: Previous work has circumvented a functional model using a two-phase training alternative. One approach has trained on a functional model and fine-tuned the neural network parameters using a realistic channel [11]. Realizations of a realistic channel have been approximated using a GAN. Another approach applied supervised learning at the receiver and reinforcement learning at the transmitter.
4) Methods: Uses a stochastic approximation technique called simultaneous perturbation stochastic approximation to approximate gradients for the model, which does not require knowledge of the exact channel model.
5) Results: Achieves the theoretical BER for an AWGN channel without any assumption about the channel model, but takes more epochs to converge compared to the case where the channel model is known.
6) Conclusion: A successful end-to-end design is created when the channel model is not available or gradient calculation is too complex.
3

B. Paper [11] E. Implementation Practices


1) Purpose: Communication system solely composed of Paper [12] by Toshiki, Toshiaki and Ye (MERL) will be
NNs using unsynchronized SDRs. used for inspiration for practical considerations.
2) Challenges: Does not know functional channel. Training Their DNN fails to learn multiplication between received
data is obtained from over-the-air transmissions. Needs to deal signal and channel coefficients. Therefore the original input
with a continuous transmission with ISI and synchronization and the multiplication is used as input to the DNN. This
issues. increase of input dimensions may delay the convergence of
3) Methods: Two phase procedure. First: train the autoen- DNN.
coder using a stochastic channel model that should approx- Mini-batch size: 128. 128x1000 pseudo random bits with the
imate as closely as possible the behavior of the expected same channel coefficients over a mini-batch while Gaussian
channel. Second: transmitter sends a large number of messages noise varies from sample to sample.
over the actual channel and the corresponding IQ-samples are BER may not be appropriate performance measure for
recorded at the receiver. These samples, together with the demodulation performance since practical systems use soft
corresponding message indices, are then used as a labeled data decision error-correction.
set for supervised finetuning of the receiver. Can be seen as Trained DNN at every 5 dB, can be regarded as adaptive
an way to speed-up the training. selection of mapping depending on the channel SNR.
4) Results: Performance comes close to a well designed Almost all papers I have read use Adam for stochastic
conventional system. (What is the purpose if it is worse?) optimization in DNN.
5) Conclusion: First prototype of its kind. Does not work Sees good performance attributed to the DNN flexibly
in varying channel conditions. Could be achieved by sporadic controlling the transmission rate depending on the channel
transmission of known messages or a very robust error- condition at low SNRs. But is bad performance is expected at
correcting code that would allow gathering a fine-tuning high SNRs since amplify and forward is maximum likelihood
dataset on the fly. decision and DNN only approximates the ML decision with a
non-linear function.
C. Idea Assumes perfect synchronization and stable channel condi-
1) Purpose: Find actual channel measurements to use as tions. Only goal is to choose the constellation points.
channel impairments: h(z). Set up conventional transmitter- Amplitude and phase information in signal constellation can
receiver and compare performance to neural network receiver. be useless for unknown CSI.
2) Challenges: Channel model should represent all effects Complex inputs are almost always represented as a one heat
such as synchronization, fast fading, slow fading, additive vector which is a concatenation of the real part and imaginary
noise. part.
3) Methods: Setup random symbol transmissions. Capture
high-resolution raw (unsynchronized, unequalized) signal (using SDR or spectrum analyzer) along with actual transmitted symbol (training data)). Learn the representation that maps the received signal back to symbols. Will be difficult because of ISI and varying channel conditions. Need a fixed block length as input.
Should account for many varying channel scenarios, but this can turn out to be difficult. Areas of interest could be to create a clustering of channel conditions for which a subset of end-to-end networks are trained.
4) Extension: More channel realizations can be generated using a GAN, which should capture the effects of the real channel. Does not need to map to discrete symbols. Could also be used for analogue transmissions, since it will just be a regression problem.

D. Idea
Transmission of analogue sequences is not really considered in the literature. The continuous nature of a neural network makes it straightforward to treat the communication problem as a high-dimensional regression problem. Continuous phase modulation is notoriously difficult and could be used in combination with analogue sequence transmissions. Applications could be audio transmission, video transmission (multi-cast streaming) or continuous sensor data transmission.

IV. FULL-DUPLEX COMMUNICATION
A. Paper [13]
1) Purpose: Learn to cancel self-interference for a full-duplex link in order to overcome the model deficit in the non-linear transmitter-receiver chain.
2) Challenges: Self-interference is present at the receiver after the analog cancellation stage. Typically linear cancellation is used, but this is usually not sufficient due to non-linear effects created by various transceiver impairments. A polynomial non-linear canceller comes with high implementation complexity.
3) Related work: The author is not aware of any SI cancellation in full-duplex radios in the literature using neural networks.
4) Methods: Using measured samples from a hardware testbed. IQ imbalance and PA non-linearities are normally considered the dominant non-linearities. Better performance is found if a linear-cancellation method is first applied to the received signal and its output is then used as the input to the neural network.
5) Results: Matches the performance of the polynomial non-linear canceller with significantly lower computational complexity.
6) Conclusion: Reduces computation by 36% compared to the polynomial non-linear canceller.
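The linear cancellation stage that, according to the Methods note, precedes the neural network can be illustrated with a least-squares FIR fit of the self-interference channel. The sketch below is a generic standard-library Python illustration under our own assumptions (the transmitted samples are known, the linear SI channel has a short memory of `n_taps` taps, and the demo uses a hypothetical cubic PA term); it is not the canceller used in Paper [13].

```python
import random


def solve_linear_system(a, b):
    """Solve the small complex system a x = b by Gaussian elimination
    with partial pivoting (a is a list of rows, b a list)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0j] * n
    for r in range(n - 1, -1, -1):  # back substitution
        acc = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = acc / m[r][r]
    return x


def linear_si_cancel(tx, rx, n_taps):
    """Fit rx[n] ~ sum_l h[l] * tx[n - l] by least squares and subtract it.

    Returns the estimated taps and the residual rx - (linear SI estimate);
    the residual still contains whatever the linear model cannot explain."""
    rows = [[tx[n - l] if n - l >= 0 else 0j for l in range(n_taps)]
            for n in range(len(rx))]
    # Normal equations (A^H A) h = A^H rx, with A[n][l] = tx[n - l].
    gram = [[sum(row[i].conjugate() * row[j] for row in rows)
             for j in range(n_taps)] for i in range(n_taps)]
    rhs = [sum(row[i].conjugate() * y for row, y in zip(rows, rx))
           for i in range(n_taps)]
    taps = solve_linear_system(gram, rhs)
    residual = [y - sum(h * x for h, x in zip(taps, row))
                for row, y in zip(rows, rx)]
    return taps, residual


if __name__ == "__main__":
    rng = random.Random(0)
    tx = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(400)]
    h_true = [0.9 + 0.1j, -0.3 + 0.2j, 0.05 - 0.02j]  # hypothetical SI channel
    linear = [sum(h_true[l] * tx[n - l] for l in range(3) if n - l >= 0)
              for n in range(400)]
    rx = [v + 0.01 * v * abs(v) ** 2 for v in linear]  # hypothetical cubic PA term
    taps, residual = linear_si_cancel(tx, rx, n_taps=3)
    p_rx = sum(abs(v) ** 2 for v in rx)
    p_res = sum(abs(v) ** 2 for v in residual)
    print(f"residual / received power after linear stage: {p_res / p_rx:.2e}")
```

Whatever the linear fit cannot explain (here, the cubic term) stays in the residual; in the approach summarized above, that output is what is then fed to the neural network.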

V. CDMA/GRANT-FREE RANDOM ACCESS/MASSIVE CONNECTIVITY
Neural networks were employed in order to perform detection and intra-user (and mostly linear) successive interference cancellation in multi-user CDMA systems in:
B. Aazhang, B. P. Paris, and G. C. Orsak, “Neural networks for multiuser detection in code-division multiple-access communications,” IEEE Trans. Commun., vol. 40, no. 7, pp. 1212–1222, Jul. 1992.
M.-H. Yang, J.-L. Chen, and P.-Y. Cheng, “Successive interference cancellation receiver with neural network compensation in the CDMA systems,” in Asilomar Conference on Signals, Systems and Computers, vol. 2, Oct. 2000, pp. 1417–1420.
B. Geevarghese, J. Thomas, G. Ninan, and A. Francis, “CDMA interference cancellation techniques using neural networks in Rayleigh channels,” in International Conference on Information Communication and Embedded Systems (ICICES), Feb. 2013, pp. 856–860.

VI. MODULATION CLASSIFICATION
Algorithm deficit, complex problem, optimal solutions are hard. Has been attempted many times with OK results.

VII. UNSUPERVISED MACHINE LEARNING
Step 1: Model selection. Select a model (a family of distributions parameterized by a vector θ).
Step 2: Learning. Data should be used to choose the value of the parameter vector θ.
Step 3: The model is applied to carry out the task of interest, e.g., clustering, dimensionality reduction or generation of new samples.

A. Autoencoders
The transmitted input message x has an intermediate representation z, which is the received signal, and the output should match the input. ML should only be used if a model or an algorithm deficit exists. Algorithm deficit: non-linear dynamical models (optical links), multiple access channels with sparse transmission codes, and joint source-channel coding.
End-to-end communication described in Section III-A has examples of auto-encoder use.
Auto-encoders can also be used to compress Channel State Information (CSI) for Frequency Division Duplex (FDD) links.

B. Generative models
1) Channel realizations [14]: Example: Learn to generate samples from a given channel. Reasonable for scenarios that lack straightforward channel models. Can be used to mimic and identify non-linear channels for satellite communications. Can generally be used to augment a dataset used for training.
2) Detecting anomalies by learning the typical distribution of features: Can be used for spectrum sensing and identifying covert transmissions.

REFERENCES
[1] O. Simeone, “A Very Brief Introduction to Machine Learning With Applications to Communication Systems,” IEEE Transactions on Cognitive Communications and Networking, vol. 4, no. 4, pp. 648–664, 2018. [Online]. Available: http://arxiv.org/abs/1808.02342
[2] X. Yang, M. Matthaiou, J. Yang, C.-K. Wen, F. Gao, and S. Jin, “Hardware-Constrained Millimeter-Wave Systems for 5G: Challenges, Opportunities, and Solutions,” IEEE Communications Magazine, vol. 57, no. 1, pp. 44–50, 2019. [Online]. Available: https://ieeexplore.ieee.org/document/8613274/
[3] G. Gui, H. Huang, Y. Song, and H. Sari, “Deep learning for an effective nonorthogonal multiple access scheme,” IEEE Transactions on Vehicular Technology, vol. 67, no. 9, pp. 8440–8450, Sep. 2018.
[4] M. Pajovic, T. Koike-Akino, and P. V. Orlik, “Model-Driven Deep Learning Method for Jammer Suppression in Massive Connectivity Systems,” arXiv e-prints, p. arXiv:1903.06266, Mar. 2019.
[5] N. Farsad and A. Goldsmith, “Detection Algorithms for Communication Systems Using Deep Learning,” CoRR, vol. abs/1705.0, 2017. [Online]. Available: http://arxiv.org/abs/1705.08044
[6] T. Gruber, S. Cammerer, J. Hoydis, and S. T. Brink, “On deep learning-based channel decoding,” 2017 51st Annual Conference on Information Sciences and Systems, CISS 2017, pp. 1–6, 2017.
[7] N. Samuel, T. Diskin, and A. Wiesel, “Deep MIMO detection,” IEEE Workshop on Signal Processing Advances in Wireless Communications, SPAWC, vol. 2017-July, pp. 1–5, 2017.
[8] X. Gao, S. Jin, C. K. Wen, and G. Y. Li, “ComNet: Combination of Deep Learning and Expert Knowledge in OFDM Receivers,” pp. 1–11, 2018.
[9] V. Raj and S. Kalyani, “Backpropagating through the air: Deep learning at physical layer without channel models,” IEEE Communications Letters, 2018.
[10] T. O’Shea and J. Hoydis, “An Introduction to Deep Learning for the Physical Layer,” IEEE Transactions on Cognitive Communications and Networking, vol. 3, no. 4, pp. 563–575, 2017. [Online]. Available: http://ieeexplore.ieee.org/document/8054694/
[11] S. Dörner, S. Cammerer, J. Hoydis, and S. Ten Brink, “Deep Learning Based Communication Over the Air,” Conference Record of 51st Asilomar Conference on Signals, Systems and Computers, ACSSC 2017, vol. 2017-Octob, no. 1, pp. 1791–1795, 2018.
[12] T. Matsumine, T. Koike-akino, and Y. Wang, “Deep Learning-Based Constellation Optimization for Physical Network Coding in Two-Way Relay Networks,” -, -.
[13] A. Balatsoukas-Stimming, “Non-Linear Digital Self-Interference Cancellation for In-Band Full-Duplex Radios Using Neural Networks,” in IEEE Workshop on Signal Processing Advances in Wireless Communications, SPAWC, vol. 2018-June, 2018.
[14] T. J. O’Shea, T. Roy, and N. West, “Approximating the Void: Learning Stochastic Channel Models from Observation with Variational Generative Adversarial Networks,” CoRR, vol. abs/1805.0, 2018. [Online]. Available: http://arxiv.org/abs/1805.06350
Deep Learning for Synchronization and Channel
Estimation in NB-IoT Random Access Channel
Mads H. Jespersen∗, Milutin Pajovic†, Toshiaki Koike-Akino†, Ye Wang†, Petar Popovski∗, and Philip V. Orlik†
∗ Department of Electronic Systems, Aalborg University, Denmark
† Mitsubishi Electric Research Laboratories (MERL), Cambridge, MA 02139, USA

Abstract—The central challenge in supporting massive IoT connectivity is the uncoordinated, random access by sporadically active devices. The random access protocol and activity detection have been widely studied, while the auxiliary procedures, such as synchronization, channel estimation and equalization, have received much less attention. However, once the protocol is fixed, the access performance can only be improved by a more effective receiver, through more accurate execution of the auxiliary procedures. This motivates the pursuit of joint synchronization and channel estimation, rather than the traditional approach of handling them separately. The prohibitive complexity of the conventional analytical solutions leads us to employ the tools of deep learning in this paper. Specifically, the proposed method is applied to the random access protocol of Narrowband IoT (NB-IoT), preserving its standard preamble structure. We obtain excellent performance in estimating Time-of-Arrival (ToA), Carrier-Frequency Offset (CFO), channel gain and collision multiplicity from a received mixture of transmissions. The proposed estimator achieves a ToA Root-Mean-Square Error (RMSE) of 0.99 µs and a CFO RMSE of 1.61 Hz at 10 dB Signal-to-Noise Ratio (SNR), whereas a conventional estimator using two cascaded stages has RMSEs of 15.85 µs and 8.05 Hz, respectively.
Index Terms—Deep learning, IoT standards, massive random access, joint estimation
The first author performed this work as an intern at MERL.

I. INTRODUCTION
A massive number of devices are expected to be connected to the Internet, and several standards have been proposed to enable connectivity of low-complexity devices operating over a shared wireless channel. The most prominent technologies are Sigfox, LoRa and Narrowband IoT (NB-IoT) [1]. In Internet of Things (IoT) applications, the random access procedure has a high impact on device battery life and on the number of devices that can be supported concurrently [2]. Random access is used to request uplink allocation from the base station without requiring users to be constantly connected to the base station. Most IoT data packets are on the order of bits, and users transmit them sporadically by establishing a new connection for every transmission. Establishing a connection using random access is a four-step procedure [3], [4], which is initiated by a user that has a packet to transmit by sending a random access preamble. The random access preamble is designed such that the base station is able to efficiently detect the transmitting user and estimate any timing offset between the user and the base station from the received signal. The timing offset comprises propagation time, downlink synchronization errors and channel delay spread [5].
NB-IoT is a recent standard proposed by the 3rd Generation Partnership Project (3GPP) to accommodate the growing number of wireless devices connected to the Internet. It is designed to co-exist with Long-Term Evolution (LTE) and to provide low-cost, low-power devices with low-throughput connectivity. In NB-IoT, the random access procedure is initiated on the Narrowband Physical Random Access Channel (NPRACH). NB-IoT has a system bandwidth of 180 kHz that accommodates 48 orthogonal channels, from which a user attempting to establish a connection chooses one at random. If NB-IoT users choose different (i.e., orthogonal) preambles, the base station is able to estimate the Time of Arrival (ToA) and Carrier Frequency Offset (CFO) of each user [5], [6]. However, given a possibly large number of users and a relatively small number of orthogonal preambles, it is likely that two or more users choose the same preamble. The resulting collision may lead to a user back-off time of up to almost 9 minutes [7], [8]. In order to avoid unnecessary back-off periods and consequently improve channel utilization and overall capacity of the NB-IoT system, we propose in this paper a Deep Learning (DL)-based method for separating colliding users, detecting their number and estimating their respective ToAs and CFOs. We validate the proposed method using simulations and demonstrate significantly improved performance compared to the conventional approaches.

A. Related Work
Several papers have explored methods for activity detection, ToA and CFO estimation using the NB-IoT NPRACH preamble structure. As such, [5] estimates the ToA by searching for the highest correlation between the received signal and the delayed/frequency-shifted preamble on a grid of possible delays and frequencies. To reduce the complexity of the algorithm in [5], [6] estimates the ToA and CFO in a two-stage procedure using the residual phase difference between symbol groups and channel hops. With the goal of improving ToA estimation, [4] suggests a novel hopping pattern that renders more accurate ToA estimation than the already defined NB-IoT preamble.
We consider in this paper the problem of separating colliding NB-IoT users that choose the same random access preamble in the NPRACH scheme, and propose a method to detect the number of colliding users and estimate their ToA, CFO
and channel gains. Motivated by recent success in leveraging learning-based methods for addressing problems related to physical layer communications [9], our method builds upon the deep learning framework. In particular, we jointly detect the number of active users and estimate their parameters, with the aim of improving the capacity of the critical random access phase: interfering signals are not discarded, so channel resources are utilized better, which in turn reduces back-off periods. In addition to handling a much richer class of scenarios, the proposed method outperforms [6] in its own scenario, where users transmit orthogonal preambles and do not collide. In contrast to [4], the random access preamble in this work is the one specified by the NB-IoT standard, ensuring that the proposed method is practical in the NB-IoT systems currently being deployed. Finally, looking outside the NB-IoT scope, we believe that this work is the first application of deep learning techniques for user separation in massive connectivity systems.

II. NB-IoT RANDOM ACCESS PREAMBLE DESIGN
The preamble format and packet structure are illustrated in Fig. 1. The preamble is divided into symbol groups, where each group consists of a Cyclic Prefix (CP) and ε identical symbols. The value of ε depends on the preamble format. The preamble format is chosen by the user based on the downlink power measurement to estimate its coverage area [3].

[Figure: symbol group consisting of a CP and ε = 5 symbols of 266.7 µs; 3.75 kHz sub-carrier spacing; 12 sub-carriers Ω(m); symbol groups m = 0, ..., 3 (L = 4); uplink slot n = 1, ..., 128]
Fig. 1. Overview of NPRACH preamble and packet structure.

The most common preamble format is format 1 with preamble frame structure 0 or 1, which has ε = 5 and a symbol time $T_{\mathrm{SYM}} = 266.7$ µs. The CP period for frame format 0 is $T_{\mathrm{CP}} = 66.7$ µs and $T_{\mathrm{CP}} = 266.7$ µs for frame format 1 [10]. The CP is designed to be long enough to cover the maximum round-trip delay in order to suppress Inter-Symbol Interference (ISI). Therefore, one interpretation of allowing adaptive CP selection is for the user to use the short CP in the range 0–8 km and the long CP in the range 8–35 km [5].
The full preamble consists of 4 repetitions of the symbol group, which is again repeated $n = 2^J$, $J = 0, \dots, 7$ times, for a full preamble length of $L = 4 \times 2^J$ symbol groups. The repetition of the symbol groups occurs within an uplink slot, and the number of repetitions is decided by the upper Medium Access Control (MAC) layer depending on the estimated link quality [10]. For simplicity we consider J = 2, i.e., four symbol groups are repeated 4 times.
Before transmission, the user chooses a contiguous set of N = 12, 24, 36 or 48 subcarriers with 3.75 kHz spacing out of the available 48 subcarriers. This paper focuses on the preamble frame structure type 1, where N = 12. At the start of the NPRACH preamble transmission, the subcarrier of the first symbol group is chosen at random. After each symbol group, the subcarrier changes according to a deterministic channel hopping sequence, so within the duration of a preamble there are L subcarrier hops. Since the hopping pattern is deterministic, several users choosing the same initial subcarrier will collide for the entirety of the NPRACH preamble sequence. The number of orthogonal preamble sequences is therefore the number of allocated NPRACH subcarriers, K [7].
For frame structure type 1 and preamble format 0, two “levels” of hopping are employed, as shown in Fig. 1. The hopping pattern is deterministic within a cell, but the subcarrier of every 4th symbol group appears random to neighbouring cells. The hopping procedure aids in the estimation of ToA and also reduces inter- and intra-cell interference [5]. The ToA must be estimated by the base station for successful uplink signal decoding, and it further enables device positioning. An error in the ToA estimate results in the user being unable to receive the response sent by the base station. ToA estimation therefore has a great impact on performance in NB-IoT [4].

III. SYSTEM MODEL
The received signal at the base station is a superposition of signals from multiple users, given by
$$ y[n] = \sum_{k=0}^{K-1} a_k s_k[n] + w[n], \tag{1} $$
where K is the maximum number of concurrent users, $a_k \in \{0, 1\}$ indicates whether the kth user is active or not, and $w[n] \sim \mathcal{CN}(0, 1/\rho_n)$ denotes the additive noise with a per-symbol Signal-to-Noise Ratio (SNR) of $\rho_n$.
At the receiver, the phase of each symbol depends on the ToA τ, the CFO Δf (which gives the frequency of the user's chosen channel with respect to the receiver's uplink carrier frequency f), and the channel rotation given by arg(h), where h is the complex-valued channel coefficient. These parameters are assumed to be independent across users and are denoted by $\tau_k$, $\Delta f_k$ and $h_k$ for each user k.
The signal from the kth user is given by
$$ s_k[n] = h_k \, e^{-j 2\pi (f_n + \Delta f_k)(n T_{\mathrm{sym}} - \tau_k)}, \tag{2} $$
where $T_{\mathrm{sym}}$ is the symbol duration. For the sake of simplicity, the signal model is limited to a single preamble sequence. This means that the sub-carrier frequency pattern $f_n$ is predetermined and identical for all users' signals $s_k$.
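To make (1)–(2) concrete, the following minimal Python sketch (standard library only) draws one received signal together with its ground truth. It uses the numerology of Section II and the user distributions that this section goes on to specify: the distance PDF (3), a CFO uniform in ±20 Hz, h ∼ CN(0, 1), and Bernoulli activity with K = 4 and p = 0.5. Assumptions that are not from the paper: the propagation speed c = 3·10^8 m/s, $T_{\mathrm{sym}}$ taken as exactly 1/3.75 kHz, and a hypothetical simplified hopping pattern (sub-carrier offsets 0, 1, 7, 6 per symbol group) standing in for the standard's deterministic $f_n$. It is a sketch under these assumptions, not the simulator behind the reported results.

```python
import cmath
import math
import random

# Numerology from Sections II-III; values marked ASSUMPTION are illustrative.
SUBCARRIER_SPACING = 3.75e3                # Hz
T_SYM = 1.0 / SUBCARRIER_SPACING           # ~266.7 us symbol duration
C = 3.0e8                                  # ASSUMPTION: propagation speed in m/s
R_MIN, R_MAX = 8.0e3, 35.0e3               # long-CP coverage ring r..R in metres
K = 4                                      # maximum number of concurrent users
P_ACTIVE = 0.5                             # activity probability p
CFO_MAX = 20.0                             # CFO ~ U(-20, 20) Hz
N_GROUPS, N_REP, EPS = 4, 4, 5             # symbol groups, repetitions, epsilon
N_SAMPLES = N_REP * N_GROUPS * (EPS + 1)   # 96 samples; CP counted as one symbol

# ASSUMPTION (hypothetical, simplified): sub-carrier index of each symbol
# group (+1, +6, -1 hops), reused unchanged in every repetition.
HOP_OFFSETS = (0, 1, 7, 6)


def subcarrier_frequencies():
    """f_n in Hz for every sample n; identical for all users."""
    return [HOP_OFFSETS[(n // (EPS + 1)) % N_GROUPS] * SUBCARRIER_SPACING
            for n in range(N_SAMPLES)]


def complex_gaussian(rng, var):
    """One draw from the circularly symmetric CN(0, var)."""
    std = math.sqrt(var / 2.0)
    return complex(rng.gauss(0.0, std), rng.gauss(0.0, std))


def sample_distance(rng):
    """Inverse-CDF draw from f_D(d) = 2d / (R^2 - r^2) on [r, R], eq. (3)."""
    u = rng.random()
    return math.sqrt(R_MIN ** 2 + u * (R_MAX ** 2 - R_MIN ** 2))


def toa_moments():
    """Mean and variance of tau = d / c implied by eq. (3)."""
    mean_d = 2.0 * (R_MAX ** 3 - R_MIN ** 3) / (3.0 * (R_MAX ** 2 - R_MIN ** 2))
    mean_d2 = (R_MAX ** 2 + R_MIN ** 2) / 2.0
    return mean_d / C, (mean_d2 - mean_d ** 2) / C ** 2


def collision_pmf():
    """Pr(k) for k = 0, ..., K colliding users, eq. (5)."""
    return [math.comb(K, k) * P_ACTIVE ** k * (1.0 - P_ACTIVE) ** (K - k)
            for k in range(K + 1)]


def reconstruct(params, active, freqs):
    """Noise-free sum_k a_k s_k[n] from eqs. (1)-(2); params = [(tau, df, h)]."""
    s = [0j] * N_SAMPLES
    for (tau, dfk, h), a_k in zip(params, active):
        if not a_k:
            continue
        for n in range(N_SAMPLES):
            phase = -2.0 * math.pi * (freqs[n] + dfk) * (n * T_SYM - tau)
            s[n] += h * cmath.exp(1j * phase)
    return s


def generate_received_signal(snr_db, rng):
    """One realization of y[n] per eq. (1), plus its ground truth."""
    rho = 10.0 ** (snr_db / 10.0)
    freqs = subcarrier_frequencies()
    active = [1 if rng.random() < P_ACTIVE else 0 for _ in range(K)]
    params = [(sample_distance(rng) / C,          # tau_k
               rng.uniform(-CFO_MAX, CFO_MAX),    # Delta f_k
               complex_gaussian(rng, 1.0))        # h_k ~ CN(0, 1)
              for _ in range(K)]
    s = reconstruct(params, active, freqs)
    y = [s_n + complex_gaussian(rng, 1.0 / rho) for s_n in s]
    return y, s, active, params
```

Because every $f_n$ here is a multiple of 3.75 kHz and $T_{\mathrm{sym}} = 1/3.75$ kHz, the $f_n n T_{\mathrm{sym}}$ term only contributes multiples of 2π; the ToA shows up through the phase jumps $2\pi f_n \tau_k$ between hops and the CFO through a linear phase drift, which is the relationship a phase-difference estimator relies on. The `reconstruct` helper also corresponds to a noise-free reconstruction of the kind s = f(X) used for the training loss in Section IV.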
[Figure: annulus with inner radius r and outer radius R; user at distance d from the base station]
Fig. 2. Geometry of the user distribution.

The typical FFT length in LTE is 512 [6], but for simplicity we assume that each sample n corresponds to one symbol. In this model, the contents of the CP are interpreted as a symbol, and therefore no distinction is made between the CP and the ε = 5 repeated symbols in a symbol group. This signal model may be valid only for the long CP, which corresponds to distances between the user and the base station from a minimum of r = 8 km to a maximum of R = 35 km [5]. The users are assumed to be uniformly distributed in the coverage area of the base station, as illustrated in Fig. 2. The distance d from the base station to a user has the following Probability Density Function (PDF) [11]:
$$ f_D(d) = \frac{2d}{R^2 - r^2}, \quad r \le d \le R, \tag{3} $$
which is used to model the ToA $\tau = d/c$, where c is the propagation speed.
The channel coefficient h of the signal model in (2) is a complex-valued constant which accounts for small-scale fading: $h \sim \mathcal{CN}(0, 1)$. This means that the average received signal power is normalized to one. The narrowband channel is modeled as a slowly varying single-tap Rayleigh fading channel and is, for this reason, represented by a single coefficient [6]. Large-scale fading is not included in the model, since users already know the downlink SNR and adjust their transmit power accordingly using power control.
The CFO in (2) is chosen uniformly at random between −20 and +20 Hz [6]. For the sake of simplicity, the CFO and ToA are assumed to be constant throughout an entire NPRACH transmission for each user.
The activity indicator $a_k$ is modeled as a Bernoulli random variable with transmission probability p, and $a_0, \dots, a_{K-1}$ are i.i.d. The number of concurrent active users is
$$ N_a = \sum_{k=0}^{K-1} a_k \sim \mathcal{B}(K, p), \tag{4} $$
where $\mathcal{B}$ is the binomial distribution. We consider the case with K = 4 and p = 0.5 throughout the paper. The probability of exactly k users colliding is then
$$ \Pr(k) = \binom{K}{k} p^k (1-p)^{K-k} = \frac{4!}{k!\,(4-k)!}\, p^k (1-p)^{4-k}. \tag{5} $$

IV. DEEP LEARNING ESTIMATOR
The goal of the estimator is to use the discrete signal y[n] to estimate the activity indicator a, ToA τ, CFO Δf, and channel coefficient h of each user. Since the activity indicator of each user is a random variable, the total number of active users in the received signal is unknown. This boils down to the notoriously challenging problem of source separation with an unknown number of users [12]. Deep learning has significantly improved the field of source separation; the general idea is to capture the non-linear relationship between inputs and corresponding targets, which is often difficult to model with analytically tractable expressions [12]. In this paper, the estimation of the unknown parameters is split into:
• Classification of the number of active users; and
• Estimation of ToAs, CFOs and channel coefficients given the number of users.
The two tasks are combined such that the synchronization parameters are accurately estimated for each detected user.

A. Estimation of the Number of Users
Finding the number of active users, $N_a$, is formulated as a classification problem, where $\mathbf{p} = \mathrm{OneHot}(N_a)$ is a categorical random variable encoded as a one-hot vector specifying $N_a$. With a one-hot encoding, the true target $\mathbf{p} = [p_0, p_1, \dots, p_K]$ has entry one at index $N_a$ and zero entries everywhere else. This differs from the typical way of representing active users, where users are ordered in a vector, each index indicates the activity of a unique user, and $N_a$ can be estimated as the $\ell_0$ norm of that sparse vector. In this collision scenario, however, users transmit using the same spreading sequence and are not uniquely distinguishable. For this reason, only the information on the number of active users is represented in $\mathbf{p}$.
Cross-entropy loss is typically used in classification problems [13], and [14] suggests that the cross-entropy loss in classification problems leads to faster convergence and better generalization than the Mean Squared Error (MSE). For non-binary classification, the softmax cross-entropy loss (or negative log-likelihood) is typically used:
$$ \ell_{\mathrm{NLL}}(\mathbf{p}, \mathbf{q}) = -\sum_{k=0}^{K} p_k \log q_k, \tag{6} $$
where $\mathbf{q}$ is the output of the continuous, differentiable softmax function
$$ q_k = \frac{\exp(\pi_k)}{\sum_i \exp(\pi_i)}, \tag{7} $$
$[\pi_0, \pi_1, \dots, \pi_K]$ are the outputs of the last layer of the neural network, and $[q_0, q_1, \dots, q_K]$ represent the a posteriori class probabilities. A hard class prediction can then be found as $\arg\max_i \pi_i$.

B. Parameter Estimation
The parameters to be estimated are collected in a vector
$$ \mathbf{x}_k = \left[\tau_k,\ \Delta f_k,\ \Re\{h_k\},\ \Im\{h_k\}\right]^T. \tag{8} $$
Note that it was found that representing the complex-valued channel coefficient h by Cartesian coordinates (i.e., real and
imaginary parts) shows superior performance to a phasor representation (i.e., amplitude and phase), as seen in Fig. 7. For K users, the respective vectors are collected in a matrix
$$ \mathbf{X} = [\mathbf{x}_0, \mathbf{x}_1, \dots, \mathbf{x}_{K-1}]. \tag{9} $$
The neural network seeks an estimate $\hat{\mathbf{X}}$ such that $\mathbb{E}\|\mathbf{X} - \hat{\mathbf{X}}\|_2^2$ is minimal, which is equivalent to a Minimum Mean-Square Error (MMSE) estimator.
The above formulation is sufficient to derive an estimation procedure. However, $\mathbf{X}$ consists of multiple parameters with values on different scales. When a practical optimization algorithm is used to find an estimate, any scaling difference between the parameters affects the impact each value has on the gradient descent step.
To circumvent possible issues arising from error variations across parameters, we minimize the reconstruction error instead. The actual received signal without additive noise, $\mathbf{s}$, can be reconstructed from the parameters in matrix $\mathbf{X}$ using (2). The reconstruction is conveniently represented by a function $f(\cdot)$ such that $\mathbf{s} = f(\mathbf{X})$.
For each estimate $\hat{\mathbf{X}}$, the equivalent noise-free signal $\hat{\mathbf{s}}$ is reconstructed and compared to the actual noise-free received signal $\mathbf{s}$. The noise-free signal is known during the training procedure and is used so that the output of the neural network does not account for the distribution of the noise. The data fidelity (i.e., reconstruction loss) is quantified using the MSE metric:
$$ \ell_r(\mathbf{X}, \hat{\mathbf{X}}) = \mathbb{E}\left\| f(\mathbf{X}) - f(\hat{\mathbf{X}}) \right\|_2^2 = \mathbb{E}\left\| \mathbf{s} - \hat{\mathbf{s}} \right\|_2^2. \tag{10} $$
The number of concurrent users in each sample is known during training, so when reconstructing the signal $\hat{\mathbf{s}}$, only the contributions from the correct number of users are taken into account when calculating the reconstruction loss $\ell_r$ for each sample.
The loss function which the neural network seeks to minimize is simply the sum of (6) and (10):
$$ \mathrm{loss} = \ell_{\mathrm{NLL}}(\mathbf{p}, \mathbf{q}) + \ell_r(\mathbf{X}, \hat{\mathbf{X}}). \tag{11} $$

C. Network Implementation
An overview of the neural network that estimates both the number of users and the synchronization parameters is illustrated in Fig. 3. The input to the network is the received signal, which consists of 4 NPRACH repetitions, each with L(ε + 1) symbol periods. The total number of samples in the received signal is $N_{\mathrm{rep}} L (\varepsilon + 1) = 4 \cdot 4 \cdot (5 + 1) = 96$, where the real and imaginary parts are represented in 2 individual channels.
The output of the network is the flattened matrix $\mathbf{X}$ and the probability vector π. For 4 users there are 4 · 4 = 16 parameters in $\mathbf{X}$ and 5 possible classes for the number of users (including the zero-user case). The input to the network is processed so as to extract common features that are subsequently used for multi-task learning, that is, to detect the number of users and estimate their parameters. The first layer performs a 1-dimensional convolution over the input signal. Since the number of users, ToA, CFO and channel coefficient are all assumed to be constant throughout a transmission, a convolution layer is chosen so as to extract translationally invariant features of the input time-domain signal.
Following a typical CNN structure, batch normalization, non-linear activation and max-pooling are employed. The convolution, activation and pooling layers are repeated to form a deep neural network. The features found by the convolution layers are reshaped into a single vector, which is then used as input to two individual feedforward neural networks. One of the networks performs classification and detects the number of users based on the output of the feature extraction layers. The other network performs regression, with the goal of yielding parameters such that the reconstructed signal is as close as possible to the received signal in the MSE sense. Each feedforward network has two fully connected layers followed by the Rectified Linear Unit (ReLU) activation and a linear output layer. The network and automatic differentiation are implemented using the PyTorch framework [15], and the network is trained using multiple Graphics Processing Units (GPUs).

[Figure: input 2@96×1 → 1-D convolution, batch-norm, ReLU, max-pool (200@91×1 → 200@45×1) → 1-D convolution, batch-norm, ReLU, max-pool (100@43×1 → 100@21×1) → flatten (1×2100) → fully connected ReLU layers (widths 1000, 1000 and 200 shown) → outputs 1×16 and 1×5 → concatenation (1×21)]
Fig. 3. Overview of DNN architecture for estimating synchronization parameters of up to 4 colliding users.

In the simulation, ToA, CFO and channel coefficients are all drawn according to the distributions given in the system model, and $N_a$ is drawn according to Pr(k) for each sample. The input y to the network and each parameter in the output $\mathbf{X}$ are scaled to have zero mean and unit variance. In general, a neural network converges faster if all inputs to all layers have zero mean and unit covariance across training examples, when all examples are of equal importance [16]. From the system model, the mean and variance of each parameter (CFO, ToA and h) are known and are used to normalize the parameters to zero mean and unit variance. The mean and variance of τ can be derived from (3), and the standardized ToA is given by
$$ \tau' = \frac{\tau - \mathbb{E}[\tau]}{\sqrt{\mathrm{Var}(\tau)}}. \tag{12} $$
The CFO, Δf, is scaled similarly. No normalization is necessary for the channel coefficients since $h \sim \mathcal{CN}(0, 1)$, and thus no scaling is necessary for the signal y.

V. ESTIMATION RESULTS
A. Traditional Methods
The phase-difference based method proposed in [6] utilizes the relationship between the phase trace of the received signal and the ToA and CFO. Phase differences between symbols in the received signal are averaged to estimate the CFO. The ToA is found by subtracting the phase due to the estimated CFO from the phase of the received signal and averaging the phase difference between symbol groups on different frequencies.
As a benchmark for the detection of the number of users, an amplitude-based estimator is considered. The amplitude of the received signal is compared to the mean received amplitude for each possible number of colliding users, and the closest match yields the estimate of the number of colliding users present in the received signal.

B. Simulation
The neural network is trained using samples generated with up to K = 4 concurrent users at an SNR of 10 dB. New batches are generated for every step of the training procedure. The learning rate is 0.0001, and each batch consists of 50,000 realizations of y from (1). The stochastic optimization method based on adaptive moment estimation (ADAM) [17] is used, and a total of 20,000 different batches are used in training.
In Fig. 4, the estimation of collision multiplicity is shown for the proposed classification method compared to a simple amplitude-based method. As colliding signals add non-coherently, the amplitude of the signal is not a good indicator of collision multiplicity. Signals with 1 and 2 users are correctly identified with probabilities of 98.0 % and 93.2 %, respectively, at an SNR of 10 dB, and the estimation accuracy decreases with the number of concurrent users. The proposed method often misclassifies a signal containing 4 colliding users as resulting from transmissions of 3 users.

[Figure: probability of detection versus SNR [dB] (4–18 dB) for the NN estimator and the amplitude-based estimator with 1, 2, 3 and 4 users]
Fig. 4. Accuracy of estimating the number of colliding users. A signal is deemed correctly detected if the number of users is estimated correctly. The NN estimator is trained for signals with 10 dB SNR.

Since the parameter-estimation part of the loss depends only on the reconstruction error, the estimated parameters in $\hat{\mathbf{X}}$ are arbitrarily ordered across users. To compare the output with the target $\mathbf{X}$, the parameters are ordered according to the estimated amplitudes. In cases where the estimated amplitudes are similar, the ordering may be wrong, which leads to an artificially high error when evaluating performance for multiple users.
The RMSE of each parameter in $\mathbf{X}$ is calculated as
$$ \mathrm{RMSE}_k = \sqrt{\mathbb{E}\left[\|e_k\|_2^2\right]}, \tag{13} $$
where, e.g., the estimation error of τ is $e_k = \tau_k - \hat{\tau}_k$. The RMSE of the proposed neural network-based estimator is the average of all RMSEs up to user k:
$$ \mathrm{RMSE}_{\mathrm{NN},k} = \frac{1}{k} \sum_{i=1}^{k} \mathrm{RMSE}_i. \tag{14} $$
The conventional estimator is only able to estimate a single set of parameters, regardless of the actual number of users k. The error of the conventional estimator is therefore measured as the smallest error over all actual sets of parameters in $\mathbf{X}$; e.g., the ToA estimation error is
$$ e_{\tau,\mathrm{PD}} = \min_k \left|\tau_k - \hat{\tau}_{\mathrm{PD}}\right|. \tag{15} $$
This gives the conventional estimator an artificial advantage.

[Figure: ToA estimation RMSE [µs] versus SNR [dB] (4–18 dB) for the NN estimator and the phase-difference based estimator with 1, 2, 3 and 4 users]
Fig. 5. Root-Mean-Square Error (RMSE) of ToA estimation across SNRs.

The RMSE of ToA and CFO estimation with a varying number of users is shown in Figs. 5 and 6. The neural network-based estimator shows lower estimation error for both ToA and CFO than the phase-difference based estimator, even for a single user. For two users, the proposed estimator is superior to the conventional estimator when estimating ToA. At 10 dB, the proposed estimator has an RMSE of 0.99 µs and 1.61 Hz for a single user, compared to 15.85 µs and 8.05 Hz
for the conventional estimator. The relatively high RMSE of the conventional estimator is likely due to noise, which causes wrong phase unwrapping at low SNRs [6].

[Plot: CFO estimation RMSE (Hz) versus SNR (dB) for the NN estimator and the PD-based estimator with 1, 2, 3, and 4 users.]
Fig. 6. RMSE of CFO estimation across SNRs.

The accuracy in estimating the channel coefficient h is shown in Fig. 7. For a single user, the RMSE is 0.101 for the in-phase part and 0.103 for the quadrature part. The RMSE follows the same trend as in ToA and CFO estimation, with performance deteriorating as the number of concurrent users increases.

[Plot: channel coefficient estimation RMSE versus SNR (dB) for the NN estimator with 1, 2, 3, and 4 users and for the NN estimator using phasor representation with 1 user.]
Fig. 7. RMSE of channel coefficient estimation across SNRs.

Overall, the proposed method offers considerably improved performance compared to the traditional estimator, both in single-user and in multi-user scenarios.

VI. DISCUSSION AND CONCLUSION

We proposed a novel approach to synchronization and channel estimation. The system model consists of a superposition of an unknown number of users transmitting with the same preamble sequence. Deep learning is used to classify the multiplicity of collisions and to estimate the ToA, CFO, and channel coefficients for all users simultaneously.

The method is demonstrated in NB-IoT NPRACH, where the number of orthogonal preambles is limited. The estimation error of a conventional approach in NB-IoT is compared to that of the proposed scheme. Traditional synchronization methods fail in the case of collisions with high Signal-to-Interference Ratio (SIR), whereas with the proposed algorithm users can be distinguished and their respective synchronization parameters can still be estimated with reasonable performance.

Deep learning is a promising tool for developing joint estimation procedures, which are notoriously difficult to obtain with traditional model-based methods. It also enables separation of synchronization parameters even when users transmit using the same preamble. Although deep learning-based estimators are sub-optimal compared to an analytically derived joint estimator, they allow for practical, straightforward development and efficient computation.

REFERENCES
[1] I. Catherine, R. Tardy, N. Aakvaag, B. Myhre, and R. Bahr, “Comparison of wireless techniques applied to environmental sensor monitoring,” SINTEF Digital, Trondheim, Norway, Tech. Rep., 2017.
[2] A. Azari, P. Popovski, G. Miao, and C. Stefanovic, “Grant-free radio access for short-packet communications over 5G networks,” in GLOBECOM 2017 - 2017 IEEE Global Communications Conference, Dec 2017, pp. 1–7.
[3] Y.-P. E. Wang, X. Lin, A. Adhikary, A. Grovlen, Y. Sui, Y. Blankenship, J. Bergman, and H. S. Razaghi, “A primer on 3GPP narrowband Internet of things,” IEEE Communications Magazine, vol. 55, no. 3, pp. 117–123, Mar 2017.
[4] W. S. Jeon, S. B. Seo, and D. G. Jeong, “Effective frequency hopping pattern for ToA estimation in NB-IoT random access,” IEEE Transactions on Vehicular Technology, vol. 67, no. 10, pp. 10150–10154, 2018.
[5] X. Lin, A. Adhikary, and Y.-P. E. Wang, “Random access preamble design and detection for 3GPP narrowband IoT systems,” IEEE Wireless Communications Letters, vol. 5, no. 6, pp. 640–643, 2016.
[6] J. Hwang, C. Li, and C. Ma, “Efficient detection and synchronization of superimposed NB-IoT NPRACH preambles,” IEEE Internet of Things Journal, vol. 6, no. 1, pp. 1173–1182, Feb 2019.
[7] L. Feltrin, G. Tsoukaneri, M. Condoluci, C. Buratti, T. Mahmoodi, M. Dohler, and R. Verdone, “Narrowband IoT: A survey on downlink and uplink perspectives,” IEEE Wireless Communications, vol. 26, no. 1, pp. 78–86, Feb 2019.
[8] 3GPP TS 36.321, Evolved Universal Terrestrial Radio Access (E-UTRA); Medium Access Control (MAC) protocol specification, V15.4.0, 2018.
[9] O. Simeone, “A very brief introduction to machine learning with applications to communication systems,” arXiv e-prints, arXiv:1808.02342, Aug 2018.
[10] 3GPP TS 36.211, Evolved Universal Terrestrial Radio Access (E-UTRA); Physical channels and modulation, V15.4.0, 2018.
[11] T. Hien, Z. Wang, S. Kim, J. Nielsen, and P. Popovski, “Preamble detection in NB-IoT random access with limited-capacity backhaul,” in Proceedings of IEEE ICC’19, Feb 2019.
[12] J. Chien, Source Separation and Machine Learning. Elsevier Science, 2018.
[13] C. Bishop, Pattern Recognition and Machine Learning, ser. Information Science and Statistics. Springer New York, 2016.
[14] P. Y. Simard, D. Steinkraus, and J. C. Platt, “Best practices for convolutional neural networks applied to visual document analysis,” in Seventh International Conference on Document Analysis and Recognition, 2003. Proceedings., Aug 2003, pp. 958–963.
[15] A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer, “Automatic differentiation in PyTorch,” in NIPS-W, 2017.
[16] Y. LeCun, L. Bottou, G. B. Orr, and K.-R. Müller, “Efficient backprop,” in Neural Networks: Tricks of the Trade. Springer-Verlag, 1998.
[17] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv e-prints, arXiv:1412.6980, Dec 2014.
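The results above state that training uses the Adam optimizer [17]. Its update rule can be written out in a few lines. The following is a minimal sketch in plain Python on a hypothetical one-dimensional quadratic loss. It is not the training code of this work. The defaults β1 = 0.9, β2 = 0.999 and ε = 1e-8 follow [17]. The learning rate, the step count, and the name `adam_minimize` are illustrative assumptions.

```python
import math

def adam_minimize(grad, x0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=2000):
    """Minimize a scalar function, given its gradient `grad`, with the Adam update.

    Illustrative sketch: lr and steps are assumed values; beta1, beta2 and eps
    are the defaults suggested by Kingma and Ba.
    """
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1.0 - beta1) * g        # running estimate of the mean gradient
        v = beta2 * v + (1.0 - beta2) * g * g    # running estimate of the uncentered second moment
        m_hat = m / (1.0 - beta1 ** t)           # bias corrections for the zero initialization
        v_hat = v / (1.0 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x

# Toy loss (x - 3)^2 with gradient 2 (x - 3); its minimizer is x = 3.
x_star = adam_minimize(lambda x: 2.0 * (x - 3.0), x0=0.0)
print(round(x_star, 4))
```

In the network setting, the scalar x would be replaced by the full set of weights and `grad` by a mini-batch gradient. Each training batch would then typically correspond to one such update.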
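The Fig. 4 discussion notes that colliding signals add non-coherently, so amplitude is a poor indicator of collision multiplicity. This can be checked with a toy model. The sketch below assumes unit-magnitude users with independent, uniformly distributed phases. It ignores noise, ToA, and CFO, so it simplifies the system model (1) rather than reproducing it; the function names are hypothetical. For two users, |exp(jφ1) + exp(jφ2)| = 2|cos((φ1 − φ2)/2)|. This is smaller than the single-user amplitude 1 with probability 1/3.

```python
import cmath
import math
import random

def collision_amplitude(num_users, rng):
    """Magnitude of a sum of unit-magnitude signals with independent uniform phases.

    Toy assumption: equal received power, no noise, no ToA or CFO offsets.
    """
    return abs(sum(cmath.exp(1j * rng.uniform(0.0, 2.0 * math.pi))
                   for _ in range(num_users)))

def prob_weaker_than_single_user(num_users, trials=20_000, seed=0):
    """Monte Carlo estimate of P(|sum| < 1): how often a collision looks
    weaker than a lone user to an amplitude-based detector."""
    rng = random.Random(seed)
    hits = sum(collision_amplitude(num_users, rng) < 1.0 for _ in range(trials))
    return hits / trials

print(prob_weaker_than_single_user(2))   # close to 1/3 for two colliding users
```

Under this model, roughly one two-user collision in three arrives weaker than a single user. No amplitude threshold can separate the two cases reliably.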
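The text attributes the high RMSE of the conventional estimator to wrong phase unwrapping at low SNR. This can be illustrated with a generic phase-difference CFO estimator: the angle of the summed lag-one correlation, divided by 2π times the sample spacing. This textbook form is assumed here for illustration only. It is not claimed to be the exact estimator of [6], which operates on NPRACH symbol groups with frequency hopping. The 1 ms spacing, the 450 Hz offset, the four-sample length, and all function names are hypothetical.

```python
import cmath
import math
import random

def pd_cfo_estimate(samples, spacing_s):
    """Phase-difference CFO estimate from the summed lag-one correlation.

    The phase of sum(y[k+1] * conj(y[k])) is only known modulo 2*pi, so the
    estimate is unambiguous only for |CFO| < 1 / (2 * spacing_s).
    """
    corr = sum(b * a.conjugate() for a, b in zip(samples, samples[1:]))
    return cmath.phase(corr) / (2.0 * math.pi * spacing_s)

def noisy_tone(cfo_hz, spacing_s, n, snr_db, rng):
    """n samples of a unit-amplitude tone at cfo_hz plus complex AWGN
    at the given per-sample SNR; snr_db=None gives a noiseless tone."""
    noise_std = 0.0 if snr_db is None else math.sqrt(10.0 ** (-snr_db / 10.0) / 2.0)
    return [cmath.exp(2j * math.pi * cfo_hz * k * spacing_s)
            + complex(rng.gauss(0.0, noise_std), rng.gauss(0.0, noise_std))
            for k in range(n)]

def wrap_error_rate(cfo_hz, spacing_s, n, snr_db, trials=2000, seed=1):
    """Fraction of trials with an error above half the unambiguous range,
    i.e. gross errors of the kind caused by landing on the wrong 2*pi branch."""
    rng = random.Random(seed)
    limit = 1.0 / (2.0 * spacing_s)
    bad = 0
    for _ in range(trials):
        est = pd_cfo_estimate(noisy_tone(cfo_hz, spacing_s, n, snr_db, rng), spacing_s)
        bad += abs(est - cfo_hz) > limit
    return bad / trials

# Hypothetical setting: 1 ms spacing (unambiguous range of +/-500 Hz), CFO of 450 Hz.
for snr in (0.0, 10.0, 30.0):
    print(snr, wrap_error_rate(450.0, 1e-3, 4, snr))
```

A wrapped estimate here is off by about 1 kHz. Even occasional wraps therefore inflate the RMSE far more than the small errors of correctly unwrapped estimates.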
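All results above are reported as RMSEs, and for the complex channel coefficient h the in-phase and quadrature parts are reported separately. A minimal sketch of both computations follows; the function names are hypothetical. The last lines only restate the single-user 10 dB figures quoted above as improvement factors.

```python
import math

def rmse(estimates, truths):
    """Root-mean-square error between two equal-length, non-empty real sequences."""
    if len(estimates) != len(truths) or not estimates:
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(sum((e - t) ** 2 for e, t in zip(estimates, truths)) / len(estimates))

def rmse_iq(h_est, h_true):
    """Separate RMSEs of the in-phase (real) and quadrature (imaginary) parts
    of complex channel-coefficient estimates."""
    return (rmse([h.real for h in h_est], [h.real for h in h_true]),
            rmse([h.imag for h in h_est], [h.imag for h in h_true]))

# Improvement factors implied by the quoted single-user results at 10 dB.
toa_gain = 15.85 / 0.99   # conventional vs. proposed ToA RMSE (microseconds)
cfo_gain = 8.05 / 1.61    # conventional vs. proposed CFO RMSE (Hz)
print(f"ToA: {toa_gain:.1f}x lower, CFO: {cfo_gain:.1f}x lower")
# prints "ToA: 16.0x lower, CFO: 5.0x lower"
```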
