
Mobile Traffic Classification Through

This document proposes using deep learning algorithms to classify mobile network traffic based on physical control channel fingerprints, without decoding or decrypting data flows. Two datasets were collected from an LTE network: a labeled one to train the classifiers and an unlabeled one used for traffic profiling. A convolutional neural network achieved 98% accuracy in classifying traffic into services and applications based solely on information from the downlink control channel. This allows fine-grained traffic profiling and classification without access to encrypted user data.

Uploaded by

snilloc
Copyright
© All Rights Reserved
Mobile Traffic Classification through Physical Control Channel Fingerprinting: a Deep Learning Approach
Hoang Duy Trinh∗ , Angel Fernandez Gambin† , Lorenza Giupponi∗ , Michele Rossi† and Paolo Dini∗

∗ CTTC/CERCA, Av. Carl Friedrich Gauss, 7, 08860, Castelldefels, Barcelona, Spain {hoangduy.trinh, lorenza.giupponi, paolo.dini}@cttc.es.
† DEI, University of Padova, Via G. Gradenigo, 6/B, 35131 Padova, Italy. {afgambin, rossi}@dei.unipd.it.
This work has received funding from the European Union Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie grant agreement No. 675891 (SCAVENGE), by the Spanish Government under the project TEC2017-88373-R (5G-REFINE) and has been supported, in part, by MIUR (Italian Ministry of Education, University and Research) through the initiative “Departments of Excellence” (Law 232/2016).

Abstract—The automatic classification of applications and services is an invaluable feature for new generation mobile networks. Here, we propose and validate algorithms to perform this task at runtime, from the raw physical control channel of an operative mobile network, without having to decode and/or decrypt the transmitted flows. Towards this, we decode Downlink Control Information (DCI) messages carried within the LTE Physical Downlink Control CHannel (PDCCH). DCI messages are sent by the radio cell in clear text and, in this paper, are utilized to classify the applications and services executed at the connected mobile terminals. Two datasets are collected through a large measurement campaign: one labeled, used to train the classification algorithms, and one unlabeled, collected from four radio cells in the metropolitan area of Barcelona, in Spain. Among the approaches considered, our Convolutional Neural Network (CNN) classifier provides the highest classification accuracy of 98%. The CNN classifier is then augmented with the capability of rejecting sessions whose patterns do not conform to those learned during the training phase, and is subsequently utilized to attain a fine-grained decomposition of the traffic for the four monitored radio cells, in an online and unsupervised fashion.

Index Terms—Traffic Classification, Traffic Modeling, Mobile Networks, LTE, 5G, Machine Learning, Neural Networks, Deep Learning, Data Analytics.

I. INTRODUCTION

WIRELESS mobile technology is advancing at a fast pace, through better display resolutions, larger memories, higher communication speeds, a higher number of connected devices, etc., and, with that, come more demanding requirements in terms of supported data rates [1], [2], new services, and a higher network responsiveness across diverse physical contexts [3]. As mobile systems become more complex, network operators attempt to transform their architecture through new functionalities and procedures, including security, reliability and enhanced service management. Traffic classification is necessary in this context to prioritize and/or protect certain flows, to prevent the injection of malicious data, and to allocate the network resources needed to serve the traffic generated by the end users.
A large body of work exists in the area of mobile traffic classification (see Section VI for an in-depth discussion of the related work). The key challenge of existing classification algorithms consists of the identification, and subsequent computation, of a number of representative features. These features are then used to train algorithms that classify the data flows at runtime. Most of the surveyed approaches leverage domain knowledge to manually obtain the feature set, i.e., the features are crafted by a skilled human expert. However, deep learning techniques have recently paved the way to automatic feature discovery and extraction, often leading to superior performance. For example, in [4] encrypted traffic is categorized through deep learning architectures, which are shown to outperform shallow neural network classifiers. The authors of [5] present a mobile traffic super-resolution technique to infer narrowly localized traffic consumption from coarse measurements: a deep-learning architecture combining Zipper Network (ZipNet) and Generative Adversarial neural Network (GAN) models is proposed to accurately reconstruct spatio-temporal traffic dynamics from measurements taken at low resolution. In [6], the identification of mobile apps is carried out by automatically extracting features from labeled packets through Convolutional Neural Networks (CNNs), which are trained using raw Hypertext Transfer Protocol (HTTP) requests, achieving a high classification accuracy. We stress that the work in these papers, like most of the other techniques discussed in Section VI, uses statistical features obtained from application or Internet Protocol (IP) level information, along with UDP/TCP port numbers, for both service and app identification.
The solution presented here sharply differs from previous approaches: it accurately classifies mobile traffic from radio-link data by solely processing the information coming from the control channel, without requiring any prior knowledge and without having to decode and/or decrypt the transmitted data flows. Specifically, we design and evaluate, via proof-of-concept implementations, non-intrusive tools for the online estimation of Long Term Evolution (LTE) cellular activity, i.e., the type of traffic that users exchange with their serving base station. As we quantify, our technology allows one to infer with high accuracy the service (e.g., audio-streaming, video-streaming, video-conferencing) and the application (e.g., Skype, Vimeo, YouTube, etc.) that are being used by the connected mobile users. This is accomplished by decoding LTE Physical Downlink Control CHannel (PDCCH) messages (i.e., radio-link level data), which are transmitted in clear text, without having to break any security protocol (encryption). To do this, we leverage OWL [7], a tool that allows decoding Downlink Control Information (DCI) messages carried in the LTE PDCCH, where control information is exchanged between the LTE Base Station (eNodeB) and the connected
User Equipments (UEs). From such messages, radio-link level settings for the user communication, e.g., modulation and coding scheme, transport block size, allocated resource blocks, etc., are obtained.
From DCI data, two datasets are created:
1) a labeled dataset, used to train different service and app classification algorithms, where labeling is made possible by injecting traffic generated by a mobile terminal under our control into the network;
2) an unlabeled dataset, used for traffic profiling purposes, which is populated by monitoring, for a full month, mobile traffic from four operative radio cell sites with different demographic characteristics within the metropolitan area of Barcelona, in Spain.
For the traffic analysis, the focus is put on a few selected services and applications that dominate the radio resource usage, but the approach can be readily extended to other scenarios. Raw DCI data is used directly as input to deep learning classifiers (automatic feature extraction), achieving accuracies as high as 98% for both mobile service and app identification tasks. Moreover, an original technique to use our best classifier in unsupervised settings is presented, to profile the mobile traffic from operative radio cell sites at runtime. The developed classification algorithms, as well as our experimental results, are highly novel within the traffic monitoring literature, which only provides hourly and aggregated measures for typical days [8] [9], and where traffic profiling is performed from UDP/TCP, IP or above-IP flows, e.g., [4]–[6].
We believe that the approach proposed in this work brings considerable value along several dimensions, as discussed next. Collecting and processing control channel information may reduce the storage requirements of the network elements, since the volume of DCI messages is much smaller than that of the user plane. Moreover, consumer privacy is maintained, as no deep packet inspection mechanism on user content is required. The proposed solution works directly at the network edge, so any action to attain a better and more secure management of the network (e.g., network slicing optimization) can be promptly taken. Classification time does not depend on the number of transmitted user data packets, but on the internal system clock (i.e., the frequency of DCI messages in our case). Our tool permits a better understanding of spectrum needs across time and space. Note that there are limited means to non-intrusively monitor user density and traffic demand in real time. These measurements are key for the correct dimensioning of future mobile systems, and for the investigation of new data communication and processing techniques for wireless channel interfaces. We only found a limited number of available datasets including mobile data traffic [10] [11], but they are often incomplete, poorly documented and contain aggregated data, often hourly averaged. The proposed technique allows the extraction of data flows with a granularity of one second, tagging the type of service and application used by each mobile user within an LTE cell.
In addition to the above, we believe that the proposed approach brings considerable added value to the research community in general. In fact, we propose a methodology for cellular network control data generation, which can be reproduced by anyone interested in obtaining data. This kind of raw data is rarely released by operators, especially with the high level of granularity that we have been able to retrieve. This approach allows any researcher without access to industrial data to carry out research relying on real cellular data, in a key area like Artificial Intelligence (AI)-enabled Beyond 5G networks, which is also aligned with the Next Generation Self-Organized Networks (NG-SON) vision promoted by, e.g., the 3GPP and the Next Generation Mobile Networks (NGMN) alliance of mobile network operators. The data on which we have worked is extracted from an LTE network, but the methodology can also be reproduced for New Radio (NR), by adapting the underlying software, as the PDCCH is also transmitted unencrypted in 5G NR [12] [13].
In summary, the original contributions of this work are:
• Mobile Data Labeling at the Edge of the Network: we collect and label LTE PDCCH DCI data traces from six mobile apps to create a unique correspondence between the software programs (the apps) and the session identifiers that were assigned to them by the eNodeB. The result is a labeled dataset of real DCI data from selected applications, i.e., YouTube, Vimeo, Spotify, Google Music, Skype and WhatsApp video calls.
• Classification and Benchmarks: we tailor deep artificial Neural Networks (NNs), namely Multi-Layer Perceptron (MLP), Recurrent Neural Networks (RNNs) and CNNs, to perform mobile service and app identification on the labeled dataset. Moreover, we compare their performance against a number of state-of-the-art benchmark classifiers.
• Mobile Data Collection at the Radio-Link Level from an Operative Mobile Network: we collect real LTE PDCCH DCI data traces, with a time granularity of 1 ms, from four eNodeBs located in the metropolitan area of Barcelona, in Spain. Each of these datasets has a duration of 1 month.
• Mobile Service Profiling from Unlabeled Data: the CNN classifier, which is found to be the best among all Neural Network (NN) schemes, is augmented with the capability of rejecting out-of-distribution sessions, i.e., sessions whose statistical behavior departs from that learned during the training phase. This makes it possible to use it with unlabeled traffic, as an online and unsupervised traffic monitoring tool. The augmented CNN classifier rejects those sessions for which it is uncertain, providing a robust classification outcome. Through its use, the four selected eNodeBs are monitored, obtaining a fine-grained traffic decomposition.
The paper is organized as follows. Section II presents the experimental framework and the proposed methodology to obtain the two datasets. Section III introduces the two classification problems, namely, service and app identification, and presents the classification algorithms. The performance of the classification algorithms is assessed in Section IV. In Section V, the CNN classifier is augmented with the capability of rejecting out-of-distribution sessions. Thus, the mobile
traffic from four selected cell sites of an operative mobile network in Spain is decomposed over a full day. The related work on mobile traffic classification is reviewed in Section VI, and some concluding remarks are provided in Section VII.

II. DATASET CREATION

Fig. 1 shows the different building blocks of the experimental framework that has been developed to populate the unlabeled and labeled datasets. Briefly, the data measurement and collection block acquires data from the LTE PDCCH channel to extract the relevant DCI information. Data preparation, instead, processes the gathered DCI data so that it can be used for training and classification purposes.
A. Data Collection System
In LTE, the eNodeB communicates scheduling information to the connected UEs through the DCI messages that are carried within the PDCCH with a time granularity of 1 ms. While the actual user content is sent over encrypted dedicated channels, i.e., the Physical Uplink/Downlink Shared Channel (PUSCH/PDSCH, respectively), the PDCCH is transmitted in clear text and can be decoded. To process DCI data, we have adapted the OWL monitoring tool [7]. A Software-Defined Radio (SDR) has been programmed to acquire the PDCCH via open-source software sitting on top of the srs-LTE library [14], which makes it possible to synchronize with and monitor the channel over a specified LTE bandwidth. The SDR is connected to a PC that performs the actual decoding of DCI data: in our experimental settings, we used a low-cost Nuand BladeRF x40 SDR and an Intel mini-NUC, equipped with an i5 2.7 GHz multi-core processor, a 256 GB Solid State Drive (SSD) and 18 GB of RAM.
Decoded DCI messages for a connected UE contain the following scheduling information [15]:
• Radio Network Temporary Identifier (RNTI),
• Resource Block (RB) assignment,
• Modulation and Coding Scheme (MCS).
DCI messages use RNTIs to specify their destination. RNTIs are 16-bit identifiers that are employed to address UEs in an LTE cell. They are used for different purposes, such as to broadcast system information (SI-RNTI), to page a specific UE (P-RNTI), to carry out a random access procedure (RA-RNTI), and to identify a connected user, i.e., the cell RNTI (C-RNTI). In this work, we are interested in the C-RNTI, which is temporarily assigned when the UE is in the RRC (Radio Resource Control) CONNECTED state, to uniquely identify it inside the cell. The C-RNTI can take any unreserved value in the range [0x003D–0xFFF3]. Once the C-RNTI is assigned to a connected UE, the DCI information directed to this terminal is sent using this C-RNTI, which is transmitted in clear text as part of the PDCCH channel. Hence, knowing the C-RNTI allows tracking a specific connected user within the radio cell. Assuming that the C-RNTI is known (see Section II-C), the following information about the ongoing communication for this UE can be extracted from its DCI data:
• Number of allotted resource blocks: in LTE, an RB represents the smallest resource unit that can be allocated to any user. The number of resource blocks assigned to a UE ($N_{RB}$) is derived from the DCI bitmap.
• Modulation order and code rate: the MCS is a 5-bit field that determines the modulation order and the code rate used, at the physical layer, for the transmission of data to the UE.
• Transport Block Size (TBS): the TBS specifies the length of the packet to be sent to the UE in the current Transmission Time Interval (TTI). It is derived from a lookup table using the MCS and $N_{RB}$, see [15].
The rationale behind this work is that the (unencrypted) downlink and uplink TBS data of a given UE should provide sufficient information for learning algorithms to reliably classify the app and the service that the user is running.
B. Unlabeled Dataset
Thanks to the DCI collection system just described, four cell sites of a Spanish mobile network operator in the metropolitan area of Barcelona have been monitored for a full month. The selected eNodeBs are located in areas having different demographic characteristics and land uses, so as to diversify the captured traffic in terms of service and app behavior. We have named the datasets according to the corresponding neighborhood: PobleSec (mainly residential area), Born (mixed residential, transport and leisure area), Castelldefels (mixed suburban and campus area), Camp Nou (mixed residential and stadium area). In total, we have collected more than 68 GB of DCI data from the LTE PDCCH. Fig. 2 shows the locations of the four monitored sites, along with that of the data collection system. After the data collection, the signaling associated with each active C-RNTI is extracted from the PDCCH DCI data stream and prepared for the classifier. During this step, we discarded short traces, which are mainly due to signaling, paging and background traffic. These accounted for less than 3% of the total traffic in the monitored radio cells.
C. Labeled Dataset
A labeled dataset is obtained by running specific services and apps at a mobile terminal under our control, detecting its C-RNTI within the PDCCH channel, and finally associating the corresponding DCI trace with a label, which links it to the service/app that is executed at the UE. The mobile device used to generate the labeled dataset is a Huawei Y6 phone running the Android operating system. YouTube, Vimeo, Spotify, Google Music, Skype and WhatsApp video calls have been utilized to generate traffic. For video calls, a background video was run to generate the video and the audio. Music and videos were selected at random from automatically generated thematic lists.
Generating data sessions is easy, and boils down to running a specific app from a device that we control and that is connected to the monitored eNodeB. The difficult part is to identify the generated data flow among those carried by the PDCCH channel, which contains DCI information for all the connected UEs within the radio cell. We made this labeling possible by injecting a watermark into the traffic generated by the controlled UE, so that it could be easily identified among all other users.
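The per-user trace extraction described in Sections II-A and II-B (keep only DCI entries addressed to a C-RNTI, sum downlink and uplink TBS per user at one-second granularity, and drop short traces) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' OWL-based pipeline: the `DciRecord` fields, the `per_user_traces` helper and the default `min_seconds` threshold are hypothetical, and only the C-RNTI range comes from the text.

```python
from collections import defaultdict
from dataclasses import dataclass

# C-RNTI range quoted in Section II-A.
C_RNTI_MIN, C_RNTI_MAX = 0x003D, 0xFFF3


@dataclass
class DciRecord:
    """Hypothetical decoded DCI entry (field names are illustrative)."""
    t_ms: int        # subframe timestamp in milliseconds (one TTI = 1 ms)
    rnti: int        # 16-bit destination identifier
    downlink: bool   # True for a downlink grant, False for an uplink grant
    tbs_bits: int    # transport block size granted in this TTI, in bits


def per_user_traces(records, min_seconds=10):
    """Group DCI records by C-RNTI and sum TBS per second and direction.

    Returns {rnti: [(dl_bits, ul_bits), ...]} with one tuple per second,
    from the first to the last second in which the RNTI appears.
    Traces shorter than `min_seconds` (an assumed threshold) are dropped.
    """
    buckets = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for r in records:
        if not C_RNTI_MIN <= r.rnti <= C_RNTI_MAX:
            continue  # not a C-RNTI (e.g., paging or system information)
        second = r.t_ms // 1000
        buckets[r.rnti][second][0 if r.downlink else 1] += r.tbs_bits

    traces = {}
    for rnti, per_sec in buckets.items():
        first, last = min(per_sec), max(per_sec)
        if last - first + 1 < min_seconds:
            continue  # short trace: mostly signaling or background traffic
        # Seconds without any grant are filled with zeros.
        traces[rnti] = [tuple(per_sec.get(s, (0, 0))) for s in range(first, last + 1)]
    return traces


records = [
    DciRecord(t_ms=0, rnti=0x1234, downlink=True, tbs_bits=1000),
    DciRecord(t_ms=500, rnti=0x1234, downlink=False, tbs_bits=200),
    DciRecord(t_ms=2500, rnti=0x1234, downlink=True, tbs_bits=300),
    DciRecord(t_ms=10, rnti=0xFFFE, downlink=True, tbs_bits=50),  # outside C-RNTI range
]
print(per_user_traces(records, min_seconds=3))
# -> {4660: [(1000, 200), (0, 0), (300, 0)]}
```

Summing the per-TTI TBS values within each second is one simple way to obtain the one-second flow granularity mentioned in Section I; finer or coarser bins only require changing the `// 1000` divisor.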

Fig. 1: Experimental framework adopted for the creation of the unlabeled and labeled datasets.

(a) Castelldefels: suburban area with a university campus. (b) Camp Nou: mainly residential area with Barcelona FC stadium. (c) Born: mixed residential, transport and leisure area. (d) PobleSec: mainly residential area.

Fig. 2: Maps of Barcelona metropolitan areas where the measurement campaign took place for the creation of unlabeled and labeled datasets. In the maps,
the eNodeB location is denoted by A, whereas the data collection system and the mobile terminal are marked as B. In Castelldefels, the mobile terminal has
been placed in two different locations (B1 and B2 ).

1) Data preparation and watermarking
The data preparation procedure is divided into two phases: 1) the identification of the C-RNTI corresponding to the controlled UE, and 2) the extraction and labeling of the corresponding DCI trace. In the LTE PDCCH channel, each UE is identified by its C-RNTI, which uniquely identifies the mobile terminal within the radio cell. This identifier is temporary, i.e., it changes after short inactivity periods. This is done to prevent the plain tracking of mobile users, since the PDCCH is sent unencrypted. To allow traffic labeling (i.e., user identification), we introduce a watermark into the traffic generated by our mobile terminal. This watermark amounts to producing, for each application, a regular pattern: any instance of a given application (e.g., YouTube) is run for a pre-defined amount of time (80 seconds in our measurements); then, a pause interval of fixed duration is inserted before running another instance of the app for a further 80 seconds. We loop this over time, obtaining a duty-cycled activity pattern that is easily distinguishable from all the other activity traces within the radio cell. Through this watermarking procedure, we can successfully associate our UE with the corresponding C-RNTI from the DCI. Also, thanks to the duty-cycled pattern, we split the traces into different sessions (the different instances of the same app, each running for 80 seconds), where subsequent sessions are separated by the pause interval (of fixed duration). The label, corresponding to the application that is executed at the mobile terminal, is finally associated with the extracted DCI data. We remark that this procedure makes it possible to capture several instances of a given app in a row, which are stored in our dataset with the associated app/service label, and that it does not modify the normal network behavior in serving the running app. Also, we found instances of 80 seconds (observation time) to be more than sufficient to classify the service/app, as longer instances would lead to negligible improvements.
In our measurement campaign, we have recorded and labeled $M = 11{,}601$ mobile sessions, gathering the scheduling information contained in the DCI messages for selected apps. We considered three data-intensive services: video streaming, audio streaming and real-time video calling, which represent classes producing a considerable amount of traffic and taking most of the network resources [1]. For each service type, we chose two popular applications: Spotify and Google Music for audio streaming, YouTube and Vimeo for video streaming, while for video calling we picked two instant-messaging applications, namely, Skype and WhatsApp Messenger.
A large measurement campaign was conducted to expose the mobile terminal running the selected apps to different radio link conditions, thus obtaining a comprehensive dataset. In particular, the UE was placed at two different locations (termed B1 and B2 in Fig. 2a) within the Castelldefels radio cell to experience different received signal qualities (−84 dBm and −94 dBm for B1 and B2, respectively), and within the Camp Nou cell during football matches, to capture data in high cell load conditions.
Fig. 3 shows a few radio resource usage patterns collected for the selected apps. Some similarities can be recognized within the same service class: for example, audio and video streaming present similar behaviors. Also, significant differences can be observed between the radio resource usage of real-time video calls (Skype and WhatsApp Video) and that of the other apps. Video and audio streaming applications use up a high amount of radio resources at the beginning of the sessions, buffering most of the content into the terminal memory. Real-time video calling, instead, entails a continuous transmission and a more constant usage of radio resources throughout the sessions. Note that the amount of data exchanged in the uplink direction is significant only for this service class, since a video call requires bidirectional communication.
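The session-splitting step of Section II-C1, which relies on the fixed pause inserted between 80-second app instances, could look like the following sketch. It assumes that the trace of the watermarked C-RNTI has already been reduced to one activity value per second and that a pause is any run of at least `min_gap` zero-activity seconds; `split_sessions` and `min_gap` are hypothetical names and values, not taken from the paper.

```python
def split_sessions(rate, min_gap=5, session_len=80):
    """Split a per-second activity trace into duty-cycled sessions.

    `rate` holds the bits scheduled to the tracked C-RNTI in each second.
    A run of at least `min_gap` idle (zero) seconds is taken as the
    watermark pause (an assumed criterion); each active stretch between
    pauses is truncated to `session_len` seconds (80 s in the campaign).
    """
    sessions, start, idle = [], None, 0
    for t, value in enumerate(rate):
        if value > 0:
            if start is None:
                start = t  # first active second of a new session
            idle = 0
        elif start is not None:
            idle += 1
            if idle >= min_gap:
                end = t - idle + 1  # index of the first idle second
                sessions.append(rate[start:end][:session_len])
                start, idle = None, 0
    if start is not None:  # trace ends while a session is still open
        sessions.append(rate[start:len(rate) - idle][:session_len])
    return sessions


trace = [0, 0, 5, 5, 0, 5, 0, 0, 0, 0, 0, 7, 7, 0, 0, 0, 0, 0]
print(split_sessions(trace, min_gap=5))
# -> [[5, 5, 0, 5], [7, 7]]
```

Idle detection alone is fragile here: streaming apps can stay quiet for a while after their initial buffering burst (Fig. 3), so a more robust variant would also exploit the known period of the duty cycle (80 s of activity plus the fixed pause) when placing session boundaries.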
Fig. 3: Traffic pattern snapshots showing the normalized data rate for different applications as a function of time. (Top: normalized rate vs. time [s] for vimeo, youtube, spotify, skype, whatsappvideo and googlemusic. Bottom panels: video-streaming, audio-streaming and video-calls, downlink and uplink, normalized rate vs. time [s].)

Fig. 4: Sliding window of 20 s length and 15 s stride, applied to a sample video-streaming session. (Plot: normalized downlink and uplink rate vs. time [s], with window length W and stride S marked.)
Fig. 5: Pearson correlation between the initial and the following sessions running in the controlled UE. (Plot: correlation vs. session index n for video-streaming, audio-streaming and video-calls.)

D. Synchronous and Asynchronous Sessions
Through the watermarking approach and the splitting procedure, we obtained a labeled dataset where each session, depending on the service, presents patterns similar to those shown in Fig. 3. Assuming that the beginning and the end of each session are known is rather optimistic, as in a real measurement setup we have no means to accurately track these instants. Put another way, it is unlikely that the LTE PDCCH measurements and the application run on the UE will be temporally synchronized. Synchronizing the measurement with the beginning of each session would facilitate the classification task, since most of the generated traffic is buffered on the terminal at the beginning (see Fig. 3), and this behavior is a distinctive feature that is easy to discriminate.
To ensure the applicability of our classifiers to real-world (asynchronous) cases, we account for asynchronous sessions, meaning that the classification algorithm has no knowledge of the instants at which the sessions start and finish. Specifically, each session is split using a sliding window of length W seconds, moved rightwards from the beginning of the session with a stride of S seconds, see Fig. 4. The split sessions (asynchronous sessions), of W seconds each, represent the input data to our classification algorithms. Note that W and S are hyper-parameters of the proposed classification frameworks.
E. Sessions Correlation over Time
As a sanity check, we verify the soundness of the watermarking strategy: our aim is to understand whether the transmission of user data in the form of duty-cycled patterns may affect the way in which the eNodeB handles the communication from our terminal, e.g., through some advanced channel reservation mechanism. In that case, our watermarking strategy would be of little use, as it would introduce scheduling artifacts that do not occur in real-life conditions. To verify this, we evaluated the Pearson correlation between the initial session (i.e., when the UE connects to the LTE PDCCH for the first time and is assigned a new C-RNTI) and the following ones. Fig. 5 shows that, for each of the three services, the correlation is high only when we compare the first session with itself (n = 0). Instead, low values are observed between the first session and the following ones (n ≥ 1), indicating that the behavior of the eNodeB scheduler is not affected by the repetitive actions (i.e., the duty-cycled activity) performed at the UE side.

III. CLASSIFICATION PROBLEM

A. Problem Definition
Let $M = 11{,}601$ be the total number of sessions obtained through the data preparation procedure of Section II-C, $L = 80$ seconds the duration of each session, and $D = 2$ the number of communication directions (downlink and uplink). We define $X$ as the input dataset tensor with size $M \times L \times D$, where the $m$-th row vector $x_m$ contains the trace associated with $W \in [0, L]$ TBS samples per session for both downlink and uplink directions. The time series described by $x_m$ is the input sequence of our algorithms.
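The asynchronous windowing of Section II-D, applied to the $M \times L \times D$ tensor just defined, can be sketched with NumPy. The sketch assumes one TBS sample per second (so that $L = 80$ samples per session) and uses the W = 20 s, S = 15 s values of Fig. 4 as defaults; `async_windows` is a hypothetical helper, and letting each window inherit the label of its session is an assumption consistent with, but not stated by, the text.

```python
import numpy as np


def async_windows(X, y, W=20, S=15):
    """Split each session into overlapping windows ("asynchronous sessions").

    X: array of shape (M, L, D) with M sessions, L per-second TBS samples
       and D = 2 directions (downlink, uplink).
    y: array of shape (M,) with one class label per session.
    Returns (windows, labels) with shapes (M * n_win, W, D) and (M * n_win,),
    where n_win = (L - W) // S + 1.
    """
    M, L, D = X.shape
    if not 0 < W <= L:
        raise ValueError("window length W must satisfy 0 < W <= L")
    starts = range(0, L - W + 1, S)  # window start offsets, in samples
    # Stack the windows along a new axis: shape (M, n_win, W, D).
    stacked = np.stack([X[:, s:s + W, :] for s in starts], axis=1)
    n_win = len(starts)
    windows = stacked.reshape(M * n_win, W, D)
    labels = np.repeat(y, n_win)  # window j of session m keeps label y[m]
    return windows, labels


# Example with random traffic for 3 sessions of 80 s each.
rng = np.random.default_rng(0)
X = rng.random((3, 80, 2))
y = np.array([0, 1, 2])
Xw, yw = async_windows(X, y)
print(Xw.shape, yw.tolist())
# -> (15, 20, 2) [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2]
```

With L = 80, W = 20 and S = 15, every session yields (80 − 20) // 15 + 1 = 5 windows, starting at 0, 15, 30, 45 and 60 s; a classifier trained on these windows never sees an aligned session start, which is the point of the asynchronous setting.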
A classifier estimates a function $c: X \to Y$, where the output matrix $Y$ has size $M \times K$, with $K$ representing the number of classes. The row vector $y_m = c(x_m) = [y_{m1}, \ldots, y_{mK}]$ contains the probabilities that session $m$ belongs to each of the $K$ classes, with $\sum_k y_{mk} = 1$. The final output of the classifier is class $k^\star$, where $k^\star = \arg\max_k (y_{mk})$. The following classification objectives are addressed:
O1) Service identification: to classify the collected sessions into $K = 3$ classes, namely, audio streaming, video streaming and video calls;
O2) App identification: to identify which app is run at the UE. In this case, the number of output classes is $K = 6$, namely, Spotify, Google Music, YouTube, Vimeo, Skype and WhatsApp Messenger.
Next, we present the considered classification algorithms, grouping them into two categories: those based on artificial neural networks and those based on standard machine learning techniques (referred to here as benchmark classifiers).
B. Deep Neural Networks
Next, we describe how we tailored three neural network architectures to solve the above traffic classification problem, namely, Multilayer Perceptron (MLP), Recurrent Neural Networks (RNNs) and Convolutional Neural Networks (CNNs).
1) Multilayer Perceptron
A multilayer perceptron is a feedforward, fully-connected neural network architecture. The term “feed-forward” refers to the fact that information flows in one direction, from the input to the output layer. An MLP is composed of at least three layers of nodes: an input, a hidden and an output layer. A directed graph connects the input with the output layer, and each neuron in the graph uses a non-linear activation function to produce its output. Links are weighted, and the backpropagation algorithm is utilized to train the network in a supervised fashion, i.e., to find the set of network weights that minimize a certain error function, given an input set of examples and the corresponding labels. For further details, see [16].
The MLP that we use for mobile traffic classification has three fully connected layers. The first layer $MLP_1$ contains $N_{MLP_1} = 128$ neurons, the second layer $MLP_2$ has $N_{MLP_2} = 64$ neurons, and the third layer $MLP_3$ is fully connected, with $K$ neurons and a softmax activation function to produce the final output. The output of $MLP_3$ is the class probability vector $y_m$.
All neurons in layers $MLP_1$ and $MLP_2$ use a leaky version of the Rectified Linear Unit (ReLU) activation function (leaky ReLU). Leaky ReLUs help solve the vanishing gradient problem, i.e., the fact that the error gradients that are backpropagated during the training of the network weights may become very small (zero in the worst case), preventing

Fig. 6: RNN architecture.

gradient descent algorithm [18], by minimizing the categorical cross-entropy loss function $L(w)$, defined as [16]

$$L(w) = -\sum_{x_m \in B} \sum_{k=1}^{K} t_k(x_m) \log\big(y_{mk}(w, x_m)\big), \tag{1}$$

where $t(x_m) = [t_1(x_m), \ldots, t_K(x_m)]$ contains the class labels associated with the input trace $x_m$, i.e., $t_k = 1$ if $x_m$ belongs to class $k$ and $t_k = 0$ otherwise (1-of-$K$ coding scheme). Vector $w$ contains the MLP weights and $y_{mk}(w, x_m)$ is the MLP output obtained for input $x_m$. Eq. (1) is iteratively minimized using the training examples in the batch set $B \subset X$, where $B$ is changed at every iteration so as to span the entire input set $X$.
2) Recurrent Neural Networks
Recurrent Neural Networks (RNNs) have been conceived to extract features from temporal (and correlated) data sequences. Long Short-Term Memory (LSTM) networks are a particular type of RNN, introduced in [19]. They are capable of tracking long-term dependencies in the input time series, while solving the vanishing-gradient problem that affects standard RNNs [20].
The capability of learning long-term dependencies is due to the structure of the LSTM cells, which incorporate gates that regulate the learning process. The neurons in the hidden layers of an LSTM are Memory Cells (MCs). An MC has the ability to retain or forget information about past input values (whose effect is stored in the cell states) by using structures called gates, which consist of a cascade of a neuron with sigmoidal activation function and a pointwise multiplication block. Thanks to this architecture, the output of each memory cell possibly depends on the entire sequence of past states, making LSTMs suitable for processing time series with long time dependencies [19]. The input gate of a memory cell is a neuron with sigmoidal activation function ($\sigma$). Its output determines the fraction of the MC input that is fed to the cell state block. Similarly, the forget gate processes the information that is recurrently fed back into the cell state block. The output gate, instead, determines the fraction of the cell state output that is to be used as the output of the MC, at each time step. Gate neurons usually have sigmoidal activation functions ($\sigma$), while the hyperbolic tangent (tanh) activation function is
the correct (gradient based) adaptation of the weights. To usually adopted to process the input and for the cell state. All
prevent this from happening, leaky ReLUs have a small the internal connections of the MC have unitary weight [19].
negative slope for negative values of their argument [17]. To The proposed RNN based traffic classification architecture is
train the presented MLP architecture, we use the RMSprop shown in Fig. 6. In our design, we consider three stacked layers
combining two LSTM layers and a final fully connected output layer. The first and the second layer (respectively $\mathrm{RNN}_1$ and $\mathrm{RNN}_2$) have $N_{\mathrm{RNN}_1} = N_{\mathrm{RNN}_2} = 180$ memory cells. The fully connected layer $\mathrm{RNN}_3$ uses the softmax activation function, and its output consists of the class probability estimates, as described in Section III-B1.

3) Convolutional Neural Networks

Convolutional Neural Networks (CNNs) are feed-forward deep neural networks that differ from fully connected MLPs in the presence of one or more convolutional layers. Each convolutional layer uses a number of kernels. Each kernel is composed of a number of weights and is convolved across the entire input signal. The kernel acts as a filter whose weights are re-used (shared weights) across the entire input. This makes the network connectivity structure sparse, i.e., a small set of parameters (the kernel weights) suffices to map the input into the output. The result is a considerably lower computational complexity than that of fully connected feed-forward neural networks, and a smaller memory footprint. For more details the reader is referred to [21].

CNNs have proven to be excellent feature extractors for images and inertial signals [22], and here we show their effectiveness for the classification of mobile traffic data. The CNN architecture that we designed for this purpose is shown in Fig. 7. It is a one-dimensional CNN with two main parts: the first four layers perform convolutions and max pooling in cascade, and the last two layers are fully connected.

The first convolutional layer, $\mathrm{CNN}_1$, uses one-dimensional kernels (1 × 5 samples). It performs a first filtering of the input and processes each input vector (rows of $X$) separately. Its activation functions are linear and the number of convolutional kernels is $N_{\mathrm{CNN}_1} = 32$. The second convolutional layer, $\mathrm{CNN}_2$, uses one-dimensional kernels (1 × 5 samples) with non-linear hyperbolic tangent activation functions, and the number of convolutional kernels is $N_{\mathrm{CNN}_2} = 64$. Max pooling is separately applied to the outputs of $\mathrm{CNN}_1$ and $\mathrm{CNN}_2$ to reduce their dimensionality and increase the spatial invariance of features [22]. In both cases, a one-dimensional pooling with a kernel of size 1 × 3 is performed.

A third, fully connected layer, $\mathrm{CNN}_3$, performs dimensionality reduction and has $N_{\mathrm{CNN}_3} = 32$ neurons with leaky ReLU activation functions. This layer is used in place of a further convolutional layer to reduce the computation time, with a negligible loss in accuracy. The last (output) layer, $\mathrm{CNN}_4$, is fully connected with softmax activation functions and returns the class probability estimates, see Section III-B1.

Fig. 7: CNN architecture.

C. Benchmark classifiers

Other standard classification schemes have been tailored to the considered tasks O1 and O2. The selected algorithms are:
- Linear Logistic Regression and Linear SVM, as examples of linear classifiers;
- K-Nearest Neighbours, as a non-parametric (instance-based) method;
- Random Forest, as an ensemble learning method;
- Gaussian Processes, as an instance of Bayesian approaches.

The implementations of Linear Logistic Regression, K-Nearest Neighbours and Linear SVM are based on [23], [24] and [25], respectively. The Random Forest implementation is based on [26], whereas for the classifier based on Gaussian Processes we refer to [27]. Configuration parameters and implementation details for the benchmark classifiers are provided in Table I.

TABLE I: Configuration parameters for the benchmark classifiers.

| Algorithm | Parameters | Notes - Reference |
|---|---|---|
| Linear SVM | penalty = L2; loss = hinge loss; c = 0.025 | c: penalty parameter for the error term; extended to multi-class with one-vs-rest [23] |
| Logistic Regressor | penalty = L2; c = 1 | c: inverse of the regularization strength; extended to multi-class with one-vs-rest [23] |
| Nearest Neighbours | K = 3; p = 2; metric = Minkowski | K: number of neighbors for queries; p: distance metric parameter; p = 2 amounts to using the Euclidean distance [24] |
| Random Forest | n. estimators = 10; max depth = 5; criterion = entropy | n. estimators: number of trees in the forest; max depth: maximum depth of a tree; criterion: function to measure the quality of a split of subsets [26] |
| Gaussian Processes | kernel = RBF; σ = logistic function; approx. = Laplacian | Radial Basis Function (RBF) used as kernel; σ is the sigmoid function used to “squash” the nuisance function; Laplacian method used to approximate the non-Gaussian posterior [27] |

IV. SUPERVISED TRAINING AND COMPARISON OF TRAFFIC CLASSIFICATION ALGORITHMS

The performance tests have been carried out using an Intel Core i7 machine, with 32 GB of RAM and an NVIDIA GTX 980 GPU card. We divided the dataset, featuring 11601 labeled DCI sessions, into training and validation sets with a split ratio of 70%-30%. These sets are balanced, as they contain the same percentage of traces for all classes. The classification algorithms have been implemented in Python. We used the Keras library on top of the TensorFlow backend for the implementation of deep NNs. For the benchmark classifiers, we used the popular sklearn library.

A. Performance Metrics

The classification performance is assessed through the following metrics:
1) Accuracy: defined as the ratio of the number of correctly classified sessions to the total number of sessions.
2) Precision $P$: defined, for each class, as the ratio between
the true positives $T_p$ and the sum of true positives and false positives $F_p$,

$$P = \frac{T_p}{T_p + F_p}, \qquad (2)$$

3) Recall $R$: defined, for each class, as the ratio between the true positives $T_p$ and the sum of true positives and false negatives $F_n$,

$$R = \frac{T_p}{T_p + F_n}. \qquad (3)$$

4) F-Score $F$: defined as the harmonic mean of precision $P$ and recall $R$,

$$F = \left(\frac{P^{-1} + R^{-1}}{2}\right)^{-1} = 2\,\frac{RP}{R + P}. \qquad (4)$$

Note that these definitions of precision and recall only apply to binary (two-class) classification tasks. However, tasks O1 and O2 both have $K > 2$ classes, namely, $K = 6$ and $K = 3$ for app and service identification, respectively. Thus, precision and recall are calculated separately for each of the $K$ classes, and their average is shown in the following numerical results.

B. Comparison of Classification Algorithms

1) Accuracy and Algorithm Training

Tables II and III summarize the performance metrics obtained by the deep NNs and the benchmark classifiers for app and service identification, respectively. Results refer to an observation window $W = 80$. In both the synchronous and asynchronous cases, deep NNs achieve higher accuracy than the benchmarks (up to +13.8% on app identification, +8.7% on service classification). The algorithm based on Gaussian Processes performs best among the benchmark classifiers. In general, the higher the complexity (i.e., the number of parameters, and also the number of hidden layers for NNs), the higher the performance. The only exception is CNNs, which present the highest accuracy but use a small number of parameters. This confirms the high efficiency of convolution operations in processing large amounts of data with a complex temporal structure, and the effectiveness of CNN parameter sharing. CNNs require only 6% of the variables used by RNNs, while achieving a better accuracy. This also translates into faster training. In Fig. 8, we show the accuracy as a function of the number of epochs for training and validation for RNNs and CNNs, using an observation window of $W = 80$ TBS samples. CNNs need fewer than 20 epochs to reach an accuracy higher than 90% (Fig. 8b), whereas RNNs converge only after 30 epochs (Fig. 8a).

| Algorithm | Precision | Recall | F-Score | # Parameters | Accuracy Sync (%) | Accuracy Async (%) | Difference (%) |
|---|---|---|---|---|---|---|---|
| Linear SVM | 0.811 | 0.812 | 0.805 | 36726 | 81.23 | 68.41 | -12.8 |
| Logistic Regressor | 0.806 | 0.816 | 0.809 | 486 | 81.61 | 65.72 | -15.9 |
| Nearest Neighbours | 0.843 | 0.845 | 0.841 | 36720 | 84.51 | 79.65 | -4.9 |
| Random Forest | 0.821 | 0.835 | 0.827 | 41310 | 83.52 | 70.21 | -13.3 |
| Gaussian Processes | 0.874 | 0.871 | 0.871 | 146720 | 87.43 | 81.21 | -6.2 |
| Neural Networks | | | | | | | |
| MLP | 0.900 | 0.900 | 0.900 | 19014 | 90.04 | 84.61 | -5.4 |
| RNN | 0.967 | 0.968 | 0.968 | 392046 | 96.57 | 92.93 | -3.6 |
| CNN | 0.978 | 0.976 | 0.977 | 25062 | 97.77 | 93.20 | -4.5 |

TABLE II: Classifiers comparison for the app identification task.

| Algorithm | Precision | Recall | F-Score | # Parameters | Accuracy Sync (%) | Accuracy Async (%) | Difference (%) |
|---|---|---|---|---|---|---|---|
| Linear SVM | 0.908 | 0.908 | 0.907 | 19843 | 90.80 | 79.61 | -11.2 |
| Logistic Regressor | 0.904 | 0.904 | 0.904 | 243 | 90.42 | 81.11 | -9.3 |
| Nearest Neighbours | 0.925 | 0.925 | 0.925 | 19840 | 92.76 | 83.45 | -9.3 |
| Random Forest | 0.915 | 0.915 | 0.915 | 22320 | 91.57 | 84.25 | -7.3 |
| Gaussian Processes | 0.932 | 0.928 | 0.929 | 73360 | 93.21 | 82.51 | -10.7 |
| Neural Networks | | | | | | | |
| MLP | 0.943 | 0.939 | 0.942 | 18819 | 94.31 | 93.38 | -0.9 |
| RNN | 0.981 | 0.982 | 0.981 | 391503 | 98.21 | 95.38 | -2.8 |
| CNN | 0.986 | 0.988 | 0.988 | 24963 | 98.87 | 95.40 | -3.5 |

TABLE III: Classifiers comparison for the service identification task.

From Tables II and III, we observe a significant performance gap between the service and app classification tasks. The difference in accuracy is higher than 8% for the benchmark classifiers and ranges from 2 to 6% for deep NNs. This is mainly due to the higher number of classes in the app identification task, which increases the misclassification probability, as discussed below in the analysis of the confusion matrices of Fig. 9.

Results assessing the effect of asynchronous readings are shown in Tables II and III and in Fig. 10. As shown in the last two columns of Tables II and III, training the algorithms with asynchronous sessions decreases their classification accuracy. We observe a general decrease for all the algorithms (−6.0% for service identification and −7.7% for app identification, on average). This occurs because the beginning of a session holds key information on the session type, which simplifies the classification task (as shown in Fig. 3). However, for both classification problems, neural network-based approaches are more robust to asynchronous readings. Their performance degrades by −4.3%, versus −8.4% for the benchmark algorithms; see also Fig. 10.

2) Confusion Matrix

A deeper look at the performance of CNNs is provided by the confusion matrices of Fig. 9, whose rows and columns
respectively represent true and predicted labels; all values are normalized between 0 and 1. For the service classification task (Fig. 9a), CNNs only misclassify the video streaming sessions: 2% of those are labeled as video calls. For the app identification task (Fig. 9b), errors (4%) mainly occur for Skype and WhatsApp video calls. These errors are understandable: both are interactive real-time video applications and, as such, their traffic patterns bear similarities. The lowest performance is found for Vimeo traces, for which 88% of the sessions are correctly classified. Here our CNN-based classifier confuses them with the other video applications, for both the streaming service (YouTube, 3%) and real-time calling (WhatsApp and Skype, 6% and 3%, respectively).

3) Impact of Different Window Sizes

Fig. 11 shows the classification accuracy as a function of the window size, $W$. Results are shown after 40 training epochs. For the app identification task, 40 seconds suffice for CNNs and RNNs to reach accuracies higher than 90%, with negligible additional improvements for longer observation periods. Periods shorter than 40 seconds provide less accurate results. Similar trends are observed for the service classification task. In this case, however, the accuracy of CNNs and RNNs already exceeds 90% after 20 seconds, due to the smaller number of classes. In summary, the ability of CNNs and RNNs to extract representative statistical features from a session grows with the input data length. In our tasks, deep NN algorithms become very effective once monitoring intervals exceed 20 seconds.

Unlike $W$, parameter $S$ does not appreciably affect the performance of the algorithm, as we have verified experimentally.

Fig. 8: Accuracy vs number of epochs for training and validation sets for the app identification task in the asynchronous case: (a) RNN, (b) CNN.

V. UNSUPERVISED TRAFFIC PROFILING

Next, we analyze the mobile traffic exchanged within the four selected cell sites (see Section II-B). The traffic load is modeled in terms of aggregated traffic dynamics and type of service requests over the 24 hours of a day. Mobile traffic is identified for each of the considered services by applying the trained classifiers from Section III to the unlabeled dataset. Formally, for each eNodeB, the corresponding unlabeled dataset is stored in the tensor $X'$, of size $M' \times N \times D$. Here $M'$ is the number of monitored RNTIs (sessions), $N$ is the number of collected samples per session, and $D = 2$ is the number of communication directions (downlink and uplink). Given $X'$ as input, the classifier $c$ computes the output $Y'$. Its analysis provides a detailed characterization of the mobile user requests for the eNodeB within the monitored time span. Vectors $\mathbf{x}'_m$ and $\mathbf{y}'_m = c(\mathbf{x}'_m)$ respectively indicate the $m$-th sample of $X'$ and $Y'$. In this paper, we restrict our attention to the unsupervised classification of services, and use the CNN classifier, as it yields the highest accuracy.

A. Aggregated Traffic Analysis

Fig. 12 shows the aggregated traffic demand for the four selected eNodeBs over the 24 hours of a typical day. Each curve has been normalized with respect to the maximum traffic peak that occurred during the day for the corresponding eNodeB. The four traffic profiles have different trends, which depend on the characteristics of the served area (demographics, predominant land use, etc.), as confirmed by [28].

Poble Sec is a residential neighborhood and, as such, presents traffic peaks during the evening, at 5 and 11 pm. Born, instead, is a downtown district with mixed residential, transport and leisure land use. Two peaks are detected: the highest is at lunchtime around 2 pm, and the second one is at dinner time, from 9 pm. This traffic behavior is likely due to the many restaurants and bars in the area.

Camp Nou is mainly residential and presents a similar profile to Poble Sec. However, the FC Barcelona stadium is located in this area, and three football matches took place during the monitored period (events started at 8:45 pm and ended at 10:45 pm). As expected, a higher traffic intensity is observed during the football match hours. In particular, we registered a high amount of traffic exchanged from 7 pm to 1 am, i.e., before, during and after the events. This behavior is probably due to the movement of people attending the matches.

Castelldefels is a suburban and sparsely populated area with a university campus. Its traffic variation suggests a typical office profile, with traffic peaks at 10 am and 5 pm. However, this radio cell exchanges the lowest amount of traffic across all eNodeB sites, i.e., 6.8 Gb/hour in the peak hours. The highest traffic intensity was measured in Camp Nou, reaching a peak of 106.1 Gb/hour (29.5 Mb/s on average). Intermediate peak values are detected in Poble Sec and Born, amounting to 49.7 Gb/hour and 46.1 Gb/hour, respectively. The only common pattern among the four areas is the low traffic intensity at night, between 2 am and 7 am.
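The per-eNodeB normalization used in Fig. 12 can be reproduced in a few lines of Python. So can the conversion between the hourly volume and the average rate quoted for Camp Nou (106.1 Gb/hour, about 29.5 Mb/s). This is only a sketch. The function names and the toy profile are hypothetical. Decimal prefixes (1 Gb = 1000 Mb) are an assumption, chosen because they reproduce the quoted pair of values.

```python
def gb_per_hour_to_mb_per_s(volume_gb_per_hour):
    """Average rate (Mb/s) equivalent to a volume exchanged over one hour.

    Assumption: decimal prefixes (1 Gb = 1000 Mb) and 1 hour = 3600 s.
    """
    return volume_gb_per_hour * 1000.0 / 3600.0


def normalize_daily_profile(hourly_volumes):
    """Divide each hourly value by the daily peak of the same eNodeB (peak -> 1.0)."""
    peak = max(hourly_volumes)
    if peak <= 0:
        raise ValueError("the profile must contain at least one positive value")
    return [v / peak for v in hourly_volumes]


if __name__ == "__main__":
    # Camp Nou peak reported in the text: 106.1 Gb/hour.
    print(f"{gb_per_hour_to_mb_per_s(106.1):.1f} Mb/s")  # 29.5 Mb/s
    # Hypothetical 4-sample profile, only to illustrate the normalization.
    print(normalize_daily_profile([2.0, 4.0, 8.0, 6.0]))  # [0.25, 0.5, 1.0, 0.75]
```

Normalizing each curve by its own peak makes the profiles comparable in shape. It hides their absolute volumes, which is why the text reports the peak values separately.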
Fig. 9: Confusion matrices for the CNN algorithm. Rows are true labels and columns are predicted labels, in the same class order as the rows. Each row is normalized to sum to 1.
(a) Confusion matrix for service identification: video_streaming [0.98, 0.02, 0.00]; video_call [0.00, 1.00, 0.00]; audio [0.00, 0.00, 1.00].
(b) Confusion matrix for app identification: googlemusic [1.00, 0.00, 0.00, 0.00, 0.00, 0.00]; skype [0.00, 0.96, 0.00, 0.00, 0.04, 0.00]; spotify [0.00, 0.00, 1.00, 0.00, 0.00, 0.00]; vimeo [0.00, 0.03, 0.00, 0.88, 0.06, 0.03]; whatsappvideo [0.00, 0.04, 0.00, 0.00, 0.96, 0.00]; youtube [0.00, 0.00, 0.00, 0.00, 0.00, 1.00].
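The confusion matrices of Fig. 9 are row-normalized counts. Section IV-A computes precision and recall (Eqs. (2)-(4)) per class and then averages them over the $K$ classes. The NumPy sketch below does both, under stated assumptions. Class labels are assumed to be integers 0, ..., K−1. The averaged F-score is assumed to be the mean of the per-class F-scores (macro averaging); the text does not say whether it is computed this way or as the harmonic mean of the averaged $P$ and $R$. The function names are illustrative.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, num_classes):
    """Count matrix: rows are true labels, columns are predicted labels."""
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)
    return cm


def row_normalize(cm):
    """Divide each row by its total, so that every row sums to 1 (as in Fig. 9)."""
    totals = cm.sum(axis=1, keepdims=True)
    return np.divide(cm, totals, out=np.zeros(cm.shape), where=totals > 0)


def safe_ratio(num, den):
    """Element-wise num/den, returning 0 where den is 0."""
    return np.divide(num, den, out=np.zeros(num.shape), where=den > 0)


def macro_scores(cm):
    """Per-class precision, recall and F-score (Eqs. (2)-(4)), averaged over K classes."""
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp  # sessions predicted as class k that belong to another class
    fn = cm.sum(axis=1) - tp  # sessions of class k predicted as another class
    precision = safe_ratio(tp, tp + fp)
    recall = safe_ratio(tp, tp + fn)
    f_score = safe_ratio(2 * precision * recall, precision + recall)
    return precision.mean(), recall.mean(), f_score.mean()


if __name__ == "__main__":
    # Hypothetical toy labels for K = 3 classes.
    cm = confusion_matrix([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0], num_classes=3)
    print(row_normalize(cm))
    print(macro_scores(cm))
```

For comparison, scikit-learn's `precision_recall_fscore_support` with `average="macro"` also computes each metric per label and takes their unweighted mean.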

Fig. 10: Impact of synchronizing the DCI readings with the start of the user’s sessions. Accuracy (%) of Linear SVM, Logistic Regressor, Nearest Neighbours, Random Forest, Gaussian Processes, MLP, RNN (LSTM) and CNN for asynchronous and synchronous sessions: (a) app identification, (b) service identification.
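The # Parameters column of Tables II and III follows from the layer sizes given in Section III-B. The sketch below recomputes it under the following assumptions, which are ours and not stated in that section:
- each session is fed as $W = 80$ time samples with a single feature per sample, which is the input shape the reported counts imply;
- convolutions use "valid" padding with stride 1;
- max pooling is non-overlapping;
- every layer has biases;
- the LSTM layers use the standard four-gate parameterization with one bias vector per gate.

Under these assumptions, the totals match the tables for $K = 6$ (app) and $K = 3$ (service). The CNN-to-RNN ratio comes out at about 6%, as stated in Section IV-B.

```python
def dense_params(n_in, n_out):
    """Fully connected layer: weight matrix plus one bias per output neuron."""
    return n_in * n_out + n_out


def conv1d_params(kernel_size, c_in, n_kernels):
    """1-D convolution: n_kernels filters of shape (kernel_size, c_in), plus biases."""
    return kernel_size * c_in * n_kernels + n_kernels


def lstm_params(n_in, units):
    """Standard LSTM: 4 gates, each with input weights, recurrent weights and biases."""
    return 4 * (units * (n_in + units) + units)


def mlp_params(w, k):
    # MLP1 (128 neurons) -> MLP2 (64 neurons) -> MLP3 (K neurons, softmax)
    return dense_params(w, 128) + dense_params(128, 64) + dense_params(64, k)


def rnn_params(k, n_features=1, units=180):
    # RNN1 and RNN2: stacked LSTMs with 180 memory cells; RNN3: softmax output
    return lstm_params(n_features, units) + lstm_params(units, units) + dense_params(units, k)


def cnn_params(w, k):
    length = (w - 5 + 1) // 3       # CNN1: 32 kernels of size 5, then 1x3 max pooling
    length = (length - 5 + 1) // 3  # CNN2: 64 kernels of size 5, then 1x3 max pooling
    return (conv1d_params(5, 1, 32) + conv1d_params(5, 32, 64)
            + dense_params(length * 64, 32)   # CNN3: 32 neurons (leaky ReLU)
            + dense_params(32, k))            # CNN4: K neurons (softmax)


if __name__ == "__main__":
    for task, k in (("app", 6), ("service", 3)):
        print(task, mlp_params(80, k), rnn_params(k), cnn_params(80, k))
    print(f"CNN/RNN parameter ratio (app): {cnn_params(80, 6) / rnn_params(6):.1%}")
```

The arithmetic shows where the CNN saves parameters. Its two convolutional layers hold about 10.5k weights. Pooling shrinks the sequence from 80 to 7 samples before the first dense layer. Each LSTM layer, by contrast, carries a recurrent 180 × 180 weight matrix per gate.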

B. Traffic Decomposition at Service Level

The set of applications that we have labeled is restricted to those apps and services that dominate the radio resource usage. However, additional apps may also be present in the monitored traffic, such as Facebook, Instagram, Snapchat, etc. These apps generate mixed content, including audio-streaming, video-streaming and video-calling. Additional service types may also be generated by, e.g., web-browsing and file downloading. In the present work the classifiers were not trained to specifically track these apps. For a robust classification outcome, it is therefore desirable that the audio and video streams they generate either be captured and classified into the correct service class, or at least be flagged as unknown.

To locate those traffic patterns for which our classifier may produce inaccurate results, our analysis additionally accounts for the detection of out-of-distribution (OOD) sessions. These are DCI traces that show different traffic dynamics from those learned at training time. Other services that are classified as OOD fall within the categories of web browsing, file downloading, and interactive applications such as texting and messaging. These services are common but do not take a large portion of the radio resources. An OOD class automatically detects all of these as an aggregate (marked as OOD). This lets the framework attain higher accuracies for the remaining classes, whose patterns are similar to those observed during the learning phase.

To identify these “statistical outliers”, the DCI data from each new session, $\mathbf{x}'_m$, is fed to the CNN. The corresponding softmax output vector $\mathbf{y}'_m = [y'_{m1}, \ldots, y'_{mK}]^T$ is then used to decide whether $\mathbf{x}'_m$ is OOD, following the rationale in [29], [30]. In detail, the $k$-th softmax output is the probability estimate that a given input session $\mathbf{x}'_m$ belongs to class $k$, i.e., $y'_{mk} = \mathrm{Prob}(\mathbf{x}'_m \in \text{class } k)$, with $k = 1, \ldots, K$. The classifier chooses the class $k^\star$ that maximizes this probability, i.e.,

$$k^\star = \arg\max_{k} \, y'_{mk}. \qquad (5)$$

Suppose a new app, not considered in the training phase, generates sessions with characteristics similar to those in the training set, namely, audio-streaming, video-streaming or real-time video calls. In that case, we expect the CNN to generalize well and return similar vectors at the output of the softmax layer. That is, the softmax vector output at runtime for the new app should be sufficiently “close” to the output learned by
the classifier from the labeled dataset, as the new signal bears statistical similarities with those learned in the training phase. In this case, it makes sense to accept the session and classify it as belonging to class $k^\star$. Otherwise, the session is classified as OOD.

Fig. 11: Accuracy vs window length $W$ [s] for app and service classification tasks in the asynchronous measurement case: (a) accuracy for app identification, (b) accuracy vs window size $W$ for service identification.

Fig. 12: Daily aggregated traffic for the four eNodeBs (Camp Nou, Castelldefels, Poble Sec, Born): normalized aggregated traffic vs time [h] over 24 hours.

The problem at hand boils down to assessing whether the softmax output $\mathbf{y}'_m$ belongs to the statistical distribution learned by the CNN or is an outlier. This amounts to outlier detection in a multivariate setting, with $\mathbf{y}'_m \in [0, 1]^K$ and $\sum_k y'_{mk} = 1$. Many algorithms can be used for this purpose. We adopt the method based on the Kernelized Spatial Depth (KSD) functions of [31], because it is lightweight. It also does not require directly estimating the probability density function (pdf) of the softmax output layer. This is a critical point, as good pdf estimates require training over many points.

Briefly, for a vector $\mathbf{y} \in \mathbb{R}^K$, we define the spatial sign function as $S(\mathbf{y}) = \mathbf{y}/\|\mathbf{y}\|$ if $\mathbf{y} \neq \mathbf{0}$ and $S(\mathbf{y}) = \mathbf{0}$ if $\mathbf{y} = \mathbf{0}$, where $\|\mathbf{y}\| = (\mathbf{y}^T\mathbf{y})^{1/2}$ is the 2-norm. Let $Y_k = \{\mathbf{y}^{(k)}_1, \mathbf{y}^{(k)}_2, \ldots, \mathbf{y}^{(k)}_\ell\}$ be a training set containing $\ell$ softmax output vectors for a certain class $k$. The sample spatial depth associated with a new softmax output vector $\mathbf{y}'_m$ is:

$$D(\mathbf{y}'_m, Y_k) = 1 - \left\| \frac{1}{|Y_k \cup \{\mathbf{y}'_m\}| - 1} \sum_{\mathbf{y} \in Y_k} S(\mathbf{y} - \mathbf{y}'_m) \right\|. \qquad (6)$$

Note that $D(\mathbf{y}'_m, Y_k) \in [0, 1]$ measures the centrality of the new point $\mathbf{y}'_m$ with respect to the points in the training set $Y_k$. In particular, if $D(\mathbf{y}'_m, Y_k) = 1$, then $\|\sum_{\mathbf{y} \in Y_k} S(\mathbf{y} - \mathbf{y}'_m)\| = 0$, and the new point is said to be the spatial median of set $Y_k$. It can be thought of as the “center of mass” of this set. Hence, the spatial depth attains its highest value of 1 at the spatial median and decreases to zero as $\mathbf{y}'_m$ moves away from it. The spatial depth can thus be used as a measure of the “extremeness” of a new data point with respect to a set.

In [31], the spatial depth of Eq. (6) is kernelized, which means that distances are evaluated using a positive definite kernel map. A common choice, which we also use in this paper, is the generalized Gaussian kernel $\kappa(\mathbf{x}, \mathbf{y})$,

$$\kappa(\mathbf{x}, \mathbf{y}) = \exp\!\left(-(\mathbf{x} - \mathbf{y})^T \Sigma^{-1} (\mathbf{x} - \mathbf{y})\right), \qquad (7)$$

which provides a measure of similarity between $\mathbf{x}$ and $\mathbf{y}$. The squared norm can be expressed as

$$\|\mathbf{x} - \mathbf{y}\|^2 = \mathbf{x}^T\mathbf{x} + \mathbf{y}^T\mathbf{y} - 2\,\mathbf{x}^T\mathbf{y}, \qquad (8)$$

so kernelizing the sample spatial depth amounts to expanding (6) and replacing the inner products with the kernel function $\kappa$. This returns the sample KSD function (Eq. (4) in [31]).

Session classification procedure in an unsupervised setting: the CNN classifier is augmented with the detection of OOD sessions, as follows:
• Initialization: for each class $k = 1, \ldots, K$ in the service/app identification task, the trained CNN computes a number of softmax output vectors from the sessions in the training set. These softmax vectors are stored in the set $Y_k$. Since they result from the supervised training of the CNN, the vectors in $Y_k$ are all generated by a distribution that is correctly learned during the supervised training phase.
• Feature extraction through the pre-trained CNN: at runtime, each newly measured DCI vector $\mathbf{x}'_m$ is input to the pre-trained CNN, which returns the corresponding softmax output $\mathbf{y}'_m$.
• Classification and OOD detection: vector $\mathbf{y}'_m$ is used with Eq. (5) to find the most probable class $k^\star$. Then, Algorithm 1 of [31] is used to assess whether $\mathbf{y}'_m$ is an outlier. If it is, the session is assigned to the OOD class; otherwise, it is assigned to class $k^\star$.

Some final remarks are in order. The outlier detection algorithm uses a threshold $t \in [0, 1]$, which allows exploring the tradeoff between false alarm rate and detection rate. The covariance matrix $\Sigma$, instead, controls the decision boundary for rejecting vectors, driving the tradeoff between
12

traffic decomposition into the considered service classes is


1.0 shown for the four selected eNodeBs using t? = 0.48. The
percentage of sessions identified as OOD, for which the
0.8 t* classifier is uncertain, is also reported at the top of each bar.
Common characteristics are observed in all the considered
0.6 deployments:
F-score

• the most used service is video-streaming, with typical


0.4 shares ranging from 50% to 80%. This confirms the
measurements in [1] and [2].
0.2 • The least used service is video-call, whose share is
CNN typically between 5% and 10%, whereas audio-streaming
0.0 CNN + OOD takes 21% of the total traffic load.
• OOD sessions are consistently well below 8%. Note that
0.0 0.2 0.4 0.6 0.8 1.0 this share accounts for all those apps that are not tracked
t by our classifier, such as texting, web browsing, and file
Fig. 13: Finding threshold t? using the CNN with (solid line) and without transfers.
(dashed line) the OOD detection mechanism.
Through the proposed service identification approach, we
the local and global behavior of KSD. If properly chosen, the can accurately characterize, at runtime, the used services.
contours of KSD should closely follow those of the (actual) Moreover, the traffic decomposition at service level allows
underlying statistical distribution. Σ is learned, for each class one to make some interesting considerations on the land
k, from the training vectors in Yk , and for the following use. For example, in a typical residential area (PobleSec) the
results we picked Σ = Σ2 in [31]. audio-streaming service is the one used the least across the
four monitored sites, with an average of 16.4%. Instead, in
Tuning the OOD threshold t: we define Sk as the set contain- a typical office and university neighborhood (Castelldefels),
ing the training examples belonging to class k. We recall that audio-streaming has the highest traffic share across all sites
Sk is used to compute the covariance matrix associated with (22% on average). Born and CampNou, which are two leisure
the adopted Gaussian kernel, which models the contours of districts, present a similar traffic distribution across the day.
the pdf of the output softmax vectors. The threshold t ∈ [0, 1] We finally remark that, while the traffic profiling results are
is instead used by the outlier detection algorithm to gauge shown using a time granularity of one hour, our classifica-
the (kernelized) distance between the center of mass of set tion tool allows for traffic decomposition at much shorter
Sk and a new softmax vector, acquired at runtime. If t = 1, timescales, i.e., on a per-session basis.
the kernelized spatial depth of the new point will always be
smaller than or equal to t and all points will be rejected VI. R ELATED W ORK
(marked as outliers). This is of course of no use. However, The most common classification methods in the literature
as we decrease t towards 0, we see that more and more points leverage the transport layer protocol, including UDP/TCP port
will be accepted, until, for t = 0, no rejections will occur. So, analysis and/or packet inspection, since most Internet appli-
t determines the selectivity of the outlier detection mechanism, cations use UDP/TCP port numbers. For instance, the authors
the higher t, the more selective the algorithm is. For our of [32] define a mobile traffic classifier as a collection of rules,
numerical evaluation, once the sets Sk are obtained for all including destination IP addresses and port numbers. Based
classes k, we set this threshold by picking the highest value on these rules, application-level mobile traffic identification
of it, t? , for which all the softmax vectors belonging to the is performed deploying a dedicated classification architecture
test set are accepted, i.e., none of them is marked as an outlier within the network, and measurement agents at the mobile
(OOD). In other words, this is equivalent to making sure that devices. However, port-based schemes hardly work in the
the F -Score obtained over the test set from our trained CNN presence of applications using dynamic port numbers [33].
without the OOD mechanism enabled equals that of the CNN A scheme based on deep packet inspection is presented
classifier augmented with the OOD detection capability. As t? in [34]. The authors of this article devise a technique for Code
is the highest value of t for which all the data in the test set are Division Multiple Access (CDMA) traffic classification, using
correctly classified as valid, our approach amounts to tuning correlation-based feature selection along with a decision tree
the threshold in such a way that the outlier detection algorithm classifier trained on a labeled dataset (which is labeled via deep
will be as selective as possible, while correctly treating all the packet inspection). The algorithm in [35] extracts application
data in the test set. In Fig. 13, we show the F -Score as a layer payload patterns, and performs maximum entropy-based
function of t for the CNN algorithm with and without OOD IP-traffic classification exploiting different Machine Learn-
detection. Threshold t? = 0.48 corresponds to the highest ing (ML) algorithms such as Naive Bayes, Support Vector
value of t for which the F -Score remains at its maximum, Machines (SVMs) and partial decision trees. Remarkably,
i.e., at the end of the flat region. payload-based methods are limited by a significant complexity
and computation load [33]. Furthermore, many mobile appli-
Experimental analysis of eNodeB traffic: in Fig. 14, the cations adopt encrypted data transmission due to security and
13

[Figure: four panels (Camp Nou, Castelldefels, PobleSec, Born), each plotting Session Count versus Time [h] over 0 to 24 h, with sessions stacked by class: video-streaming, video-calls, audio-streaming, OOD; per-hour class percentages are printed on the bars.]
Fig. 14: Traffic decomposition at service level for the four monitored eNodeBs during the 24 hours of a day.

privacy concerns, which renders packet inspection approaches ineffective.

Recent works consider NNs [4]–[6], [36]. The authors of [36] exploit the ability of deep NNs to classify Android applications from system API-call sequences, and investigate how effectively NNs learn complex features that can help in the malware detection task. In [6], mobile apps are identified by automatically extracting features from labeled packets through CNNs, which are trained using raw HTTP requests. In [4], encrypted traffic from Android and iOS mobile apps is classified using deep learning architectures (feed-forward, convolutional and recurrent neural networks), with and without exploiting TCP/UDP ports. The authors of [5] combine Zipper Networks (ZipNet) and Generative Adversarial Networks (GANs) to infer narrowly localized, fine-grained mobile traffic from coarse-grained measurements.

A systematic framework for the comparison of different deep learning classifiers is devised in [37]. Their performance is thoroughly investigated on three mobile datasets of real human-user activity, highlighting their drawbacks and discussing design guidelines and challenges. In [38], the same authors propose a multi-classification approach, where they combine the outputs of different classifiers in a modular way to improve the overall performance.

Several survey papers dealing with deep learning techniques applied to traffic classification can be found in [39], [40] and [41]. The authors of [39] provide general guidelines for classification tasks, present some deep learning techniques, show how they can be used for traffic classification and discuss open problems. The survey in [40] presents a deep learning-based framework for mobile encrypted traffic classification, reviewing existing work according to dataset selection, model input design, and model architecture, and highlighting open issues and challenges. Finally, a comprehensive review of the interplay between deep learning and mobile networking research is provided in [41], where the authors discuss how to tailor deep learning to mobile environments. Current challenges and open future research directions are also discussed.

We stress that most of the works in the literature, with the exception of [4]–[6] and [37], classify mobile traffic based on manual feature extraction, and all the papers that we surveyed process network- or application-level data. Our work departs from previous research, as we classify mobile data gathered from the physical control channel at the network edge, at runtime, and without access to application data or TCP/UDP port numbers.

Although seemingly orthogonal to the traffic classification problem, which is the main focus of the present work, we believe it is appropriate to comment on the security implications of the developed technology. The central point is that, by just reading unencrypted control traffic, it is possible to perform network inference and achieve good results. One could, for example, infer the type of service and app that is being run by a certain user equipment, as we do in this paper,

which may in itself be a privacy breach. The authors of [42] go further and, using DCI information, show that it is even possible to identify and track smartphones: although the identity of the owner remains unknown, a sort of signature of the way in which a smartphone interacts with the network over the wireless air interface can be computed, and this signature is enough to identify the device with good accuracy. This is a privacy violation. The interested reader is referred to the discussion in [42] and the references therein.

VII. CONCLUSIONS

In this paper, we have presented a framework that allows highly accurate classification of applications and services from radio-link level data, at runtime, and without having to decrypt dedicated physical layer channels. To this end, we decoded the LTE Physical Downlink Control Channel (PDCCH), where Downlink Control Information (DCI) messages are sent in clear text. Through DCI data, it is possible to track the data flows exchanged between the serving cell and its active users, extracting features that allow the reliable identification of the apps/services that are being executed at the mobile terminals.

For the classification of such traffic, we have tailored deep artificial Neural Networks (NNs), namely, Multi-Layer Perceptrons (MLPs), Recurrent NNs (RNNs) and Convolutional NNs (CNNs), comparing their performance against that of benchmark classifiers based on state-of-the-art supervised learning algorithms. Our numerical results show that NN architectures outperform the other approaches in terms of classification accuracy, with the best accuracy (as high as 98%) being achieved by CNNs.

As a major contribution of this work, labeled and unlabeled datasets of DCI data from real radio cell deployments have been collected. The labeled dataset has been used to train and compare the classifiers. For the unlabeled dataset, we have augmented the CNN with the capability of detecting input DCI data that do not conform to the patterns learned during the training phase: such patterns are detected and associated with an unknown class. This increases the robustness of the CNN classifier, allowing it to be used at runtime to perform fine-grained traffic analysis at radio cell sites of an operational mobile network. To summarize, the main outcomes of our work are: 1) a methodology to extract DCI data from the PDCCH and to use such data to train traffic classifiers; 2) the fine tuning and a thorough performance comparison of classification algorithms; 3) the design of a novel technique for the fine-grained, online traffic analysis of communication sessions at real radio cell sites; and 4) the discussion of the traffic distribution resulting from such analysis at four selected sites of a Spanish mobile operator in the city of Barcelona. As a future research direction, we foresee the adoption of semi-supervised learning methods to reduce the number of labeled sessions needed for traffic classification and, at the same time, to automatically detect and capture emerging behaviors that were not originally present in the labeled training set.

REFERENCES

[1] Ericsson. (2018) Ericsson mobility report June 2018. [Online]. Available: https://www.ericsson.com/en/mobility-report/reports/june-2018
[2] Cisco. (2017) Cisco visual networking index: Global mobile data traffic forecast update, 2016–2021 white paper. [Online]. Available: www.cisco.com/c/en/us/solutions/collateral/service-provider/visual-networking-index-vni/mobile-white-paper-c11-520862.html
[3] S. Chen and J. Zhao, “The requirements, challenges, and technologies for 5G of terrestrial mobile telecommunication,” IEEE Communications Magazine, vol. 52, no. 5, pp. 36–43, 2014.
[4] G. Aceto, D. Ciuonzo, A. Montieri, and A. Pescapé, “Mobile encrypted traffic classification using deep learning,” in Network Traffic Measurement and Analysis Conference (TMA). Vienna, Austria: IEEE, June 2018.
[5] C. Zhang, X. Ouyang, and P. Patras, “ZipNet-GAN: Inferring fine-grained mobile traffic patterns via a generative adversarial neural network,” in International Conference on emerging Networking EXperiments and Technologies (CoNEXT). Seoul-Incheon, South Korea: ACM, 2017.
[6] Z. Chen, B. Yu, Y. Zhang, J. Zhang, and J. Xu, “Automatic mobile application traffic identification by convolutional neural networks,” in IEEE Trustcom/BigDataSE/ISPA. Tianjin, China: IEEE, August 2016.
[7] N. Bui and J. Widmer, “OWL: A reliable online watcher for LTE control channel measurements,” in Workshop on All Things Cellular: Operations, Applications and Challenges. New York, NY, USA: ACM, October 2016.
[8] EU EARTH: Energy Aware Radio and neTwork tecHnologies, “D2.3: Energy efficiency analysis of the reference systems, areas of improvements and target breakdown,” Deliverable D2.3, www.ict-earth.eu, 2010.
[9] F. Xu, Y. Li, H. Wang, P. Zhang, and D. Jin, “Understanding Mobile Traffic Patterns of Large Scale Cellular Towers in Urban Environment,” IEEE/ACM Transactions on Networking, vol. 25, no. 2, 2017.
[10] J. K. Laurila, D. Gatica-Perez, J. B. I. Aad, O. Bornet, T.-M.-T. Do, O. Dousse, J. Eberle, and M. Miettinen, “The mobile data challenge: Big data for mobile computing research,” in Mobile Data Challenge Workshop (MDC), in conjunction with “Pervasive 2012”, Newcastle, UK, 2012.
[11] G. Barlacchi, M. D. Nadai, R. Larcher, A. Casella, C. Chitic, G. Torrisi, F. Antonelli, A. Vespignani, A. Pentland, and B. Lepri, “A multi-source dataset of urban life in the city of Milan and the Province of Trentino,” Scientific Data, vol. 2, no. 150055, pp. 1–15, 2015.
[12] TSG RAN; NR; Overall description; Stage 2, 3GPP TS 38.300, Release 16, v16.0.0, Jan. 2020.
[13] TSG RAN; NR; Physical channels and modulation, 3GPP TS 38.211, Release 16, v16.0.0, Dec. 2019.
[14] I. Gomez-Miguelez, A. Garcia-Saavedra, P. D. Sutton, P. Serrano, C. Cano, and D. J. Leith, “srsLTE: an open-source platform for LTE evolution and experimentation,” in ACM International Workshop on Wireless Network Testbeds, Experimental Evaluation, and Characterization (WiNTECH). New York, NY, USA: ACM, October 2016.
[15] “E-UTRA; Physical layer procedures,” 3GPP TS 36.213, 2016.
[16] C. M. Bishop, Pattern Recognition and Machine Learning. Springer, 2006.
[17] A. L. Maas, A. Y. Hannun, and A. Y. Ng, “Rectifier nonlinearities improve neural network acoustic models,” in International Conference on Machine Learning (ICML), Atlanta, USA, June 2013.
[18] T. Tieleman and G. Hinton, “Divide the gradient by a running average of its recent magnitude,” Neural networks for machine learning, vol. 4, no. 2, pp. 26–31, 2012.
[19] S. Hochreiter and J. Schmidhuber, “Long Short-Term Memory,” Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.
[20] F. A. Gers, J. Schmidhuber, and F. Cummins, “Learning to forget: Continual prediction with LSTM,” in International Conference on Artificial Neural Networks (ICANN). Edinburgh, UK: IEEE, 1999.
[21] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. The MIT Press, 2016.
[22] M. Gadaleta and M. Rossi, “IDNet: Smartphone-based gait recognition with convolutional neural networks,” Pattern Recognition, vol. 74, pp. 25–37, 2018.
[23] R.-E. Fan, K.-W. Chang, C.-J. Hsieh, X.-R. Wang, and C.-J. Lin, “LIBLINEAR: A library for large linear classification,” Journal of Machine Learning Research, vol. 9, no. Aug, pp. 1871–1874, 2008.
[24] N. S. Altman, “An introduction to kernel and nearest-neighbor nonparametric regression,” The American Statistician, vol. 46, no. 3, pp. 175–185, 1992.
[25] Y. Wu and Y. Liu, “Robust truncated hinge loss support vector machines,” Journal of the American Statistical Association, vol. 102, no. 479, pp. 974–983, 2007.
[26] L. Breiman, “Random forests,” Machine Learning, vol. 45, no. 1, pp. 5–32, 2001.
[27] C. E. Rasmussen, “Gaussian processes for machine learning,” in Advanced Lectures on Machine Learning. Springer, 2004, pp. 63–71.
[28] A. Furno, M. Fiore, R. Stanica, C. Ziemlicki, and Z. Smoreda, “A tale of ten cities: Characterizing signatures of mobile traffic in urban areas,” IEEE Transactions on Mobile Computing, vol. 16, no. 10, pp. 2682–2696, 2017.
[29] D. Hendrycks, M. Mazeika, and T. G. Dietterich, “Deep anomaly detection with outlier exposure,” in International Conference on Learning Representations (ICLR), Vancouver, BC, Canada, April 2018.
[30] S. Sigurdsson, J. Larsen, L. K. Hansen, P. A. Philipsen, and H.-C. Wulf, “Outlier estimation and detection application to skin lesion classification,” in IEEE International Conference on Acoustics Speech and Signal Processing (ICASSP). Orlando, FL, USA: IEEE, May 2002.
[31] Y. Chen, X. Dang, H. Peng, and H. L. Bart Jr., “Outlier Detection with the Kernelized Spatial Depth Function,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 2, pp. 288–305, 2009.
[32] Y. Choi, J. Y. Chung, B. Park, and J. W.-K. Hong, “Automated classifier generation for application-level mobile traffic identification,” in IEEE Network Operations and Management Symposium (NOMS). Hawaii, USA: IEEE, April 2012.
[33] Y. Fu, H. Xiong, X. Lu, J. Yang, and C. Chen, “Service usage classification with encrypted internet traffic in mobile messaging apps,” IEEE Transactions on Mobile Computing, vol. 15, no. 11, pp. 2851–2864, 2016.
[34] J. Yang, Z. Ma, C. Dong, and G. Cheng, “An empirical investigation into CDMA network traffic classification based on feature selection,” in International Symposium on Wireless Personal Multimedia Communications (WPMC). Taipei, Taiwan: IEEE, December 2012.
[35] X. Han, Y. Zhou, L. Huang, L. Han, J. Hu, and J. Shi, “Maximum entropy based IP-traffic classification in mobile communication networks,” in IEEE Wireless Communications and Networking Conference (WCNC). Shanghai, China: IEEE, April 2012.
[36] R. Nix and J. Zhang, “Classification of Android apps and malware using deep neural networks,” in International Joint Conference on Neural Networks (IJCNN). IEEE, 2017, pp. 1871–1878.
[37] G. Aceto, D. Ciuonzo, A. Montieri, and A. Pescapé, “Mobile encrypted traffic classification using deep learning: Experimental evaluation, lessons learned, and challenges,” IEEE Transactions on Network and Service Management, vol. 16, no. 2, pp. 445–458, 2019.
[38] ——, “Multi-classification approaches for classifying mobile app traffic,” Journal of Network and Computer Applications, vol. 103, pp. 131–145, 2018.
[39] S. Rezaei and X. Liu, “Deep learning for encrypted traffic classification: An overview,” IEEE Communications Magazine, vol. 57, no. 5, pp. 76–81, 2019.
[40] P. Wang, X. Chen, F. Ye, and Z. Sun, “A survey of techniques for mobile service encrypted traffic classification using deep learning,” IEEE Access, vol. 7, pp. 54024–54033, 2019.
[41] C. Zhang, P. Patras, and H. Haddadi, “Deep learning in mobile and wireless networking: A survey,” IEEE Communications Surveys & Tutorials, vol. 21, no. 3, pp. 2224–2287, 2019.
[42] F. Meneghello, M. Rossi, and N. Bui, “Smartphone Identification via Passive Traffic Fingerprinting: a Sequence-to-Sequence Learning Approach,” IEEE Network, vol. 34, no. 2, pp. 112–120, 2020.
