
Link Congestion Prediction using Machine Learning for Software-Defined-Network Data Plane


Junying Wu, Yunfeng Peng, Meng Song, Manman Cui, Liang Zhang
School of Computer and Communication Engineering
University of Science and Technology Beijing
Beijing, China
[email protected]

Abstract—This paper presents a machine learning method to reduce link congestion for Software-Defined Networks (SDN). The updated status of switches and links is collected through the SDN southbound interface to train four popular machine learning algorithms that predict the degree of network congestion. Simulations show that the one-dimensional convolutional neural network algorithm outperforms the other three.

Keywords—Software-Defined Network, Congestion Prediction, Machine Learning

I. INTRODUCTION

Nowadays, with the unprecedented prosperity of 4G/5G mobile Internet services, network capacity still cannot meet the demand for network resources (link bandwidth, node storage space), and congestion therefore occurs. Network congestion increases packet transmission delay and packet loss rate, and seriously affects the quality of service experience (QoE).

The Software-Defined Network (SDN) is a new network architecture in which the control plane is separated from the data plane [1]. The independent control plane offers centralized programmability, which provides new ways to solve network congestion.

Shadi Attarha et al. [2] proposed an algorithm to avoid link congestion in SDN networks, in which the controller collects the network status and periodically checks switch utilization. When new traffic arrives, it judges whether the link is congested or not. If congestion occurs, the average transmission rate of the port is reduced and a backup path is found. However, this method is a reactive one that lacks congestion prediction and does not act until the moment traffic arrives. Even if a subsequent routing scheduling strategy is adopted, packet loss is still inevitable.

Dong-Hai Z et al. [3] proposed a predictive congestion control algorithm that uses a fuzzy neural network to predict the queue length in the switch buffer. Before congestion occurs, the transmission rate is dynamically adjusted by combining increasing and decreasing parameters. Ren H. et al. [4] take the bandwidth utilization of different links into consideration and use a genetic algorithm to predict link packet loss rate, delay and bandwidth utilization.

Considering the difference between TCP and UDP, Abdelmoniem A M et al. [5] design a new method to select the congestion nodes at which to start the congestion control strategy, which can alleviate congestion while largely preserving link utilization. Reinforcement learning has also been introduced into congestion prediction for SDN [6]. Through deep reinforcement learning, the data information in the buffer is used to perform congestion prediction, and congestion can be avoided by adjusting the size of the congestion window.

Up to now, most existing congestion control methods are TCP-based. In fact, UDP, another important transport-layer protocol [7], also has certain advantages, e.g., better real-time performance and higher efficiency than TCP, which makes it suitable for high-speed transmission and real-time communication. It has been widely used in real-time games, IP telephony, real-time video conferencing, QQ voice and QQ video. However, because it is a connectionless protocol, UDP cannot avoid network congestion on its own. If the traffic can be predicted, UDP will work more efficiently.

In this paper, we propose a machine learning method to predict link congestion for UDP services. We select five typical parameters as eigenvalues to reflect link status, communication status, and switch working status, and we find that these eigenvalues help to achieve better prediction of link congestion in SDN. In detail, based on the open-source SDN controller OpenDayLight (ODL), we have implemented the southbound interface to collect the required network status, including the global network topology, link data, and switch data, in real time. Five typical parameters, namely the receiving rate (in bps) of switches, the sending rate of switches, switch load, link bandwidth, and the end-to-end traffic rate between source and destination hosts, are selected as the eigenvalues that represent the network status. Based on these eigenvalues, we construct a mathematical model to quantify link congestion. We have designed four machine learning algorithms, including Support Vector Machine (SVM), Multi-Layer Perceptron (MLP), One-dimensional Convolutional Neural Network (1DCNN) and K-Nearest Neighbor (KNN), to predict the network congestion. Experiments show that the 1DCNN has the best performance.

The remainder of this paper is organized as follows. Section II introduces the congestion prediction scheme, including the system model and the congestion metric model. Section III describes the design of the machine learning algorithms. Section IV presents our experiments in detail and determines the algorithm with the best prediction performance. Section V concludes the paper.



II. CONGESTION PREDICTION SCHEME

A. System Model
As illustrated in Fig 1, the SDN network system model consists of the data layer, the control layer and the application layer. The controller manages the switches through the southbound interface, and the application layer communicates with the controller through the northbound interface to realize customized applications on the SDN network. In this paper, the congestion prediction system is mainly composed of three modules.

Fig 1. The system structure

1) Topology Management Module: This module is used to collect the network topology. In the application layer, the link topology information is obtained by calling the REST-API [8] of the controller.
2) Data Acquisition Module: The data acquisition module is used to check the current utilization of network resources by collecting flow table information from the OpenFlow switches, including the number of bytes forwarded by each switch, the number of bytes received, and the bandwidth of the link. We use bandwidth utilization as the indicator to judge whether a link is congested or not.
3) Link State Prediction Module: According to the network status from the data acquisition module, we select five typical parameters as the eigenvalues that are input to the machine learning algorithms to evaluate congestion.
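To make the data flow between these modules concrete, the following is a minimal sketch of how the topology management and data acquisition modules could query the controller. It assumes a local OpenDayLight instance with the default RESTCONF port and credentials (admin/admin); the exact resource paths vary between ODL releases and are not specified in the paper.

```python
# Minimal sketch of the topology/statistics collection described above.
# Assumes a local OpenDayLight controller with the default RESTCONF port
# (8181) and credentials; endpoint paths may differ between ODL releases.
import requests

ODL = "http://127.0.0.1:8181"
AUTH = ("admin", "admin")

def get_topology():
    """Fetch the global network topology from the controller."""
    url = f"{ODL}/restconf/operational/network-topology:network-topology"
    return requests.get(url, auth=AUTH).json()

def get_port_statistics(node_id, connector_id):
    """Fetch per-port byte counters for one switch port."""
    url = (f"{ODL}/restconf/operational/opendaylight-inventory:nodes/"
           f"node/{node_id}/node-connector/{connector_id}")
    return requests.get(url, auth=AUTH).json()

if __name__ == "__main__":
    topo = get_topology()
    print(topo["network-topology"]["topology"][0].get("node", []))
```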
B. Congestion Metrics
In real networks, the distribution of traffic load is usually uneven. Therefore, we need to set a proper threshold to determine link congestion. In this paper, we choose link bandwidth utilization as the metric for congestion, which is expressed in formula (1):

$U = \frac{N_{received} + N_{sent}}{T \cdot B}$  (1)

where U is the link bandwidth utilization, N_received is the number of bits received by the switch, N_sent is the number of bits sent by the switch, T is the time of communication, and B is the bandwidth of the link.

We refer to the operators' standard definition of network congestion, listed in TABLE I.

TABLE I. STANDARD DEFINITION OF NETWORK CONGESTION

Link congestion level | Average utilization of link bandwidth within 3 days of network busy hours
Severe congestion     | Over 90%
General congestion    | Over 80%
High load             | Over 70%
Normal load           | Over 60%

Normally, high link load [9] has a great impact on the performance and stability of the network. Combining the operators' standard definition with the congestion degrees chosen by other researchers, we take 70% as the threshold of U to decide link congestion. When U exceeds 70%, corresponding congestion control and traffic scheduling strategies should be applied.
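As an illustration of formula (1) and the 70% threshold, a small sketch of the labeling rule is given below; the function and variable names are ours, not the paper's.

```python
# Sketch of the congestion metric in formula (1) and the 70% labeling rule.
# Counter names and the sampling interval are illustrative.

CONGESTION_THRESHOLD = 0.7  # U > 0.7 is treated as congestion

def link_utilization(bits_received, bits_sent, duration_s, bandwidth_bps):
    """U = (N_received + N_sent) / (T * B)."""
    return (bits_received + bits_sent) / (duration_s * bandwidth_bps)

def congestion_label(bits_received, bits_sent, duration_s, bandwidth_bps):
    """Return 1 (congested) or 0 (not congested) for one sample."""
    u = link_utilization(bits_received, bits_sent, duration_s, bandwidth_bps)
    return int(u > CONGESTION_THRESHOLD)

# Example: 9.5 Mbit transferred in 1 s over a 10 Mbit/s link -> label 1
print(congestion_label(6_000_000, 3_500_000, 1, 10_000_000))
```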
In this paper, we focus on link congestion prediction for UDP. Without congestion control mechanisms to regulate traffic, the throughput [10] of UDP is affected by the sending rate, the transmission bandwidth, and the capabilities of the source and destination hosts. Therefore, we select five typical parameters, including the receiving rate (in bps) of switches, the sending rate of switches, switch load, link bandwidth, and the end-to-end traffic rate, as the eigenvalues that reflect the status of switches and links.

III. MACHINE LEARNING ALGORITHMS DESIGN

A. Data Set
With the Mininet simulator [11], we have established a network topology consisting of 13 switches and 12 hosts to simulate a real network communication environment, selecting multiple pairs of source-destination hosts to transmit data simultaneously. Each experiment lasts 12 hours, and network data is collected every second in real time. The total amount of data is 559929 samples. The specific experimental parameter settings are introduced in Section IV.

B. Data Preprocessing
We use link bandwidth utilization as the criterion for judging link congestion. If it is greater than 0.7, the link is regarded as congested and the sample is labeled 1; otherwise it is labeled 0. Therefore, the problem studied in this paper is a binary classification problem with 5 eigenvalues. First, we normalize the data and scale it to the [0,1] interval. We use the linear function shown in formula (2) as the normalization method to obtain fast convergence during training:

$X_{norm} = \frac{X - X_{min}}{X_{max} - X_{min}}$  (2)

where X is the input data, X_max and X_min are the maximum and minimum values respectively, and X_norm is the normalized output. We also use the down-sampling method to keep the different classes of data in the same order of magnitude.
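A possible implementation of this preprocessing step is sketched below, assuming the samples are held in NumPy arrays. The exact down-sampling procedure is not specified in the paper, so a simple random down-sampling of the majority class is shown.

```python
# Sketch of the preprocessing step: min-max scaling of the five eigenvalues
# (formula (2)) and random down-sampling to balance the two classes.
import numpy as np

def min_max_scale(X):
    """Scale each feature column to [0, 1] as in formula (2)."""
    x_min = X.min(axis=0)
    x_max = X.max(axis=0)
    return (X - x_min) / (x_max - x_min + 1e-12)  # epsilon avoids division by zero

def downsample(X, y, seed=0):
    """Randomly drop majority-class samples until both classes are equal."""
    rng = np.random.default_rng(seed)
    idx_pos = np.where(y == 1)[0]
    idx_neg = np.where(y == 0)[0]
    n = min(len(idx_pos), len(idx_neg))
    keep = np.concatenate([rng.choice(idx_pos, n, replace=False),
                           rng.choice(idx_neg, n, replace=False)])
    rng.shuffle(keep)
    return X[keep], y[keep]
```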
divides the data set into two parts, one is the training set and

For classification problems, the widely used validation schemes are random sub-sampling, hold-out and k-fold cross validation. The random sub-sampling method randomly divides the data set into two parts, a training set and a test set. Hold-out builds on random sub-sampling but additionally sets a portion of the data aside as a verification set. In k-fold cross validation, the data set is divided into K parts that serve in turn as the training and test sets, and the results are then averaged to obtain the final classification result.

In order to make full use of the acquired data for classification, the ten-fold cross validation method, shown in Fig 2, is used in this paper: all the data is randomly divided into ten parts, nine of which are selected for training and the remaining one for testing, and the process is repeated. For example, D1 is first used for testing with the other nine parts for training, then D2 is used for testing and the others for training, and so on until each part has been used as a test set.
Fig 2. The schematic diagram of ten-fold cross validation
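A sketch of this ten-fold procedure with scikit-learn is shown below; `clf` stands for any of the four classifiers designed in Section III.C, and the scoring choice is ours.

```python
# Sketch of the ten-fold cross validation loop described above.
from sklearn.model_selection import KFold
from sklearn.metrics import accuracy_score

def ten_fold_cv(clf, X, y):
    """Return the mean accuracy of clf over 10 random folds."""
    kf = KFold(n_splits=10, shuffle=True, random_state=0)
    scores = []
    for train_idx, test_idx in kf.split(X):
        clf.fit(X[train_idx], y[train_idx])
        pred = clf.predict(X[test_idx])
        scores.append(accuracy_score(y[test_idx], pred))
    return sum(scores) / len(scores)
```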
C. Model Design
We select four kinds of algorithms (SVM, MLP, KNN, 1DCNN) and design each model separately. The overall execution flow of the scheme is shown in Fig 3.

Fig 3. The overall execution process

1) SVM: We use a grid search strategy [12] to search the parameters. The search parameters include C and the kernel. The best value of C is determined by comparing different values between 5 and 15. For the kernel function, the commonly used kernels are linear, polynomial, radial basis function (RBF) and sigmoid, shown in (3), (4), (5) and (6). Here, γ, r, and d are kernel parameters.

$K(X_i, X_j) = X_i^T X_j$  (3)

$K(X_i, X_j) = (\gamma X_i^T X_j + r)^d$  (4)

$K(X_i, X_j) = e^{-\gamma \|X_i - X_j\|^2}$  (5)

$K(X_i, X_j) = \tanh(\gamma X_i^T X_j + r)$  (6)
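The grid search could be realized with scikit-learn as sketched below; the scoring metric is our assumption.

```python
# Sketch of the SVM grid search described above: C swept over 5..15 and the
# four kernel types compared with 10-fold cross validation.
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

param_grid = {
    "C": list(range(5, 16)),                         # C compared between 5 and 15
    "kernel": ["linear", "poly", "rbf", "sigmoid"],  # kernels (3)-(6)
}
svm_search = GridSearchCV(SVC(), param_grid, cv=10, scoring="accuracy")
# svm_search.fit(X_train, y_train)  # X_train: normalized 5-feature samples
# print(svm_search.best_params_)    # the paper reports C=8, kernel='rbf'
```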
2) MLP: The MLP is a non-parametric, flexible prediction method [13] that can model the complex potential non-linear relationship between input and output data without knowing the parameters beforehand. The structure of the MLP includes the input layer, the hidden layers, and the output layer. The input data is a five-feature vector, so we choose 2 as the number of hidden layers, which avoids a model that is too complicated to converge. We compare the classification results when the number of hidden layer units is set to [30, 40, 50, 60, 70].
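A sketch of this MLP design with scikit-learn is given below; the solver, activation and iteration limit are assumptions, and the reported best setting of 30 units is interpreted here as 30 units per hidden layer.

```python
# Sketch of the MLP design: two hidden layers, with the number of units per
# layer compared over [30, 40, 50, 60, 70].
from sklearn.neural_network import MLPClassifier

def make_mlp(units):
    return MLPClassifier(hidden_layer_sizes=(units, units),
                         activation="relu", max_iter=500, random_state=0)

candidate_units = [30, 40, 50, 60, 70]
# models = {u: make_mlp(u) for u in candidate_units}
# The paper reports that 30 hidden units performed best.
```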

3) KNN: The KNN algorithm [14] classifies the data according to the distance between the training data and the test data. The Euclidean distance is commonly used, as shown in (7):

$d(x, y) = \left(\sum_{k=1}^{n} (x_k - y_k)^2\right)^{1/2}$  (7)

We choose the Euclidean distance to calculate the distance between data points. The parameter k and the weight are selected from the following values: k = [5, 15] (k is an integer) and weights = ['uniform', 'distance']. We compare the classification results obtained with the different parameter values.
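The KNN parameter sweep could be expressed as follows, with the Euclidean distance of formula (7) and the cross-validation setting of Section III.B.

```python
# Sketch of the KNN parameter sweep: k from 5 to 15 and the two weighting schemes.
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

knn_grid = {
    "n_neighbors": list(range(5, 16)),   # k = 5..15
    "weights": ["uniform", "distance"],
}
knn_search = GridSearchCV(KNeighborsClassifier(metric="euclidean"),
                          knn_grid, cv=10, scoring="accuracy")
# knn_search.fit(X_train, y_train)
# The paper reports k=9 with uniform weights as the best combination.
```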
4) CNN: The CNN has good performance for data feature extraction and classification [15]. This method is often used to classify two-dimensional data, but our data is one-dimensional, so we use a one-dimensional convolutional neural network structure (1DCNN).
Firstly, the parameters are initialized, the output of each layer is calculated by the forward pass, the loss function is computed, and the parameters are then optimized step by step by the gradient descent method through back-propagation. In this paper, the weights and offsets are updated by the mini-batch gradient descent method. We select 200 training samples for each batch and calculate the gradient to update the parameters. The learning rate is set to 0.05 for model training. Since the data in this paper is one-dimensional, to avoid overfitting we choose 40 as the number of iterations.
The input of the 1DCNN network model is one-dimensional data of size 1x5, and the range of the number of convolution layers is set to 2-7. We set the size of each convolution kernel to 1x3, and the candidate numbers of convolution kernels per layer are [20, 24, 28, 32, 36]. We add a pooling layer between the convolution layers and set the down-sampling factor to 2. According to the number of output categories and the input data size, the number of neurons in the fully connected layer is set to 2, and the final model structure is obtained.
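A sketch of the 1DCNN in Keras is given below, using the best configuration later reported in Section IV (4 convolution layers, 36 kernels of size 1x3, a pooling layer with down-sampling factor 2, a 2-neuron fully connected layer, mini-batches of 200, learning rate 0.05, 40 iterations). The placement of the pooling layer and the choice of activations are our assumptions.

```python
# Sketch of the 1DCNN model; layer arrangement and activations are assumptions.
import tensorflow as tf
from tensorflow.keras import layers, models

def build_1dcnn(n_features=5, n_filters=36):
    model = models.Sequential([
        layers.Input(shape=(n_features, 1)),           # 1x5 input vector
        layers.Conv1D(n_filters, 3, padding="same", activation="relu"),
        layers.Conv1D(n_filters, 3, padding="same", activation="relu"),
        layers.MaxPooling1D(pool_size=2),               # down-sampling factor 2
        layers.Conv1D(n_filters, 3, padding="same", activation="relu"),
        layers.Conv1D(n_filters, 3, padding="same", activation="relu"),
        layers.Flatten(),
        layers.Dense(2, activation="softmax"),          # two output categories
    ])
    model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.05),
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# model = build_1dcnn()
# model.fit(X_train.reshape(-1, 5, 1), y_train, batch_size=200, epochs=40)
```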
IV. SIMULATION EXPERIMENTS
We use Mininet and OpenDayLight (ODL) to set up the simulation. Mininet is used to simulate the data layer, ODL is used as the controller, Python is the development language of the upper-layer application, and the Iperf tool is used to generate traffic. We choose UDP mode to transmit data. In this experiment, a network composed of 13 OpenFlow switches and 12 hosts is built, with 35 links. In the division of links, according to the transmission paths between hosts, we set two different bandwidth forms to represent easily congested links and general links. The experiment simulates the state of real network communication. We select three pairs of hosts to communicate with each other at the same time: h1-h11, h3-h10, and h6-h9, respectively. The network topology is shown in Fig 4.

Fig 4. The network topology

A. Parameter Setting
The transmission rate is set to eight values: 50kbit/s, 100kbit/s, 500kbit/s, 1Mbit/s, 5Mbit/s, 10Mbit/s, 15Mbit/s and 20Mbit/s. The link bandwidth is set to two combinations, one of 10Mbit/s and 50Mbit/s, the other of 50Mbit/s and 250Mbit/s. Under each rate and specific communication bandwidth, the experiment maintains continuous communication between hosts for 12 hours and obtains the five types of characteristic data of all links per unit time during the communication process.
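For illustration, one host pair could be driven in UDP mode from the Mininet Python API as sketched below; it assumes an already-started Mininet network object `net`, and the rate and duration values follow Section IV.A.

```python
# Sketch of driving one UDP flow with iperf inside a running Mininet network.
def run_udp_flow(net, src="h1", dst="h11", rate="5M", duration=43200):
    """Start an iperf UDP server on dst and a client on src at the given rate."""
    server = net.get(dst)
    client = net.get(src)
    server.cmd("iperf -u -s &")                                   # UDP server in background
    client.cmd(f"iperf -u -c {server.IP()} -b {rate} -t {duration} &")

# Example: the three pairs used in the paper, each sending for 12 hours.
# for src, dst in [("h1", "h11"), ("h3", "h10"), ("h6", "h9")]:
#     run_udp_flow(net, src, dst, rate="5M")
```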
B. Comparison of Predicted Results
Through comparison of several sets of experiments, we found that the classification performance is optimal when the parameters of each algorithm are set as follows: in the SVM algorithm, C=8 and kernel=rbf; in the MLP algorithm, the number of hidden layer units is 30; in the KNN algorithm, k=9 and weight='uniform'; in the CNN algorithm, the number of convolution layers is 4 and the number of convolution kernels is 36.

Several indicators are often used to evaluate and compare classification performance, including accuracy (ACC), precision (PPV), sensitivity (SEN), specificity (SPE) and the Area Under the ROC Curve (AUC) [16]. These common metrics can be expressed in terms of true positives (TP), true negatives (TN), false positives (FP) and false negatives (FN), as shown in (8), (9), (10) and (11):

$ACC = (TP + TN) / (TP + TN + FP + FN)$  (8)

$SEN = TP / (TP + FN)$  (9)

$SPE = TN / (TN + FP)$  (10)

$PPV = TP / (TP + FP)$  (11)

The confusion matrix reflects the values of TP, TN, FP and FN. The confusion matrices of the four models are shown in Fig 5, Fig 6, Fig 7 and Fig 8, respectively, and the metrics of the four models are shown in TABLE II.

Fig 5. The SVM confusion matrix

Fig 6. The MLP confusion matrix

Fig 7. The KNN confusion matrix

Fig 8. The 1DCNN confusion matrix
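These metrics could be computed from a trained model's predictions as sketched below with scikit-learn; variable names are illustrative.

```python
# Sketch of computing the metrics in (8)-(11) plus the AUC for binary labels.
from sklearn.metrics import confusion_matrix, roc_auc_score

def evaluate(y_true, y_pred, y_score):
    """Return ACC, PPV, SEN, SPE and AUC for binary congestion labels."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    acc = (tp + tn) / (tp + tn + fp + fn)   # eq. (8)
    sen = tp / (tp + fn)                     # sensitivity, eq. (9)
    spe = tn / (tn + fp)                     # specificity, eq. (10)
    ppv = tp / (tp + fp)                     # precision, eq. (11)
    auc = roc_auc_score(y_true, y_score)
    return {"ACC": acc, "PPV": ppv, "SEN": sen, "SPE": spe, "AUC": auc}
```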

TABLE II. THE PERFORMANCE OF FOUR MODELS

Model  | ACC   | PPV   | SEN   | SPE   | AUC
SVM    | 0.933 | 0.927 | 0.918 | 0.945 | 0.972
MLP    | 0.946 | 0.944 | 0.931 | 0.958 | 0.981
KNN    | 0.932 | 0.928 | 0.913 | 0.946 | 0.971
1DCNN  | 0.954 | 0.951 | 0.942 | 0.963 | 0.983

The ROC curve, which is obtained by plotting the true positive rate (TPR) against the false positive rate (FPR), reflects the differences between the algorithms. The ROC curves of the four models are shown in Fig 9.

The AUC of the 1DCNN model is 98.3%, higher than that of the other models. Also, from TABLE II we can see that the sensitivity of the other three models is below 94%, whereas the 1DCNN model exceeds 94.2%. Furthermore, its accuracy, precision and specificity are 95.4%, 95.1% and 96.3%, respectively. In conclusion, the 1DCNN has better classification performance and can predict future network congestion more accurately and effectively.

Fig 9. The ROC curve of four models

V. CONCLUSION

This paper proposed a link congestion prediction method using machine learning, which can work in the upper layer of the controller. The status information of both switches and links is collected to train and test the proposed machine learning algorithms to predict link congestion. We compared the prediction performance of four machine learning algorithms; the 1DCNN has the highest AUC, reaching 98.3%, and is therefore the most suitable for link congestion prediction under our network topology.

ACKNOWLEDGMENT

This work was supported by the foundation of Guizhou Key Laboratory of Electric Power Big Data, Guizhou Institute of Technology (2003008002).

REFERENCES

[1] Kreutz D, Ramos F M V, Verissimo P, et al. Software-defined networking: A comprehensive survey[J]. Proceedings of the IEEE, 2015, 103(1): 14-76.
[2] Attarha S, Hosseiny K H, Mirjalily G, et al. A load balanced congestion aware routing mechanism for Software Defined Networks[C]//2017 Iranian Conference on Electrical Engineering (ICEE). IEEE, 2017: 2206-2210.
[3] Dong-Hai Z, Li L, Fan J. Congestion control in ATM networks using additive-multiplicative fuzzy neural network[C]//Proceedings of the Fourth International Conference on Parallel and Distributed Computing, Applications and Technologies. IEEE, 2003: 306-310.
[4] Ren H, Li X, Geng J, et al. A SDN-based dynamic traffic scheduling algorithm[C]//2016 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery (CyberC). IEEE, 2016: 514-518.
[5] Abdelmoniem A M, Bensaou B. Enforcing Transport-Agnostic Congestion Control in SDN-Based Data Centers[C]//2017 IEEE 42nd Conference on Local Computer Networks (LCN). IEEE, 2017: 128-136.
[6] Xiao K, Mao S, Tugnait J K. TCP-Drinc: Smart Congestion Control based on Deep Reinforcement Learning[J]. IEEE Access, 2019.
[7] Iwata K, Ito Y. Proposal of Multi-Pathization Method of UDP with SDN for NFS[C]//2018 International Symposium on Networks, Computers and Communications (ISNCC). IEEE, 2018: 1-5.
[8] Prajapati A, Sakadasariya A, Patel J. Software defined network: Future of networking[C]//2018 2nd International Conference on Inventive Systems and Control (ICISC). IEEE, 2018: 1351-1354.
[9] Taghizadeh S, Bobarshad H, Elbiaze H. CLRPL: context-aware and load balancing RPL for IoT networks under heavy and highly dynamic load[J]. IEEE Access, 2018, 6: 23277-23291.
[10] Masirap M, Amaran M H, Yussoff Y M, et al. Evaluation of reliable UDP-based transport protocols for Internet of Things (IoT)[C]//2016 IEEE Symposium on Computer Applications & Industrial Electronics (ISCAIE). IEEE, 2016: 200-205.
[11] Barrett R, Facey A, Nxumalo W, et al. Dynamic traffic diversion in SDN: testbed vs Mininet[C]//2017 International Conference on Computing, Networking and Communications (ICNC). IEEE, 2017: 167-171.
[12] Tian Y, Shi Y, Liu X. Recent advances on support vector machines research[J]. Technological and Economic Development of Economy, 2012, 18(1): 5-33.
[13] Nikravesh A Y, Ajila S A, Lung C H, et al. Mobile network traffic prediction using MLP, MLPWD, and SVM[C]//2016 IEEE International Congress on Big Data (BigData Congress). IEEE, 2016: 402-409.
[14] Cover T M, Hart P E. Nearest neighbor pattern classification[J]. IEEE Transactions on Information Theory, 1967, 13(1): 21-27.
[15] Fan X, Xie Y, Ren F, et al. Joint Optical Performance Monitoring and Modulation Format/Bit-Rate Identification by CNN-Based Multi-Task Learning[J]. IEEE Photonics Journal, 2018, 10(5): 1-12.
[16] Powers D M. Evaluation: from precision, recall and F-measure to ROC, informedness, markedness and correlation[J]. 2011.

