(ARTICLE) Evaluation of Network Traffic Prediction Based On Neural Networks With Multi-Task Learning and Multiresolution Decomposition
Abstract: Network traffic exhibits strong correlations which make it suitable for prediction. Real-time forecasting of network traffic load accurately and in a computationally efficient manner is the key element of proactive network management and congestion control. This paper compares predictions produced by different types of neural networks (NN) with forecasts from statistical time series models (ARMA, ARAR, HW). The novelty of our approach is to predict aggregated Ethernet traffic with NNs employing multiresolution learning (MRL), which is based on wavelet decomposition. In addition, we introduce a new NN training paradigm, namely the combination of multi-task learning with MRL. The experimental results show that nonlinear prediction based on NNs is better suited for traffic prediction purposes than linear forecasting models. Moreover, MRL helps to exploit the correlation structures at lower resolutions of the traffic trace and improves the generalization capability of NNs.

Keywords: multiresolution learning, multi-task learning, neural networks, prediction

I. INTRODUCTION

The main purpose of forecasting is to use historical data in order to predict the behavior of a system, by modeling it as a black-box [1]. Traffic prediction plays an important role in guaranteeing Quality of Service (QoS) in IP networks due to the diversity of services and because of the increased volume of real-time network applications. Forecasting algorithms can be embedded into network management systems to improve the global performance of the network and to achieve a balanced utilization of the resources. Traffic prediction can be useful for dynamic routing, congestion control and prevention, autonomous traffic engineering, proactive management of the network, etc.

Upon the occurrence of congestion in the network, a traditional routing protocol cannot react immediately, resulting in packet loss, additional delay and jitter, as well as services with severely degraded quality [2]. Prediction can be used by a network device in the self-adaptation process for optimizing its own performance. Thus, proactive decision-making is possible based on the predicted evolution of traffic on certain links, as opposed to reacting to past events. Thanks to the early warning, a prediction-based approach will be faster, in terms of congestion identification and elimination, than reactive methods which detect congestion through measurements, only after it has significantly influenced the network operation.

The prediction of network traffic parameters is possible because they present a strong correlation between chronologically ordered values. Their predictability is mainly determined by their statistical characteristics. According to [3], network traffic is characterized by self-similarity, multiscalarity, long-range dependence (LRD) and a highly nonlinear nature.

Several methods have been proposed in the literature for network traffic forecasting. These can be classified into two categories: linear prediction and nonlinear prediction. Choosing a specific forecasting technique is based on a compromise between the complexity of the solution, the characteristics of the data and the desired prediction accuracy. The most widely used traditional linear prediction methods are: a) the ARMA/ARIMA model [1], [4], [5], [6], [7] and b) the Holt-Winters algorithm [1], etc. The most common nonlinear forecasting methods involve neural networks (NN) [1], [3], [4], [8]. NNs can be combined with: a) multi-task learning [9], [10] or b) multiresolution learning [2], [11], [12], [13], etc. Although some articles state that linear prediction models are unable to describe the characteristics of network traffic [4], other studies confirm the practical usability of linear predictors for real-time traffic prediction [7]. Thus, it remains unclear which predictors provide the best performance while being simple, adaptable and accurate at the same time.

In this paper we consider the problem of forecasting the transfer rate, i.e. given a set of transfer rates observed on a specific link, we try to predict its future values. We chose this parameter because it is the basic QoS parameter: if the demands regarding the transfer rate are not met, the other QoS parameters (delay, jitter, packet drops) will be seriously affected. In the following, we demonstrate that the prediction of future network traffic load based on recent observations is possible, with a certain accuracy, in a computationally efficient manner.

The rest of this paper is organized as follows. Section II gives a brief introduction to traditional forecasting techniques. In Section III neural network traffic predictors with multi-task training and multiresolution learning approaches are described.
Section IV lists the performance metrics used for prediction accuracy evaluation. The experimental results are presented in Section V as a comparative study with various types of predictors applied to real-world network traffic traces. Finally, Section VI concludes the paper and discusses future work.

II. TRADITIONAL TIME SERIES FORECASTING MODELS

In this section we give a brief introduction to various predictors based on traditional statistical techniques, such as the ARMA (Autoregressive Moving Average), ARAR (Autoregressive Autoregressive) and HW (Holt-Winters) algorithms.

A. ARMA model

The family of ARMA processes is one of the most popular statistical methods used for modeling and forecasting linear time series. ARMA models rely on a linear combination of autoregressive (AR) and moving average (MA) components. The time series $\{X_t\}$ is called an ARMA(p, q) process if $\{X_t\}$ is stationary (i.e. its statistical properties do not change over time) and:

$$X_t - \phi_1 X_{t-1} - \ldots - \phi_p X_{t-p} = Z_t + \theta_1 Z_{t-1} + \ldots + \theta_q Z_{t-q} \tag{1}$$

In the ARAR algorithm, the long-memory or moderately long-memory time series $\{Y_t\}$ is processed until the transformed series can be declared to be short-memory and stationary:

$$S_t = \Psi(B) Y_t = Y_t + \psi_1 Y_{t-1} + \ldots + \psi_k Y_{t-k}. \tag{4}$$

The autoregressive model fitted to the mean-corrected series $X_t = S_t - \bar{S}$, $t = k+1, \ldots, n$, where $\bar{S}$ represents the sample mean for $S_{k+1}, \ldots, S_n$, is given by:

$$\phi(B) X_t = Z_t, \tag{5}$$

where $\phi(B) = 1 - \phi_1 B - \phi_{l_1} B^{l_1} - \phi_{l_2} B^{l_2} - \phi_{l_3} B^{l_3}$ and $\{Z_t\} \sim WN(0, \sigma^2)$, while the coefficients $\phi_j$ and the variance $\sigma^2$ are calculated using the Yule-Walker equations described in [14]. From (4) and (5) we obtain the relationship:

$$\xi(B) Y_t = \phi(1) \bar{S} + Z_t, \tag{6}$$

where $\xi(B) = \Psi(B)\phi(B) = 1 + \xi_1 B + \ldots + \xi_{k+l_3} B^{k+l_3}$. From the following recursion relation we can determine the linear predictors $P_n Y_{n+h}$ for $n > k + l_3$:

$$P_n Y_{n+h} = -\sum_{j=1}^{k+l_3} \xi_j P_n Y_{n+h-j} + \phi(1) \bar{S}, \quad h \geq 1, \tag{7}$$
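As a rough illustration of recursion (7), the sketch below computes the h-step predictors in Python. The function name `arar_forecast` and its arguments are hypothetical, the coefficients $\xi_j$ and the constant $\phi(1)\bar{S}$ are assumed to have been fitted beforehand, and the initial condition $P_n Y_t = Y_t$ for $t \leq n$ is the usual convention for this recursion rather than something stated in the text above.

```python
def arar_forecast(y, xi, phi_at_one, s_bar, horizon):
    """Return [P_n Y_{n+1}, ..., P_n Y_{n+horizon}] using recursion (7).

    y          -- observed series Y_1..Y_n (most recent value last)
    xi         -- [xi_1, ..., xi_m] with m = k + l3 (assumed already fitted)
    phi_at_one -- phi(1), the AR polynomial evaluated at B = 1
    s_bar      -- sample mean of the memory-shortened series
    """
    m = len(xi)
    if len(y) <= m:
        raise ValueError("recursion (7) requires n > k + l3 observations")

    # Assumed initial condition: P_n Y_t = Y_t for every observed t <= n.
    values = list(y)
    constant = phi_at_one * s_bar
    for _ in range(horizon):
        # values[-j] is P_n Y_{n+h-j}: an observation or an earlier forecast.
        weighted = sum(xi[j - 1] * values[-j] for j in range(1, m + 1))
        values.append(-weighted + constant)
    return values[len(y):]


# Toy coefficients chosen only to make the arithmetic easy to follow.
forecasts = arar_forecast([3.0, 4.0], xi=[-0.5], phi_at_one=0.5, s_bar=2.0, horizon=3)
```

With these toy values each forecast is half the previous value plus one, so the predictions decay from the last observation towards the fixed point 2.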
[Figure: (a) Original training data; (b) Decomposition level 1; (c) Decomposition level 2; (d) Testing data]

[Figure: panels including a quantile plot; axes: X Quantiles, Y Quantiles]
TABLE I
PERFORMANCE METRICS FOR TRADITIONAL FORECASTING MODELS

Method   MSE      NMSE     MAPE     r        E
ARMA     0.3864   1.2281   19.09%   -0.521   -0.293
ARAR     0.3068   0.975    20.08%   0.254    -0.0263
HW       0.269    0.855    20.73%   0.8935   0.2979
HWS      0.2112   0.671    19.69%   0.6933   0.3752

... the highest E. Fig. 6(a) illustrates the values predicted with the HWS algorithm, compared to the observed data. Although this linear predictor has the best accuracy among the traditional models, the predicted data does not follow the evolution of the target values. Fig. 6(b) shows the histogram of prediction errors. The histogram lets us investigate the distribution of errors (i.e. the difference between the actual and the forecasted values): the narrower it is, the better the prediction accuracy. The errors are in the range $[-0.7, -0.2] \cup [0.1, 0.6]$. The QQ plot (Quantile-Quantile) in Fig. 6(c) compares two probability distributions as parametric curves, the parameter being the interval for the quantile. The figure displays the quantiles of the observed values (on the x-axis) against the quantiles corresponding to the predicted values (on the y-axis). If the samples came from the same distribution, the plot would be linear. This is not the case, so we can affirm that the target values are not modeled with sufficient precision.

In order to quantify the prediction performance improvement from method a to method b in terms of MSE, the following metric is used, as in [10]:

$$\mathrm{Improvement}_{a,b} = 100\% \cdot \frac{MSE_b - MSE_a}{MSE_b}. \tag{21}$$

We denote the prediction models with the following numbers: 1 = ARMA; 2 = ARAR; 3 = HW; 4 = HWS. Table II compares the performance improvements of the traditional predictors (in terms of MSE) using (21) for each pair of predictors. The values in each row of the table can be interpreted as follows: by how much that predictor improved the prediction performance, compared to ...

... NMSE are too high and the value of the efficiency coefficient E is too low.

TABLE II
PERFORMANCE IMPROVEMENTS BETWEEN TRADITIONAL PREDICTORS

      1.        2.        3.        4.
1.    0%        -25.95%   -43.64%   -82.95%
2.    2.06%     0%        -14.03%   -45.27%
3.    30.38%    12.32%    0%        -27.37%
4.    45.34%    31.16%    21.49%    0%

B. NN Predictors

Four types of NN predictors are compared: STL (Single-Task Learning), STL with MRL (Multiresolution Learning), MTL (Multi-Task Learning) and MTL with MRL. In addition, in the case of STL we compare the performance of single-step versus multi-step prediction.

In the experiments, we use NNs with a small topology in order to reduce the overall complexity of the predictor. The order of complexity for training a single epoch is $O(n_h n_o (n_i + 1))$ [4], where $n_h$, $n_i$ and $n_o$ represent the number of hidden, input and output nodes respectively. To find the appropriate NN architecture, $n_i$ and $n_h$ were varied between 3 and 10. We achieved the best results for a feedforward NN with a 4-5-$n_o$ structure, trained with the backpropagation algorithm. The notation indicates a three-layer NN predictor having 4 input nodes, 5 hidden neurons and $n_o$ output neurons. In the case of single-task learning we have $n_o = 1$, as can be seen in Fig. 7, whereas for multi-task learning we use $n_o = 3$, as presented in Fig. 8 (the figures use a dedicated symbol for a delay element).

[Figure: NN predictor with inputs x(t), x(t-1), x(t-2), x(t-3) and output x(t+1)]
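The input/output layout of the 4-5-$n_o$ predictor from Section V.B can be sketched as a sliding window over the traffic series. This is only an illustrative sketch: the helper names `make_windows` and `epoch_cost_order` are hypothetical, and treating the three multi-task outputs as the next three samples is an assumption made for the example, since this section does not say what the additional outputs are.

```python
def make_windows(series, n_inputs=4, n_outputs=1):
    """Split a series into (input window, target) pairs.

    Each input holds n_inputs consecutive samples, oldest first, i.e.
    [x(t-3), x(t-2), x(t-1), x(t)] for the 4-input predictor.
    ASSUMPTION: the targets are simply the next n_outputs samples.
    """
    n_samples = len(series) - n_inputs - n_outputs + 1
    if n_samples < 1:
        raise ValueError("series too short for the requested window sizes")
    inputs = [list(series[i:i + n_inputs]) for i in range(n_samples)]
    targets = [
        list(series[i + n_inputs:i + n_inputs + n_outputs])
        for i in range(n_samples)
    ]
    return inputs, targets


def epoch_cost_order(n_hidden, n_outputs, n_inputs):
    """Evaluate the order expression n_h * n_o * (n_i + 1) quoted from [4].

    This is the growth term inside O(.), not an exact operation count.
    """
    return n_hidden * n_outputs * (n_inputs + 1)


trace = [1, 2, 3, 4, 5, 6, 7]  # toy values, not measured traffic
stl_inputs, stl_targets = make_windows(trace, n_inputs=4, n_outputs=1)
mtl_inputs, mtl_targets = make_windows(trace, n_inputs=4, n_outputs=3)
```

Under this expression, moving from one output to three triples the per-epoch order term for the same 4-5 input and hidden layers (25 versus 75), which gives a feel for the complexity trade-off discussed in the text.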
Fig. 9. NN Prediction: (a) STL; (b) STL + MRL; (c) MTL; (d) MTL + MRL

[Figure: panels (a) STL; (b) STL + MRL; (c) MTL; (d) MTL + MRL]

[Figure: panels (a) STL; (b) STL + MRL; (c) MTL; (d) MTL + MRL; vertical axis: Y Quantiles]
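The improvement metric in (21) is easy to check numerically. The sketch below is a minimal Python version, with the hypothetical helper names `improvement` and `improvement_table`; the MSE values are the ones reported in Tables I and IV, and the row/column convention (row = method a, column = method b) is inferred from the reported figures rather than stated explicitly.

```python
def improvement(mse_a, mse_b):
    """Percentage improvement from (21): 100 * (MSE_b - MSE_a) / MSE_b.

    Positive when method a has the lower MSE, negative otherwise.
    """
    return 100.0 * (mse_b - mse_a) / mse_b


def improvement_table(mse_by_method):
    """Pairwise improvements, indexed as table[a][b] (row a, column b)."""
    names = list(mse_by_method)
    return {
        a: {b: improvement(mse_by_method[a], mse_by_method[b]) for b in names}
        for a in names
    }


# MSE values as reported in Table I (traditional predictors).
traditional = {"ARMA": 0.3864, "ARAR": 0.3068, "HW": 0.269, "HWS": 0.2112}
table = improvement_table(traditional)

# STL + MRL (MSE from Table IV) against the best linear predictor, HWS.
stl_mrl_vs_hws = improvement(0.049385, traditional["HWS"])
```

Rounded to two decimals, `table["HWS"]["ARMA"]` gives 45.34 and `stl_mrl_vs_hws` gives 76.62, matching the figures quoted in the paper.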
TABLE IV
PERFORMANCE METRICS FOR NN PREDICTORS

Method      MSE        NMSE      MAPE     r         E
STL         0.14462    0.45965   11.08%   0.75583   0.51616
STL + MRL   0.049385   0.15696   6.92%    0.92009   0.83478
MTL         0.091859   0.29195   8.69%    0.86099   0.69269
MTL + MRL   0.050576   0.16074   7.53%    0.91699   0.8308

We use the following notation: 1 = STL; 2 = STL with MRL; 3 = MTL and 4 = MTL with MRL. Table V compares the performance improvement brought by different NN predictors. Positive values are obtained for STL with MRL. The additional computational complexity of MTL with MRL is not justified in terms of performance improvement. Compared to the best linear predictor, namely the HWS algorithm, the NN predictor involving STL with MRL brings a performance improvement of 76.62%.

TABLE V
PERFORMANCE IMPROVEMENT BETWEEN NN PREDICTORS

      1.        2.         3.        4.
1.    0%        -192.84%   -57.44%   -185.95%
2.    65.85%    0%         46.24%    2.36%
3.    36.48%    -86.01%    0%        -81.63%
4.    65.03%    -2.41%     44.94%    0%

VI. CONCLUSIONS AND FUTURE WORK

In this paper we demonstrated that traffic load prediction is possible, with a certain accuracy. The experimental results show that nonlinear traffic prediction based on NNs outperforms linear forecasting models (e.g. ARMA, ARAR, HW), which cannot meet the accuracy requirements. If we take into account both precision and complexity, the best results are obtained by the NN predictor with the multiresolution learning approach, whose predicted traffic generally coincides with the observed values. If low computational complexity is more important, then an NN predictor with multi-task learning offers a better solution because this approach is simpler and its performance is satisfactory.

As future work we plan to integrate the chosen predictor into a network management system and to evaluate it in real time. Foreseeing the immediate future with a prediction-based approach enables proactive management.

ACKNOWLEDGMENT

This work was partly supported by the PRODOC program "Project of Doctoral Studies Development in Advanced Technologies" (POSDRU/6/1.5/S/5 ID 7676). The authors would like to thank the Broadband Communications Research Group of the Universitat Politècnica de Catalunya for their support.

REFERENCES

[1] P. Cortez, M. Rio, M. Rocha, P. Sousa, "Internet Traffic Forecasting using Neural Networks", International Joint Conference on Neural Networks, pp. 2635–2642. Vancouver, Canada, 2006.
[2] Z. Li, R. Wang, J. Bi, "A Multipath Routing Algorithm Based on Traffic Prediction in Wireless Mesh Networks", Fifth International Conference on Natural Computation, Volume 6, pp. 115–119. Tianjin, China, August 2009.
[3] V. B. Dharmadhikari, J. D. Gavade, "An NN Approach for MPEG Video Traffic Prediction", 2nd International Conference on Software Technology and Engineering, pp. V1-57–V1-61. San Juan, USA, 2010.
[4] H. Feng, Y. Shu, "Study on Network Traffic Prediction Techniques", International Conference on Wireless Communications, Networking and Mobile Computing, pp. 1041–1044. Wuhan, China, 2005.
[5] G. Mao, "Real-Time Network Traffic Prediction Based on a Multiscale Decomposition", 4th International Conference on Networking, Reunion Island, France. Lecture Notes in Computer Science, Volume 3420, pp. 492–499. 2005.
[6] J. Dai, J. Li, "VBR MPEG Video Traffic Dynamic Prediction Based on the Modeling and Forecast of Time Series", Fifth International Joint Conference on INC, IMS and IDC, pp. 1752–1757. Seoul, Korea, 2009.
[7] L. Cai, J. Wang, C. Wang, L. Han, "A Novel Forwarding Algorithm over Multipath Network", International Conference on Computer Design and Applications, pp. V5-353–V5-357. Qinhuangdao, China, 2010.
[8] A. Abdennour, "Evaluation of neural network architectures for MPEG-4 video traffic prediction", IEEE Transactions on Broadcasting, Volume 52, No. 2, pp. 184–192. ISSN 0018-9316, 2006.
[9] S. Sun, "Traffic Flow Forecasting Based on Multitask Ensemble Learning", Proceedings of the first ACM/SIGEVO Summit on Genetic and Evolutionary Computation, pp. 961–964. Shanghai, China, 2009.
[10] J. Rodrigues, A. Nogueira, P. Salvador, "Improving the Traffic Prediction Capability of Neural Networks Using Sliding Window and Multi-task Learning Mechanisms", Second International Conference on Evolving Internet, pp. 1–8. Valencia, Spain, 2010.
[11] Y. Liang, "Real-Time VBR Video Traffic Prediction for Dynamic Bandwidth Allocation", IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, Volume 34, No. 1, pp. 32–47. ISSN 1094-6977, 2004.
[12] Y. Liang, X. Liang, "Improving Signal Prediction Performance of Neural Networks Through Multiresolution Learning Approach", IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, Volume 36, No. 2, pp. 341–352. ISSN 1083-4419, 2006.
[13] D.-C. Park, "Prediction of MPEG Traffic Data Using a Bilinear Recurrent Neural Network with Adaptive Training", International Conference on Computer Engineering and Technology, pp. 53–57. Singapore, 2009.
[14] P. J. Brockwell, R. A. Davis, "Introduction to Time Series and Forecasting", Second Edition. Springer-Verlag, ISBN 0-387-95351-5, 2002.
[15] Traffic measurements, http://dc-snmp.wcc.grnoc.iu.edu/i2net/#