Infocom Submission
I. Introduction
With the recent trend of data-intensive applications with pay-as-you-go execution in a cloud environment, new challenges arise in system management and design to optimize resource utilization. The types of applications deployed in a cloud can be very diverse, and some exhibit highly varying resource demands. In this paper we consider a Video on Demand (VoD) system as a relevant example of a data-intensive application where bandwidth usage varies rapidly over time.
A VoD service delivers video content to consumers on request. According to Internet usage trends, users are increasingly drawn to VoD and this enthusiasm is likely to grow. According to 2010 statistics, a popular VoD provider like Netflix accounts for around 30 percent of the peak downstream traffic in North America and is the largest source of Internet traffic overall [1]. Since VoD has stringent streaming rate requirements, each VoD provider needs to reserve a sufficient amount of server outgoing bandwidth to sustain continuous media delivery (we are not considering IP multicast here). However, resource reservation is very challenging when a video becomes popular very quickly, leading to a flood of user requests on the VoD servers. This situation, also known as a buzz, demands an adaptive resource allocation strategy to cope with the sudden (and significant) variation of workload. One example of a buzz (see Figure 1) is the video Star Wars Kid [2], interest in which grew very quickly within a short timespan. According to [3] it was viewed more than
[Fig. 1. Video server workload: time series displaying a characteristic buzz. X-axis: time (days).]
activity models to describe the usage of system resources. A limitation of these models is that they only give average results. However, dealing with mean workloads might not be sufficient to accurately describe applications because of their potential volatility. In [13] the authors proposed a maximum likelihood method for fitting a Markov arrival process (MAP) to web traffic measurements collected in commonly available HTTP web server traces. This method achieves reasonable accuracy in predictive models for web workloads, but it lacks the intuitive appeal of a gossip-based method for describing user behavior.
In [14] the authors statistically model traffic volatility in large scale VoD systems using a GARCH (generalized autoregressive conditional heteroscedasticity) process. Amazon CloudWatch follows this approach and provides a free resource monitoring service to Amazon Web Services customers at a given frequency. Based on such estimates of future demand, each VoD provider can individually reserve a sufficient amount of bandwidth to satisfy, on average, its random future demand within a reasonable confidence. However, according to the authors, this technique only models and forecasts the mean (expected) demand, whereas the real demand may vary around this predicted mean. They suggested provisioning an additional risk premium for the service providers to tolerate the demand fluctuation. In another workload model, the authors of [15], [16] proposed a Markov Modulated Poisson Process (MMPP) based approach for buzz modeling, with parameter estimation using the index of dispersion. However, the MMPP model includes only short-term memory in the system, and the obtained statistics are not physically interpretable to draw inferences about the system dynamics.
The model we derive in [4] has the following advantages:
- It follows a constructive approach, based on a Markov model;
- It is identifiable and succeeds in capturing workload volatility;
- It satisfies large deviation properties that can be exploited to frame dynamic resource allocation strategies.
III. Proposed Video on Demand (VoD) Model
Following the lines of related works, we rely on epidemic models to represent the way information spreads among the users (a gossip-like phenomenon) in a VoD system. Epidemic spreading models commonly subdivide a population into several compartments: susceptible (noted S) to designate the persons who can get infected, and contagious (noted C) for the persons who have contracted the disease. This contagious class can further be categorized into two parts: the infected subclass (I), corresponding to the persons who are currently suffering from the disease and can spread it, and the recovered class (R), for those who got cured and do not spread the disease anymore [17]. In these models, (S(t))_{t≥0}, (I(t))_{t≥0} and (R(t))_{t≥0} are stochastic processes representing the time evolution of the susceptible, infected and recovered populations respectively. In the case of a VoD system, infected I refers to the people who are currently watching the video and can pass the information along. In our setting, I(t) directly represents the current workload of the VoD server.
[Fig. 2. State transition diagram of the number of current (i) and past active (r) viewers: arrivals (i, r) → (i+1, r) at rate β(i+r)+l; completions (i, r) → (i−1, r+1); decay (i, r) → (i, r−1); the hidden state alternates between β = β1 (rate a1) and β = β2 (rate a2).]
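The constructive Markov view above can be made concrete with a small simulation. The sketch below is illustrative only: the rates it uses (arrivals at β(i+r)+l, completions at rate mu·i moving a viewer from I to R, decay of R at rate gamma·r, hidden-state switching at rates a1, a2) are our reading of the transition diagram, and the function name and the mu, gamma symbols are our own assumptions, not notation from the model.

```python
import random

def simulate_viewers(beta1, beta2, ell, mu, gamma, a1, a2, t_max, seed=0):
    """Gillespie-style simulation of the (i, r) chain: i = current
    viewers, r = past active viewers. Assumed rates: arrivals
    (i,r)->(i+1,r) at beta*(i+r)+ell, completions (i,r)->(i-1,r+1)
    at mu*i, decay (i,r)->(i,r-1) at gamma*r, and hidden-state
    switching (beta1 <-> beta2) at rates a1 and a2."""
    rng = random.Random(seed)
    t, i, r, state = 0.0, 0, 0, 1
    path = [(t, i)]
    while t < t_max:
        beta = beta1 if state == 1 else beta2
        rates = [beta * (i + r) + ell,       # a new viewer arrives
                 mu * i,                     # a viewer finishes watching
                 gamma * r,                  # a past viewer stops spreading
                 a1 if state == 1 else a2]   # hidden buzz state flips
        total = sum(rates)
        t += rng.expovariate(total)          # time to next event
        u = rng.random() * total             # pick which event occurred
        if u < rates[0]:
            i += 1
        elif u < rates[0] + rates[1]:
            i, r = i - 1, r + 1
        elif u < rates[0] + rates[1] + rates[2]:
            r -= 1
        else:
            state = 3 - state
        path.append((t, i))
    return path
```

Sampling each jump from the aggregated exponential clock is the standard way to simulate such a continuous-time Markov chain; the returned path is the workload trajectory I(t).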
We now extend these results to the case where the model may exhibit a buzz activity. As β alternates between the hidden states β = β1 and β = β2, with respective state probabilities a2/(a1 + a2) and a1/(a1 + a2), one can simply replace β in Eqs. (3) and (4) with the equivalent average value:

β̄ = β1 · a2/(a1 + a2) + β2 · a1/(a1 + a2).   (5)
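Equation (5) is simply a stationary average over the two-state hidden chain, and can be sketched in one line (the helper name is ours):

```python
def effective_beta(beta1, beta2, a1, a2):
    """Eq. (5): average propagation rate, weighting state 1 by its
    stationary probability a2/(a1+a2) and state 2 by a1/(a1+a2)."""
    return (beta1 * a2 + beta2 * a1) / (a1 + a2)
```

With a1 much smaller than a2 the chain spends most of its time in state 1, so the effective rate stays close to β1, which is the regime of rare buzzes.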
TABLE I
                    case (a)      case (b)      case (c)
                    4.762·10^4    3.225·10^5    2.439·10^5
l                   0.0032        0.0032        0.0032
                    0.0111        0.0020        0.0011
                    5·10^4        3.289·10^5    2.5·10^5
a1                  10^4          10^4          10^4
a2                  10^7          10^7          10^7
                    0.0667        0.0667        0.0667
E(i)                1.92          15.68         44.72
Emp. mean ⟨i⟩       1.74          16.72         45.23
β̂ = A_T / ∫_0^T X(t) dt,   (6)

μ̂ = D_T / ∫_0^T I(t) dt,   (7)

where A_T and D_T denote the numbers of arrivals and departures observed over [0, T], and X(t) = I(t) + R(t).
θ̂ = argmin_θ T(θ),   (10)   and   R̂ = R_θ̂.
Plots of Figure 4 show the evolution of the Kolmogorov-Smirnov distance corresponding to the traces displayed in Figure 3. In the 3 cases, T clearly attains its minimum for an estimate close to the actual value. The corresponding estimated processes R̂(t) derived from Eq. (10) match fairly well the real evolution of the (R) class in our model (see Figure 5).
Propagation parameters β and l. According to our model, the arrival rate of new viewers is given by Eq. (2). It linearly depends on the current number of active and past viewers. So, from the observation I(t) and the reconstructed process R̂(t) of Eq. (10), we could formally apply the maximum likelihood Eq. (6) to estimate β. In practice, however, we have to bear in mind that: (i) the arrival process comprises a
Fig. 3. Illustration of our model's ability to generate different dynamics of the workload I(t). See Table I for the parameter values corresponding to each of these three cases. The X-axis corresponds to time (in hours) while the Y-axis indicates the number of active viewers.
[Fig. 4. Kolmogorov-Smirnov distance as a function of the candidate parameter value (logarithmic scale), for cases (a), (b) and (c).]

Fig. 5. Evolution of the number of active past viewers: comparison of the actual (non-observable) process R(t) (blue curve) with the estimated process R̂(t) (red curve) derived from expression (8).
[Fig. 6. Empirical inverse arrival rate (λ(x))^{-1} versus x = I(t) + R̂(t), for cases (a), (b) and (c).]
negligible (less than 5% in the worst case (c)) and the variance always remains within 10% of the actual value of β1. Notice also that the estimation of β1 goes from a slight underestimation in case (a) to a slight overestimation in case (c), as the buzz effect, i.e. the value of β2, grows from trace (a) to trace (c). Compared to β̂1, the estimation of β2 behaves more poorly and proves to be the most difficult parameter to estimate. But we have to keep in mind that the latter is only based on buzz periods, which represent only a small fraction of the entire time series. Regarding the parameter determined by Eq. (10), its estimation remains within a 20% inter-quartile range, but cases (a) and (c) show a systematic bias (the median hits the lower quartile bound). Let us then recall that the procedure described by Eq. (10) selects, within some discretized interval, the value that yields the best T score. It is then very likely that the true value does not coincide with any sampled point of the interval and that the procedure therefore picks the closest one, which systematically lies beneath or above. Finally, estimation of the transition parameters a1 and a2 between the two hidden states relies on all the other parameter estimates, thereby cumulating all their relative inaccuracies. Nonetheless, and despite a systematic underestimating trend, precision remains within a very acceptable confidence interval.

Convergence rate of the empirical estimators is another important feature, binding the estimate precision to the amount of available data. Using the same data set, the bar plots of Figure 11 depict the evolution of the mean square error MSE(θ̂) = E{(θ̂ − θ)²}, where θ generically stands for any parameter of the model, with the length N of the observable time series I(t). As our purpose is to stress the rate of convergence of these quantities towards zero, to ease the comparison we normalize the MSE of each parameter by its value at maximum data length (i.e., 2^21 points here). The estimator rate of convergence then corresponds to the decaying slope of the MSE with respect to N in a log-log plot, i.e. MSE(θ̂) = O(N^{−α}). For the different parameters of our model we obtain convergence rates that lie between α = 0.9 (for β1) and α = 0.2 (for a2), each time leading to sub-optimal convergence (α < 1). It is worth noticing that, despite its relatively ad hoc construction, the estimator of β1 has an almost optimal convergence rate, which supports the soundness of our approach.
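The slope reading described above is easily automated: fit a straight line to the (log2 N, log2 MSE) points and negate its slope to obtain α. A small sketch (the helper name is ours):

```python
import math

def convergence_rate(lengths, mses):
    """Least-squares slope of log2(MSE) against log2(N); the estimator
    rate alpha is the negated slope, since MSE = O(N^-alpha)."""
    xs = [math.log2(n) for n in lengths]
    ys = [math.log2(m) for m in mses]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return -slope
```

Normalizing each MSE curve by its value at the largest N, as done in the text, shifts the log-log line vertically but leaves this slope unchanged.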
V. Model Fitting to Real Workload Data
We now apply the calibration procedure detailed in the
previous section, to fit our model on real data and to assess
the data-model adequacy in the context of highly volatile
workloads. As a paradigm for variable demand applications,
we use a VoD trace, released by the Greek Research and
Technology Network (GRNET) VoD servers [21]. Since the
trace shows modest activity with a workload level that is
not sufficient to stress-test our system, we scale up the data,
shrinking all inter-arrival times by a factor of 10. The resulting
workload time series displayed in Figure 9, clearly shows two
distinct periods of steady activity before and after the time
index t = 200. We consider the sub-series on both sides of this
cutoff time, as two individual workload processes, referred to
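This preprocessing step is mechanical. A sketch follows, assuming the trace is given as a list of arrival timestamps starting at 0 (the helper name and this input format are our assumptions, not GRNET's):

```python
def rescale_and_split(arrival_times, shrink=10.0, cutoff=200.0):
    """Shrink all inter-arrival times by `shrink` (equivalently, divide
    timestamps starting at 0 by `shrink`), then split the rescaled
    arrivals at `cutoff` hours into two separate traces."""
    scaled = [t / shrink for t in arrival_times]
    trace1 = [t for t in scaled if t < cutoff]   # Trace I: before cutoff
    trace2 = [t for t in scaled if t >= cutoff]  # Trace II: after cutoff
    return trace1, trace2
```

Dividing cumulative timestamps by the factor is equivalent to shrinking every inter-arrival gap by that factor, which multiplies the mean arrival rate, and hence the workload level, by 10.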
Fig. 7. Box-and-Whisker plots of the relative estimation errors of the model parameters for the three different sets of prescribed parameters reported in Table I. For each case (a)-(c), statistics are computed over 10 independent realizations of time series of length 2^21 points.
Fig. 8. Evolution of the Mean Square Error versus the data length N in a log-log plot. For the sake of conciseness, we only show here the results corresponding to case (b) of Table I; the corresponding figures for the other two cases are deferred to the Appendix.
TABLE II
            Trace I        Trace II
β̂1          0.0013         0.0049
β̂2          0.0084         0.0183
            0.0039         0.0118
            0.0028         0.0095
l̂           0.0032         0.0005
â1          3.13·10^4      1.32·10^5
â2          0.022          0.041
Fig. 10. Comparison of the empirical steady-state distribution (left) and of the autocorrelation function (right) of the real (blue curves) and fitted (red curves) traces. The top two plots correspond to Trace I and the bottom plots to Trace II.
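Both goodness-of-fit views of Figure 10 can be reproduced from any workload series with two standard empirical estimators (plain-Python sketch; the function names are ours):

```python
def steady_state_distribution(trace):
    """Empirical distribution of the workload values in `trace`."""
    counts = {}
    for v in trace:
        counts[v] = counts.get(v, 0) + 1
    n = len(trace)
    return {v: c / n for v, c in sorted(counts.items())}

def autocorrelation(trace, max_lag):
    """Empirical autocorrelation function of `trace` up to `max_lag`."""
    n = len(trace)
    mean = sum(trace) / n
    var = sum((x - mean) ** 2 for x in trace) / n
    acf = []
    for lag in range(max_lag + 1):
        cov = sum((trace[k] - mean) * (trace[k + lag] - mean)
                  for k in range(n - lag)) / n
        acf.append(cov / var)
    return acf
```

Comparing these two summaries between the real trace and a trace simulated with the fitted parameters is exactly the adequacy check performed in Figure 10: the distribution captures the steady-state levels, while the autocorrelation captures the memory of the workload.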
VI. Conclusion
Fig. 9. Real workload time series corresponding to a VoD server demand from [21]. The initial trace was scaled up by a factor of 10 to increase the mean workload. The trace is chopped into two separate processes (Trace I and Trace II) corresponding to different activity levels.
References
[19] S.R. Jammalamadaka and E. Taufer, "Testing exponentiality by comparing the empirical distribution function of the normalized spacings with that of the original data," J. Nonparametric Statistics, vol. 15, pp. 719-729, 2003.
[20] J. Kleinberg, "Bursty and hierarchical structure in streams," in Proc. 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2002.
[21] GRNET, "Video traces obtained from GRNET," 2011, http://vod.grnet.gr/.
Appendix
Acknowledgments
This work has been supported by the EU FP7 project SAIL.
Fig. 11. Evolution of the Mean Square Error versus the data length N in a log-log plot. Top: case (a); Bottom: case (c) of Table I.