2021 - Energy-Efficient VM Scheduling Based On Deep Reinforcement Learning
Article info
Article history: Received 3 February 2021; received in revised form 22 June 2021; accepted 14 July 2021; available online 17 July 2021.
Keywords: Energy efficiency; QoS guarantee; Denoising autoencoder; Q-learning; Feature learning

Abstract
Achieving data center resource optimization and QoS guarantee driven by high energy efficiency has become a research hotspot. However, QoS information sampled directly from the cloud environment will inevitably be affected by a small amount of structured noise. This paper proposes a deep reinforcement learning model based on QoS feature learning to optimize data center resource scheduling. In the deep learning stage, we propose a QoS feature learning method based on improved stacked denoising autoencoders to extract more robust QoS characteristic information. In the reinforcement learning stage, we propose a collaborative resource scheduling algorithm for multiple physical machines (PMs) based on reinforcement learning. Extensive experiments show that, compared with other strong resource scheduling strategies, our method effectively reduces the energy consumption of cloud data centers while maintaining the lowest service level agreement (SLA) violation rate, achieving a good balance between energy saving and QoS optimization.
https://doi.org/10.1016/j.future.2021.07.023
0167-739X/© 2021 Elsevier B.V. All rights reserved.
B. Wang, F. Liu and W. Lin Future Generation Computer Systems 125 (2021) 616–628
correlation coefficient to calculate the similarity between users and services to provide QoS features for the target user. However, this coefficient generally measures the linear correlation between data, which cannot sufficiently explain the similarity between objective QoS data. The second is a QoS feature learning method based on matrix decomposition, which decomposes the QoS sampling matrix into service feature vectors and user feature vectors [15,16]. However, due to the sparsity of QoS data, the information obtained from the feature vectors is not enough to provide accurate QoS features. The third is feature learning based on traditional machine learning, such as Principal Component Analysis (PCA) [17] or the random forest method [18]. These methods can deal with common feature selection problems, but they perform poorly on noisy data. The above QoS feature learning methods assume that the existing QoS sampling information is accurate and reliable. However, the sampled QoS information will inevitably be contaminated by a small amount of structured noise [19]. Therefore, it is necessary to study how to reduce the negative impact of structured noise on feature learning. This paper uses an improved stacked denoising autoencoder network to denoise the data center's QoS-related characteristic information and extract more robust QoS characteristics.

Machine learning strategies have been studied for resource scheduling and power management in cloud computing systems. Some studies [20,21] proposed resource management frameworks based on machine learning methods for workload prediction and resource allocation. However, the training of these methods tends to over-fit the particular work scenarios it sees. QoS is difficult to guarantee when unexpected scenarios were not included in the training stage, which makes these methods unsuitable for dynamically changing environments. Dynamic virtual machine (VM) consolidation [22–24] can effectively reduce energy consumption in cloud data centers. This method migrates VMs away from an overloaded physical machine (PM), relocates them, and switches idle PMs to standby mode to reduce energy consumption. Dynamic VM consolidation aims to keep PM utilization within a fixed range and avoid the performance degradation and SLA violations caused by overload. Because a cloud data center contains many physical and virtual machines, the computational complexity of finding the unique optimal solution to the VM scheduling problem is too high. To avoid the large time cost, heuristic algorithms are usually used to solve this problem [25]. However, heuristics easily fall into local minima and cannot achieve the best overall placement. In contrast, meta-heuristic algorithms are better at finding the global optimum [26,27]. However, they have too many parameters, which results in low reusability of the calculation results and makes it hard to tune the parameters quickly and effectively. To address these shortcomings, this paper uses a deep reinforcement learning strategy to solve the VM scheduling problem in the data center. Deep reinforcement learning combines the advantages of deep learning and reinforcement learning. This type of algorithm is self-learning and self-adaptive [28,29]; it needs few parameters and has better global search capabilities. Introducing the extracted low-noise QoS feature information into the reinforcement learning environment can improve the QoS guarantee capability of the scheduling algorithm for the data center.

In this paper, we focus on analyzing the historical QoS-related data of the data center and perform noise reduction on these data to extract more robust QoS feature information. We then use the extracted QoS feature information to improve the virtual machine scheduling algorithm based on Q-learning.

The main contributions of this paper can be summarized as follows:

(1) We propose a VM scheduling framework based on deep reinforcement learning. The framework combines deep learning and reinforcement learning. The deep neural network denoises the data center's QoS data and extracts QoS features, which enriches the state space information of the Q-learning algorithm. The Q-learning algorithm acquires knowledge of the cloud environment through the action-evaluation mechanism to improve the VM scheduling scheme and optimize the energy consumption of the entire data center with guaranteed QoS.

(2) We design a new type of deep neural network model for QoS feature learning in data centers. The model combines stacked denoising autoencoders with multilayer perceptrons (MLP). The stacked denoising autoencoders remove the structured noise in the QoS feature information to extract more robust features. The status information of the VMs on the same host is interdependent, and this dependency affects the QoS of the data center. Our MLP component further extracts deeper QoS features from the dependencies among these VMs.

(3) We propose a multi-PM collaborative VM scheduling algorithm based on MiniMax-Q learning to optimize dynamic resource scheduling in the data center. The QoS features extracted by the DNN are provided to the Q-learning algorithm as the state information in the current scheduling interval. For action selection, a non-deterministic hybrid strategy is adopted so that nodes make increasing use of existing experience in their decisions and the entire reinforcement learning process converges.

The rest of the paper is organized as follows: Section 2 discusses related work. Section 3 presents the proposed VM scheduling framework and its components. Section 4 presents an experimental analysis of the dynamic VM scheduling module. Section 5 concludes the article and discusses future work.

2. Related work

QoS guarantee has always been a key issue for energy-efficient scheduling in cloud computing. Related literature has studied QoS performance evaluation in distributed environments. The results show that system response time, reliability, throughput, security and other parameters affect the measurement of system performance. However, the QoS feature information that can be captured directly in an actual cloud environment is often incomplete. It is therefore important to extract QoS features efficiently from the historical data of the data center.

Collaborative filtering is a popular algorithm for service recommendation and feature learning. Literature [30] proposes a collaborative filtering QoS feature learning method based on support vector machines that reduces the interference caused by network dynamics; this method is not affected by data sparsity. Ren et al. [31] design a simple and effective similarity model to reduce the impact of the QoS data range, and use the neighbors obtained by JacMinMax similarity to improve the accuracy of QoS feature learning. Chen et al. [32] propose a novel collaborative filtering approach to extract feature representations; the model uses a deep neural network to capture the nonlinear characteristics of the input data. Zhu et al. [33] present an adaptive matrix factorization method to perform online QoS feature learning. The matrix factorization model has been greatly extended and has gained stronger robustness and accuracy. The article proposes a QoS prediction method based on privacy protection. Literature [34] uses a location-aware low-rank matrix factorization strategy to improve the robustness of the model,
which can effectively learn and predict QoS feature information. White et al. [35] present a stacked autoencoder with dropout that reduces the training cost compared with other traditional algorithms. The model can effectively extract the features of the original input data. However, the above QoS feature learning methods do not take into account that the original data may contain noise, which greatly affects the efficiency and reliability of feature learning. It is necessary to study how to reduce the negative impact of structured noise on feature learning before designing an energy-efficient scheduling mechanism.

Energy consumption optimization in data centers can be mainly divided into the following categories: dynamic voltage and frequency scaling (DVFS) techniques based on power models, energy-aware resource allocation heuristics, threshold-based dynamic consolidation, and cloud workload characterization and estimation. DVFS [36] is a state-of-the-art energy-saving technique for reducing the power consumption of current computer systems. It enables CPUs to run at various combinations of clock frequency and voltage based on the system performance requirements at a given time. Its limitations are that lower-priority tasks suffer and SLA violations increase because of longer response times. Energy-aware resource allocation heuristics can be divided into two categories, bio-inspired and nature-inspired algorithms [37], such as particle swarm optimization (PSO) [38], ant colony optimization (ACO) [26] and the genetic algorithm (GA) [39]. For example, Peng et al. [6] proposed an evolutionary VM allocation algorithm based on an energy-saving model for data centers. However, the final solution of these heuristics can only be chosen from a Pareto set, which implies that the selected solution might not be good. Liu et al. [40] proposed a VM consolidation algorithm that combines ELM and ACO. After the ELM predicts the overloaded hosts, the ACO constructs multiple populations and optimizes the migration plan through a local search strategy. This method reduces the cost of the pre-built migration tuples.

Energy consumption optimization based on workload characterization and estimation mainly consists of two parts: workload forecasting and VM scheduling. Tang [41] proposes a parallel improved long short-term memory (LSTM) prediction model to meet the real-time requirements of system resource management in large-scale computing systems. Zhang et al. [42] proposed a novel prediction model to handle traffic bursts and give accurate predictions. They combine the simple moving average (SMA) model and Gompertz curve fitting [43] to capture the characteristics of bursty workloads. The predictions of the workload forecasting model directly affect the performance of the VM scheduling algorithm (e.g., VM scheduling algorithms based on threshold evaluation [10], overload detection [44,45] and dynamic consolidation [2,12] mechanisms). In [14], to reduce system energy consumption as much as possible, the authors use customer-level SLA requirements to deploy cloud resources in a cloud computing environment and minimize the overall power consumption of the system. This research does not consider the computing power that the cloud computing system provides to users when energy consumption is minimal. Literature [15] uses an automatic server configuration system (ACES) to reduce energy consumption or increase energy efficiency while meeting the load requirements, thereby improving the system's energy efficiency. Literature [16] proposes an online algorithm that reduces the energy consumption of the data center by dynamically adjusting each central server's load; when the load is low, some servers are shut down. In this way, it reduces the overall energy consumption while achieving load balancing. Literature [9] proposes a convenient energy efficiency model that derives the maximum energy efficiency and proves the conditions under which it occurs, but the model assumes homogeneous sub-nodes and does not evaluate system QoS. Its evaluation does not consider the SLA defined by the user when using cloud computing services.

Reinforcement learning (RL) based energy management has been widely used in cloud computing systems. Qiu et al. [46] introduced a QoS-enabled load scheduling method based on RL. Chen et al. [47] proposed a prediction-enabled resource allocation scheme with a Q-learning algorithm to efficiently obtain adaptive resource allocation for cloud-based software services. However, traditional RL can handle only limited state and action spaces, and larger problems easily lead to a huge Q-table and poor convergence. Lu et al. [48] solved task offloading problems using LSTM-based deep reinforcement learning. The authors use an LSTM network layer and a candidate network to improve their RL algorithm and optimize energy consumption and latency in the mobile edge computing (MEC) environment. Literature [49] proposes a reinforcement-learning-based job scheduling scheme for cloud systems that achieves the minimum makespan under resource constraints. These studies only consider the indicators in the previously signed SLA for QoS guarantee. However, the real cloud environment is dynamic and bursty, and the traffic carried by data centers is not static. In this scenario, scheduling with static QoS constraints will still lead to SLA violations. Therefore, it is indispensable to extract QoS feature information and attributes from the historical data of the data center to refine the resource constraints in the scheduling process.

Most of the algorithms above suffer from too many parameters, low reusability of calculation results, and an inability to adapt quickly to changes in the external environment. When the load fluctuates greatly, these limitations become more significant. In this paper, we choose the MiniMax-Q algorithm to solve the VM scheduling problem in the data center. This type of algorithm is self-learning and self-adaptive [26,27]; it needs few parameters and has better global search capabilities. Bringing the extracted QoS feature information into the reinforcement learning environment can improve the QoS guarantee capability of the scheduling algorithm for the data center.

3. The proposed VM scheduling framework

In this section, we propose a VM scheduling framework based on deep reinforcement learning to solve the energy-efficient scheduling problem in data centers. We call this model SDAEM-MMQ. As shown in Fig. 1, the model is mainly composed of two modules. In the deep neural network (DNN) module, we propose an improved QoS feature extraction model based on a denoising autoencoder, which enables the model to learn more robust feature information. The status information of the VMs on the same host is interdependent, and this dependency is also a critical factor affecting the QoS of the data center. Therefore, we connect the new features extracted by each SDAE to the MLP, which further extracts deeper QoS features from the dependencies among these VMs. QoS feature extraction aims to obtain the accurate "hunger level" of the VMs on the same PM, that is, the real CPU resource demand of the current VMs.

The second part is a reinforcement learning module based on MiniMax-Q. Because the multiple physical machines in the data center influence one another, we cannot simply use the greedy strategy to update the Q value function. Due to the limited resources of the data center and frequent host activities, PMs compete with each other during VM scheduling. Therefore, the competition process at each stage can be regarded as a zero-sum game, and we adopt a MiniMax-Q learning algorithm to
solve this problem. The QoS features extracted by the DNN are provided to the Q-learning algorithm as the state information in the current scheduling interval. The scheduling action of the Q-learning algorithm on a physical machine is either to migrate or to stay. After the action is completed, the data center gives the Q-learning module experience feedback so that it can keep updating the model during exploration and exploitation.

Fig. 1. The proposed overall VM scheduling framework.

3.1. Denoising autoencoder

The basic idea of the autoencoder [50] is to make the encoding layer (hidden layer) learn the hidden features of the input data, such that the learned features can also reconstruct the original input data through the decoding layer. The autoencoder has nonlinear transformation units that extract more critical features and represent the original input better. The denoising autoencoder [51] is an improvement on the autoencoder that forces the hidden layer to learn more robust features. This technique obtains a good characterization of the input and reconstructs its corresponding original input.

In the cloud environment, the requirements of tasks submitted by different users differ: some tasks must be completed in the shortest time, while others only need to finish within a certain period. Tasks with relatively long deadlines can be assigned to VMs with weaker performance, while VMs with stronger performance give priority to tasks with short deadlines, so as to protect the interests of all users as much as possible. Traditional scheduling methods require all entities in the entire cloud computing environment to meet a single QoS constraint target, which often fails to meet the requirements of a real cloud computing environment.

We propose a QoS feature learning method based on denoising autoencoders to capture important QoS information about the current VM. As shown in Fig. 2, in addition to the conventional encoding and decoding stages, the denoising autoencoder adds noise to the original input data before encoding. This forces the encoder to remove the effects of the noise and capture the uncontaminated input, so that it learns and extracts more robust QoS features from the input data.

Let $x_i$ represent the original input data and $\tilde{x}_i$ the corrupted data after adding Gaussian noise; $W_1$ and $W_2$ represent the weights of the encoder and decoder, respectively, and $b_1$ and $b_2$ denote the bias vectors. The encoding function of the DAE maps the input to a new feature representation; the encoding process can be expressed by the following equation:

$$ h_i(\tilde{x}_i) = \mathrm{sigm}\left(W_1 \tilde{x}_i + b_1\right) \tag{1} $$

$$ \hat{\Theta} = \arg\min_{\Theta} L(\Theta) \tag{3} $$

When minimizing the loss function, we also need to consider the complexity of the model, because an overly complex model easily overfits. Therefore, model complexity participates in the training process as an additional indicator that constrains the model. To improve the generalization ability of the model and avoid overfitting, an L2-norm regularization term based on weight decay is appended to the loss function. The weight decay coefficient of this term adjusts the influence of model complexity on the loss function.

We use the cross entropy $H(x, \hat{x})$ of $x$ and $\hat{x}$ as the loss function; $\alpha$ denotes the adjustment weight coefficient and $\|W\|_2^2 = \sum_{i,j} \omega_{i,j}^2$. In this paper, $\alpha = 0.002$. The final loss function is defined as:

$$ L(\Theta) = H(x, \hat{x}) + \alpha \|W\|_2^2 \tag{4} $$

The cross entropy function $H(x, \hat{x})$ can be expressed as:

$$ H(x, \hat{x}) = -\sum_{i=1}^{D} \left[ x_i \log \hat{x}_i + (1 - x_i) \log\left(1 - \hat{x}_i\right) \right] \tag{5} $$

From Eqs. (4)–(5), the objective function of the denoising autoencoder can be written as:

$$ \hat{\Theta} = \arg\min_{\Theta} \left( -\sum_{i=1}^{D} \left[ x_i \log \hat{x}_i + (1 - x_i) \log\left(1 - \hat{x}_i\right) \right] + \alpha \sum_{i,j} \omega_{i,j}^2 \right) \tag{6} $$

The above objective function can be optimized with a quasi-Newton algorithm based on line search to estimate the parameters $\hat{\Theta} = \{\hat{W}_1, \hat{W}_2, \hat{b}_1, \hat{b}_2\}$.

3.2. Stacked denoising autoencoders with multilayer perceptron (SDAEM)

The denoising autoencoder can extract more robust QoS features from noisy data. However, a single-layer DAE has difficulty processing the complex multi-dimensional VM state information in the data center. In this subsection, we use a stacked denoising autoencoder (SDAE) constructed by stacking a series of single-layer DAEs. The SDAE extracts QoS features from the VM state information layer by layer and fully considers the dependencies between VMs.

VMs in the data center run on hosts, and the status information of a VM is also affected by the other VMs located on the same host. This dependency among co-located VMs is an important factor affecting the QoS of the data center. Therefore, we connect the new features extracted by each SDAE to the MLP (Fig. 3), which further extracts deeper QoS features from the dependencies among these VMs.
3.3. Multi-PMs collaborative VM scheduling based on MiniMax-Q

The energy efficiency of the data center is affected not only by the current PM's scheduling strategy but also by the other PMs. Moreover, each PM's reward value is different, so existing learning methods cannot be applied directly. In a real data center environment, it is impossible to define the optimal strategy, and the optimal strategy of a single PM does not represent the optimal strategy of the data center as a whole. Therefore, the VM scheduling algorithm should consider the overall energy consumption and performance of multiple PMs to balance resource scheduling.

The traditional Q-learning algorithm adopts a greedy strategy: in each scheduling slot, the node selects the largest Q value to replace the value function of the new state in the learning rule. This method only considers the current PM's own decision and ignores the impact of other host nodes on the energy efficiency of the data center. However, in a multi-host stochastic game, we cannot simply adopt a greedy strategy, because multiple PM nodes influence and restrict each other. Therefore, the Q function must be improved. The most direct idea is to approximate it with a Nash equilibrium. If the Q function is defined as the reward value of the stage game, then the strategies of the hosts form an equilibrium point if and only if the entire stochastic game model is in equilibrium. Therefore, the equilibrium point of the stochastic game can be reached by designing a reasonable Q function.
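For contrast with the multi-PM setting, the classical single-agent update described above can be sketched as follows. This is a minimal sketch: the tabular dictionary, the state labels, and the values of the learning rate `alpha` and discount `gamma` are illustrative assumptions. The bootstrap target uses only $\max_{a'} Q(s', a')$ for the current PM, which is exactly why this rule ignores the actions of the other PMs.

```python
from collections import defaultdict

def greedy_q_update(Q, s, a, r, s_next, actions, alpha=0.5, gamma=0.9):
    """Classical single-agent Q-learning step for one PM.

    The target r + gamma * max_a' Q(s', a') bootstraps from the PM's own best
    next action only; the other PMs' actions never enter the update.
    """
    best_next = max(Q[(s_next, a_next)] for a_next in actions)
    Q[(s, a)] = (1 - alpha) * Q[(s, a)] + alpha * (r + gamma * best_next)
    return Q[(s, a)]

def greedy_action(Q, s, actions):
    """Pure greedy selection: the action with the largest Q value in state s."""
    return max(actions, key=lambda a: Q[(s, a)])

# Usage with hypothetical state labels and the two scheduling actions.
Q = defaultdict(float)
actions = ["stay", "migrate"]
greedy_q_update(Q, "s0", "stay", 1.0, "s1", actions)  # 0.5 * 0 + 0.5 * (1 + 0.9 * 0) = 0.5
```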
3.3.1. State space and action space

The state space of the data center at time $t$ can be defined as:

$$ S_t = \left\langle (\xi_1, s_1), (\xi_2, s_2), \ldots, (\xi_i, s_i), \ldots, (\xi_n, s_n) \right\rangle \tag{7} $$

$$ \xi_i = \frac{M_{\mathrm{allocation}}^{i}}{M_{\mathrm{demand}}^{i}} \tag{8} $$

where $\xi_i$ represents the QoS coefficient of the $i$th PM, $M_{\mathrm{allocation}}^{i}$ represents the million instructions per second (MIPS) allocated to PM $i$ by the data center, and $M_{\mathrm{demand}}^{i}$ denotes the actual MIPS demand of PM $i$. The value of $\xi_i$ reflects the real hunger level of the VMs on PM $i$ for resources. $s_i$ represents the utilization of the $i$th PM and $n$ denotes the number of PMs. When the remaining CPU capacity of a PM can satisfy the current resource constraints, a VM can be migrated to it.

To migrate a VM to a suitable PM, this paper defines a correspondence between the action space and the set of PMs. For each VM, the action set can be described as $(0/1)_{m,n}$, which indicates whether the $m$th VM is migrated or not (stay/migrate). For example, if VM $m$ is migrated to the second PM, its action can be written as the tuple $A = \langle 0, 1, 0, 0, \ldots, 0 \rangle$.

3.3.2. Reward function

This paper takes minimizing the energy consumption of the PMs in the data center as the optimization objective, so the immediate reward is based on the total energy consumption of the current PMs. The immediate reward reflects the running status of the system and the efficiency of the VM scheduling scheme. The power model of the $n$th PM in the data center is defined as follows:

$$ P_n(\mu) = \begin{cases} P_n^{\mathrm{idle}} + \left(1 - \dfrac{P_n^{\mathrm{idle}}}{P_n^{\mathrm{max}}}\right) P_n^{\mathrm{max}}\, \mu, & \mu > 0 \\ 0, & \mu = 0 \end{cases} \tag{9} $$

where $P_n^{\mathrm{idle}}$ represents the power of the $n$th PM in the idle state, $P_n^{\mathrm{max}}$ represents its power at full load, and $\mu$ is the CPU utilization.

The CPU utilization of a PM changes over time, so the energy consumption of a PM from time $t_1$ can be expressed as follows:

$$ E_n(t_1) = \int_{t_1}^{t_1 + t} P_n(\mu(t))\, dt \tag{10} $$

To correct the deviation of the reward value caused by performance differences among PMs at each time step, this paper normalizes the energy consumption of the PMs as follows:

$$ E(t_1) = \sum_{n=1}^{N} \frac{E_n(t_1 - t)}{E_n(t_1)} \tag{11} $$

where $N$ is the number of PMs in the data center. If the strategy generated by the reinforcement learning method effectively reduces energy consumption, $E_n(t_1) \le E_n(t_1 - t)$ and $E(t_1) \ge 1$; otherwise $E(t_1) < 1$. Therefore, the larger the value of $E(t_1)$, the more effective the scheduling strategy generated by our Q-learning algorithm. The long-term reward gradually increases with the iterations of the algorithm and finally yields a VM scheduling strategy that minimizes the total energy consumption of the data center.

3.3.3. Q-learning algorithm

For the current PM $i$, the Q function is as shown in Eq. (12):

$$ Q_i(s, a_i, a_{-i}) = (1 - \alpha_t)\, Q_i(s, a_i, a_{-i}) + \alpha_t \left[ R_i + \beta V_i^{*}(s') \right] \tag{12} $$

where $\alpha_t$ is the learning rate at step $t$, $a_i$ is the action of the current PM $i$, $a_{-i}$ denotes the actions of the other PMs, $R_i$ is the immediate reward of PM $i$, $\beta$ is the discount factor, and $V_i^{*}(s')$ is the value of the next state $s'$. $Q_i(s, a)$ represents the optimal expected cumulative reward obtained by PM $i$ when all physical machines perform the joint action $a$ in state $s$.

Due to the limited idle margin of resources and the frequent activities in the data center, PMs compete with each other. Because the total resource capacity of the data center is fixed, the competition process at each stage can be regarded as a zero-sum game. For zero-sum games, the minimax equilibrium is equivalent to the Nash equilibrium. Every physical node must play a game with the other nodes when making decisions. As with the original Q-learning algorithm, if there are enough samples for each state and action, the MiniMax-Q learning algorithm also guarantees convergence.

After learning the optimal Q function, we can calculate the optimal strategy from the Q values. The most straightforward method is for the node to greedily choose, for the current state, the previously executed action with the highest reward. However, in the initial stage of resource scheduling, the PM node has no experience to draw on. To discover the most effective action, a PM needs to try more scheduling options and accumulate experience for future decisions. Therefore, resource scheduling strategies must balance exploration and exploitation. For this purpose, we use a non-deterministic hybrid strategy: actions with a high Q value are assigned a greater selection probability and actions with a low Q value a smaller one, so that all nodes have a chance to be selected.

In the ε-greedy strategy, the current best node is selected with probability $1 - \varepsilon$, and with probability $\varepsilon$ another, non-optimal available node is selected at random. Although this method is simple and efficient, it is unreasonable to treat all non-optimal nodes equally, especially in the later stages of learning when the network stabilizes. The SoftMax strategy is an improvement on ε-greedy. The probability that the PM node chooses action $a$ in state $s$ is shown in Eq. (13):

$$ P\{a \mid s\} = \frac{\exp\left(P_{\mathrm{idle}}(a)\, Q(s, a) / \psi\right)}{\sum_{a'} \exp\left(P_{\mathrm{idle}}(a')\, Q(s, a') / \psi\right)} \tag{13} $$

where $Q(s, a) = \sum_{k \in A} Q(s, a, k)\, P(s, k)$ and $\psi$ is an adjustment factor:

$$ \psi(t) = 100 \exp(-\delta t) + \mu \tag{14} $$

When $\psi$ is large, the PM is more likely to try a wider range of actions. The smaller $\psi$ is, the higher the probability of actions with a higher reward value. When $\psi$ is very small, the probability of the optimal action approaches 1, and the SoftMax strategy degenerates into a purely greedy strategy. The parameter should therefore decrease as learning progresses. In the early stage of learning, the high value of $\psi$ drives the PM to explore all actions. The value of $\psi$ then gradually decreases over time, and the PMs make more use of existing experience until the learning process stabilizes.

Fig. 4 shows the multi-PM collaborative VM scheduling framework based on Q-learning. In each scheduling cycle, the data center observes the current state of all PMs and obtains the Q-learning state vector $s$ through the SDAEM model. Next, the current PM node calculates the probability that a VM needs to be migrated out in the current state $s$ according to the Q value and the idle probability. The data center selects a PM to perform scheduling operations based on this migration probability. After scheduling is completed, the data center observes the changes in the state of the physical machines, the rewards obtained, and the decisions of competitors, and updates the Q value based on these three observations. In the initial stage, the learning rate is set to 1, which means the update takes the new estimate entirely and discards the old Q value. As learning progresses and the learning rate gradually decreases, the data center relies more on the old Q value, and the learning process gradually converges.
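The per-interval quantities of Section 3.3 can be sketched in plain Python as below. This is a minimal sketch under stated assumptions, not the authors' code: utilization $\mu$ is a fraction in [0, 1], the integral of Eq. (10) is approximated by a Riemann sum, `q_values` holds the opponent-averaged $Q(s, a)$ used in Eq. (13), the constant written $\mu$ in Eq. (14) is treated as a small positive floor `mu_floor`, and $V_i^{*}(s')$ in Eq. (12) is approximated by a pure-strategy maximin over a joint Q table, whereas full MiniMax-Q solves a linear program over mixed strategies. All function names are hypothetical.

```python
import math

def qos_coefficient(mips_allocated, mips_demand):
    """Eq. (8): xi_i = M_allocation^i / M_demand^i; xi_i < 1 means the PM's VMs are under-served."""
    return mips_allocated / mips_demand

def pm_power(mu, p_idle, p_max):
    """Eq. (9): linear power model in watts; mu is CPU utilisation in [0, 1], and mu == 0 gives 0 W."""
    if mu <= 0:
        return 0.0
    return p_idle + (1 - p_idle / p_max) * p_max * mu

def pm_energy(utilisation_samples, dt, p_idle, p_max):
    """Eq. (10) approximated by a left Riemann sum over samples taken every dt seconds (joules)."""
    return sum(pm_power(mu, p_idle, p_max) * dt for mu in utilisation_samples)

def normalised_reward(prev_energy, curr_energy):
    """Eq. (11): sum over PMs of E_n(t1 - t) / E_n(t1).

    A ratio above 1 means PM n used less energy in the current interval than in
    the previous one. Assumes every curr_energy entry is non-zero.
    """
    return sum(e_prev / e_curr for e_prev, e_curr in zip(prev_energy, curr_energy))

def psi(t, delta, mu_floor):
    """Eq. (14): 100 * exp(-delta * t) + mu_floor (the floor is written mu in the text)."""
    return 100.0 * math.exp(-delta * t) + mu_floor

def softmax_policy(q_values, p_idle, temperature):
    """Eq. (13): P{a|s} proportional to exp(P_idle(a) * Q(s, a) / psi)."""
    logits = {a: p_idle[a] * q / temperature for a, q in q_values.items()}
    top = max(logits.values())  # subtracting the max avoids overflow and leaves the result unchanged
    weights = {a: math.exp(v - top) for a, v in logits.items()}
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}

def maximin_value(q_joint, my_actions, other_actions):
    """Pure-strategy simplification of V*(s'): max over own action of min over the opponents' action.
    Full MiniMax-Q instead solves a linear program over mixed strategies."""
    return max(min(q_joint[(a, o)] for o in other_actions) for a in my_actions)

def minimax_q_update(q_old, reward, v_next, lr, beta):
    """Eq. (12): Q <- (1 - alpha_t) Q + alpha_t [R_i + beta V_i*(s')]; lr plays the role of alpha_t."""
    return (1 - lr) * q_old + lr * (reward + beta * v_next)
```

As a sanity check against the HP ProLiant G4 figures in Table 3 ($P^{\mathrm{idle}} = 86$ W, $P^{\mathrm{max}} = 117$ W), `pm_power(0.5, 86, 117)` gives about 101.5 W, close to the 102 W listed at 50% utilization. Lowering `temperature` (that is, $\psi$) in `softmax_policy` concentrates the probability mass on the action with the largest weighted Q value.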
Table 1
PM configuration used in the simulation.
PM type            CPU (MIPS)   Cores   RAM (GB)   BW (Gbit/s)
HP ProLiant G4     1860         2       4          1
HP ProLiant G5     2660         2       4          1

Table 2
VM configuration used in the simulation.
VM type                    CPU (MIPS)   RAM (GB)   BW (Mbit/s)
High-CPU medium instance   2500         0.85       100
Extra-large instance       2000         3.75       100
Small instance             1000         1.7        100
Micro instance             500          0.613      100
Table 3
The power of servers under different CPU utilization.
CPU utilization(%) Power (W )
Fig. 4. Multi-PMs collaborative VM scheduling framework based on Q-learning. HP Proliant G4 HP Proliant G5
0 86 93.7
10 89.4 97
This paper first uses a stacked denoising autoencoder to ex- 20 92.6 101
30 96 105
tract the critical QoS feature information between virtual ma- 40 99.4 110
chines in the data center. The( time complexity
) of the SDAE and 50 102 117
optimization algorithm is O nD2 + ndk , where D is the maxi- 60 105 121
mum number of units of the hidden layer, d is the dimension 70 108 125
80 112 129
of the embedding layer and k is the number of iterations. The
90 115 133
time complexity of the Q-learning algorithm is O(A · S), where A 100 117 135
is the number of actions that can be selected, S is the number
of states. The time complexity of Q-learning can be written as
O(n).
( The time complexity of SDAEM-MMQ can be written as
4.2. Experimental deployment
)
O n + nD2 + ndk .
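Table 3 reports power only at 10% utilization steps. CloudSim's SPECpower-based power models (the `PowerModelSpecPower` classes) obtain intermediate values by linear interpolation between adjacent points. Assuming the simulation here does the same, the Python sketch below interpolates the Table 3 readings and converts per-interval power into energy. `power_at` and `energy_kwh` are illustrative names, not CloudSim APIs, and the sampling interval is a hypothetical parameter.

```python
# Power draw (W) from Table 3 at 0%, 10%, ..., 100% CPU utilization.
POWER_G4 = [86, 89.4, 92.6, 96, 99.4, 102, 105, 108, 112, 115, 117]
POWER_G5 = [93.7, 97, 101, 105, 110, 117, 121, 125, 129, 133, 135]


def power_at(utilization, table):
    """Linearly interpolate the power (W) between the two nearest 10% points."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must lie in [0, 1]")
    lower = min(int(utilization * 10), 9)   # index of the point at or below u
    frac = utilization * 10 - lower         # 0 at that point, 1 at the next one
    return table[lower] + (table[lower + 1] - table[lower]) * frac


def energy_kwh(samples, interval_s, table):
    """Hypothetical helper: energy of one host from utilization samples taken
    every interval_s seconds (power held constant within each interval)."""
    joules = sum(power_at(u, table) * interval_s for u in samples)
    return joules / 3.6e6                   # 1 kWh = 3.6e6 J


print(power_at(0.5, POWER_G4), power_at(1.0, POWER_G5))  # 102.0 135.0
```

For example, a G4 host held at 50% utilization for one hour (twelve 300 s samples) draws 102 W and consumes about 0.102 kWh. Summing this quantity over all hosts gives the data center energy totals compared in the experiments.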
Table 4
The mean, confidence interval (CI) for the mean, and standard deviation (Std) of the results.
             98 World Cup                        17 UNSW
Algorithm    Mean     CI for mean       Std      Mean      CI for mean         Std

Energy (kWh)
VPBAR        14315    [14269, 14388]    50.13    1094.62   [1041.05, 1123.84]  21.22
LRR_MMT      15310    [15274, 15354]    37.74    1275.87   [1243.31, 1301.93]  16.03
DthMf        13725    [13682, 13770]    47.19    1143.75   [1105.60, 1217.61]  20.15
VMTA         13143    [13042, 13209]    69.37    1095.27   [1035.44, 1157.08]  28.7
Megh         12985    [12910, 13078]    59.21    1088.61   [1057.12, 1134.35]  23.11
EQBFD-0.1    14570    [14483, 14615]    55.4     1214.31   [1171.65, 1239.48]  20.58
EQBFD-0.3    12918    [12761, 12979]    54.08    1074.94   [1042.43, 1101.52]  21.22
SDAEM-MMQ    12501    [12450, 12584]    44.01    1048.63   [1020.19, 1079.1]   18.34

SLAV (%)
VPBAR        18.22    [17.78, 18.58]    0.228    12.82     [12.68, 13.07]      0.124
LRR_MMT      23.79    [23.52, 24.01]    0.165    15.19     [14.96, 15.28]      0.081
DthMf        21.35    [20.89, 21.64]    0.199    14.24     [14.07, 14.40]      0.102
VMTA         13.18    [12.34, 14.05]    0.376    12.18     [11.79, 12.54]      0.156
Megh         18.69    [18.32, 19.43]    0.261    11.45     [11.06, 11.78]      0.115
EQBFD-0.1    18.4     [17.98, 19.02]    0.315    9.91      [9.72, 10.45]       0.133
EQBFD-0.3    24.1     [22.90, 24.74]    0.324    18.77     [18.30, 19.06]      0.128
SDAEM-MMQ    12.31    [12.15, 12.70]    0.172    9.28      [9.14, 9.42]        0.096

Overload detection
VPBAR        10178    [10024, 10342]    102.7    8259      [8157, 8340]        77.23
LRR_MMT      10968    [10913, 11079]    59.1     9540      [9446, 9601]        49.78
DthMf        10715    [10621, 10824]    88.5     9254      [9147, 9354]        68.62
VMTA         10377    [10218, 10521]    132.3    8416      [8294, 8563]        93.61
Megh         9569     [9441, 9687]      105.2    7892      [7719, 7937]        79.75
EQBFD-0.1    9462     [9310, 9586]      114.8    8007      [7924, 8132]        81.24
EQBFD-0.3    11934    [11792, 12440]    115.1    9515      [9401, 9613]        80.59
SDAEM-MMQ    8510     [8397, 8621]      81.7     7680      [7592, 7743]        65.21
SVTAH is defined as follows:

$$\mathrm{SVTAH} = \frac{1}{M} \sum_{i=1}^{M} \frac{T_{V_i}}{T_{a_i}} \qquad (15)$$

where M denotes the number of hosts in the data center, $T_{V_i}$ denotes the total time during which the CPU of host i is fully loaded, and $T_{a_i}$ denotes the total time during which host i is active.

PDCVM (Performance Degradation Caused by VM Migration): VM migration can cause performance degradation for applications running on a VM. The performance degradation caused by VM migrations is defined as:

$$\mathrm{PDCVM} = \frac{1}{N} \sum_{j=1}^{N} \frac{pd_j}{pr_j} \qquad (16)$$

where N denotes the number of VMs, $pd_j$ denotes the performance degradation caused by VM j, and $pr_j$ denotes the CPU capacity requested by VM j. The SLA violation in a data center can be written as:

$$\mathrm{SLAV} = \mathrm{SVTAH} \times \mathrm{PDCVM} \qquad (17)$$

Throughput is the number of jobs executed per unit time. In CloudSim, throughput is the number of cloudlets executed per second, calculated as:

$$\mathrm{Throughput} = \frac{n}{\mathrm{Makespan}} \qquad (18)$$

where n is the number of cloudlets and Makespan is the execution time of all cloudlets in CloudSim.

4.4. Experiment 1: Models for comparison

To demonstrate the superiority of the proposed framework for VM scheduling, the proposed algorithm SDAEM-MMQ was compared with the following methods.

• VPBAR [58] is a VM scheduling algorithm based on the Poisson arrival rate.
• LRR_MMT [59] combines robust local regression (LRR) overload detection with the minimum migration time (MMT) VM selection strategy; 1.5 is the safety factor.
• DthMf [6] is a dynamic threshold maximum fitting algorithm that uses dynamic VM consolidation.
• VMTA [42] is a scheduling strategy based on a traffic burst workload prediction method.
• Megh [60] is an online reinforcement learning algorithm for the energy-efficient live VM migration problem.
• EQBFD [61] is a dynamic server consolidation strategy based on Markov workload self-adaptation; 0.1 and 0.3 are the QoS levels to be satisfied.

To show how the models perform on real data, Table 4 shows the results and confidence intervals of the algorithms. Fig. 5(a–c) show the experimental results on the 98 World Cup dataset. Fig. 5(a) shows how the different scheduling algorithms affect the energy consumption of the data center. On the 98 World Cup dataset, SDAEM-MMQ has the best energy-saving effect, followed by EQBFD-0.3. VPBAR adopts the traditional cloud task arrival model and energy consumption modeling method, which severely limits it in traffic burst scenarios. DthMf saves more energy than LRR_MMT because it makes full use of dynamic VM consolidation to reduce the power consumption of the PMs.

Fig. 5(b) illustrates the average SLA violation rate in the data center. EQBFD-0.3 and LRR_MMT cause a large number of SLA violations. The QoS coefficient of EQBFD-0.3 is set to the weakest guarantee, so the algorithm favors energy optimization over QoS. Furthermore, these algorithms do not consider the burstiness of the workload, which can lead to performance degradation in the data center. For DthMf, dynamic VM consolidation shuts down many PMs, which leads to a high SLA violation rate. In this case, SDAEM-MMQ achieves the best SLA performance because it obtains the real-time state and load of the current PMs in each scheduling interval. VMTA achieves the second-best result: it schedules with real-time workload prediction, which can avoid host overload in advance. Megh adopts a scheduling algorithm based on online reinforcement learning, which has a moderate optimization effect on both energy consumption and SLA. This part shows that our algorithm achieves a better trade-off between energy consumption and SLA.

Fig. 5. Performance analysis on the 98 World Cup and 17 UNSW datasets: (a) Total energy consumption of the data center (98 World Cup); (b) Average SLA violation rate of the data center (98 World Cup); (c) Total number of overload detections of the data center (98 World Cup); (d) Total energy consumption of the data center (17 UNSW); (e) Average SLA violation rate of the data center (17 UNSW); (f) Total number of overload detections of the data center (17 UNSW).
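The metrics in Eqs. (15)–(18) can be computed directly from per-host, per-VM and per-run measurements. The Python sketch below is a minimal illustration with hypothetical input values. It assumes that hosts which were never active contribute 0 to SVTAH, a case the text does not address. The function names are illustrative and not part of CloudSim.

```python
def svtah(full_load_times, active_times):
    """Eq. (15): average over the M hosts of T_Vi / T_ai.
    Hosts that were never active (T_ai = 0) contribute 0 (an assumption)."""
    m = len(active_times)
    return sum(tv / ta for tv, ta in zip(full_load_times, active_times) if ta > 0) / m


def pdcvm(degradations, requested):
    """Eq. (16): average over the N VMs of pd_j / pr_j."""
    return sum(pd / pr for pd, pr in zip(degradations, requested)) / len(requested)


def slav(svtah_value, pdcvm_value):
    """Eq. (17): combined SLA violation metric."""
    return svtah_value * pdcvm_value


def throughput(num_cloudlets, makespan_s):
    """Eq. (18): cloudlets completed per second of makespan."""
    return num_cloudlets / makespan_s


# Hypothetical measurements: two hosts, two VMs, 1000 cloudlets.
s = svtah([30.0, 0.0], [300.0, 600.0])      # (0.1 + 0.0) / 2
p = pdcvm([50.0, 100.0], [1000.0, 500.0])   # (0.05 + 0.2) / 2
# SLAV is also printed scaled by 100, i.e. as a percentage.
print(round(s, 4), round(p, 4), round(slav(s, p) * 100, 4), throughput(1000, 250.0))
# 0.05 0.125 0.625 4.0
```

Because SLAV is a product, it is small whenever either factor is small. An algorithm can therefore lower SLAV either by keeping hosts below full load (SVTAH) or by migrating less (PDCVM).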
Fig. 6. The standard deviation of the experimental results on the 98 World Cup and 17 UNSW datasets: (a) Using 98 World Cup; (b) Using 17 UNSW.
Fig. 5(c) illustrates the number of overload detections of all algorithms. The results show that DthMf turns off many idle hosts through its dynamic consolidation technique to save energy. EQBFD-0.3 and LRR_MMT provide only a weak QoS guarantee, so they have the largest numbers of overloaded hosts. The advantages of SDAEM-MMQ gradually emerge when the number of requests is high, which also reflects its suitability for traffic burst scenarios.

Fig. 5(d–f) show the experimental results on the 17 UNSW dataset. In Fig. 5(d), SDAEM-MMQ has the best energy-saving effect, followed by EQBFD-0.3 and Megh. SDAEM-MMQ is 4.7%–22% better than the other algorithms in data center energy saving.

An effective scheduling strategy reduces the occurrence of SLA violations in the data center. Fig. 5(e) shows the average SLA violation rate of the data center on the 17 UNSW dataset. EQBFD-0.3 and LRR_MMT cause the most severe SLA violations in this scenario. Their weak QoS guarantee makes it difficult for them to meet the data center SLA on a non-burst dataset. SDAEM-MMQ and VMTA try to start migration before a violation occurs, so they cause fewer SLA violations. Megh is also based on reinforcement learning, so its trade-off between energy consumption and QoS optimization is relatively balanced.

Fig. 5(f) shows that SDAEM-MMQ significantly reduces the total number of overloaded hosts in the data center, by 4.53%–26.3% compared with the other strategies. The main reason is that our algorithm schedules while guaranteeing the QoS execution time constraints, whereas the other algorithms detect a resource shortage only after the host has already been overloaded. Therefore, the proposed algorithm greatly reduces the number of overloaded hosts in the data center while guaranteeing the SLA. On this metric, SDAEM-MMQ performs very close to Megh, while it is significantly better than Megh in energy saving. From the results in Fig. 5(d–f), it can be concluded that the proposed resource management strategy performs better in both energy consumption optimization and QoS guarantee.

Table 4 records the results and confidence intervals of the algorithms. The confidence interval reflects the stability of an algorithm. We calculated the standard deviation of the experimental data for energy consumption, SLAV, and overload detection; comparing the standard deviations shows which algorithm produces more consistent results. Fig. 6 shows that LRR_MMT has the smallest standard deviation, followed by the proposed SDAEM-MMQ. LRR_MMT only uses local regression, and its logic is relatively simple; although its standard deviation is low, its results on the energy consumption and SLAV metrics are poor. Because the Q-learning method converges relatively quickly, its results are stable after convergence, so the standard deviation of SDAEM-MMQ is also low. VMTA relies on workload forecasting for scheduling, and the accuracy of workload forecasting is highly susceptible to environmental fluctuations, so its standard deviation is the largest.

The experimental results for system throughput are shown in Fig. 7. SDAEM-MMQ achieves the highest throughput on both datasets, followed by VMTA. When a PM in the data center is overloaded, throughput drops directly because the total completion time of the system becomes longer. SDAEM-MMQ leads to the fewest overloaded hosts, especially on the workload-intensive dataset (98 World Cup). In this scenario, the proposed method achieves significantly higher throughput than the others because it captures QoS information in the data center. The experimental results also confirm that SDAEM-MMQ is reliable in QoS guarantee.

4.5. Ablation study

To demonstrate the effectiveness of the proposed framework for VM scheduling, an ablation study is conducted. The proposed algorithm SDAEM-MMQ was compared with the following variants.

• AE-MMQ: SDAEM-MMQ with a normal autoencoder component.
• SDAE-MMQ: SDAEM-MMQ without the multilayer perceptrons.
• SDAEM-Q: SDAEM-MMQ with a normal Q-learning algorithm.

Fig. 7. The throughput of the data center: (a) Using 98 World Cup; (b) Using 17 UNSW.

Table 5
Results summary of the ablation study on the 98 World Cup and 17 UNSW Traffic datasets.
               98 World Cup                                  17 UNSW Traffic
Algorithm      Energy (kWh)  SLAV (%)  Overload detection    Energy (kWh)  SLAV (%)  Overload detection
AE-MMQ         12930.51      15.57     9670                  1081.45       11.96     8241
SDAE-MMQ       12916.24      13.88     9218                  1069.64       10.32     8102
SDAEM-Q        13150.83      13.25     9112                  1103.87       9.79      7988
SDAEM-MMQ      12501.12      12.31     8510                  1048.63       9.28      7680

This experiment uses the 98 World Cup and 17 UNSW Traffic datasets. Table 5 shows the experimental results of the different variants. Based on the results in Table 5, several conclusions are worth noting:

• In all cases, SDAEM-MMQ achieves the best experimental results.
• Replacing the SDAE component with a normal autoencoder or removing the multi-layer perceptron component causes the most significant degradation in QoS guarantee performance, which illustrates the key role of the SDAEM components in QoS feature extraction.
• Comparing SDAE-MMQ with SDAEM-Q shows that the MiniMax-Q algorithm contributes significantly to reducing data center energy consumption, while the deep neural network layer focuses more on QoS optimization.

The comparison of SDAEM-MMQ, SDAE-MMQ and AE-MMQ shows that the structural design of the proposed model is robust. SDAE learns the important QoS features from the historical information of the data center. A multi-layer perceptron connected to multiple encoders further extracts the characteristics of the dependencies between VMs that affect QoS, which helps the MiniMax-Q algorithm choose more effective actions.

information of QoS from the historical data of the data center more efficiently. The dependency among VMs is also a critical factor that affects the QoS of the data center. Therefore, we use an MLP to extract deeper QoS features from the dependencies among these VMs. We also proposed a dynamic resource scheduling algorithm based on Q-learning. Because the PM nodes in the environment compete for resources, we designed a competition-based Q-learning algorithm that learns the optimal resource scheduling strategy through continuous iteration. To improve the learning speed, we improved the learning strategy model so that it explores all available scheduling nodes faster and more comprehensively.

Finally, the experimental results show that the proposed SDAEM-MMQ model performs well. The SDAEM component effectively extracts the QoS feature information of the data center, and the process of learning the optimal scheduling strategy can be applied on a cloud platform. A good balance is achieved between optimizing energy consumption and QoS.

CRediT authorship contribution statement
Bin Wang: Investigation, Conceptualization, Methodology, Software, Data curation, Validation, Writing – original draft.
Fagui Liu: Resources, Writing – review & editing, Supervision.
Weiwei Lin: Formal analysis, Writing – review & editing.

Declaration of competing interest
2019B030302002, in part by the Science and Technology Major Project of Guangzhou under number 202007030006, in part by the National Natural Science Foundation of China under Grant 61772205 and Grant 61872084, in part by the Industrial Development Fund Project of Guangzhou under Project X2JSD8183470, and in part by the Engineering and Technology Research Center of Guangdong Province for Logistics Supply Chain and Internet of Things under Project GDDST[2016]176.

References

[1] M. Dabbagh, B. Hamdaoui, M. Guizani, A. Rayes, An energy-efficient VM prediction and migration framework for overcommitted clouds, IEEE Trans. Cloud Comput. 6 (4) (2016) 955–966.
[2] J.N. Witanto, H. Lim, M. Atiquzzaman, Adaptive selection of dynamic VM consolidation algorithm using neural network for cloud resource management, Future Gener. Comput. Syst. 87 (2018) 35–42.
[3] S.K. Mishra, D. Puthal, B. Sahoo, P.P. Jayaraman, S. Jun, A.Y. Zomaya, R. Ranjan, Energy-efficient VM-placement in cloud data center, Sustain. Comput. Inform. Syst. 20 (2018) 48–55.
[4] P. Delforge, America’s Data Centers are Wasting Huge Amounts of Energy, Natural Resources Defense Council (NRDC), 2014, pp. 1–5.
[5] S. Zhang, Z. Qian, Z. Luo, J. Wu, S. Lu, Burstiness-aware resource reservation for server consolidation in computing clouds, IEEE Trans. Parallel Distrib. Syst. 27 (4) (2015) 964–977.
[6] Y. Peng, D.-K. Kang, F. Al-Hazemi, C.-H. Youn, Energy and QoS aware resource allocation for heterogeneous sustainable cloud datacenters, Opt. Switch. Netw. 23 (2017) 225–240.
[7] A. Vallejo, A. Zaballos, J.M. Selga, J. Dalmau, Next-generation QoS control architectures for distribution smart grid communication networks, IEEE Commun. Mag. 50 (5) (2012) 128–134.
[8] F. Farahnakian, A. Ashraf, T. Pahikkala, P. Liljeberg, J. Plosila, I. Porres, H. Tenhunen, Using ant colony system to consolidate VMs for green cloud computing, IEEE Trans. Serv. Comput. 8 (2) (2014) 187–198.
[9] J. Wang, P. Li, K. Fang, Y. Zhou, Robust optimization for household load scheduling with uncertain parameters, Appl. Sci. 8 (4) (2018) 575.
[10] M. Ranjbari, J.A. Torkestani, A learning automata-based algorithm for energy and SLA efficient consolidation of virtual machines in cloud data centers, J. Parallel Distrib. Comput. 113 (2018) 55–62.
[11] J. Conejero, O. Rana, P. Burnap, J. Morgan, B. Caminero, C. Carrión, Analyzing Hadoop power consumption and impact on application QoS, Future Gener. Comput. Syst. 55 (2016) 213–223.
[12] Y. Sharma, W. Si, D. Sun, B. Javadi, Failure-aware energy-efficient VM consolidation in cloud computing systems, Future Gener. Comput. Syst. 94 (2019) 620–633.
[13] J. Chen, X. Wang, S. Zhao, F. Qian, Y. Zhang, Deep attention user-based collaborative filtering for recommendation, Neurocomputing 383 (2020) 57–68.
[14] W. Cai, J. Zheng, W. Pan, J. Lin, L. Li, L. Chen, X. Peng, Z. Ming, Neighborhood-enhanced transfer learning for one-class collaborative filtering, Neurocomputing 341 (2019) 80–87.
[15] Z. Khan, N. Iltaf, H. Afzal, H. Abbas, Enriching non-negative matrix factorization with contextual embeddings for recommender systems, Neurocomputing 380 (2020) 246–258.
[16] F. Zhuang, Z. Zhang, M. Qian, C. Shi, X. Xie, Q. He, Representation learning via dual-autoencoder for recommendation, Neural Netw. 90 (2017) 83–89.
[17] S. Yi, Z. Lai, Z. He, Y.-m. Cheung, Y. Liu, Joint sparse principal component analysis, Pattern Recognit. 61 (2017) 524–536.
[18] C. Zhang, J. Yan, C. Li, R. Bie, Contour detection via stacking random forest learning, Neurocomputing 275 (2018) 2702–2715.
[19] A. Liu, X. Shen, Z. Li, G. Liu, J. Xu, L. Zhao, K. Zheng, S. Shang, Differential private collaborative Web services QoS prediction, World Wide Web 22 (6) (2019) 2697–2720.
[20] S.M. Moghaddam, M. O’Sullivan, C. Walker, S.F. Piraghaj, C.P. Unsworth, Embedding individualized machine learning prediction models for energy efficient VM consolidation within Cloud data centers, Future Gener. Comput. Syst. 106 (2020) 221–233.
[21] F.J. Baldan, S. Ramirez-Gallego, C. Bergmeir, F. Herrera, J.M. Benitez, A forecasting methodology for workload forecasting in cloud systems, IEEE Trans. Cloud Comput. 6 (4) (2016) 929–941.
[22] A. Beloglazov, J. Abawajy, R. Buyya, Energy-aware resource allocation heuristics for efficient management of data centers for cloud computing, Future Gener. Comput. Syst. 28 (5) (2012) 755–768.
[23] S.Y.Z. Fard, M.R. Ahmadi, S. Adabi, A dynamic VM consolidation technique for QoS and energy consumption in cloud environment, J. Supercomput. 73 (10) (2017) 4347–4368.
[24] V.D. Reddy, G. Gangadharan, G.S.V. Rao, Energy-aware virtual machine allocation and selection in cloud data centers, Soft Comput. 23 (6) (2019) 1917–1932.
[25] B. Zeng, C. Li, Improved multi-variable grey forecasting model with a dynamic background-value coefficient and its application, Comput. Ind. Eng. 118 (2018) 278–290.
[26] A. Ashraf, I. Porres, Multi-objective dynamic virtual machine consolidation in the cloud using ant colony system, Int. J. Parallel Emergent Distrib. Syst. 33 (1) (2018) 103–120.
[27] X.-N. Shen, L.L. Minku, N. Marturi, Y.-N. Guo, Y. Han, A Q-learning-based memetic algorithm for multi-objective dynamic software project scheduling, Inform. Sci. 428 (2018) 1–29.
[28] S. Schmitt, J.J. Hudson, A. Zidek, S. Osindero, C. Doersch, W.M. Czarnecki, J.Z. Leibo, H. Kuttler, A. Zisserman, K. Simonyan, et al., Kickstarting deep reinforcement learning, 2018, arXiv preprint arXiv:1803.03835.
[29] A. Carie, M. Li, C. Liu, P. Reddy, W. Jamal, Hybrid directional CR-MAC based on Q-learning with directional power control, Future Gener. Comput. Syst. 81 (2018) 340–347.
[30] L. Ren, W. Wang, An SVM-based collaborative filtering approach for Top-N web services recommendation, Future Gener. Comput. Syst. 78 (2018) 531–543.
[31] Z. Chen, L. Shen, F. Li, Your neighbors are misunderstood: On modeling accurate similarity driven by data range to collaborative web service QoS prediction, Future Gener. Comput. Syst. 95 (2019) 404–419.
[32] W. Chen, F. Cai, H. Chen, M.D. Rijke, Joint neural collaborative filtering for recommender systems, ACM Trans. Inf. Syst. (TOIS) 37 (4) (2019) 1–30.
[33] J. Zhu, P. He, Z. Zheng, M.R. Lyu, Online QoS prediction for runtime service adaptation via adaptive matrix factorization, IEEE Trans. Parallel Distrib. Syst. 28 (10) (2017) 2911–2924.
[34] X. Zhu, X.-Y. Jing, D. Wu, Z. He, J. Cao, D. Yue, L. Wang, Similarity-maintaining privacy preservation and location-aware low-rank matrix factorization for QoS prediction based web service recommendation, IEEE Trans. Serv. Comput. (2018).
[35] G. White, A. Palade, C. Cabrera, S. Clarke, Autoencoders for QoS prediction at the edge, in: 2019 IEEE International Conference on Pervasive Computing and Communications, PerCom, IEEE, 2019, pp. 1–9.
[36] A.A. Khan, M. Zakarya, R. Khan, Energy-aware dynamic resource management in elastic cloud datacenters, Simul. Model. Pract. Theory 92 (2019) 82–99.
[37] X.-S. Yang, Nature-Inspired Optimization Algorithms, Academic Press, 2020.
[38] M. Elhoseny, A. Abdelaziz, A.S. Salama, A.M. Riad, K. Muhammad, A.K. Sangaiah, A hybrid model of internet of things and cloud computing to manage big data in health services applications, Future Gener. Comput. Syst. 86 (2018) 1383–1394.
[39] Z. Zhu, G. Zhang, M. Li, X. Liu, Evolutionary multi-objective workflow scheduling in cloud, IEEE Trans. Parallel Distrib. Syst. 27 (5) (2015) 1344–1357.
[40] F. Liu, Z. Ma, B. Wang, W. Lin, A virtual machine consolidation algorithm based on ant colony system and extreme learning machine for cloud data center, IEEE Access 8 (2019) 53–67.
[41] X. Tang, X. Liao, J. Zheng, X. Yang, Energy efficient job scheduling with workload prediction on cloud data center, Cluster Comput. 21 (3) (2018) 1581–1593.
[42] Q. Zhang, H. Chen, Y. Shen, S. Ma, H. Lu, Optimization of virtual resource management for cloud applications to cope with traffic burst, Future Gener. Comput. Syst. 58 (2016) 42–55.
[43] D. Jukić, G. Kralik, R. Scitovski, Least-squares fitting gompertz curve, J. Comput. Appl. Math. 169 (2) (2004) 359–375.
[44] A. Beloglazov, R. Buyya, Managing overloaded hosts for dynamic consolidation of virtual machines in cloud data centers under quality of service constraints, IEEE Trans. Parallel Distrib. Syst. 24 (7) (2012) 1366–1379.
[45] M. Pelikán, H. Štiková, I. Vrana, Detection of resource overload in conditions of project ambiguity, IEEE Trans. Fuzzy Syst. 25 (4) (2016) 868–877.
[46] C. Qiu, S. Cui, H. Yao, F. Xu, F.R. Yu, C. Zhao, A novel QoS-enabled load scheduling algorithm based on reinforcement learning in software-defined energy internet, Future Gener. Comput. Syst. 92 (2019) 43–51.
[47] X. Chen, F. Zhu, Z. Chen, G. Min, X. Zheng, C. Rong, Resource allocation for cloud-based software services using prediction-enabled feedback control with reinforcement learning, IEEE Trans. Cloud Comput. (2020).
[48] H. Lu, C. Gu, F. Luo, W. Ding, X. Liu, Optimization of lightweight task offloading strategy for mobile edge computing based on deep reinforcement learning, Future Gener. Comput. Syst. 102 (2020) 847–861.
[49] D. Cui, Z. Peng, W. Lin, et al., A reinforcement learning-based mixed job scheduler scheme for grid or iaas cloud, IEEE Trans. Cloud Comput. (2017).
[50] J. Jiang, W. Li, A. Dong, Q. Gou, X. Luo, A Fast Deep AutoEncoder for high-dimensional and sparse matrices in recommender systems, Neurocomputing 412 (2020) 381–391.
[51] Y. Liu, M. Zhai, J. Jin, A. Song, J. Lin, Z. Wu, Y. Zhao, Intelligent online catastrophe assessment and preventive control via a stacked denoising autoencoder, Neurocomputing 380 (2020) 306–320.
[52] M. Arlitt, T. Jin, 1998 world cup web site access logs, 1998, The Internet Traffic Archive, Sponsored By ACM SIGCOMM. http://ita.ee.lbl.gov/html/contrib/worldCup.html.
[53] A. Sivanathan, H.H. Gharakheili, F. Loi, A. Radford, C. Wijenayake, A. Vishwanath, V. Sivaraman, Classifying IoT devices in smart environments using network traffic characteristics, IEEE Trans. Mob. Comput. 18 (8) (2018) 1745–1759.
[54] S. Namasudra, Fast and secure data accessing by using DNA computing for the cloud environment, IEEE Trans. Serv. Comput. (2020).
[55] M. Jammal, H. Hawilo, A. Kanso, A. Shami, ACE: Availability-aware CloudSim extension, IEEE Trans. Netw. Serv. Manag. 15 (4) (2018) 1586–1599.
[56] R.N. Calheiros, R. Ranjan, A. Beloglazov, C.A. De Rose, R. Buyya, Cloudsim: a toolkit for modeling and simulation of cloud computing environments and evaluation of resource provisioning algorithms, Softw. - Pract. Exp. 41 (1) (2011) 23–50.
[57] C.-H. Hsu, S.W. Poole, Power signature analysis of the specpower_ssj2008 benchmark, in: IEEE International Symposium on Performance Analysis of Systems and Software, IEEE ISPASS, IEEE, 2011, pp. 227–236.
[58] M. Kowsigan, P. Balasubramanie, An efficient performance evaluation model for the resource clusters in cloud environment using continuous time Markov chain and Poisson process, Cluster Comput. 22 (5) (2019) 12411–12419.
[59] J.V. Wang, C.-T. Cheng, C.K. Tse, A thermal-aware VM consolidation mechanism with outage avoidance, Softw. - Pract. Exp. 49 (5) (2019) 906–920.
[60] D. Basu, X. Wang, Y. Hong, H. Chen, S. Bressan, Learn-as-you-go with Megh: Efficient live migration of virtual machines, IEEE Trans. Parallel Distrib. Syst. 30 (8) (2019) 1786–1801.
[61] H. Monshizadeh Naeen, E. Zeinali, A. Toroghi Haghighat, Adaptive Markov-based approach for dynamic virtual machine consolidation in cloud data centers with quality-of-service constraints, Softw. - Pract. Exp. 50 (2) (2020) 161–183.

Bin Wang received his B.S. degree from South China University of Technology, Guangzhou, China, in 2014. He is now working toward the Ph.D. degree at the School of Computer Science and Engineering, South China University of Technology, Guangzhou, China. His research interests include cloud computing, energy efficiency and data mining.

Fagui Liu received the M.S. degree from Beihang University and the Ph.D. degree from South China University of Technology in 1991 and 2006, respectively. She is currently a professor at the School of Computer Science and Engineering, South China University of Technology, China. Her current research interests include cloud computing, big data and Internet of things.

Weiwei Lin received his B.S. and M.S. degrees from Nanchang University in 2001 and 2004, respectively, and the Ph.D. degree in Computer Application from South China University of Technology in 2007. Currently, he is a professor in the School of Computer Science and Engineering, South China University of Technology. His research interests include distributed systems, cloud computing, big data computing and AI application technologies. He has published more than 80 papers in refereed journals and conference proceedings. He is a senior member of CCF.