
Journal of Network and Computer Applications 88 (2017) 50–71

Review

Load-balancing algorithms in cloud computing: A survey

Einollah Jafarnejad Ghomi a, Amir Masoud Rahmani a,⁎, Nooruldeen Nasih Qader b

a Science and Research Branch, Islamic Azad University, Tehran, Iran
b Computer Science, University of Human Development, Sulaimanyah, Iraq

ARTICLE INFO

Keywords: Cloud computing, Load balancing, Task scheduling, Hadoop MapReduce

ABSTRACT

Cloud computing is a modern paradigm to provide services through the Internet. Load balancing is a key aspect of cloud computing and avoids the situation in which some nodes become overloaded while others are idle or have little work to do. Load balancing can improve the Quality of Service (QoS) metrics, including response time, cost, throughput, performance, and resource utilization. In this paper, we study the literature on task scheduling and load-balancing algorithms and present a new classification of such algorithms, namely the Hadoop MapReduce load balancing category, the Natural Phenomena-based load balancing category, the Agent-based load balancing category, the General load balancing category, the application-oriented category, the network-aware category, and the workflow-specific category. Furthermore, we provide a review of each of these seven categories. We also provide insights into the identification of open issues and guidelines for future research.

⁎ Corresponding author.
E-mail addresses: [email protected] (E. Jafarnejad Ghomi), [email protected] (A. Masoud Rahmani), [email protected] (N. Nasih Qader).
http://dx.doi.org/10.1016/j.jnca.2017.04.007
Received 31 December 2016; Received in revised form 6 March 2017; Accepted 7 April 2017
Available online 08 April 2017
1084-8045/ © 2017 Elsevier Ltd. All rights reserved.

1. Introduction

Cloud computing is a modern technology in the computer field that provides services to clients at any time. In a cloud computing system, resources are distributed all around the world for faster servicing of clients (Dasgupta et al., 2013; Apostu et al., 2013). Clients can easily access information via various devices such as laptops, cell phones, PDAs, and tablets. Cloud computing has faced many challenges, including security, efficient load balancing, resource scheduling, scaling, QoS management, data center energy consumption, data lock-in and service availability, and performance monitoring (Kaur et al., 2014; Malladi et al., 2015). Load balancing is one of the main challenges and concerns in cloud environments (Jadeja and Modi, 2012); it is the process of assigning and reassigning the load among available resources in order to maximize throughput while minimizing cost and response time, improving performance and resource utilization, and saving energy (Singh et al., 2016; Goyal et al., 2016). Service Level Agreement (SLA) compliance and user satisfaction can be achieved by excellent load balancing techniques. Therefore, providing efficient load-balancing algorithms and mechanisms is a key to the success of cloud computing environments. Considerable research has been done in the field of load balancing and task scheduling in cloud environments. However, our studies showed that, despite the key role of load-balancing algorithms in cloud computing, especially with the advent of big data, there are few comprehensive reviews of these algorithms. First, we mention a few recent papers that have reviewed load-balancing algorithms and mechanisms in cloud environments:

• Milani and Navimipour (2016) presented a systematic review of the existing load balancing techniques. They classified the existing techniques based on different parameters. The authors compared some popular load-balancing algorithms and presented their main properties, including their advantages and disadvantages. They also addressed the challenges of these algorithms and mentioned the open issues. However, their work lacks a discussion of load balancing and task scheduling techniques in Hadoop MapReduce, which is an important issue nowadays.
• Mesbahi and Rahmani (2016) studied state-of-the-art load balancing techniques and the necessary requirements and considerations for designing and implementing suitable load-balancing algorithms for cloud environments. They presented a new classification of load balancing techniques, evaluated them based on suitable metrics, and discussed their pros and cons. They also found that recent load balancing techniques focus on energy saving. However, their work suffers from the lack of simulation of the load balancing techniques with simulator tools; in addition, a discussion of open issues and future topics that researchers should focus on is also missing.
• Kanakala et al. (2015a, 2015b) analyzed the performance of load balancing techniques in cloud computing environments. They studied several popular load-balancing algorithms and compared them based on metrics such as throughput, speed, complexity, etc. They concluded that none of the reviewed algorithms was able to perform well in all the required areas of load balancing. However, they did not mention the current trends, future work, and open issues in the field of load balancing in cloud environments.
• Ivanisenko and Radivilova (2015) studied major load-balancing algorithms in distributed systems. They classified the most used load-balancing algorithms in distributed systems, including cloud technology, cluster systems, and grid systems. They also presented a comparative analysis of different load-balancing algorithms on various efficiency indicators such as throughput, migration time, response time, etc. Their work also describes the main features of load-balancing algorithms and analyzes the advantages and drawbacks of each type of algorithm. Nevertheless, a discussion of challenges, open issues, and future trends is similarly missing.
• Farrag and Mahmoud (2015) reviewed intelligent cloud algorithms for load balancing problems, including Genetic Algorithms (GA), Ant Colony Optimization (ACO), Artificial Bee Colony (ABC), and Particle Swarm Optimization (PSO). They also proposed an implementation of the Ant Lion Optimizer (ALO) in a cloud computing environment as an efficient algorithm, which was expected to improve load balancing outcomes. The authors found that these algorithms showed better performance than traditional ones in terms of QoS, response time, and makespan. However, they did not evaluate their proposed algorithm at different scales of cloud systems by comparing its results.

To help future researchers in the field of load balancing in designing novel algorithms and mechanisms, we surveyed the literature and analyzed state-of-the-art mechanisms. Therefore, the purpose of this paper is to survey the existing techniques, describe their properties, and clarify their pros and cons. The main goals of this paper are as follows:

• Studying the existing load balancing mechanisms
• Providing a new classification of load balancing mechanisms
• Clarifying the advantages and disadvantages of the load-balancing algorithms in each class
• Outlining the key areas where new research could be done to improve the load-balancing algorithms

The rest of the paper is organized as follows. Section 2 provides a literature review for the model, metrics, policies, and taxonomy of load-balancing algorithms. Challenges in cloud-based load-balancing algorithms are explained in Section 3. Section 4 provides a relatively comprehensive review of the literature on the existing load balancing techniques and presents a new classification. Section 5 provides a discussion of the mentioned techniques and some useful statistics. Open issues are outlined in Section 6. Finally, in Section 7, we conclude our survey and provide future topics.

2. The load balancing model, metrics, and policies in literature

The model of load balancing is shown in Fig. 1 (Gupta et al., 2014), where we can see that the load balancer receives users' requests and runs load-balancing algorithms to distribute the requests among the Virtual Machines (VMs). The load balancer decides which VM should be assigned to the next request. The data center controller is in charge of task management. Tasks are submitted to the load balancer, which performs a load-balancing algorithm to assign tasks to a suitable VM. The VM manager is in charge of the VMs. Virtualization is a dominant technology in cloud computing. The main objective of virtualization is sharing expensive hardware among VMs. A VM is a software implementation of a computer that operating systems and applications can run on.

Fig. 1. The model of load balancing (Gupta et al., 2014).

VMs process the requests of the users. Users are located all around the world and their requests are submitted randomly. Requests have to be assigned to VMs for processing. Therefore, task assignment is a significant issue in cloud computing. If some VMs are overloaded while others are idle or have little work to do, QoS will decrease. With decreasing QoS, users become unsatisfied and may leave the system and never return. A hypervisor or Virtual Machine Monitor (VMM) is used to create and manage the VMs. The VMM provides four operations: multiplexing, suspension (storage), provision (resume), and live migration (Hwang et al., 2013). These operations are necessary for load balancing. In Ivanisenko and Radivilova (2015) it has been mentioned that load balancing has to consider two tasks: resource allocation and task scheduling. The result of these two tasks is high availability of resources, energy saving, increased resource utilization, reduced cost of using resources, preservation of the elasticity of cloud computing, and reduced carbon emission.

2.1. Load balancing metrics

In this subsection, we review the metrics for load balancing in cloud computing. As mentioned before, researchers have proposed several load-balancing algorithms. The literature on load balancing (e.g., Daraghmi et al., 2015; Rastogi et al., 2015; Lua et al., 2011; Randles et al., 2010; Abdolhamid et al., 2014; Abdullahi et al., 2015; Kansal et al., 2012; Milani and Navimipour, 2016) proposed metrics for evaluating load-balancing algorithms, and we summarize them as follows:

• Throughput: This metric is used to calculate the number of processes completed per unit time.
• Response time: It measures the total time that the system takes to serve a submitted task.
• Makespan: This metric is used to calculate the maximum completion time or the time when the resources are allocated to a user.
• Scalability: It is the ability of an algorithm to perform uniform load balancing in the system according to the requirements upon increasing the number of nodes. The preferred algorithm is highly scalable.
• Fault tolerance: It determines the capability of the algorithm to perform load balancing in the event of failures in some nodes or links.
• Migration time: The amount of time required to transfer a task from an overloaded node to an under-loaded one.


• Degree of imbalance: This metric measures the imbalance among
VMs.
• Performance: It measures the system efficiency after performing a
load-balancing algorithm.
• Energy consumption: It calculates the amount of energy consumed by all nodes. Load balancing helps to avoid overheating, and thereby reduces energy usage, by balancing the load across all the nodes.
• Carbon emission: It calculates the amount of carbon produced by all
resources. Load balancing has a key role in minimizing this metric
by moving loads from underloaded nodes and shutting them down.
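Several of the metrics above can be computed directly from a trace of finished tasks. The following sketch is only an illustration of how such an evaluation might look; the task-record layout (submit time, finish time, VM id) and the per-VM load map are our own assumptions, not taken from any of the surveyed works.

# Illustrative metric computation for a load-balancing experiment.
# Each task record is (submit_time, finish_time, vm_id); "loads" maps a VM
# to the total busy time assigned to it. All names here are hypothetical.

def throughput(tasks, interval):
    """Completed tasks per unit time over the observation interval."""
    return len(tasks) / interval

def avg_response_time(tasks):
    """Mean time from submission to completion."""
    return sum(finish - submit for submit, finish, _ in tasks) / len(tasks)

def makespan(tasks):
    """Completion time of the last task."""
    return max(finish for _, finish, _ in tasks)

def degree_of_imbalance(loads):
    """(max - min) / average load across VMs; 0 means perfectly balanced."""
    values = list(loads.values())
    avg = sum(values) / len(values)
    return (max(values) - min(values)) / avg

if __name__ == "__main__":
    tasks = [(0, 4, "vm1"), (1, 6, "vm2"), (2, 9, "vm1")]
    loads = {"vm1": 11, "vm2": 5}
    print(throughput(tasks, interval=10.0))   # 0.3 tasks per time unit
    print(avg_response_time(tasks))           # about 5.33
    print(makespan(tasks))                    # 9
    print(degree_of_imbalance(loads))         # 0.75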

2.2. Taxonomy of load-balancing algorithms

In this subsection, we present the existing classification of load-balancing algorithms. In some studies (Rastogi, 2015; Mishra et al., 2015; Bhatia et al., 2012), load-balancing algorithms were classified based on two factors: the state of the system and the entity that initiates the process. Algorithms based on the state of the system are classified as static and dynamic. Some static algorithms are Round Robin, the Min-Min and Max-Min algorithms, and Opportunistic Load Balancing (OLB) (Aditya et al., 2015). Examples of dynamic algorithms include Ant Colony Optimization (ACO) (Nishant et al., 2012), Honey Bee Foraging (Babu et al., 2013), and Throttled (Bhatia et al., 2012). Nearly all dynamic algorithms follow four steps (Neeraj et al., 2014; Rathore and Chana, 2013; Rathore et al., 2013):

• Load monitoring: In this step, the load and the state of the resources are monitored.
• Synchronization: In this step, the load and state information is exchanged.
• Rebalancing criteria: It is necessary to calculate a new work distribution and then make load-balancing decisions based on this new calculation.
• Task migration: In this step, the actual movement of data occurs. When the system decides to transfer a task or process, this step is run.

The characteristics of static algorithms are:

1. They decide based on a fixed rule, for example, the input load.
2. They are not flexible.
3. They need prior knowledge about the system.

The characteristics of dynamic algorithms are:

1. They decide based on the current state of the system.
2. They are flexible.
3. They improve the performance of the system.

Dynamic algorithms are divided into two classes: distributed and non-distributed. In the distributed approach, all nodes execute the dynamic load-balancing algorithm and the task of load balancing is shared among them (Rastogi et al., 2015). The interactions of the system nodes take two forms: cooperative and non-cooperative. In the cooperative form, the nodes work together to achieve a common objective, for example, to decrease the response time of all tasks. In the non-cooperative form, each node works independently to achieve a local goal, for example, to decrease the response time of a local task. Non-distributed algorithms are divided into two classes: centralized and semi-distributed. In the centralized form, a single node called the central node executes the load-balancing algorithm and is completely responsible for load balancing; the other nodes interact with the central node. In the semi-distributed approach, the nodes of the system are divided into clusters and each cluster is managed in a centralized form; the central nodes of the clusters achieve load balancing of the system.

Static algorithms are divided into two categories: optimal and sub-optimal (Neeraj et al., 2014). In optimal algorithms, the data center controller determines information about the tasks and resources, and the load balancer can make an optimal allocation in a reasonable time. If the load balancer cannot calculate an optimal decision for any reason, a sub-optimal allocation is calculated. In an approximate mechanism, the load-balancing algorithm terminates after finding a good solution, that is, it does not search the whole solution space; the solution is then evaluated by an objective function. In a heuristic mechanism, load-balancing algorithms make reasonable assumptions about tasks and resources; in this way, these algorithms make more adaptive decisions that are not limited by the assumptions. Algorithms in a sender-initiated strategy make decisions on the arrival or creation of tasks, while algorithms in a receiver-initiated strategy make load-balancing decisions on the departure of finished tasks. In a symmetric strategy, either the sender or the receiver makes load-balancing decisions (Daraghmi et al., 2015; Alakeel et al., 2010; Rathore and Channa, 2011). A state-of-the-art classification schema is shown in Fig. 2.

Fig. 2. State of the art classification of load balancing strategies.

2.3. Policies in dynamic load-balancing algorithms

As mentioned before, dynamic load-balancing algorithms use the current state of the system. For this purpose, they apply some policies (Daraghmi et al., 2015; Kanakala et al., 2014; Alakeel et al., 2010; Yahaya et al., 2011; Mukhopadhyay et al., 2010; Babu et al., 2013; Kumar and Rana, 2015). These policies are:

Transfer policy: This policy determines the conditions under which a task should be transferred from one node to another. Incoming tasks enter the transfer policy, which, based on a rule, determines whether to transfer the task or process it locally. This rule relies on the workload of each of the nodes. This policy includes task re-scheduling and task migration.

Selection policy: This policy determines which task should be transferred. It considers some factors for task selection, including the amount of overhead required for migration, the number of non-local system calls, and the execution time of the task.

Location policy: This policy determines which nodes are under-loaded and transfers tasks to them. It checks the availability of the services necessary for task migration or task rescheduling in the targeted node.

Information policy: This policy collects all information regarding the nodes in the system, and the other policies use it for making their decisions. It also determines the time when the information should be gathered.

The relationships among the different policies are as follows. Incoming tasks are intercepted by the transfer policy, which decides whether they should be transferred to a remote node for the purpose of load balancing. If the task is not eligible for transfer, it is processed locally. If the transfer policy decides that a task should be transferred, the location policy is triggered in order to find a remote node for processing the task. If a remote partner is not found, the task is processed locally; otherwise, the task is transferred to the remote node. The information policy provides the necessary information for both the transfer and location policies to assist them in making their decisions. These descriptions are summarized in Table 1.
Table 1
Summary of load balancing policies.

Transfer policy: includes task re-scheduling and task migration; decisions are based on thresholds in terms of load units.
Selection policy: factors for selecting a task to transfer are the overhead of migration, the number of remote system calls, and the execution time of the task.
Location policy: finds a suitable partner for the transferred task; checks the availability of the services necessary for migration within the partner.
Information policy: determines the time when the information about nodes has to be gathered; there are three types of information policy: (1) demand-driven policies, (2) periodic policies, and (3) state-change-driven policies.
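As a rough illustration of how the four policies above can interact, the sketch below wires a threshold-based transfer policy, a cost-based selection policy, and a least-loaded location policy around a trivial information policy. The thresholds, node structure, and cost model are hypothetical and only meant to mirror the flow summarized in Table 1, not any specific published algorithm.

# Minimal sketch of the transfer/selection/location/information policies
# of a dynamic load balancer. All thresholds and data structures are
# illustrative assumptions.

OVERLOAD_THRESHOLD = 0.8   # transfer policy: loads above this trigger migration
UNDERLOAD_THRESHOLD = 0.4  # location policy: nodes below this may receive tasks

def information_policy(nodes):
    """Collect the current load of every node (demand-driven collection)."""
    return {name: node["load"] for name, node in nodes.items()}

def transfer_policy(load):
    """Decide whether a node should give away work."""
    return load > OVERLOAD_THRESHOLD

def selection_policy(tasks):
    """Pick the task that is cheapest to migrate (smallest migration overhead)."""
    return min(tasks, key=lambda t: t["migration_cost"])

def location_policy(loads):
    """Pick the least-loaded node, if it is genuinely under-loaded."""
    name, load = min(loads.items(), key=lambda kv: kv[1])
    return name if load < UNDERLOAD_THRESHOLD else None

def rebalance(nodes):
    loads = information_policy(nodes)
    for name, load in loads.items():
        if not transfer_policy(load) or not nodes[name]["tasks"]:
            continue                      # process locally
        target = location_policy(loads)
        if target is None or target == name:
            continue                      # no suitable remote partner found
        task = selection_policy(nodes[name]["tasks"])
        nodes[name]["tasks"].remove(task)
        nodes[target]["tasks"].append(task)   # actual task migration

nodes = {
    "n1": {"load": 0.9, "tasks": [{"id": 1, "migration_cost": 5},
                                  {"id": 2, "migration_cost": 1}]},
    "n2": {"load": 0.2, "tasks": []},
}
rebalance(nodes)
print(nodes["n2"]["tasks"])   # task 2 moved to the under-loaded node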

3. Challenges in cloud-based load balancing

A review of the literature shows that load balancing in cloud computing faces several challenges. Although the topic of load balancing has been broadly studied, judged by the load balancing metrics, the current situation is far from ideal. In this section, we review the challenges in load balancing with the aim of guiding the design of load balancing strategies in the future. Several studies have mentioned challenges for cloud-based load balancing (Palta and Jeet, 2014; Nuaimi et al., 2012; Kanakala and Reddy, 2015a, 2015b; Khiyaita et al., 2012; Ray and Sarkar, 2012; Sidhu and Kinger, 2013), including:

3.1. Virtual machine migration (time and security)

The service-on-demand nature of cloud computing implies that when there is a service request, the resources should be provided. Sometimes resources (often VMs) should be migrated from one physical server to another, possibly at a distant location. Designers of load-balancing algorithms have to consider two issues in such cases: the time of migration, which affects performance, and the probability of attacks (a security issue).

3.2. Spatially distributed nodes in a cloud

Nodes in cloud computing are distributed geographically. The challenge in this case is that load balancing algorithms should be designed so that they consider parameters such as the network bandwidth, communication speeds, the distances among nodes, and the distance between the client and the resources.

3.3. Single point of failure

As mentioned in Section 2, some of the load-balancing algorithms are centralized. In such cases, if the node executing the algorithm (the controller) fails, the whole system will crash because of that single point of failure. The challenge here is to design distributed or decentralized algorithms.

3.4. Algorithm complexity

Load-balancing algorithms should be simple in terms of implementation and operation. Complex algorithms have negative effects on the overall performance.

3.5. Emergence of small data centers in cloud computing

Small data centers are cheaper and consume less energy than large data centers. Therefore, computing resources are being distributed all around the world. The challenge here is to design load-balancing algorithms that still provide an adequate response time.

3.6. Energy management

Load-balancing algorithms should be designed to minimize the amount of energy consumed. Therefore, they should follow an energy-aware task scheduling methodology (Vasic et al., 2009). Nowadays, the electricity used by Information Technology (IT) equipment is a great concern. In 2005, the total energy consumed by IT equipment was 1% of total power usage in the world (Koomey et al., 2008). Google data centers have consumed 260 million watts of energy, which is equal to 0.01% of the world's energy [37]. Research has shown that, on average, 30% of cloud servers exploit only 10–15% of their resource capacity. Limited resource utilization increases the cost of cloud center operations and power usage (Vasic et al., 2009; Koomey et al., 2008). Due to the tendency of organizations and users to use cloud services, the installations of cloud providers will expand in the future and thus the energy usage of this industry will increase rapidly. This increase in energy usage not only increases the cost of energy but also increases carbon emission. If the number of servers in data centers reaches a threshold, their power usage can be as much as that of a city. High energy consumption has become a major concern for industry and society (Kansal et al., 2012).

What is the role of load balancing mechanisms in energy efficiency? In this section, we answer this question. Our survey of the literature (Ahmad et al., 2015; Vasic et al., 2009; Koomey et al., 2008) clarified that the development of energy-saving approaches in load balancing is under way. Load-balancing algorithms can be designed in ways that maximize the utilization of a physical server. For this purpose, they monitor the workload of servers permanently, migrate VMs from under-loaded physical servers to other servers, and force some of the servers to enter a sleep state (shrinking the set of active machines). In Vasic and Barisits (2009) it has been shown that energy efficiency reaches its peak at full utilization of a machine. Energy-efficient load balancing mechanisms have to make a certain contribution to power management too. In this way, load-balancing mechanisms are necessary for achieving green computing in a cloud. In green computing, two factors are important: energy usage reduction and carbon emission reduction.
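A common way to turn this idea into code is a simple consolidation pass: identify hosts whose utilization is below a threshold, try to move their VMs to the remaining hosts with a first-fit heuristic, and switch the emptied hosts to a sleep state. The sketch below is only illustrative; the threshold, capacity model, and data layout are our own assumptions rather than a published algorithm.

# Illustrative VM consolidation pass for energy-aware load balancing.
# hosts: name -> {"capacity": float, "vms": {vm_name: demand}}.

UNDERLOAD = 0.3   # hosts below 30% utilization are consolidation candidates

def utilization(host):
    return sum(host["vms"].values()) / host["capacity"]

def consolidate(hosts):
    asleep = []
    donors = [h for h in hosts if hosts[h]["vms"] and utilization(hosts[h]) < UNDERLOAD]
    for donor in donors:
        targets = [h for h in hosts if h != donor and h not in asleep]
        moved = {}
        for vm, demand in hosts[donor]["vms"].items():
            # first-fit: place the VM on any other active host with spare capacity
            for t in targets:
                used = sum(hosts[t]["vms"].values())
                if used + demand <= hosts[t]["capacity"]:
                    moved[vm] = t
                    hosts[t]["vms"][vm] = demand
                    break
        if len(moved) == len(hosts[donor]["vms"]):
            hosts[donor]["vms"].clear()
            asleep.append(donor)          # shrink the set of active machines
        else:
            for vm, t in moved.items():   # roll back a partial migration
                del hosts[t]["vms"][vm]
    return asleep

hosts = {
    "h1": {"capacity": 10.0, "vms": {"vm1": 1.0, "vm2": 1.0}},   # 20% utilized
    "h2": {"capacity": 10.0, "vms": {"vm3": 6.0}},
}
print(consolidate(hosts))   # ['h1'] -> its VMs now run on h2, and h1 can sleep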
4. Survey on existing load balancing mechanisms

In this section, we survey the literature on the existing mechanisms for load balancing in cloud environments. For this purpose, we studied a number of journals and conference proceedings to present a new classification of them. We have classified the existing mechanisms into seven categories:

• Hadoop MapReduce load balancing category (HMR-category in this paper)
• Natural Phenomena-based load balancing category (NPH-based in this paper)
• Agent-based load balancing category (Agent-based in this paper)
• General load balancing category (GLB-category in this paper)
• Application-oriented load balancing category (AOLB-category in this paper)
• Network-aware task scheduling and load balancing category (NATSLB-category in this paper)
• Workflow-specific scheduling algorithms category (WFSA-category in this paper)

In the next subsections, we address each category.

4.1. An introduction to Hadoop MapReduce

A large volume of data is produced daily, for example by Facebook, Twitter, Telegram, and the Web. These data sources together form big data. Hadoop is an open source framework for the storage and processing of big data on clusters of commodity machines (Hefny et al., 2014; Chethana et al., 2016; Dsouza et al., 2015). We have summarized the architecture of Hadoop in Fig. 3. Hadoop consists of two core components, namely the Hadoop Distributed File System (HDFS) for data storage and MapReduce for data processing. HDFS and MapReduce follow a master/slave architecture. The master node in HDFS is called the NameNode and the slaves or workers are called DataNodes. To store a file, HDFS splits it into fixed-size blocks (e.g., 64 MB per block) and sends them to DataNodes; the NameNode maintains the mapping of blocks to workers. In MapReduce, the master node is called the JobTracker and the slaves are called TaskTrackers. Users' jobs are delivered to the JobTracker, which is responsible for managing the jobs over the cluster and assigning tasks to TaskTrackers. MapReduce provides two interfaces called Map and Reduce for parallel processing. In general, the Map and Reduce functions divide the data that they operate on for load balancing purposes (Sui et al., 2011). A TaskTracker executes each map and reduce task in a corresponding slot. Nodes in Hadoop are spread over racks contained in one or several servers.

Fig. 3. The architecture of Hadoop.
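To make the Map and Reduce interfaces concrete, the following self-contained sketch emulates the programming model in memory rather than using the Hadoop API: a map function emits key/value pairs, the framework groups them by key (the shuffle), and a reduce function aggregates each group. The word-count example is our own illustration, not taken from the surveyed papers.

# In-memory emulation of the MapReduce programming model (word count).
from collections import defaultdict

def map_fn(_, line):
    """Map: emit (word, 1) for every word in an input line."""
    for word in line.split():
        yield word.lower(), 1

def reduce_fn(word, counts):
    """Reduce: sum the partial counts of one word."""
    yield word, sum(counts)

def run_mapreduce(records, map_fn, reduce_fn):
    groups = defaultdict(list)                 # shuffle: group values by key
    for key, value in records:
        for k, v in map_fn(key, value):
            groups[k].append(v)
    output = []
    for k, values in groups.items():           # reduce each group independently
        output.extend(reduce_fn(k, values))
    return dict(output)

lines = [(0, "load balancing in the cloud"), (1, "the cloud")]
print(run_mapreduce(lines, map_fn, reduce_fn))
# {'load': 1, 'balancing': 1, 'in': 1, 'the': 2, 'cloud': 2}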
4.1.1. Load balancing schedulers in Hadoop

Hadoop simplifies cluster programming, as it takes care of load balancing, parallelization, task scheduling, and fault tolerance automatically (Chethana et al., 2016; Vaidya et al., 2012; Rao et al., 2011). In other words, MapReduce, as the Google privacy strategy, hides the details of parallelization and distribution. Scheduling in Hadoop MapReduce happens at two levels: the job level and the task level (Dsouza et al., 2015). In job-level scheduling, jobs are selected from a job queue (based on a scheduling strategy); in task-level scheduling, the tasks of the selected job are scheduled. Scheduling strategies decide when and to which machine a task is to be transferred for processing (load balancing). Hadoop uses the First-In-First-Out (FIFO) strategy as its default scheduler, but new scheduling algorithms can be plugged in. The scheduler is a pluggable module in Hadoop, and users can design their own dispatchers according to their actual application requirements (Khalil et al., 2013). Researchers have developed several scheduling algorithms for the MapReduce environment that contribute to load balancing (Manjaly et al., 2013; Patel et al., 2015; Dagli et al., 2014; Selv et al., 2016). In addition, several load-balancing algorithms have been developed as plugins to the standard MapReduce component of Hadoop. As mentioned before, any strategy used for an even load distribution among processing nodes is called load balancing. The main purpose of load balancing is to keep all processing nodes in use as much as possible, and not to leave any resources in an idle state while other resources are overloaded. Conceptually, a load-balancing algorithm implements a mapping function between tasks and processing nodes (Destanoğlu et al., 2008). According to this definition of load balancing, scheduling algorithms perform the task of load balancing. For this reason, we first surveyed and analyzed the load balancing schedulers in Hadoop.

4.1.1.1. FIFO scheduling. FIFO is the default scheduler in Hadoop; it operates on a queue of jobs. In this scheduler, each job is divided into individual tasks that are assigned to free slots for processing (Shaikh et al., 2017; Li et al., 2015). A job dominates the whole cluster, and only after a job finishes can the next job be processed. Therefore, with this scheduler, job wait time increases, especially for short jobs, and no job can be preempted. The default FIFO job scheduler in Hadoop assumes that the submitted jobs are executed sequentially on a homogeneous cluster. However, it is very common for MapReduce to be deployed in a heterogeneous environment, where the computing and data resources are shared by multiple users and applications.

4.1.1.2. Fair scheduler. Facebook developed the fair scheduler (Zaharia et al., 2009). In this algorithm, jobs are entered into pools (multiple queues) and, in the case of multiple users, one pool is assigned to each user. The fair scheduler distributes the available resources among the pools and tries to give each user a fair share of the cluster over time, with each pool allocated a minimum number of Map and Reduce slots. If there are free slots in an idle pool, they may be allocated to other pools, while extra capacity within a pool is shared among its jobs. In contrast to FIFO, the fair scheduler supports preemption: if a pool has not received its fair share for a long time, the scheduler will preempt tasks in pools running over capacity in order to give the slots to the pool running under capacity. In this way, a long batch job cannot block short jobs for a long time (Polato et al., 2014; Xia et al., 2011; Zaharia et al., 2008).
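One way to picture the fair scheduler's behaviour is max-min fair division of the available slots among pools: each pool gets at most what it demands, and leftover capacity is repeatedly redistributed among the pools that still want more. The sketch below is a simplified illustration of that idea only; it ignores minimum shares, weights, and preemption, which the real scheduler also handles.

# Simplified max-min fair division of task slots among pools.
def fair_shares(demands, total_slots):
    """demands: pool -> number of slots wanted; returns pool -> slots granted."""
    shares = {pool: 0 for pool in demands}
    remaining = total_slots
    active = {pool for pool, d in demands.items() if d > 0}
    while remaining > 0 and active:
        quantum = max(remaining // len(active), 1)   # equal split of what is left
        for pool in sorted(active):
            if remaining == 0:
                break
            grant = min(quantum, demands[pool] - shares[pool], remaining)
            shares[pool] += grant
            remaining -= grant
        active = {p for p in active if shares[p] < demands[p]}
    return shares

# Three users share 10 slots; the small job is not starved by the big ones.
print(fair_shares({"alice": 2, "bob": 8, "carol": 20}, total_slots=10))
# {'alice': 2, 'bob': 4, 'carol': 4}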
4.1.1.3. Capacity scheduler. Yahoo! developed the Capacity scheduler to guarantee a fair allocation of resources among a large number of cluster users (Zaharia et al., 2009). For this purpose, it uses queues with a configurable number of task slots (Map or Reduce). Available resources are assigned to queues according to their priorities. If there are free resources in some queues, they are allocated to other queues (Hefny et al., 2014; Chethana et al., 2016; Polato et al., 2014). Within a queue, the priority of jobs is determined based on the job arrival time, the class of the job, and the priority settings of users according to the Service Level Agreement (SLA). When a slot in a TaskTracker becomes free, the scheduler chooses a job with the longest waiting time from a queue with the lowest load. Therefore, the capacity scheduler enforces cluster sharing among users, rather than among jobs, as is the case in the fair scheduler (Dsouza et al., 2015; Gautam et al., 2015).
4.1.1.4. Delay scheduler. The delay scheduler is an optimization of the fair scheduler that eliminates the locality issues of the latter (Zaharia et al., 2010). Consider a scenario in which a slot becomes free and we have to select a task of the job at the head of a queue to process. It is possible that the data needed by this task does not exist on the node with the free slot; this is a locality problem. In the delay scheduler, such a task is temporarily delayed until a slot on a node with the needed data becomes free. If the delay becomes long enough, then, to avoid starvation, the non-local task is allowed to be scheduled (Manjaly et al., 2013).
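The core of delay scheduling is a small amount of per-job state: skip a job whose pending tasks have no data on the node offering a free slot, and only launch it non-locally after it has been skipped more times than a threshold. The sketch below shows this decision in isolation; the job and node structures and the skip threshold are illustrative assumptions, not Hadoop code.

# Illustrative delay-scheduling decision for one free slot on one node.
MAX_SKIPS = 3   # how long a job may wait for a data-local slot

def assign_slot(node, jobs):
    """jobs are examined in fairness order; each job is
    {"name": ..., "pending": [task, ...], "skips": int}; a task lists the
    nodes that hold its input data in task["local_nodes"]."""
    for job in jobs:
        if not job["pending"]:
            continue
        # prefer a task whose input block is stored on this node
        for task in job["pending"]:
            if node in task["local_nodes"]:
                job["pending"].remove(task)
                job["skips"] = 0
                return job["name"], task["id"], "local"
        # no local task: delay the job unless it has waited long enough
        if job["skips"] >= MAX_SKIPS:
            task = job["pending"].pop(0)
            return job["name"], task["id"], "non-local"
        job["skips"] += 1        # skip this job for now, try the next one
    return None

jobs = [{"name": "j1", "skips": 0,
         "pending": [{"id": "t1", "local_nodes": {"nodeB"}}]},
        {"name": "j2", "skips": 0,
         "pending": [{"id": "t2", "local_nodes": {"nodeA"}}]}]
print(assign_slot("nodeA", jobs))   # ('j2', 't2', 'local'); j1 is delayed once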

4.1.1.5. Longest Approximate Time to End (LATE). The LATE scheduler was developed to improve job response time on Hadoop in heterogeneous environments (Lee et al., 2011). Some tasks may progress slowly due to high CPU load, race conditions, temporary slowdowns caused by background processes, or slow background processes. These tasks are called speculative tasks. The LATE scheduler tries to find a slow task and execute an equivalent backup task on another node. This execution is called speculative execution. If the new copy of the task executes faster, the whole job performance will improve. The LATE scheduler assigns priorities to slow or failed tasks for speculative execution and then selects the fastest nodes for that speculative execution. LATE scheduling improves the response time of Hadoop in heterogeneous environments.
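The heart of this approach is estimating each running task's time to completion from its progress score and progress rate; the task expected to finish farthest in the future becomes the candidate for a speculative backup copy on a fast node. The following sketch reproduces only that estimate as an illustration; the field names and the slowdown threshold are our own assumptions, not the exact LATE heuristics.

# Illustrative LATE-style straggler detection.
# Each running task: {"id", "progress" in [0, 1], "elapsed" seconds}.

def time_to_end(task):
    """Estimated remaining time = (1 - progress) / progress_rate."""
    rate = task["progress"] / task["elapsed"]
    return (1.0 - task["progress"]) / rate

def pick_speculative(tasks, slowdown_threshold=1.5):
    """Return the task with the largest estimated time to end, provided it
    is clearly slower than the average; otherwise return None."""
    estimates = {t["id"]: time_to_end(t) for t in tasks}
    avg = sum(estimates.values()) / len(estimates)
    worst = max(estimates, key=estimates.get)
    return worst if estimates[worst] > slowdown_threshold * avg else None

running = [{"id": "m1", "progress": 0.9, "elapsed": 90},
           {"id": "m2", "progress": 0.9, "elapsed": 100},
           {"id": "m3", "progress": 0.2, "elapsed": 100}]   # the straggler
print(pick_speculative(running))   # 'm3' -> launch a backup copy on a fast node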

4.1.1.6. Deadline constraint scheduler. The deadline constraint scheduler was designed to satisfy user constraints (Kc et al., 2010). The goals of this scheduler are: (1) to give users immediate feedback on whether the job can be completed within the given deadline and to proceed with the execution if the deadline can be met; otherwise, users have the option to resubmit with modified deadline requirements; and (2) to maximize the number of jobs that can be run in a cluster while satisfying the time requirements of all jobs (Dsouza et al., 2015). Experimental results showed that when the deadlines of jobs differ, the scheduler assigns a different number of tasks to a TaskTracker and makes sure that the specified deadline is met.
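A back-of-the-envelope version of the schedulability test behind such a scheduler can be written as simple arithmetic: given estimates of the per-task durations and the number of slots the cluster can dedicate to the job, check whether all map and reduce waves fit before the deadline. The formula below is a generic illustration, not the cost model of Kc et al. (2010).

# Generic deadline feasibility check for a MapReduce-style job.
import math

def waves(num_tasks, slots):
    """Number of sequential waves needed to run num_tasks on `slots` slots."""
    return math.ceil(num_tasks / slots)

def can_meet_deadline(num_maps, map_time, num_reduces, reduce_time,
                      map_slots, reduce_slots, deadline):
    """Rough estimate: maps run first, then reduces; each wave runs in parallel."""
    estimated = waves(num_maps, map_slots) * map_time + \
                waves(num_reduces, reduce_slots) * reduce_time
    return estimated <= deadline, estimated

ok, estimate = can_meet_deadline(num_maps=120, map_time=30,
                                 num_reduces=10, reduce_time=60,
                                 map_slots=40, reduce_slots=10, deadline=200)
print(ok, estimate)   # True, 150: three map waves (90s) plus one reduce wave (60s)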
We have thoroughly investigated and analyzed the scheduling algorithms in Hadoop. Our observations are summarized in Table 2. The analysis table contains the names of the algorithms proposed by researchers, the parameters that they have tried to improve, their advantages and disadvantages, and the tools through which they have simulated their experiments.

Table 2
An overview of the current load balancing schedulers in Hadoop MapReduce: the FIFO, Fair, Capacity, Delay, LATE, and Deadline Constraint schedulers, compared in terms of load balancing, resource utilization, starvation, adaptiveness, static versus dynamic job allocation, throughput, preemption, and their main advantages and disadvantages.

4.1.2. MapReduce optimization for load balancing

In this subsection, we review some of the algorithms proposed for MapReduce load balancing. In standard Hadoop MapReduce, each data file is divided into fixed-size blocks and each block has three replicas on three different DataNodes, subject to two rules: (1) no two copies are on the same DataNode, and (2) no two copies are on the same rack, provided that there are enough racks. However, in replica placement, the current load of the DataNodes is not taken into account. A built-in tool called the balancer executes repeatedly, moving data blocks from overloaded DataNodes to under-loaded ones (Lin et al., 2015). The balancer tool is used to balance an imbalanced cluster, but it would be better if we could keep the cluster as balanced as possible from the start. Furthermore, using the balancer tool for load migration consumes a lot of system resources. Therefore, several studies have tried to provide load-balancing techniques in the Hadoop environment; we review some of them here.
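The two placement rules quoted above can be expressed as a simple constraint check when choosing DataNodes for a new block. The sketch below is a toy illustration of such a selection, and it additionally prefers the least-filled DataNodes so that the cluster starts out balanced; it is not the actual HDFS placement policy, which also weighs the writer's rack locality and node load.

# Toy replica placement honouring the two rules quoted above:
# (1) no two replicas on the same DataNode, (2) no two replicas on the
# same rack, provided there are enough racks.

def place_replicas(datanodes, racks, replication=3):
    """datanodes: name -> used space; racks: name -> rack id.
    Returns the list of DataNodes chosen for one block."""
    chosen, used_racks = [], set()
    enough_racks = len(set(racks.values())) >= replication
    # prefer the least-filled DataNodes so the cluster stays balanced from the start
    for node in sorted(datanodes, key=datanodes.get):
        if node in chosen:
            continue
        if enough_racks and racks[node] in used_racks:
            continue
        chosen.append(node)
        used_racks.add(racks[node])
        if len(chosen) == replication:
            break
    return chosen

datanodes = {"d1": 10, "d2": 80, "d3": 30, "d4": 50}
racks = {"d1": "r1", "d2": "r1", "d3": "r2", "d4": "r3"}
print(place_replicas(datanodes, racks))   # ['d1', 'd3', 'd4']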
• Valvåg et al. (2009, 2011) proposed Cogset, a unified engine for static load balancing. The authors found that the loose coupling between HDFS and the MapReduce engine is the cause of poor data locality for many applications. Rather than viewing the file system and the execution engine as separate and loosely coupled components, Cogset combines them closely into a distributed storage system that supports parallel processing of data at the actual storage nodes. Cogset consists of two stages: (1) data storage is distributed over the cluster through a partitioning and replication stage, and (2) data access is achieved through a traversal stage. Due to the importance of load balancing and fault tolerance, the replication mechanism is an integral part of Cogset. The work provided a system with significantly better performance than Hadoop, in particular for small and moderate data volumes; however, it is not fully scalable.
• Ahmad et al. (2012) proposed Tarazu, a suite of optimizations of MapReduce, to address the problem of poor performance of MapReduce in heterogeneous clusters. The authors believe that the poor performance of MapReduce is due to two factors: (1) MapReduce causes excessive and bursty network communication, and (2) heterogeneity amplifies the Reduce load imbalance (Fadika et al., 2011). Tarazu consists of (1) Communication-Aware Load Balancing of Map computation (CALB) across the nodes, (2) Communication-Aware Scheduling of Map computation (CAS) to avoid bursty network traffic, and (3) Predictive Load Balancing of Reduce computation (PLB) across the nodes. The authors showed by simulation that using Tarazu significantly improves performance over traditional Hadoop MapReduce in heterogeneous clusters.
• Kolb et al. (2011) proposed a block-based load-balancing algorithm, BlockSplit, to reduce the search space of Entity Resolution (ER). ER is the task of identifying entities referring to the same real-world object. ER techniques usually compare pairs of entities by evaluating multiple similarity measures. They utilize a blocking key based on the values of one or several entity attributes to divide the input data into multiple partitions (blocks) and restrict the subsequent matching to entities of the same block. For example, it is sufficient to compare entities of the same manufacturer when matching product offers. The BlockSplit approach takes the size of the blocks into account and assigns entire blocks to reduce tasks if this does not violate the load balancing constraints. Larger blocks are split into smaller chunks based on the input partitions to enable their parallel matching within multiple Reduce tasks (Kolb et al., 2012); a simplified sketch of this block-to-reducer assignment idea is given after Table 3. The evaluation in a real cloud environment demonstrated that the proposed algorithm was robust against data skew and scaled with the number of available nodes.
• Hsueh et al. (2014) proposed a block-based load-balancing algorithm for Entity Resolution with multiple keys in MapReduce. The authors extended the BlockSplit algorithm presented in Kolb et al. (2011) by considering more than one blocking key. In their algorithm, the load distribution in the Reduce phase is more precise because an entity pair may exist in a block only when the number of common blocking keys between the pair exceeds a certain threshold (i.e., kc). Since an entity may have more than one kc key, it needs to generate all the combinations of kc keys for potential key comparisons. The proposed algorithm features combination-based blocking and load-balanced matching. Experiments using the well-known CiteSeerX digital library showed that the proposed algorithm was both scalable and efficient.
• Hou et al. (2014) proposed a dynamic load-balancing algorithm for Hadoop MapReduce. Their algorithm balances the workload on a rack, while previous works tried to balance the load between individual DataNodes. In standard MapReduce and its optimizations, there was no way for Hadoop to guarantee that higher-capability racks receive more workload than lower-capability racks; in other words, when assigning workload to DataNodes, the processing capacity was not taken into account. Their work has two novelties: (1) they concentrate on load balancing between racks, and (2) they use Software Defined Networking (SDN) to improve the data transfer. The results of simulation experiments showed that, by moving tasks from the busiest rack to a less busy one, the finish time of these tasks decreased substantially.
• Vernica et al. (2012) proposed a suite of adaptive techniques to improve MapReduce performance. The authors relax the key assumption of MapReduce that mappers run in isolation. They used an asynchronous channel called the Distributed Meta-Data Store (DMDS) to share situation information between mappers. They used these mappers, called Situation-Aware Mappers (SAMs), to make traditional MapReduce more dynamic through (1) Adaptive Mappers, (2) Adaptive Combiners, and (3) Adaptive Sampling and Partitioning. Adaptive Mappers merge small partitions into a virtual split, thus making more splits while avoiding frequent checkpointing and load imbalance (Doulkeridis et al., 2013). Adaptive Combiners perform a hash-based aggregation instead of a sort-based one. In contrast to standard MapReduce, Adaptive Sampling creates local samples dynamically, aggregates them, and produces a histogram. Adaptive Partitioning can exploit the global histogram to produce partitions of the same size for better load balancing. Although SAMs can solve the data skew problem, they cannot solve the computational skew in reducers (Shadkam et al., 2014). Experimental evaluation showed that the adaptive techniques dramatically improve MapReduce performance and especially performance stability.
• Yang and Chen (2015) proposed an adaptive task allocation scheduler to improve MapReduce performance in heterogeneous clouds. The paper improves on the original speculative execution method of Hadoop (called Hadoop Speculative) and the LATE scheduler by proposing a new scheduling scheme known as the Adaptive Task Allocation Scheduler (ATAS). The ATAS adopts more accurate methods to determine the response time and the backup tasks that affect the system, which is expected to enhance the success ratio of backup tasks and thereby effectively increase the system's ability to respond. Simulation experiments showed that the proposed ATAS scheme could effectively enhance the processing performance of MapReduce.
• Bok et al. (2016) proposed a scheduling scheme to minimize the deadline misses of jobs to which deadlines are assigned when processing large multimedia data, such as video and images, in MapReduce frameworks. The proposed scheme improves job processing speed by utilizing a replica node of the required data if a node with excessive I/O load is about to process the job; a replica node refers to another available node that holds the data block required to process the job. If no available node is found and the expected job completion time exceeds the deadline, the most non-urgent job is found and its task is temporarily suspended to speed up the completion of the urgent job. The performance evaluation showed that the proposed scheme reduced completion time and improved the deadline success ratio.
• Ghoneem and Kulkarni (2016) introduced an adaptive scheduling technique for the MapReduce scheduler to increase efficiency and performance when it is used in a heterogeneous environment. In this model, the scheduler is made aware of cluster resources and job requirements by providing it with a classification algorithm. This algorithm classifies jobs into two categories: executable and non-executable. The executable jobs are then assigned to the proper nodes so that they execute successfully, without failures that would increase the execution time of the job. This scheduler overcomes the problems of previous schedulers, such as small-job starvation and sticky nodes in the fair scheduler, and the mismatch between resources and jobs. The adaptive scheduler increases the performance of the MapReduce model in the heterogeneous environment while minimizing master node overhead and network traffic.
• Benifa and Dejey (2017) proposed a scheduling strategy named Efficient Locality and Replica-Aware Scheduling (ELRAS), integrated with an Autonomous Replication Scheme (ARS), to enhance data locality and perform consistently in heterogeneous environments. ARS autonomously decides which data objects to replicate by considering their popularity, and removes a replica when it becomes idle. The results proved the efficiency of the algorithm for heterogeneous clusters and workloads.

Now that we have reviewed some approaches to load balancing in MapReduce, it is time to investigate and analyze them. In Table 3, we have summarized our analysis. The analysis table contains the article year, authors, key ideas, main objectives, advantages and disadvantages, evaluation techniques, and the journal or conference in which the article was presented. We also show the name of the publisher.
Table 3
An overview of the current load balancing strategies for Hadoop MapReduce. For each strategy, the original table lists the year, authors, static or dynamic nature, key idea, main objective, advantages, disadvantages, evaluation technique, and publication venue; the key facts are summarized below.

2017 – Ghoneem and Kulkarni (2016), dynamic: handles the heterogeneity and scalability of Hadoop MapReduce with an efficient scheduler; evaluated by implementing the scheduler on a three-node cluster. Proceedings of the International Conference on Data Engineering and Communication Technology (Springer).
2017 – Benifa et al. (2017), dynamic: uses data locality in the scheduler to improve throughput and reduce cross-rack communication; evaluated on a heterogeneous cluster built in Amazon EC2 as a testbed. Wireless Personal Communications (Springer).
2016 – Bok et al. (2016), dynamic: employs speculative tasks and block replication to avoid deadline misses for multimedia jobs; evaluated on personal computers running Windows 7 rather than a real MapReduce environment. Multimedia Tools and Applications (Springer).
2015 – Yang and Chen (2015), dynamic: improves the original LATE scheduler with a task allocation scheduler (ATAS); evaluated on a heterogeneous cloud built from physical and virtual machines managed with VMware. Journal of Network and Computer Applications (Elsevier).
2014 – Hsueh et al. (2014), dynamic: solves the Entity Resolution problem for huge entity collections with multiple blocking keys; evaluated on a 30-node cluster containing three types of nodes. Twelfth Australian Symposium.
2014 – Hou et al. (2014), dynamic: balances the workload between racks of a Hadoop cluster by considering DataNode capability; evaluated with Mumak, Apache's Hadoop MapReduce simulator. IEEE Fourth International Conference on Big Data and Cloud Computing.
2012 – Ahmad et al. (2012), dynamic: Tarazu, a suite of MapReduce optimizations that removes the load imbalance and communication overhead of traditional MapReduce in heterogeneous clusters; evaluated on a heterogeneous cluster of 90 servers (10 Xeon-based and 80 Atom-based nodes). Proceedings of the Seventeenth International Conference on Architectural Support for Programming Languages and Operating Systems (ACM).
2012 – Vernica et al. (2012), dynamic: breaks the assumption that mappers run in isolation by using Situation-Aware Mappers; evaluated on a 42-node IBM System x iDataPlex cluster (336 cores, 168 disks). ACM International Conference on Extending Database Technology.
2011 – Kolb et al. (2011), dynamic: evenly redistributes data between map and reduce tasks using blocking techniques for entity resolution; evaluated with real-world datasets on the Amazon EC2 cloud using Hadoop. Proceedings of the 20th ACM International Conference on Information and Knowledge Management.
2009 – Valvåg et al. (2009), static: deterministic split of the input dataset into partitions (Cogset); evaluated on a cluster of 12 Dell PowerEdge machines interconnected by switched Gigabit Ethernet. Sixth IFIP International Conference on Network and Parallel Computing (IEEE).
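Before moving on, the load-balancing idea behind the block-based entity-resolution approaches reviewed in Section 4.1.2 (BlockSplit and its extensions) can be imitated with a short greedy procedure: estimate the comparison work of each block as |b|·(|b|−1)/2, split any block whose work exceeds a threshold into smaller pieces, and assign each piece to the reduce task with the least accumulated work. This is our own simplified rendering of the idea, not the authors' implementation.

# Greedy, BlockSplit-flavoured assignment of entity-resolution blocks to reducers.
import heapq

def comparisons(n):
    return n * (n - 1) // 2        # pairwise comparisons inside a block of n entities

def assign_blocks(block_sizes, num_reducers, split_threshold=1000):
    pieces = []
    for block, size in block_sizes.items():
        if comparisons(size) > split_threshold:
            # split an oversized block into two sub-blocks (BlockSplit additionally
            # schedules the cross-pairs between sub-blocks, omitted here)
            half = size // 2
            pieces += [(comparisons(half), f"{block}.0"),
                       (comparisons(size - half), f"{block}.1")]
        else:
            pieces.append((comparisons(size), block))
    # largest pieces first, each one to the currently least-loaded reducer
    pieces.sort(reverse=True)
    heap = [(0, r) for r in range(num_reducers)]
    heapq.heapify(heap)
    assignment = {r: [] for r in range(num_reducers)}
    for work, name in pieces:
        load, reducer = heapq.heappop(heap)
        assignment[reducer].append(name)
        heapq.heappush(heap, (load + work, reducer))
    return assignment

blocks = {"sony": 120, "apple": 40, "acme": 15, "misc": 8}
print(assign_blocks(blocks, num_reducers=2))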
4.2. Natural phenomena-based load balancing category

In this section, we survey several load balancing strategies that are inspired by natural phenomena or biological behavior, for example Ant Colony, Honey Bee, and Genetic algorithms.

Table 4
An overview of the current NPH-based category load balancing techniques. The original table compares, for each technique, the year, authors, key idea, main objectives, advantages, disadvantages, and evaluation technique: Yakhchi et al. (2015) use the Cuckoo Optimization Algorithm for load balancing VMs in cloud computing, Dasgupta et al. (2013) use a Genetic Algorithm for load balancing in cloud computing, Babu et al. (2013) model the foraging behavior of honey bees for load balancing of VMs, and Nishant et al. (2012) use Ant Colony Optimization for load balancing in cloud computing.

• Yakhchi et al. (2015) proposed a load balancing method for energy saving in cloud computing that simulates the life of a family of birds called cuckoos; they used the Cuckoo Optimization Algorithm (COA). Cuckoos are species of birds that do not make nests for themselves; they lay their eggs in the nests of other birds with similar eggs in order to raise their young. For this, cuckoos search for the most suitable nests to lay eggs in, so as to maximize the survival rate of their eggs (Rajabioun et al., 2011). The load balancing method proposed in the paper consists of three steps. In the first step, the COA is applied to detect over-utilized hosts. In the second step, one or more VMs are selected to migrate from the over-utilized hosts to other hosts. For this, they considered all hosts except the over-utilized ones as under-utilized hosts and attempted to migrate all their VMs to other hosts and switch them to sleep mode; if this process cannot be completed, the under-utilized host is kept active. Finally, the Minimum Migration Time (MMT) policy is used for selecting VMs from over-utilized and under-utilized hosts. The simulation results demonstrated that the proposed approach reduces energy consumption; however, the method may cause SLA violations.
• Dasgupta et al. (2013) proposed a novel load-balancing strategy using a Genetic Algorithm (GA). The algorithm tries to balance the load of the cloud infrastructure while trying to minimize the completion time of a given task set. In the paper, a GA is used as a soft computing approach that applies the mechanism of natural selection. It is a stochastic search algorithm based on the mechanisms of natural selection and genetics. A simple GA is composed of three operations: (1) selection, (2) genetic operation, and (3) replacement. The algorithm creates a "population" of possible solutions to the problem and lets them "evolve" over multiple generations to find better and better solutions. The authors tried to eliminate the challenge of the inappropriate distribution of execution time, which creates traffic on the server. Simulation results showed that the proposed algorithm outperformed existing approaches such as First Come First Serve (FCFS).
• Nishant et al. (2012) proposed a load-balancing algorithm using Ant Colony Optimization (ACO). ACO is inspired by ant colonies that work together in foraging behavior. Inspired by this behavior, the authors of Kabir et al. (2015) have used ACO for load balancing. In this algorithm, there is a head node that is chosen in such a way that it has the highest number of neighbor nodes. Ants move in two directions: (1) forward movement, where ants move forward in the cloud to gather information about the nodes' loads, and (2) backward movement, where an ant that finds an under-loaded node (or over-loaded node) on its path goes backward and redistributes the load among the cloud nodes. The main benefit of this approach lies in its detection of over-loaded and under-loaded nodes and thereby performing operations based on the identified nodes.

performing operations based on the identified nodes.
• Babu et al. (2013) proposed a honeybee-based load balancing technique called HBB-LB that is nature-inspired; it is inspired by the honeybee foraging behavior. This technique takes into account the priorities of tasks to minimize the waiting time of tasks in the queue. The algorithm models the behavior of honeybees in finding and reaping food. In cloud computing environments, whenever a VM is overloaded with multiple tasks, these tasks have to be removed and submitted to the under-loaded VMs of the same data center. Inspired by this natural phenomenon, the authors considered the removal of tasks from overloaded nodes as the honeybees do. When a task is submitted to a VM, it updates the number of priority tasks and the load of that VM and informs the other tasks, to help them in choosing a VM. In this scenario, the tasks are the honeybees and the VMs are the food sources. The experimental results showed that the algorithm improved the execution time and reduced the waiting time of tasks in the queue. A small illustrative sketch of this reassignment idea is given after Table 4.

We investigated and analyzed the NPH-based category of load-balancing algorithms. The results are presented in Table 4. The analysis table contains the article year, authors, key ideas, main objectives, advantages and disadvantages, evaluation techniques, and the journal or conference in which the article was presented. We also showed the name of the publisher.

Table 4
An overview of the current NPH-based category load balancing techniques (columns: Year, Authors, Key Idea, Main objectives, Advantages, Disadvantages, Evaluation techniques; techniques compared: Yakhchi et al. (2015), Dasgupta et al. (2013), Babu et al. (2013), Nishant et al. (2012)).
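To make the honey-bee analogy concrete, the following minimal Python sketch (our own illustration, not the authors' code; the VM capacities, task sizes, and priority rule are assumed values) reassigns tasks from an overloaded VM to under-loaded VMs, letting higher-priority tasks choose first, in the spirit of HBB-LB.

```python
# Minimal illustration of a honey-bee-style reassignment (HBB-LB spirit):
# tasks removed from an overloaded VM act like scout bees that move to
# under-loaded VMs; higher-priority tasks are placed first.
# All names and thresholds here are illustrative assumptions.

def hbb_rebalance(vm_loads, vm_capacity, overloaded_vm, tasks):
    """vm_loads: dict vm_id -> current load; tasks: list of (task_id, size, priority)."""
    # Sort waiting tasks so that high-priority (larger value) tasks choose first.
    for task_id, size, priority in sorted(tasks, key=lambda t: -t[2]):
        # Candidate VMs are the under-loaded ones (below capacity), like food sources.
        candidates = [v for v, load in vm_loads.items()
                      if v != overloaded_vm and load + size <= vm_capacity]
        if not candidates:
            break  # no food source can host this task; leave it queued
        # The "bee" advertises the least-loaded VM to the remaining tasks.
        target = min(candidates, key=lambda v: vm_loads[v])
        vm_loads[target] += size
        vm_loads[overloaded_vm] -= size
        print(f"task {task_id} (priority {priority}) moved to {target}")
    return vm_loads

if __name__ == "__main__":
    loads = {"vm1": 90, "vm2": 20, "vm3": 35}
    print(hbb_rebalance(loads, vm_capacity=60, overloaded_vm="vm1",
                        tasks=[("t1", 15, 2), ("t2", 10, 5), ("t3", 20, 1)]))
```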
4.3. Agent-based load balancing techniques

In this section, we have reviewed the literature that proposed agent-based techniques for load balancing in cloud nodes. The dynamic nature of cloud computing is suitable for agent-based techniques. An agent is a piece of software that functions automatically and continuously, decides for itself, and figures out what needs to be done to satisfy its design objectives. A multi-agent system comprises a number of agents, which interact with each other. To be successful, the agents have to be able to cooperate, coordinate, and negotiate with each other. Cooperation is the process of working together, coordination is the process of reaching a state in which their actions are well suited, and in the negotiation process they agree on some parameters (Singha et al., 2015; Sim et al., 2011).

• Singh et al. (2015) proposed a novel autonomous agent-based load-balancing algorithm called A2LB for cloud environments. Their algorithm tries to balance the load among VMs through three agents: a load agent, a channel agent, and a migration agent. The load and channel agents are static agents, whereas the migration agent is an ant, which is a special category of mobile agents. The load agent controls the information policy and calculates the load of VMs after allocating a job. A VM Load Fitness table supports the load agent. The fitness table maintains the details of the VM properties in a data center, such as the id, memory, a fitness value, and the load status of all VMs. The channel agent controls the transfer policy, selection policy, and location policy. Finally, the channel agent initiates the migration agents. They move to other data centers and communicate with the load agent of that data center to acquire the status of the VMs present there, looking for the desired configuration. Results obtained through implementation proved that this algorithm works satisfactorily.
• Gutierrez-Garcia and Ramirez-Nafarrate (2015) proposed an agent-based load balancing technique for cloud data centers. The authors proposed a collaborative agent-based problem-solving technique capable of balancing workloads across commodity and heterogeneous servers by making use of VM live migration. They proposed an agent-based load balancing architecture composed of VM agents, server manager agents, and a front-end agent. They also proposed an agent-based load balancing mechanism for cloud environments composed of (1) migration heuristics that determine which VM should be migrated and its destination, (2) migration policies to select a VM for migration, (3) acceptance policies which determine which VMs should be accepted, and (4) a set of load balancing heuristics of the front-end to select the initial hosts of VMs. Simulation experiments showed that agents, through autonomous and dynamic collaboration, could efficiently balance loads in a distributed manner, outperforming centralized approaches.
• Keshvadi and Faghih (2016) proposed a multi-agent load balancing system in an IaaS cloud environment. Their mechanism performs both receiver-initiated and sender-initiated approaches to balance the IaaS load, to minimize the waiting time of the tasks, and to guarantee the Service Level Agreement (SLA). The mechanism presented in the paper comprises three agents: (1) a VMM Agent, (2) a Datacenter Monitor (DM), and (3) a Negotiator Ant (NA). The VMM agent collects the CPU, memory, and bandwidth utilization of the individual VMs hosting different types of tasks in order to monitor the load. A table for storing the state of the VMs supports this agent. The DM agent performs the information policy in a datacenter by monitoring the VMM's information. This agent is supported by a table that maintains all information about the status and characteristics of all VMs in a datacenter. It categorizes the VMs based on their characteristics. DM agents initiate NA agents. They move to other datacenters and communicate with the DM agent of those datacenters to acquire the status of the VMs there, searching for the desired configuration. Simulation results showed that the proposed algorithm was more efficient and that there was a good improvement in the load balance, response time, and makespan.
• Tasquier (2015) proposed an agent-based load balancer for multi-cloud environments. The author proposed an application-aware, multi-cloud load-balancer based on a mobile agent paradigm. The proposed architecture uses agents to monitor the status of the cloud infrastructure and detect overload and/or under-utilization conditions. The multi-agent framework provides provisioning facilities to scale the application automatically to the under-loaded resources and/or to new resources acquired from other cloud providers. Furthermore, the agents are able to deallocate unused resources, thus leading to cost savings. The proposed architecture consists of three agents: (1) an executor agent, which represents the application running in the multi-cloud environment, (2) a provisioner agent, which is responsible for managing the cloud infrastructure by adding and removing resources, and (3) a monitor agent, which is responsible for monitoring the overload and/or under-utilization conditions. Users can overview the current state of the cloud environment through an additional agent called the controller. Moreover, each agent has mobility capabilities in order to migrate autonomously across the multi-cloud infrastructure. The proposed algorithm overcame the provider lock-in challenge in the cloud, and it was flexible enough to exploit extreme elasticity.

We investigated and analyzed the agent-based load balancing techniques. The results are presented in Table 5. The analysis table contains the article year, authors, key ideas, main objectives, advantages and disadvantages, evaluation techniques, and the journal or conference in which the article was presented. We also showed the name of the publisher. A small sketch of the load-agent bookkeeping that these approaches rely on follows.
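The sketch below is a toy illustration of the information-policy role of a load agent such as the one in A2LB; it is not the authors' implementation, and the field names, fitness formula, and status thresholds are assumptions made only for illustration.

```python
# Toy sketch of a load agent's information policy (in the spirit of A2LB):
# after each allocation it refreshes a "fitness" table describing every VM.
# Field names, the fitness formula, and the status thresholds are assumptions.
from dataclasses import dataclass

@dataclass
class VmRecord:
    vm_id: str
    memory_mb: int
    used_mb: int = 0

    @property
    def fitness(self) -> float:
        # Fraction of memory still free; higher means a better placement target.
        return 1.0 - self.used_mb / self.memory_mb

    @property
    def status(self) -> str:
        return "overloaded" if self.fitness < 0.2 else (
               "underloaded" if self.fitness > 0.7 else "normal")

class LoadAgent:
    def __init__(self, vms):
        self.table = {vm.vm_id: vm for vm in vms}

    def allocate(self, vm_id: str, demand_mb: int) -> None:
        self.table[vm_id].used_mb += demand_mb   # update after each job placement

    def snapshot(self):
        # This is what a channel/migration agent would consult to pick targets.
        return {v.vm_id: (round(v.fitness, 2), v.status) for v in self.table.values()}

agent = LoadAgent([VmRecord("vm1", 4096), VmRecord("vm2", 8192)])
agent.allocate("vm1", 3500)
print(agent.snapshot())   # e.g. {'vm1': (0.15, 'overloaded'), 'vm2': (1.0, 'underloaded')}
```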
4.4. General load balancing techniques

In this section, we have surveyed and overviewed the literature in the field of general load balancing techniques. Although several algorithms have been proposed in this category, we have focused on the newer ones. Techniques such as First-In-First-Out (FIFO), Min-Min, Max-Min, Throttled, and Equally Spread Current Execution Load (ESCEL) all belong to this category.


Table 5
An overview of agent-based load balancing techniques.

Year Authors Key Idea Main objectives Advantages Disadvantages Evaluation techniques Journal/ Conference

2016 Keshvadi et al. • Using Multiagent paradigm for • Maximizing resource • Increase the resource • Datacenter management ants do • Simulation using CloudSim International Robotics and
(2015) dynamic load balancing across utilization utilization not have a timer for self- toolkit Automation Journal
virtual machines • Load balancing across • Avoid or reduce dynamic destroying and wait for message • Agents are programmed using (MedCrave)
• Using both senders- initiated and virtual machines migration from parent Java language
receiver-initiated approaches • Reducing the response • Reducing the migration
time cost
• Guarantee the SLA • Reducing the waiting time
of tasks in queue
2015 Singh et al. (2015) • Using software agents for load • Load balancing VMs • Improves resource • Includes heavy computations • Implementation using Java International Conference on
balancing in cloud computing • Reducing service time utilization within a • Migration agent does search for technology Advanced Computing
datacenter and multiple available VMs and is time- Technologies and
datacenters consuming Applications (Elsevier)

• Reduces response time
2015 Gutierrez- Garcica • Using agent- based problem-solving • Efficient load • Agents do load balancing • does not estimate VM migration • Experiments performed using Cluster Computing (Springer)
and Ramirez- technique for load balancing in a balancing in a using partial information overhead agent-based test-bed such as
Nafarrate (2015) heterogeneous environment, live distributed manner about cloud datacenters • Provide no usage prediction MapLoad and Red Hat
VM migration • Considering the mechanism • Test-bed was implemented in
heterogeneity of servers • high migration overhead Java and JADE agent platform
and VMs • It is a central approach and is not fully scalable
2015 Tasquier (2015) • Using agent-based paradigm for • Multi-cloud load • Using multi-cloud • Not implemented or simulated • Not implemented or COLUMBIA International
developing an application aware balancing resources for load balancing • Does not consider Quality of simulated Publishing Journal of Cloud
multi-cloud load balancer • Using full elasticity of • Overcoming the provider Service (QoS) computing Research
cloud environments lock-in challenge in cloud
• Is flexible to exploit the extreme elasticity

Table 6
An overview of current GLB-category load balancing techniques.

Year Authors Key Idea Main objectives Advantages Disadvantages Evaluation techniques Journal/Conference

2016 Komarasamy and • Using Bin Packing algorithm and • Load balancing virtual • Handles the user requests • Using the second table, • Simulation in Indian journal of Science and
Muthuswamy (2016) VM reconfiguration for load machines during peak situation reservation table is space CloudSim toolkit Technology
balancing in cloud environment • Reducing job waiting • Improves throughput consuming
time • Increases resource • Does not consider energy saving
utilization

2016 Chien et al. (2016) • Using the estimating method of end of service time to load balance VMs • Load balancing VMs • Reducing the response time • Reducing the processing time • Considers the actual instant processing power of VMs and job size to assign jobs to VMs • Reduces job waiting time • Is more effective • Improving response time and processing time • Determining the actual instant processing power is complicated • Causes the problem of energy consumption and carbon emissions • Simulation in CloudSim toolkit IEEE 18th International Conference on Advance Communication Technology
2015 Domanal and Reddy • Combining the Divide-and- Conquer • Maximizing resource • Maximize resource • Did not simulate in different • Simulation in IEEE International Conference on
(2015) methodology and throttled utilization utilization workload situation CloudSim toolkit cloud computing in emerging
algorithm to load balancing in cloud • Intelligently assign jobs • Reduces total execution time • Does not consider deadline markets
environment to VM for load considerably constraints
balancing
• Reducing the total
execution time of tasks
2015 Kulkarni and BA • Modifying Active VM Algorithm • Uniform allocation of • Load balancing VMs • Does not allocate the load • Simulation using IEEE International Conference on
(2015) implemented in CloudAnalyst to requests to VMs even well during the uniformly to VMs across CloudAnalyst toolkit Signal Processing, Informatics,
load balancing VMs during peak hours
• It works well during peak hours
hours datacenters deployed at different Communication and Energy
• Using reservation table between • Reduce the response • Improve the elasticity geographical locations Systems
selection and allocation phases time

• Komarasamy and Muthuswamy (2016) proposed a novel approach for dynamic load balancing in a cloud environment. They called it dynamic load balancing with effective bin packing and VM reconfiguration (DLBPR). DLBPR maps jobs onto VMs based on the required processing speed of the job. The main objectives of their work were to process the jobs within their deadlines and to balance the load among the resources. In the proposed approach, the VMs are dynamically clustered as small, medium, and large according to processing speed, and the jobs are mapped to a suitable VM in the corresponding cluster. The clusters are sometimes overloaded due to the arrival of a similar kind of job. In that situation, the VMs may either be split or integrated in the data center based on the requests of the jobs, using a receiver-initiated approach. After reconfiguration, the VMs dynamically regroup based on their processing speed. The proposed methodology is composed of three tiers: (1) a web tier, (2) a scheduling tier, and (3) a resource allocation tier. Users' requests are submitted to the web tier at any arbitrary time and are forwarded to the scheduling tier. The deadline-based scheduler classifies and prioritizes the incoming jobs. These jobs are then processed efficiently by VMs in the resource allocation tier. The proposed approach automatically improves the throughput and also increases the utilization of the resources.
• Domanal and Reddy (2015) proposed a hybrid scheduling algorithm for load balancing in a distributed environment, referred to as DCBT, by combining the Divide-and-Conquer and Throttled algorithms. The authors defined two scenarios. In scenario 1, they deployed a distributed environment that consists of a client, a load balancer, and n nodes, which act as Request Handlers (RHs) or servers. The requests come from different clients, and the load balancer assigns incoming requests or tasks to the available RHs or servers. In scenario 2, the CloudSim simulator was used for the simulation, which consisted of a data center, VMs, servers, and the load balancer. Here, the clients' requests were coming from Internet users. In both scenarios, the DCBT algorithm was used for scheduling the incoming clients' requests to the available RHs or VMs depending on the load of each machine. The proposed DCBT utilizes the VMs more efficiently while reducing the execution time of the tasks.
• Chien et al. (2016) proposed a novel load-balancing algorithm based on a method of estimating the end of service time. In their algorithm, they considered the actual instantaneous processing power of each VM and the size of the assigned jobs. They included two factors in the method of estimating the end-of-service time in VMs: (1) the selected VM should be able to finish the job as soon as possible, and (2) on the next allocation request, the load-balancing algorithm has to estimate the time at which all queued jobs plus the next incoming job would be completely done on every VM. The VM with the earliest estimate is chosen to receive the job (a minimal sketch of this estimation appears after this list). The simulation results showed that the proposed algorithm improves response time and processing time.
• Kulkarni and BA (2015) proposed a novel VM load-balancing algorithm that ensures a uniform assignment of requests to VMs even during peak hours (i.e., when the frequency of received requests in the data center is very high) to ensure faster response times to users. They modified the active VM algorithm implemented in the CloudAnalyst toolkit, which has problems during peak traffic situations. For this purpose, in addition to an allocation table, they used a reservation table between the selection and allocation phases of VMs. The reservation table maintains the information on the VM reservations suggested by the load balancer to the data center controller, but the allocation table is not updated until the notification arrives from the allocation phase. The proposed load balancer takes into account both the reservation table entry and the allocation statistics table entry for a particular VM id to select a VM for the next request. The simulation results showed that the algorithm allocated requests to VMs uniformly even during peak traffic situations.

We have investigated and analyzed the general load balancing category; the results are presented in Table 6. The analysis table contains the article year, authors, key ideas, main objectives, advantages and disadvantages, evaluation techniques, and the journal or conference in which the article was presented. We also showed the name of the publisher.
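The following minimal Python sketch illustrates the end-of-service-time idea described for Chien et al.'s algorithm; it is our own illustration, and the MIPS figures, queued-job lengths, and job size are invented example values.

```python
# Minimal sketch of choosing a VM by estimated end-of-service time.
# The MIPS figures, queued-job lengths, and the new job size below are
# invented examples, not values from the surveyed paper.

def estimated_finish_time(queued_lengths, new_length, mips):
    """Time until every queued job plus the new job completes on one VM."""
    return (sum(queued_lengths) + new_length) / mips

def pick_vm(vms, new_length):
    """vms: dict vm_id -> (list of queued job lengths, processing power in MIPS)."""
    return min(vms, key=lambda v: estimated_finish_time(vms[v][0], new_length, vms[v][1]))

vms = {
    "vm1": ([4000, 2500], 1000),   # busy but fast
    "vm2": ([1500], 500),          # lightly loaded but slow
}
job = 2000
for v, (queue, mips) in vms.items():
    print(v, "finishes in", estimated_finish_time(queue, job, mips), "time units")
print("job goes to", pick_vm(vms, job))
```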
4.5. Application-oriented load balancing techniques

In this section, we have surveyed and overviewed the literature in the field of application-oriented load balancing techniques.

• Wei et al. (2015) proposed an efficient application scheduling approach for mobile cloud computing based on a MAX–MIN ant system. Firstly, the authors presented a local mobile cloud model with a detailed application scheduling structure. Secondly, they presented a scheduling algorithm for the mobile cloud model based on the MAX–MIN Ant System (MMAS). Experimental results showed that the algorithm could effectively improve the performance of the mobile cloud.
• Wei et al. (2013) defined the Hybrid Local Mobile Cloud Model (HLMCM), consisting of a cloudlet and mobile devices, where the cloudlet plays the role of a central broker while both the neighboring mobile devices and the cloudlet play the role of service providers. The objective of application scheduling is to maximize the profit as well as the lifetime of the HLMCM while considering the capacity limitations of the service providers. They proposed the Hybrid Ant Colony-based Application Scheduling (HACAS) algorithm to solve the scheduling problem. The algorithm only considers the available resources and does not consider overhead when calculating the advantage ratio of mobile devices for joining the cloudlet. Simulation results revealed that when the load of the system was heavy, the HACAS algorithm could select those applications with maximum profit and minimum energy consumption.
• Deye et al. (2013) proposed an approach to make load balancing more dynamic in order to better manage the QoS of multi-instance applications in the cloud. The approach mainly limits the number of requests through a load balancer equipped with a queue for incoming user requests at a given time, so that requests are sent and processed effectively (a minimal sketch of this throttling idea appears after Table 7). Simulation results showed that the approach improved the system performance.
• Sarood et al. (2012) developed techniques that reduce the gap between application performance on clouds and on supercomputers. The scheme uses object migration to achieve load balance for tightly coupled parallel applications executing in virtualized environments that suffer from interfering jobs. While restoring load balance, it not only reduces the timing penalty caused by interfering jobs but also reduces energy consumption significantly.

We have investigated and analyzed the application-oriented load balancing techniques; the results are presented in Table 7. The analysis table contains the article year, authors, key ideas, main objectives, advantages and disadvantages, evaluation techniques, and the journal or conference in which the article was presented. We also showed the name of the publisher.

Table 7
An overview of application-oriented load balancing techniques (columns: Year, Authors, Key Ideas, Main Objectives, Advantages, Disadvantages, Evaluation Techniques, Publication/Presentation; techniques compared: Wei et al. (2015), Wei et al. (2013), Deye et al. (2013), Sarood et al. (2012)).
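The sketch below illustrates the request-limiting idea described for Deye et al.: the load balancer parks incoming requests in a bounded queue and dispatches only a fixed number of concurrent requests per application instance. The queue size, instance names, and dispatch limit are illustrative assumptions, not the authors' design values.

```python
# Sketch of a throttling load balancer with a bounded queue of incoming
# requests and a per-instance concurrency limit. All parameters are assumed.
from collections import deque

class ThrottlingBalancer:
    def __init__(self, instances, max_inflight=2, queue_size=8):
        self.inflight = {i: 0 for i in instances}
        self.max_inflight = max_inflight
        self.queue = deque(maxlen=queue_size)

    def submit(self, request):
        self.queue.append(request)
        self._dispatch()

    def complete(self, instance):
        self.inflight[instance] -= 1
        self._dispatch()

    def _dispatch(self):
        while self.queue:
            # pick the instance with the fewest requests in flight
            target = min(self.inflight, key=self.inflight.get)
            if self.inflight[target] >= self.max_inflight:
                return                      # every instance is saturated; keep queuing
            self.inflight[target] += 1
            print(f"{self.queue.popleft()} -> {target}")

lb = ThrottlingBalancer(["app-1", "app-2"])
for r in ["r1", "r2", "r3", "r4", "r5"]:
    lb.submit(r)
lb.complete("app-1")      # finishing a request frees a slot for the queued r5
```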
4.6. Network-aware task scheduling and load balancing

In this section, we have surveyed and overviewed the literature in the field of network-aware task scheduling and load balancing techniques.

• Shen et al. (2016) proposed a probabilistic network-aware task placement method for MapReduce scheduling to minimize the overall data transmission cost and delays, and hence to reduce job completion time, while balancing the transmission cost reduction and resource


utilization. They found that task placement is faced with three challenges: (1) the available servers for running tasks dynamically change due to resource allocation and release over time; (2) the data fetching time of reduce tasks depends on both the placement of the reduce tasks and the locations and sizes of the intermediate data produced by the map tasks; and (3) the link load on the routing path also has a significant impact on the data access latency. In order to reduce the latency, the link status of the network must be considered in the scheduling decision. The experimental results showed that the scheduling algorithm improved the job completion time and cluster resource utilization.
• Scharf et al. (2015) presented an extension of the OpenStack scheduler that enables a network-aware placement of instances by taking into account bandwidth constraints to and from nodes. Their solution follows host-local network resource allocation, and it can be combined with bandwidth enforcement mechanisms such as rate limiting. The authors presented a prototype that requires only very few changes in the OpenStack open-source software. They showed that, for heterogeneous VMs, a network-aware placement could achieve a larger network throughput and a more predictable performance, for example, by avoiding the congestion of network resources.
• Shen et al. (2016) proposed a new cloud job scheduler with elastic bandwidth reservation in clouds, in which each tenant only needs to specify the job deadline, and each job's reserved bandwidth is elastically determined by leveraging the elastic feature to maximize the total job rewards, which represent the worth of successful completion by the deadlines. It also considers both the computational capacity of the VMs and the reserved VM bandwidth in job scheduling. Simulation and real cluster implementation results showed the efficiency and effectiveness of the algorithm in comparison with other scheduling algorithms.
• Kliazovich et al. (2016) proposed a model, called CA-DAG, for cloud computing applications that takes into account a variety of communication resources of various types used in real systems. This communication-aware model of cloud applications allows making separate resource allocation decisions, assigning processors to handle computing jobs and network resources for information transmissions, such as requests to an application database. It is based on DAGs that, in addition to computing vertices, include separate vertices to represent communications. The proposed communication-aware model creates space for the optimization of many existing solutions to resource allocation and, together with performance and energy efficiency metrics of communication systems, will become an essential tool in the design of completely new scheduling schemes of improved efficiency.

We have investigated and analyzed the network-aware task scheduling and load balancing techniques; the results are presented in Table 8. The analysis table contains the article year, authors, key ideas, main objectives, advantages and disadvantages, evaluation techniques, and the journal or conference in which the article was presented. We also showed the name of the publisher. The sketch below illustrates the kind of bandwidth-aware placement decision these approaches make.
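The following is an illustrative Python sketch of host-local, network-aware placement in the spirit of the OpenStack extension described above: a host passes the filter only if it still has enough residual NIC bandwidth for the requested VM, and the least-loaded surviving host (by residual bandwidth) is chosen. The host names, link capacities, and the request values are assumptions; this is not OpenStack code.

```python
# Bandwidth-aware placement filter (illustrative sketch, assumed data).

def place_vm(hosts, cpu_req, bw_req_mbps):
    """hosts: dict name -> {'free_cpu': int, 'nic_mbps': int, 'reserved_mbps': int}."""
    def residual_bw(h):
        return hosts[h]["nic_mbps"] - hosts[h]["reserved_mbps"]

    eligible = [h for h in hosts
                if hosts[h]["free_cpu"] >= cpu_req and residual_bw(h) >= bw_req_mbps]
    if not eligible:
        raise RuntimeError("no host satisfies the CPU and bandwidth constraints")
    chosen = max(eligible, key=residual_bw)          # avoid congesting a busy uplink
    hosts[chosen]["free_cpu"] -= cpu_req
    hosts[chosen]["reserved_mbps"] += bw_req_mbps    # host-local reservation
    return chosen

hosts = {
    "hostA": {"free_cpu": 8, "nic_mbps": 10000, "reserved_mbps": 9000},
    "hostB": {"free_cpu": 4, "nic_mbps": 10000, "reserved_mbps": 2000},
}
print(place_vm(hosts, cpu_req=2, bw_req_mbps=3000))   # -> hostB
```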
4.7. Workflow-specific scheduling algorithms

In this section, we have surveyed and overviewed the literature on workflow-specific scheduling algorithms; articles on bag-of-tasks, dependent-task, and priority-based task scheduling are reviewed.

• Ghosh and Banerjee (2016) proposed a new enhanced algorithm and implemented it in a cloud computing environment; it adds a new feature, namely priority-based servicing of each request. After determining the priority of a request, the request is allocated to a VM. A switching queue has been proposed to hold the requests that have been removed temporarily from a VM due to the arrival of a higher-priority request

Table 8
An overview of network-aware task scheduling and load balancing.

Year Authors Key Ideas Main Objectives Advantages Disadvantages Evaluation Techniques Publication/Presentation

2016 Shen et al. • Considering network • Minimizing data • Reducing job Completion • The optimality of exponential model • Implementing the algorithm on Apache IEEE International
(2016) topology and transmission transmission cost time is not knownPerformance of the HadoopConduct experiments on a Conference on cluster
cost in job scheduling • Balancing transmission • Increasing cluster model did not evaluate under different high-performance computing platform computing
cost reduction and utilization network conditions
resource utilization • Minimizing delay
2016 Shen et al. • Using elastic bandwidth • Finding a job schedule to • Minimizing total job • No automatic bandwidth reservation • Using Facebook synthesized workload IEEE International
(2016) reservation in clouds satisfy the deadline rewards • Simulated datacenter with a rack Conference on Cloud
requirements • Reducing job execution consists of 40 machinesImplementing Computing Technology an
time the algorithm Science
• Efficiency and effectiveness
• Real implementation
2016 Kliazovich • Using a communication- • Assigning processors to • Considering the dynamics • No practical validation of proposed • Using Winkler graph generator to Journal of Grid Computing
(2016) aware model in cloud handle computing of cloud environment solution produce workload (Springer)
computing (CA-DAG) jobsUsing network • Using context-data • No considering heterogeneity of • Testbed system architecture composed
resources for information including network cloud environment of a set of identical servers
transmission topology for scheduling
• optimizing makespan
2015 Scharf et al. • Network aware- placement • Extension of OpenStack • Increasing throughput • Fixing time granularity of control A testbed setup of OpenStack “Icehouse” IEEE 24th International
(2015) of instances by taking into Scheduler • More predictable loop to 10 s consisting of a few servers Conference on Computer
account bandwidth performance • Relying on existing filters prevents Communication and
constraints the use of bandwidth as a metric Network (ICCCN)
• No considering the actual topology of
network
Table 9
An overview of workflow specific scheduling algorithms.

Year Authors Key Idea Main Objectives Advantages Disadvantages Evaluation techniques Journal/Conference Focus on

2017 Cai et al. (2017) • Providing dynamic • Full fill the workflow • Minimizing the cloud • Expectation and variance based task • Using ElasticSim tool Journal of Future Bag of tasks
cloud resource deadline resource renting cost execution time estimation method Generation Computer

scheduling algorithm • Fully use the bag overestimate the practical task Systems (Elsevier)
for BoT workflow structure execution times to some degree
• A greedy single-type-based method for each ready BoT
2016 Ghosh and • Priority based service • Improving average • Reducing response time • May cause starvation for low priority CloudSim simulation tool International Conference Priority based
Banerjee (2016) allocation of each user execution time • Reducing execution time requests on Inventive
request • Improving service quality • Long response time for low priority Computation Technology
jobs (IEEE)
2016 Cinque et al. • Providing failure-aware • Improving the execution • Scalable • No, consider the situations where • Implementing on a real grid Proceedings of the 31st Dependable
(2016) scheduling approaches of heavy job batches • Improving performance multicast is not available Annual ACM Symposium scheduling
and scalable monitoring • Reducing bandwidth on Applied Computing
for grid consumption (ACM)
• Improving throughput
2016 Bellavista et al. • Using the publish/ • Providing scalable • Decentralized scheduler • Not fully scalable • implemented and tested in a Journal of Future Dependable
(2016) subscribe paradigm for monitoring and novel troubleshooting • No job completion efficiency real deployment on Generation Computer scheduling
intra-domain scenarios enhanced dependable
• Aalgorithm distributed data centers Systems (Elsevier)
job scheduling • Use standard technology across Europe
to avoid vendor lock-in
• Providing failure-aware
scheduling
2016 Kianpisheh et al. (2016) • Using ant colony system to develop a robust workflow scheduler • Minimizing the violation of workflow constraints • Minimizing the probability of violations • Considering budget and deadline of workflows • Reducing the probability of violation of workflow constraints • Reducing expected penalty at run-time • Decreasing makespan and cost • No avoidance of run-time violations • No tracking of the workflow at runtime • Simulation on real-world workflows Cluster Computing (Springer) Workflow Scheduling
2015 Moschakisa and • Scheduling of bag of • Optimizing interlinked • Reducing Makespan • The policy of spreading parallel tasks • Proposed approach tested in Journal of Systems and Bag of tasks
Karatzaa (2015) tasks applications cloud systems • Maintaining a good cost- between the clouds is not clear a scientific federated cloud Software (Elsevier)
• Optimizing performance trade-off • No real implementation in cloud
performance and cost • Increasing utilization system
• Improving performance
2015 Zhang and Li • Using an adaptive • Proper mapping of tasks • Reducing response time • Not fully adaptive • CloudSim simulation tool Third International Workflow
(2015) heuristic algorithm for to resources • Optimizing makespan • Focus only on compute intensive Conference on Advanced Scheduling
workflow scheduling • Optimizing load applications Cloud and Big Data
balancing • No considering communication- (IEEE)
• Optimizing failure rate of intensive
tasks
2014 Jaikar et al. • Presenting priority- • Increasing resource • Considering a scientific • No providing migration policies Implementing the algorithm in IEEE 3rd International Priority based
(2014) based VM allocation utilization federated cloud • There is no cost function for cross- OpenNebula Conference on Cloud
algorithm • Reducing energy • Reducing total job cloud VM migration Computing (CloudNet)
consumption execution time
• Improving resource
utilization
• Increasing system
performance

earlier. The authors analyzed the performance of their algorithm with respect to the Throttled and Round Robin load-balancing algorithms (a minimal sketch of this preemption scheme appears after this list).
• Jaikar et al. (2014) proposed a system architecture and a VM allocation algorithm for the load balancer in a scientific federated cloud. They tested the proposed approach in a scientific federated cloud. Experimental results showed that the proposed algorithm not only increased the utilization of resources but also reduced the energy consumption.
• Moschakisa and Karatzaa (2015) applied simulated annealing and thermodynamic simulated annealing to the multi-criteria scheduling of a dynamic multi-cloud system with VMs of heterogeneous performance serving Bag-of-Tasks (BoT) applications. The scheduling heuristics applied consider multiple criteria when scheduling such applications and try to optimize both performance and cost. Simulation results indicated that the use of these heuristics could have a significant impact on performance while maintaining a good cost-performance trade-off.
• Cai et al. (2017) proposed a delay-based dynamic scheduling algorithm (DDS). DDS is a dynamic cloud resource provisioning and scheduling algorithm that minimizes the resource renting cost while meeting workflow deadlines. New VMs are dynamically rented by the DDS according to the practical execution state and the estimated task execution times to fulfill the workflow deadline. The bag-based deadline division and bag-based delay scheduling strategies consider the bag structure to decrease the total renting cost. The results showed that the algorithm decreased the resource renting cost while guaranteeing the workflow deadline compared to the existing algorithms.
• Cinque et al. (2016) proposed a Grid Architecture for scalable Monitoring and Enhanced dependable job ScHeduling (GAMESH). GAMESH is a completely distributed and highly efficient management infrastructure for the dissemination of monitoring data and the troubleshooting of job execution failures in large-scale and multi-domain Grid environments. The solution improves job processing throughput in both intra- and inter-domain environments.
• Bellavista et al. (2016) proposed GAMESH, a Grid Architecture for scalable Monitoring and Enhanced dependable job ScHeduling. The proposed solution is conceived as a completely distributed and highly efficient management infrastructure. The paper substantially extends the authors' previous work that appeared in Cinque et al. (2016). With respect to it, this extended version provides additional details about the effective design and implementation of selected and primary GAMESH components. In addition, it reports novel and extensive measurements in both intra-domain and inter-domain deployments. Moreover, it provides a detailed description of the original Stochastic Reward Network models that have been produced to perform the simulation of the GAMESH failure-aware scheduling solution. The proposed solution improves job processing throughput in both intra- and inter-domain environments.
• Kianpisheh et al. (2016) investigated the problem of workflow scheduling with regard to a user-defined budget and deadline. The probability of violation of constraints (POV) has been used as the robustness criterion for a workflow schedule at run-time. By aggregating the execution time distributions of the activities on the critical path, the Probability Density Function (PDF) of the makespan is computed. An Ant Colony System has been utilized to minimize an aggregation of the violation function and the POV.
• Zhang and Li (2015) proposed an Improved Adaptive Heuristic Algorithm (IAHA). First, the IAHA algorithm prioritizes the tasks in a complex graph considering their impact on each other, based on the graph topology. Through this technique, the completion time of an application can be efficiently reduced. Then, it uses adaptive crossover and mutation rates to control and lead the algorithm to an optimized solution. The experimental results showed that the proposed method improved response time and makespan.

We have investigated and analyzed the workflow-specific scheduling algorithms; the results are presented in Table 9. The analysis table contains the article year, authors, key ideas, main objectives, advantages and disadvantages, evaluation techniques, and the journal or conference in which the article was presented. We also showed the name of the publisher. We specified in the column “Focus on” which subcategory each algorithm belongs to.
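The sketch below illustrates the priority/switching-queue idea attributed to Ghosh and Banerjee: a higher-priority request may preempt a lower-priority one, which is parked in a switching queue and resumed once the VM is free again. The single-VM model and the priority values are simplifications we introduce for illustration only.

```python
# Priority-based allocation with a switching queue (illustrative sketch).
import heapq

class PriorityVm:
    def __init__(self):
        self.running = None          # (priority, request) currently on the VM
        self.switching_queue = []    # max-heap (negated priorities) of parked requests

    def submit(self, request, priority):
        if self.running is None:
            self.running = (priority, request)
        elif priority > self.running[0]:
            # Preempt: park the lower-priority request in the switching queue.
            heapq.heappush(self.switching_queue, (-self.running[0], self.running[1]))
            self.running = (priority, request)
        else:
            heapq.heappush(self.switching_queue, (-priority, request))
        print(f"running={self.running}, queued={[r for _, r in self.switching_queue]}")

    def finish(self):
        self.running = None
        if self.switching_queue:
            neg_p, request = heapq.heappop(self.switching_queue)
            self.running = (-neg_p, request)
        print(f"resumed={self.running}")

vm = PriorityVm()
vm.submit("backup-job", priority=1)
vm.submit("user-query", priority=5)   # preempts the backup job
vm.finish()                           # backup job resumes from the switching queue
```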
Fig. 4. The distribution of studied articles over time, from 2008 until February 2017.

5. Discussion and statistics

In this section, we provide some statistics based on the studied articles. Fig. 4 shows the distribution of the reviewed articles by year of publication from 2008 until February 2017. In the figure, the number of articles in each year is shown on the corresponding slice; for example, the number of studied articles in 2015 is 29, which is the highest. The percentage of studied articles in each year is shown in the figure as well. Moreover, the number of articles in 2016 is noteworthy. Fig. 4 shows that 3% of the articles were published in 2017, 17% were published in 2016, 27% were published in 2015, 11% were published in 2014, and 9% were published in 2013. This means that 72% of the studied articles have been published in the last five years.

The distribution of the studied articles across different publishers is shown in Fig. 5. In the figure, the article frequency of each publisher is shown on the corresponding slice, where 29 out of the 108 total articles belong to IEEE (27%). Examining the publishers further, 12% of the literature is related to Elsevier, 10% is related to Springer, 8% belongs to ACM, 2% belongs to IJMECE, 2% is related to ACEEE, and 42% was published by others.

In Table 10 and Fig. 6, we show how the studied articles addressed the load balancing QoS metrics. The information is extracted from Tables 3–9. By referring to Table 10, we can differentiate the articles based on single-objective and multi-objective load balancing techniques. References Hsueh et al. (2014), Hou et al. (2014), Babu et al. (2013), Chien et al. (2016), Scharf et al. (2015), Shen et al. (2016), Bok et al. (2016), and Kliazovich et al. (2016) in the table are single objective, while the others are multi-objective. Fig. 6 shows that 22% of the studied techniques addressed the response time metric, 24% addressed the makespan, 19% addressed the resource utilization metric, 9% addressed the throughput, 9% addressed energy saving, 9% addressed scalability, and 8% addressed the migration time metric. We see that the majority of techniques have concentrated on the response time and makespan metrics.


Fig. 6. An overview of load balancing metrics addressed by the reviewed techniques.


Fig. 5. The distribution of studied articles based on different publishers.
Evaluation techniques used in the articles and the corresponding statistics are shown in Fig. 7. We divided the evaluation techniques into five classes: real testbed, CloudSim, written program, CloudAnalyst, and others. The article frequencies in each class are shown on the corresponding slice; 19 articles used a real testbed for algorithm evaluation, eight articles used CloudSim, the authors of three articles wrote a program for algorithm evaluation, two articles used CloudAnalyst, and seven articles used other techniques. In terms of percentages, 49% of the literature used a real testbed, which is the highest share, 20% used CloudSim, 3% used an arbitrarily written program, 2% used CloudAnalyst, and 18% used other tools.

The venue types of the papers are shown in Fig. 8. In the figure, the absolute number of papers in each venue and the corresponding percentage are shown. Twenty papers were presented at conferences, 16 papers were published in journals, and two papers were presented at a symposium. In other words, 53% of the literature was presented at conferences, 42% was published in journals, and 5% was presented at a symposium.

Table 10
Load balancing QoS metrics in the reviewed techniques.

# References Energy saving Migration time Response time Scalability Resource utilization Throughput Makespan

1 Scharf et al. (2015) •


2 Shen et al. (2016) •
3 Kliazovich et al. (2016) •
4 Hou et al. (2014) •
5 Bok et al. (2016) •
6 Hsueh et al. (2014) •
7 Babu et al. (2013) •
8 Chient et al. (2016) •
9 Yakhchi et al. (2015) • • • •
10 Dasgupta et al. (2013) • • •
11 Nishant et al. (2012) • •
12 Singh et al. (2015) • • • • •
13 Gutierrez-Garcia and Ramirez-Nafarrate (2015) • • • •
14 Keshvadi et al. (2015) • • • •
15 Tasquire (2015) • • • •
16 Kumarasamy et al. (2015) • • • •
17 Domanal and Reddy (2015) • • • •
18 Kulkarni and BA (2015) • •
19 Ghoneem and Kulkarni (2016) • •
20 Benfia et al. (2017) • • •
21 Yang and Chen (2015) • •
22 Wei et al. (2015) • •
23 Deye et al. (2013) • •
24 Wei et al. (2013) • • •
25 Sarood et al. (2012) • •
26 Shen et al. (2016) • • •
27 Ghosh and Banerjee (2016) • •
28 Cinque et al. (2016) • •
29 Kianpisheh et al. (2016) • •
30 Maschakis et al. (2015) • •
31 Zhang and Li (2015) • •
32 Jaikar et al. (2014) • •
33 Total 7 6 17 7 14 7 18


6. Open issues and future trends

In this section, we discuss major load-balancing issues that have not been comprehensively and completely addressed. In our literature review, we found that there is not a perfect technique for improving all of the load balancing metrics. For example, some techniques considered response time, resource utilization, and migration time, while others ignored these metrics and considered other metrics. However, it seems that some metrics are mutually exclusive. For example, relying on VM migration for load balancing may cause an increase in the response time. Service cost is another metric, which is not considered in the studied articles. Presenting a comprehensive technique to improve as many metrics as possible is, therefore, very desirable.

Furthermore, our study showed that energy consumption and carbon emission are two important drawbacks due to the incremental growth in the number of datacenters. However, just a few articles addressed these two drawbacks. Energy consumption is regarded as an economic efficiency factor, while carbon emission is regarded as a health-related and/or environmental factor. Each of these issues is critically important. Therefore, providing load balancing mechanisms in a cloud environment while also addressing these two problems is very desirable too.

Recently, a large volume of data is produced daily from social networks, medical records, e-commerce, e-shopping, e-pay, banking records, etc. This huge volume of data constitutes big data, and therefore needs near-perfect distribution for fast servicing. Our study showed that in recent years just a few articles addressed this topic. Further optimization of Hadoop MapReduce for processing big data is, therefore, a quite promising line of future research.

Recently, in addition to the existing popular cloud providers such as Google, Microsoft, and Amazon, other cloud providers are growing too. In some situations, it is necessary for a cloud provider to send some workload to another cloud provider for processing, for the purpose of load balancing. In other words, using the resources of more than one cloud provider is a critical requirement for load balancing in the future. In this case, the cloud providers will face data lock-in problems. Our study shows that just a few articles have paid attention to these topics. Therefore, another interesting line for future research is the investigation of data lock-in and cross-cloud servicing problems.

Fig. 7. Evaluation techniques used by articles.

Fig. 8. Studies venue types.

7. Conclusion and future works

Balancing of the workload among cloud nodes is one of the most important challenges that cloud environments are facing today. In this paper, we surveyed the research literature in the load balancing area, which is a key aspect of cloud computing. We found in the literature several metrics for load balancing techniques that should be considered in future load balancing mechanisms. Based on our observations, we have presented a new classification of load balancing techniques: (1) the Hadoop MapReduce load balancing category, (2) the natural phenomena-based load balancing category, (3) the agent-based load balancing category, and (4) the general load balancing category. In each category, we studied several techniques and analyzed them in terms of a number of metrics, and summarized the results in tables. Key ideas, main objectives, advantages, disadvantages, evaluation techniques, and publication year were the metrics that we considered for the load balancing techniques. Recently, load balancing techniques have been focusing on two critical metrics, that is, energy saving and reducing carbon dioxide emission. As future work, we suggest the following: (1) study and analyze more recent techniques in each of our proposed categories, and (2) evaluate each technique in a simulation toolkit and compare them based on new metrics.

References

Abdolhamid, M., Shafi’i, M., Bashir, M.B., 2014. Scheduling techniques in on-demand grid as a service cloud: a review. J. Theor. Appl. Inf. Technol. 63 (1), 10–19.
Abdullahi, M., Ngadi, Md.A., Abdulhamid, S.M., 2015. Symbiotic organism search optimization based task scheduling in cloud computing environment. Future Gener. Comput. Syst. 56, 640–650.
Aditya, A., Chatterjee, U., Gobata, S., 2015. A comparative study of different static and dynamic load-balancing algorithms in cloud computing with special emphasis on time factor. Int. J. Curr. Eng. Technol. 3 (5).
Ahmad, F., Chakradhar, S.T., Raghunathan, A., Vijaykumar, T.N., 2012. Tarazu: optimizing MapReduce on heterogeneous clusters. International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS) 40 (1), 61–74.
Ahmad, R.W., Gani, A., Hamid, S.H.A., Shiraz, M., 2015. A survey on virtual machine migration and server consolidation frameworks for cloud data centers. J. Netw. Comput. Appl. 52, 11–25.
Alakeel, A.M., 2010. A guide to dynamic load balancing in distributed computer systems. Int. J. Comput. Sci. Netw. Secur. 10 (6), 153–160.
Apostu, A., Puican, F., Ularu, G., Suciu, G., Todoran, G., 2013. Study on advantages and disadvantages of cloud computing – the advantages of telemetry applications in the cloud. Recent Adv. Appl. Comput. Sci. Digit. Serv.
Babu, L.D.D., Krishna, P.V., 2013. Honey bee behavior inspired load balancing of tasks in cloud computing environments. Appl. Soft Comput. 13 (5), 2292–2303.
Bellavista, P., Cinque, M., Corradi, A., Foschini, L., Frattini, F., Molina, J.P., 2016. GAMESH: a grid architecture for scalable monitoring and enhanced dependable job scheduling. Future Gener. Comput. Syst.
Benifa, J.V.B., Dejey, 2017. Performance improvement of MapReduce for heterogeneous clusters based on efficient locality and replica aware scheduling (ELRAS) strategy. Wirel. Personal Commun., 1–25.
Bhatia, J., Patel, T., Trivedi, H., Majmudar, V., 2012. HTV dynamic load-balancing algorithm for virtual machine instances in cloud. International Symposium on Cloud and Services Computing, 15–20.
Bok, K., Hwang, J., Lim, J., Kim, Y., Yoo, J., 2016. An efficient MapReduce scheduling scheme for processing large multimedia data. Multimed. Tools Appl., 1–24.
Cai, Z., Li, X., Ruiz, R., Li, Q., 2017. A delay-based dynamic scheduling algorithm for bag-of-task workflows with stochastic task execution times in clouds. Future Gener. Comput. Syst. 71, 57–72.
Chethana, R., Neelakantappa, B.B., Ramesh, B., 2016. Survey on adaptive task assignment in heterogeneous Hadoop cluster. IEAE Int. J. Eng. 1 (1).
Chien, N.K., Son, N.H., HD, 2016. Load-balancing algorithm based on estimating finish time of services in cloud computing. International Conference on Advanced Communication Technology (ICACT), 228–233.
Cinque, M., Corradi, A., Foschini, L., Frattini, F., Mol, J.P., 2016. Scalable monitoring and dependable job scheduling support for multi-domain grid infrastructures. In: Proceedings of the 31st Annual ACM Symposium on Applied Computing.
Dagli, M.K., Mehta, B.B., 2014. Big data and Hadoop: a review. Int. J. Appl. Res. Eng. Sci. 2 (2), 192.
Daraghmi, E.Y., Yuan, S.M., 2015. A small world based overlay network for improving dynamic load-balancing. J. Syst. Softw. 107, 187–203.
Dasgupta, K., Mandal, B., Dutta, P., Mondal, J.K., Dam, S., 2013. A Genetic Algorithm (GA) based load-balancing strategy for cloud computing. International Conference on Computational Intelligence: Modeling Techniques and Applications (CIMTA) 10, 340–347.
Destanoğlu, O., Sevilgen, F.E., 2008. Randomized hydrodynamic load balancing approach. IEEE International Conference on Parallel Processing 1, 196–203.
Deye, M.M., Slimani, Y., Sene, M., 2013. Load balancing approach for QoS management of multi-instance applications in clouds. Proceedings of the International Conference on Cloud Computing and Big Data, 119–126.
Domanal, S.G., Reddy, G.R.M., 2015. Load balancing in cloud environment using a novel hybrid scheduling algorithm. IEEE International Conference on Cloud Computing in Emerging Markets, 37–42.
Doulkeridis, C., Nørvåg, K., 2013. A survey of large-scale analytical query processing in MapReduce. VLDB J., 1–26.
Dsouza, M.B., 2015. A survey of Hadoop MapReduce scheduling algorithms. Int. J. Innov. Res. Comput. Commun. Eng. 3 (7).
Fadika, Z., Dede, E., Govidaraju, M., 2011. Benchmarking MapReduce implementations for application usage scenarios. In: Proceedings of the 12th IEEE/ACM International Conference on Grid Computing, 90–97.
Farrag, A.A.S., Mahmoud, S.A., 2015. Intelligent cloud algorithms for load balancing problems: a survey. In: Proceedings of the Seventh IEEE International Conference on Intelligent Computing and Information Systems (ICICIS'15), 210–216.
Gautam, J.V., Prajapati, H.B., Dabhi, V.K., Chaudhary, S., 2015. A survey on job scheduling algorithms in big data processing. IEEE International Conference on Electrical, Computer and Communication Technologies (ICECCT'15), 1–11.
Ghoneem, M., Kulkarni, L., 2016. An adaptive MapReduce scheduler for scalable heterogeneous systems. Proceedings of the International Conference on Data Engineering and Communication Technology, 603–6011.
Ghosh, S., Banerjee, C., 2016. Priority based modified throttled algorithm in cloud computing. International Conference on Inventive Computation Technology.
Goyal, S., Verma, M.K., 2016. Load balancing techniques in cloud computing environment: a review. Int. J. Adv. Res. Comput. Sci. Softw. Eng. 6 (4).
Gupta, H., Sahu, K., 2014. Honey bee behavior based load balancing of tasks in cloud computing. Int. J. Sci. Res. 3 (6).
Gutierrez-Garcia, J.O., Ramirez-Nafarrate, A., 2015. Agent-based load balancing in cloud data centers. Clust. Comput. 18 (3), 1041–1062.
Hefny, H.A., Khafagy, M.H., Ahmed, M.W., 2014. Comparative study of load balance algorithms for MapReduce environment. Int. Appl. Inf. Syst. 106 (18), 41.
Hou, X., Kumar, A., Varadharajan, V., 2014. Dynamic workload balancing for Hadoop MapReduce. Proceedings of the International Conference on Big Data and Cloud Computing, 56–62.
Hsueh, S.C., Lin, M.Y., Chiu, Y.C., 2014. A load-balanced MapReduce algorithm for blocking-based entity-resolution with multiple keys. Parallel Distrib. Comput. (AusPDC), 3.
Hwang, K., Dongarra, J., Fox, G.C., 2013. Distributed and Cloud Computing: from Parallel Processing to the Internet of Things.
Ivanisenko, I.N., Radivilova, T.A., 2015. Survey of major load-balancing algorithms in distributed systems. Information Technologies in Innovation Business Conference (ITIB).
Jadeja, Y., Modi, K., 2012. Cloud computing – concepts, architecture and challenges. International Conference on Computing, Electronics and Electrical Technologies (ICCEET).
Jaikar, A., Dada, H., Kim, G.R., Noh, S.Y., 2014. Priority-based virtual machine load balancing in a scientific federated cloud. In: Proceedings of the 3rd IEEE
Kianpisheh, S., Charkari, N.M., Kargahi, M., 2016. Ant colony based constrained workflow scheduling for heterogeneous computing systems. Clust. Comput. 19, 1053–1070.
Kliazovich, D., Pecero, J.E., Tchernykh, A., Bouvry, P., Khan, S.U., Zomaya, A.Y., 2016. CA-DAG: modeling communication-aware applications for scheduling in cloud computing. J. Grid Comput., 1–17.
Kolb, L., Thor, A., Rahm, E., 2011. Block-based load balancing for entity resolution with MapReduce. International Conference on Information and Knowledge Management (CIKM), 2397–2400.
Kolb, L., Thor, A., Rahm, E., 2012. Load balancing for MapReduce-based entity resolution. In: Proceedings of the 28th IEEE International Conference on Data Engineering, 618–629.
Komarasamy, D., Muthuswamy, V., 2016. A novel approach for dynamic load balancing with effective bin packing and VM reconfiguration in cloud. Indian J. Sci. Technol. 9 (11), 1–6.
Koomey, J.G., 2008. Worldwide electricity used in datacenters. Environ. Res. Lett. 3 (3), 034008.
Kulkarni, A.K., B, A., 2015. Load-balancing strategy for optimal peak hour performance in cloud datacenters. In: Proceedings of the IEEE International Conference on Signal Processing, Informatics, Communication and Energy Systems (SPICES).
Kumar, S., Rana, D.H., 2015. Various dynamic load-balancing algorithms in cloud environment: a survey. Int. J. Comput. Appl. 129 (6).
Lee, K.H., Choi, H., Moon, B., 2011. Parallel data processing with MapReduce: a survey. SIGMOD Rec. 40 (4), 11–20.
Li, R., Hu, H., Li, H., Wu, Y., Yang, J., 2015. MapReduce parallel programming model: a state-of-the-art survey. Int. J. Parallel Program., 1–35.
Lin, C.Y., Lin, Y.C., 2015. A load-balancing algorithm for Hadoop distributed file system. International Conference on Network-Based Information Systems.
Lua, Y., Xie, Q., Klito, G., Geller, A., Larus, J.R., Greenberg, A., 2011. Join-Idle-Queue: a novel load-balancing algorithm for dynamically scalable web services. Int. J. Perform. Eval. 68, 1056–1071.
Malladi, R.R., 2015. An approach to load balancing in cloud computing. Int. J. Innov. Res. Sci. Eng. Technol. 4 (5), 3769–3777.
Manjaly, J.S., A, CE, 2013. Relative study on task schedulers in Hadoop MapReduce. Int. J. Adv. Res. Comput. Sci. Softw. Eng. 3 (5).
Mesbahi, M., Rahmani, A.M., 2016. Load balancing in cloud computing: a state of the art survey. Int. J. Mod. Educ. Comput. Sci. 8 (3), 64.
Milani, A.S., Navimipour, N.J., 2016. Load balancing mechanisms and techniques in the cloud environments: systematic literature review and future trends. J. Netw. Comput. Appl. 71, 86–98.
Mishra, N.K., Misha, N., 2015. Load balancing techniques: need, objectives and major challenges in cloud computing: a systematic review. Int. J. Comput. 131 (18).
Moschakisa, I.A., Karatzaa, H.D., 2015. Multi-criteria scheduling of bag-of-tasks applications on heterogeneous interlinked clouds with simulated annealing. J. Syst. Softw. 101, 1–14.
Mukhopadhyay, R., Ghosh, D., Mukherjee, N., 2010. A study on the application of existing load-balancing algorithms for large, dynamic, and heterogeneous distributed systems. In: Proceedings of the 9th ACM International Conference on Software Engineering, Parallel and Distributed Systems, 238–243.
Neeraj, R., Chana, I., 2014. Load balancing and job migration techniques in grid: a survey of recent trends. Wirel. Personal Commun. 79 (3), 2089–2125.
Nishant, K., Sharma, P., Krishna, V., Gupta, C., Singh, K.P., Nitin, N., Rastogi, R., 2012. Load balancing of nodes in cloud using ant colony optimization. In: Proceedings of the 14th International Conference on Modelling and Simulation, 3–8.
Nuaimi, K., Mohamed, N., Al-Nuaimi, M., Al-Jaroodi, J., 2012. A survey of load balancing in cloud computing: challenges and algorithms. In: Proceedings of the Second IEEE Symposium on Network Cloud Computing and Applications.
Palta, R., Jeet, R., 2014. Load balancing in the cloud computing using virtual machine migration: a review. Int. J. Appl. Innov. Eng. Manag. 3 (5), 437–441.
Patel, H.M., 2015. A comparative analysis of MapReduce scheduling algorithms for Hadoop. Int. J. Innov. Emerg. Res. Eng. 2 (2).
International Conference on Cloud Computing. Polato, I., Re, R., Goldman, A., Kon, F., 2014. A comprehensive view of Hadoop research
Kabir, M.S., Kabir, K.M., Islam, R., 2015. Process of load balancing in cloud computing – a systematic literature review. J. Netw. Comput. Appl. 46, 1–25.
using genetic algorithm. Electr. Comput. Eng.: Int. J. 4 (2). Rajabioun, R., 2011. Cuckoo optimization algorithm. Appl. Soft Comput. 11, 5508–5518.
Kanakala, V.R.T., Reddy, V.K., 2015a. Performance analysis of load balancing techniques Randles, M., Lamb, D., Tareb-Bendia, A., 2010. A Comparative Study into Distributed
in cloud computing environment. TELKOMNIKA Indones. J. Electr. Eng. 13 (3), Load-balancing algorithms for Cloud Computing, IEEE In: Proceedings of the 24th
568–573. International Conference on Advanced Information Networking and Applications
Kanakala, V.R.T., Reddy, V.K., 2015b. Performance analysis of load balancing techniques Workshops, pp. 551–556.
in cloud computing environment. TELKOMNIKA Indones. J. Electr. Eng. 13 (3), Rao, B.T., Reddy, L.S.S., 2011. Survey on improved scheduling in Hadoop MapReduce in
568–573. cloud environments. Int. J. Comput. Appl. 34 (9).
Kansal, N.J., Inderveer Chana, I., 2012. Cloud load balancing techniques: a step towards Rastogi, G., Sushil, R., 2015. Analytical Literature Survey on Existing Load Balancing
green computing. Int. J. Comput. Sci. Issues 9 (1), 238–246. Schemes in Cloud Computing. International Conference on Green Computing and
Kaur, R., Luthra, P., 2014. Load Balancing in Cloud Computing, International Internet of Things (ICGCloT).
Conference on Recent Trends in Information. Telecommunication and Computing, Rathore, N., Channa, I., 2011. A Cognitive Analysis of Load Balancing and job migration
ITC, pp. 1–8. Technique in Grid World Congress on Information and Communication
Kc, K., Anyanwu, K., 2010. Scheduling Hadoop Jobs to Meet Deadlines. In: Proceedings Technologies Congr. Inf. Commun. Technol. (WICT). pp. 77–82.
of the 2nd IEEE International Conference on Cloud Computing Technology and Rathore, N., Chana, I., 2013. A Sender Initiate Based Hierarchical Load Balancing
Science (CloudCom), 388–392. Technique for Grid Using Variable Threshold Value. Signal Processing, Computing
Keshvadi, S., Faghih, B., 2016. A multi-agent based load balancing system in IaaS cloud and Control (ISPCC), IEEE International Conference.
environment. Int. Robot. Autom. J. 1 (1). Ray, S., Sarkar, A.D., 2012. Execution analysis of load-balancing algorithms in cloud
Khalil, S., Salem, S.A., Nassar, S., Saad, E.M., 2013. Mapreduce performance in computing environment. Int. J. Cloud Comput.: Serv. Archit. (IJCCSA) 2 (5).
heterogeneous environments: a review. Int. J. Sci. Eng. Res. 4 (4), 410–416. Sarood, O., Gupta, A., Kale, L.V., 2012. Cloud Friendly Load Balancing for HPC
Khiyaita, A., Zbakh, M., Bakkali, H.E.I., Kettani, D.E.I., 2012. Load balancing cloud Applications: Preliminary Work. International Conference on Parallel Processing
computing: state of art. Netw. Secur. Syst. (JNS2), 106–109. Workshops, 200–205.

Scharf, M., Stein, M., Voith, T., Hilt, V., 2015. Network-aware Instance Scheduling in OpenStack. International Conference on Computer Communication and Networks (ICCCN), 1–6.
Selvi, R.T., Aruna, R., 2016. Longest approximate time to end scheduling algorithm in Hadoop environment. Int. J. Adv. Res. Manag. Archit. Technol. Eng. 2 (6).
Shadkam, E., Bijari, M., 2014. Evaluation the efficiency of cuckoo optimization algorithm. Int. J. Comput. Sci. Appl. 4 (2), 39–47.
Shaikh, B., Shinde, K., Borde, S., 2017. Challenges of big data processing and scheduling of processes using various Hadoop Schedulers: a survey. Int. Multifaceted Multiling. Stud. 3, 12.
Shen, H., Sarker, A., Yu, L., Deng, F., 2016. Probabilistic Network-Aware Task Placement for MapReduce Scheduling. In: Proceedings of the IEEE International Conference on Cluster Computing.
Shen, H., Yu, L., Chen, L., Li, Z., 2016. Goodbye to Fixed Bandwidth Reservation: Job Scheduling with Elastic Bandwidth Reservation in Clouds. In: Proceedings of the International Conference on Cloud Computing Technology and Science.
Sidhu, A.K., Kinger, S., 2013. Analysis of load balancing techniques in cloud computing. Int. J. Comput. Technol. 4 (2).
Sim, K.M., 2011. Agent-based cloud computing. IEEE Trans. Serv. Comput. 5 (4), 564–577.
Singh, P., Baaga, P., Gupta, S., 2016. Assorted load-balancing algorithms in cloud computing: a survey. Int. J. Comput. Appl. 143 (7).
Singha, A., Juneja, D., Malhotra, M., 2015. Autonomous Agent Based Load-balancing Algorithm in Cloud Computing. International Conference on Advanced Computing Technologies and Applications (ICACTA), 45, 832–841.
Sui, Z., Pallickara, S., 2011. A survey of load balancing techniques for data intensive computing. In: Furht, B., Escalante, A. (Eds.), Handbook of Data Intensive Computing. Springer, New York, 157–168.
Tasquier, L., 2015. Agent based load-balancer for multi-cloud environments. Columbia Int. Publ. J. Cloud Comput. Res. 1 (1), 35–49.
Vaidya, M., 2012. Parallel processing of cluster by Map Reduce. Int. J. Distrib. Parallel Syst. 3 (1).
Valvåg, S.V., 2011. Cogset: A High-Performance MapReduce Engine. Faculty of Science and Technology, Department of Computer Science, University of Tromsö, 14.
Valvåg, S.V., Johansen, D., 2009. Cogset: A unified engine for reliable storage and parallel processing. In: Proceedings of the Sixth IFIP International Conference on Network and Parallel Computing, 174–181.
Vasic, N., Barisits, M., Salzgeber, V., 2009. Making Cluster Applications Energy-Aware. In: ACDC ’09, Proceedings of the 1st Workshop on Automated Control for Datacenters and Clouds, ACM, New York, NY, USA, pp. 37–42.
Vernica, R., Balmin, A., Beyer, K.S., Ercegovac, V., 2012. Adaptive MapReduce using situation-aware mappers. International Conference on Extending Database Technology (EDBT), 420–431.
Wei, X., Fan, J., Lu, Z., Ding, K., 2013. Application scheduling in mobile cloud computing with load balancing. J. Appl. Math., 1–13.
Wei, X., Fan, J., Wang, T., Wang, Q., 2015. Efficient application scheduling in mobile cloud computing based on MAX–MIN ant system. Soft Comput., 1–15.
Xia, Y., Wang, L., Zhao, Q., Zhang, G., 2011. Research on job scheduling algorithm in Hadoop. J. Comput. Inf. Syst. 7, 5769–5775.
Yahaya, B., Latip, R., Othman, M., Abdullah, A., 2011. Dynamic load balancing policy with communication and computation elements in grid computing with multi-agent system integration. Int. J. New Comput. Archit. Appl. (IJNCAA) 1 (3), 757–765.
Yakhchi, M., Ghafari, S.M., Yakhchi, S., Fazeliy, M., Patooghi, A., 2015. Proposing a Load Balancing Method Based on Cuckoo Optimization Algorithm for Energy Management in Cloud Computing Infrastructures. In: Proceedings of the 6th International Conference on Modeling, Simulation, and Applied Optimization (ICMSAO).
Yang, S.J., Chen, Y.R., 2015. Design adaptive task allocation scheduler to improve MapReduce performance in heterogeneous clouds. J. Netw. Comput. Appl. 57, 61–70.
Zaharia, M., 2009. Job Scheduling with the Fair and Capacity Schedulers 9. Berkeley University.
Zaharia, M., Borthakur, D., Sarma, J.S., 2010. Delay Scheduling: A Simple Technique for Achieving Locality and Fairness in Cluster Scheduling. In: Proceedings of the European Conference on Computer Systems (EuroSys'10), 265–278.
Zaharia, M., Konwinski, A., Joseph, A.D., Katz, R., Stoica, I., 2008. Improving MapReduce Performance in Heterogeneous Environments. In: Proceedings of the 8th Symposium on Operating Systems Design and Implementation, 29–42.
Zhang, Y., Li, Y., 2015. An improved adaptive workflow scheduling algorithm in cloud environments. In: Proceedings of the Third International Conference on Advanced Cloud and Big Data, 112–116.
