
Applied Soft Computing 122 (2022) 108791


Genetically-modified Multi-objective Particle Swarm Optimization approach for high-performance computing workflow scheduling

Haithem Hafsi (a), Hamza Gharsellaoui (b,c,∗), Sadok Bouamama (a,d)

(a) National School of Computer Science (ENSI), Manouba University, Tunisia
(b) National School of Advanced Sciences and Technologies of Borj Cedria (ENSTAB), Carthage University, Tunisia
(c) LISI-INSAT Laboratory, National Institute of Applied Sciences and Technology (INSAT), Carthage University, Tunisia
(d) Higher College of Technology (DMC), Dubai, United Arab Emirates

Article info

Article history: Received 24 July 2021; Received in revised form 14 February 2022; Accepted 23 March 2022; Available online 6 April 2022

Keywords: Workflow scheduling; Multi-objective optimization; High-performance computing; Hybrid clouds

Abstract

Nowadays, scientific research, industry, and many other fields are greedy for computing resources. Cloud Computing infrastructures are therefore attracting pervasive interest thanks to their excellent hallmarks, such as scalability, high performance, reliability, and the pay-per-use strategy. Executing such high-performance applications on these computing environments while optimizing many conflicting objectives leads to a challenging issue commonly known as multi-objective workflow scheduling on large-scale distributed systems. With this in mind, we outline in the present paper our proposed approach, called Genetically-modified Multi-objective Particle Swarm Optimization (GMPSO), for scheduling application workflows on hybrid Clouds in the context of high-performance computing, in an attempt to optimize Makespan and Cost. GMPSO incorporates genetic operations into the Multi-objective Particle Swarm Optimization process to enhance the resulting solutions. To achieve this, we have designed a novel solution encoding that represents the task ordering, the task mapping and the resource provisioning processes of the workflow scheduling problem in hybrid Clouds. In addition, a set of particular adaptive evolutionary operators has been designed. Conducted simulations lead to significant results compared with a set of well-performing algorithms such as NSGA-II, OMOPSO and SMPSO, especially for the most demanding workloads of workflows.
© 2022 Elsevier B.V. All rights reserved.

1. Introduction

The researchers' community has widely developed high-performance computing applications, known as e-science applications [1,2], in many fields such as predictive modeling and simulation, astrophysics, bio-informatics, computational aerodynamics, etc. At the outset, the execution of such applications was performed using mainframes. But nowadays, and especially in the era of big data, these applications are so time-consuming that classical stand-alone computing centers cannot perform their execution. In addition, scientific applications are data-intensive, so high-speed networking capabilities need to be established [2]. New promising Large Scale Distributed Systems (LSDS) have been designed to satisfy the resource needs of HPC applications; one can cite Grid and Cloud Computing infrastructures [3]. The Grid computing concept consists of federating free, private, and distributed resources in a typical virtual pool using specific middlewares. However, these grid infrastructures still have limits in performance and present some security and availability issues. On the other hand, through the development of virtualization techniques and the web services architecture, Cloud computing offers highly-available, rapidly-scalable, and secured IT capabilities, established in well-managed and high-throughput networked Data Centers [4,5]. After negotiating the QoS of the demanded resources, Cloud resources are delivered on-demand to public users in a pay-per-use fashion. Therefore, all these characteristics make Cloud infrastructures a suitable high-end computing environment for executing data-intensive and high-performance scientific applications.

E-science applications are commonly structured as a workflow of tasks [1,6]. Initially, with classic distributed systems such as clusters and Grids, the scheduling process was limited to allocating these tasks to a set of already running processing nodes. Since the emergence of Cloud computing solutions, a new phase has been added to the scheduling process: resource provisioning. The resource provisioning operation allows users to choose the configuration (in number and performance) of their processing machines among the available offers of Cloud

∗ Corresponding author at: National School of Advanced Sciences and Technologies of Borj Cedria (ENSTAB), Carthage University, Tunisia. E-mail address: [email protected] (H. Gharsellaoui).

https://doi.org/10.1016/j.asoc.2022.108791

providers. Consequently, in addition to optimizing Grid-wise objectives like the "total execution time" of the applications, the market-oriented model and the virtualization concept brought by Cloud computing have introduced new optimization criteria. First, the "economic cost" is the main optimization objective raised by the use of Cloud infrastructures, especially with the wide variety of choices either in terms of resources' configurations or in relation to the diversity of the concurrent Cloud providers. In addition, the fact of externalizing data to the Cloud providers' Data Centers makes security a critical issue that needs to be studied by Cloud users. Furthermore, to meet the Service Level Agreement (SLA) established with their clients, Cloud providers have to optimize additional tasks such as virtual machine load balancing, VM placement on physical machines and Data Center energy consumption. These arguments made the scheduling process more challenging and paved the way for the development of numerous scheduling algorithms for scientific workflows on hybrid computing platforms. This optimization problem is proved to be NP-hard [7–9].

Since the tackled objectives are numerous and conflicting, we face a Multi-Objective Optimization Problem (MOOP). Population-based optimization algorithms are widely adopted methods to solve MOOPs [10–14], such as Genetic Algorithms (GA), Ant Colony Optimization (ACO), Bee Colony Optimization (BCO) and Particle Swarm Optimization (PSO). In the literature, various such algorithms have been designed [4,15]. The Non-dominated Sorting Genetic Algorithm Version II (NSGA-II) [16] is a widely used optimization technique for MOOPs. Likewise, Multi-Objective Particle Swarm Optimization (MOPSO) algorithms are increasingly used to solve such kinds of problems [17].

This paper proposes a Genetically-modified Multi-objective Particle Swarm Optimization (GMPSO) algorithm that aims to optimize HPC application workflow scheduling in hybrid computing infrastructures, considering Makespan and Economic Cost as objectives. Our GMPSO approach incorporates genetic operations into the MOPSO process in a specific way. To achieve this, we have designed a two-dimensional vector encoding that represents the scheduling solution on a hybrid computing infrastructure and allows parsing all potential solutions. To the best of our knowledge, this encoding is unique in its use for such kinds of problems. In addition, we have developed a set of novel crossover and mutation operators that allow exploring a significant spectrum of solutions in the solution space.

The remainder of this paper is organized as follows. The state-of-the-art section introduces the key components of our workflow scheduling problem and the hybrid computing infrastructure; then we expose the related work on workflow scheduling in distributed systems. Section 3 presents the scheduling problem formulation, followed by the definition of our proposed encoding and the formulation of the used objective functions. In Section 4, we introduce our optimization approach and the design of the different applied operators. We discuss our experimentations in Section 5. Finally, Section 6 is reserved for the conclusion and perspective ideas.

2. State of the art

This section will outline the general context of our work by illustrating the elements of the considered distributed system and the scheduling strategy followed in our work. Additionally, we will give a brief literature review of the workflow scheduling optimization problem on distributed systems.

2.1. Problem motivation

2.1.1. Scientific application

To better model scientific applications, determine the appropriate leased resources for execution and define the better scheduling strategy, a set of application criteria need to be identified, such as:

• Parallelism model: specifies whether the application is defined as a single task composed of sequential instructions, or as multiple parallel tasks that can be executed on multiprocessor machines or distributed processing nodes.
• Resources usage: determines whether the application is CPU-intensive, Data-intensive, or IO-intensive.
• Task dependency: related to the I/O flow between tasks: a task cannot begin its execution until the output data of the tasks it depends on are available.

Most scientific applications are structured with highly dependent tasks designed to be performed on a distributed computing infrastructure. This form of task collection is known as the scientific workflow [6,18]. In Fig. 1, we exhibit a set of popular scientific workflows that replicate real scientific applications: Montage for astronomy research, CyberShake for earthquake science, Sipht and Epigenomics for biological concerns, and Inspiral dealing with gravitational physics. Depending on the related application, each scientific workflow type will be resolved by performing a set of particular jobs, as labeled with colors in Fig. 1. Furthermore, each job can be carried out either by a single task or by multiple parallel tasks. Therefore, scientific workflows are generated by a set of tasks organized in a specific dependency structure and categorized into groups of particular jobs.

2.1.2. The computing infrastructure

The Large Scale Distributed Systems, on which the mentioned scientific workflows are performed, present various types and models of computing infrastructures, described as follows:

• Clusters: the classic form of a networked set of homogeneous processing nodes.
• Grids: a sort of multi-user infrastructure federating multiple computer nodes distributed across different domains and orchestrated with an abstraction software layer named Grid Middleware, allowing Grids to be viewed as a single pool of resources [19]. Many Grid projects were designed to execute HPC applications, like Globus [20] and gLite [21].
• Volunteer Computing (VC): an arrangement of unused computer resources provided by internet users, so-called Volunteers. These resources are aggregated using specific middlewares like BOINC [22]. Unlike Grids, VC nodes are volatile, less secure and less reliable, with a high level of heterogeneity.
• Private Clouds: mainly a set of virtual machines launched from owned datacenters using specific platforms. Thanks to virtualization and web services techniques, Clouds provide a highly-scalable, well-performing and highly-available computing infrastructure.
• Public Clouds: market-oriented private Clouds offered by a Cloud provider to Cloud users in a pay-per-use model by negotiating a level of QoS. The Cloud services are provided to users under three common service models: Software as a Service (SaaS), Platform as a Service (PaaS) and Infrastructure as a Service (IaaS), the last being the most used model for scientific concerns. The Cloud market presents a huge variety of Cloud providers, such as Amazon Web Services, Microsoft Azure and Google App Engine.

Fig. 1. Different types of scientific workflows.

The mentioned infrastructures share some characteristics and differ in others. In the rest of this paper, we will treat these HPC infrastructures as two main categories:

• Free Resources: regrouping free and already-launched resources from Clusters, Grids, Volunteer nodes and private Clouds.
• Paid Resources: on-demand and pay-per-use virtual machines belonging to one or multiple public Clouds.

In Fig. 2, we present a general description of the computing infrastructures' components as well as the different scheduling steps of scientific workflows on these infrastructures. With the free private resources, we generally have a fixed number of processing nodes, which leads to a lack of resources either in the number of nodes or in performance requirements. For this reason, the solution consists of scaling the existing private infrastructure with on-demand, pay-per-use resources from the public Clouds to get a scalable HPC infrastructure named the Hybrid Cloud. The paid VMs are launched based on so-called Machine Images, which describe the virtual machines' capabilities (CPU, RAM, Storage, etc.). These Machine Images are predefined by the public Clouds' owners. In the rest of this paper, we suppose that all Machine Images are grouped together in the so-called Machine Images Pool, as described in Fig. 2.

2.1.3. Workflow scheduling: A taxonomy

Generally, the workflow scheduling process on distributed infrastructures involves mapping the workflow tasks to the available processing nodes so as to optimize a set of objectives while satisfying the problem constraints. For this reason, the scheduler needs access to two essential pieces of information:

• Workflow information: contains the workflow structure, task workloads, data sizes, etc.
• Resources' information: involves processing performance, availability, bandwidth, etc.

As described in Fig. 2, and in addition to the Task Ordering and the Task Mapping, the emergence of the Cloud computing concept brought a third operation to the workflow scheduling process: the Resource Provisioning. The resource provisioning aims at defining the suitable virtual machines' collection (number and types) that best fits the scheduling objectives. Consequently, the lease of these VMs next to the on-premise free nodes results in a hybrid Resource List. At this stage, three scheduling levels are distinguished [23], described as follows:

Fig. 2. Scientific workflow scheduling on hybrid cloud: A general view.

• Service-level scheduling: aims to select the appropriate Cloud provider in the Cloud Market from which the VMs will be provisioned, and the number of VMs leased from each machine image that responds to the requested QoS of users.
• Task-level scheduling: finds the optimal task-to-VM mapping according to the specified objectives and constraints.
• VM-level scheduling: focuses on optimizing the VM placement over the physical hosts inside the Cloud Data Center. Cloud administrators are the most interested in this level of scheduling.

In the literature [6,24,25], we identify two basic workflow scheduling strategies:

• Static Scheduling: also called planning scheduling, it consists of generating a schedule plan that contains the task mapping to the available resources and their execution order. This scheduling plan is built before running the application on the computing system, by estimating the different metrics such as the execution time of tasks and the data communication time.
• Dynamic Scheduling: with this kind of schedule, tasks are allocated during their execution. Furthermore, the execution time, the communication time, and other metrics are updated progressively. With Cloud infrastructures, the computing nodes can be scaled up/down at each scheduling step.

In the remainder of this work, we will mainly focus on Task-level and Service-level scheduling following the Static scheduling strategy.

2.2. Optimizing workflow scheduling on hybrid clouds: Related works

Before exposing some related works on workflow scheduling in distributed systems, it is worth mentioning that there is a broad spectrum of research focusing on this topic in the literature [4,5,26–32]. The approaches proposed by researchers vary according to various parameters, such as the number (single, double or multiple) or the type (Cloud-user-side or Cloud-provider-side) of the criteria to be optimized, the number of workflows or users, and the nature of the distributed infrastructure [6,10,25]. When considering only free resources, the cost objective is not considered: Garg et al. [33] suggested an Adaptive Workflow Scheduling (AWS) approach to minimize the execution time when scheduling simple workflows on grid infrastructures. In some works, the scheduling problem is handled by optimizing a single objective and making the other a constraint. In this context, Arabnejad et al. [34] present the Heterogeneous Budget Constrained Scheduling (HBCS) algorithm, which minimizes Makespan under a user-defined budget constraint; to select a resource, they define a worthiness weight calculated by aggregating the normalized execution cost and time on that resource. Similarly, a back-tracking-based heuristic optimizes one objective in a first stage, then tries to swap tasks to optimize the other objective; Zeng et al. [35] adopted this solving model in their algorithm ScaleStar.

When dealing with multiple objectives, EMO algorithms are the most adopted techniques, especially Particle Swarm Optimization (PSO) algorithms and Genetic Algorithms (GA), such as in [10–12,24,36]. PSO-based algorithms are extensively used in such problems [4]. Verma et al. [37] proposed a new Hybrid PSO algorithm by improving their previous algorithm, the Deadline constrained Heterogeneous Earliest Finish Time (BHEFT) [38], with multi-objective PSO (MOPSO) [17]. Yao et al. [39] designed a multi-swarm multi-objective optimization algorithm (MSMOOA) to optimize makespan, cost, and energy consumption; compared to MOPSO and MOHEFT [40], MSMOOA improves solution quality. GA-based algorithms are also present in the literature [41]. Zhu et al. [42] introduced a Multi-objective Evolutionary Scheduling for Cloud (EMS-C) algorithm: the authors propose their EMO-based algorithm to optimize workflow scheduling on Infrastructure as a Service (IaaS) by adopting a real-world pay-per-use pricing model and by designing novel crossover and mutation operators. Furthermore, hybridization also presents an efficient solution for the scheduling problem [43]. For example, in [44], Ahmad et al. built a hybrid GA-PSO to optimize makespan, cost, and load balancing when allocating tasks to Cloud infrastructures. The Hybrid GA-PSO, executed in a two-step fashion (GA then PSO), presents good results compared to GA and PSO separately, but it converges to a single solution. In the same context, Amandeep Verma et al. proposed the Hybrid PSO

(HPSO) algorithm [45]. The latter hybridizes the Multi-objective Particle Swarm Optimization (MOPSO) algorithm and the Budget and Deadline constrained Heterogeneous Earliest Finish Time (BDHEFT) [38] algorithm.

All the works mentioned above deal with free resources only, such as grids, or with paid VMs in the public Cloud. For heterogeneous infrastructures, fewer works have been developed. Bittencourt et al. [46] proposed the Hybrid Cloud Optimized Cost (HCOC) algorithm, which minimizes cost when leasing paid VMs from public Clouds added to existing free resources, in order to respect an execution deadline. None of the works mentioned above has used the matrix representation as an encoding schema. Moreover, and to the best of our knowledge, this encoding is rarely used in other works related to such kinds of scheduling problems [27]. Consequently, we have developed operators adapted to our novel solution encoding. Moreover, there are rare works that focused on extending free private resources by leasing paid public ones [47,48].

3. Problem formulation & encoding

Throughout this section, we will detail the formulation of the different components of the adopted workflow scheduling problem on the hybrid Cloud infrastructure. In addition, we will present our proposed encoding for the scheduling solution and the design of the objective functions.

3.1. System formulation

3.1.1. Workflow representation

A common way to model an application workflow is the Directed Acyclic Graph (DAG) representation. We note a workflow G = (V, E), where V is a set of n nodes representing the application tasks. Let V = {T_1, T_2, ..., T_n}. These nodes are connected by a set of edges E defining the precedence relation between tasks. Let E = {e_ij | i, j ∈ [1..n]}, where e_ij means T_i precedes T_j. We define the set of T_i predecessors, so-called parents, as Pred(T_i) = {T_j | e_ji ∈ E}. Accordingly, we note the list of T_i successors, named children, as Succ(T_i) = {T_j | e_ij ∈ E}. A task T_i cannot be started unless all its parents (Pred(T_i)) have terminated. A task with no predecessors is called an entry task T_entry, while a task that does not have any successors is known as an exit task T_exit. For each task T_i of the application, we define a workload WL(T_i) measured in MI (Millions of Instructions), and each edge e_ij is weighted by Data(i, j), which represents the amount of data transferred from the task T_i to the task T_j.

3.1.2. Infrastructure modeling

As we have mentioned at the beginning, and for simplicity, we will consider a hybrid infrastructure containing two types of resources: some free and some paid. In general, we assume that the following parameters define a processing node R in our hybrid infrastructure:

• the processing power Pow(R), measured in MIPS (Millions of Instructions Per Second);
• the price Cost(R), measured in USD per unit of time;
• the bandwidth BW(R), measured in Mbps.

Let FR = {R_1, R_2, ..., R_mf | Cost(R_i) = 0} be a set of mf Free Resources, and PR = {R_{mf+1}, R_{mf+2}, ..., R_{mf+mp}} a set of mp on-demand Paid Resources. As a result, the hybrid infrastructure will be the union of the two sets, noted HR = FR ∪ PR. We note that each paid resource R_i is leased based on a machine image IM_k; let the list of machine images be MI = {IM_k | k ∈ [1..K]}. Assuming a task T_i is executed on a resource R_i, we get the following values:

• Tcomp(T_i): the computation time of running a task T_i on a resource R_i (see Eq. (1));
• BW(R1, R2): the bandwidth between the two resources R1 and R2 (see Eq. (2));
• Tcomm(T_i, T_j): the communication time between T_i and T_j running, respectively, on R_i and R_j (see Eq. (3)).

These are formulated as below:

\[
T_{comp}(T_i) = \frac{WL(T_i)}{Pow(R_i)} \tag{1}
\]

\[
BW(R1, R2) =
\begin{cases}
\min(BW(R1), BW(R2)), & R1 \neq R2\\
\infty, & \text{else}
\end{cases}
\tag{2}
\]

\[
T_{comm}(T_i, T_j) = \frac{Data(i, j)}{BW(R_i, R_j)} \tag{3}
\]

3.2. Scheduling solution encoding

Having G = (V, E) an application workflow of n tasks, and given FR and MI, respectively, the private Free Resources and the list of public paid Machine Images, the scheduling task is handled by two processes: the resource provisioning and the task scheduling.

The resource provisioning consists of creating a set of paid virtual machines PR to form, together with FR, the hybrid resource infrastructure HR. By applying the function Clone in Eq. (4), we launch CL_i virtual machines from each type IM_i to get a homogeneous sub-list of Cloned Resources CR(IM_i). Consequently, the paid resources PR will be the union of the CR(IM_i) subsets of each machine image, as mentioned in Eq. (5). Moreover, the CL_i values form the K-tuple Clone Vector CV, where K = |MI|. Fig. 3 summarizes the resource provisioning process in reference to the machine images list MI and the Clone Vector CV.

\[
Clone : MI \rightarrow \mathbb{N}, \qquad Clone(IM_i) = CL_i \tag{4}
\]

\[
CR(IM_i) = \{R_{i,1}, R_{i,2}, \ldots, R_{i,CL_i}\},\ i \in [1..K]; \qquad
PR = \bigcup_{i=1}^{K} CR(IM_i); \qquad
|PR| = mp = \sum_{i=1}^{K} CL_i \tag{5}
\]

Once the hybrid resource list HR is created, the task scheduling process proceeds by assigning each task T_i from V to an HR computing resource R_k, as depicted in Eq. (6). This task mapping is processed in a specific order defined by the function Order(T_i) = r (Eq. (7)), in a manner that respects the task dependency rules defined in G.

\[
Map : V \rightarrow HR, \qquad Map(T_i) = R_k \tag{6}
\]

\[
Order : V \rightarrow [1..n], \qquad Order(T_i) = r \tag{7}
\]

By applying the two functions Map (Eq. (6)) and Order (Eq. (7)), we represent the task mapping and ordering by a matrix M of n rows and mf + mp columns. The matrix M is referenced by two vectors: a Resources List vector RL for the columns and a Tasks List vector TL for the rows. TL is a vector enclosing the n tasks of V ordered based on their levels (see Eq. (9)). Furthermore, if a task TL[i] is mapped to a resource RL[j] with the order r, then we have M[i, j] = r, with respect to the task

Fig. 3. Resource provisioning process.

dependencies, as well as the constraint that a task must be scheduled to only one resource. This leads to the following formulation of M (Eq. (8)):

\[
\forall i \in [1..n],\ \exists j \in [1..mf+mp]:\quad
\begin{cases}
M[i,j] = r \in [1..n], & \forall T_p \in Pred(TL[i]),\ Order(T_p) < r\\
M[i,k] = 0, & \forall k \in [1..mf+mp]\setminus\{j\}
\end{cases}
\tag{8}
\]

A task level is defined by the recurrent relation in Eq. (9) as follows:

\[
Level(T_i) =
\begin{cases}
1, & Pred(T_i) = \emptyset\\
\max_{T_j \in Pred(T_i)}\{Level(T_j)\} + 1, & \text{else}
\end{cases}
\tag{9}
\]

Hence, our proposed encoding for a scheduling solution is defined as S = (CV, M), where CV is the Clone Vector resulting from the resource provisioning process, while M is a matrix that comprises the task mapping to resources as well as their execution orders.

In Fig. 4, a sample of our solution encoding is explained. Take an application workflow G enclosing 7 tasks, a private resources pool FR = [R1, R2, R3] and a list of machine images MI = [IM1, IM2, IM3]. Defining mp = 3 as the maximum number of leased resources, let CV = [1, 0, 2] be the Clone Vector produced by the resource provisioning process. Accordingly, one VM of type IM1 and two VMs of type IM3 will be leased, to obtain a hybrid resource list composed of 6 nodes, as depicted in Fig. 4(a). In Fig. 4(b), a level-wise sort is performed to generate the TL vector. Finally, based on the two developed vectors, the task mapping process gives rise to the matrix values shown in Fig. 4(c). The generated matrix can be interpreted as follows: the task scheduling order is [T1, T3, T2, T4, T6, T5, T7], respectively assigned to [R4, R2, R5, R3, R3, R1, R6].

3.3. Objective functions

Consequently, the scheduling problem of a given workflow G on a hybrid Cloud is to find the optimal scheduling solution S that minimizes the total execution time and the execution cost of the related application. Let X be the decision space that involves the set of eventual scheduling solutions S. These scheduling solutions are evaluated by the objective functions RunTime and Cost, which calculate the overall application execution time and the total economic cost, respectively. Thus, the scheduling problem can be formulated as in Eq. (10):

\[
\begin{cases}
\underset{S \in X}{\text{Minimize}}\ RunTime(S)\\[4pt]
\underset{S \in X}{\text{Minimize}}\ Cost(S)
\end{cases}
\tag{10}
\]

For a given scheduling solution S, the application execution terminates with the last finished task in the task list, as mentioned in Eq. (11). Noting FT(T_i) and ST(T_i), respectively, the finish time and the start time of the task T_i, the finish time FT(T_i) of a task T_i is determined as the sum of its start time ST(T_i) and its computation time on the mapped resource Map(T_i) (Eq. (12)). Moreover, the start time ST(T_i) is calculated by the recursive relation in Eq. (13):

\[
RunTime(S) = \max_{T_i \in V}\{FT(T_i)\} \tag{11}
\]

\[
FT(T_i) = ST(T_i) + T_{comp}(T_i) \tag{12}
\]

\[
ST(T_i) = \max\Big[Avail(Map(T_i)),\ \max_{T_j \in Pred(T_i)}\{FT(T_j) + T_{comm}(T_j, T_i)\}\Big] + Startup(Map(T_i)) \tag{13}
\]

Note that Avail(Map(T_i)) is the availability time of the resource Map(T_i), which equals zero if T_i is the first task executed on it, and is updated with the finish time of the last completed task on that resource. Startup(Map(T_i)) equals a predefined constant T_startup only if Map(T_i) is a paid resource and T_i is the first task executed on it, and zero otherwise. T_startup is the time taken by a resource to be started and ready to receive tasks.

Furthermore, as mentioned in Eq. (14), the application execution cost is calculated as the sum of the costs of the occupied virtual machine Time Slots:

\[
Cost(S) = \sum_{R_i \in PR} RoundUp\Big(\frac{RunTime(R_i)}{Slot}\Big) \times UnitCost(R_i) \tag{14}
\]

knowing that RunTime(R_i) = FT(T_end) − ST(T_start) + Startup(R_i), where T_end is the last task executed on R_i and T_start is the first one.

Additionally, for evaluation purposes, we calculate the utilization rate of each processing node R_i as described in Eq. (15):

\[
UR(R_i) = \frac{\sum_{Map(T_i)=R_i} T_{comp}(T_i)}{RunTime(R_i)} \times 100 \tag{15}
\]

4. Proposed approach

This section will portray the various milestones of our proposed approach: the algorithm structure, the problem encoding, the objective functions and the different operators.

4.1. Genetically-modified multi-objective PSO

This subsection is reserved for the design of our proposed GMPSO algorithm. We start by giving a brief overview of the two elementary algorithms: NSGA-II and MOPSO.
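The makespan and cost objectives of Eqs. (11)–(14) can be sketched as a small evaluation routine. This is our own minimal illustration, not the paper's code: it assumes tasks are visited in a dependency-respecting order, and it bills each paid VM from time zero, a slight simplification of the per-resource RunTime(R_i) of Eq. (14). All helper names are assumptions.

```python
import math

def evaluate(order, mapping, workload, data, power, bandwidth,
             pred, paid, slot, unit_cost, t_startup):
    """Return (RunTime(S), Cost(S)) for a schedule, following Eqs. (11)-(14).
    `order`: tasks in a dependency-respecting sequence; `mapping[t]`: resource."""
    finish, avail, started = {}, {}, set()
    for t in order:
        r = mapping[t]
        # Eqs. (2)-(3): communication time from each parent, zero on same node
        ready = max((finish[p] + (data[(p, t)] / min(bandwidth[mapping[p]],
                                                     bandwidth[r])
                                  if mapping[p] != r else 0.0)
                     for p in pred[t]), default=0.0)
        start = max(avail.get(r, 0.0), ready)
        if r in paid and r not in started:   # Eq. (13): one-time VM startup
            start += t_startup
            started.add(r)
        finish[t] = start + workload[t] / power[r]      # Eqs. (1), (12)
        avail[r] = finish[t]
    makespan = max(finish.values())                     # Eq. (11)
    cost = sum(math.ceil(avail[r] / slot) * unit_cost[r]
               for r in paid if r in avail)             # Eq. (14), simplified
    return makespan, cost
```

For a two-task chain placed on a free node and a paid node, the returned makespan includes the communication and startup delays, and the cost counts the occupied time slots of the paid VM only.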

Fig. 4. Example of solution encoding.

4.1.1. NSGA-II algorithm
NSGA-II is one of the most popular EMO algorithms. It generates a set of non-dominated solutions known as the Pareto front, obtained thanks to an elitism principle that preserves each generation's non-dominated solutions. As highlighted in Fig. 5(a), the NSGA-II algorithm proceeds as follows.
First, the initialization step generates a population of N candidate solutions. Then, after evaluating the objective functions of each solution in the generated population, the sorting process is performed: based on their level of non-domination, individuals are ranked into a set of fronts, and individuals with the same rank (in other words, in the same front) are further sorted by their crowding distance values.
Next, a binary tournament selection identifies a generation of parent individuals on which genetic operations are performed to generate an offspring population. The crossover operation is applied, with a crossover probability, to two selected parents; the resulting offspring are then mutated by a mutation operator under a mutation probability. These operations are applied to different parents until a population of 2N individuals is obtained. This resulting population is sorted as before and then truncated to get a new population of N individuals. Finally, these operations are repeated until a stopping criterion is achieved or a maximum number Ng of iterations is attained.

4.1.2. MOPSO background
Particle Swarm Optimization is a widely used population-based meta-heuristic in Artificial Intelligence. PSO belongs to the Swarm Intelligence family, inspired by the behavior of swarms in nature such as bird flocks, ant colonies and bee colonies. In PSO, we call the population enclosing the candidate solutions a swarm, and each individual of this swarm is known as a particle. Each particle changes its position in the search space based on its own experience and that of the other particles, in order to converge to the best solution. Thus, the ith particle of the swarm Xi at iteration t, noted Xi^t, flies through the search space with a vector named velocity, noted Vi^{t+1}, to reach the new position Xi^{t+1} through the formula in Eq. (16). As formulated in Eq. (17), the velocity is calculated from the current velocity Vi^t weighted by an inertia weight ω, together with the distances between the current position and two particular positions: the personal best Xpbest, which is the best position recorded by the particle, and the global best Xgbest, also named the leader, which is the best position achieved by the whole swarm. These two distances are weighted by the products of acceleration coefficients and random values, noted respectively c1, c2 and r1, r2:

    Xi^{t+1} = Xi^t + Vi^{t+1}    (16)

    Vi^{t+1} = ω Vi^t + c1 r1 (Xpbest − Xi^t) + c2 r2 (Xgbest − Xi^t)    (17)

It is worth noting that plain PSO addresses a single-objective problem and yields a single solution. Multi-Objective PSO (MOPSO) algorithms were developed to solve multi-objective problems such as those depicted in [49]. The main idea, typically used in MOPSO algorithms, is to store the particles that are non-dominated with respect to the swarm in an external archive, forming the so-called leaders, which represent the result generated by the algorithm. After initializing the swarm's particles and their corresponding Xpbest and generating the initial leaders, as shown in Fig. 5(b), the flight of each particle is calculated as in Eq. (16) by selecting a leader from the leaders' archive; a potential mutation is then performed, and once the objective functions are evaluated, the pbest particles and the leaders' archive are updated.
OMOPSO is an MOPSO algorithm proposed by Coello et al. [49] that adopts the crowding distance used in NSGA-II for selecting leader solutions, as well as two turbulence (mutation) operators. Based on OMOPSO, the authors also developed the SMPSO algorithm described in [50].

4.1.3. Proposed model
In our proposed approach, we aim at incorporating the NSGA-II operations into the execution process of the MOPSO. As described in Fig. 5(c), the NSGA-II process is applied after a predefined number of iterations that we call GMJump. As well,

Fig. 5. Flowcharts of basic algorithms.

in the (k ∗ GMJump)-th generation, these genetic operations are performed between GMParents randomly selected leaders and GMParents particles, GMParents being a predefined value. For more detail, we present the pseudo-code of our proposed approach in Algorithm 1. The proposed GMPSO algorithm begins by initializing the particle Swarm and the LocalBest list with N scheduling solutions (Lines 1–2). Then, the Leaders list is filled with the non-dominated solutions of the initialized Swarm (Line 3). After defining the two parameters GMJump and GMParents, the main loop of N_Iter iterations starts by updating the velocity values and the particles' positions (Lines 7–8), as defined, respectively, in Eqs. (17) and (16). Next, and before mutating the solutions (particles), the genetic modifications are performed (Lines 9–18): we generate a population of 2 ∗ GMParents offspring by crossing GMParents randomly selected particles from the leaders with the same number of randomly selected particles from the Swarm. Note that these operations are executed only once every GMJump iterations. Afterward, the mutation and evaluation functions are applied to the generated offspring solutions (Lines 16–17) as well as to the Swarm solutions (Lines 19–20). Subsequently, the LocalBest list of particles is updated by comparing it with the newly calculated positions (Line 21). Finally, the Leaders set is upgraded with the newly dominating solutions among the new Swarm particles as well as the generated offspring (Line 22). The algorithm returns the Leaders set as the final result.

Algorithm 1 GMPSO Algorithm Pseudo-Code
1: Initialize(Swarm, N)
2: LocalBest ← Swarm
3: Leaders ← SelectLeaders(Swarm, NL)    ▷ NL is the size of Leaders
4: Define(GMJump)
5: Define(GMParents)
6: for i ← 1, N_Iter do
7:     UpdateVelocities()
8:     UpdatePositions(Swarm)
9:     Offsprings ← NULL
10:    if i mod GMJump = 0 then
11:        for j ← 1, GMParents do
12:            Parents1 ← RandomSelect(Swarm)
13:            Parents2 ← RandomSelect(Leaders)
14:            Add(CrossParticles(Parents1, Parents2), Offsprings)
15:        end for
16:        Mutate(Offsprings)    ▷ Apply mutation operations
17:        Evaluate(Offsprings)    ▷ Evaluate objective functions of solutions
18:    end if
19:    Mutate(Swarm)
20:    Evaluate(Swarm)
21:    UpdateLocalBest(LocalBest)
22:    UpdateLeaders(Leaders, Swarm, Offsprings)
23: end for
24: return Leaders
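As a minimal illustration of the two mechanics that Algorithm 1 combines, the sketch below implements the velocity and position updates of Eqs. (17) and (16) for one particle, together with the "which generations trigger the genetic jump" test of Line 10. All names are ours; the real crossover, mutation and archive-maintenance steps are deliberately left out:

```python
import random

def update_velocity(x, v, pbest, gbest, w=0.1, c1=2.0, c2=2.0):
    """Eq. (17): inertia term plus attractions toward the particle's
    personal best and the leader selected from the archive."""
    r1, r2 = random.random(), random.random()
    return [w * vi + c1 * r1 * (pb - xi) + c2 * r2 * (gb - xi)
            for xi, vi, pb, gb in zip(x, v, pbest, gbest)]

def update_position(x, v):
    """Eq. (16): move the particle along its new velocity."""
    return [xi + vi for xi, vi in zip(x, v)]

def genetic_jump_iterations(n_iter, gm_jump):
    """Iterations at which Line 10 of Algorithm 1 fires, i.e. the
    generations where GMParents leaders are crossed with GMParents
    swarm particles."""
    return [i for i in range(1, n_iter + 1) if i % gm_jump == 0]
```

With GMJump = 3 and 10 iterations, the crossover block runs at generations 3, 6 and 9 only; the rest of the loop is ordinary MOPSO flight.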
4.2. Operations

This section will portray the different operators applied within the various algorithms involved in our approach.

4.2.1. Initialization
As described in Section 3.2, a scheduling solution S is defined by the couple S = (CV, M). Thus, initializing a solution implies initializing the Clone Vector CV and the mapping matrix M.
Foremost, after fixing a maximum number mp of VMs that the user wants to lease, and by randomly choosing a machine image from the MI list mp times, the Clone Vector is built from the occurrences of each machine image. Consequently, the RL vector can be created as described in Fig. 4(a).
Moreover, the mapping matrix initialization starts by generating the task orders: we take sub-lists of tasks of equal level and shuffle their orders in a way that does not violate the task-dependency constraint. Afterward, we randomly assign each task to one and only one computing resource by filling in the corresponding order. In other words, for each task row index, we fill its generated order into the column of the hosting resource.

4.2.2. Crossover operator
The proposed crossover operator swaps two Cross Windows between two parent matrix schedules. We call a Cross Window an interval of rows in which the sets of task order values of the two schedules are the same. Swapping these rows therefore preserves task dependencies and avoids order redundancy.
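The Cross Window definition can be made concrete with a short helper that scans growing prefixes of the two parents' order lists until their order sets coincide — essentially the FirstEqual routine used by the crossover procedure of Algorithm 2. This Python rendering is ours, for illustration only:

```python
def first_equal(orders1, orders2, pos):
    """Smallest prefix length eq >= pos such that the first eq order
    values of the two schedules form the same set.  Terminates at
    eq = n at the latest, since both schedules are permutations of
    the same n order values."""
    eq = pos
    while set(orders1[:eq]) != set(orders2[:eq]):
        eq += 1
    return eq
```

Two delimiters obtained this way (Start and End) bound a window of rows whose order values match as sets, so swapping the window between the parents cannot duplicate an order value or break the level-based precedence of the encoding.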

Fig. 6. Example of crossover operation.

Fig. 7. Mutation operators.


As described in Algorithm 2, we begin by searching for the Cross Window delimiters, Start and End. Starting from a random position pos and calling the function FirstEqual depicted in Line 7, we obtain Start, the first index at which the order sets of the two schedules coincide (Lines 1–2). Then, similarly, we determine End, starting from a random index between Start and n (the number of tasks). Thus, we get three (at least two) common sub-lists of task orders in the two schedules, and at this stage the swap operation can be made between the rows Start + 1 and End.

Algorithm 2 CrossSched(S1, S2)
1: pos ← Random(1, n/4)
2: Start ← FirstEqual(S1, S2, pos)    ▷ find the position that gives the first similarity of the orders' lists
3: pos ← Random(Start + 1, n)
4: End ← FirstEqual(S1, S2, pos)
5: Swap the rows of S1 and S2 between Start + 1 and End
6:
7: function FirstEqual(S1, S2, pos)
8:     List1 ← Orders(S1, 1, pos)    ▷ gives the order sub-list of rows 1 to pos
9:     List2 ← Orders(S2, 1, pos)
10:    Eq ← pos
11:    while List1 ≠ List2 do
12:        Eq ← Eq + 1
13:        List1 ← Orders(S1, 1, Eq)
14:        List2 ← Orders(S2, 1, Eq)
15:    end while
16:    return Eq
17: end function

In Fig. 6, we describe a simple example of a crossover operation between two schedules, in which we obtain a Cross Window composed of rows 2, 3 and 4. The particularity of this crossover method is that it changes task order and task mapping simultaneously.

4.2.3. Mutation operators
For our approach, we have designed three mutation operators: Order_Mutation, VM_Mutation and CV_Mutation. When a mutation is applied, one of these operators is randomly selected. All three are designed not to violate the task-dependency constraint.
The Order_Mutation selects a task randomly from a given schedule matrix and tries to delay its order as much as possible without violating task dependencies. In Fig. 7(a), we illustrate an example of an Order_Mutation operation, with T3 the randomly selected task of order 2. Scanning from order 3 up to 5, the algorithm finds the maximal task list [T2, T4, T6] of tasks that are not successors of T3; i.e., the last task that can swap order with T3 is T6. Therefore, the order of T3 is delayed to 5 and all order values between 3 and 5 are shifted down by 1.
The VM_Mutation allows two resources to swap their mapped tasks. This operation is simply performed by selecting two random rows and permuting them, as shown in Fig. 7(b).
The CV_Mutation changes the leased VM configuration by swapping the clone values of two randomly selected machine images.
It is worth mentioning that for the OMOPSO and SMPSO algorithms we preserve the mutation operators defined by their authors: Uniform and Non-Uniform Mutation for OMOPSO, and Polynomial Mutation for SMPSO.

5. Experimentation and discussion

This section is reserved for the performance evaluation of our approach. We describe below our experimental setup and the simulation results.

5.1. Experimentation setup

To evaluate our proposed approach, we have chosen the WorkflowSim [51] framework as a Cloud simulator. WorkflowSim is a recent extension of the CloudSim toolkit [52] that allows simulating scientific workflows on Cloud infrastructures.
The hybrid Cloud configuration adopted for our experiments is described in Table 1. We consider the free resources to be homogeneous and of lower performance than the paid resources, which have various configurations. Based on five paid machine images, we set the resource list size equal to the number of workflow tasks.
For the workflow dataset, WorkflowSim provides a set of benchmark workflows generated by the WorkflowGenerator of the Pegasus project [53]. Thus, our experiments are carried out with five types of synthetic workflows, as described in Section 2.1.1: Montage for astronomy research, CyberShake for earthquake science, Sipht and Epigenomics for biological concerns, and Inspiral for gravitational physics. Noting S1, S2, S3 and S4 the

Table 1
Hybrid cloud configuration details.

Parameter         Free resources (FR)   Paid resources (PR)
#Machine Images   1                     5
#CPUs             1                     1–2
MIPS              500                   1000–2000
RAM (MB)          512                   512–2048
BW (mbps)         1000                  1000
Cost (USD/H)      0                     0.025–0.075

Table 2
Workflow benchmark configurations.

Workflow      #Levels   #Tasks (S1/S2/S3/S4)   Average task length, 10^3 MI (S1/S2/S3/S4)
Montage       9         25/50/100/1000         9.1/10.2/10.8/11.4
Epigenomics   8         24/46/100/997          738.3/880.9/4034/3866.4
CyberShake    4         30/50/100/1000         25.4/30.5/32.2/22.8
Inspiral      6         30/50/100/1000         220.5/235.2/210.2/227.7
Sipht         5         30/60/100/1000         191.2/201/179.2/179.4

Table 3
Proposed and peer algorithms' parameters.

Parameter                     NSGA-II   OMOPSO       SMPSO        GMPSO
(Swarm/Population) size (N)   100 (all)
Iterations (Ng)               300 (all)
Simulations                   20 (all)
Leaders size                  –         50           50           50
Crossover probability         0.5       –            –            0.5
Mutation probability          0.5       0.5          0.5          0.5
C1, C2                        –         [1.5..2]     [1.5..2.5]   [1.5..2.5]
r1, r2                        –         [0..1]       [0..1]       [0..1]
W                             –         [0.1..0.5]   0.1          0.1
GMParents                     –         –            –            50
GMJump                        –         –            –            3

different workflow sizes, we detail in Table 2 the number of levels, the number of nodes, and the average processing load of the tasks for each workflow size, giving in total 20 benchmark workflow configurations.

5.2. Evaluation metrics

Within the context of multi-objective problems, many performance indicators are used to evaluate the achieved results [54]. The Inverted Generational Distance (IGD) and the Hypervolume (HV) are commonly used performance metrics. For both metrics, we adopt as the reference front (in other words, the true Pareto front) the union, in the Pareto-dominance sense, of the different Pareto fronts generated by all tested algorithms.
In addition to the HV and IGD metrics, the virtual machine utilization rate UR and the running time RT of each algorithm will be reported. The UR value is calculated as the average utilization percentage of all processing nodes launched by the different scheduling solutions generated by the algorithm. The higher the UR value, the better the quality of a solution.

5.3. Results and discussion

To validate our proposed GMPSO algorithm, and since it involves both MOPSO and genetic-based concepts, we compare our approach with: first, NSGA-II, as a well-known genetic-based multi-objective algorithm; and second, OMOPSO and SMPSO, which have proved to be two well-performing MOPSO algorithms, as explained in [49,50]. Table 3 displays the parameter settings of the different tested algorithms. We carried out 20 simulations for each benchmark workflow, with a population/swarm of 100 individuals and a maximum number of iterations fixed to 300.
From these simulations, we plotted the Pareto fronts for the various workflow configurations; in Fig. 12, we selected a set of these Pareto-front plots to be interpreted. Furthermore, the different evaluation metrics are recorded as the average of the values calculated over the 20 simulation runs. Concerning the HV metric, we derive a hypervolume rate HVr for each algorithm with regard to the true Pareto front, where HVr(Algorithm) = HV(Algorithm) ∗ 100 / HV(TrueFront). The HVr, IGD and UR values are collected in Table 4.
Beginning with a general reading of the results, and starting with the UR values, we note that these values are approximately the same for the different algorithms, except for a slight advance of more than 10% for NSGA-II against the other algorithms in some workflow cases, such as CyberShake 1000, Inspiral 1000, Montage 100 and all Sipht workflows. Otherwise, the UR values of the other algorithms are in the same range and remain reasonable.
At this stage, it is worth bearing in mind that we are working in the context of HPC. Our discussion will therefore focus on large-sized workflows (in terms of the number of tasks) and heavy-duty workflows (average task length). Thus, in the remaining analysis, we distinguish two statistical classes: one related to all workflow types, and the other capturing the HPC workflows by focusing on the S3 and S4 workflow sizes.
Based on the HVr values reported in Table 4, we notice that in 13 out of 19 benchmark workflow cases our proposed GMPSO algorithm presents higher HVr rates. Fig. 8 shows how the HVr gain of our algorithm against the comparative algorithms is distributed; this gain is calculated as HVr gain = HVr(GMPSO) − HVr(comparative algorithm). As can be seen, GMPSO outperforms NSGA-II on 75% of the tested workflows, with an HVr gain greater than 16.87%, and greater than 26.95% in the HPC context. Additionally, we recorded a gain greater than 40%, and up to 61%, for 25% of the benchmark workflows. Similarly, OMOPSO is dominated by our GMPSO approach on all benchmark workflows, 25% of which record a gain greater than 24.38%. Further, on 50% of the HPC workflows, GMPSO leads with an HVr gain between 16.74% and 43.67%.
Regarding the SMPSO algorithm, more than 50% of the workflows present non-significant gain values lower than 1.25%, but the maximum observed gain is 31.45%. There is no significant difference on HPC workflows, except an improvement of the minimum observed gain: −0.8% instead of −4.7% in the general context.
Analogous interpretations are revealed by the box plots drawn in Fig. 9 using the IGD gain of our proposed GMPSO algorithm against the comparative algorithms, calculated as IGD gain = IGD(comparative algorithm) / IGD(GMPSO). A considerable gain is achieved, especially in the HPC context, for GMPSO versus NSGA-II and OMOPSO, with values greater than 2.29 and 2.44, respectively, for 75% of the workflow configurations. On the other hand, SMPSO still presents globally competitive performance with regard to our GMPSO algorithm.
Taking a closer look at the results, Fig. 10 shows a comparative analysis of the average HVr and IGD values of the different algorithms for each workflow type separately. For all workflow sizes, as illustrated in Figs. 10(a) and 10(b), we remark that SMPSO and GMPSO outperform the other algorithms (NSGA-II and OMOPSO) for all workflow types on both the HVr and the IGD metric. However, when observing Figs. 10(c) and 10(d), related to the HPC workflows, we can spot that for the Epigenomics workflows the results favor GMPSO and NSGA-II over SMPSO and OMOPSO in terms of both HVr and IGD. Here, we can

Table 4
Evaluation metrics (HVr and UR in %; IGD dimensionless).

Workflow          NSGA-II                OMOPSO                 SMPSO                  GMPSO
                  HVr    IGD    UR      HVr    IGD    UR       HVr    IGD    UR       HVr    IGD    UR
CyberShake 30     81.69  0.122  89.42   94.04  0.084  88.82    95.05  0.067  87.84    95.05  0.067  88.38
CyberShake 50     74.50  0.161  88.46   87.26  0.082  86.23    91.18  0.060  86.8     91.37  0.058  86.45
CyberShake 100    54.02  0.241  87.26   75.59  0.131  79.68    92.02  0.050  83.39    92.33  0.052  83.39
CyberShake 1000   59.35  0.213  87.35   62.57  0.198  69.34    93.98  0.034  76.21    93.30  0.033  77.45
Epigenomics 24    55.89  0.217  90.42   82.71  0.102  89.04    85.23  0.07   92.45    86.87  0.062  91.13
Epigenomics 46    61.40  0.222  88.15   82.69  0.105  87.19    84.12  0.103  90.18    84.15  0.090  89.60
Epigenomics 100   63.73  0.141  93.87   52.83  0.228  92.51    57.18  0.219  92.53    77.22  0.077  94.05
Epigenomics 997   75.32  0.143  88.37   31.37  0.284  90.05    43.59  0.23   90.70    75.04  0.063  87.03
Inspiral 30       59.46  0.235  84.50   88.90  0.088  80.15    91.54  0.071  80.99    91.64  0.073  83.08
Inspiral 50       51.43  0.275  84.1    78.78  0.125  80.92    88.22  0.084  81.06    88.53  0.075  80.4
Inspiral 100      46.52  0.287  83.49   67.40  0.177  74.18    86.69  0.071  76.33    86.55  0.072  76.72
Inspiral 1000     43.27  0.323  80.53   28.19  0.465  67.66    86.79  0.073  68.74    87.13  0.071  68.74
Montage 25        48.91  0.294  83.04   85.06  0.103  80.65    87.14  0.087  83.43    86.86  0.082  82.28
Montage 50        54.72  0.249  85.66   77.74  0.129  76.4     90.85  0.061  80.26    91.05  0.063  79.86
Montage 100       64.88  0.214  87.52   79.69  0.121  71.38    90.57  0.065  75.75    91.83  0.065  75.96
Sipht 30          48.01  0.302  81.95   61.05  0.311  68.22    69.12  0.269  63.68    64.35  0.277  68.5
Sipht 60          40.94  0.373  83.03   61.36  0.303  68.5     77.40  0.224  63.46    81.49  0.179  68.36
Sipht 100         27.33  0.543  84.11   73.38  0.170  72       80.52  0.130  68.42    79.67  0.135  69.09
Sipht 1000        35.01  0.41   81.78   42.45  0.430  67.68    95.04  0.075  63.56    96.28  0.056  64.29
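The HVr normalization used throughout Table 4, and the gain plotted in Fig. 8, are simple ratios; a two-line restatement (our helper names, not the paper's code):

```python
def hv_rate(hv_algo, hv_true_front):
    """HVr(Algorithm) = HV(Algorithm) * 100 / HV(TrueFront)."""
    return hv_algo * 100.0 / hv_true_front

def hvr_gain(hvr_gmpso, hvr_other):
    """Gain distribution of Fig. 8: HVr(GMPSO) - HVr(comparative)."""
    return hvr_gmpso - hvr_other
```

For example, the Montage 100 row of Table 4 gives hvr_gain(91.83, 64.88) ≈ 26.95 against NSGA-II.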

Fig. 8. Box plots for the HVr gain.

Fig. 9. Box plots for the IGD gain.

conclude that SMPSO and GMPSO are competitive for all workflow types, except for the Epigenomics workflows, where GMPSO clearly dominates. Examining the information in Table 2, we observe that Epigenomics 997 represents the most computation-intensive application of the benchmark, with an average task length of 3,866,397 MI. Hence, for the remainder of this section, we study in depth the Epigenomics 997 workflow as the most appropriate example of an HPC application.
Taking the Epigenomics 997 workflow as a typical HPC workflow, we compared the different algorithms by sampling some generated solutions in the objective space, as [makespan, cost] couples. These solution samples are reported in Table 5, split into two sets: low- and high-budget solutions. By analyzing these results together with the algorithms' Runtime values, we clearly detect the dominance of our proposed algorithm over SMPSO and OMOPSO, especially for low-budget solutions, whereas for high-budget solutions the results are close. As a matter of fact, the cheapest solution generated by SMPSO has a Makespan of 3,646,026 s (≈1012 h) at a Cost of 28.25 USD, while GMPSO yields solutions with Makespans in the approximate range of 25 to 34 h at a budget below 0.7 USD. On the other hand, the GMPSO algorithm terminates with only 278 s of extra Runtime compared to SMPSO, which is negligible compared to the considerable gains in Makespan and Cost.
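"Dominance" here is the usual bi-objective Pareto relation on [makespan, cost] couples: no worse in both objectives and strictly better in at least one. A minimal checker (ours, for illustration) applied to two samples from Table 5:

```python
def dominates(a, b):
    """True if a = (makespan, cost) Pareto-dominates b: a is <= b
    in both objectives and strictly < in at least one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))
```

For example, the GMPSO sample (91305 s, 0.7 USD) dominates the SMPSO sample (3646002 s, 28.525 USD), while two identical couples dominate neither way.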

Fig. 10. Comparison of average of HVr and IGD for different workflow types.

Figs. 12(b) and 12(g) show this Pareto dominance of the solutions generated by our proposed approach compared to SMPSO and OMOPSO.

Table 5
The contribution of our proposed approach on HPC applications: Epigenomics 997 as an example. Each cell is a sampled [Makespan (s) / Cost (USD)] solution; the Runtime row gives each algorithm's running time in seconds.

              NSGA-II          OMOPSO             SMPSO              GMPSO
Runtime (s)   1116             299                275                553
Low budget    [83331/17.925]   [3894077/27]       [3645967/28.6]     [85144/12.225]
              [83924/14.425]   [3941625/26.625]   [3646002/28.525]   [91305/0.7]
              [89454/0.075]    [4000351/26.075]   [3646004/28.5]     [108802/0.225]
              [109466/0]       [4000879/25.925]   [3646026/28.25]    [124399/0]
High budget   [53049/95.5]     [39305/86.825]     [37208/80.275]     [38958/78.7]
              [56244/81.725]   [40005/78.525]     [38414/78.625]     [40822/78.15]
              [56560/81.525]   [46659/76.95]      [40894/76.95]      [42098/75.075]
              [61474/65.05]    [48079/76.625]     [41506/76.25]      [42446/72.375]

Another aspect worth studying is the convergence speed of the algorithm toward a stable solution quality. For the same Epigenomics 997 workflow, we traced the evolution of the HV and IGD metrics of the different algorithms as a function of the number of iterations. As shown in Fig. 11(a), the GMPSO curve presents two jumps: first, at iteration 30, GMPSO climbs to an HV value of 0.23 against 0.19 for SMPSO, with NSGA-II ahead at 0.32; second, at iteration 120, GMPSO makes a second jump to 0.34 and eventually reaches HV ≈ 0.43, while SMPSO stays below 0.22. In line with the HV metric, Fig. 11(b) reveals similar interpretations for the IGD-ratio evolution. Thereby, we can confirm that GMPSO starts giving better results than SMPSO and OMOPSO at an early stage (iteration 30) and delivers superior solutions within a reasonable number of iterations (120).

Fig. 11. HV and IGD ratios evolution - Epigenomics 997.

Fig. 12. Results with different workflow types.

6. Conclusion

In summary, we have proposed in this paper a Genetically-Modified Multi-Objective Particle Swarm Optimization as a new model for optimizing workflow scheduling on hybrid Clouds by invoking genetic operations within the MOPSO process. We have designed an appropriate encoding that conveniently represents the scheduling steps: task ordering, resource provisioning and task mapping. The genetic modification of the MOPSO with our designed crossover and mutation operators yields comprehensive gains over the NSGA-II, OMOPSO and SMPSO algorithms, as shown by simulations on well-known scientific workflows. Even though we focus on high-performance workflows, the most significant limitation of our work is that the gain achieved on small workflow sizes is not as significant as on large workflows. In other words, we believe our proposed GMPSO algorithm could

be a springboard for a generalized approach targeting all workflow sizes. Further studies taking real-time scheduling into account will need to be undertaken. Additionally, new objectives and constraints could be added, such as fault tolerance, security and resource utilization.

CRediT authorship contribution statement

Haithem Hafsi: Concept, Design, Analysis, Writing – review & editing. Hamza Gharsellaoui: Concept, Design, Analysis, Writing – review & editing. Sadok Bouamama: Concept, Design, Analysis, Writing – review & editing.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

[1] G. Juve, A. Chervenak, E. Deelman, S. Bharathi, G. Mehta, K. Vahi, Characterizing and profiling scientific workflows, Future Gener. Comput. Syst. 29 (3) (2013) 682–692, http://dx.doi.org/10.1016/j.future.2012.08.015.
[2] M. Shikha, P. Kaur, Scheduling data intensive scientific workflows in cloud environment using nature inspired algorithms, in: B. Hema, S. Mehta, P. Kaur (Eds.), Nature-Inspired Algorithms for Big Data Frameworks, IGI Global, 2019, pp. 196–217, http://dx.doi.org/10.4018/978-1-5225-5852-1.ch008.
[3] I. Foster, The anatomy of the grid: Enabling scalable virtual organizations, in: R. Sakellariou, J. Gurd, L. Freeman, J. Keane (Eds.), Euro-Par 2001 Parallel Processing, Springer Berlin Heidelberg, Berlin, Heidelberg, 2001, pp. 1–4.
[4] M. Kumar, S. Sharma, A. Goel, S. Singh, A comprehensive survey for scheduling techniques in cloud computing, J. Netw. Comput. Appl. 143 (2019) 1–33, http://dx.doi.org/10.1016/j.jnca.2019.06.006.
[5] A. Arunarani, D. Manjula, V. Sugumaran, Task scheduling techniques in cloud computing: A literature survey, Future Gener. Comput. Syst. 91 (2019) 407–415, http://dx.doi.org/10.1016/j.future.2018.09.014.
[6] M. Adhikari, T. Amgoth, S.N. Srirama, A survey on scheduling strategies for workflows in cloud environment and emerging trends, ACM Comput. Surv. 52 (4) (2019), http://dx.doi.org/10.1145/3325097.
[7] F. Zhang, J. Cao, K. Hwang, C. Wu, Ordinal optimized scheduling of scientific workflows in elastic compute clouds, in: 2011 IEEE Third International Conference on Cloud Computing Technology and Science, 2011, pp. 9–17, http://dx.doi.org/10.1109/CloudCom.2011.12.
[8] M. Lavanya, B. Shanthi, S. Saravanan, Multi objective task scheduling algorithm based on SLA and processing time suitable for cloud environment, Comput. Commun. 151 (2020) 183–195, http://dx.doi.org/10.1016/j.comcom.2019.12.050.
[9] S. Srichandan, T. Ashok Kumar, S. Bibhudatta, Task scheduling for cloud computing using multi-objective hybrid bacteria foraging algorithm, Future Comput. Inform. J. 3 (2) (2018) 210–230, http://dx.doi.org/10.1016/j.fcij.2018.03.004.
[10] Y. Abdi, M.-R. Feizi-Derakhshi, Hybrid multi-objective evolutionary algorithm based on search manager framework for big data optimization problems, Appl. Soft Comput. 87 (2020) 105991, http://dx.doi.org/10.1016/
[15] N. Nedjah, L.d.M. Mourelle, Evolutionary multi-objective optimisation: A survey, Int. J. Bio-Inspired Comput. 7 (1) (2015) 1–25, http://dx.doi.org/10.1504/IJBIC.2015.067991.
[16] K. Deb, A. Pratap, S. Agarwal, T. Meyarivan, A fast and elitist multiobjective genetic algorithm: NSGA-II, IEEE Trans. Evol. Comput. 6 (2) (2002) 182–197, http://dx.doi.org/10.1109/4235.996017.
[17] C.A.C. Coello, G.T. Pulido, M.S. Lechuga, Handling multiple objectives with particle swarm optimization, IEEE Trans. Evol. Comput. 8 (3) (2004) 256–279, http://dx.doi.org/10.1109/TEVC.2004.826067.
[18] Z. Ahmad, A.I. Jehangiri, M.A. Ala'anzy, M. Othman, R. Latip, S.K.U. Zaman, A.I. Umar, Scientific workflows management and scheduling in cloud computing: Taxonomy, prospects, and challenges, IEEE Access 9 (2021) 53491–53508, http://dx.doi.org/10.1109/ACCESS.2021.3070785.
[19] R. Wankar, Grid computing with globus: An overview and research challenges, Int. J. Comput. Sci. Appl. (2008).
[20] globus, https://www.globus.org/.
[21] E. Laure, S. Fisher, A. Frohner, C. Grandi, P. Kunszt, A. Krenek, O. Mulmo, F. Pacini, F. Prelz, J. White, Programming the grid with glite, Comput. Methods Sci. Technol. 12 (2006) 33–45.
[22] Berkeley open infrastructure for network computing, https://boinc.berkeley.edu/.
[23] G. Kousalya, P.R.C., Workflow scheduling algorithms and approaches, in: Automated Workflow Scheduling in Self-Adaptive Clouds, Computer Communications and Networks, Springer, Cham, 2017, http://dx.doi.org/10.1007/978-3-319-56982-6_4.
[24] M. Masdari, S. ValiKardan, Z. Shahi, S.I. Azar, Towards workflow scheduling in cloud computing: A comprehensive analysis, J. Netw. Comput. Appl. 66 (2016) 64–82, http://dx.doi.org/10.1016/j.jnca.2016.01.018.
[25] S. Smanchat, K. Viriyapant, Taxonomies of workflow scheduling problem and techniques in the cloud, Future Gener. Comput. Syst. 52 (2015) 1–12, http://dx.doi.org/10.1016/j.future.2015.04.019.
[26] F. Wu, Q. Wu, Y. Tan, Workflow scheduling in cloud: a survey, J. Supercomput. 71 (9) (2015) 3373–3418, http://dx.doi.org/10.1007/s11227-015-1438-4.
[27] M. Kalra, S. Singh, A review of metaheuristic scheduling techniques in cloud computing, Egypt. Inform. J. 16 (3) (2015) 275–295, http://dx.doi.org/10.1016/j.eij.2015.07.001.
[28] Poonam, M. Dutta, N. Aggarwal, Meta-heuristics based approach for workflow scheduling in cloud computing: A survey, in: S.S. Dash, M.A. Bhaskar, B.K. Panigrahi, S. Das (Eds.), Artificial Intelligence and Evolutionary Computations in Engineering Systems, Springer India, New Delhi, 2016, pp. 1331–1345.
[29] Z.-H. Zhan, X.-F. Liu, Y.-J. Gong, J. Zhang, H.S.-H. Chung, Y. Li, Cloud computing resource scheduling and a survey of its evolutionary approaches, ACM Comput. Surv. 47 (4) (2015) 63:1–63:33, http://dx.doi.org/10.1145/2788397.
[30] M. Adhikari, T. Amgoth, S.N. Srirama, Multi-objective scheduling strategy for scientific workflows in cloud environment: A firefly-based approach, Appl. Soft Comput. 93 (2020) 106411, http://dx.doi.org/10.1016/j.asoc.2020.106411.
[31] B.H. Abed-alguni, N.A. Alawad, Distributed grey wolf optimizer for scheduling of workflow applications in cloud environments, Appl. Soft Comput. 102 (2021) 107113, http://dx.doi.org/10.1016/j.asoc.2021.107113.
[32] M. Adhikari, T. Amgoth, An intelligent water drops-based workflow scheduling for IaaS cloud, Appl. Soft Comput. 77 (2019) 547–566, http://dx.doi.org/10.1016/j.asoc.2019.02.004.
[33] R. Garg, A.K. Singh, Adaptive workflow scheduling in grid computing based on dynamic resource availability, Eng. Sci. Technol. Int. J. 18 (2) (2015) 256–269, http://dx.doi.org/10.1016/j.jestch.2015.01.001.
[34] H. Arabnejad, J.G. Barbosa, A budget constrained scheduling algorithm for workflow applications, J. Grid Comput. 12 (4) (2014) 665–679, http:
j.asoc.2019.105991. //dx.doi.org/10.1007/s10723-014-9294-7.
[11] T. Chugh, K. Sindhya, J. Hakanen, K. Miettinen, A survey on handling [35] L. Zeng, B. Veeravalli, X. Li, ScaleStar: Budget conscious scheduling
computationally expensive multiobjective optimization problems with precedence-constrained many-task workflow applications in cloud, in:
evolutionary algorithms, Soft Comput. 23 (2019) 3137–3166, http://dx.doi. 2012 IEEE 26th International Conference on Advanced Information Net-
org/10.1007/s00500-017-2965-0. working and Applications, 2012, pp. 534–541, http://dx.doi.org/10.1109/
[12] J.-R. Jian, Z.-H. Zhan, J. Zhang, Large-scale evolutionary optimization: a AINA.2012.12.
survey and experimental comparative study, Int. J. Mach. Learn. Cybern. [36] S. Ding, C. Chen, B. Xin, P.M. Pardalos, A bi-objective load balancing model
11 (2020) 729–745, http://dx.doi.org/10.1007/s13042-019-01030-4. in a distributed simulation system using NSGA-II and MOPSO approaches,
[13] A. Zhou, B.-Y. Qu, H. Li, S.-Z. Zhao, P.N. Suganthan, Q. Zhang, Multiobjective Appl. Soft Comput. 63 (2018) 249–267, http://dx.doi.org/10.1016/j.asoc.
evolutionary algorithms: A survey of the state of the art, Swarm Evol. 2017.09.012.
Comput. 1 (1) (2011) 32–49, http://dx.doi.org/10.1016/j.swevo.2011.03.001. [37] A. Verma, S. Kaushal, A hybrid multi-objective particle swarm optimization
[14] D. Molina, J. Poyatos, J.D. Ser, S. García, A. Hussain, F. Herrera, Compre- for scientific workflow scheduling, Parallel Comput. 62 (2017) 1–19, http:
hensive taxonomies of nature- and bio-inspired optimization: Inspiration //dx.doi.org/10.1016/j.parco.2017.01.002.
versus algorithmic behavior, critical analysis recommendations, Cogn. [38] A. Verma, S. Kaushal, Cost-time efficient scheduling plan for executing
Comput. 12 (2020) 897–939, http://dx.doi.org/10.1007/s12559-020-09730- workflows in the cloud, J Grid Comput. 13 (4) (2015) 495–506, http:
8. //dx.doi.org/10.1007/s10723-015-9344-9.
H. Hafsi, H. Gharsellaoui and S. Bouamama Applied Soft Computing 122 (2022) 108791
[39] G.-s. Yao, Y.-s. Ding, K.-r. Hao, Multi-objective workflow scheduling in cloud system based on cooperative multi-swarm optimization algorithm, J. Central South Univ. 24 (5) (2017) 1050–1062, http://dx.doi.org/10.1007/s11771-017-3508-7.
[40] J.J. Durillo, H.M. Fard, R. Prodan, MOHEFT: A multi-objective list-based method for workflow scheduling, in: 4th IEEE International Conference on Cloud Computing Technology and Science Proceedings, 2012, pp. 185–192, http://dx.doi.org/10.1109/CloudCom.2012.6427573.
[41] H.Y. Shishido, J.C. Estrella, C.F.M. Toledo, M.S. Arantes, Genetic-based algorithms applied to a workflow scheduling algorithm with security and deadline constraints in clouds, Comput. Electr. Eng. 69 (2018) 378–394, http://dx.doi.org/10.1016/j.compeleceng.2017.12.004.
[42] Z. Zhu, G. Zhang, M. Li, X. Liu, Evolutionary multi-objective workflow scheduling in cloud, IEEE Trans. Parallel Distrib. Syst. 27 (5) (2016) 1344–1357, http://dx.doi.org/10.1109/TPDS.2015.2446459.
[43] M.T. Younis, S. Yang, Hybrid meta-heuristic algorithms for independent job scheduling in grid computing, Appl. Soft Comput. 72 (2018) 498–517, http://dx.doi.org/10.1016/j.asoc.2018.05.032.
[44] A.M. Manasrah, H. Ba Ali, Workflow scheduling using hybrid GA-PSO algorithm in cloud computing, Wirel. Commun. Mob. Comput. 2018 (2018), http://dx.doi.org/10.1155/2018/1934784.
[45] A. Verma, S. Kaushal, A hybrid multi-objective particle swarm optimization for scientific workflow scheduling, Parallel Comput. 62 (2017) 1–19, http://dx.doi.org/10.1016/j.parco.2017.01.002.
[46] L.F. Bittencourt, E.R.M. Madeira, HCOC: a cost optimization algorithm for workflow scheduling in hybrid clouds, J. Internet Serv. Appl. 2 (3) (2011) 207–227, http://dx.doi.org/10.1007/s13174-011-0032-0.
[47] Y. Chang, C.-T. Fan, R.-K. Sheu, S.-R. Jhu, S. Yuan, An agent-based workflow scheduling mechanism with deadline constraint on hybrid cloud environment, Int. J. Commun. Syst. 31 (2018).
[48] Y. Liu, L. Wang, X.V. Wang, X. Xu, L. Zhang, Scheduling in cloud manufacturing: state-of-the-art and research challenges, Int. J. Prod. Res. 57 (15–16) (2019) 4854–4879, http://dx.doi.org/10.1080/00207543.2018.1449978.
[49] J.J. Durillo, J. García-Nieto, A.J. Nebro, C.A.C. Coello, F. Luna, E. Alba, Multi-objective particle swarm optimizers: An experimental comparison, in: M. Ehrgott, C.M. Fonseca, X. Gandibleux, J.-K. Hao, M. Sevaux (Eds.), Evolutionary Multi-Criterion Optimization, Springer Berlin Heidelberg, Berlin, Heidelberg, 2009, pp. 495–509.
[50] A.J. Nebro, J.J. Durillo, J. Garcia-Nieto, C.A. Coello Coello, F. Luna, E. Alba, SMPSO: A new PSO-based metaheuristic for multi-objective optimization, in: 2009 IEEE Symposium on Computational Intelligence in Multi-Criteria Decision-Making (MCDM), 2009, pp. 66–73, http://dx.doi.org/10.1109/MCDM.2009.4938830.
[51] W. Chen, E. Deelman, WorkflowSim: A toolkit for simulating scientific workflows in distributed environments, in: 2012 IEEE 8th International Conference on E-Science, 2012, pp. 1–8, http://dx.doi.org/10.1109/eScience.2012.6404430.
[52] R.N. Calheiros, R. Ranjan, A. Beloglazov, C.A.F. De Rose, R. Buyya, CloudSim: A toolkit for modeling and simulation of cloud computing environments and evaluation of resource provisioning algorithms, Softw. Pract. Exper. 41 (1) (2011) 23–50, http://dx.doi.org/10.1002/spe.995.
[53] E. Deelman, K. Vahi, G. Juve, M. Rynge, S. Callaghan, P.J. Maechling, R. Mayani, W. Chen, R. Ferreira da Silva, M. Livny, K. Wenger, Pegasus, a workflow management system for science automation, Future Gener. Comput. Syst. 46 (C) (2015) 17–35, http://dx.doi.org/10.1016/j.future.2014.10.008.
[54] C. Audet, J. Bigeon, D. Cartier, S. Le Digabel, L. Salomon, Performance indicators in multiobjective optimization, European J. Oper. Res. (2020), http://dx.doi.org/10.1016/j.ejor.2020.11.016.