
2014 16th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing

Early Prediction of the Cost of HPC Application Execution in the Cloud

Massimiliano Rak (Department of Industrial and Information Engineering, Second University of Naples, Aversa, Italy)
Mauro Turtur, Umberto Villano (Department of Engineering, University of Sannio, Benevento, Italy)
[email protected]; [email protected], [email protected]

Abstract—Even if clouds are not fit for high-end HPC applications, they could be profitably used to bring the power of economic and scalable parallel computing to the masses. But this requires both simple development environments, able to exploit cloud scalability, and the capability to easily predict the cost of HPC application runs.

This paper presents a framework built on top of a cloud-aware programming platform (mOSAIC) for the development of bag-of-tasks scientific applications. The framework integrates a cloud-based simulation environment able to predict the behavior of the developed applications. Simulations enable the developer to predict, at an early development stage, performance and cloud resource usage, and so the infrastructure lease cost on a public cloud.

The paper sketches the framework organization and discusses the approach followed for application development. Moreover, some validation tests of prediction results are presented.

I. INTRODUCTION

At least in theory, clouds could be profitably used to bring the power of economic and scalable parallel computing to the masses. The main obstacles to this process are the substantial differences between the "traditional" and the cloud-based paradigms, and the lack of adequate development tools to support the porting of legacy applications to the cloud.

Furthermore, users/developers of scientific codes are not prone to tolerate the moderate performance losses due to the systematic use of virtualization and, above all, to the use in cloud data centers of networks designed mainly for scalability, not for performance [1]. The high variance of response times due to multi-tenancy and to loads hidden from the user's view and control, along with ever-possible transient failures of the cloud infrastructure, does the rest. As a matter of fact, cloud computing is inherently unfit for high-end scientific applications, which in the near future are likely to still be executed on purpose-designed, dedicated HPC systems.

However, there is a wide range of applications widely used in science, engineering and for commercial purposes that have highly variable response times, are moderately CPU-intensive, are not immediately suitable for GPU computing, and are made up of loosely coupled tasks, so that computation easily dwarfs communication time. We think that this class of "para-scientific" applications is an almost ideal candidate for execution in the cloud. The major advantage is economic: the cost of leasing a small set of virtual cores can be very low, especially if there are relaxed time constraints for obtaining the results. A wise choice among provider offerings often makes it possible to acquire the needed computing resources at very low cost (see, for example, the EC2 Spot Instances offer [2]). This enables any organization to run parallel code whenever needed, at low cost, without investing capital in rapidly obsolescing parallel hardware. The second important issue is cloud elasticity, which allows the number of virtual cores to be scaled in/out on the fly (i.e., while the application is running), based on the particular job requirements, paying only for the cloud resources actually used. In other words, cloud computing is also a great opportunity for everyone to experiment with and exploit parallel computing at low cost, under a comfortable pay-as-you-go model.

We think that the final step toward making clouds fully advantageous for sporadic scientific users is providing simple tools to predict the performance behavior of their applications, allowing them to trade off performance against leasing costs. In a previous paper [3] we proposed the use of a cloud-enabled programming platform. This platform makes it possible to develop cloud applications on top of a cloud-aware programming framework (mOSAIC [4], [5]) by exploiting the bag-of-tasks programming paradigm. The bag-of-tasks (BOT) paradigm, also known as master-worker or processor farm, is widely understood and ubiquitous in small- and medium-scale scientific computing.

Moreover, in the past we have worked on the performance prediction of cloud applications developed on top of the mOSAIC framework [6]. Already-available tools enable us to predict the performance of a cloud application without running it on (paid) cloud resources. In this paper, we discuss the enrichment of the bag-of-tasks framework with performance prediction capabilities, allowing the automatic generation of the application simulation models.

The remainder of this paper is structured as follows. The next section examines related work. Section III illustrates the rationale and the architecture of the framework we have implemented for the development of bag-of-tasks applications in the cloud. Section IV presents our approach to early performance prediction, and Section V shows some of our validation tests. The paper closes with our conclusions and plans for future research.

978-1-4799-8448-0/15 $31.00 © 2015 IEEE
DOI 10.1109/SYNASC.2014.61
II. RELATED WORK

A. HPC in the Cloud

For the reasons mentioned in the introduction, the literature on the use of clouds to execute scientific applications is not very extensive. The potential of clouds for scientific computing, linked to economy and on-demand provisioning, is discussed in [7]. The performance disadvantages of clouds for scientific computing workloads are presented in [8].

In [9], the applicability of cloud platforms, and in particular of Microsoft Azure, to scientific computing is studied by implementing a well-known bioinformatics algorithm (BLAST). An implementation of BOT similar to the one presented in this paper, although with a few significant differences, is presented in [10]. A Java framework for the development of fault-tolerant applications is proposed in [11].

A few papers discuss how to exploit the intrinsic elasticity of clouds, i.e., the ability to increase or decrease the amount of computing resources used for application execution. In [12], the authors present Cloudine, a platform for the development of generic scientific applications able to fully exploit cloud elasticity. The paper [13] tackles the problem of adding elasticity to existing MPI codes. This is obtained by terminating the execution and restarting the program on a different amount of resources, scaling up/down the number of computing nodes used. The execution of MPI codes over a cloud-aware communication library is discussed in [14], where CMPI, a novel MPI library based on the cloud-oriented optimization proposed in [15], is presented.

B. Simulation and Performance Prediction

The core of our proposal is the use of a simulation-based approach for application performance prediction. The use of simulation in the HPC context is widely discussed in a number of papers, such as [16], [17]. Recent efforts in this field are documented in [18] and [19].

For Component-Based Software Systems (CBSS), most approaches available in the literature focus on integrating performance prediction and evaluation techniques at design time, i.e., when the system implementation is not available. An exception, which has some similarities with our approach, is the COMPAS system [20]. The paper [21] presents a complete survey of performance strategies for CBSS.

CloudCMP [22] is one of the few examples of simulation-based performance prediction of cloud applications, where the goal is to compare different cloud providers and to select a suitable one. Another notable work in this area is CloudSim [23], which targets the simulation of the entire stack of software/hardware components in a cloud infrastructure.

III. THE FRAMEWORK FOR SCIENCE APPLICATIONS

The solution we proposed in [3] for BOT cloud application development requires several assumptions that span the various steps of the scientific application life-cycle:

• Development: the developer of the application provides only the basic sequential code blocks implementing the chosen algorithm. He should not care about communication/synchronization details, but only take into account how data are organized and elaborated.

• Deployment: the developer/user should be able to start the application over the cloud, choosing the amount of resources to be used and possibly scaling them up/down dynamically at run time. Fault tolerance is guaranteed by the development framework, and is completely hidden at code level.

• Execution: the developer/user submits multiple jobs to the application, which always performs the same actions over different data.

Our BOT development framework implements the simple and common split-work-merge solution pattern. A problem to be solved is split into sub-problems (tasks) and handed out to task solvers (workers), whose partial results are finally merged (see Figure 1). The resulting workflow could be applied to map-reduce [24] as well. In fact, both patterns involve a split-work-merge sequence. The difference is in the timing, as the workers in a bag-of-tasks are not constrained to proceed in lock-step, and can work on sub-jobs asynchronously. Bag-of-tasks applications can be developed by extending the above-described components with problem-specific algorithms. The details of the development framework are presented in [3].

The BOT development framework provides all the needed components (splitter, merger, worker and orchestrator) and an API that can be used to integrate the user-supplied application code. In fact, all the supplied components are mOSAIC components. mOSAIC is a cloudware that builds up a Platform-as-a-Service on top of computing resources leased in Infrastructure-as-a-Service mode from a single or even multiple cloud providers [4]. The mOSAIC platform offers an easy way to package software components and to deploy them automatically in the cloud. Through the platform interface it is possible to deploy multiple instances of the same component and to restart them in the case of a failure.

mOSAIC offers a set of already-developed basic components, which can be easily extended by the developer (as we have done in the case of the BOT framework). Among the standard components, Queues and KVstores play the most important roles.

The mOSAIC Queue component is a customized version of the RabbitMQ queue server [25]. It is a software component that offers an API to create message queues, which can be used by applications to communicate with each other. After a queue has been created, all connected applications can send messages through it. All applications registered as consumers will be able to receive the messages.

The KVstore component is based on Riak [26]. It offers a persistent NoSQL storage service to cloud applications. Computation and communication components can store data in the shared KVstore components, and retrieve them by key. For example, the HTTPgw can use the KVstore to store HTTP messages that have a large body; "computing" mOSAIC components (cloudlets) can store in a KVstore the results of their elaboration, in order to make them accessible to the external interface.

Fig. 1. Architecture of the BOT framework

IV. PERFORMANCE PREDICTION OF BAG-OF-TASKS APPLICATIONS

In the previous section we have described our bag-of-tasks framework for the development of scientific applications in the cloud. A key point is that such applications are fully defined by the number of instances of the mOSAIC components mentioned above and by their interconnections. Moving from these premises, we have devised a performance model of the application that can be used to predict its behavior and to tune its performance.

Our performance model is process-based [27], [28], [29], in that it is described through a set of discrete-event simulation components whose temporal behavior is described as a process. Event management and discrete-event actions/reactions are modeled in terms of process synchronization primitives. The simulated components have been developed by exploiting the JADES simulation library [29], which allows the description of process-oriented simulations in Java.

A noteworthy feature of the solution devised is that the simulation models expressed through the JADES library can be easily evaluated through the mJADES platform [30]. mJADES is a recently-developed system that supports the distribution of multiple JADES simulations on cloud resources. The mJADES simulation system is based on a Java-based modular architecture. The mJADES simulation manager produces simulation tasks from simulation jobs, and schedules them to be executed concurrently on multiple instances of the simulation core, a process-oriented discrete-event simulation engine based on the JADES simulation library. The outputs from the runs are handed on to a simulation analyzer, whose task is to compute aggregates and to generate reports for the final user. mJADES has been developed as a collection of mOSAIC components, and so the evaluation of the models can be performed on any mOSAIC platform, whether it is the one used by the application under study or a different one.

A. Bag-of-tasks Simulation Model

To model a bag-of-tasks application, we developed a simulation component for each of the core components making up the bag-of-tasks framework (Splitter, Worker and Merger), and for each component devoted to managing the cloud application execution (HTTPgw, Orchestrator), communication and storage (Queue and KVstore).

The management components of the bag-of-tasks framework (HTTPgw and Orchestrator) have the role of managing the cloud application execution, offering an interface to end users (HTTPgw) and orchestrating the execution, forwarding the messages to the right splitter when multiple bags-of-tasks are executed on the same resources and starting/stopping the triple Splitter-Worker-Merger.

Our simulation model is driven by a workload generator, which is in charge of generating the sequence of requests the users issue in time. At the start of the simulation, it begins sending messages to the Orchestrator according to the chosen workload, described in a configuration file. Currently two simple workload models are available: a set of requests with fixed inter-arrival time, and a Poisson arrival model that generates messages with exponentially distributed random inter-arrival times. Moreover, it is also possible to start multiple concurrent workloads, mimicking the load generated by multiple users.

The Orchestrator model is fairly straightforward: it continuously receives Jobs from a queue (that mimics an HTTP channel) and forwards them to the Splitter. At present our model is very simple, since it is based on the assumption of a fixed application configuration, which cannot be dynamically altered (i.e., we cannot start a new set of Split-Work-Merge components during the simulation). So the Orchestrator has just the role of routing the messages to the Splitter instances. We aim at improving the Orchestrator model in the future.

The Splitter, Merger and Worker models behave in a similar way, receiving and forwarding messages from/to the internal queues according to the described bag-of-tasks pattern. The Merger process, after collecting all the intermediate job Result messages, sends a job Result message to a special simulator component, the report generator, which gathers the results and produces the final simulation reports.

The computational resources consumed by our core components (i.e., Orchestrator, Splitter, Merger and Worker processes) are taken into account by means of a component simulating CPU resource sharing.
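The two workload models supported by the workload generator (fixed inter-arrival times and Poisson arrivals) reduce to the following generation logic. This is a stand-alone illustration, not the simulator's code; exponential inter-arrival times are drawn by inverse-CDF sampling:

```java
import java.util.Random;

// Sketch of the two workload models: fixed inter-arrival times,
// and Poisson arrivals (i.i.d. exponential inter-arrival times).
public class WorkloadGenerator {

    // Fixed model: every request is sent exactly `period` time units apart
    public static double fixedInterArrival(double period) {
        return period;
    }

    // Poisson model: inter-arrival ~ Exp(rate), sampled by inverting
    // the exponential CDF: t = -ln(1 - U) / rate, with U uniform in [0,1)
    public static double exponentialInterArrival(double rate, Random rng) {
        return -Math.log(1.0 - rng.nextDouble()) / rate;
    }

    public static void main(String[] args) {
        Random rng = new Random(42);
        double t = 0.0;
        // arrival times of the first 5 requests of a Poisson(2.0) workload
        for (int i = 0; i < 5; i++) {
            t += exponentialInterArrival(2.0, rng);
            System.out.printf("request %d at t=%.3f%n", i, t);
        }
    }
}
```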

Listing 1. The Queue Simulation Process

public class Queue extends it.unisannio.ing.perflab.jades.core.Process {

    private Mailbox inputMailbox;
    private Mailbox outputMailbox;

    // iLDS model coefficients: fixed cost (beta0), cost per queued
    // message (beta1) and cost per message byte (beta2)
    private double beta0, beta1, beta2;

    public Queue(String name, double beta0, double beta1, double beta2) {
        super(name);
        this.beta0 = beta0;
        this.beta1 = beta1;
        this.beta2 = beta2;
        inputMailbox = new Mailbox(name + "inputMailbox");
        outputMailbox = new Mailbox(name + "outputMailbox");
    }

    public void send(Object m) {
        inputMailbox.send(m);
    }

    public void run() {
        while (true) {
            int msgSize = (Integer) inputMailbox.receive();
            int msgInQueue = inputMailbox.msg_in_queue() + 1;
            // simulated notification delay (iLDS model)
            hold(beta2 * msgSize + beta1 * msgInQueue + beta0);
            outputMailbox.send(msgSize);
        }
    }

    public Object receive() {
        return outputMailbox.receive();
    }
}
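For a concrete feel of the model evaluated by the hold() call in Listing 1, the helper below (our own illustrative class, not part of the framework) computes the same response-time formula with the BOT QUEUES parameters measured on our testbed and reported in Table II (beta0=65, beta1=0.003, beta2=0.001; units as in that table):

```java
// Response time of a simulated queue per the iLDS model of Listing 1:
// t = beta2 * messageSize + beta1 * messagesInQueue + beta0
public class QueueModel {

    public final double beta0, beta1, beta2;

    public QueueModel(double beta0, double beta1, double beta2) {
        this.beta0 = beta0;
        this.beta1 = beta1;
        this.beta2 = beta2;
    }

    public double responseTime(int msgSize, int msgInQueue) {
        return beta2 * msgSize + beta1 * msgInQueue + beta0;
    }

    public static void main(String[] args) {
        // BOT QUEUES parameters from Table II
        QueueModel botQueue = new QueueModel(65, 0.003, 0.001);
        // delay to notify a 400 B Task message with 10 messages queued
        System.out.println(botQueue.responseTime(400, 10)); // ~= 65.43
    }
}
```

Note that the fixed cost beta0 dominates for the small messages used by the framework, which is why task granularity matters so much in the validation tests of Section V.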

At simulation start-up, it is necessary to provide the actual number of available virtual CPUs (vCPUs) and the allocation to vCPUs of the framework components involved in a run.

The response time of the queues (i.e., the time needed to notify a message to a process, once it has been published on the queue) is modeled as a function of the number of queued messages and of the size of the messages (according to the iLDS model [31]). Listing 1 shows the code of the process simulating the behavior of the communication queues.

Figure 2 sketches the proposed simulation model. It should be noted how closely it resembles the structure of the bag-of-tasks application.

As previously pointed out, even if the actual behavior of the framework strictly depends on the specific algorithm to be implemented, in any case the core bag-of-tasks components receive messages from queues and forward them to other queues, consuming a suitable amount of CPU time. Starting from this consideration, we can fully describe a bag-of-tasks instance by means of two sets of parameters.

The application instance parameters represent the values under the control of the framework user. They describe the application and the BOT framework configuration to be simulated, as follows:

• virtual CPUs (vCPU N): vCPUs available to the mOSAIC platform;
• vCPU time slice (vCPU SLICE): vCPU scheduler preemption time;
• workers (WORKERS): number of Worker instances;
• splitters (SPLITTERS): number of Splitter instances;
• jobs (JOBS): number of jobs generated and submitted;
• send job frequency (SEND FREQ): job send rate;
• tasks (TASKS): number of tasks generated by the Splitter;
• worker overhead (WORK TIME): estimate of the vCPU time required by a Worker to process a Task;
• allocation map: allocation matrix describing the allocation of the framework processes to the available vCPUs.

The framework tuning parameters are used to model a specific framework instance, taking into account the overhead introduced by the framework itself, by the platform and by any underlying software layer. After they have been estimated for a framework instance, they do not vary between different simulation runs. In the following section we will describe a methodology for the evaluation of such values. The parameters are:

• HTTP overhead (HTTP OH): the communication delay introduced by an HTTP communication channel;
• Orchestrator overhead (ORCH OH): the orchestrator overhead (vCPU time) introduced to process each submitted job and to forward the descriptor to the Splitter;
• Splitter overhead (SPLIT OH): overhead (vCPU time) introduced to execute a single split operation, to create and to forward a Task;
• Merger overhead (MERGE OH): overhead (vCPU time) introduced to execute a single merge operation;
• Job Descriptor message (JOB MSG SIZE): size of the Job Descriptor messages;

• Result message (RESULT MSG SIZE): size of the
Result messages;
• Task message (TASK MSG SIZE): size of the Task
messages;
• HTTP channel: beta0, beta1, beta2 parameters (see
Listing 1) to model the HTTP channel;
• bag-of-tasks queues: beta0, beta1, beta2 parameters
to model the framework internal queues.
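Taken together, the two parameter sets already permit a crude back-of-envelope estimate of a job's completion time. The formula below is our own simplification — it ignores queueing and communication delays, which are precisely what the simulation adds — and uses illustrative values (250 tasks of 15 ms on 4 workers, with the overheads later reported in Table II):

```java
// Crude analytical estimate of one job's completion time:
// HTTP and orchestrator overheads, serial split and merge work per
// task, and the worker phase parallelized over WORKERS instances.
// Queueing and contention are deliberately ignored; the simulator
// exists precisely to account for them.
public class JobEstimate {

    public static long estimateMillis(int tasks, int workers,
                                      long httpOh, long orchOh,
                                      long splitOh, long workTime,
                                      long mergeOh) {
        long workPhase = ((tasks + workers - 1) / workers) * workTime; // ceil
        return httpOh + orchOh + tasks * splitOh + workPhase + tasks * mergeOh;
    }

    public static void main(String[] args) {
        // 250 tasks, 4 workers, HTTP OH=50, ORCH OH=10,
        // SPLIT OH=5, WORK TIME=15, MERGE OH=5 (all in ms)
        System.out.println(estimateMillis(250, 4, 50, 10, 5, 15, 5) + " ms");
    }
}
```

Such a static estimate gives only a lower bound on the framework overhead; the gap between it and the simulated (and measured) times is dominated by the queue delays modeled in Listing 1.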

B. Tuning of Algorithm-dependent Components


The simulation model presented above is completely independent of the algorithm implemented in the bag-of-tasks application. The specific characteristics of the application affect the model only as far as the number and size of the messages the Splitter generates, the Worker(s) elaborate and the Merger joins together are concerned. Moreover, they affect the amount of CPU time required by each of the above components to perform its own actions.

The first set of information (number and size of the messages generated) usually depends only on the problem data size, and so is known beforehand, when the simulation takes place. On the other hand, the CPU time needed for each elaboration has to be estimated. Moreover, the times spent in the other bag-of-tasks framework components (queues, storages, orchestrator) need to be evaluated taking into account the type and number of resources leased from the cloud to execute the application.

In order to estimate the time behavior of the above-mentioned components, we use an approach based on the execution of ad-hoc benchmarks, which run using the benchmarking framework integrated in the mOSAIC API [32]. The methodology used is thoroughly dealt with in [6]. Following this approach, we develop a dedicated benchmark application for each specific component of the application; the mOSAIC framework automates the process of their execution. Benchmarks for the basic components, namely queue servers and KV stores, are offered by the mOSAIC framework out-of-the-box. The corresponding benchmarking micro-applications are organized as shown in Figure 3. We have developed similar benchmarking applications for the core components of our bag-of-tasks framework (Splitter, Worker, Merger and Orchestrator). The execution of such applications, automated by the framework, makes it possible to collect response times in a single CSV file, from which, using regression models, it is possible to find the values of the timing parameters to be used for simulation.

Fig. 3. The Queue Benchmark mOSAIC Application
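The regression step just mentioned can be illustrated with an ordinary least-squares fit of the linear queue model t = beta0 + beta1*q + beta2*s to benchmark samples (q = queued messages, s = message size). The sketch below is self-contained and uses synthetic, noise-free data; it is an illustration of the technique, not the framework's actual tooling:

```java
// Fit t = b0 + b1*q + b2*s by solving the 3x3 normal equations
// (X^T X) b = X^T t with Gaussian elimination.
public class BetaFit {

    public static double[] fit(double[] q, double[] s, double[] t) {
        double[][] A = new double[3][4]; // augmented normal-equation system
        for (int i = 0; i < q.length; i++) {
            double[] x = { 1.0, q[i], s[i] };
            for (int r = 0; r < 3; r++) {
                for (int c = 0; c < 3; c++) A[r][c] += x[r] * x[c];
                A[r][3] += x[r] * t[i];
            }
        }
        // Gaussian elimination with partial pivoting
        for (int p = 0; p < 3; p++) {
            int best = p;
            for (int r = p + 1; r < 3; r++)
                if (Math.abs(A[r][p]) > Math.abs(A[best][p])) best = r;
            double[] tmp = A[p]; A[p] = A[best]; A[best] = tmp;
            for (int r = p + 1; r < 3; r++) {
                double f = A[r][p] / A[p][p];
                for (int c = p; c < 4; c++) A[r][c] -= f * A[p][c];
            }
        }
        // back substitution
        double[] b = new double[3];
        for (int p = 2; p >= 0; p--) {
            b[p] = A[p][3];
            for (int c = p + 1; c < 3; c++) b[p] -= A[p][c] * b[c];
            b[p] /= A[p][p];
        }
        return b; // { beta0, beta1, beta2 }
    }

    public static void main(String[] args) {
        // synthetic benchmark samples generated from beta = (65, 0.003, 0.001)
        int n = 100;
        double[] q = new double[n], s = new double[n], t = new double[n];
        for (int i = 0; i < n; i++) {
            q[i] = 1 + i % 10;          // queued messages
            s[i] = 100 * (1 + i / 10);  // message size in bytes
            t[i] = 65 + 0.003 * q[i] + 0.001 * s[i];
        }
        double[] b = fit(q, s, t);
        System.out.printf("beta0=%.3f beta1=%.4f beta2=%.4f%n", b[0], b[1], b[2]);
    }
}
```

On real benchmark data the samples are noisy, so the fitted betas carry residual error that propagates into the simulation; this is one source of the prediction errors discussed in Section V.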
C. Early Prediction: Bag-of-tasks Development Methodology
Program performance simulation is an interesting matter, but it becomes a fundamental technique if it can be adopted at the very early development stages, before the complete development and deployment of the actual application. We believe that performance prediction can be integrated in the BOT application life cycle by adopting a development methodology made up of the following steps:

1) Identify the Split/Work/Merge algorithms: the developer has to rethink his algorithm in order to identify the roles of the core bag-of-tasks components. This is not a complex task, because the BOT paradigm is usually close to the behavior of most common applications. In this phase the developer starts writing and rethinking his own code.

2) Prepare the set of data to be compared: concurrently with algorithm development, the developer identifies the way in which the application will be used in the future, i.e., how many requests will be served and how the work will be split among workers. This activity can be conducted concurrently with development, so that the splitting and merging algorithms can be adapted to reduce future execution costs, on the basis of the simulation predictions.

3) Execute the benchmarks and collect the results: the benchmark applications are launched in the target mOSAIC cloud, obtaining performance figures for every component on the actual execution platform. It should be noted that, if the application is not fully developed, the developer will only run the benchmarks for the core components. The CPU time requirements of not-fully-developed code can be estimated by means of static software analysis. Of course, this is likely to adversely affect the prediction accuracy.

4) Obtain a performance prediction by executing the simulation model: the simulation model can be executed with multiple synthetic workloads, using the parameters estimated in the previous steps, obtaining performance predictions for the different scenarios of interest.

Following such an approach, the developer is able to predict the performance and resource usage (and the related costs) of his cloud application from the early development stages.

V. RESULTS

The goal of this paper is not to provide ultimate performance measurements for real-world applications. Our objective is just to evaluate the feasibility of the proposed approach, i.e., the development of scientific code on top of a cloud-

Fig. 2. Bag-Of-Tasks simulation model

aware programming framework, exploiting early performance prediction techniques for making cost/performance trade-offs.

As outlined in the previous sections, the framework we propose is composed of (i) the cloud BOT development framework, which enables the developer to run his applications in a cloud environment, (ii) the simulation framework, which enables the developer to predict application performance even in the early development stages, and (iii) the benchmark applications, which are used to evaluate the timing parameters for simulation.

In order to validate the approach, we ran the benchmark applications on the target VMs to gather the timing parameters for the simulation environment, using a minimal set of resources. Then we compared a real application with its simulation, varying both the workload and the amount of resources assigned to the application run.

A. Measurement of Timing Parameters

Our BOT framework makes it possible to obtain applications that are completely independent of the real environment on which they will run. The number and the characteristics of the cloud resources leased for execution only affect performance. We capture the characteristics of the actual execution environment through a set of benchmarks, whose results produce the timing parameters subsequently used for simulation.

For our tests, we have deployed the mOSAIC platform on Virtual Machines (VMs) leased from the Amazon Web Services (AWS) infrastructure. The characteristics of the acquired VMs are shown in Table I. Running our benchmark suite, we obtained the values in Table II.

TABLE I. AWS VM INSTANCE DETAILS
Parameter             Value
instance type         c1.medium
vCPU                  2
RAM                   1.7 GB
storage               350 GB
network performance   moderate

TABLE II. TESTBED TIMING PARAMETERS
Parameter Name        Value
HTTP OH               50 ms
ORCH OH               10 ms
SPLIT OH              5 ms
MERGE OH              5 ms
HTTP CHANNEL beta0    1500
HTTP CHANNEL beta1    0.500
HTTP CHANNEL beta2    0.010
BOT QUEUES beta0      65
BOT QUEUES beta1      0.003
BOT QUEUES beta2      0.001

B. Simulation Validation Varying the Workload

We have developed a skeletal BOT application, where the work method in the Worker class is able to process a given load, expressed in MFLOPS and obtained as a parameter from the task description. The Splitter and Merger do not actually perform splitting/merging, but just activate Workers and collect response times, respectively. We submitted the same workload both to the skeletal application running in the real execution environment and to the simulator, comparing the measured and predicted completion times. For our tests, we used the workloads briefly described in Table III, which are representative of light (WORK TIME=15 ms) and medium-heavy (WORK TIME=200 ms) loads for the workers. To vary the number of tasks dynamically, we used a pseudo-random uniform distribution.

TABLE III. WORKLOADS FOR TESTS 1 AND 2
parameter          value
vCPU               2
JOBS               30
SEND FREQ          1 s
TASKS              uniform(100,400)
WORK TIME          15 ms; 200 ms
WORKERS            4
SPLITTERS          2
TASK MSG SIZE      400 B
JOB MSG SIZE       20 B
RESULT MSG SIZE    20 B

Figure 4 shows on the x-axis the job number (30 jobs are submitted, according to Table III) and on the y-axis its completion time. The completion time associated with job #30 is the total completion time for the whole burst of 30 jobs. The simulation results are summarized in Table IV, where the estimated vCPU usage is also reported.

Summarizing the results shown in Figure 4, for Test 1 (WORK TIME=15 ms) we measured a completion time of

Fig. 4. Comparison between skeletal application and simulated completion times

512 ms versus a predicted one of 551 ms, with a relative error of 7.61% (considering also the intermediate job completion times, the error ranges from a minimum of 0.77% to a maximum of 28.57%). For Test 2 (WORK TIME=200 ms) we measured a completion time of 1383 ms against a predicted one of 1134 ms, with a relative error of about 18% (considering intermediate jobs, the error ranges from 4.37% to 18.62%).

TABLE IV. TEST SIMULATION RESULTS
Metric            Test 1    Test 2
Completion Time   551       1134
Sent tasks        7354      7354
vCPU1 usage       0.29%     0.99%
vCPU2 usage       0.04%     0.99%

In both cases simulation offers good predictive capabilities. Moreover, even at this stage (no actual application developed) it offers interesting information about resource usage. The vCPU usage in Test 1 (the one with the lower WORK TIME) is very low, as most of the execution time is spent in communications. This is a relevant hint for the developer of real code, who can tune his implementation so as to obtain a coarser task granularity and hence a more efficient balance of computing and communication.

C. Simulation Validation Varying the Resources Used

In the following we compare the performance of the skeletal application executed on real resources to the one obtained by simulation, using for both the real and the simulated run a variable number of workers. A higher number of vCPUs is clearly necessary for these tests; leasing additional vCPUs on AWS of the same type used in the previous tests, we can use once again the timing parameters in Table II. Only the allocation matrix in the simulator configuration file has to be updated.

We submitted the workload described in Table V to the synthetic application, varying the number of workers and comparing the results to the real measurements on the skeletal application. The test outcome matches the results proposed in [3], where we evaluated the framework overhead.

TABLE V. WORKLOAD FOR THE WORKER SCALABILITY TEST
parameter          value
vCPU               4
JOBS               10
SEND FREQ          1 s
TASKS              30
WORK TIME          100 ms
WORKERS            1 to 5
SPLITTERS          1
TASK MSG SIZE      20 B
JOB MSG SIZE       20 B
RESULT MSG SIZE    20 B

Table VI shows the simulation results versus the real execution of the synthetic application. It should be noted that the simulation output offers the same information on scalability as the tests on real hardware (presented previously in [3]). This proves the simulation's capability of capturing the behavior of the real system, which can be fruitfully exploited to avoid expensive executions on real resources.

TABLE VI. SCALABILITY: REAL AND SIMULATED RESPONSE TIMES
Number of Workers   Real App Response Time (s)   Simulated Time (s)
1                   84                           94
2                   44                           49
3                   31                           35
4                   30                           33
5                   30                           33

VI. CONCLUSIONS AND FUTURE WORK

The aim of the work described in this paper is to propose an approach that integrates the development of scientific applications on top of a cloud platform with their performance prediction through a dedicated simulation environment.

To obtain this integration, on the one hand we have built a development framework, currently specialized for the bag-of-tasks paradigm, which exploits the API and the components
provided by the mOSAIC platform. On the other, we have developed a set of simulation components for JADES. These simulation components correspond one-to-one to the mOSAIC components for the BOT framework. Besides presenting the approach, we have discussed the outcome of our preliminary performance tests used to evaluate the simulation accuracy.

In our tests, we offered some examples of usage under different workloads and using different amounts of resources, in order to show how it will be possible for a developer to predict the behavior of his application without running it on (paid) cloud resources, but simply by using the associated simulation environment.

Our future research work will focus on the extensive testing of the framework and simulation components, by collecting measurements on real-world scientific codes running in private and commercial cloud environments. We also plan to implement alternative frameworks for additional programming paradigms.

REFERENCES

[1] M. Al-Fares, A. Loukissas, and A. Vahdat, "A scalable, commodity data center network architecture," SIGCOMM Comput. Commun. Rev., vol. 38, no. 4, pp. 63–74, Aug. 2008.
[2] [Online]. Available: https://aws.amazon.com/ec2/purchasing-options/spot-instances/
[3] A. De Benedictis, M. Rak, M. Turtur, and U. Villano, "Cloud-aware development of scientific applications," in 23rd IEEE International WETICE Conference WETICE-2014, June 2014, pp. 149–154.
[4] D. Petcu, C. Craciun, M. Neagul, S. Panica, B. D. Martino, S. Venticinque, M. Rak, and R. Aversa, "Architecturing a sky computing platform," in ServiceWave Workshops, ser. Lecture Notes in Computer Science, M. Cezon and Y. Wolfsthal, Eds., vol. 6569. Springer, 2010, pp. 1–13.
[5] D. Petcu and M. Rak, "Open-source cloudware support for the portability of applications using cloud infrastructure services," in Cloud Computing, ser. Computer Communications and Networks, Z. Mahmood, Ed. Springer London, 2013, pp. 323–341.
[6] A. Cuomo, M. Rak, and U. Villano, "Performance prediction of cloud applications through benchmarking and simulation," International Journal of Computational Science and Engineering, in press, 2014.
[7] C. A. Lee, "A perspective on scientific cloud computing," in Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing, ser. HPDC '10. New York, NY, USA: ACM, 2010, pp. 451–459.
[8] A. Iosup, S. Ostermann, M. Yigitbasi, R. Prodan, T. Fahringer, and D. H. J. Epema, "Performance analysis of cloud computing services for many-tasks scientific computing," Parallel and Distributed Systems, IEEE Transactions on, vol. 22, no. 6, pp. 931–945, June 2011.
[9] W. Lu, J. Jackson, and R. Barga, "AzureBlast: A case study of developing science applications on the cloud," in Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing, ser. HPDC '10. New York, NY, USA: ACM, 2010, pp. 413–420.
[10] D. Agarwal and S. Prasad, "AzureBot: A framework for bag-of-tasks applications on the Azure cloud platform," in Parallel and Distributed Processing Symposium Workshops and PhD Forum (IPDPSW), 2013 IEEE 27th International, May 2013, pp. 2139–2146.
[11] E. Mocanu, V. Galtier, and N. Tapus, "Generic and fault-tolerant bag-of-tasks framework based on JavaSpace technology," in Systems Conference (SysCon), 2012 IEEE International, March 2012, pp. 1–6.
[12] G. Galante and L. Bona, "Constructing elastic scientific applications using elasticity primitives," in Computational Science and Its Applications – ICCSA 2013, ser. Lecture Notes in Computer Science, B. Murgante, S. Misra, M. Carlini, C. Torre, H.-Q. Nguyen, D. Taniar, B. Apduhan, and O. Gervasi, Eds. Springer Berlin Heidelberg, 2013, vol. 7975, pp. 281–294.
[13] A. Raveendran, T. Bicer, and G. Agrawal, "A framework for elastic execution of existing MPI programs," in Parallel and Distributed Processing Workshops and PhD Forum (IPDPSW), 2011 IEEE International Symposium on, May 2011, pp. 940–947.
[14] Y. Gong, B. He, and J. Zhong, "An overview of CMPI: Network performance aware MPI in the cloud," in Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, ser. PPoPP '12. New York, NY, USA: ACM, 2012, pp. 297–298.
[15] ——, "Network performance aware MPI collective communication operations in the cloud," Parallel and Distributed Systems, IEEE Transactions on, 2013.
[16] R. Aversa, A. Mazzeo, N. Mazzocca, and U. Villano, "Heterogeneous system performance prediction and analysis using PS," IEEE Concurrency, vol. 6, no. 3, pp. 20–29, Jul./Sep. 1998.
[17] B. Di Martino, E. Mancini, M. Rak, R. Torella, and U. Villano, "Cluster systems and simulation: from benchmarking to off-line performance prediction," Concurrency and Computation: Practice and Experience, vol. 19, no. 11, pp. 1549–1562, 2007.
[18] S. Achour, M. Ammar, B. Khmili, and W. Nasri, "MPI-PERF-SIM: Towards an automatic performance prediction tool of MPI programs on hierarchical clusters," in Parallel, Distributed and Network-Based Processing (PDP), 2011 19th Euromicro International Conference on. IEEE, 2011, pp. 207–211.
[19] P. Clauss, M. Stillwell, S. Genaud, F. Suter, H. Casanova, and M. Quinson, "Single node on-line simulation of MPI applications with SMPI," in Parallel & Distributed Processing Symposium (IPDPS), 2011 IEEE International. IEEE, 2011, pp. 664–675.
[20] A. Mos and J. Murphy, "A framework for performance monitoring, modelling and prediction of component oriented distributed systems," in Proceedings of the 3rd International Workshop on Software and Performance. ACM, 2002, pp. 235–236.
[21] H. Koziolek, "Performance evaluation of component-based software systems: A survey," Performance Evaluation, vol. 67, no. 8, pp. 634–658, 2010.
[22] A. Li, X. Yang, S. Kandula, and M. Zhang, "CloudCmp: comparing public cloud providers," in Proceedings of the 10th Annual Conference on Internet Measurement. ACM, 2010, pp. 1–14.
[23] R. Calheiros, R. Ranjan, A. Beloglazov, C. De Rose, and R. Buyya, "CloudSim: a toolkit for modeling and simulation of cloud computing environments and evaluation of resource provisioning algorithms," Software: Practice and Experience, vol. 41, no. 1, pp. 23–50, 2011.
[24] J. Dean and S. Ghemawat, "MapReduce: Simplified data processing on large clusters," Commun. ACM, vol. 51, no. 1, pp. 107–113, Jan. 2008.
[25] A. Videla and J. J. W. Williams, RabbitMQ in Action: Distributed Messaging for Everyone. Shelter Island, NY: Manning, 2012.
[26] [Online]. Available: https://basho.com/riak/
[27] K. Perumalla and R. Fujimoto, "Efficient large-scale process-oriented parallel simulations," in Proc. of the 30th Winter Simulation Conference, 1998, pp. 459–466.
[28] H. Schwetman, "CSIM19: a powerful tool for building system models," in Proc. of the 33rd Winter Simulation Conference. IEEE, 2001, pp. 250–255.
[29] A. Cuomo, M. Rak, and U. Villano, "Process-oriented discrete-event simulation in Java with continuations: quantitative performance evaluation," in Proc. of the International Conference on Simulation and Modeling Methodologies, Technologies and Applications (SIMULTECH). SciTePress, 2012, pp. 87–96.
[30] M. Rak, A. Cuomo, and U. Villano, "mJADES: Concurrent simulation in the cloud," in CISIS, L. Barolli, F. Xhafa, S. Vitabile, and M. Uehara, Eds. IEEE, 2012, pp. 853–860.
[31] M. Rak, R. Aversa, B. D. Martino, and A. Sgueglia, "Web services resilience evaluation using LDS load dependent server models," JCM, vol. 5, no. 1, pp. 39–49, 2010.
[32] G. Aversano, M. Rak, and U. Villano, "The mOSAIC benchmarking framework: Development and execution of custom cloud benchmarks," Scalable Computing: Practice and Experience, vol. 14, no. 1, 2013.