
Accepted Manuscript

Introducing computational thinking, parallel programming and performance engineering in interdisciplinary studies

Eduardo Cesar, Ana Cortés, Antonio Espinosa, Tomàs Margalef, Juan Carlos Moure, Anna Sikora, Remo Suppi

PII: S0743-7315(17)30005-9
DOI: http://dx.doi.org/10.1016/j.jpdc.2016.12.027
Reference: YJPDC 3600

To appear in: J. Parallel Distrib. Comput.

Received date: 13 June 2016
Revised date: 13 December 2016
Accepted date: 30 December 2016

Please cite this article as: E. Cesar, A. Cortés, A. Espinosa, T. Margalef, J.C. Moure, A. Sikora, R. Suppi, Introducing computational thinking, parallel programming and performance engineering in interdisciplinary studies, J. Parallel Distrib. Comput. (2017), http://dx.doi.org/10.1016/j.jpdc.2016.12.027

This is a PDF file of an unedited manuscript that has been accepted for publication. As a
service to our customers we are providing this early version of the manuscript. The manuscript
will undergo copyediting, typesetting, and review of the resulting proof before it is published in
its final form. Please note that during the production process errors may be discovered which
could affect the content, and all legal disclaimers that apply to the journal pertain.
Highlights

• A Master's degree in Modeling for Science and Engineering was started 5 years ago.

• It is an interdisciplinary Master's degree with teachers from Mathematics, Physics and Computer Science.

• Parallel Programming and Applied Modeling and Simulation are subjects included in the Master's program.

• The teaching methodology applied in both subjects is presented in full.


Introducing Computational Thinking, Parallel Programming and Performance Engineering in Interdisciplinary Studies

Eduardo Cesar, Ana Cortés, Antonio Espinosa, Tomàs Margalef∗, Juan Carlos Moure, Anna Sikora, Remo Suppi

Computer Architecture and Operating Systems Department, Universitat Autònoma de Barcelona, 08193 Cerdanyola del Vallès, Spain

Abstract

Nowadays, many fields of science and engineering are evolving through the joint contribution of complementary fields. Computer science, and especially High Performance Computing, has become a key factor in the development of many research fields, establishing a new paradigm called computational science. Researchers and professionals from many different fields require knowledge of High Performance Computing, including parallel programming, to develop fruitful and efficient work in their particular field. Therefore, at Universitat Autònoma de Barcelona (Spain), an interdisciplinary Master's degree in "Modeling for Science and Engineering" was started 5 years ago to provide a thorough knowledge of the application of modeling and simulation to graduate students in different fields (Mathematics, Physics, Chemistry, Engineering, Geology, etc.). In this Master's degree, "Parallel Programming" is a compulsory subject because it is a key topic for these students. The concepts learned in this subject must be applied to real applications, so a complementary subject on "Applied Modeling and Simulation" has also been included. It is very important to show the students how to analyze their particular problems, think about them from a computational perspective and consider the related performance issues. So, in this paper, the methodology and the experience of introducing computational thinking, parallel programming and performance engineering in this interdisciplinary Master's degree are presented. This overall approach has been refined throughout the Master's life, leading to excellent academic results and improving the industry's and students' appraisal of this programme.

Keywords: Parallel programming, Message Passing, Shared Memory, GPUs, MPI, OpenMP, CUDA, Agent-Based Models, Model Simulation

This work has been partially supported by MINECO-Spain under contracts TIN2014-53234-C2-1-R and TIN2014-53172-P.

∗ Corresponding author

Email addresses: [email protected] (Eduardo Cesar), [email protected] (Ana Cortés), [email protected] (Antonio Espinosa), [email protected] (Tomàs Margalef), [email protected] (Juan Carlos Moure), [email protected] (Anna Sikora), [email protected] (Remo Suppi)

Preprint submitted to Journal of LATEX Templates, October 6, 2016

1. Introduction

Many fields of science and engineering are applying techniques and recent advances from complementary fields. In this interdisciplinary context, researchers and professionals with greater knowledge of problem modeling and High Performance Computing (HPC) are in high demand from companies and research centers. Since 2011, Universitat Autònoma de Barcelona has hosted a Master's degree in Modeling for Science and Engineering to provide these kinds of professionals to those companies and centers.

The Master's involves an interdisciplinary collaboration among professors from various departments, mainly Physics, Mathematics, and Computer Architecture and Operating Systems. The main objective is to provide science graduates with mathematical and computational tools to treat different types of scientific and/or technological problems. It covers a large range of problems, introducing many different approaches and tools. In particular, the students are provided with the basic knowledge to be able to model a physical system involved in some problem, to represent the model mathematically and to solve the problem applying different methods such as partial differential equations, optimization, time series, and related methods. The students learn how to analyze their particular problems and think about them from a computational perspective, i.e. formulating a problem and expressing its solution(s) in a way that a computer can effectively carry out. Finally, they have to think of parallel solutions and learn HPC technology to evaluate and improve the performance of models, applications and simulators. In order to include all these different issues in the Master's, we define three training pillars:

• Definition of complex systems

• Mathematical representation and resolution of these systems

• Computational thinking for parallel systems and performance engineering

In this Master's, advanced programming topics such as performance engineering are specifically brought to postgraduate students with a science background. These skills are traditionally outside a science degree and are not commonly found in master's studies elsewhere.

Students perceive the extensive and interdisciplinary training offered by the Master's as a significant asset in their curricula. This perception is reinforced by the presentations that several companies, which apply the introduced techniques and methods in their everyday business processes, make to students as part of the subjects' regular activities. In this way, students become aware of the significant impact of such techniques and methods on the productivity of a specific company. Moreover, most of these companies offer short internships and often hire students from the Master's. This encourages students to enroll in the Master's and is crucial to its success.
Students enroll from different degrees, such as Mathematics, Physics, Chemistry, Engineering or Computer Science. The studies in this Master's have been very successful and are attractive to students from many different countries and with different backgrounds. The number of applications for this Master's is growing every year, and in the current 2016-17 course, several student applications were rejected because the maximum number of registered students had been reached. Figure 1 shows the evolution in the number of enrolled students from year 2011-12 until the current year 2016-17. The Master's started with just 8 students, but it has been growing continuously and now has more than 30 registered students. Figure 1 is a clear indication of the high impact of the studies, as perceived by the students who enroll every year.

Figure 1: Evolution of the number of students and their background degrees

Among the students, the most common backgrounds are mathematics and physics, but other backgrounds, such as chemistry, life sciences (biology, biochemistry, biotechnology), mechanical engineering or computer engineering, are increasing in number. This heterogeneity in the students' backgrounds implies diverse prior knowledge and experience in programming: some of them have some experience with FORTRAN, others have a light knowledge of C, or some knowledge of programming in Python or other languages. This heterogeneity in the students' programming backgrounds introduces a crucial point into the teaching objective, since it is necessary to establish a common foundational language. Then, we can teach them parallel programming and high performance computing as resources to be applied to different fields.
The main goal is to combine three aspects:

• Although most students have some knowledge of programming, they usually do not show any significant skills in computational thinking or performance engineering. So, we plan to show them a complete view, from the problem definition to the performance analysis and tuning.

• We want to introduce a complete view of the existing parallel programming paradigms, analyzing the performance aspects involved. We do not focus on a single approach, but present the most commonly used approaches.

• To make it useful, it is necessary to apply the presented concepts, methods and tools to a set of real cases from different fields of science and engineering.

In this context, we aim to provide good training in parallel programming and solid experience in the efficiency analysis of the implementation of several real applications. For that purpose, the Master's contains, among others, two subjects, namely, Parallel Programming and Applied Modeling and Simulation. These two subjects provide a base for computational thinking and for developing efficient solutions to complex scientific problems. Moreover, the contents and activities in both of them have been planned using a well-established teaching methodology.

The outstanding academic results obtained in these subjects throughout the Master's life, as well as their excellent reception among students, have encouraged us to share their structure, contents and successful strategies in this work.

This paper focuses on the description of the training on parallel programming and applied modeling and simulation offered to the students. In order to introduce the main differences of the proposed teaching approach with respect to similar existing programs, Section 2 analyzes the related work. In Section 3, we present the basic teaching methodology applied to the subjects we focus on and highlight relevant concepts on parallel programming shown to the students in order to establish a common framework. Section 4 describes the parallel programming approaches presented to the students and the different activities planned. Then, Section 5 presents some examples of case study applications that are introduced to the students, along with proposals for further developments. Section 6 summarizes some global academic results. Finally, Section 7 presents the main conclusions of this teaching experience.

2. Related work and experiences

Already in the 1990s, several scientists [1][2] realized the need for training computational scientists due to, among other reasons, the dramatic effects expected from the development of parallel computing on computers' performance and capabilities. This training should focus on providing professionals from different areas (Chemistry, Physics, Mathematics, and also Computer Science) with computational skills for solving complex problems. After 20 years, parallel computing is having the expected effects and training computational scientists is still a relevant discussion issue [3].

Parallel programming is significantly more complex than sequential programming and, consequently, teaching it is a challenge, especially in the case of students with little computer science background. For this reason, general proposals for introducing parallel thinking and programming, such as [4] and [5], are still presented and discussed in education forums. Contents presented in these proposals include lessons about the main elements related to parallel systems and parallel programming. Moreover, there are also works, such as [6], proposing strategies for simplifying the understanding of parallel computing concepts by non-computer scientists.

Based on this background, and taking into consideration industry and research requirements, many universities have implemented postgraduate programs for training computational scientists. Most of these programs ([7][8][9][10][11][12][13]) explicitly include subjects on parallel thinking and programming, even though there are programs which do not include related contents explicitly ([14][15]).

Generally, these are two-year programs that offer one single subject dedicated to parallel programming. This subject usually introduces the main characteristics of parallel architectures and relies heavily on practical exercises, since general computational concepts have been introduced in other subjects. In addition, in most cases the course focuses on only one programming paradigm (shared memory [10] or message passing [11]) or a high-level language [7].

There are many similarities between these postgraduate programs and our proposal. For example, all the proposals are based on parallel teaching foundations and give great importance to practical training.

There are, though, some significant differences. First, it is worth mentioning that we offer two subjects related to parallel programming and computational thinking in our Master's program, even though it is a one-year programme. One of the subjects (Parallel Programming) is compulsory, so everybody gets the foundations, and the other (Applied Modeling and Simulation) is complementary, for the students interested in gaining deeper knowledge. Second, the Parallel Programming subject covers all current parallel programming paradigms, i.e., shared memory (OpenMP and massively parallel processors (GPUs)) and message passing (MPI). We think that this approach gives students a wider view of parallel programming on currently available architectures, although it may sacrifice some degree of detail. We also provide innovative content introducing recent parallel programming extensions like Cilk Plus, and new industrial standards like OpenACC, aimed at simplifying parallel software engineering.

Concerning modeling and simulation topics, there are proposals on the subject in the above programs, but they are mostly focused on specific areas. For example, in many of them there are courses on modeling and simulation of nonlinear systems, mathematical and numerical modeling and simulation (including simulation on HPC architectures and commercial software) or on specific fields of knowledge (e.g. Biological Systems with Differential Equations, Fluids and Soft Matter, Combat Modeling, Simulation Modeling in Transportation Networks, etc.). We consider that our modeling and simulation training on high performance computers using Agent-Based Models (ABM) allows students to analyze the potential of High Performance Simulation on complex models that are close to their area of expertise.

3. Interdisciplinary teaching methodology

Modern teaching methodologies use two-dimensional models [16] to describe the correlations between the degree of knowledge objectivity and the amount of teaching activities done by teachers and students in the subject. These models typically expose a balance between formal academic and personal student experiences and provide a way to distribute prominence between the teacher and the student. Hence, we can consider different possible orientations in teaching methodologies: teacher-centered expository, balanced interactivity between the teacher and the students, and, finally, student-centered discovery-based.

We have analyzed the most suitable methodology for our interdisciplinary Master's programme. To do so, we have evaluated the learning pyramid definitions [17], the knowledge competences described in the Tuning project [18], and the recommendations given in the white book of Computer Science Higher Education by the Spanish National Agency for Quality Assessment and Accreditation (ANECA) [19].

Moreover, we have also taken into consideration that students enrolled in this Master's programme have different levels of computer science expertise. Consequently, more experienced students need methodologies that foster student discussion and practical experimentation, while less proficient students need direct teaching methods to be effectively introduced to relevant technical topics.

In this context, both of the subjects that we present in this paper are based on an even distribution of the expository and interactive methodologies. The students' learning process is shown in Figure 2, where the leading role between teachers and students is balanced throughout the duration of the subject. We define a list of stages (A to D) in each of the subjects to which we apply our teaching methodology. In the early stages, such as A, we apply more direct, classic teacher-centered methodologies. In the last part of the subject, such as D, the general knowledge assimilated by the student is more mature. Then, we apply student-centered methodologies, exploring practical activities and promoting interaction between the students.

Figure 2: Two-dimensional teaching methodologies
Taking into account the characteristics of the Master's degree, we use the following teaching strategies and techniques:

• Lecture session: the teacher exposes the most essential topics of the subject to provide a broad, common knowledge background for all of the students.

• Lab session: practical, interactive experimental activities, usually done in groups. Work is oriented to gaining experience in the usage of tools and putting theoretical concepts into practice with the help of guided exercises.

• Conceptual maps: used to define strategies to create an information structure that will generate an adequate hierarchy of the concepts taught in the subject.

• Case studies: activities based on the study of well-known, practical problems. The analysis of a given problem that is developed by the students is compared with an existing solution to extract relevant conclusions.

• Bibliography review: a list of documents is given to the students, so that they have a basic corpus of documents associated with the subject. The list contains a reduced number of books and articles that the students must know and use throughout the subject. An extra reference document list is also provided, allowing the students to get more insight into specific interests.

• Experimental portfolio: a list of work tasks to be done during the practical sessions, which the students can use to self-evaluate their own knowledge and look for extra information from the teacher or other sources.

• Invited conferences: special lectures given by invited speakers where discussion is open for relevant subject topics. The objective is to foster interactions between the students and professional experts.

The rest of the paper describes the objectives, principles and detailed methodology used in two specific subjects of the Master's focused on computational thinking and performance engineering, namely Parallel Programming and Applied Modeling and Simulation. These subjects, which are focused on High Performance Computing, provide the basic concepts to introduce the students to computational thinking, solving a given problem in a parallel and efficient way and learning to apply the principles of performance engineering to scientific or industrial applications.

Taking into account the chosen teaching methodology, the Parallel Programming subject starts by providing a general programming background in the C language. This is done through the use of introductory lectures and programming labs. Then, we present the basic theoretical concepts of parallel programming by combining lectures on computer architecture and a selected bibliography review. Next, students must survey a list of relevant parallel algorithms with some general introductory lectures and are given an experimental portfolio to analyze matrix multiplication in practice. From here, students must analyze a selected list of case studies of parallel computational patterns like map, reduce and stencil. In this part of the subject, they have to apply the computational thinking concepts to an experimental portfolio with examples like parallel prefix and convex hull. Finally, they receive a conceptual map of programming paradigms: shared memory, message passing and accelerator-oriented massively-parallel programming. These lectures are complemented with several lab sessions where students use performance analysis tools to develop a full performance engineering cycle for example applications.

The Applied Modeling and Simulation subject adopts a similar methodological approach. First, the students attend a short series of lectures on the development of a simulation model. Then, they are provided with the explanation of several case studies, such as emergency evacuation and meteorological services, where they have to compare their own designs with already existing solutions. Finally, the students must apply performance engineering principles in several lab sessions addressed to analyzing the performance of the simulation process.

In the next sections, we provide more detailed descriptions of the particular objectives and contents, and how the planned activities are put into practice for the two subjects, Parallel Programming and Applied Modeling and Simulation. Finally, we provide some conclusions obtained from the implementation of these subjects over the last few years.

4. Parallel Programming

Parallel Programming is a core subject in this interdisciplinary Master's. The first challenge to tackle is to set a common practical background for the students. The students of this Master's typically have some programming knowledge of high-level languages such as Java or Python, but they usually have a limited knowledge of the C programming language. Since C is at the core of High Performance Computing, the very first part of the subject is devoted to introducing the students to its main concepts and to providing the means for them to work on several programming exercises. Students usually succeed in this initial training, in part due to their high interest and their previous programming experience. Our previous experience has shown us that devoting some time to setting this basic C knowledge is a hard requirement before introducing shared memory or message passing programming.

Once the students have learned the C programming principles, it is necessary to introduce them to the basic concepts of parallel programming. The first point to present is the general idea of parallelism itself and how HPC computing platforms are designed. So, a general introduction to parallel and distributed systems, multi-core processors, memory hierarchy and accelerators is presented to the students. These objectives present a challenge because it is necessary to provide the students with useful, real architecture concepts while avoiding excessively deep details that are difficult to relate to programming issues and may become a threat to the assimilation of the relevant knowledge. For this reason, we provide a gentle, summarized introduction with selected further readings for those students particularly interested in the architectural aspects.
The following point in the subject is an introduction to parallel algorithms. The computational aspects of parallel algorithm design must be introduced to the students, showing them different current paradigms and related tools. We provide details on several parallel algorithms for different computational problems. The first problem considered is matrix multiplication, which most of them know very well and have already programmed sequentially. We start by showing them how the problem is inherently parallel. Several parallel matrix multiplication algorithms are shown and analyzed considering different aspects such as computational complexity, communication requirements, data structure layout and size, and memory requirements. These different algorithms are analyzed considering the previously mentioned architectural aspects, showing the implications of computing capabilities, communication network and memory limitations.
Throughout the subject, we identify several important parallel computation patterns [20], which are used in many examples. The map pattern is exemplified by the vector addition algorithm (and the outer loops of matrix multiplication). It is an appropriate pattern to introduce parallelism as it does not involve any dependence or communication among threads. The reduce pattern is studied in the inner loop of matrix multiplication; we use it to introduce the problem of synchronization and the idea of re-associating arithmetic operations to increase parallelism. The stencil pattern is used to simulate the movement of a string, and requires synchronization, sharing, and communication of boundary data.

Two additional parallel computation patterns are studied by means of the exercises proposed to the students. The parallel prefix algorithm (scan pattern) and the convex hull problem (divide and conquer or recursive pattern) are proposed so that students can analyze the problem and find out the sources of potential parallelism in the algorithm. The students compare their proposals considering aspects such as algorithm complexity, memory and communication requirements.
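To give a flavor of the kind of decomposition students are expected to discover for the scan pattern, the following is a minimal sketch of our own (not one of the course exercises): a two-phase block prefix sum in plain C, where the names, sizes and block count are ours, and the per-block loops are the parts that could run in parallel.

#include <stdio.h>

#define N 16
#define BLOCKS 4   /* hypothetical number of parallel workers */

/* Two-phase inclusive prefix sum. Phase 1 scans each block
   independently (one block per thread); a small sequential scan of
   the block totals then provides the carry that phase 2 adds back,
   again independently per block. */
void prefix_sum(int *a, int n, int blocks) {
    int size = n / blocks;            /* assume n divisible by blocks */
    int carry[BLOCKS];                /* assume blocks <= BLOCKS */

    for (int b = 0; b < blocks; b++)  /* phase 1: independent local scans */
        for (int i = b*size + 1; i < (b+1)*size; i++)
            a[i] += a[i-1];

    carry[0] = 0;                     /* exclusive scan of the block totals */
    for (int b = 1; b < blocks; b++)
        carry[b] = carry[b-1] + a[b*size - 1];

    for (int b = 1; b < blocks; b++)  /* phase 2: independent carry addition */
        for (int i = b*size; i < (b+1)*size; i++)
            a[i] += carry[b];
}

int main(void) {
    int a[N];
    for (int i = 0; i < N; i++) a[i] = 1;
    prefix_sum(a, N, BLOCKS);
    printf("%d\n", a[N-1]);           /* expected: 16 */
    return 0;
}

The small sequential scan over the block totals is the unavoidable serial step; everything else is embarrassingly parallel across blocks, which is exactly the trade-off students are asked to identify.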
Once the basic concepts of programming and parallelism have been presented to the students, it is feasible to enter the core part of the Parallel Programming subject. In this part, three paradigms are presented: shared memory, message passing and accelerator-oriented massively-parallel programming (GPUs). The rationale for this organization is that developing programs with a shared memory model, such as OpenMP, requires only a simple modification of a sequential C program by including just some directives. So, the students can parallelize their sequential C programs in just one lab session. After OpenMP, MPI is introduced. In this case, it is necessary to think about how to parallelize the algorithm, which processes must be defined, how such processes must communicate, and so on. This implies a greater effort from the students. The last approach introduced is OpenACC and CUDA as programming models for GPUs (accelerators), which requires a more detailed understanding of the memory hierarchy and the coordinated use of thousands of threads to reach relevant performance gains.

The programming sessions are complemented with the introduction of performance analysis tools to understand the benefits of parallel programming and to detect and correct performance bottlenecks. Fundamental performance engineering abstractions are introduced, like the speedup concept, Amdahl's and Little's Laws, and the Roofline model.
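As a brief illustration of the first of these abstractions (our own note, not part of the subject materials): Amdahl's Law states that if a fraction $f$ of a program's execution time can be parallelized over $p$ processors, the achievable speedup is bounded by

$$S(p) = \frac{1}{(1 - f) + \frac{f}{p}},$$

so, for example, with $f = 0.9$ the speedup can never exceed 10, however many processors are used.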
The steps of the learning evolution shown in Figure 2 are applied in this subject to each of the aforementioned topics. Consequently, the initial lectures (one in most cases) are used by the professor to introduce the main concepts regarding the topic and, next, students assume incrementally more and more responsibility in the subsequent sessions associated with each topic.

The specific development of these topics is covered in the following subsections.

4.1. Shared memory: OpenMP

As mentioned above, once students are familiar with C and the basic concepts of parallel algorithms, the most natural way to introduce parallel application development is by using OpenMP [21].

OpenMP is a portable and flexible directive-based API for shared-memory parallel programming which, for some basic code constructions, allows us to express parallelism in an extremely simple way. Given these characteristics, it has become the de-facto standard for multicore shared-memory architectures. In addition, current laptops and desktop computers have multicore processors and, consequently, students can test all the examples given in class and develop new ideas on their own computers.

After a few motivating examples, such as the one shown in Listing 1, the contents of the theoretical OpenMP lecture (2 hours) are structured as follows:

• Introduction. Shared memory model, concept of thread, shared and local (private) variables, and the need for synchronization.

• Fork-join model. The #pragma omp parallel clause. Introducing parallel regions. Data management clauses (private, shared, firstprivate, lastprivate).

• Data parallelism: parallelizing loops. The #pragma omp for clause.

• Task parallelism: sections. The #pragma omp sections and #pragma omp section clauses.

• OpenMP runtime environment function calls. Getting the number of threads of a parallel region, getting the thread id, and other functions.

• Synchronization. Implicit synchronization, nowait clause. Controlling executing threads: master, single, and barrier clauses. Controlling data dependencies: atomic and reduction clauses.

• Performance considerations. Balancing thread load, schedule clause. Eliminating barriers and critical regions.
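As a minimal sketch of the reduction clause mentioned above (our own illustration, not one of the course listings; the array names and size are ours), a dot product, an instance of the reduce pattern, can be parallelized as follows:

#include <stdio.h>
#include <omp.h>

#define N 1000000

int main(void) {
    static double a[N], b[N];
    double sum = 0.0;

    for (int i = 0; i < N; i++) { a[i] = 1.0; b[i] = 2.0; }

    /* Each thread accumulates a private partial sum; OpenMP combines
       the partial sums at the end of the loop (the reduce pattern). */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < N; i++)
        sum += a[i] * b[i];

    printf("dot = %f\n", sum);   /* expected: 2000000.0 */
    return 0;
}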

The instructor plays a central role in this lecture and, consequently, it has been structured following the corresponding strategy, i.e. theoretical lecture, as presented in Section 3. This structure starts by describing the most general and essential concepts (shared memory model, threads and synchronization). Next, it introduces different parallel constructs ordered according to their conceptual complexity: all threads doing the same work (parallel construct), all threads executing the same code on different portions of data (parallel for construct), and threads executing different tasks (parallel sections construct). Then, it introduces several OpenMP synchronization mechanisms, which naturally leads to a discussion of their negative performance implications and strategies to minimize their use.

Listing 1: OpenMP simple example: adding two vectors.

#pragma omp parallel for
for (i = 0; i < N; i++)
    c[i] = a[i] + b[i];

After this lecture, the student assumes the central role and should be able to apply the acquired theoretical knowledge to real cases. Consequently, the concepts introduced in this lecture are reinforced in a lab session (2 hours with an instructor and 6 hours of autonomous development of practical exercises), where students must use OpenMP to parallelize the code for simulating the movement of a string developed in the C labs (see Listing 2). In this way, students continue their work and can experience the advantages of using the 4 cores available in each piece of lab equipment.

Listing 2: String simulation main computation loop.

for (t = 1; t <= T; t++) {
    for (x = 1; x < X; x++)
        U3[x] = L2*U2[x] + L*(U2[x+1] + U2[x-1]) - U1[x];
    double *TMP = U3;
    // rotate usage of vectors
    U3 = U1; U1 = U2; U2 = TMP;
}

Parallelizing this code with OpenMP is straightforward, as can be seen in Listing 3. Its only complexity is that the clause firstprivate(T,U1,U2,U3) must be used to ensure that each thread does the same vector rotation using its private copies. This parallelization is specially designed to be done in a short time, leaving students plenty of opportunities to test the code and analyze its behavior.

Listing 3: Parallelized string simulation main computation loop.

#pragma omp parallel firstprivate(T, U1, U2, U3)
for (t = 1; t <= T; t++) {
    #pragma omp for
    for (x = 1; x < X; x++)
        U3[x] = L2*U2[x] + L*(U2[x+1] + U2[x-1]) - U1[x];
    double *TMP = U3;
    // rotate usage of vectors
    U3 = U1; U1 = U2; U2 = TMP;
}

Several of the strategies presented in Section 3 have been applied in the design of this lab session. First, the lab session strategy has been used to train the students in the use of the most common tools used in the lab: compilers (gcc), monitoring tools (likwid [22], perf [23]), remote access and resource management (SGE [24]). A detailed manual describing these tools with examples has been elaborated with this objective. In this case, students develop this exercise in groups of two. Second, the case study strategy has been used to work on the problem of parallelizing the string movement simulator previously described. In this case, students are provided with a very short outline of the problem, so they must explore different approaches to the solution on their own. Third, the conceptual maps strategy has been used to make students organize and summarize the concepts learned. Students must write and deliver a report describing their solution to the problem and the tests done on the application they have developed. Finally, the experimental portfolio strategy is used in order to follow the students' evolution through the set of exercises developed in the lab.

4.2. Message passing: MPI

After explaining parallelism at the multi-core level using shared memory, the next step is to introduce cluster parallelism (distributed memory) using message passing. With this objective, the subject includes two lectures (4 hours) on the Message Passing Interface (MPI) [25] and two lab sessions (4 hours with an instructor and 12 hours of autonomous development of practical exercises).

MPI is by far the most used interface for developing distributed memory parallel programs, mainly because many libraries have been implemented based on the MPI consortium specification (OpenMPI, MPICH, Intel MPI, etc.). MPI includes plenty of features, but this subject focuses on presenting the basic MPI program structure and the functions for point-to-point as well as collective communication.

The contents of the MPI lectures are structured as follows:

• Message passing paradigm. Distributed memory parallel computing, the need for a mechanism for interchanging information. Introducing MPI history.

• MPI program structure. Initializing and finalizing the environment (MPI_Init and MPI_Finalize). Communicator definition (MPI_COMM_WORLD), getting the number of processes in the application (MPI_Comm_size) and the process rank (MPI_Comm_rank). General structure of an MPI call.

• Point-to-point communication. Sending (MPI_Send) and receiving messages (MPI_Recv). Sending modes: standard, synchronous, buffered and ready send.

• Blocking and non-blocking communications. Waiting for an operation's completion (MPI_Wait and MPI_Test).

• Collective communication. Barrier, broadcast, scatter, gather and reduce operations.

• Performance considerations. Overlapping communication and computation. Measuring time (MPI_Wtime). Discussion on the communication overhead. Load balancing.
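For reference, a minimal sketch of the MPI program structure outlined above (our own illustration, not one of the course listings) could look as follows:

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    int size, rank;
    MPI_Init(&argc, &argv);                 /* set up the MPI environment */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* number of processes */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's rank */
    printf("Process %d of %d\n", rank, size);
    MPI_Finalize();                         /* shut down the environment */
    return 0;
}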

The instructor plays a central role in these lectures and, consequently, they have been structured following the corresponding strategy, i.e. theoretical lecture, as presented in Section 3. This structure starts by describing the most general and essential concepts (distributed memory model, processes and message passing). Next, it introduces the basic MPI concepts: program structure, communicators, process identifiers and the MPI function naming convention. Then, it introduces different types of communication ordered according to their conceptual complexity: point-to-point blocking communication, point-to-point non-blocking communication, and collective communication. This naturally leads to a discussion of the impact of each type of communication on application performance and programming complexity. This discussion on MPI application performance is also used to present the load balancing problem and some strategies to overcome it.
Students work on these concepts in the lab sessions by developing a simple program for computing a π approximation using the dartboard approach [26]. This approach simulates throwing darts at a dartboard on a square backing. As each dart is thrown randomly, the ratio of darts hitting the board to those landing on the square is equal to the ratio between the two areas, which is π/4.

A parallel implementation of this algorithm consists of a certain number of processes throwing a fixed number of darts and calculating their own approximation of π; then one of the processes (the master) receives all approximations and calculates the average value. In this solution, workers send their results to the master (process with rank 0) using point-to-point communication.
A second approach consists of distributing the total number of throws among all the processes; each of them calculates its own number of hits (darts in the circle) and sends it to the master process, which computes the π approximation. In this case, the master sends the number of throws that must be done by each process and receives the number of hits, always using collective communication functions.
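A minimal sketch of this second, collective approach (our own illustration; the variable names and throw count are ours) might look like this:

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    int rank, size;
    long throws_total = 10000000, throws = 0, hits = 0, hits_total = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* The master distributes the number of throws per process. */
    if (rank == 0) throws = throws_total / size;
    MPI_Bcast(&throws, 1, MPI_LONG, 0, MPI_COMM_WORLD);

    /* Each process throws its darts at the unit square and counts hits. */
    srand(rank + 1);
    for (long i = 0; i < throws; i++) {
        double x = (double)rand() / RAND_MAX;
        double y = (double)rand() / RAND_MAX;
        if (x*x + y*y <= 1.0) hits++;
    }

    /* Collective reduction: the master receives the total number of hits. */
    MPI_Reduce(&hits, &hits_total, 1, MPI_LONG, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("pi ~ %f\n", 4.0 * hits_total / (throws * size));

    MPI_Finalize();
    return 0;
}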
As in the case of OpenMP, several of the strategies presented in Section 3 have been applied in the design of these lab sessions. First, the lab session strategy has been used to train the students in the use of the MPI tools: mpicc, mpirun and mpe [27]. Also in this case, students develop this exercise in groups of two. Second, the case study strategy has been used to work on the problem of the π computation; again, students are provided with a very short outline of the problem, so they must explore different approaches to the solution on their own. Third, the conceptual maps strategy has been used because students must write and deliver a report describing their solutions to the problem and the tests done on the applications they have developed. Finally, the experimental portfolio strategy is used in order to follow the students' evolution through the set of exercises developed in the lab.

4.3. GPUs: CUDA and OpenACC

After introducing the OpenMP and MPI programming models, our objective is to teach students the principles of effectively using computational accelerators like GPUs. As we did previously in the case of multi-core systems, we start by presenting the OpenACC toolkit to the students, then provide them with a deeper view of accelerators with CUDA.
OpenACC [28] is an open specification of compiler directives for parallel programming. With the use of high-level directives, similar to OpenMP, applications can be accelerated without losing portability across processor architectures.
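To give an idea of the style, a minimal sketch of our own (not one of the course listings) offloads the vector addition of Listing 1 with a single directive; the data clauses tell the compiler what to copy to and from the accelerator:

/* Assumes a, b, c and N are declared as in Listing 1. */
#pragma acc parallel loop copyin(a[0:N], b[0:N]) copyout(c[0:N])
for (int i = 0; i < N; i++)
    c[i] = a[i] + b[i];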
CUDA is an extension of C for massively parallel programming of GPUs (or accelerators). We chose CUDA instead of OpenCL because of the existence of efficient and mature compiling, debugging and profiling tools, and because of the extensive information available. The contents of the lectures are structured as follows:

• Introduction. Hierarchy of threads: warp, CTA (Cooperating Thread Array) and grid. 3-dimensional thread identifiers.

• Model of an accelerator: host and device. Moving data between host and device. Allocating memory on the device and synchronizing the execution.

• Architectural restrictions. Warp size. Maximum CTA and grid dimensions.

• Memory space. Global, local and shared memory.

• Synchronization. Warp-level and CTA-level synchronization.

• Performance considerations. Excess of threads to tolerate the latencies of data dependencies. Increasing work per thread to improve instruction-level parallelism.

The lecture uses vector addition as an example to introduce the OpenACC and CUDA syntax. Four implementations are provided and evaluated, using: (a) one single thread, (b) one CTA, (c) a grid of CTAs where each thread performs a single addition, and (d) a grid of CTAs with more work per thread. We show the performance results (disappointing for the first implementations) to motivate the different solutions and the need for developing good performance engineering skills.
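A minimal sketch of implementation (c) above (our own illustration; the kernel and variable names are ours): a grid of CTAs where each thread performs a single addition.

__global__ void vecadd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  /* global thread id */
    if (i < n)                  /* guard: the grid may overshoot n */
        c[i] = a[i] + b[i];
}

/* Host-side launch: 256 threads per CTA, enough CTAs to cover n,
   with d_a, d_b, d_c previously allocated on the device. */
// vecadd<<<(n + 255) / 256, 256>>>(d_a, d_b, d_c, n);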
We also present Thrust [29], a high-level parallel algorithms library written in C++, to show the students the benefit of learning object-oriented programming and software engineering concepts. However, due to the limited background of our students and obvious time limitations, providing further information on Thrust usage is out of the scope of our subject.
Students must use OpenACC and CUDA in the lab sessions to parallelize the code that simulates the movement of a string. They explore, step by step, the different obstacles they must overcome to exploit the full potential of GPUs and increase performance by ≈10x with respect to the multicore CPU code.

The methodological strategies used in this part of the subject are very similar to those of the previous parts. First, lecture sessions provide a general introduction to the concepts defined above. Then, lab sessions train the students in the use of the Nvidia software development tools. Students receive the vector addition case study and must provide incremental solutions that rely on performance-oriented design decisions.

4.4. Performance analysis: tools

It is not only important to be able to develop applications using the different approaches taught throughout the subject. In general, the main goal of parallel programming is to improve application performance and, consequently, performance analysis should be introduced to students.

During the subject labs, students use basic tools, such as Intel VTune, nvprof and nvvp [30], the perf Linux command [23], jumpshot [31] and likwid [22], to visualize and analyze the behavior of their applications. These tools are enough for the simple applications developed and the small cluster used in this subject.

However, our students will likely participate in the development of real parallel applications during their professional life. Consequently, a lecture (2 hours) is used to describe the performance analysis cycle shown in Figure 3 and introduce the main tools currently available for supporting each of these steps.

Figure 3: Performance analysis in the application development cycle.

The instructor plays a central role in this lecture and, consequently, it has been structured following the theoretical lecture strategy presented in Section 3. In this case, the contents are naturally guided by the performance analysis cycle presented in Figure 3. Consequently, measurement and monitoring concepts and tools are presented first. For example, the Performance API (PAPI) [32] and Dyninst [33] are mentioned as supporting tools for getting execution measurements. Then, performance analysis approaches and tools are discussed. For example, Tuning and Analysis Utilities (TAU) [34], Scalasca [35] and Paraver [36] are presented as analysis and visualization tools. Finally, automatic/dynamic tuning concepts and tools are introduced. In this case, the Periscope Tuning Framework (PTF) [37], MATE [38] and Elastic [39] are presented as automatic analysis and tuning tools.

4.5. Learning outcomes

Our objective in this subject, taking into consideration that the students of this Master's program come from different fields, is that each student achieves the following learning outcomes:

• Analyze, synthesize, organize and plan projects with content related to parallel programming in the student's field of study. We expect that our students will be able to integrate the knowledge acquired in this subject into their current and future projects.

• Apply specific methodologies, techniques and resources to conduct research and produce innovative results in the area of specialization. Using parallelism could allow our students to obtain better results (better precision, more results, faster results), which could help to achieve new innovative objectives in their specific projects.

• Continue the learning process, to a large extent autonomously. In order to achieve this outcome, students have been "forced" in this subject to tackle practical problems autonomously, which includes thinking about the problem's solution, but also consulting bibliographical sources looking for the proper functionality or tool to implement it.

• Identify sources of parallelism in a computational problem. The theoretical foundations of the process of designing a parallel solution to a given problem are presented at the beginning of this subject. Then, these foundations are taken into consideration when discussing each subject topic and subsequently applied in the lab sessions.

• Design and develop parallel solutions to a computational problem taking the characteristics of the available hardware into account. To achieve this outcome, the subject includes techniques and tools to implement parallel applications on multi-core, cluster and accelerator architectures.

• Interpret information from performance-analysis tools and be able to consider application-specific design decisions to improve performance. Students use appropriate tools to analyze the performance of an application and are asked to include the results and impact of their analysis in the lab reports.

Finally, it is worth mentioning that the Computer Architecture and Operating Systems department of Universitat Autònoma de Barcelona has received support from computing industry leaders for the design and development of the computation labs. We have been appointed by Intel as an Academic Partner, with the Intel Parallel Studio used as one of the programming environments for the practical laboratories, and we have also been selected as a GPU Teaching Center by Nvidia Corporation for introducing CUDA, OpenACC and GPU technology into computer architecture studies.

5. Applied Modeling and Simulation

The main goal of the Applied Modeling and Simulation subject is to introduce the students to real applications that use modeling and simulation and that must apply parallel programming techniques to improve their performance. It is highly significant to show the students how High Performance Computing is necessary to make these real applications practical.

The main concepts of this subject are developed in two different parts, each with a different methodology:

1. Case studies in collaboration with industry and research laboratories that use modeling and simulation activities every day.

2. Simulation model development and performance analysis.

In the following subsections, we present detailed descriptions of these two parts of the subject.

5.1. Case studies
The first part, Case studies, is conducted in collaboration with industry and research laboratories that use modeling and simulation activities every day. The activities carried out include invited lectures from researchers that work in these laboratories and use modeling to carry out their work.

The first case considered is the paradigmatic example of meteorological services. Everybody watches the weather forecast on TV every day and can imagine the complexity of the models involved, with huge meshes of points with hundreds of variables estimated for every point, and the computing requirements needed to provide a real prediction. However, in this particular case, it is known that weather prediction models show chaotic behavior. The way to keep this behavior as limited as possible is to execute not just a single simulation, but a complete set of scenarios (called an ensemble) and apply statistical methods to form the final prediction. This meteorological modeling and prediction part is presented by members of the Servei Meteorològic de Catalunya (Meteorological Service of Catalonia). Obviously, it is outside the scope of the subject to develop a meteorological model, but the students can use some small specific models, such as wind field models (WindNinja [40]), to analyze their execution time, scalability and speedup. In this context, some students (one or two per year) may enroll in an internship in this meteorological service, developing code for some particular model or applying parallel programming techniques to some of the existing models.

In a similar way, a collaboration has been established with the IC3-BSC (Institut Català de Ciències del Clima - Barcelona Supercomputing Center), but, in this case, the models and predictions are related to climatological models involving very large time scales. In this case, the real-time aspect is not so critical, since the predictions are considered for decades or even centuries. However, the main point is to run hundreds or thousands of simulations with different parameters, which makes the total amount of computational requirements extremely high. Also in this case, some students carry out an internship in this center, where they have access to very large computing resources and can do studies on speedup and scalability.

5.2. Simulation model development and its performance analysis

In the second part, the students develop a certain simulation model and analyze its performance. In this case, the teaching strategy is based on three well-defined parts: lecture sessions (including conceptual maps), lab sessions and, finally, an experimental portfolio. In the last two parts, students develop a project and carry out lab sessions with teacher supervision. At the beginning, the teacher presents the concepts of a particular modeling technique, agent-based modeling (ABM), and a conceptual map to show the hierarchy of the concepts to be developed. This type of model is used to model real systems from different areas of knowledge that are close to the initial knowledge of the students. ABM can represent complex patterns of behavior through simple rules and provide useful information about the system dynamics of the real world. In addition, it is a kind of simulation that needs high computing power when the number of individuals increases, which is suitable for the objectives pursued in the area of HPC. As a case study, a model of emergency evacuation using ABM is analyzed [41] and the students must perform some practical exercises to extend the model and analyze its performance in lab sessions. There are different aspects for model analysis: the environment and the information (doors and exit signals), policies and procedures for evacuation, and the social characteristics of individuals that affect the response during the evacuation. Moreover, the following hypotheses are defined as a starting point for the model:

• In emergency evacuation situations, people are generally nervous or even panicking, so they tend to act irrationally.

• Individuals try to move as quickly as possible (faster than normal).

• Individuals try to achieve their objectives and may try to push each other in their attempt to exit through a specific door, causing physical injury to other individuals.

Figure 4: Agent-Based Modeling for emergency evacuations.

Students receive a partial model that includes the management of the evacuation of an enclosed area that presents a certain building structure (walls, access points, etc.) and obstacles, with particular signaling and the corresponding safe zones and exits. The model also includes individuals who should be evacuated to safe areas. This model has been developed to support different parameters such as: individuals of different ages, total number of people in the area, number of exits, number of chained signals and safe areas, speed of each individual, and probability of exchanging information with other individuals. The model is implemented in NetLogo [42] and Figure 4 represents its main characteristics.
The first practical work for the students requires using a single-core architecture in the lab to analyze the performance of the model and then incorporate a new policy not yet covered: overcrowding in exit zones [43]. Students must then complete a new performance analysis of the new model.
Considering the variability of each individual in the model, a stability analysis is required. For this, the Chebyshev Theorem (also spelled Tchebycheff) is used with a confidence interval of 95% and α = 0.05, m = 6. The result of this analysis indicates that at least 720 simulations must be done to obtain statistically reliable data. Taking into account the 720 executions on a one-core processor, the average simulation time is 7.34 hours for 1,000 individuals and 27.44 hours for 1,500 individuals per scenario. In order to use this tool as a Decision Support System (DSS), the students are instructed in the necessary HPC techniques and the embarrassingly parallel computing model is presented as a method to reduce the execution time and the decision-making process time [44][45]. Therefore, students must learn how to execute multiple parametric NetLogo model runs on a multi-core system and how to perform a performance analysis to evaluate the efficiency and scalability of the method.
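As a brief reminder of the bound behind this stability analysis (our own note; the specific run count of 720 follows from the authors' parameters α and m): for any output distribution with mean $\mu$ and standard deviation $\sigma$, Chebyshev's inequality states

$$P\left(|X - \mu| \geq k\sigma\right) \leq \frac{1}{k^2},$$

which lets one bound the number of replications required for a given confidence level without assuming normality of the simulation output.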
Finally, the instructor offers a set of tasks based on the developed model (experimental portfolio), proposing new challenges and additional specifications so the students can assess their knowledge. These sessions are performed autonomously by the students in the laboratory, but the work is evaluated and represents a percentage of the grade of the module.

5.3. Learning outcomes

The objective of this subject is that the students achieve the following learn-
ing outcomes:

• Describe the different components of a system and the interactions between


710 them. Students must analyze a complex system and identify its main
components.

• Identify the parameters that determine how a system works. Students


must determine the main parameters of the complex system and analyze
their effect on the system behavior.

28
715 • Implement appropriate numerical methods to solve models in the particular
field under consideration. The mathematical models involved in complex
systems usually must be solved by numerical methods. Students must
implement different numerical methods efficiently and integrate them into
simulation tools.

720 • Simulate the behavior of complex systems. Students must use the simula-
tion tools to study the behavior of the system.

• Validate the simulation results with the predictions of the models and the behavior of the real system. Students must compare the results obtained from the simulation with the real behavior of the system to validate the correctness of the simulation tools.
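As a minimal illustration of the numerical-methods and validation outcomes (a sketch, not part of the course material), the following Python fragment integrates a logistic growth model with the explicit Euler method and validates the result against the closed-form solution; the parameter values are arbitrary.

# Sketch: explicit Euler integration of dP/dt = r*P*(1 - P/K),
# validated against the exact logistic solution.
import math

def euler_logistic(P0, r, K, dt, steps):
    P = P0
    for _ in range(steps):
        P += dt * r * P * (1.0 - P / K)  # explicit Euler update
    return P

def exact_logistic(P0, r, K, t):
    return K / (1.0 + (K / P0 - 1.0) * math.exp(-r * t))

dt, T = 0.01, 10.0
approx = euler_logistic(P0=10.0, r=0.5, K=1000.0, dt=dt, steps=int(T / dt))
exact = exact_logistic(P0=10.0, r=0.5, K=1000.0, t=T)
print(f"Euler: {approx:.2f}  exact: {exact:.2f}  error: {abs(approx - exact):.4f}")

Halving dt and checking that the error roughly halves is a simple way to confirm the first-order convergence of the explicit Euler method.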

6. Academic results

The global result is very positive: after 5 years of teaching in this interdisciplinary Master's, we have observed that students achieve a much wider range of knowledge, which allows them to tackle problems from disciplines other than their original background. Indeed, most of them get jobs in companies where they apply the concepts of computational thinking and performance engineering to improve the applications developed at those companies, in areas such as traffic simulation, geophysical simulation, water pollution simulation and spread of disease simulation, among many others, independently of their background.

Currently, the number of qualified candidates exceeds the number of places offered, and academic results are very high. The success rate (calculated as the number of students that pass the subject divided by the number of assessed students) and the performance rate (calculated as the number of students that pass the subject divided by the number of enrolled students) for both of the subjects presented in this paper are very high, up to 100%. The detailed results of the whole Master's can be seen on the Master's official web page [46].
Quantitative results are relevant to assess the success of the presented subjects, but we consider that the feedback given by students and companies has been very important for building this success. Each semester, students answer a survey about the subjects studied. This survey includes subjective questions such as "Tell us what you have liked the most about this course", "Do you have any suggestions for improvement?" or "Do you consider the workload of this course to have been adequate?". In the latest editions, students have expressed that these subjects are extremely useful and are a great complement to other subjects of the Master's programme (especially those related to Big Data). Most of them have considered that the workload is adequate, and some have suggested that it is possible to go deeper into certain contents. Overall, they have rated the 2016 edition of these subjects 92/100.
Finally, it is important to mention that many companies collaborate in both subjects. They provide models that the students can use in the lab sessions and they deliver invited talks. These talks are very fruitful for students and enrich their knowledge: students see a real application and an actual usage of the theoretical bases taught in the subject lectures.

7. Conclusions

Many fields of science and engineering are evolving through the contribution of complementary fields. This implies that project teams in companies and research centers have significant interdisciplinary components. It is necessary for people from different fields to be able to establish a common ground and to understand the requirements and effects of the problems and solutions for all members of the team. In this sense, it is worth mentioning the performance effects of application design decisions, which often invalidate otherwise valid ideas.
High Performance Computing (HPC), including parallel and distributed programming, has become a central factor in many fields of science and engineering. It is therefore necessary for students from various fields to receive significant training in HPC. In this way, they will be able to design and develop their own applications and, even more importantly, they will understand the decisions needed to get the most from a given computational platform, such as which programming paradigm is the most suitable, which performance metrics are the most relevant, and how to measure them. Thus, they can establish a common language with computer scientists and work together on the development of more powerful and successful applications.
In this interdisciplinary context, we have shown our experience of teaching parallel programming in interdisciplinary studies at the graduate level. We have presented the methodological background, with the main principles and activities applied to the development of the subjects. We have used a balanced perspective in which teachers first use direct methodologies to establish the background of the subject; then, students are progressively presented with more interactive, practical activities and have to understand the problems and propose their own solutions.

The main conclusion is that the experience has been very successful: most students enjoy developing parallel programs, analyzing their behavior and trying to improve their performance. After this experience, it would be very interesting to introduce similar subjects at the undergraduate level, so that students from different fields are able to apply High Performance Computing techniques to their computational problems from the very beginning.

References

[1] G. C. Fox, Parallel computing and education, Daedalus 121 (1) (1992) 111–118.

[2] G. M. Schneider, D. Schwalbe, T. M. Halverson, Teaching computational science in a liberal arts environment, SIGCSE Bulletin 30 (2) (1998) 57–60.

[3] L. Carter, R. Botts, C. Crockett, Computational science programs: The background research, in: 2012 Frontiers in Education Conference Proceedings, 2012, pp. 1–6.
[4] A. Marowka, Think parallel: Teaching parallel programming today, IEEE Distributed Systems Online 9 (8) (2008) 1–8.

[5] G. Lammers, C. Brown, Work in progress - extending parallelism education to the first year with a bottom-up approach, in: 2011 Frontiers in Education Conference (FIE), 2011, pp. 1–2.

[6] H. Neeman, L. Lee, J. Mullen, G. Newman, Analogies for teaching parallel computing to inexperienced programmers, SIGCSE Bull. 38 (4) (2006) 64–67.

[7] Mathematical Modelling and Scientific Computing MSc, https://www.ucc.ie/en/ckr36/, [Online; accessed 21-Sept-2016].

[8] Computational Science and Engineering, https://www.seas.harvard.edu/programs/graduate/computational-science-and-engineering/, [Online; accessed 21-Sept-2016].

[9] Modelling and Computational Science MSc, http://catalog.uoit.ca/, [Online; accessed 21-Sept-2016].

[10] Master Programme in Computational Science, http://www.uu.se/en/admissions/master/masterprogrammes/, [Online; accessed 21-Sept-2016].

[11] Master of Engineering - Modeling and Simulation, http://catalog.odu.edu/graduate/frankbattencollegeofengineeringandtechnology/modelingsimulationvisualizationengineering/, [Online; accessed 21-Sept-2016].

[12] Interdisciplinary Program in Computational Science, Engineering & Math, https://www.ices.utexas.edu/graduate-studies/, [Online; accessed 21-Sept-2016].

[13] EPFL's Master in Computational Science & Engineering, http://cse.epfl.ch/, [Online; accessed 21-Sept-2016].
[14] Computational and Mathematical Engineering MS Degree, http://scpd.stanford.edu/programs/masters-degrees, [Online; accessed 21-Sept-2016].

[15] Master of Science in Analytics and Modeling, http://www.valpo.edu/grad/compsci/, [Online; accessed 21-Sept-2016].

[16] P. Hernández, Construyendo el constructivismo. Criterios para su fundamentación y su aplicación instruccional, Vol. 1, Paidós, 1997.

[17] J. P. Lalley, R. H. Miller, The learning pyramid: does it point teachers in the right direction?, Education 128 (1) (2007) 64.

[18] Tuning Project. Tuning Educational Structures in Europe, http://www.unideusto.org/tuningeu/images/stories/documents/General_Brochure_final_version.pdf, [Online; accessed April 2016].

[19] White Book. Degree in Computer Engineering. ANECA, http://www.aneca.es/media/150388/libroblanco_jun05_informatica.pdf, [Online; accessed April 2016].

[20] M. McCool, J. Reinders, A. Robison, Structured Parallel Programming: Patterns for Efficient Computation, 1st Edition, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2012.

[21] OpenMP, http://openmp.org/, [Online; accessed 18-May-2015].

[22] LIKWID: Lightweight performance tools, https://code.google.com/p/likwid/, [Online; accessed 18-May-2015].

[23] perf: Linux profiling, https://perf.wiki.kernel.org/index.php/Main_Page, [Online; accessed 18-May-2015].

[24] Sun Grid Engine (SGE) QuickStart, http://star.mit.edu/cluster/docs/0.92rc2/guides/sge.html, [Online; accessed 02-June-2016].
[25] Message Passing Interface Forum, http://www.mpi-forum.org/, [Online; accessed 18-May-2015].

[26] Parallel Programming in C, http://gribblelab.org/CBootcamp/A2_Parallel_Programming_in_C.html, [Online; accessed 20-Sept-2016].

[27] MPI Parallel Environment (MPE), http://www.mcs.anl.gov/research/projects/perfvis/software/MPE/, [Online; accessed 02-June-2016].

[28] CAPS Enterprise, Cray Inc., NVIDIA, The Portland Group, The OpenACC Application Programming Interface.

[29] N. Bell, J. Hoberock, Thrust: a productivity-oriented library for CUDA, in: GPU Computing Gems: Jade Edition, 2011.

[30] CUDA Visual Profiler, http://docs.nvidia.com/cuda/profiler-users-guide/index.html#visual-profiler, [Online; accessed 18-May-2015].

[31] Performance Visualization, http://www.mcs.anl.gov/research/projects/perfvis/software/viewers/, [Online; accessed 18-May-2015].

[32] Performance API (PAPI), http://icl.cs.utk.edu/papi/, [Online; accessed 18-May-2015].

[33] Dyninst API, http://www.dyninst.org/, [Online; accessed 18-May-2015].

[34] S. Shende, A. D. Malony, The Tau Parallel Performance System, IJHPCA 20 (2) (2006) 287–311.

[35] F. Wolf, Scalasca, in: Encyclopedia of Parallel Computing, 2011, pp. 1775–1785.

[36] Paraver, http://www.bsc.es/computer-sciences/performance-tools/paraver, [Online; accessed 18-May-2015].
[37] R. Miceli, G. Civario, A. Sikora, E. César, M. Gerndt, H. Haitof, C. B. Navarrete, S. Benkner, M. Sandrieser, L. Morin, F. Bodin, AutoTune: A Plugin-Driven Approach to the Automatic Tuning of Parallel Applications, in: Applied Parallel and Scientific Computing - 11th International Conference, PARA 2012, Helsinki, Finland, June 10-13, 2012, Revised Selected Papers, 2012, pp. 328–342.

[38] A. Morajko, O. Morajko, T. Margalef, E. Luque, MATE: dynamic performance tuning environment, in: Euro-Par 2004 Parallel Processing, 10th International Euro-Par Conference, Pisa, Italy, August 31-September 3, 2004, Proceedings, 2004, pp. 98–106.

[39] A. Martínez, A. Sikora, E. César, J. Sorribes, ELASTIC: A large scale dynamic tuning environment, Scientific Programming 22 (4) (2014) 261–271.

[40] J. Forthofer, K. Shannon, B. W. Butler, Initialization of high resolution surface wind simulations using NWS gridded data, in: Proceedings of 3rd Fire Behavior and Fuels Conference, 25-29 October, 2010.

[41] D. Helbing, L. Buzna, A. Johansson, T. Werner, Self-organized pedestrian crowd dynamics: Experiments, simulations, and design solutions, Transportation Science 39 (1) (2005) 1–24. doi:10.1287/trsc.1040.0108.

[42] U. Wilensky, NetLogo, Center for Connected Learning and Computer-Based Modeling, Northwestern University, Evanston, IL, 1999. https://ccl.northwestern.edu/netlogo/index.shtml, [Online; accessed 18-May-2015].

[43] A. Gutierrez-Milla, F. Borges, R. Suppi, E. Luque, Individual-oriented model crowd evacuations distributed simulation, in: Proceedings of the International Conference on Computational Science, ICCS 2014, Cairns, Queensland, Australia, 10-12 June, 2014, pp. 1600–1609. doi:10.1016/j.procs.2014.05.145.

[44] I. T. Foster, Designing and Building Parallel Programs - Concepts and Tools for Parallel Software Engineering, Addison-Wesley, 1995.

[45] A. Gutierrez-Milla, F. Borges, R. Suppi, E. Luque, Crowd dynamics modeling and collision avoidance with OpenMP, in: Proceedings of the 2015 Winter Simulation Conference, Huntington Beach, CA, USA, December 6-9, 2015, pp. 3128–3129. doi:10.1109/WSC.2015.7408433.

[46] Official Master's Degree in Modelling for Science and Engineering, http://www.uab.cat/web/studying/official-master-s-degrees/master-s-degree-in-figures-1334300576994.html?param1=1307112830469, [Online; accessed 3-June-2016].
Author Biographies

Eduardo César received the BS degree in computer science in 1992 from Universidad Simón Bolívar. He received the MSc in computer science in 1994 and the PhD in computer science in 2006, both from Universitat Autònoma de Barcelona. Since 1998 his research has been related to parallel and distributed computing. He is currently involved in Spanish national projects and the European project AutoTune. His main interests are focused on parallel applications and automatic performance analysis and tuning. He has been involved in the definition of performance models for automatic and dynamic performance tuning on cluster environments.

Ana Cortés received her first degree and her PhD in Computer Science from the Universitat Autònoma de Barcelona (UAB), Spain, in 1990 and 2000, respectively. She is currently an associate professor of Computer Science at the UAB, where she is a member of the High Performance Computing Applications for Science and Engineering group of the Computer Architecture and Operating Systems Department. Her current research interests concern the performance engineering of high performance computing applications for the environmental sciences.

Antonio Espinosa received the BSc degree in computer science in 1994 and the PhD degree in computer science in 2000. He is a postdoctoral researcher in the Computer Architecture and Operating Systems Department at the Universitat Autònoma de Barcelona. During the last 10 years, he has participated in several European and national projects related to bioinformatics and high-performance computing, in collaboration with a number of biotechnology companies and research institutions.

Tomàs Margalef received a BS degree in physics in 1988 from Universitat Autònoma de Barcelona (UAB). He obtained the MSc in Computer Science in 1990 and the PhD in Computer Science in 1993, both from UAB. Since 1988 he has been working on several aspects of parallel and distributed computing. Currently, his research interests focus on the development of high performance applications, automatic performance analysis and dynamic performance tuning. Since 1997 he has been working on exploiting parallel/distributed processing to accelerate and improve the prediction of forest fire propagation. Since 2007 he has been a Full Professor at the Computer Architecture and Operating Systems Department. He is an ACM member.

Juan C. Moure received his BSc degree in computer science and his PhD degree in computer architecture from Universitat Autònoma de Barcelona (UAB). Since 2008 he has been an associate professor with the Computer Architecture and Operating Systems Department at the UAB, where he teaches computer architecture and parallel programming. He has participated in several European and Spanish projects related to high-performance computing. His current research interest focuses on the usage of massively parallel architectures and the application of performance engineering techniques to open research problems in bioinformatics, signal processing, and computer vision. He is a reviewer for various journals and symposia and has authored numerous papers in journals and conferences.

Anna Sikora received the BS degree in computer science in 1999 from the Technical University of Wrocław (Poland). She received the MSc in computer science in 2001 and the PhD in computer science in 2004, both from Universitat Autònoma de Barcelona (Spain). Since 1999 her research has been related to parallel and distributed computing. She is currently involved in Spanish national projects and she participated in the European project AutoTune. Her main interests are focused on high performance parallel applications, automatic performance analysis and dynamic tuning. She has been involved in programming tools for automatic and dynamic performance tuning on cluster and Grid environments.

Remo Suppi has been an associate professor since 1997 in the Department of Computer Architecture and Operating Systems and is a member of the High Performance Computing for Efficient Applications and Simulation research group (HPC4EAS) at Universitat Autònoma de Barcelona (UAB). He received his PhD in Computer Science (1996) from UAB and his diploma in Electronic Engineering from the Universidad Nacional de La Plata (Argentina). His current research interests include computer simulation, distributed systems, and high performance and distributed simulation applied to agent-based (ABM) or individual-oriented models. Dr. Suppi regularly publishes in journals, conference proceedings, and books, and is an associate researcher at the Informatics Research Institute LIDI (Argentina).