Cesar 2017
PII: S0743-7315(17)30005-9
DOI: http://dx.doi.org/10.1016/j.jpdc.2016.12.027
Reference: YJPDC 3600
Please cite this article as: E. Cesar, A. Cortés, A. Espinosa, T. Margalef, J.C. Moure, A.
Sikora, R. Suppi, Introducing computational thinking, parallel programming and performance
engineering in interdisciplinary studies, J. Parallel Distrib. Comput. (2017),
http://dx.doi.org/10.1016/j.jpdc.2016.12.027
Highlights
• A Master's in Modeling for Science and Engineering was started 5 years ago.
• Parallel Programming and Applied Modeling and Simulation are subjects included in the Master's program.
Introducing computational thinking, parallel programming and performance engineering in interdisciplinary studies

Eduardo Cesar, Ana Cortés, Antonio Espinosa, Tomàs Margalef∗, Juan Carlos Moure, Anna Sikora, Remo Suppi
Computer Architecture and Operating Systems Department,
Universitat Autònoma de Barcelona. 08193 Cerdanyola del Vallès.
Spain
Abstract
Nowadays, many fields of science and engineering are evolving through the joint contribution of complementary fields. Computer science, and especially High Performance Computing, has become a key factor in the development of many research fields, establishing a new paradigm called computational science. Researchers and professionals from many different fields require knowledge of High Performance Computing, including parallel programming, to develop fruitful and efficient work in their particular field. Therefore, at Universitat Autònoma de Barcelona (Spain), an interdisciplinary Master's in "Modeling for Science and Engineering" was started 5 years ago to provide graduate students from different fields (Mathematics, Physics, Chemistry, Engineering, Geology, etc.) with a thorough knowledge of the application of modeling and simulation. In this Master's degree, "Parallel Programming" appears as a compulsory subject because it is a key topic for these students. The concepts learned in this subject must be applied to real applications. Therefore, a complementary subject on "Applied Modeling and Simulation" is also included in the program.
I This work has been partially supported by MINECO-Spain under contracts TIN2014-
1. Introduction
Many fields of science and engineering are applying techniques and recent advances from complementary fields. In this interdisciplinary context, researchers and professionals with greater knowledge of problem modeling and High Performance Computing (HPC) are in high demand from companies and research centers. Since 2011, Universitat Autònoma de Barcelona has hosted a Master's degree in Modeling for Science and Engineering to provide these kinds of professionals to those companies and centers.
The Master's involves an interdisciplinary collaboration among professors from various departments, mainly Physics, Mathematics, and Computer Architecture and Operating Systems. The main objective is to provide science graduates with mathematical and computational tools to treat different types of scientific and/or technological problems. It covers a large range of problems, introducing many different approaches and tools. In particular, the students are provided with the basic knowledge to be able to model a physical system involved in a problem, to represent the model mathematically, and to solve the problem applying different methods such as partial differential equations, optimization, time series, and related methods. The students learn how to analyze their particular problems and think about them from a computational perspective, i.e., formulating a problem and expressing its solution(s) in a way that a computer can effectively carry out. Finally, they have to think about parallel solutions and learn HPC technology to evaluate and improve the performance of models, applications and simulators. In order to include all these different issues in the Master's, we define three training pillars:
Figure 1: Evolution of the number of students and their background degrees
students, but it has been growing continuously and now has more than 30 registered students. Figure 1 shows a clear indication of the high impact of the studies, perceived from the point of view of the students enrolled every year.
Among the students, the most common backgrounds are mathematics and physics, but other backgrounds, such as chemistry, life sciences (biology, biochemistry, biotechnology), mechanical engineering or computer engineering, are increasing in number. This heterogeneity in the students' backgrounds implies diverse prior knowledge and experience in programming: some of them have some experience with FORTRAN, others have a light knowledge of C, or have some knowledge of programming in Python or other languages. This heterogeneity in the students' programming background introduces a crucial point in the teaching objective, since it is necessary to establish a common foundational language. Then, we can teach them parallel programming and high performance computing as resources to be applied to different fields.
The main goal is to combine three aspects:
ally do not show any significant skills in computational thinking or performance engineering. So, we plan to show them a complete view, from the problem definition to the performance analysis and tuning.
Then, Section 5 presents some examples of case study applications that are introduced to the students along with proposals for further developments. Section 6 summarizes some global academic results. Finally, Section 7 presents the main conclusions of this teaching experience.
Already in the 90's, several scientists [1, 2] realized the need for training computational scientists due to, among other reasons, the dramatic effects expected from the development of parallel computing on computers' performance and capabilities. This training should be focused on providing computational skills for solving complex problems to professionals from different areas (Chemistry, Physics, Mathematics, and also Computer Science). After 20 years, parallel computing is having the expected effects, and training computational scientists is still a relevant discussion issue [3].
Parallel programming is significantly more complex than sequential programming and, consequently, teaching it is a challenge, especially in the case of students with little computer science background. For this reason, general proposals for introducing parallel thinking and programming, such as [4] and [5], are still presented and discussed in education forums. The contents presented in these proposals include lessons about the main elements related to parallel systems and parallel programming. Moreover, there are also works, such as [6], proposing strategies for simplifying the understanding of parallel computing concepts by non-computer scientists.
Based on this background, and taking into consideration industry and research requirements, many universities have implemented postgraduate programs for training computational scientists. Most of these programs [7, 8, 9, 10, 11, 12, 13] explicitly include subjects on parallel thinking and programming, even though there are programs which do not include related contents explicitly [14, 15].
Generally, these are two-year programs that offer one single subject dedicated to parallel programming. This subject usually introduces the main characteristics of parallel architectures and heavily relies on practical exercises, since general computational concepts have been introduced in other subjects. In addition, in most cases the course is focused only on one programming paradigm (shared memory [10] or message passing [11]) or on a high-level language [7].
There are many similarities between these postgraduate programs and our proposal. For example, all the proposals are based on parallel teaching foundations and give great importance to practical training.
There are, though, some significant differences. First, it is worth mentioning that we offer two subjects related to parallel programming and computational thinking in our Master's program, even though it is a one-year programme. One of the subjects (Parallel Programming) is compulsory, so everybody gets the foundations, and the other (Applied Modeling and Simulation) is complementary, for the students interested in gaining deeper knowledge. Second, the Parallel Programming subject covers all current parallel programming paradigms, i.e., shared memory (OpenMP and Massively Parallel Processors (GPUs)) and message passing (MPI). We think that this approach gives students a wider view of parallel programming on currently available architectures, although it may sacrifice some degree of detail. We are also providing innovative content, introducing recent parallel programming extensions like Cilk Plus and new industrial standards like OpenACC, aimed at simplifying parallel software engineering.
Concerning Modeling and Simulation topics, there are proposals on the subject in the above programs, but they are mostly focused on specific areas. For example, in many of them there are courses on modeling and simulation of nonlinear systems, mathematical & numerical modeling and simulation (including simulation on HPC architectures and commercial software) or in specific fields of knowledge (e.g. Biological Systems with Differential Equations, Fluids and Soft Matter, Combat Modeling, Simulation Modeling in Transportation Networks, etc.). We consider that our modeling and simulation training on high performance computers using ABM allows students to analyze the potential of High Performance Simulation on complex models that are close to their area of expertise.
Figure 2: Two-dimensional teaching methodologies
• Lecture session: the teacher presents the most essential topics of the subject to provide a broad, common knowledge background for all of the students.
• Bibliography review: a list of documents is given to the students, so that they have a basic corpus of documents associated with the subject. The list contains a reduced number of books and articles that the students must know and use throughout the subject. An extra reference document list is also provided, allowing the students to get more insight into specific interests.
• Invited conferences: special lectures given by invited speakers, where discussion is open on relevant subject topics. The objective is to foster interactions between the students and professional experts.
The rest of the paper describes the objectives, principles and detailed methodology used in two specific subjects of the Master's focused on computational thinking and performance engineering, namely Parallel Programming and Applied Modeling and Simulation. These subjects, which are focused on High Performance Computing, provide the basic concepts to introduce the students to computational thinking, solving a given problem in a parallel and efficient way, and learning to apply the principles of performance engineering to scientific or industrial applications.
Taking into account the chosen teaching methodology, the Parallel Programming subject starts by providing a general programming background in the C language. This is done through the use of introductory lectures and programming labs. Then, we present the basic theoretical concepts of parallel programming by combining lectures on computer architecture with a selected bibliography review. Next, students must survey a list of relevant parallel algorithms with some general introductory lectures and are given an experimental portfolio to analyze matrix multiplication in practice. From there, students must analyze a selected list of case studies of parallel computational patterns like map, reduce and stencil.
In this part of the subject, they have to apply the computational thinking concepts to an experimental portfolio with examples like parallel prefix and convex hull. Finally, they receive a conceptual map of programming paradigms: shared memory, message passing and accelerator-oriented massively-parallel programming. These lectures are complemented with several lab sessions where students use performance analysis tools to develop a full performance engineering cycle for example applications.
The Applied Modeling and Simulation subject adopts a similar methodological approach. First, the students attend a short series of lectures on the development of a simulation model. Then, they are provided with the explanation of several case studies, such as emergency evacuation and meteorological services, where they have to compare their own designs with already existing solutions. Finally, the students must apply performance engineering principles in several lab sessions addressed to analyzing the performance of the simulation process.
In the next sections, we provide more detailed descriptions of the particular objectives, the contents and how the planned activities are put into practice for the two subjects, Parallel Programming and Applied Modeling and Simulation. Finally, we provide some conclusions obtained from the implementation of these subjects over the last few years.
this initial training, in part due to their high interest and their previous programming experience. Our previous experience has shown us that devoting some time to setting this basic C knowledge is a hard requirement before introducing shared memory or message passing programming.
Once the students have learned the C programming principles, it is necessary to introduce them to the basic concepts of parallel programming. The first point to present is the general idea of parallelism itself and how HPC computing platforms are designed. So, a general introduction to parallel and distributed systems, multi-core processors, memory hierarchy and accelerators is presented to the students. These objectives present a challenge because it is necessary to provide the students with useful, real architecture concepts while avoiding excessively deep details that are complex to relate to programming issues and may become a threat to the assimilation of the relevant knowledge. For this reason, we provide a gentle, summarized introduction with selected further readings for those students particularly interested in the architectural aspects.
The following point in the subject is an introduction to parallel algorithms. The computational aspects of parallel algorithm design must be introduced to the students, showing them the current paradigms and related tools. We provide details on several parallel algorithms for different computational problems. The first problem considered is matrix multiplication, which most of them know very well and have already programmed sequentially. We start by showing them how the problem is inherently parallel. Several parallel matrix multiplication algorithms are shown and analyzed considering different aspects such as computational complexity, communication requirements, data structure layout and size, and memory requirements. These different algorithms are analyzed considering the previously mentioned architectural aspects, showing the implications of computing capabilities, communication network and memory limitations.
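To make this kind of analysis concrete, a sketch like the following (our illustration, not the actual course material) contrasts the classic triple loop with a loop ordering that improves memory locality, one of the architectural aspects discussed above:

```c
#define N 64

/* Naive matrix multiplication: C = A * B.
 * The two outer loops are independent (map pattern), so they are the
 * natural candidates for parallelization. */
void matmul_ijk(const double *A, const double *B, double *C) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            double sum = 0.0;            /* the inner loop is a reduction */
            for (int k = 0; k < N; k++)
                sum += A[i * N + k] * B[k * N + j];
            C[i * N + j] = sum;
        }
}

/* Loop order i-k-j: B is traversed row-wise, improving cache locality
 * without changing the result. */
void matmul_ikj(const double *A, const double *B, double *C) {
    for (int i = 0; i < N * N; i++) C[i] = 0.0;
    for (int i = 0; i < N; i++)
        for (int k = 0; k < N; k++) {
            double a = A[i * N + k];
            for (int j = 0; j < N; j++)
                C[i * N + j] += a * B[k * N + j];
        }
}
```

Both versions perform the same arithmetic in the same order per element; only their memory access pattern differs, which is exactly the kind of architectural implication the students are asked to reason about.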
Throughout the subject, we identify several important parallel computation patterns [20], which are used in many examples. The map pattern is exemplified by the vector addition algorithm (and the outer loops of matrix multiplication). It is an appropriate pattern to introduce parallelism, as it does not involve any dependence or communication among threads. The reduce pattern is studied in the inner loop of matrix multiplication; we use it to introduce the problem of synchronization and the idea of re-associating arithmetic operations to increase parallelism. The stencil pattern is used to simulate the movement of a string, and requires synchronization, sharing, and communication of boundary data.
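The core of such a string-movement stencil might look like the sketch below (a hypothetical reconstruction, not the actual lab code; NPOINTS and the constant C2, standing for (c*dt/dx)^2, are our assumptions):

```c
#define NPOINTS 101   /* discretization points of the string (assumed) */
#define C2 0.1        /* (c*dt/dx)^2, assumed small enough for stability */

/* One time step of the 1-D wave equation: each interior point is
 * updated from its two neighbors (stencil pattern).  In a parallel
 * version, each thread owns a chunk of the string and must exchange
 * its boundary points with its neighbors before every step. */
void step(const double *prev, const double *cur, double *next) {
    next[0] = next[NPOINTS - 1] = 0.0;   /* fixed string ends */
    for (int i = 1; i < NPOINTS - 1; i++)
        next[i] = 2.0 * cur[i] - prev[i]
                + C2 * (cur[i - 1] - 2.0 * cur[i] + cur[i + 1]);
}
```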
Two additional parallel computation patterns are studied by means of the exercises proposed to the students. The parallel prefix algorithm (scan pattern) and the convex hull problem (divide-and-conquer or recursive pattern) are proposed so that students can analyze the problem and find the sources of potential parallelism in the algorithm. The students compare their proposals considering aspects such as algorithm complexity, memory and communication requirements.
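A sequential baseline for the scan pattern, which the students then have to parallelize (for example with the classic up-sweep/down-sweep scheme), can be as simple as the following sketch (our assumption, not the exact exercise handed out):

```c
/* Inclusive prefix sum: out[i] = in[0] + ... + in[i].
 * Sequentially this is a trivial loop with a carried dependence on
 * 'acc'; parallel scan algorithms break that dependence by combining
 * partial sums in O(log n) steps. */
void prefix_sum(const int *in, int *out, int n) {
    int acc = 0;
    for (int i = 0; i < n; i++) {
        acc += in[i];
        out[i] = acc;
    }
}
```

The source of parallelism the students must discover is precisely that the loop-carried dependence can be re-associated, at the cost of extra additions.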
Once the basic concepts of programming and parallelism have been presented to the students, it is feasible to enter the core part of the Parallel Programming subject. In this part, three paradigms are presented: shared memory, message passing, and accelerator-oriented massively-parallel programming (GPUs). The rationale for this organization is that developing programs with a shared memory model, such as OpenMP, requires only a simple modification of a sequential C program by including just some directives. So, the students can parallelize their sequential C programs in just one lab session. After OpenMP, MPI is introduced. In this case, it is necessary to think about how to parallelize the algorithm, which processes must be defined, how such processes must communicate, and so on. This implies a greater effort from the students. The last approach introduced is OpenACC and CUDA as programming models for GPUs (accelerators), which require a more detailed understanding of the memory hierarchy and the coordinated use of thousands of threads to reach relevant performance gains.
The programming sessions are complemented with the introduction of performance analysis tools to understand the benefits of parallel programming and to detect and correct performance bottlenecks. Fundamental performance engineering abstractions are introduced, like the speedup concept, Amdahl's and Little's Laws, and the Roofline model.
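These abstractions can be stated compactly. For instance, Amdahl's Law bounds the speedup S achievable with p processors when a fraction f of the sequential execution time is parallelizable:

```latex
S(p) = \frac{1}{(1 - f) + \frac{f}{p}},
\qquad
\lim_{p \to \infty} S(p) = \frac{1}{1 - f}
```

Even a modest sequential fraction caps the speedup: with f = 0.95, no number of processors can push S beyond 20.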
The steps of the learning evolution shown in Figure 2 are applied in this subject for each of the aforementioned topics. Consequently, the initial lectures (one in most cases) are used by the professor to introduce the main concepts regarding the topic and, next, students assume incrementally more responsibility in the subsequent sessions associated with each topic.
The specific development of these topics is covered in the following subsections.
As mentioned above, once students are familiarized with C and the basic concepts of parallel algorithms, the most natural way to introduce parallel application development is by using OpenMP [21].
OpenMP is a portable and flexible directive-based API for shared-memory parallel programming which, for some basic code constructions, allows us to express parallelism in an extremely simple way. Given these characteristics, it has become the de facto standard for multicore shared-memory architectures. In addition, current laptops and desktop computers have multicore processors and, consequently, students can test all the examples given in class and develop new ideas on their own computers.
After a few motivating examples, such as the one shown in Listing 1, the contents of the theoretical OpenMP lecture (2 hours) are structured as follows:
• Fork-join model. The #pragma omp parallel clause. Introducing parallel regions. Data management clauses (private, shared, firstprivate, lastprivate).
• Task parallelism: sections. The #pragma omp sections and #pragma omp section clauses.
The instructor plays a central role in this lecture and, consequently, it has been structured following the corresponding strategy, i.e. theoretical lecture, as presented in Section 3. This structure starts by describing the most general and essential concepts (shared memory model, threads and synchronization). Next, it introduces different parallel constructs ordered according to their conceptual complexity: all threads doing the same work (parallel construct), all threads executing the same code on different portions of data (parallel for construct), and threads executing different tasks (parallel section construct). Then, it introduces several OpenMP synchronization mechanisms, which naturally leads to a discussion of their negative performance implications and strategies to minimize their use.
#pragma omp parallel for
for (i = 0; i < N; i++)
    c[i] = a[i] + b[i];
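The synchronization discussion can be made concrete with OpenMP's reduction clause, which lets the runtime re-associate a sum across threads instead of serializing it behind a critical section. A minimal sketch (our illustration, not one of the course's listings):

```c
/* Dot product with a reduction: each thread accumulates a private
 * partial sum, and OpenMP combines the partials at the end of the
 * loop, avoiding per-iteration synchronization.  Compiled without
 * -fopenmp the pragma is ignored and the loop runs sequentially,
 * producing the same result. */
double dot(const double *a, const double *b, int n) {
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}
```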
After this lecture, the students should assume the central role and be able to apply the acquired theoretical knowledge to real cases. Consequently, the concepts introduced in this lecture are reinforced in a lab session (2 hours with an instructor and 6 hours of autonomous development of practical exercises), where students must use OpenMP to parallelize the code for simulating the movement of a string developed in the C labs (see Listing 2). In this way, students continue their work and can experience the advantages of using the 4 cores available in each piece of lab equipment.
Several of the strategies presented in Section 3 have been applied in the design of this lab session. First, the lab session strategy has been used to train the students in the use of the most common tools used in the lab: compilers (gcc), monitoring tools (likwid [22], perf [23]), remote access and resource management (SGE [24]). A detailed manual describing these tools with examples has been elaborated with this objective. In this case, students develop this exercise in groups of two. Second, the case study strategy has been used to work on the problem of parallelizing the string movement simulator previously described. In this case, students are provided with a very short outline of the problem, so they must explore different approaches to the solution on their own. Third, the conceptual maps strategy has been used to make students organize and summarize the concepts learned. Students must write and deliver a report describing their solution to the problem and the tests done on the application they have developed. Finally, the experimental portfolio strategy is used in order to follow the students' evolution through the set of exercises developed in the lab.
• MPI program structure. Initializing and finalizing the environment: MPI_Init and MPI_Finalize. Communicator definition (MPI_COMM_WORLD), getting the number of processes in the application (MPI_Comm_size) and the process rank (MPI_Comm_rank). General structure of an MPI call.
The instructor plays a central role in these lectures and, consequently, they have been structured following the corresponding strategy, i.e. theoretical lecture, as presented in Section 3. This structure starts by describing the most general and essential concepts (distributed memory model, processes and message passing). Next, it introduces the basic MPI concepts: program structure, communicators, process identifiers and the MPI function naming convention. Then, it introduces different types of communication ordered according to their conceptual complexity: point-to-point blocking communication, point-to-point non-blocking communication, and collective communication. This naturally leads to a discussion of the impact of each type of communication on application performance and programming complexity. This discussion on MPI application performance is also used to present the load balancing problem and some strategies to overcome it.
Students work on these concepts in the lab sessions by developing a simple program for computing a π approximation using the dartboard approach [26]. This approach simulates throwing darts at a dartboard on a square backing. As each dart is thrown randomly, the ratio of darts hitting the board to those landing on the square is equal to the ratio between the two areas, which is π/4.
A parallel implementation of this algorithm consists of a certain number of processes throwing a fixed number of darts and calculating their own approximation of π; then one of the processes (the master) receives all approximations and calculates the average value. In this solution, workers send their results to the master (the process with rank 0) using point-to-point communication.
A second approach consists of distributing the total number of throws among all the processes; each of them will calculate its own number of hits (darts in the circle) and send it to the master process, which will compute the π approximation. In this case, the master sends the number of throws that must be done by each process and receives the number of hits, always using collective communication functions.
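The per-process kernel of this exercise fits in a few lines of plain C. The sketch below is ours (with a small deterministic generator standing in for each process's random stream); it shows the counting loop every rank runs before the hits are combined with a collective such as MPI_Reduce:

```c
#include <stdint.h>

/* Tiny 64-bit LCG; each MPI rank would seed it differently so the
 * processes draw independent dart positions. */
static uint64_t lcg_next(uint64_t *s) {
    *s = *s * 6364136223846793005ULL + 1442695040888963407ULL;
    return *s >> 33;                 /* keep the higher-quality top bits */
}

/* Count how many of n random darts land inside the quarter circle of
 * radius 1 inscribed in the unit square.  In the collective version,
 * the master distributes n and gathers the per-rank counts (e.g. with
 * MPI_Reduce) to estimate pi = 4 * total_hits / total_throws. */
long count_hits(long n, uint64_t seed) {
    long hits = 0;
    for (long i = 0; i < n; i++) {
        double x = (double)lcg_next(&seed) / 2147483648.0;  /* [0,1) */
        double y = (double)lcg_next(&seed) / 2147483648.0;
        if (x * x + y * y <= 1.0)
            hits++;
    }
    return hits;
}
```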
As in the case of OpenMP, several of the strategies presented in Section 3 have been applied in the design of these lab sessions. First, the lab session strategy has been used to train the students in the use of MPI tools: mpicc, mpirun and mpe [27]. Also in this case, students develop this exercise in groups of two. Second, the case study strategy has been used to work on the problem of π computation; again, students are provided with a very short outline of the problem, so they must explore different approaches to the solution on their own. Third, the conceptual maps strategy has been used because students must write and deliver a report describing their solutions to the problem and the tests done on the applications they have developed. Finally, the experimental portfolio strategy is used in order to follow the students' evolution through the set of exercises developed in the lab.
presenting the OpenACC toolkit to the students, then providing them with a
deeper view of accelerators with CUDA.
OpenACC [28] is an open specification of compiler directives for parallel programming. With the use of high-level directives, similar to OpenMP, applications can be accelerated without losing portability across processor architectures.
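In the spirit of the OpenMP examples, accelerating a loop with OpenACC only requires a directive. The sketch below is our illustration (with an OpenACC-unaware compiler the pragma is simply ignored and the loop runs on the CPU, producing the same result):

```c
/* Vector addition annotated for OpenACC: the compiler is asked to
 * offload the loop to an accelerator, copying a and b to device
 * memory and c back to the host. */
void vec_add(const float *a, const float *b, float *c, int n) {
    #pragma acc parallel loop copyin(a[0:n], b[0:n]) copyout(c[0:n])
    for (int i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}
```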
CUDA is an extension for massively parallel programming of GPUs (or accelerators). We chose CUDA instead of OpenCL because of the existence of efficient and mature compiling, debugging and profiling tools, and because of the extensive information available. The contents of the lectures are structured as follows:
The lecture uses vector addition as an example to introduce the OpenACC and CUDA syntax. Four implementations are provided and evaluated using: (a) one single thread, (b) one CTA, (c) a grid of CTAs where each thread performs a single addition, and (d) a grid of CTAs with more work per thread. We show the performance results (disappointing for the first implementations) to motivate the different solutions and the need for developing good performance engineering skills.
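Variant (d), more work per thread, is usually written as a grid-stride loop. The CPU-side sketch below (our illustration; tid and nthreads emulate the CUDA thread index and grid size) shows the decomposition without requiring a GPU:

```c
/* Grid-stride decomposition of vector addition: thread 'tid' out of
 * 'nthreads' processes elements tid, tid + nthreads, tid + 2*nthreads,
 * and so on.  In a CUDA kernel the same loop appears with
 * tid = blockIdx.x * blockDim.x + threadIdx.x and
 * nthreads = gridDim.x * blockDim.x. */
void vec_add_thread(int tid, int nthreads,
                    const float *a, const float *b, float *c, int n) {
    for (int i = tid; i < n; i += nthreads)
        c[i] = a[i] + b[i];
}
```

Calling this function once per simulated thread covers every element exactly once, which is the property the students must argue for when they size their CUDA grids.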
We also present Thrust [29], a high-level parallel algorithm library written in C++, to show the students the benefit of learning object-oriented programming and software engineering concepts. However, due to the limited background of our students and obvious time limitations, providing further information on Thrust usage is out of the scope of our subject.
Students must use OpenACC and CUDA in the lab sessions to parallelize the code that simulates the movement of a string. They explore, step by step, the different obstacles they must face to exploit the full potential of GPUs and increase performance by ≈10x with respect to the multicore CPU code.
The methodological strategies used in this part of the subject are very similar to those of the previous parts. First, lecture sessions provide a general introduction to the concepts defined above. Then, lab sessions train the students in the use of Nvidia software development tools. Students receive the vector addition case study and must provide incremental solutions that rely on performance-oriented design decisions.
Figure 3: Performance analysis in the application development cycle.
The instructor plays a central role in this lecture and, consequently, it has been structured following the theoretical lecture strategy presented in Section 3. In this case, the contents are naturally guided by the performance analysis cycle presented in Figure 3. Consequently, measurement and monitoring concepts and tools are presented first. For example, the Performance API (PAPI) [32] and Dyninst [33] are mentioned as supporting tools for getting execution measurements. Then, performance analysis approaches and tools are discussed. For example, Tuning and Analysis Utilities (TAU) [34], Scalasca [35] and Paraver [36] are presented as analysis and visualization tools. Finally, automatic/dynamic tuning concepts and tools are introduced. In this case, the Periscope Tuning Framework (PTF) [37], MATE [38] and Elastic [39] are presented as automatic analysis and tuning tools.
4.5. Learning outcomes
Our objective in this subject is that, taking into consideration that the students of this Master's program come from different fields, each student achieves the following learning outcomes:
• Design and develop parallel solutions to a computational problem, taking the characteristics of the available hardware into account. To achieve this outcome, the subject includes techniques and tools to implement parallel applications on multi-core, cluster and accelerator architectures.
• Interpret information from performance-analysis tools and be able to consider application-specific design decisions to improve performance. Students use appropriate tools to analyze the performance of an application and are asked to include the results and impact of their analysis in the lab reports.
1. Case studies in collaboration with industry and research laboratories that use modeling and simulation activities every day.
2. Simulation model development and performance analysis.
5.1. Case studies
The first part, Case studies, is conducted in collaboration with industry and research laboratories that use modeling and simulation activities every day. The activities carried out include invited lectures from researchers who work in these laboratories and use modeling to carry out their work.
The first case considered is the paradigmatic example of meteorological services. Everybody watches the weather forecast on TV every day and can imagine the complexity of the models involved, with huge meshes of points with hundreds of variables estimated for every point, and the computing requirements needed to provide a real prediction. However, in this particular case, it is known that weather prediction models show chaotic behavior. The way to keep this behavior as limited as possible is to execute not just a single simulation, but a complete set of scenarios (called an ensemble), and apply statistical methods to form the final prediction. This meteorological modeling and prediction part is presented by members of the Servei Meteorològic de Catalunya (Meteorological Service of Catalonia). Obviously, it is outside the scope of the subject to develop a meteorological model, but the students can use some small specific models, such as wind field models (WindNinja [40]), to analyze their execution time, scalability and speedup. In this context, some students (one or two per year) may enroll in an internship in this meteorological service, developing code for some particular model or applying parallel programming techniques to some of the existing models.
In a similar way, a collaboration has been established with the IC3-BSC (Institut Català de Ciències del Clima - Barcelona Supercomputing Center), but in this case the models and predictions concern climatological models involving very large time scales. Here the real-time aspect is not so critical, since the predictions cover decades or even centuries. The main point, however, is to run hundreds or thousands of simulations with different parameters, which makes the total computational requirements extremely high. Some students also carry out an internship in this center, where they have access to very large computing resources and can perform studies on speedup and scalability.
In the second part, the students develop a particular simulation model and analyze its performance. Here the teaching strategy is based on three well-defined parts: lecture sessions (including conceptual maps), lab sessions and, finally, an experimental portfolio. In the last two parts, students develop a project and carry out lab sessions under teacher supervision. At the beginning, the teacher presents the concepts of a particular modeling technique, agent-based modeling (ABM), and a conceptual map showing the hierarchy of the concepts to be developed. This type of model is used to represent real systems from different areas of knowledge that are close to the students' initial background. ABM can reproduce complex patterns of behavior through simple rules and provide useful information about the dynamics of the real system. In addition, it is a kind of simulation that needs high computing power as the number of individuals increases, which makes it suitable for the objectives pursued in the area of HPC. As a case study, a model of emergency evacuation using ABM is analyzed [41], and the students must carry out practical exercises in the lab sessions to extend the model and analyze its performance. Several aspects are considered in the model analysis: the environment and the information (doors and exit signals), policies and procedures for evacuation, and the social characteristics of individuals that affect their response during the evacuation. Moreover, the following hypotheses are defined as a starting point for the model:
• Individuals try to achieve their objectives and may try to push each other
in their attempt to exit through a specific door, causing physical injury to
other individuals.
Figure 4: Agent Based Modeling for emergency evacuations.
Students receive a partial model that manages the evacuation of an enclosed area with a certain building structure (walls, accesses, etc.) and obstacles, with particular signaling and the corresponding safe zones and exits. The model also includes the individuals who must be evacuated to the safe areas. It has been developed to support different parameters, such as: the age of each individual, the total number of people in the area, the number of exits, the number of chained signals and safe areas, the speed of each individual, and the probability of exchanging information with other individuals. The model is implemented in NetLogo [42], and Figure 4 shows its main characteristics.
The first practical work requires the students to use a single-core architecture in the lab to analyze the performance of the model and then incorporate a new, previously uncovered, policy: overcrowding in exit zones [43]. Students must then complete a new performance analysis of the extended model.
Considering the variability of each individual in the model, a stability analysis is required. For this, the Chebyshev theorem (also spelled Tchebycheff) is used with a 95% confidence interval, α = 0.05 and m = 6. The result of this analysis indicates that at least 720 simulations must be run to obtain statistically reliable data. Taking into account the 720 executions on a single-core processor, the average simulation time is 7.34 hours for 1,000 individuals and 27.44 hours for 1,500 individuals per scenario. In order to use this tool as a Decision Support System (DSS), the students are instructed in the necessary HPC techniques, and the embarrassingly parallel computing model is presented as a method to reduce both the execution time and the decision-making time [44][45]. Therefore, students must learn how to execute multiple parametric NetLogo model runs on a multi-core system and how to perform a performance analysis to evaluate the efficiency and scalability of the method.
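A minimal sketch of this embarrassingly parallel scheme uses a process pool to fan the independent runs out over the available cores. Here `run_simulation` is a stand-in: in the actual lab each task would launch a headless NetLogo execution with its parameter set and parse the reported evacuation time:

```python
# Embarrassingly parallel parameter sweep: every simulation run is
# independent, so a process pool distributes the runs across cores.
from multiprocessing import Pool
import itertools

def run_simulation(params):
    """Stand-in for one headless NetLogo run, returning a summary metric.
    In the lab this would invoke the NetLogo model with `params`
    (e.g., via subprocess) and collect the simulated evacuation time."""
    population, seed = params
    # Hypothetical placeholder computation instead of the real model:
    return population * 0.01 + seed * 0.001

if __name__ == "__main__":
    # 720 runs: e.g., two population sizes x 360 random seeds.
    sweep = list(itertools.product([1000, 1500], range(360)))
    with Pool() as pool:            # one worker per available core by default
        results = pool.map(run_simulation, sweep)
    print(len(results), "runs completed")
```

Because the runs share no state, the expected speedup is close to the number of cores, and the students verify this by repeating the sweep with pools of different sizes.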
Finally, the instructor offers a set of tasks based on the developed model (the experimental portfolio), proposing new challenges and additional specifications so that students can assess their own knowledge. These sessions are carried out autonomously by the students in the laboratory, but the work is evaluated and represents a percentage of the module grade.
The objective of this subject is for students to achieve the following learning outcomes:
• Implement appropriate numerical methods to solve models in the particular field under consideration. The mathematical models involved in complex systems usually must be solved by numerical methods. Students must implement different numerical methods efficiently and integrate them into simulation tools.
• Simulate the behavior of complex systems. Students must use the simulation tools to study the behavior of the system.
• Validate the simulation results against the predictions of the models and the behavior of the real system. Students must compare the results obtained from the simulation with the real behavior of the system to validate the correctness of the simulation tools.
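As a concrete instance of the first outcome, one of the simplest numerical methods a student might implement and later integrate into a simulation tool is an explicit Euler integrator for an ordinary differential equation. The ODE and step count below are illustrative, not taken from any particular course assignment:

```python
# Explicit (forward) Euler integration of y' = f(t, y): the simplest
# numerical method, later replaced by higher-order schemes such as
# Runge-Kutta when more accuracy per step is needed.
def euler(f, y0, t0, t1, steps):
    """Integrate y' = f(t, y) from t0 to t1 with a fixed step size."""
    h = (t1 - t0) / steps
    t, y = t0, y0
    for _ in range(steps):
        y += h * f(t, y)
        t += h
    return y

# Example: y' = -y with y(0) = 1; the exact value at t = 1 is
# exp(-1) ≈ 0.3679, which Euler approaches as `steps` grows.
approx = euler(lambda t, y: -y, 1.0, 0.0, 1.0, 1000)
```

Comparing the numerical solution against a known analytical one, as in the example, is precisely the validation exercise described in the third outcome.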
6. Academic results
The global result is very positive: after 5 years of teaching in this interdisciplinary Master's programme, we have observed that students achieve a much wider range of knowledge, which allows them to tackle problems from disciplines different from their original background. In fact, most of them get jobs in companies where they apply the concepts of computational thinking and performance engineering to improve the applications developed at those companies, in areas such as traffic simulation, geophysical simulation, water pollution simulation and disease spread simulation, among many others, independently of their background.
Currently, the number of qualified candidates exceeds the number of places offered, and academic results are very high. The success rate (the number of students that pass the subject divided by the number of assessed students) and the performance rate (the number of students that pass the subject divided by the number of enrolled students) for both of the subjects presented in this paper are very high, up to 100%. The detailed results for the whole Master's programme can be seen on its official web page [46].
Quantitative results are relevant to assess the success of the presented subjects, but we consider that the feedback given by students and companies has been very important for building this success. Each semester, students answer a survey about the subjects studied. This survey includes open questions such as "Tell us what you have liked the most about this course", "Do you have any suggestions for improvement?" or "Do you consider the workload of this course to have been adequate?". In the latest editions, students have expressed that these subjects are extremely useful and a great complement to other subjects of the Master's programme (especially those related to Big Data). Most of them have considered the workload adequate, and some have suggested that it would be possible to go deeper into certain contents. Overall, they rated the 2016 edition of these subjects 92/100.
Finally, it is important to mention that many companies collaborate in both subjects. They provide models that the students can use in the lab sessions and deliver invited talks. These talks are very fruitful for the students and enrich their knowledge: students can see a real application and an actual use of the theoretical bases taught in the subject lectures.
7. Conclusions
Many fields of science and engineering are evolving through the contribution of complementary fields. This implies that project teams in companies and research centers have significant interdisciplinary components. People from different fields must be able to establish a common ground and understand the requirements and the effects of the problems and solutions for all members of the team. In this sense, it is worth highlighting the performance effects of application design decisions, which often invalidate otherwise valid ideas.
High Performance Computing (HPC), including parallel and distributed programming, has become a central factor applied to many fields of science and engineering. It is therefore necessary for students from various fields to receive significant training in HPC. In this way, they will be able to design and develop their own applications and, even more importantly, they will understand the decisions needed to get the most from a given computational platform: which programming paradigm is the most suitable, which performance metrics are the most relevant, and how to measure them. They can thus establish a common language with computer scientists and work together on the development of more powerful and successful applications.
In this interdisciplinary context, we have presented our experience of teaching parallel programming in interdisciplinary studies at the graduate level. We have described a methodological background with the main principles and activities applied to the development of the subjects. We have used a balanced perspective in which teachers first use direct methodologies to establish the background of the subject; then, students are progressively presented with more interactive, practical activities, and have to understand the problems and propose their own solutions.
The main conclusion is that the experience has been very successful: most students enjoy developing parallel programs, analyzing their behavior and trying to improve their performance. After this experience, it would be very interesting to introduce similar subjects at the undergraduate level, so that students from different fields are able to apply High Performance Computing techniques to their computational problems from the very beginning.
References
[1] G. C. Fox, Parallel computing and education, Daedalus 121 (1) (1992)
111–118.
[4] A. Marowka, Think parallel: Teaching parallel programming today, IEEE Distributed Systems Online 9 (8) (2008) 1–8.
[6] H. Neeman, L. Lee, J. Mullen, G. Newman, Analogies for teaching parallel computing to inexperienced programmers, SIGCSE Bull. 38 (4) (2006) 64–67.
[14] Computational and Mathematical Engineering MS Degree, http://scpd.stanford.edu/programs/masters-degrees, [Online; accessed 21-Sept-2016].
[25] Message Passing Interface Forum, http://www.mpi-forum.org/, [Online;
accessed 18-May-2015].
[35] F. Wolf, Scalasca, in: Encyclopedia of Parallel Computing, 2011, pp. 1775–
1785.
[37] R. Miceli, G. Civario, A. Sikora, E. César, M. Gerndt, H. Haitof, C. B. Navarrete, S. Benkner, M. Sandrieser, L. Morin, F. Bodin, AutoTune: A Plugin-Driven Approach to the Automatic Tuning of Parallel Applications, in: Applied Parallel and Scientific Computing - 11th International Conference, PARA 2012, Helsinki, Finland, June 10-13, 2012, Revised Selected Papers, 2012, pp. 328–342.
10.1016/j.procs.2014.05.145. URL http://dx.doi.org/10.1016/j.procs.2014.05.145
[44] I. T. Foster, Designing and Building Parallel Programs: Concepts and Tools for Parallel Software Engineering, Addison-Wesley, 1995.
*Author Biography & Photograph
Eduardo Cesar received the BS degree in computer science in 1992 from Universidad Simón Bolívar. He received the MSc in computer science in 1994 and the PhD in computer science in 2006, both from Universitat Autònoma de Barcelona. Since 1998 his research has been related to parallel and distributed computing. He is currently involved in Spanish national projects and the European project AutoTune. His main interests are focused on parallel applications and automatic performance analysis and tuning. He has been involved in the definition of performance models for automatic and dynamic performance tuning in cluster environments.
*Author Biography & Photograph
Ana Cortés received her first degree and her PhD in Computer Science from the Universitat Autònoma de Barcelona (UAB), Spain, in 1990 and 2000, respectively. She is currently an associate professor of Computer Science at the UAB, where she is a member of the High Performance Computing Applications for Science and Engineering group of the Computer Architecture and Operating Systems Department. Her current research interests concern performance engineering of high performance computing applications for the environmental sciences.
*Author Biography & Photograph
Antonio Espinosa received the BSc degree in computer science in 1994 and the PhD degree in computer science in 2000. He is a postdoctoral researcher in the Computer Architecture and Operating Systems Department at the Universitat Autònoma de Barcelona. During the last 10 years, he has participated in several European and national projects related to bioinformatics and high performance computing, in collaboration with a number of biotechnology companies and research institutions.
*Author Biography & Photograph
Tomàs Margalef received a BS degree in physics in 1988 from Universitat Autònoma de Barcelona (UAB). He obtained the MSc in Computer Science in 1990 and the PhD in Computer Science in 1993, both from UAB. Since 1988 he has been working on several aspects of parallel and distributed computing. Currently, his research interests focus on the development of high performance applications, automatic performance analysis and dynamic performance tuning. Since 1997 he has been working on exploiting parallel/distributed processing to accelerate and improve the prediction of forest fire propagation. Since 2007 he has been a Full Professor at the Computer Architecture and Operating Systems Department. He is an ACM member.
*Author Biography & Photograph
Juan C. Moure received his BSc degree in computer science and his PhD degree in computer architecture from Universitat Autònoma de Barcelona (UAB). Since 2008 he has been an associate professor with the Computer Architecture and Operating Systems Department at the UAB, where he teaches computer architecture and parallel programming. He has participated in several European and Spanish projects related to high performance computing. His current research interest focuses on the use of massively parallel architectures and the application of performance engineering techniques to open research problems in bioinformatics, signal processing, and computer vision. He is a reviewer for various journals and symposia and has authored numerous papers in journals and conferences.
*Author Biography & Photograph
Anna Sikora received the BS degree in computer science in 1999 from the Technical University of Wroclaw (Poland). She received the MSc in computer science in 2001 and the PhD in computer science in 2004, both from Universitat Autònoma de Barcelona (Spain). Since 1999 her research has been related to parallel and distributed computing. She is currently involved in Spanish national projects and participated in the European project AutoTune. Her main interests are focused on high performance parallel applications, automatic performance analysis and dynamic tuning. She has been involved in programming tools for automatic and dynamic performance tuning in cluster and Grid environments.
*Author Biography & Photograph
Remo Suppi has been an associate professor since 1997 in the Computer Architecture & Operating Systems Department and is a member of the High Performance Computing for Efficient Applications and Simulation research group (HPC4EAS) at Universitat Autònoma de Barcelona (UAB). He received his PhD in Computer Science (1996) from UAB and the diploma in Electronic Engineering from the Universidad Nacional de La Plata (Argentina). His current research interests include computer simulation, distributed systems, and high performance and distributed simulation applied to ABM or individual-oriented models. Dr. Suppi regularly publishes in journals, conference proceedings, and books and is an associate researcher at the Informatics Research Institute LIDI (Argentina).