Parallel Computing
Operating System Concepts, 10th Edition. Silberschatz, Galvin and Gagne ©2018
Outline
▪ Review
▪ Overview of parallelism
• Types of parallelism
– Simple perspective: three types of parallelism
– Coupling and parallelism perspective
– Concurrency perspective
• Design elements of a parallel system
– Amdahl's Law
– Modeling parallel computation
• Example: NEC Cluster
▪ Programming parallel systems
• Multi-core shared-memory systems (OpenMP)
• MPI processes and message passing (Open MPI)
Multiple-Processor Scheduling
Multithreaded Multicore System
▪ Chip multithreading (CMT) assigns each core multiple hardware threads. (Intel refers to this as hyperthreading.)
Multicore Programming
▪ Types of parallelism
• Data parallelism – distributes subsets of the same data across multiple cores, with the same operation performed on each
• Task parallelism – distributes threads across cores, with each thread performing a unique operation
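Below is a minimal C/OpenMP sketch (not from the slides; the array contents and the two printed results are illustrative) contrasting the two: the parallel for applies the same operation to different subsets of the data, while the sections give different threads different operations.

#include <omp.h>
#include <stdio.h>

#define N 8

int main(void) {
    int a[N] = {1, 2, 3, 4, 5, 6, 7, 8};
    int sum = 0;

    /* Data parallelism: the same operation (summing) applied to
       different subsets of a[] by different threads */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < N; i++)
        sum += a[i];

    /* Task parallelism: each section is a different operation,
       executed by a different thread */
    #pragma omp parallel sections
    {
        #pragma omp section
        printf("sum of a[] = %d\n", sum);

        #pragma omp section
        printf("last element of a[] = %d\n", a[N - 1]);
    }
    return 0;
}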
Data and Task Parallelism
Simple perspective: Three types of parallelism
Trobec, R., Slivnik, B., Bulić, P., & Robič, B. (2018). Introduction to Parallel Computing. Undergraduate Topics in Computer Science, Springer.
Multicomputers vs multiprocessor parallel systems
Raja Malleswara Rao Pattamsetti (2017). Distributed Computing in Java 9. Packt Publishing.
Coupling and parallelism perspective
▪ Tightly coupled multiprocessors (with UMA shared memory). These may
be either switch-based (e.g., NYU Ultracomputer, RP3) or bus-based (e.g.,
Sequent, Encore).
▪ Tightly coupled multiprocessors (with NUMA shared memory or that
communicate by message passing). Examples are the SGI Origin 2000 and
the Sun Ultra HPC servers (that communicate via NUMA shared memory),
and the hypercube and the torus (that communicate by message passing).
Ajay D. Kshemkalyani, Mukesh Singhal (2008). Distributed Computing: Principles, Algorithms, and Systems. Cambridge University
Press.
NUMA and CPU Scheduling
If the operating system is NUMA-aware, it will assign memory closest to the CPU the thread is running on.
Processor Affinity
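As an illustration (not from the slides), a minimal Linux-specific C sketch that pins the calling process to one CPU with sched_setaffinity; the choice of CPU 0 is arbitrary. Once pinned, a NUMA-aware OS can keep the process's memory allocations on the node closest to that CPU.

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int main(void) {
    cpu_set_t mask;
    CPU_ZERO(&mask);
    CPU_SET(0, &mask);                  /* allow only CPU 0 */

    /* pid 0 means "the calling process" */
    if (sched_setaffinity(0, sizeof(mask), &mask) != 0) {
        perror("sched_setaffinity");
        return 1;
    }
    printf("pinned to CPU 0\n");
    return 0;
}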
Coupling and parallelism perspective
▪ Loosely coupled multicomputers (without shared memory) physically
colocated. These may be bus-based (e.g., NOW connected by a LAN or
Myrinet card) or using a more general communication network, and the
processors may be heterogeneous. In such systems, processors neither
share memory nor have a common clock, and hence may be classified as
distributed systems – however, the processors are very close to one
another, which is characteristic of a parallel system.
▪ Loosely coupled multicomputers (without shared memory and without
common clock) that are physically remote. These correspond to the
conventional notion of distributed systems.
Ajay D. Kshemkalyani, Mukesh Singhal (2008). Distributed Computing: Principles, Algorithms, and Systems. Cambridge University
Press.
Parallel vs distributed: Coupling perspective summary
The main difference between the two is that a parallel computing system consists of multiple processors that communicate with each other through shared memory, or of multicomputers that do not share memory but are physically colocated very close together, whereas a distributed computing system contains multiple computers that are physically remote and connected by a communication network.
Raja Malleswara Rao Pattamsetti (2017). Distributed Computing in Java 9. Packt Publishing.
Ajay D. Kshemkalyani, Mukesh Singhal (2008). Distributed Computing: Principles, Algorithms, and Systems. Cambridge
University Press.
Parallel vs distributed: Coupling perspective summary
▪ Parallel system
• Shared memory, or
• Physically colocated, very close together
IBM's Blue Gene/P massively parallel supercomputer
▪ Distributed system
• Physically remote
Cloud computing
Concurrency
Concurrency is a property of a system representing the fact that multiple activities are executed at the same time (time-shared or in parallel).
The activities may interact with one another.
They all share the same underlying challenges: providing mechanisms to control the different flows of execution via coordination and synchronization, while ensuring consistency.
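A minimal C/OpenMP sketch (illustrative only; the shared counter and iteration count are assumptions) of two concurrent activities whose interaction is coordinated with a critical section so that the shared result stays consistent.

#include <omp.h>
#include <stdio.h>

int main(void) {
    int counter = 0;

    /* Two concurrent activities update shared state; the critical
       section coordinates them so the final value stays consistent */
    #pragma omp parallel num_threads(2)
    {
        for (int i = 0; i < 100000; i++) {
            #pragma omp critical
            counter++;
        }
    }
    printf("counter = %d\n", counter);  /* 200000 with synchronization */
    return 0;
}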
Concurrency
▪ Concurrency without real parallelism: time-shared execution on a single-core system + synchronization
▪ Concurrency with real parallelism: parallel execution on a multi-core system + synchronization
Concurrency vs. Parallelism
Concurrency is a conceptual property of a program, while parallelism is a runtime state.[1]
In terms of scheduling, parallelism can only be achieved if the hardware architecture supports parallel execution, as multi-core or multi-processor systems do. A single-core machine can still execute multiple threads concurrently; however, it can never provide true parallelism.[1]
Sequential vs. Concurrent Execution
Concurrency and Parallelism
Concurrency vs. Parallelism
Parallel computing is closely related to concurrent computing—they are
frequently used together, and often conflated, though the two are distinct: it is
possible to have parallelism without concurrency (such as bit-level
parallelism), and concurrency without parallelism (such as multitasking by
time-sharing on a single-core CPU).[1]
[1] "Concurrency is not Parallelism", Waza conference Jan 11, 2012, Rob Pike
Parallel vs distributed: Concurrency perspective
Parallel vs distributed: Concurrency perspective summary
▪ Parallel system
• Parallelism
• No concurrency
▪ Distributed system
• Parallelism
• Concurrency
Cloud computing
Parallel vs distributed: summary
▪ Parallel system
• Shared memory, or
• Physically colocated, very close together
• No concurrency
HPC: IBM's Blue Gene/P massively parallel supercomputer
▪ Distributed system
• Physically remote
• Concurrency
Cloud computing
Parallel vs distributed: What for perspective
▪ Parallel system
• High performance
▪ Distributed system
• High availability
Cloud computing
Parallel vs distributed: Who for perspective
▪ Parallel system
• Scientists
▪ Distributed system
• Business
Cloud computing
Practice: Parallel or distributed system? (example systems shown as figures)
Amdahl’s Law
▪ Identifies potential performance gains from adding additional computing cores to an application that has both serial and parallel components
▪ S is the serial portion, N is the number of processing cores:
speedup ≤ 1 / (S + (1 − S)/N)
▪ That is, if an application is 75% parallel and 25% serial, moving from 1 to 2 cores results in a speedup of 1.6 times
▪ As N approaches infinity, speedup approaches 1 / S
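A small C sketch (illustrative, assuming the 25% serial fraction from the example above) that evaluates Amdahl's Law for growing core counts; for N = 2 it prints the 1.6× speedup quoted on the slide.

#include <stdio.h>

int main(void) {
    double S = 0.25;                       /* serial fraction (25%) */
    for (int N = 1; N <= 16; N *= 2) {
        double speedup = 1.0 / (S + (1.0 - S) / N);
        printf("N = %2d cores -> speedup = %.2f\n", N, speedup);
    }
    /* As N grows, speedup approaches 1 / S = 4.0 */
    return 0;
}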
Modeling Parallel Computation
▪ Parallel computers vary greatly in their organization:
• Processing units may or may not be directly connected to one another
• They may share a common memory or have their own local memories
• Processing units may be synchronized by a common clock
• Architectural details and hardware specifics of the components vary
• Different clock rates, memory access times, etc.
Trobec, R., Slivnik, B., Bulić, P., & Robič, B. (2018). Introduction to Parallel Computing. Undergraduate Topics in Computer Science, Springer.
Modeling Parallel Computation
▪ Parallel Random Access Machine (PRAM)
• No interconnection network
• Any processing unit can access any memory location
▪ The variations are:
• Exclusive Read Exclusive Write PRAM (EREW-PRAM)
• Concurrent Read Exclusive Write PRAM (CREW-PRAM)
• Concurrent Read Concurrent Write PRAM (CRCW-PRAM)
Trobec, R., Slivnik, B., Bulić, P., & Robič, B. (2018). Introduction to Parallel Computing. Undergraduate Topics in Computer Science, Springer.
Modeling Parallel Computation
▪ The Local-Memory Machine (LMM)
• The LMM model has p processing units, each with its own local memory
• The processing units are connected by a common interconnection network
Trobec, R., Slivnik, B., Bulić, P., & Robič, B. (2018). Introduction to Parallel Computing. Undergraduate Topics in Computer Science, Springer.
Programming parallel systems
▪ Two main differences between the shared-memory and distributed-memory computer architectures:
• The cost of communication
• The number of processors that can cooperate efficiently, which favors distributed-memory computers
Programming parallel systems
Raja Malleswara Rao Pattamsetti (2017). Distributed Computing in Java 9. Packt Publishing.
OpenMP
▪ A set of compiler directives and an API for C, C++, and FORTRAN
▪ Provides support for parallel programming in shared-memory environments
▪ Identifies parallel regions – blocks of code that can run in parallel
▪ #pragma omp parallel creates as many threads as there are processing cores
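A minimal sketch of a parallel region in C (not from the slides; compiled with an OpenMP flag such as gcc -fopenmp): each of the threads created by #pragma omp parallel executes the block once.

#include <omp.h>
#include <stdio.h>

int main(void) {
    /* One thread per available core executes this parallel region */
    #pragma omp parallel
    {
        printf("I am thread %d of %d\n",
               omp_get_thread_num(), omp_get_num_threads());
    }
    return 0;
}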
OpenMP
▪ Run the for loop in parallel
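A minimal sketch (the array size and contents are assumptions) of a work-shared loop: the iterations are divided among the threads, and the reduction clause combines the per-thread partial sums into a single result.

#include <stdio.h>

#define N 1000

int main(void) {
    double a[N], sum = 0.0;

    for (int i = 0; i < N; i++)         /* initialize the array serially */
        a[i] = i * 0.5;

    /* Iterations are split among the threads; reduction(+:sum)
       merges the per-thread partial sums */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < N; i++)
        sum += a[i];

    printf("sum = %f\n", sum);
    return 0;
}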
OpenMP
https://www.openmp.org/wp-content/uploads/OpenMPRef-5.0-0519-web.pdf
Open MPI Project (Open MPI)
▪ Message Passing Interface (MPI)
▪ Programmers have to be aware that cooperation among processes implies data exchange
▪ Total execution time = computation time + communication time
▪ Algorithms with only local communication between neighboring processors are faster and more scalable
▪ Issues related to communication are efficiently addressed by the MPI specification
▪ The MPI library interface is a specification, not an implementation
Trobec, R., Slivnik, B., Bulić, P., & Robič, B. (2018). Introduction to Parallel Computing. Undergraduate Topics in Computer Science, Springer.
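A minimal MPI program sketch in C (not from the slides): every process learns its rank and the total number of processes in MPI_COMM_WORLD. It would typically be built and launched with an MPI implementation such as Open MPI, e.g. mpicc followed by mpirun -np 4.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's id */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of processes */

    printf("Hello from process %d of %d\n", rank, size);

    MPI_Finalize();
    return 0;
}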
Open MPI
MPI_Reduce is the means by which MPI processes apply a reduction calculation. The values sent by the MPI processes are combined using the given reduction operation, and the result is stored on the MPI process specified as the root.
https://mpitutorial.com/tutorials/mpi-reduce-and-allreduce/
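A minimal MPI_Reduce sketch in C (the value each process contributes is an assumption for illustration): every process sends its rank, the values are combined with MPI_SUM, and the result is stored on the root process (rank 0).

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int local = rank;                 /* each process contributes its rank */
    int global = 0;

    /* Combine the local values with MPI_SUM; only the root (rank 0)
       receives the combined result */
    MPI_Reduce(&local, &global, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("sum of ranks = %d\n", global);

    MPI_Finalize();
    return 0;
}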
AWS ParallelCluster
https://docs.aws.amazon.com/parallelcluster/latest/ug/what-is-aws-parallelcluster.html