
Parallel computing

Operating System Concepts – 10th Edition Silberschatz, Galvin and Gagne ©2018
Outline
▪ Review
▪ Overview of parallelism
• Types of parallelism
– Simple perspective: Three types of parallelism
– Coupling and parallelism perspective
– Concurrency perspective
• Design elements of parallel systems
– Amdahl’s Law
– Modeling Parallel Computation
• Example: NEC Cluster
▪ Programming parallel systems
• Multi-core shared-memory systems (OpenMP)
• MPI processes and messaging (OpenMPI)

Operating System Concepts – 10th Edition 4.2 Silberschatz, Galvin and Gagne ©2018
Multiple-Processor Scheduling

▪ Symmetric multiprocessing (SMP) is where each processor is self-scheduling.
▪ All threads may be in a common ready queue (a)
▪ Each processor may have its own private queue of threads (b)

Operating System Concepts – 10th Edition 5.3 Silberschatz, Galvin and Gagne ©2018
Multithreaded Multicore System
▪ Chip-multithreading (CMT)
assigns each core multiple
hardware threads. (Intel refers to
this as hyperthreading.)

▪ On a quad-core system with 2 hardware threads per core, the operating system sees 8 logical processors.

Operating System Concepts – 10th Edition 5.4 Silberschatz, Galvin and Gagne ©2018
Multicore Programming
▪ Types of parallelism
• Data parallelism – distributes subsets of the same data across
multiple cores, same operation on each
• Task parallelism – distributing threads across cores, each thread performing a unique operation (see the sketch below)
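A minimal sketch in C with OpenMP, not taken from the original slides (the array, its size, and the two chosen operations are illustrative assumptions), contrasting the two styles:

/* Hypothetical sketch: data vs. task parallelism with OpenMP.
   Compile with: gcc -fopenmp parallelism.c */
#include <stdio.h>

#define N 1000000
static double a[N];

int main(void)
{
    /* Data parallelism: every thread applies the SAME operation (scaling)
       to a DIFFERENT subset of the array elements. */
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        a[i] = a[i] * 2.0;

    /* Task parallelism: each section is a DIFFERENT operation; the two
       sections may run on different cores at the same time. */
    double sum = 0.0, max = a[0];
    #pragma omp parallel sections
    {
        #pragma omp section
        { for (int i = 0; i < N; i++) sum += a[i]; }                 /* task 1: sum */

        #pragma omp section
        { for (int i = 1; i < N; i++) if (a[i] > max) max = a[i]; }  /* task 2: max */
    }
    printf("sum = %f, max = %f\n", sum, max);
    return 0;
}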

Operating System Concepts – 10th Edition 4.5 Silberschatz, Galvin and Gagne ©2018
Data and Task Parallelism

Operating System Concepts – 10th Edition 4.6 Silberschatz, Galvin and Gagne ©2018
Simple perspective: Three types of parallelism

1. Shared memory systems, i.e., systems with multiple processing units attached to a single memory.
2. Distributed systems, i.e., systems consisting of many computer units, each with its own processing unit and its physical memory, that are connected with fast interconnection networks.
3. Graphic processor units used as co-processors for solving general-purpose numerically intensive problems.

Trobec, R., Slivnik, B., Bulić, P., & Robič, B. (2018). Introduction to Parallel Computing. Undergraduate Topics in Computer Science. Springer.
Operating System Concepts – 10th Edition 4.7 Silberschatz, Galvin and Gagne ©2018
Multicomputers vs multiprocessor parallel systems

Multicomputer parallel system Multiprocessor parallel system

Raja Malleswara Rao Pattamsetti (2017). Distributed Computing in Java 9. Packt Publishing.
Operating System Concepts – 10th Edition 4.8 Silberschatz, Galvin and Gagne ©2018
Coupling and parallelism perspective
▪ Tightly coupled multiprocessors (with UMA shared memory). These may
be either switch-based (e.g., NYU Ultracomputer, RP3) or bus-based (e.g.,
Sequent, Encore).
▪ Tightly coupled multiprocessors (with NUMA shared memory or that
communicate by message passing). Examples are the SGI Origin 2000 and
the Sun Ultra HPC servers (that communicate via NUMA shared memory),
and the hypercube and the torus (that communicate by message passing).

Ajay D. Kshemkalyani, Mukesh Singhal (2008). Distributed Computing: Principles, Algorithms, and Systems. Cambridge University
Press.
Operating System Concepts – 10th Edition 4.9 Silberschatz, Galvin and Gagne ©2018
NUMA and CPU Scheduling
If the operating system is NUMA-aware, it will assign memory closest to the CPU on which the thread is running.

Processor Affinity

Operating System Concepts – 10th Edition 5.10 Silberschatz, Galvin and Gagne ©2018
Coupling and parallelism perspective
▪ Loosely coupled multicomputers (without shared memory) physically
colocated. These may be bus-based (e.g., NOW connected by a LAN or
Myrinet card) or using a more general communication network, and the
processors may be heterogeneous. In such systems, processors neither
share memory nor have a common clock, and hence may be classified as
distributed systems – however, the processors are very close to one
another, which is characteristic of a parallel system.
▪ Loosely coupled multicomputers (without shared memory and without
common clock) that are physically remote. These correspond to the
conventional notion of distributed systems.

Ajay D. Kshemkalyani, Mukesh Singhal (2008). Distributed Computing: Principles, Algorithms, and Systems. Cambridge University
Press.
Operating System Concepts – 10th Edition 19.11 Silberschatz, Galvin and Gagne ©2018
Parallel vs distributed: Coupling perspective summary

The main difference between the two is that a parallel computing system consists of multiple processors that communicate with each other through shared memory, or of multicomputers that do not share memory but are physically co-located, whereas a distributed computing system contains multiple computers, connected by a communication network, that are physically remote.

Raja Malleswara Rao Pattamsetti (2017). Distributed Computing in Java 9. Packt Publishing.
Ajay D. Kshemkalyani, Mukesh Singhal (2008). Distributed Computing: Principles, Algorithms, and Systems. Cambridge
University Press.
Operating System Concepts – 10th Edition 4.12 Silberschatz, Galvin and Gagne ©2018
Parallel vs distributed: Coupling perspective summary

▪ Parallel system
• Shared memory, or
• Physically co-located (very close together)
IBM's Blue Gene/P massively parallel supercomputer

▪ Distributed system
• Physically remote

Cloud computing

Operating System Concepts – 10th Edition 4.13 Silberschatz, Galvin and Gagne ©2018
Concurrency
Concurrency is a property of a system representing the fact that multiple activities are executed at the same time (time-shared or in parallel).
The activities may interact with one another.
They all share the same underlying challenges: providing mechanisms to control the different flows of execution via coordination and synchronization, while ensuring consistency.
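As an illustrative sketch (not part of the cited text; the shared counter and the iteration count are assumptions), the following C fragment shows the kind of coordination and synchronization mechanism referred to above: a mutex serializes concurrent updates to shared state so that the result stays consistent.

/* Minimal sketch of concurrency control: a mutex keeps a shared counter
   consistent while two threads update it concurrently. */
#include <pthread.h>
#include <stdio.h>

static long counter = 0;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg)
{
    for (int i = 0; i < 1000000; i++) {
        pthread_mutex_lock(&lock);    /* coordination: one flow of execution at a time */
        counter++;                    /* the shared state stays consistent             */
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("counter = %ld\n", counter);   /* always 2000000 with the mutex */
    return 0;
}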

[1] Erb, B. (2012). Concurrent Programming for Scalable Web Architectures

Operating System Concepts – 10th Edition 4.14 Silberschatz, Galvin and Gagne ©2018
Concurrency
(No real parallelism)
▪ Sequential execution on single-core system + Synchronization

(Real parallelism)
▪ Parallel execution on a multi-core system + Synchronization

Operating System Concepts – 10th Edition 4.15 Silberschatz, Galvin and Gagne ©2018
Concurrency vs. Parallelism
Concurrency is a conceptual property of a program, while parallelism is a runtime state. [1]
In terms of scheduling, parallelism can only be achieved if the hardware architecture supports parallel execution, as multi-core or multi-processor systems do. A single-core machine can also execute multiple threads concurrently; however, it can never provide true parallelism. [1]

[1] Erb, B. (2012). Concurrent Programming for Scalable Web Architectures

Operating System Concepts – 10th Edition 4.16 Silberschatz, Galvin and Gagne ©2018
Sequential vs Concurrency

Operating System Concepts – 10th Edition 4.17 Silberschatz, Galvin and Gagne ©2018
Concurrency and Parallelism

Operating System Concepts – 10th Edition 4.18 Silberschatz, Galvin and Gagne ©2018
Concurrency vs. Parallelism
Parallel computing is closely related to concurrent computing—they are
frequently used together, and often conflated, though the two are distinct: it is
possible to have parallelism without concurrency (such as bit-level
parallelism), and concurrency without parallelism (such as multitasking by
time-sharing on a single-core CPU).[1]

[1] "Concurrency is not Parallelism", Waza conference Jan 11, 2012, Rob Pike

Operating System Concepts – 10th Edition 4.19 Silberschatz, Galvin and Gagne ©2018
Parallel vs distributed: Concurrency perspective

In parallel computing, a computational task is typically broken down into several, often many, very similar sub-tasks that can be processed independently and whose results are combined afterwards, upon completion. In contrast, in concurrent computing, the various processes often do not address related tasks; when they do, as is typical in distributed computing, the separate tasks may have a varied nature and often require some inter-process communication during execution. [2]
Distributed systems are inherently concurrent and parallel, thus concurrency control (synchronization) is also essential. [1]

[1] Erb, B. (2012). Concurrent Programming for Scalable Web Architectures


[2] "Parallelism vs. Concurrency". Haskell Wiki
Operating System Concepts – 10th Edition 4.20 Silberschatz, Galvin and Gagne ©2018
Parallelism systems

Operating System Concepts – 10th Edition 4.21 Silberschatz, Galvin and Gagne ©2018
Parallel vs distributed: Concurrency perspective summary

▪ Parallel system
• Parallelism
• No concurrency

IBM's Blue Gene/P massively parallel supercomputer

▪ Distributed system
• Parallelism
• Concurrency

Cloud computing

Operating System Concepts – 10th Edition 4.22 Silberschatz, Galvin and Gagne ©2018
Parallel vs distributed: summary

▪ Parallel system
• Shared memory, or
• Physically co-located (very close together)
• No concurrency
HPC: IBM's Blue Gene/P massively parallel supercomputer

▪ Distributed system
• Physically remote
• Concurrency
Cloud computing

Operating System Concepts – 10th Edition 4.23 Silberschatz, Galvin and Gagne ©2018
Parallel vs distributed: What for perspective

▪ Parallel system
• High performance
HPC: IBM's Blue Gene/P massively parallel supercomputer

▪ Distributed system
• High availability
Cloud computing

Operating System Concepts – 10th Edition 4.24 Silberschatz, Galvin and Gagne ©2018
Parallel vs distributed: Who for perspective

▪ Parallel system
• Scientists
HPC: IBM's Blue Gene/P massively parallel supercomputer

▪ Distributed system
• Business
Cloud computing

Operating System Concepts – 10th Edition 4.25 Silberschatz, Galvin and Gagne ©2018
Practice: Parallel or distributed system?
If the operating system is NUMA-aware, it will assign memory closest to the CPU on which the thread is running.

Processor Affinity

Operating System Concepts – 10th Edition 5.26 Silberschatz, Galvin and Gagne ©2018
Practice: Parallel or distributed system?

Operating System Concepts – 10th Edition 4.27 Silberschatz, Galvin and Gagne ©2018
Practice: Parallel or distributed system?

Operating System Concepts – 10th Edition 4.28 Silberschatz, Galvin and Gagne ©2018
Practice: Parallel or distributed system?

Operating System Concepts – 10th Edition 4.30 Silberschatz, Galvin and Gagne ©2018
Practice: Parallel or distributed system?

Operating System Concepts – 10th Edition 4.31 Silberschatz, Galvin and Gagne ©2018
Amdahl’s Law
▪ Identifies performance gains from adding additional cores to an application that has both serial and parallel components
▪ S is the serial portion
▪ N processing cores
▪ speedup ≤ 1 / (S + (1 − S) / N)
▪ That is, if an application is 75% parallel / 25% serial, moving from 1 to 2 cores results in a speedup of 1.6 times
▪ As N approaches infinity, speedup approaches 1 / S

The serial portion of an application has a disproportionate effect on the performance gained by adding additional cores.
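A small sketch (not in the original slides) that evaluates the formula above for the 75%-parallel example and for growing core counts:

/* Sketch: evaluating Amdahl's Law, speedup(N) = 1 / (S + (1 - S) / N). */
#include <stdio.h>

static double amdahl_speedup(double serial_fraction, int cores)
{
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores);
}

int main(void)
{
    const double S = 0.25;                  /* 25% serial, 75% parallel */
    const int cores[] = { 1, 2, 4, 8, 16 };

    for (size_t i = 0; i < sizeof cores / sizeof cores[0]; i++)
        printf("N = %2d  speedup = %.2f\n", cores[i], amdahl_speedup(S, cores[i]));

    /* N = 2 prints 1.60; as N grows, the speedup approaches 1 / S = 4. */
    return 0;
}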
Operating System Concepts – 10th Edition 4.32 Silberschatz, Galvin and Gagne ©2018
Amdahl’s Law

Operating System Concepts – 10th Edition 4.33 Silberschatz, Galvin and Gagne ©2018
Modeling Parallel Computation
▪ Parallel computers vary greatly in their organization.
• Processing units may or may not be directly connected to one another
• May share a common memory or have local memories
• Processing units may be synchronized by a common clock
• Architectural details and hardware specifics of the components
• Different clock rates, memory access times, etc.

Which properties of parallel computers must be considered, and which may be ignored, in the design and analysis of parallel algorithms?

Trobec, R., Slivnik, B., Bulić, P., & Robič, B. (2018). Introduction to Parallel Computing. Undergraduate Topics in Computer Science. Springer.
Operating System Concepts – 10th Edition 4.34 Silberschatz, Galvin and Gagne ©2018
Modeling Parallel Computation
▪ Parallel Random Access Machine (PRAM)
• No interconnection network
• Any processing unit can access any
memory location
▪ The variations are
• Exclusive Read Exclusive Write PRAM
(EREW-PRAM)
• Concurrent Read Exclusive Write PRAM
(CREW-PRAM)
• Concurrent Read Concurrent Write PRAM
(CRCW-PRAM)

Trobec, R., Slivnik, B., Bulić, P., & Robič, B. (2018). Introduction to Parallel Computing. Undergraduate Topics in Computer Science. Springer.
Operating System Concepts – 10th Edition 4.35 Silberschatz, Galvin and Gagne ©2018
Modeling Parallel Computation
▪ The Local-Memory Machine (LMM)
• The LMM model has p processing units,
each with its own local memory.
• Common interconnection network.

▪ The Memory-Module Machine (MMM)
• Consists of p processing units and m memory modules, each of which can be accessed by any processing unit via a common interconnection network.
• There are no local memories attached to the processing units.

Trobec, R., Slivnik, B., Bulić, P., & Robič, B. (2018). Introduction to Parallel Computing. Undergraduate Topics in Computer Science. Springer.
Operating System Concepts – 10th Edition 4.36 Silberschatz, Galvin and Gagne ©2018
Programming parallel systems
▪ Two main differences between the shared-memory and distributed-memory computer architectures:
• The price of communication
• The number of processors that can cooperate efficiently, which favors distributed-memory computers

Operating System Concepts – 10th Edition 4.38 Silberschatz, Galvin and Gagne ©2018
Programming parallel systems

Multicomputer parallel system → MPI processes and messaging (OpenMPI)
Multiprocessor parallel system → multi-core shared-memory systems (OpenMP)

Raja Malleswara Rao Pattamsetti (2017). Distributed Computing in Java 9. Packt Publishing.
Operating System Concepts – 10th Edition 4.39 Silberschatz, Galvin and Gagne ©2018
OpenMP
▪ Set of compiler directives and an
API for C, C++, FORTRAN
▪ Provides support for parallel
programming in shared-memory
environments
▪ Identifies parallel regions –
blocks of code that can run in
parallel
#pragma omp parallel
Create as many threads as there are
cores
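The code pictured on this slide is not reproduced in the extracted text; a minimal sketch in the spirit of the textbook's example (the printed message is an assumption) is:

/* Compile with: gcc -fopenmp parallel_region.c
   The parallel region is executed once by each thread in the team. */
#include <stdio.h>

int main(void)
{
    /* sequential code */
    #pragma omp parallel
    {
        printf("I am a parallel region.\n");
    }
    /* sequential code */
    return 0;
}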

Operating System Concepts – 10th Edition 4.40 Silberschatz, Galvin and Gagne ©2018
OpenMP

Operating System Concepts – 10th Edition 4.41 Silberschatz, Galvin and Gagne ©2018
OpenMP
▪ Run the for loop in parallel
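The loop pictured on the slide is not reproduced in the extracted text; a minimal sketch of the idea (the arrays a, b, c and the size N are assumptions) is:

/* Compile with: gcc -fopenmp vector_add.c
   The loop iterations are divided among the threads of the team, so each
   element c[i] is computed exactly once, by one thread. */
#include <stdio.h>

#define N 1000
static double a[N], b[N], c[N];

int main(void)
{
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        c[i] = a[i] + b[i];

    printf("c[0] = %f\n", c[0]);
    return 0;
}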

Operating System Concepts – 10th Edition 4.42 Silberschatz, Galvin and Gagne ©2018
OpenMP

https://www.openmp.org/wp-content/uploads/OpenMPRef-5.0-0519-web.pdf
Operating System Concepts – 10th Edition 4.43 Silberschatz, Galvin and Gagne ©2018
Open MPI Project (OpenMPI)
▪ Message Passing Interface (MPI)
▪ Programmers have to be aware that cooperation among processes implies data exchange.
▪ Total execution time = computation time + communication time
▪ Algorithms with only local communication between
neighboring processors are faster and more scalable
▪ Issues related to communication are efficiently solved by the
MPI specification.
▪ The MPI library interface is a specification, not an
implementation.
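A hedged sketch (not taken from the cited book; the ring pattern and the buffer names are assumptions) of the local, neighbour-to-neighbour communication the last bullets refer to:

/* Compile with: mpicc ring.c    Run with: mpirun -np 4 ./a.out
   Each rank exchanges one value only with its neighbours, i.e., local
   communication; one computation step would precede the exchange. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int right = (rank + 1) % size;          /* neighbouring processes */
    int left  = (rank - 1 + size) % size;
    double sendbuf = (double)rank, recvbuf = -1.0;

    MPI_Sendrecv(&sendbuf, 1, MPI_DOUBLE, right, 0,
                 &recvbuf, 1, MPI_DOUBLE, left,  0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    printf("rank %d received %.0f from rank %d\n", rank, recvbuf, left);
    MPI_Finalize();
    return 0;
}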

Trobec, R., Slivnik, B., Bulić, P., & Robič, B. (2018). Introduction to Parallel Computing. Undergraduate Topics in Computer Science. Springer.
Operating System Concepts – 10th Edition 4.44 Silberschatz, Galvin and Gagne ©2018
Open MPI
MPI_Reduce is the means by which MPI processes apply a reduction calculation. The values sent by the MPI processes are combined using the given reduction operation, and the result is stored on the MPI process specified as root.
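A minimal sketch of the idea (the local values and the use of MPI_SUM are illustrative assumptions, not taken from the tutorial):

/* Each process contributes one value; MPI_Reduce combines the values with
   MPI_SUM and stores the result only on the root process (rank 0). */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double local = rank + 1.0;    /* value computed by this process      */
    double total = 0.0;           /* meaningful only on the root process */

    MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("sum of all local values = %.1f\n", total);

    MPI_Finalize();
    return 0;
}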

https://mpitutorial.com/tutorials/mpi-reduce-and-allreduce/

Operating System Concepts – 10th Edition 4.45 Silberschatz, Galvin and Gagne ©2018
AWS ParallelCluster

AWS ParallelCluster is an AWS-supported open-source cluster management tool that helps you deploy and manage High Performance Computing (HPC) clusters in the AWS Cloud. Built on the open-source CfnCluster project, AWS ParallelCluster enables you to quickly build an HPC compute environment in AWS. It automatically sets up the required compute resources and shared filesystem. You can use AWS ParallelCluster with batch schedulers, such as AWS Batch and Slurm. AWS ParallelCluster facilitates quick proof-of-concept deployments as well as production deployments. You can also build higher-level workflows, such as a genomics portal that automates an entire DNA sequencing workflow, on top of AWS ParallelCluster.

https://docs.aws.amazon.com/parallelcluster/latest/ug/what-is-aws-parallelcluster.html
Operating System Concepts – 10th Edition 4.48 Silberschatz, Galvin and Gagne ©2018
