02 - Lecture #2
Computing
LECTURE #2
Agenda
o What is parallel computing?
o Why Parallel Computers?
o Motivation
o Inevitability of parallel computing
o Application demands
o Technology and architecture trends
o Terminologies
o How is parallelism expressed in a program
o Challenges
What is parallel computing?
Multiple processors cooperating concurrently to solve one problem.
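As a concrete illustration of this definition, here is a minimal sketch (an illustrative example, not from the slides; the language, chunking, and worker count are my own choices) in which several processes cooperate on one problem, summing a large list, using Python's multiprocessing module.

```python
# Minimal sketch: several processes cooperate to solve one problem (a large sum).
from multiprocessing import Pool

def partial_sum(chunk):
    # Each worker independently sums its own chunk of the data.
    return sum(chunk)

if __name__ == "__main__":
    data = list(range(1_000_000))
    n_workers = 4
    # Decompose the problem: one chunk of the data per worker.
    chunks = [data[i::n_workers] for i in range(n_workers)]
    with Pool(n_workers) as pool:
        partials = pool.map(partial_sum, chunks)  # chunks are summed concurrently
    print(sum(partials))                          # combine the partial results
```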
What is parallel computing?
• Processors are now capable of executing multiple instructions in the same cycle
◦ The instruction cycle is the fundamental sequence of steps that a CPU performs. Also known as the "fetch-execute cycle," it is the time in which a single instruction is fetched from memory, decoded, and executed.
◦ The first half of the cycle transfers the instruction from memory to the instruction register and decodes it; the second half executes the instruction.
Why parallel computing?
◦ Hence, a multiplicity of data paths is needed to increase access to storage elements (memory & disk)
Motivation
• Sequential architectures are reaching their physical limits.
Inevitability of parallel computing
Application demand for performance
• Scientific: weather forecasting, pharmaceutical design, genomics
• Commercial: OLTP, search engine, decision support, data mining
• Scalable web servers
Application Demand: Inevitability of Parallel Computing
Engineering
• Earthquake and structural modeling
• Design and simulation of micro- and nano-scale systems
• Optimizing performance of modern automobiles
Computational Sciences
• Bioinformatics: functional and structural characterization of genes and proteins
• Astrophysics: exploring the evolution of galaxies
• Weather modeling, flood/tornado prediction
Commercial
• Data mining and analysis for optimizing business and marketing decisions
• Database and Web servers for online transaction processing
Computers
• Embedded systems increasingly rely on distributed control algorithms
• Network intrusion detection, cryptography, etc.
• Networks, mail-servers, search engines…
• Visualization architectures & entertainment
Simulation
Traditional scientific and engineering paradigm:
1) Do theory or paper design.
2) Perform experiments or build system.
Limitations:
– Too difficult -- build large wind tunnels.
– Too expensive -- build a throw-away passenger jet.
– Too slow -- wait for climate or galactic evolution.
– Too dangerous -- weapons, drug design, climate experimentation.
Technology and architecture
1- Processor Capacity
2- Transistor Count
4- Similar Story for Memory and Disk
❖ Divergence between memory capacity and speed
o Capacity increased by 1000X from 1980-95, speed only 2X
o Larger memories are slower while processors get faster: the “memory wall”
- Need to transfer more data in parallel
- Need deeper cache hierarchies
- Parallelism helps hide memory latency
5- Role of Architecture
Greatest trend in VLSI is an increase in the exploited parallelism
• Up to 1985: bit-level parallelism:
– 4-bit -> 8-bit -> 16-bit; slows after 32-bit
• Nowadays:
– Hyper-threading
– Multi-core
❖ Definition
High-performance computing (HPC) is the use of parallel processing for
running advanced application programs efficiently, reliably and quickly.
Conclusions
• The hardware evolution, driven by Moore’s law, was geared toward two
things:
– exploiting parallelism
– dealing with memory (latency, capacity)
Terminologies
❑ Core: a single computing unit with its own independent control
❑ Multicore: a processor having several cores that can access the same memory concurrently
❑ A computation is decomposed into several parts, called tasks, that can be computed in parallel
❑ Finding enough parallelism is one of the critical steps for high performance (Amdahl’s law).
Performance Metrics
❑ Execution time:
The time elapsed between the beginning and the end of a program’s execution.
❑ Speedup:
The ratio of serial execution time to parallel execution time.
Speedup = Ts / Tp
❑ Efficiency:
The ratio of speedup to the number of processors.
Efficiency = Speedup / P
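A worked example with assumed numbers (not from the slides): if a program takes Ts = 12 s serially and Tp = 4 s on P = 4 processors, then Speedup = 12/4 = 3 and Efficiency = 3/4 = 0.75. The small Python sketch below computes the same metrics.

```python
# Illustrative sketch of the two metrics; the timing values are assumed examples.
def speedup(ts, tp):
    return ts / tp                    # Speedup = Ts / Tp

def efficiency(ts, tp, p):
    return speedup(ts, tp) / p        # Efficiency = Speedup / P

print(speedup(12.0, 4.0))             # 3.0
print(efficiency(12.0, 4.0, 4))       # 0.75
```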
❑ Amdahl’s Law
Used to predict maximum speedup using multiple processors.
• Let f = fraction of work performed sequentially.
• (1 - f) = fraction of work that is parallelizable.
• P = number of processors
On 1 CPU: T1 = f + (1 − f) = 1
On P processors: Tp = f + (1 − f)/P
• Speedup = T1/Tp = 1 / (f + (1 − f)/P) < 1/f
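To make the bound concrete, here is a small worked sketch (illustrative values chosen here, not from the slides): with f = 0.1 and P = 8, the formula gives Speedup = 1 / (0.1 + 0.9/8) ≈ 4.7, and no number of processors can push the speedup past 1/f = 10.

```python
# Illustrative helper for Amdahl's Law; f = 0.1 is an assumed example value.
def amdahl_speedup(f, p):
    """Predicted speedup for sequential fraction f on p processors."""
    return 1.0 / (f + (1.0 - f) / p)

for p in (2, 8, 64, 1024):
    print(p, round(amdahl_speedup(0.1, p), 2))
# The output approaches, but never exceeds, 1/f = 10 as p grows.
```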
How is parallelism expressed in a
program
IMPLICITLY
❑ Define tasks only, rest implied; or define tasks and work decomposition, rest implied.
EXPLICITLY
❑ Define tasks, work decomposition, data decomposition, communication, synchronization.
1- IMPLICITLY
❑ It is a characteristic of a programming language that allows a compiler or interpreter to
automatically exploit the parallelism inherent to the computations expressed by some of the
language's constructs.
❑ A pure implicitly parallel language does not need special directives, operators or functions
to enable parallel execution.
❑ Programming languages with implicit parallelism include Axum, HPF, Id, LabVIEW, and MATLAB M-code, among others.
❑ Example: when taking the sine or logarithm of a group of numbers, a language that provides implicit parallelism might allow the programmer to write the instruction as in the sketch below.
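The slide's original code example is not reproduced here, so the following is a hedged sketch in Python with NumPy (my choice of language, not the slide's): the programmer writes a single whole-array expression with no parallel directives, and the library or runtime is free to evaluate it over many elements at once.

```python
# Illustrative sketch of implicit parallelism (assumed example, not the slide's original).
import numpy as np

numbers = np.linspace(0.0, 10.0, 1_000_000)

# One elementwise expression; no explicit tasks, threads, or communication.
# A vectorizing or parallelizing runtime may apply sin to many elements at once.
result = np.sin(numbers)
print(result[:5])
```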
Advantages
❑ The programmer does not need to worry about task division or process communication.
Disadvantages
❑ It reduces the control that the programmer has over the parallel execution of the program.
2- EXPLICITLY
How is parallelism expressed in a program
❑ It is the representation of concurrent computations by means of primitives in the form of special-purpose directives or function calls.
❑ The programmer must specify how the work is decomposed into tasks and whether they need info from each other to finish their own job → (Communication); a minimal sketch follows below.
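As an illustrative sketch (language and API chosen here, not specified by the slides), the Python code below makes the parallelism explicit: the programmer decomposes the data, creates the worker processes, and handles communication of partial results and synchronization.

```python
# Explicit parallelism sketch (illustrative, not from the slides):
# decomposition, process creation, communication, and synchronization are all spelled out.
from multiprocessing import Process, Queue

def worker(chunk, out):
    out.put(sum(x * x for x in chunk))         # explicit communication of a partial result

if __name__ == "__main__":
    data = list(range(100_000))
    n = 4
    chunks = [data[i::n] for i in range(n)]    # explicit data decomposition
    out = Queue()
    procs = [Process(target=worker, args=(c, out)) for c in chunks]
    for p in procs:
        p.start()                              # explicit task creation
    total = sum(out.get() for _ in procs)      # explicit communication
    for p in procs:
        p.join()                               # explicit synchronization
    print(total)
```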
Challenges
All parallel programs contain:
❑ Parallel sections
❑ Serial sections
❑ Serial sections are where work is being duplicated or no useful work is being done (e.g., waiting for others)
Sources of overhead in parallel programs
❑ Inter-process interaction:
The time spent communicating data between processing elements is
usually the most significant source of parallel processing overhead.
❑ Idling:
Processes may become idle due to many reasons such as load
imbalance, synchronization, and presence of serial components in a
program.
❑ Excess Computation:
The fastest known sequential algorithm for a problem may be difficult
or impossible to parallelize, forcing us to use a parallel algorithm
based on a poorer but easily parallelizable sequential algorithm.