02 - Lecture #2

High Performance Computing (HPC) utilizes multiple processors working concurrently to solve large problems faster. The increasing demands of applications and limitations of sequential processing due to physical constraints have made parallel computing inevitable. Key terminology includes core, multicore, task, speedup, efficiency, and Amdahl's law, which is used to predict maximum speedup using multiple processors based on the sequential fraction of a problem. Parallelism can be expressed in programs implicitly through defined tasks or explicitly through programming models that specify the coordination of parallel elements.


High Performance Computing
LECTURE #2

Agenda
o What is parallel computing?
o Why Parallel Computers?
o Motivation
o Inevitability of parallel computing
o Application demands
o Technology and architecture trends
o Terminologies
o How is parallelism expressed in a program
o Challenges
What is parallel computing?
Multiple processors cooperating concurrently to solve one problem.

What is parallel computing?

“A parallel computer is a collection of processing elements that can communicate and cooperate to solve large problems fast”
Almasi/Gottlieb

“communicate and cooperate”
• Nodes and interconnect architecture
• Problem partitioning (coordination of events in a process)

“large problems fast”
• Programming model
• Match of model and architecture

What is parallel computing?

Some broad issues:


• Resource Allocation:
– How large a collection?
– How powerful are the elements?

• Data access, Communication and Synchronization


– How are data transmitted between processors?
– How do the elements cooperate and communicate?
– What are the abstractions and primitives for cooperation?

• Performance and Scalability


– How does it all translate into performance?
– How does it scale? (A service is said to be scalable when its performance grows in proportion to the resources added.)
Why Parallel Computers?
• Tremendous advances in microprocessor technology,
e.g., clock rates of processors increased from 40 MHz (MIPS R3000, 1988)
to 2.0 GHz (Pentium 4, 2002)
and nowadays 8.429 GHz (AMD's Bulldozer-based FX chips, 2012)

• Processors are now capable of executing multiple instructions in the same cycle

◦ The instruction cycle is the fundamental sequence of steps that a CPU performs. Also known as the "fetch-execute cycle," it is the time in which a single instruction is fetched from memory, decoded and executed.

◦ The first half of the cycle transfers the instruction from memory to the instruction register and decodes it; the second half executes the instruction.
Why parallel computing?

• The ability of the memory system to feed data to the processor at the required rate has increased.

• In addition, significant innovations in architecture and software have addressed the mitigation of bottlenecks posed by the data path and memory.

◦ Hence, a multiplicity of data paths is used to increase access to storage elements (memory & disk).

Motivation
• Sequential architectures are reaching physical limitations.

• Uniprocessor architectures will not be able to sustain the rate of performance increments in the future.

• Computation requirements are ever increasing:
-- visualization, distributed databases,
-- simulations, scientific prediction (earthquake), etc.

• Accelerating applications.

Inevitability of parallel computing
Application demand for performance
• Scientific: weather forecasting, pharmaceutical design, genomics
• Commercial: OLTP, search engine, decision support, data mining
• Scalable web servers

Technology and architecture trends


• limits to sequential CPU, memory, storage performance
• parallelism is an effective way of utilizing the growing number of transistors.
• low incremental cost of supporting parallelism

Application Demand: Inevitability of Parallel Computing

Engineering
• Earthquake and structural modeling.
• Design and simulation of micro- and nano-scale systems.
• Optimizing performance of modern automobiles.

Computational Sciences
• Bioinformatics: functional and structural characterization of genes and proteins.
• Astrophysics: exploring the evolution of galaxies.
• Weather modeling, flood/tornado prediction.

Commercial
• Data mining and analysis for optimizing business and marketing decisions.
• Database and Web servers for online transaction processing.

Computers
• Embedded systems increasingly rely on distributed control algorithms.
• Network intrusion detection, cryptography, etc.
• Networks, mail servers, search engines.
• Visualization architectures & entertainment.
• Simulation.

Traditional scientific and engineering paradigm:
1) Do theory or paper design.
2) Perform experiments or build system.

Limitations:
– Too difficult -- build large wind tunnels.
– Too expensive -- build a throw-away passenger jet.
– Too slow -- wait for climate or galactic evolution.
– Too dangerous -- weapons, drug design, climate experimentation.
Technology and architecture

1- Processor Capacity

2- Transistor Count

40% more functions can be performed by a CPU per year


Fundamentally, the use of more transistors improves performance in two ways:
◦ Parallelism: multiple operations done at once (less processing time)
◦ Locality: data references performed close to the processor (less memory latency)
3- Clock Rate

30% per year → today's PC is yesterday's supercomputer

4- Similar Story for Memory and Disk
❖ Divergence between memory capacity and speed
o Capacity increased by 1000X from 1980-95, speed only 2X
o Larger memories are slower, while processors get faster: the “memory wall”
- Need to transfer more data in parallel
- Need deeper cache hierarchies
- Parallelism helps hide memory latency

❖ Parallelism within memory systems too


o New designs fetch many bits within memory chip, follow with fast pipelined
transfer across narrower interface

5- Role of Architecture
Greatest trend in VLSI is an increase in the exploited parallelism
• Up to 1985: bit level parallelism:
– 4-bit → 8-bit → 16-bit; slows after 32-bit

• Mid 80s to mid 90s: Instruction Level Parallelism (ILP)


– pipelining and simple instruction sets (RISC)
– on-chip caches and functional units => superscalar execution
– Greater sophistication: out of order execution, speculation

• Nowadays:
– Hyper-threading
– Multi-core
❖ Definition
High-performance computing (HPC) is the use of parallel processing for
running advanced application programs efficiently, reliably and quickly.

Conclusions
• The hardware evolution, driven by Moore’s law, was geared toward two
things:
– exploiting parallelism
– dealing with memory (latency, capacity)

Terminologies
❑ Core: a single computing unit with its own independent control.

❑ Multicore: a processor having several cores that can access the same memory concurrently.

❑ A computation is decomposed into several parts, called tasks, that can be computed in parallel.

❑ Finding enough parallelism is (one of the) critical steps for high performance (Amdahl's law).

Performance Metrics

❑ Execution time:
The time elapsed between the beginning and the end of a program's execution.

❑ Speedup:
The ratio between serial and parallel execution time.
Speedup = Ts / Tp

❑ Efficiency:
The ratio of speedup to the number of processors.
Efficiency = Speedup / P
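
For instance, a tiny C sketch of these two formulas (the values Ts = 120 s, Tp = 20 s and P = 8 are assumed example numbers, not measurements from the lecture):

#include <stdio.h>

int main(void) {
    double Ts = 120.0;   /* serial execution time in seconds (assumed)   */
    double Tp = 20.0;    /* parallel execution time in seconds (assumed) */
    int    P  = 8;       /* number of processors (assumed)               */

    double speedup    = Ts / Tp;       /* Speedup    = Ts / Tp     */
    double efficiency = speedup / P;   /* Efficiency = Speedup / P */

    printf("speedup = %.2f, efficiency = %.2f\n", speedup, efficiency);
    return 0;
}

With these numbers the program prints speedup = 6.00 and efficiency = 0.75, i.e., 75% of the ideal 8x speedup.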
❑ Amdahl’s Law
Used to predict maximum speedup using multiple processors.
• Let f = fraction of work performed sequentially.
• (1 - f) = fraction of work that is parallelizable.
• P = number of processors
On 1 CPU: T1 = f + (1 - f) = 1.
On P processors: Tp = f + (1 - f)/P

• Speedup:
Speedup = T1 / Tp = 1 / (f + (1 - f)/P) < 1/f

Speedup is limited by the sequential part.
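
A short C sketch of this bound (the sequential fraction f = 0.10 is an assumed example value): it evaluates the speedup formula for several processor counts and shows that the speedup can never exceed 1/f.

#include <stdio.h>

int main(void) {
    double f = 0.10;                      /* sequential fraction (assumed) */
    int counts[] = {1, 2, 4, 8, 16, 1024};
    int n = sizeof(counts) / sizeof(counts[0]);

    for (int i = 0; i < n; i++) {
        int P = counts[i];
        double speedup = 1.0 / (f + (1.0 - f) / P);   /* Amdahl's law */
        printf("P = %4d  speedup = %5.2f\n", P, speedup);
    }
    printf("upper bound = 1/f = %.2f\n", 1.0 / f);    /* limit as P grows */
    return 0;
}

With f = 0.10, 16 processors give a speedup of only about 6.4, and even 1024 processors reach roughly 9.9, never the bound of 10.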

How is parallelism expressed in a program
IMPLICITLY
❑ Define tasks only, rest implied; or define tasks and work decomposition, rest implied.
❑ OpenMP is a high-level parallel programming model, which is mostly an implicit model (a minimal sketch follows below).

EXPLICITLY
❑ Define tasks, work decomposition, data decomposition, communication, synchronization.
❑ MPI is a library for fully explicit parallelization (a minimal sketch appears in the Explicitly section below).
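
As a rough illustration of the implicit style (a minimal sketch, not taken from the lecture slides; with gcc it would be compiled with something like gcc -fopenmp prog.c -lm), the OpenMP pragma only asserts that the loop iterations are independent; the work decomposition, scheduling and synchronization are left to the compiler and runtime:

#include <math.h>
#include <stdio.h>

#define N 1000000

int main(void) {
    static double x[N], y[N];            /* input and output arrays */
    for (int i = 0; i < N; i++)
        x[i] = (double)i;

    /* The pragma states that the iterations are independent; the
       runtime decides how to split them across the available cores. */
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        y[i] = sin(x[i]);

    printf("y[1] = %f\n", y[1]);         /* sin(1.0), about 0.841471 */
    return 0;
}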

1- IMPLICITLY
❑ It is a characteristic of a programming language that allows a compiler or interpreter to
automatically exploit the parallelism inherent to the computations expressed by some of the
language's constructs.

❑ A pure implicitly parallel language does not need special directives, operators or functions
to enable parallel execution.

❑ Programming languages with implicit parallelism include Axum, HPF, Id, LabVIEW and MATLAB M-code, among others.

❑ Example: when taking the sine or logarithm of a group of numbers, a language that provides implicit parallelism might allow the programmer to write the instruction as follows:
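
A minimal illustration (not from the original slide), in an array-style notation where A is an assumed array of numbers:

y = sin(A)

The sine is applied element-wise to every element of A, and the compiler or runtime is free to parallelize the underlying loop.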

Advantages
❑ The programmer does not need to worry about task division or process communication, focusing instead on the problem that his or her program is intended to solve.
❑ It generally facilitates the design of parallel programs.

Disadvantages
❑ It reduces the control that the programmer has over the parallel execution of the program, sometimes resulting in less-than-optimal parallel efficiency.
❑ Debugging is sometimes difficult.

2- EXPLICITLY
❑ It is the representation of concurrent computations by means of primitives in the form of special-purpose directives or function calls.

❑ Most parallel primitives are related to process synchronization, communication, or task partitioning.

Advantages
❑ Absolute programmer control over the parallel execution.
❑ A skilled parallel programmer takes advantage of explicit parallelism to produce very efficient code.

Disadvantages
❑ Programming with explicit parallelism is often difficult, especially for non-computing specialists, because of the extra work involved in planning the task division and synchronization of concurrent processes.
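
For contrast with the OpenMP sketch above, a minimal MPI sketch of the explicit style (illustrative only, not from the lecture; it could be built and run with something like mpicc prog.c and mpirun -np 4 ./a.out). The programmer explicitly chooses the work decomposition and the communication call that combines the partial results:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* which process am I?       */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* how many processes exist? */

    /* Explicit work decomposition: each rank sums its own share of 1..1000. */
    long local = 0, global = 0;
    for (long i = rank + 1; i <= 1000; i += size)
        local += i;

    /* Explicit communication and synchronization: combine the partial sums. */
    MPI_Reduce(&local, &global, 1, MPI_LONG, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("sum = %ld (expected 500500)\n", global);

    MPI_Finalize();
    return 0;
}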
Think Different
How many people doing the work → (Degree of Parallelism)

What is needed to begin the work → (Initialization)

Who does what → (Work distribution)

Access to work part → (Data/IO access)

Whether they need info from each other to finish their own job → (Communication)

When are they all done → (Synchronization)

What needs to be done to collate the result

Challenges
All parallel programs contain:
❑ Parallel sections
❑ Serial sections
❑ Serial sections are where work is being duplicated or no useful work is being done (e.g., waiting for others).

Building efficient algorithms requires avoiding:


❑ Communication delay
❑ Idling
❑ Synchronization

Sources of overhead in parallel programs
❑ Inter-process interaction:
The time spent communicating data between processing elements is
usually the most significant source of parallel processing overhead.

❑ Idling:
Processes may become idle due to many reasons such as load
imbalance, synchronization, and presence of serial components in a
program.

❑ Excess Computation:
The fastest known sequential algorithm for a problem may be difficult
or impossible to parallelize, forcing us to use a parallel algorithm
based on a poorer but easily parallelizable sequential algorithm.

