02 - Lecture #2
Computing
LECTURE #2
Agenda
o What is parallel computing?
o Why Parallel Computers?
o Motivation
o Inevitability of parallel computing
o Application demands
o Technology and architecture trends
o Terminologies
o How is parallelism expressed in a program
o Challenges
What is parallel computing?
Multiple processors cooperating concurrently to solve one problem.
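As a concrete illustration of this definition, here is a minimal sketch (an illustrative example, not from the slides; the language, chunking, and worker count are my own choices) in which several processes cooperate on one problem, summing a large list, using Python's multiprocessing module.

```python
# Minimal sketch: several processes cooperate to solve one problem (a large sum).
from multiprocessing import Pool

def partial_sum(chunk):
    # Each worker independently sums its own chunk of the data.
    return sum(chunk)

if __name__ == "__main__":
    data = list(range(1_000_000))
    n_workers = 4
    # Decompose the problem: one chunk of the data per worker.
    chunks = [data[i::n_workers] for i in range(n_workers)]
    with Pool(n_workers) as pool:
        partials = pool.map(partial_sum, chunks)  # chunks are summed concurrently
    print(sum(partials))                          # combine the partial results
```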
What is parallel computing?
• Processors are now capable of executing multiple instructions in the same cycle
◦ The instruction cycle is the fundamental sequence of steps that a CPU performs. Also known as the "fetch-execute cycle," it is the time in which a single instruction is fetched from memory, decoded, and executed.
◦ The first half of the cycle transfers the instruction from memory to the instruction register and decodes it; the second half executes the instruction.
Why parallel computing?
◦ Hence, a multiplicity of data paths is needed to increase access to storage elements (memory & disk)
Motivation
• Sequential architectures are reaching their physical limits.
Inevitability of parallel computing
Application demand for performance
• Scientific: weather forecasting, pharmaceutical design, genomics
• Commercial: OLTP, search engine, decision support, data mining
• Scalable web servers
Application Demand: Inevitability of Parallel Computing
Engineering
• Earthquake and structural modeling
• Design and simulation of micro- and nano-scale systems
• Optimizing performance of modern automobiles
Computational Sciences
• Bioinformatics: functional and structural characterization of genes and proteins
• Astrophysics: exploring the evolution of galaxies
• Weather modeling, flood/tornado prediction
Commercial
• Data mining and analysis for optimizing business and marketing decisions
• Database and Web servers for online transaction processing
Computers
• Embedded systems increasingly rely on distributed control algorithms
• Network intrusion detection, cryptography, etc.
• Networks, mail-servers, search engines…
• Visualization architectures & entertainment
Simulation
Traditional scientific and engineering paradigm:
1) Do theory or paper design.
2) Perform experiments or build system.
Limitations:
– Too difficult -- build large wind tunnels.
– Too expensive -- build a throw-away passenger jet.
– Too slow -- wait for climate or galactic evolution.
– Too dangerous -- weapons, drug design, climate experimentation.
Technology and architecture
1- Processor Capacity
2- Transistor Count
4- Similar Story for Memory and Disk
❖ Divergence between memory capacity and speed
o Capacity increased by 1000X from 1980-95, speed only 2X
o Larger memories are slower while processors get faster: the “memory wall”
- Need to transfer more data in parallel
- Need deeper cache hierarchies
- Parallelism helps hide memory latency
5- Role of Architecture
Greatest trend in VLSI is an increase in the exploited parallelism
• Up to 1985: bit-level parallelism:
– 4-bit -> 8-bit -> 16-bit; slows after 32-bit
• Nowadays:
– Hyper-threading
– Multi-core
❖ Definition
High-performance computing (HPC) is the use of parallel processing for
running advanced application programs efficiently, reliably and quickly.
Conclusions
• The hardware evolution, driven by Moore’s law, was geared toward two
things:
– exploiting parallelism
– dealing with memory (latency, capacity)
Terminologies
❑ Core: a single computing unit with its own independent control
❑ Multicore: a processor having several cores that can access the same memory concurrently
❑ A computation is decomposed into several parts, called tasks, that can be computed in parallel
❑ Finding enough parallelism is one of the critical steps for high performance (Amdahl’s law).
Performance Metrics
❑ Execution time:
The time elapsed between the beginning and the end of a program’s execution.
❑ Speedup:
The ratio of serial execution time to parallel execution time.
Speedup = Ts / Tp
❑ Efficiency:
The ratio of speedup to the number of processors.
Efficiency = Speedup / P
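A worked example with assumed numbers (not from the slides): if a program takes Ts = 12 s serially and Tp = 4 s on P = 4 processors, then Speedup = 12/4 = 3 and Efficiency = 3/4 = 0.75. The small Python sketch below computes the same metrics.

```python
# Illustrative sketch of the two metrics; the timing values are assumed examples.
def speedup(ts, tp):
    return ts / tp                    # Speedup = Ts / Tp

def efficiency(ts, tp, p):
    return speedup(ts, tp) / p        # Efficiency = Speedup / P

print(speedup(12.0, 4.0))             # 3.0
print(efficiency(12.0, 4.0, 4))       # 0.75
```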
❑ Amdahl’s Law
Used to predict maximum speedup using multiple processors.
• Let f = fraction of work performed sequentially.
• (1 - f) = fraction of work that is parallelizable.
• P = number of processors
On 1 CPU: T1 = f + (1 − f) = 1
On P processors: Tp = f + (1 − f)/P
• Speedup = T1/Tp = 1 / (f + (1 − f)/P) < 1/f
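To make the bound concrete, here is a small worked sketch (illustrative values chosen here, not from the slides): with f = 0.1 and P = 8, the formula gives Speedup = 1 / (0.1 + 0.9/8) ≈ 4.7, and no number of processors can push the speedup past 1/f = 10.

```python
# Illustrative helper for Amdahl's Law; f = 0.1 is an assumed example value.
def amdahl_speedup(f, p):
    """Predicted speedup for sequential fraction f on p processors."""
    return 1.0 / (f + (1.0 - f) / p)

for p in (2, 8, 64, 1024):
    print(p, round(amdahl_speedup(0.1, p), 2))
# The output approaches, but never exceeds, 1/f = 10 as p grows.
```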
How is parallelism expressed in a
program
IMPLICITLY
❑ Define tasks only, rest implied; or define tasks and work decomposition, rest implied.
EXPLICITLY
❑ Define tasks, work decomposition, data decomposition, communication, synchronization.
1- IMPLICITLY
❑ It is a characteristic of a programming language that allows a compiler or interpreter to
automatically exploit the parallelism inherent to the computations expressed by some of the
language's constructs.
❑ A pure implicitly parallel language does not need special directives, operators or functions
to enable parallel execution.
❑ Programming languages with implicit parallelism include Axum, HPF, Id, LabVIEW, and MATLAB M-code, among others.
❑ Example: when taking the sine or logarithm of a group of numbers, a language that provides implicit parallelism might allow the programmer to write the instruction as in the sketch below.
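The slide's original code example is not reproduced here, so the following is a hedged sketch in Python with NumPy (my choice of language, not the slide's): the programmer writes a single whole-array expression with no parallel directives, and the library or runtime is free to evaluate it over many elements at once.

```python
# Illustrative sketch of implicit parallelism (assumed example, not the slide's original).
import numpy as np

numbers = np.linspace(0.0, 10.0, 1_000_000)

# One elementwise expression; no explicit tasks, threads, or communication.
# A vectorizing or parallelizing runtime may apply sin to many elements at once.
result = np.sin(numbers)
print(result[:5])
```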
Advantages
❑ The programmer does not need to worry about task division or process communication.
Disadvantages
❑ It reduces the control that the programmer has over the parallel execution of the program.
2- EXPLICITLY
How is parallelism expressed in a program
❑ It is the representation of concurrent computations by means of primitives in the form of special-purpose directives or function calls.
❑ The programmer must specify how the work is decomposed into tasks and whether they need info from each other to finish their own job → (Communication); a minimal sketch follows below.
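As an illustrative sketch (language and API chosen here, not specified by the slides), the Python code below makes the parallelism explicit: the programmer decomposes the data, creates the worker processes, and handles communication of partial results and synchronization.

```python
# Explicit parallelism sketch (illustrative, not from the slides):
# decomposition, process creation, communication, and synchronization are all spelled out.
from multiprocessing import Process, Queue

def worker(chunk, out):
    out.put(sum(x * x for x in chunk))         # explicit communication of a partial result

if __name__ == "__main__":
    data = list(range(100_000))
    n = 4
    chunks = [data[i::n] for i in range(n)]    # explicit data decomposition
    out = Queue()
    procs = [Process(target=worker, args=(c, out)) for c in chunks]
    for p in procs:
        p.start()                              # explicit task creation
    total = sum(out.get() for _ in procs)      # explicit communication
    for p in procs:
        p.join()                               # explicit synchronization
    print(total)
```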
Challenges
All parallel programs contain:
❑ Parallel sections
❑ Serial sections
❑ Serial sections are where work is being duplicated or no useful work is being done (e.g., waiting for others)
Sources of overhead in parallel programs
❑ Inter-process interaction:
The time spent communicating data between processing elements is
usually the most significant source of parallel processing overhead.
❑ Idling:
Processes may become idle due to many reasons such as load
imbalance, synchronization, and presence of serial components in a
program.
❑ Excess Computation:
The fastest known sequential algorithm for a problem may be difficult
or impossible to parallelize, forcing us to use a parallel algorithm
based on a poorer but easily parallelizable sequential algorithm.