
Lecture 1 Introduction

Parallel Computing
Fall 2022

1
Contents
 Acknowledgments for today's lecture
• Jack Dongarra (U. Tennessee) --- CS 594 slides from Spring 2008 —
http://www.cs.utk.edu/~dongarra/WEB-PAGES/cs594-2008.htm
• Kathy Yelick (UC Berkeley) --- CS 267 slides from Spring 2007 —
http://www.eecs.berkeley.edu/~yelick/cs267_sp07/lectures
• Slides accompanying course textbook —
http://www-users.cs.umn.edu/~karypis/parbook/
• Vivek Sarkar (Rice University) –
http://www.owlnet.rice.edu/~comp422/lecture-notes/comp422-lec1-s08-v1.pdf
• Alexandros Gerbessiotis (New Jersey Institute of Technology)

2
Why parallel computing?
– computational modeling and simulation
“Computational modeling and simulation are among the most significant
developments in the practice of scientific inquiry in the 20th Century. Within the
last two decades, scientific computing has become an important contributor to
all scientific disciplines.
It is particularly important for the solution of research problems that are
insoluble by traditional scientific theoretical and experimental approaches,
hazardous to study in the laboratory, or time consuming or expensive to solve
by traditional means”
— “Scientific Discovery through Advanced Computing” DOE Office of
Science, 2000

3
Simulation: The Third Pillar of
Science
• Traditional scientific and engineering paradigm:
1) Do theory or paper design.
2) Perform experiments or build system.

• Limitations:
—Too difficult -- e.g., building large wind tunnels.
—Too expensive -- e.g., building a throw-away passenger jet.
—Too slow -- e.g., waiting for climate or galactic evolution.
—Too dangerous -- weapons, drug design, climate experimentation.

• Computational science paradigm:
3) Use high performance computer systems to simulate the
phenomenon.
– Based on known physical laws and efficient numerical methods.

4
Some Particularly Challenging
Computations
• Science
—Global climate modeling
—Biology: genomics; protein folding; drug design
—Astrophysical modeling
—Computational Chemistry
—Computational Material Sciences and Nanosciences
• Engineering
—Semiconductor design
—Earthquake and structural modeling
—Computational fluid dynamics (airplane design)
—Combustion (engine design)
—Crash simulation
• Business
—Financial and economic modeling
—Transaction processing, web services and search engines
• Defense
—Nuclear weapons -- test by simulations
—Cryptography
5
Technology Trends: Microprocessor
Capacity

2X transistors/chip every 1.5 years
Called "Moore's Law"

Gordon Moore (co-founder of Intel) predicted in 1965 that
the transistor density of semiconductor chips would double
roughly every 18 months.

Microprocessors have become smaller, denser, and more powerful.

Slide source: Jack Dongarra
6
More Limits: How fast can
a serial computer be?
[Figure: a 1 Tflop/s, 1 Tbyte sequential machine; memory must sit within r = 0.3 mm of the CPU]

 Consider the 1 Tflop/s sequential machine:
 Data must travel some distance, r, to get from memory to the CPU.
 To get 1 data element per cycle, data must travel the distance r
10^12 times per second; at the speed of light, c = 3x10^8 m/s,
this means r < c/10^12 = 0.3 mm.
 Now put 1 Tbyte of storage in a 0.3 mm x 0.3 mm area:
 Each bit occupies about 1 square Angstrom, or the size
of a small atom (checked numerically in the sketch below).
 No choice but parallelism
7
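The bullet points above are pure arithmetic, so they are easy to verify. The following is a minimal Python sketch (not from the original slides) that recomputes the two numbers using only the constants quoted on the slide:

# Back-of-the-envelope check of the slide's argument (all inputs from the slide).
c = 3.0e8                      # speed of light, m/s
accesses_per_s = 1.0e12        # 1 Tflop/s -> one memory access per cycle = 1e12 per second

r = c / accesses_per_s         # farthest the data may sit from the CPU, in metres
print(r * 1e3, "mm")           # -> 0.3 mm

bits = 1.0e12 * 8              # 1 Tbyte ~ 1e12 bytes = 8e12 bits
area_per_bit = (r * r) / bits  # area available per bit in an r x r square, m^2
print(area_per_bit / 1e-20, "square Angstrom")  # -> ~1.1, i.e. about one square Angstrom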
Why Parallelism is now necessary for
Mainstream Computing
• Chip density is continuing to increase ~2x every 2 years
—Clock speed is not
—The number of processor cores has to double instead

• There is little or no hidden parallelism (ILP) left to be found

• Parallelism must be exposed to and managed by software

Source: Intel, Microsoft (Sutter) and Stanford (Olukotun, Hammond)
8
Fundamental limits on Serial
Computing: Three “Walls”
• Power Wall
—Increasingly, microprocessor performance is limited by achievable power
dissipation rather than by the number of available integrated-circuit resources
(transistors and wires). Thus, the only way to significantly increase the
performance of microprocessors is to improve power efficiency at about the
same rate as the performance increase.

• Frequency Wall
—Conventional processors require increasingly deeper instruction pipelines to
achieve higher operating frequencies. This technique has reached a point of
diminishing returns, and even negative returns if power is taken into account.

• Memory Wall
—On multi-gigahertz symmetric multiprocessors --- even those with integrated
memory controllers --- latency to DRAM memory is currently approaching 1,000
cycles. As a result, program performance is dominated by the activity of moving
data between main storage (the effective-address space that includes main
memory) and the processor.

9
What is Parallel computing?
 Parallel computing involves performing the parts of a
task in parallel using more than one processor or computer.
 Example in real life with related principles --
shelving n books in a library:
 A single worker.
 P workers, each stacking n/p books, but with an
arbitration problem (many workers try to stack their
next book on the same shelf).
 P workers, each stacking n/p books, but without an
arbitration problem (each worker works on a different
set of shelves).

10
Important Issues in parallel
computing
 Task/Program Partitioning.
 How to split a single task among the processors
so that each processor performs the same amount
of work, and all processors work collectively to
complete the task.
 Data Partitioning.
 How to split the data evenly among the
processors in such a way that processor
interaction is minimized (see the sketch below).
 Communication/Arbitration.
 How we allow communication among different
processors and how we arbitrate communication-
related conflicts.
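To make data partitioning concrete, here is a minimal Python sketch of the "no arbitration" case from the book-shelving example; the use of multiprocessing.Pool and the helper names are illustrative assumptions, not part of the course material:

# Split n items into p near-equal blocks so each worker handles about n/p items
# and no two workers ever touch the same block (no arbitration needed).
from multiprocessing import Pool

def shelve(block):                       # stand-in for the per-worker task
    return len(block)                    # e.g., how many books this worker stacked

def partition(data, p):
    size = (len(data) + p - 1) // p      # ceil(n / p) items per worker
    return [data[i:i + size] for i in range(0, len(data), size)]

if __name__ == "__main__":
    books = list(range(1000))            # n = 1000 "books"
    p = 4                                # p workers
    with Pool(p) as pool:
        done = pool.map(shelve, partition(books, p))
    print(done, sum(done))               # [250, 250, 250, 250] 1000

The same split-then-combine pattern underlies the task partitioning and communication issues above: workers only interact when the partial results are combined at the end.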
11
Challenges
1. Design of parallel computers so that we
resolve the above issues.
2. Design, analysis and evaluation of parallel
algorithms run on these machines.
3. Portability and scalability issues related to
parallel programs and algorithms.
4. Tools and libraries used in such systems.

12
Units of Measure in HPC
• High Performance Computing (HPC) units are:
—Flop: floating point operation
—Flop/s: floating point operations per second
—Bytes: size of data (a double-precision floating-point number is 8 bytes)
• Typical sizes are millions, billions, trillions…
Mega  Mflop/s = 10^6 flop/sec   Mbyte = 2^20 = 1,048,576 ~ 10^6 bytes
Giga  Gflop/s = 10^9 flop/sec   Gbyte = 2^30 ~ 10^9 bytes
Tera  Tflop/s = 10^12 flop/sec  Tbyte = 2^40 ~ 10^12 bytes
Peta  Pflop/s = 10^15 flop/sec  Pbyte = 2^50 ~ 10^15 bytes
Exa   Eflop/s = 10^18 flop/sec  Ebyte = 2^60 ~ 10^18 bytes
Zetta Zflop/s = 10^21 flop/sec  Zbyte = 2^70 ~ 10^21 bytes
Yotta Yflop/s = 10^24 flop/sec  Ybyte = 2^80 ~ 10^24 bytes
• See www.top500.org for current list of fastest machines

13
What is a parallel computer?
 A parallel computer is a collection of processors that
cooperatively solve computationally intensive
problems faster than other computers.
 Parallel algorithms allow the efficient programming of
parallel computers.
 In this way, wasting computational resources can
be avoided.
 Parallel computer vs. supercomputer
 A supercomputer refers to a general-purpose computer that
can solve computationally intensive problems faster than
traditional computers.
 A supercomputer may or may not be a parallel computer.
14
Parallel Computers: Past and Present
 1980s: a Cray supercomputer was 20-100 times faster
than the other computers (mainframes, minicomputers)
in use. (Its price was about 10 times that of other
computers -- worth it.)
 1990s: a "Cray"-like CPU is only 2-4 times as fast as a
microprocessor, while a supercomputer costs 10-20
times as much as a microcomputer -- this no longer
makes sense.
 The solution to the need for computational power is
the massively parallel computer, where tens to
hundreds of commercial off-the-shelf processors are
used to build a machine whose performance is much
greater than that of a single processor.

15
Scale of HPC Systems in 6/2022

https://www.top500.org/lists/top500/2022/06/

16
Performance numbers for
various CUNY HPC Systems

17
Applications of Parallel Computing
 Astrophysics (exploring the evolution of galaxies, analysis of
extremely large datasets from telescopes).
 Material sciences (e.g., superconductivity).
 Biology, biochemistry, gene sequencing.
 Medicine and human organ modeling (e.g., to study the effects
and dynamics of a heart attack, developing new drugs and
cures for diseases).
 Global weather prediction.
 Visualization (e.g., movie industry, 3D animation).
 Data mining (optimizing business and marketing decisions).
 Computational Fluid Dynamics (CFD) for aircraft and automotive
vehicle design.
 Computer security, cryptography.

18
Global Climate Modeling Problem
 Problem is to compute:
f(latitude, longitude, elevation, time) ->
temperature, pressure, humidity, wind velocity
 Approach:
—Discretize the domain, e.g., a measurement point every 10 km
—Devise an algorithm to predict weather at time t+δt given t
(see the toy sketch below)
 Uses:
- Predict major events,
e.g., El Niño
- Use in setting air
emissions standards

19
Source: http://www.epm.ornl.gov/chammp/chammp.html
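As a toy illustration of "discretize the domain, then predict the state at t+δt from the state at t", the sketch below steps a coarse latitude-longitude grid forward with a generic nearest-neighbour update; it is not an atmospheric model, and the grid size and update rule are invented purely for illustration:

# Toy discretization: one value per grid point, advanced by repeated timesteps.
import numpy as np

nlat, nlon = 180, 360                    # ~1 point per degree (far coarser than 10 km)
state = np.zeros((nlat, nlon))           # e.g., temperature field at time t
state[90, :] = 1.0                       # an initial disturbance along one latitude band

def step(u):
    # placeholder update: each point relaxes toward the average of its 4 neighbours
    return 0.25 * (np.roll(u, 1, 0) + np.roll(u, -1, 0) +
                   np.roll(u, 1, 1) + np.roll(u, -1, 1))

for _ in range(10):                      # ten timesteps of length delta-t
    state = step(state)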
Global Climate Modeling Computation
• One piece is modeling the fluid flow in the atmosphere
—Solve Navier-Stokes equations
– Roughly 100 Flops per grid point with 1 minute timestep
• Computational requirements (reproduced in the sketch below):
—To match real-time, need 5 x 10^11 flops in 60 seconds = 8
Gflop/s
—Weather prediction (7 days in 24 hours) -> 56 Gflop/s
—Climate prediction (50 years in 30 days) -> 4.8 Tflop/s
—To use in policy negotiations (50 years in 12 hours) -> 288
Tflop/s
• To double the grid resolution, computation is 8x to 16x
• State of the art models require integration of atmosphere,
ocean, sea-ice, land models, plus possibly carbon cycle,
geochemistry and more
• Current models are coarser than this
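The required rates in the "computational requirements" bullets follow from simple ratios against real time. A small Python sketch (not from the slides) reproduces them from the single figure of 5 x 10^11 flops per simulated minute:

# Reproduce the flop-rate estimates from the real-time figure quoted on the slide.
flops_per_sim_minute = 5e11
realtime_rate = flops_per_sim_minute / 60           # ~8.3e9 flop/s (slide rounds to 8 Gflop/s)

def required_rate(sim_days, wall_days):
    # run (simulated time / wall-clock time) times faster than real time
    return realtime_rate * (sim_days / wall_days)

print(required_rate(7, 1))            # weather, 7 days in 24 h:   ~5.8e10 flop/s (slide: 56 Gflop/s)
print(required_rate(50 * 365, 30))    # climate, 50 yr in 30 days: ~5.1e12 flop/s (slide: 4.8 Tflop/s)
print(required_rate(50 * 365, 0.5))   # policy, 50 yr in 12 h:     ~3.0e14 flop/s (slide: 288 Tflop/s)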
20
What is a parallel algorithm?
 A parallel algorithm is an algorithm designed
for a parallel computer.

21
Questions when combining
processor power
 How does one combine processors efficiently?
 Do processors work independently?
 Do they cooperate? If they cooperate how do
they interact with each other?
 How are the processors interconnected?
 How can we make programs portable?
 How does one program such machines so
that programs run efficiently and do not
waste resources?

22
End of lecture 1

Thank you!

23
