What Is Parallel Computing?
In the simplest sense, parallel computing is the simultaneous use of multiple compute
resources to solve a computational problem:
o To be run using multiple CPUs
o A problem is broken into discrete parts that can be solved concurrently
o Each part is further broken down to a series of instructions
o Instructions from each part execute simultaneously on different CPUs (a minimal
decomposition sketch follows this list)
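As a concrete illustration (not part of the original notes; the chunking scheme and worker
count are arbitrary choices for the example), the Python sketch below decomposes one large
sum into discrete parts, sums each part in a separate worker process, and combines the
partial results:

```python
# A minimal sketch of problem decomposition with Python's multiprocessing.
# The problem (summing a large list) is broken into discrete chunks; each
# chunk is summed concurrently by its own worker process, and the partial
# results are combined at the end.
from multiprocessing import Pool

def partial_sum(chunk):
    # Each worker executes this series of instructions on its own part.
    return sum(chunk)

if __name__ == "__main__":
    data = list(range(1_000_000))
    n_workers = 4                                  # illustrative worker count
    size = len(data) // n_workers
    chunks = [data[k * size:(k + 1) * size] for k in range(n_workers)]
    chunks[-1].extend(data[n_workers * size:])     # remainder joins the last chunk
    with Pool(n_workers) as pool:
        print(sum(pool.map(partial_sum, chunks)))  # combine the partial results
```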
Historically, parallel computing has been considered to be "the high end of computing",
and has been used to model difficult scientific and engineering problems found in the real
world. Some examples:
o Atmosphere, Earth, Environment
o Physics - applied, nuclear, particle, condensed matter, high pressure, fusion,
photonics
o Bioscience, Biotechnology, Genetics
o Chemistry, Molecular Sciences
o Geology, Seismology
o Mechanical Engineering - from prosthetics to spacecraft
Why Use Parallel Computing?
Save time and/or money: In theory, throwing more resources at a task will shorten its time
to completion, with potential cost savings. Parallel clusters can be built from cheap,
commodity components.
Solve larger problems: Many problems are so large and/or complex that it is impractical
or impossible to solve them on a single computer, especially given limited computer
memory. For example:
o "Grand Challenge" (en.wikipedia.org/wiki/Grand_Challenge) problems requiring
PetaFLOPS and PetaBytes of computing resources.
o Web search engines/databases processing millions of transactions per second
Provide concurrency: A single compute resource can only do one thing at a time.
Multiple computing resources can be doing many things simultaneously. For example, the
Access Grid (www.accessgrid.org) provides a global collaboration network where people
from around the world can meet and conduct work "virtually".
Use of non-local resources: Using compute resources on a wide area network, or even the
Internet when local compute resources are scarce. For example:
o SETI@home (setiathome.berkeley.edu) uses over 330,000 computers for a compute
power over 528 TeraFLOPS (as of August 04, 2008)
o Folding@home (folding.stanford.edu) uses over 340,000 computers for a compute
power of 4.2 PetaFLOPS (as of November 4, 2008)
Limits to serial computing: Both physical and practical reasons pose significant
constraints to simply building ever faster serial computers:
o Transmission speeds - the speed of a serial computer is directly dependent upon
how fast data can move through hardware. Absolute limits are the speed of light (30
cm/nanosecond) and the transmission limit of copper wire (9 cm/nanosecond); see the
quick check below.
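As a quick back-of-the-envelope check (the 3 GHz clock rate is an arbitrary illustrative
choice, not from the notes), a signal moving at the speed of light covers only about 10 cm
of hardware in one clock cycle:

```python
# Back-of-the-envelope check of the transmission-speed limit.
c_cm_per_ns = 30.0               # speed of light: ~30 cm per nanosecond
clock_ghz = 3.0                  # illustrative clock rate: 3 GHz => 1/3 ns per cycle
cycle_ns = 1.0 / clock_ghz
print(c_cm_per_ns * cycle_ns)    # ~10.0 cm: farthest a signal can travel per cycle
```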
Current computer architectures are increasingly relying upon hardware-level parallelism to
improve performance:
o Multiple execution units
o Pipelined instructions
o Multi-core processors
RAM model
The Random Access Machine (RAM) is the standard model of a sequential computer. Its main
features are listed below; a brief cost-accounting sketch follows the list.
1. Computation unit with a user-defined program.
2. Read-only input tape and write-only output tape.
3. Unbounded number of local memory cells.
4. Each memory cell is capable of holding an integer of unbounded size.
5. Instruction set includes operations for moving data between memory cells, comparisons
and conditional branches, and simple arithmetic operations.
6. Execution starts with the first instruction and ends when a HALT instruction is executed.
7. All operations take unit time regardless of the lengths of operands.
8. Time complexity = the number of instructions executed.
9. Space complexity = the number of memory cells accessed.
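As an illustrative sketch (the instruction-counting scheme here is a simplification assumed
for the example, not part of the model's formal definition), the following Python fragment
sums n numbers while charging unit cost per operation; the count grows linearly, i.e. the
time complexity is Θ(n):

```python
# Rough illustration of RAM cost accounting under the unit-cost assumption:
# every data move, comparison, and arithmetic operation costs one time step.
def ram_sum(items):
    steps = 1                  # charge one step for initializing the accumulator
    acc = 0                    # a memory cell holding the running total
    for x in items:
        acc += x
        steps += 2             # assumed charge: one add + one loop branch per item
    return acc, steps

total, steps = ram_sum(range(100))
print(total, steps)            # steps grows linearly with n: Theta(n) time
```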
PRAM model
The Parallel Random Access Machine (PRAM) is a straightforward and natural generalization of
the RAM. It is an idealized model of a shared-memory SIMD machine. Its main features are
listed below; a small one-step simulation sketch follows the list.
1. Unbounded collection of numbered RAM processors P0, P1, P2,... (without tapes).
2. Unbounded collection of shared memory cells M[0], M[1], M[2],....
3. Each Pi has its own (unbounded) local memory (registers) and knows its index i.
4. Each processor can access any shared memory cell (unless there is an access conflict, see
further) in unit time.
5. Input of a PRAM algorithm consists of n items stored in (usually the first) n shared
memory cells.
6. Output of a PRAM algorithm consists of n' items stored in n' shared memory cells.
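A minimal simulation of one synchronous PRAM step (an illustration constructed for these
notes; the three-phase read/compute/write structure mirrors the lockstep model, and here
each Pi touches only cell M[i], so no access conflicts arise):

```python
# Illustrative simulation of one synchronous PRAM step: n processors P0..Pn-1
# share memory M. All reads complete before any write, mimicking lockstep
# execution; processor Pi reads and writes only M[i], so no conflicts occur.
def pram_step(M, n, local_compute):
    reads = [M[i] for i in range(n)]                  # phase 1: every Pi reads M[i]
    results = [local_compute(i, reads[i]) for i in range(n)]  # phase 2: local work
    for i in range(n):                                # phase 3: every Pi writes M[i]
        M[i] = results[i]

M = [3, 1, 4, 1, 5]                          # input: n items in shared memory cells
pram_step(M, len(M), lambda i, x: 2 * x)     # every Pi doubles its own cell in unit time
print(M)                                     # [6, 2, 8, 2, 10]
```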
PREFIX SUM
The sequential (outer) for loop executes ⌈log₂ n⌉ times, and in each iteration all
processors perform their updates in parallel in constant time. Hence, the overall execution
time is O(log n).
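A minimal sketch of this doubling-step prefix-sum computation, simulated in ordinary Python
(the inner loop stands in for the n processors working in parallel; the snapshot enforces
the reads-before-writes discipline of the synchronous model):

```python
import math

# Simulated PRAM prefix sum: the sequential outer loop runs ceil(log2(n))
# times; within a round, every position i >= 2^j adds the value 2^j places
# to its left. Reads use the previous round's snapshot, as lockstep
# execution requires. On a PRAM with n processors each round costs O(1).
def prefix_sum(a):
    n = len(a)
    x = list(a)
    for j in range(math.ceil(math.log2(n))):
        step = 1 << j                        # distance doubles each round
        prev = list(x)                       # snapshot: all reads before any write
        for i in range(step, n):             # done in parallel by processors Pi
            x[i] = prev[i] + prev[i - step]
    return x

print(prefix_sum([1, 2, 3, 4, 5]))           # [1, 3, 6, 10, 15]
```

A sequential scan needs Θ(n) additions; the parallel version trades n processors for a
logarithmic number of rounds.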