Parallel Random Access Machine
David Rodriguez-Velazquez
Spring -09 CS-6260
Dr. Elise de Doncker
Overview
What is a machine model?
Why do we need a model?
RAM
PRAM
◦ Steps in computation
◦ Write conflict
◦ Examples
A parallel Machine Model
What is a machine model?
◦ Describes a “machine”
◦ Assigns a cost to each operation on the machine
Why do we need a model?
◦ Makes it easy to reason about algorithms
◦ Helps establish complexity bounds
◦ Allows analysis of the maximum parallelism
RAM (Random Access Machine)
Unbounded number of local memory cells
Each memory cell can hold an integer of
unbounded size
Instruction set includes simple operations,
data operations, comparisons, and branches
All operations take unit time
Time complexity = number of instructions
executed
Space complexity = number of memory
cells used
PRAM (Parallel Random Access
Machine)
Definition:
◦ An abstract machine for designing
algorithms applicable to parallel computers
◦ M’ is a system <M, X, Y, A> of infinitely many
RAMs M1, M2, …; each Mi is called a processor of
M’. All the processors are assumed to be identical.
Each has the ability to recognize its own index i
Input cells X(1), X(2),…,
Output cells Y(1), Y(2),…,
Shared memory cells A(1), A(2),…,
PRAM (Parallel RAM)
Unbounded collection of RAM processors
P0, P1, …,
Processors don’t have tape
Each processor has unbounded registers
Unbounded collection of shared memory
cells
All processors can access all memory
cells in unit time
All communication via shared memory
PRAM (step in a computation)
Consists of 5 phases, carried out in parallel by all
the processors. Each processor:
◦ Reads a value from one of the input cells X(1),…, X(N)
◦ Reads one of the shared memory cells A(1), A(2),…
◦ Performs some internal computation
◦ May write into one of the output cells Y(1), Y(2),…
◦ May write into one of the shared memory cells
A(1), A(2),…
e.g. for all i, do A[i] = A[i-1] + 1;
Read A[i-1], add 1, write A[i]:
each phase happens synchronously on all processors
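The synchronous semantics of the example above can be sketched in Python. This is only an illustrative simulation on a sequential machine; the function names `synchronous_step` and `sequential_loop` are made up for this sketch. It shows that every processor reads the old value of A[i-1] before anyone writes, so the result differs from an ordinary sequential loop.

```python
def synchronous_step(A):
    """Simulate one PRAM step of: for all i >= 1, A[i] = A[i-1] + 1."""
    # Read phase: processor i reads A[i-1] from the old memory state.
    read_values = [A[i - 1] for i in range(1, len(A))]
    # Compute phase: every processor adds 1 to the value it read.
    results = [v + 1 for v in read_values]
    # Write phase: processor i writes A[i]; A[0] has no writer.
    return [A[0]] + results


def sequential_loop(A):
    """Ordinary sequential loop, shown only for contrast."""
    A = list(A)
    for i in range(1, len(A)):
        A[i] = A[i - 1] + 1  # sees the value written in the previous iteration
    return A


A = [5, 0, 0, 0]
print(synchronous_step(A))  # [5, 6, 1, 1]
print(sequential_loop(A))   # [5, 6, 7, 8]
```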
PRAM (Parallel RAM)
Some subset of the processors can remain
idle
[Figure: processors P0, P1, P2, …, PN; a scale running from "most powerful, least realistic" to "least powerful, most realistic"]
An initial example
How do you add N numbers residing in
memory locations M[0], M[1], …, M[N-1]?
Step 1: P0, P1, P2, P3 each perform one addition
Step 2: P0, P2 each perform one addition
Step 3: P0 performs the final addition
PRAM Algorithm (Parallel Addition)
Time needed = log(n) steps
Processors needed = n / 2
Speed-up = n / log(n)
Efficiency = speed-up / processors = 2 / log(n), i.e. on the order of 1 / log(n)
Applicable for other operations
◦ +, *, <, >, etc.
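A minimal sketch of the tree reduction, simulated sequentially in Python. The name `pram_reduce` and the stride-doubling layout are assumptions of this sketch, not something the slides prescribe. Each pass of the `while` loop stands for one parallel step, in which all active processors read their two operands before any of them writes.

```python
import operator


def pram_reduce(values, op=operator.add):
    """Simulate a PRAM tree reduction.

    Returns (result, steps), where steps is the number of parallel steps.
    """
    M = list(values)
    n = len(M)
    if n == 0:
        raise ValueError("need at least one value")
    steps = 0
    stride = 1
    while stride < n:
        # Read + compute: processor at cell i combines M[i] and M[i + stride].
        updates = {i: op(M[i], M[i + stride])
                   for i in range(0, n - stride, 2 * stride)}
        # Write: all results are stored only after every read has happened.
        for i, v in updates.items():
            M[i] = v
        stride *= 2
        steps += 1
    return M[0], steps


print(pram_reduce(range(1, 9)))        # (36, 3): 8 numbers, log2(8) = 3 steps
print(pram_reduce([3, 9, 2, 7], max))  # (9, 2): same scheme with max
```

Passing a different associative operator (for example `operator.mul` or `max`) reuses the same schedule, which is the point of the "applicable for other operations" remark.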
Example 2
p processor PRAM with n numbers (p<=n)
Does x exist within the n numbers?
Initially P0 holds x, and at the end P0 must know the answer
Algorithm
◦ Inform everyone what x is
◦ Every processor checks [n/p] numbers and sets
a flag
◦ Check if any of the flags are set to 1
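The three steps above can be sketched as a sequential Python simulation. The function name `pram_search` and the block-wise assignment of numbers to processors are assumptions made for illustration; the cost of the broadcast and of combining the flags depends on the PRAM variant and is not modelled here.

```python
import math


def pram_search(numbers, x, p):
    """Simulate the p-processor search for x among n numbers (p <= n)."""
    n = len(numbers)
    if not 1 <= p <= n:
        raise ValueError("need 1 <= p <= n")
    chunk = math.ceil(n / p)  # each processor checks at most ceil(n/p) numbers

    # Step 1: inform everyone what x is (each processor gets a copy).
    local_x = [x] * p

    # Step 2: every processor scans its own block and sets a flag.
    flags = [0] * p
    for proc in range(p):
        block = numbers[proc * chunk:(proc + 1) * chunk]
        if any(v == local_x[proc] for v in block):
            flags[proc] = 1

    # Step 3: check whether any flag is set (logical OR of all flags).
    return any(flags)


data = [4, 8, 15, 16, 23, 42]
print(pram_search(data, 23, 3))  # True
print(pram_search(data, 7, 3))   # False
```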
Example 2
Step                                                   EREW    CREW    CRCW (common)
Inform everyone what x is                              log(p)  1       1
Every processor checks [n/p] numbers and sets a flag   n/p     n/p     n/p
Check if any of the flags are set to 1                 log(p)  log(p)  1
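Adding up the three per-step costs for each variant gives the overall running time. The symbols T_EREW, T_CREW and T_CRCW are notation introduced only for this worked sum:

```latex
\begin{align*}
T_{\mathrm{EREW}} &= O(\log p) + O(n/p) + O(\log p) = O(n/p + \log p) \\
T_{\mathrm{CREW}} &= O(1) + O(n/p) + O(\log p) = O(n/p + \log p) \\
T_{\mathrm{CRCW}} &= O(1) + O(n/p) + O(1) = O(n/p)
\end{align*}
```

So concurrent reads alone do not improve the asymptotic bound; the logarithmic term disappears only once concurrent writes are also allowed.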
Some variants of PRAM
Bounded number of shared memory cells:
small-memory PRAM. (If the input data set
exceeds the capacity of the shared memory, the
input/output values can be distributed evenly
among the processors.)
Bounded number of processors: small PRAM.
If the number of threads of execution is higher,
processors may interleave several threads.
Bounded size of a machine word: the word size
of the PRAM.
Handling access conflicts: constraints on
simultaneous access to shared memory cells.
Lemma
Assume p’<p. Any problem that can be solved on
a p-processor PRAM in t steps can be solved on a
p’-processor PRAM in t’ = O(tp/p’) steps
(assuming the same size of shared memory)
Proof:
Partition the p simulated processors into p’ groups of size p/p’
each
Associate each of the p’ simulating processors with one of
these groups
Each of the simulating processors simulates one step of its
group of processors by:
◦ first executing all their READ and local computation
substeps
◦ then executing their WRITE substeps
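The proof idea can be sketched as a small Python simulation. The representation of a simulated processor as a `(read_addr, fn, write_addr)` triple and the name `simulate_step` are hypothetical choices for this sketch. All reads and local computations are completed before any write, so the simulated step sees the same memory state that the original p processors would have seen.

```python
import math


def simulate_step(memory, programs, p_prime):
    """Simulate one step of p = len(programs) processors on p_prime processors.

    programs[i] = (read_addr, fn, write_addr) describes simulated processor i.
    Returns the group size, i.e. how many simulated processors each
    simulating processor handles: O(p / p_prime) work per simulated step.
    """
    p = len(programs)
    group_size = math.ceil(p / p_prime)
    groups = [range(g * group_size, min((g + 1) * group_size, p))
              for g in range(p_prime)]

    # Substep A: every group performs all READs and local computations.
    pending = []
    for group in groups:
        for i in group:
            read_addr, fn, write_addr = programs[i]
            pending.append((write_addr, fn(memory[read_addr])))

    # Substep B: only now are the buffered WRITEs applied.
    for write_addr, value in pending:
        memory[write_addr] = value
    return group_size


memory = [5, 0, 0, 0]
# Three simulated processors, each doing A[i] = A[i-1] + 1.
programs = [(i - 1, lambda v: v + 1, i) for i in range(1, 4)]
rounds = simulate_step(memory, programs, p_prime=2)
print(memory, rounds)  # [5, 6, 1, 1] 2
```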
Lemma
Assume m’<m. Any problem that can be solved on a p-processor, m-cell
PRAM in t steps can be solved on a max(p,m’)-processor, m’-cell
PRAM in O(tm/m’) steps
Proof:
Partition the m simulated shared memory cells into m’ contiguous segments Si of size m/m’ each
Each simulating processor P’i, 1<=i<=p, will simulate processor Pi of the original PRAM
Each simulating processor P’i, 1<=i<=m’, stores the initial contents of Si in its local memory
and will use M’[i] as an auxiliary memory cell for simulating accesses to cells of Si
Simulation of one original READ operation:
Each P’i, i=1,…,max(p,m’), repeats for k=1,…,m/m’:
1. write the value of the k-th cell of Si into M’[i], for i=1,…,m’
2. read the value that the simulated processor Pi, i=1,…,p, would read in
this simulated substep, if it appeared in the shared memory
The local computation substep of Pi, i=1,…,p, is simulated in one step by P’i
Simulation of one original WRITE operation is analogous to that of READ
Conclusions
We need some model to reason about, compare,
analyze and design algorithms
PRAM is simple and easy to understand
Rich set of theoretical results
Over-simplistic and often not realistic
The programs written on these machines
are, in general, of type MIMD. Certain
special cases such as SIMD may also be
handled in such a framework
Question
Why is the PRAM an attractive and important
model for designers of parallel algorithms?
◦ It is natural: the number of operations executed
per cycle on p processors is at most p
◦ It is strong: any processor can read/write any
shared memory cell in unit time
◦ It is simple: it abstracts from any communication
or synchronization overhead, which makes the
complexity and correctness analysis of PRAM
algorithms easier
◦ It can be used as a benchmark: if a problem has
no feasible/efficient solution on a PRAM, it has no
feasible/efficient solution on any parallel machine