1.3 Abstract Machine Models in Parallel Computing
Note: This section introduces abstract machine models for parallel computers.
The Random Access Machine (RAM)
Any step of an algorithm for the RAM model consists of (up to) three basic phases,
namely:
1. Read: The processor reads a datum from the memory. This datum is usually
stored in one of its local registers.
2. Execute: The processor performs a basic arithmetic or logic operation on the
contents of one or two of its registers.
3. Write: The processor writes the contents of one register into an arbitrary
memory location.
Note: For the purpose of analysis, we assume that each of these phases takes
constant, i.e., O(1) time.
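The three phases can be sketched as a small simulation. This is only an illustration under assumed names: ram_step, the memory list, and the two-register file are hypothetical and do not come from the RAM definition itself.

```python
from operator import add

def ram_step(memory, registers, read=None, execute=None, write=None):
    """Run one RAM step; each of the three phases is optional ("up to" three)."""
    if read is not None:
        reg, addr = read                  # Read: one datum from memory into a register
        registers[reg] = memory[addr]
    if execute is not None:
        dst, op, a, b = execute           # Execute: one basic operation on registers
        registers[dst] = op(registers[a], registers[b])
    if write is not None:
        reg, addr = write                 # Write: one register into a memory location
        memory[addr] = registers[reg]

# Hypothetical data: compute memory[2] = memory[0] + memory[1] in two steps.
memory = [5, 7, 0]
registers = [0, 0]
ram_step(memory, registers, read=(0, 0))
ram_step(memory, registers, read=(1, 1), execute=(0, add, 0, 1), write=(0, 2))
print(memory)  # [5, 7, 12]
```

Each call performs a bounded amount of work, which mirrors the assumption that every phase costs O(1) time.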
The Parallel Random Access Machine (PRAM)
The PRAM is one of the most popular models for designing parallel algorithms.
As in the case of RAM, each step of an algorithm here consists of the following
phases:
1. Read: (Up to) N processors read simultaneously (in parallel) from (up to) N
memory locations (in the common memory) and store the values in their local
registers.
2. Compute: (Up to) N processors perform basic arithmetic or logical operations on
the values in their registers.
3. Write: (Up to) N processors write simultaneously into (up to) N memory
locations from their registers.
• Each of the phases, READ, COMPUTE, WRITE, is assumed to take O(1) time as
in the case of RAM.
• Notice that not all processors need to execute a given step of the algorithm. When
a subset of processors execute a step, the other processors remain idle during that
time.
• The algorithm for a PRAM has to specify which subset of processors should be
active during the execution of a step.
• In the above model, a problem might arise when more than one processor tries to
access the same memory location at the same time.
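A single PRAM step with an explicit set of active processors might be simulated as follows. This is a sequential sketch, not a real parallel machine: pram_step and its callback parameters are hypothetical names chosen for illustration.

```python
def pram_step(shared, active, read_addr, compute, write_addr):
    """Simulate one PRAM step for the processors listed in `active`.

    Processors not in `active` stay idle for this step.
    """
    # Read phase: every active processor loads one value into its local register.
    regs = {p: shared[read_addr(p)] for p in active}
    # Compute phase: each active processor works only on its own register.
    regs = {p: compute(p, value) for p, value in regs.items()}
    # Write phase: happens only after all reads and computations are done.
    for p, value in regs.items():
        shared[write_addr(p)] = value

# Hypothetical example: processors 0 and 2 are active, 1 and 3 are idle.
shared = [1, 2, 3, 4, 0, 0, 0, 0]
pram_step(
    shared,
    active=[0, 2],
    read_addr=lambda p: p,          # processor p reads cell p
    compute=lambda p, v: 2 * v,     # doubles the value it read
    write_addr=lambda p: p + 4,     # writes into cell p + 4
)
print(shared)  # [1, 2, 3, 4, 2, 0, 6, 0]
```

In this example every processor touches a different address, so no access conflict arises. The sketch does not check for conflicts.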
The PRAM model can be subdivided into four categories based on the way
simultaneous memory accesses are handled.
Exclusive Read Exclusive Write (EREW) PRAM: In this model, every access to a
memory location (read or write) has to be exclusive. This model provides the least
amount of memory concurrency and is therefore the weakest.
Concurrent Read Exclusive Write (CREW) PRAM: In this model, only write
operations to a memory location are exclusive. Two or more processors can
concurrently read from the same memory location. This is one of the most
commonly used models.
Exclusive Read Concurrent Write (ERCW) PRAM: This model allows multiple
processors to concurrently write into the same memory location. The read operations
are exclusive. This model is not frequently used and is defined here only for the sake
of completeness.
Concurrent Read Concurrent Write (CRCW) PRAM: This model allows both
multiple read and multiple write operations to a memory location. It provides the
maximum amount of concurrency in memory access and is the most powerful of the
four models.
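One way to see how the four categories relate is to check which of them permit a given access pattern in a single step. The sketch below assumes each processor reads one address and writes one address per step; has_conflict and allowed_models are hypothetical helper names.

```python
from collections import Counter

def has_conflict(addresses):
    """True if two or more processors touch the same address in one phase."""
    return any(count > 1 for count in Counter(addresses).values())

def allowed_models(read_addrs, write_addrs):
    """Return the PRAM categories that permit this step's access pattern."""
    concurrent_read = has_conflict(read_addrs)
    concurrent_write = has_conflict(write_addrs)
    models = []
    # (name, concurrent reads permitted?, concurrent writes permitted?)
    for name, reads_ok, writes_ok in [
        ("EREW", False, False),
        ("CREW", True, False),
        ("ERCW", False, True),
        ("CRCW", True, True),
    ]:
        if (reads_ok or not concurrent_read) and (writes_ok or not concurrent_write):
            models.append(name)
    return models

print(allowed_models([0, 1, 2, 3], [4, 5, 6, 7]))  # ['EREW', 'CREW', 'ERCW', 'CRCW']
print(allowed_models([0, 0, 0, 0], [4, 5, 6, 7]))  # ['CREW', 'CRCW']
print(allowed_models([0, 1, 2, 3], [4, 4, 4, 4]))  # ['ERCW', 'CRCW']
print(allowed_models([0, 0], [4, 4]))              # ['CRCW']
```

A conflict-free step is legal in every category, while a step with both kinds of conflict is legal only on a CRCW PRAM. This matches the ordering from weakest (EREW) to most powerful (CRCW).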
Exclusive Read Exclusive Write (EREW) PRAM
Example: summing the array [1, 2, 3, 4, 5, 6, 7, 8] with 4 processors.
Steps:
1. Divide the Array: Divide the array into equal parts for each processor. If the
array has 8 elements and 4 processors, each processor will handle 2 elements.
Processor P1: Handles [1, 2]
Processor P2: Handles [3, 4]
Processor P3: Handles [5, 6]
Processor P4: Handles [7, 8]
2. Each Processor Computes Local Sum: Each processor reads its assigned
elements and computes a local sum.
P1 computes local_sum1 = 1 + 2 = 3
P2 computes local_sum2 = 3 + 4 = 7
P3 computes local_sum3 = 5 + 6 = 11
P4 computes local_sum4 = 7 + 8 = 15
3. Write to Shared Memory: The processors add their local sums to a shared
variable sum one at a time. Because every access is exclusive, only one processor
can read and update sum in any given step.
Initially, sum = 0.
P1 writes: sum = sum + local_sum1 = 0 + 3 = 3
P2 writes: sum = sum + local_sum2 = 3 + 7 = 10
P3 writes: sum = sum + local_sum3 = 10 + 11 = 21
P4 writes: sum = sum + local_sum4 = 21 + 15 = 36
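The EREW steps above can be traced with a short sequential simulation. It is a sketch only: the variable names are assumptions, and the Python loop stands in for processors that would run in parallel on a real PRAM.

```python
array = [1, 2, 3, 4, 5, 6, 7, 8]
num_procs = 4
chunk = len(array) // num_procs   # 2 elements per processor

# Steps 1 and 2: processor p reads only its own slice, so no two processors
# ever read the same memory location (exclusive read), then sums it locally.
local_sums = [sum(array[p * chunk:(p + 1) * chunk]) for p in range(num_procs)]

# Step 3: processors take turns updating the shared variable, one per step,
# so each access to it is exclusive.
total = 0
history = []
for local in local_sums:
    total += local
    history.append(total)

print(local_sums)  # [3, 7, 11, 15]
print(history)     # [3, 10, 21, 36]
```

The write phase takes one step per processor here, because the processors must take turns on the single shared variable.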
Concurrent Read Exclusive Write (CREW) PRAM
Concurrent Read (CR): Multiple processors can read from the same memory
location at the same time. This feature allows shared access to data without
contention, which is useful for parallel tasks that need to access common
information.
Exclusive Write (EW): Only one processor can write to a particular memory
location at any given time. This restriction prevents conflicts that could arise from
simultaneous write operations, ensuring data consistency.
Example of CREW PRAM
Step 2:
Divide the Array: Split the array into parts for each processor to handle. If there are 8
elements and 4 processors, each processor will handle 2 elements.
Processor P1: [3, 6]
Processor P2: [2, 8]
Processor P3: [4, 5]
Processor P4: [9, 1]
Step 3:
Concurrent Read Phase:
All processors read total_sum from shared memory at the same time; CREW permits this
concurrent read. Each processor sees the initial value, total_sum = 0.
Step 4:
Local Computation:
Each processor computes the sum of its assigned section.
P1 computes local_sum1 = 3 + 6 = 9
P2 computes local_sum2 = 2 + 8 = 10
P3 computes local_sum3 = 4 + 5 = 9
P4 computes local_sum4 = 9 + 1 = 10
Step 5:
Exclusive Write Phase:
Each processor adds its local sum to total_sum in shared memory. Writes are exclusive,
so the processors update total_sum one at a time, each adding to the value left by the
previous writer.
Processor P1 writes first, total_sum = 0 + 9 = 9.
Processor P2 writes next, total_sum = 9 + 10 = 19.
Processor P3 writes next, total_sum = 19 + 9 = 28.
Processor P4 writes last, total_sum = 28 + 10 = 38.
Step 6:
Final Result: After all processors have had a chance to write, the total_sum holds the sum
of all elements in the array, which is 38.
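The CREW example can be approximated with threads, where a lock enforces the exclusive-write rule. This is a loose analogy under stated assumptions: a real PRAM has no locks (the algorithm itself schedules the writers), and worker, shared, and write_lock are hypothetical names.

```python
import threading

array = [3, 6, 2, 8, 4, 5, 9, 1]
num_procs = 4
chunk = len(array) // num_procs

shared = {"total_sum": 0}          # stands in for the shared memory cell
write_lock = threading.Lock()      # assumption: models the exclusive-write rule

def worker(p):
    # Reads need no lock: any number of threads may read the array at once.
    local_sum = sum(array[p * chunk:(p + 1) * chunk])
    # Only one thread at a time may update the shared total.
    with write_lock:
        shared["total_sum"] += local_sum

threads = [threading.Thread(target=worker, args=(p,)) for p in range(num_procs)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(shared["total_sum"])  # 38
```

The order in which the threads acquire the lock may differ from the P1 to P4 order in the walkthrough, but the final total is the same because addition is commutative.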
Summary
Allows Concurrent Reads: Multiple processors can read from the same memory
location simultaneously, facilitating efficient data access in parallel algorithms.
Restricts Write Operations: Only one processor can write to a memory location at
a time, ensuring data consistency and preventing race conditions.
Practical Application: The CREW PRAM model is commonly used in scenarios
where data needs to be read by multiple processors for computation, but writes must
be controlled to avoid conflicts.
Exclusive Read Concurrent Write (ERCW) PRAM
Exclusive Read (ER): Only one processor can read from a specific memory
location at a time. If multiple processors attempt to read from the same memory
location simultaneously, only one read operation is allowed.
Concurrent Write (CW): Multiple processors are allowed to write to the same
memory location at the same time. However, to resolve conflicts in concurrent write
operations, a predefined rule or protocol must be in place (e.g., sum the values, pick
the maximum, overwrite, etc.).
Why Use ERCW PRAM?
Efficiency in Writing: The ability to write concurrently allows the system to handle
multiple updates at once, which can be useful in aggregating results or updating
shared counters.
Scalability: In scenarios where multiple processors need to update a shared value
(like a sum or count), ERCW allows this to happen concurrently, improving
scalability and performance.
Example of ERCW PRAM
Scenario:
We have 4 processors (P1, P2, P3, P4).
Each processor is assigned to count votes for a specific candidate from different
ballot boxes.
The shared memory has an array where each index corresponds to a candidate, and
the value at each index represents the total votes for that candidate.
We will focus on aggregating votes for Candidate A.
Steps:
Initialization: Initialize the shared memory array for Candidate A's votes to 0. Let's
denote this memory location as totalVotesForA.
Exclusive Read Phase: Each processor reads its portion of the ballots to count
votes for Candidate A. Since the read operations are exclusive, each processor reads
different ballot subsets. There is no contention for read operations.
P1 reads 10 votes for Candidate A.
P2 reads 15 votes for Candidate A.
P3 reads 20 votes for Candidate A.
P4 reads 25 votes for Candidate A.
Concurrent Write Phase: Now, all processors write their counts to the shared
memory location totalVotesForA simultaneously. ERCW allows this concurrent
write. To handle this, we define a rule to aggregate these votes (e.g., sum the
values).
All processors write to totalVotesForA at the same time:
P1 writes 10
P2 writes 15
P3 writes 20
P4 writes 25
Final Result: Under the sum rule, totalVotesForA = 10 + 15 + 20 + 25 = 70.
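The conflict-resolution rule for concurrent writes can be sketched as a function that combines all values aimed at one location in the same step. The names RULES and resolve_write are hypothetical, and the "priority" rule (lowest-numbered processor wins) is an assumed reading of the "overwrite" option.

```python
RULES = {
    "sum": sum,                               # store the sum of all written values
    "max": max,                               # store the largest written value
    "priority": lambda values: values[0],     # assumption: lowest-numbered processor wins
}

def resolve_write(values, rule):
    """Combine the values written to one memory location in the same step.

    `values` is ordered by processor number (P1 first).
    """
    return RULES[rule](values)

writes = [10, 15, 20, 25]                     # counts written by P1..P4
total_votes_for_a = resolve_write(writes, "sum")
print(total_votes_for_a)                      # 70
print(resolve_write(writes, "max"))           # 25
print(resolve_write(writes, "priority"))      # 10
```

The variable total_votes_for_a corresponds to totalVotesForA in the walkthrough. Only the sum rule gives the correct vote total, so the choice of rule is part of the algorithm's design.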
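Example of CRCW PRAM
Scenario: Four processors (P1, P2, P3, P4) find the maximum element of the array
[3, 15, 8, 23, 7, 12, 19, 5].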
Steps:
Initialization: Set the shared memory location max to a very low value (e.g., -∞).
This ensures that any element in the array will be greater than this initial value.
Concurrent Read Phase: Divide the array among the processors. Each processor
reads a segment of the array to find the local maximum.
P1 reads [3, 15] and finds the local maximum as 15.
P2 reads [8, 23] and finds the local maximum as 23.
P3 reads [7, 12] and finds the local maximum as 12.
P4 reads [19, 5] and finds the local maximum as 19.
Concurrent Write Phase: All processors try to write their local maximum to the
shared memory location max simultaneously. A predefined rule is used to resolve
the concurrent write:
a) Suppose we use the maximum rule: the highest value among those being
written is stored in max.
b) P1 tries to write 15, P2 tries to write 23, P3 tries to write 12, and P4 tries to write
19.
c) According to the maximum rule, max will be set to 23, as it is the highest value
among the ones being written.
Final Result: The shared memory location max now holds the value 23, which is
the maximum element in the array.
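The maximum-finding steps can be traced with a short sequential simulation. It is a sketch under stated assumptions: the variable names are illustrative, and Python's built-in max stands in for the hardware's maximum rule for concurrent writes.

```python
array = [3, 15, 8, 23, 7, 12, 19, 5]
num_procs = 4
chunk = len(array) // num_procs

# Initialization: start below every possible element.
shared_max = float("-inf")

# Read phase: each processor scans its own segment for a local maximum.
local_maxima = [max(array[p * chunk:(p + 1) * chunk]) for p in range(num_procs)]

# Concurrent write phase: all processors write in the same step. Under the
# maximum rule, the largest of the competing values (and the current
# contents) is what ends up stored.
shared_max = max([shared_max] + local_maxima)

print(local_maxima)  # [15, 23, 12, 19]
print(shared_max)    # 23
```

Unlike the exclusive-write examples, the write phase here costs a single step regardless of the number of processors, because the model resolves the conflict itself.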
Summary
CRCW PRAM represents the most powerful model in the PRAM family due to its
ability to handle both concurrent reads and writes.
It is useful for designing highly parallel algorithms, especially in scenarios where
maximum concurrency is required.
Understanding this model helps in implementing efficient parallel algorithms for
problems that involve shared data access, such as maximum finding, sorting, or
other computational tasks that can benefit from high levels of parallelism.