2. Parallel Computers
Understanding Parallel Computers - Paradigms and Programming Models
Sofien GANNOUNI
Computer Science
E-mail: [email protected] ; [email protected]
Von Neumann Architecture
For over 40 years, virtually all computers have followed a common machine model known as the von Neumann computer, named after the Hungarian mathematician John von Neumann.
Parallel programming paradigms:
- Shared Memory
- Message Passing
- Multi-threading
Shared Memory Paradigm
- Centralized shared memory
- Distributed memory
- Hybrid systems
Point-to-Point Communication
The simplest form of message passing: one process sends a message to another.
Two types of point-to-point communication:
- Synchronous send
- Asynchronous (buffered) send
Synchronous Sends
The sender receives confirmation that the message has been received.
Analogous to the beep or okay-sheet of a fax machine.
Synchronous send() and recv() library calls use a three-way protocol: request to send, acknowledgment, then the message itself. Whichever process reaches its call first suspends until the other is ready:
- If Process 1 calls send() first, it issues a request to send and suspends. When Process 2 calls recv(), an acknowledgment is returned, the message is transferred, and both processes continue.
- If Process 2 calls recv() first, it suspends. When Process 1 calls send(), the request to send is acknowledged immediately, the message is transferred, and both processes continue.
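The three-way protocol can be sketched with two Python threads; `SyncChannel`, its internal queues, and the thread setup are illustrative stand-ins for a message-passing library's internals, not a real MPI API:

```python
import threading
import queue

class SyncChannel:
    """Synchronous channel: send() blocks until the receiver has matched it."""
    def __init__(self):
        self.request = queue.Queue()  # "request to send" signals
        self.ack = queue.Queue()      # "acknowledgment" signals
        self.data = queue.Queue()     # the message itself

    def send(self, msg):
        self.request.put(None)  # 1. request to send; the sender suspends...
        self.ack.get()          # 2. ...until the receiver acknowledges
        self.data.put(msg)      # 3. transfer the message

    def recv(self):
        self.request.get()      # wait for a request to send
        self.ack.put(None)      # acknowledge: ready to receive
        return self.data.get()  # receive the message

ch = SyncChannel()
result = []
receiver = threading.Thread(target=lambda: result.append(ch.recv()))
receiver.start()
ch.send("hello")   # returns only after the receiver has matched the send
receiver.join()
```

Note that send() cannot complete before a matching recv() is posted, which is exactly the synchronous property described above.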
Buffered = Asynchronous Sends
The sender only knows when the message has left, not whether it has been received.
When the sender process reaches a send operation, it copies the data into a buffer on the receiver side and can proceed without waiting.
On the receiver side, the received data is not necessarily stored directly at its designated location.
When the receiver process encounters a receive operation, it reads the data from the buffer.
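The buffered scheme can be sketched in Python with a queue acting as the intermediate message buffer; `buffered_send` and `buffered_recv` are hypothetical names for illustration, not MPI calls:

```python
import queue

message_buffer = queue.Queue()   # intermediate buffer between sender and receiver

def buffered_send(msg):
    message_buffer.put(msg)      # copy into the buffer; return immediately

def buffered_recv():
    return message_buffer.get()  # drain the buffer when the receiver is ready

buffered_send("hello")    # the sender does not wait for a matching receive
buffered_send("world")    # it can continue sending at once
first = buffered_recv()   # the receiver reads from the buffer later
second = buffered_recv()
```

The sender returns as soon as the copy into the buffer is done, which is why it can only know that the message "has left", not that it was received.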
Blocking Operations
A blocking subroutine returns only when the operation it started has completed.
Non-Blocking Operations
Non-blocking operations return immediately and allow the program to perform other work while the operation completes.
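One way to sketch non-blocking behavior in Python is to hand the operation to a background thread and get a completion handle back, loosely analogous to MPI_Isend followed by MPI_Wait; `isend` and `channel` are illustrative names, not a real API:

```python
from concurrent.futures import ThreadPoolExecutor
import queue

channel = queue.Queue()
pool = ThreadPoolExecutor(max_workers=1)

def isend(msg):
    # hand the transfer to a background thread; return a handle immediately
    return pool.submit(channel.put, msg)

handle = isend("payload")   # returns at once, like a non-blocking send
overlap = sum(range(1000))  # useful computation overlaps the transfer
handle.result()             # "wait": block until the send has completed
received = channel.get()
```

The caller overlaps its own computation with the transfer and only synchronizes when it actually needs the operation to be finished.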
Blocking non-buffered send/receive
The sender issues a send operation and cannot proceed until a matching receive is encountered at the receiver's side and the operation is complete.
Blocking buffered send/receive
When the sender process reaches a send operation, it copies the data into a buffer on the receiver side and can proceed without waiting.
Non-blocking non-buffered send/receive
The sender process need not be idle; it can do useful computation while waiting for the send/receive operation to complete.
Approach 1: the sender issues a request to send and proceeds with its computations without waiting for the receiver to be ready. When the receiver becomes ready, an interrupt signals the sender to start sending the data.
Approach 2: the sender issues a request to send, creates a child process, and proceeds with its computations without waiting for the receiver to be ready. When the receiver is ready, the child process sends the data.
Non-blocking buffered send/receive
The sender issues a direct memory access (DMA) operation to the buffer and can proceed with its computations. At the receiver side, when a receive operation is encountered, the data is transferred from the buffer to the designated memory location.
Collective Communications
Broadcast
Allows one process (called the root) to send the same data to all members of a communicator.
Scatter
Allows one process to distribute the contents of its send buffer among all processes in a communicator, one portion per process.
Gather
Each process sends the data in its send buffer to the root process, which stores it according to the senders' ranks.
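The three collectives can be sketched as plain Python functions over a list with one slot per rank; this is a conceptual single-process model of the data movement, not real message passing, and all names are illustrative:

```python
def broadcast(root_data, nprocs):
    # the root sends the SAME data to every process
    return [root_data] * nprocs

def scatter(send_buffer, nprocs):
    # the root splits its send buffer into nprocs contiguous chunks,
    # giving chunk i to process i
    chunk = len(send_buffer) // nprocs
    return [send_buffer[i * chunk:(i + 1) * chunk] for i in range(nprocs)]

def gather(per_process_values):
    # the root collects one value from each process, ordered by rank
    return list(per_process_values)

bcast = broadcast(7, 3)               # every rank gets 7
parts = scatter([10, 20, 30, 40], 2)  # rank 0 gets [10, 20], rank 1 gets [30, 40]
collected = gather([1, 2, 3])         # root assembles values by rank
```

The key contrast: broadcast replicates one value everywhere, scatter partitions one buffer across ranks, and gather is scatter's inverse at the root.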
Broadcast
A one-to-all communication: the root sends the same data to every process.
Scatter
A one-to-many communication: each element of an array of data in the root is sent to a separate process.
Gather
A many-to-one communication: one process collects individual values from a set of processes.
Reduction
A gather operation combined with a specified arithmetic or logical operation (e.g., sum, maximum, logical AND).
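A reduction can be sketched as a gather followed by a fold over the collected values; `mpi_like_reduce` is a hypothetical name for illustration:

```python
from functools import reduce
import operator

def mpi_like_reduce(per_process_values, op=operator.add):
    # gather the per-process contributions, then combine them pairwise with op
    return reduce(op, per_process_values)

total = mpi_like_reduce([1, 2, 3, 4])      # sum reduction
largest = mpi_like_reduce([5, 9, 2], max)  # max reduction
```

Any associative binary operation works, which is why sum, product, max, and logical AND/OR are the typical choices.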
Multi-threading Paradigm
In a single-core (superscalar) system, multithreading is the ability of the processor's hardware to run two or more threads in an overlapping fashion by allowing them to share the functional units of that processor.
In a multi-core system, multithreading is the ability of two or more cores to run two or more threads simultaneously (in parallel), each thread running on a separate core.
Modern systems combine both multithreading approaches.
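A minimal sketch of the paradigm with Python threads: the two threads share the process's address space and run in an overlapping fashion (how they map onto cores is up to the runtime and OS; all names are illustrative):

```python
import threading

results = {}

def worker(name, n):
    # threads share the process's memory, so both write into `results`
    results[name] = sum(range(n))

t1 = threading.Thread(target=worker, args=("a", 100))
t2 = threading.Thread(target=worker, args=("b", 200))
t1.start(); t2.start()  # the two threads run in an overlapping fashion
t1.join(); t2.join()    # wait for both to finish
```

Because the threads communicate through shared memory rather than messages, no explicit send/receive is needed, which is the defining contrast with the message-passing paradigm above.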