module 3ppt
Addison Wesley:
An Introduction to Parallel Computing 2nd Ed
In most parallel algorithms, processes need to exchange
data with other processes.
1- Each odd-numbered node sends its buffer to the even-numbered node just before
it, where the contents of the two buffers are combined into one.
3- Node 4 sends its buffer to node 0, which computes the final result of the
reduction.
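The reduction steps listed above can be sketched as a small simulation. This is a minimal sketch, not the textbook's code: it assumes 8 nodes numbered 0 to 7 and assumes that the middle step follows the same halving pattern as the two steps shown (nodes 2 and 6 sending to nodes 0 and 4). The function name tree_reduce and the transfer log are hypothetical names introduced for illustration.

```python
def tree_reduce(buffers, combine):
    """Simulate an all-to-one reduction onto node 0.

    buffers: one value per node; the node count is assumed to be a power of two.
    combine: associative function applied to (receiver_value, sender_value).
    Returns (result held by node 0, list of (step, sender, receiver) transfers).
    """
    values = list(buffers)
    p = len(values)
    if p == 0 or p & (p - 1):
        raise ValueError("node count must be a power of two")
    transfers = []
    step, distance = 1, 1
    while distance < p:
        # Nodes whose index is an odd multiple of `distance` send their buffer
        # to the node `distance` positions before them, which combines it.
        for sender in range(distance, p, 2 * distance):
            receiver = sender - distance
            values[receiver] = combine(values[receiver], values[sender])
            transfers.append((step, sender, receiver))
        step += 1
        distance *= 2
    return values[0], transfers


# Each node starts with its own label as data; the reduction is a sum.
total, log = tree_reduce(list(range(8)), lambda a, b: a + b)
for step, sender, receiver in log:
    print(f"step {step}: node {sender} -> node {receiver}")
print("result at node 0:", total)
```

With 8 nodes the log shows four transfers in the first step, two in the second, and the single transfer from node 4 to node 0 in the third.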
Example 4.1 Matrix-vector multiplication
Consider the problem of multiplying an n x n matrix A with
an n x 1 vector x on an n x n mesh of nodes to yield an n x 1
result vector y.
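One common way to organize this computation is sketched below. It assumes a layout the slide does not spell out: node (i, j) holds the single element A[i][j], element x[j] is broadcast down column j, every node forms one product, and each row is then reduced to give y[i]. The function name mesh_matvec is hypothetical, and the mesh is simulated with nested lists rather than real processes.

```python
def mesh_matvec(A, x):
    """Simulate y = A x on an n x n mesh with one matrix element per node."""
    n = len(A)
    if any(len(row) != n for row in A) or len(x) != n:
        raise ValueError("A must be n x n and x must have n elements")

    # Phase 1 (assumed): one-to-all broadcast of x[j] along column j,
    # so that node (i, j) ends up with a local copy of x[j].
    local_x = [[x[j] for j in range(n)] for _ in range(n)]

    # Phase 2: every node multiplies its matrix element by its copy of x[j].
    products = [[A[i][j] * local_x[i][j] for j in range(n)] for i in range(n)]

    # Phase 3: all-to-one reduction (sum) along each row yields y[i].
    return [sum(products[i]) for i in range(n)]


A = [[1, 2],
     [3, 4]]
x = [5, 6]
y = mesh_matvec(A, x)
print(y)
```

The broadcast and reduction phases are the two collective operations the example is meant to motivate; the multiplication itself needs no communication.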
to complete.
The time for the entire all-to-all broadcast on a p-node two-
dimensional square mesh is the sum of the times spent in
the individual phases, which is
$T = 2t_s(\sqrt{p} - 1) + t_w m(p - 1)$.
3- On a p-node hypercube,
the size of each message exchanged in the i-th of the log p
steps is $2^{i-1}m$.
It takes a pair of nodes time $t_s + 2^{i-1}t_w m$ to send and receive
messages from each other during the i-th step.
Hence, the time to complete the entire procedure is:
$T = \sum_{i=1}^{\log p} (t_s + 2^{i-1}t_w m) = t_s \log p + t_w m(p - 1)$
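The doubling message sizes and the summed cost can be checked with a short simulation. This is a sketch under stated assumptions: each node starts with one unit-size message (its own label), the partner in step i is the node whose label differs in bit i-1, and the function names and the sample values of ts, tw and m are hypothetical.

```python
def hypercube_all_to_all_broadcast(p):
    """Return (final buffers, message size sent in each step) for p nodes."""
    d = p.bit_length() - 1
    if p < 1 or (1 << d) != p:
        raise ValueError("p must be a power of two")
    buffers = [{node} for node in range(p)]
    sizes = []
    for i in range(d):
        # Every node sends its whole current buffer to its partner across
        # dimension i and merges the buffer it receives.
        sizes.append(len(buffers[0]))
        buffers = [buffers[node] | buffers[node ^ (1 << i)] for node in range(p)]
    return buffers, sizes


def hypercube_time(p, m, ts, tw):
    """Sum the per-step cost ts + 2^(i-1) * tw * m over the log p steps."""
    d = p.bit_length() - 1
    return sum(ts + 2 ** (i - 1) * tw * m for i in range(1, d + 1))


buffers, sizes = hypercube_all_to_all_broadcast(8)
print(sizes)  # message sizes in units of m, one entry per step

# Illustrative parameter values only; compare the sum with ts*log p + tw*m*(p-1).
p, m, ts, tw = 8, 4, 10, 1
summed = hypercube_time(p, m, ts, tw)
closed_form = ts * 3 + tw * m * (p - 1)
print(summed, closed_form)
```

The sizes list doubles at every step, which is where the factor 2^(i-1) in the per-step cost comes from.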
4.3 All-Reduce and Prefix-Sum Operations
The communication pattern of all-to-all broadcast can be
used to perform some other operations as well.
First Step
Second Step
There is a faster way to perform all-reduce by using the communication
pattern of all-to-all broadcast.
Ex.: Assume that each integer in parentheses in the figure denotes a
number to be added that originally resided at the node with that
integer label.
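The faster all-reduce can be sketched on a simulated hypercube. In this minimal sketch, each node starts with its own label as the number to be added, as in the example, and in every step each node exchanges its partial sum with its partner and adds the two instead of concatenating them. The function name hypercube_all_reduce is hypothetical, and a power-of-two node count is assumed.

```python
def hypercube_all_reduce(values):
    """Simulate all-reduce (sum) using the all-to-all broadcast pattern."""
    p = len(values)
    d = p.bit_length() - 1
    if p == 0 or (1 << d) != p:
        raise ValueError("node count must be a power of two")
    current = list(values)
    for i in range(d):
        # Partners across dimension i swap partial sums and both add them,
        # so the message size stays the same in every step.
        current = [current[node] + current[node ^ (1 << i)] for node in range(p)]
    return current


# Node k initially holds the number k.
sums = hypercube_all_reduce(list(range(8)))
print(sums)
```

Because the messages are added rather than concatenated, every one of the log p steps transfers a message of the original size, and all nodes finish with the same total.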
In the first step, every odd-numbered node sends its buffer
to the even-numbered neighbor behind it, which
concatenates the received message with its own buffer.