
Parallel and Distributed Computing

CS 3006 (BCS-7A | BDS-7A)


Lecture 4
Danyal Farhat
FAST School of Computing
NUCES Lahore
Network Topologies for Parallel Architectures; Evaluating Static Interconnections in Terms of Diameter, Arc-Connectivity, Bisection Width, and Cost
Buses

• The simplest and earliest parallel machines used buses
• All processors access a common bus to exchange data
• The distance between any two nodes is O(1) in a bus
• A bus provides a convenient broadcast medium
Buses (Cont.)

• The bandwidth of the shared bus is a major bottleneck
• Typical bus-based machines are limited to dozens of nodes
• Sun Enterprise servers and Intel Pentium-based shared-bus multiprocessors are examples of such architectures
Crossbar Network
• A crossbar network allows any processor in the system to connect to any other processor or memory unit
• A simple way of connecting processors (p) to memory banks (m)
• Used in the design of:
High-performance small-scale multiprocessors
Direct network routers
Large-scale indirect networks, as fundamental components
Crossbar Network (Cont.)
• Uses a p × m grid of switches to connect p inputs to m outputs in a non-blocking manner
• The cost of a crossbar of p processors grows as O(p²)
• This is generally difficult to scale for large values of p
• Examples of machines that employ crossbars include the Sun Ultra HPC 10000 and the Fujitsu VPP500
• Crossbars have excellent performance scalability but poor
cost scalability
• Buses have excellent cost scalability, but poor performance
scalability
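To make the switch-grid idea concrete, here is a minimal Python sketch (not from the slides; the class and method names are hypothetical) of a p × m crossbar, whose p·m switchpoints are the source of the O(p²) cost:

```python
class Crossbar:
    """A p x m crossbar modeled as a grid of on/off switchpoints."""

    def __init__(self, p, m):
        # p * m switchpoints: for m ~ p this grid is the O(p^2) cost
        self.closed = [[False] * m for _ in range(p)]

    def connect(self, i, j):
        """Close switchpoint (i, j). Non-blocking: a request only
        fails if output j is already driven by another input."""
        if any(row[j] for row in self.closed):
            raise ValueError(f"output {j} already in use")
        self.closed[i][j] = True

xb = Crossbar(4, 4)
xb.connect(0, 2)   # processor 0 -> memory bank 2
xb.connect(1, 3)   # processor 1 -> memory bank 3, simultaneously
```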
Multistage Omega Network
• One of the most commonly used multistage interconnects
is the Omega network
• This network consists of log p stages, where p is the
number of inputs/outputs
• At each stage, input i is connected to output j, where j is obtained by a left rotation of the binary representation of i: j = 2i for 0 ≤ i ≤ p/2 − 1, and j = 2i + 1 − p for p/2 ≤ i ≤ p − 1
Multistage Omega Network (Cont.)
• Each stage of the Omega network implements a perfect shuffle, as in the sketch below:
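A minimal Python sketch (not from the slides) of that stage mapping:

```python
def perfect_shuffle(i, p):
    """Output wire for input i in one Omega stage (p a power of 2);
    equivalent to rotating the log2(p)-bit label of i left by one."""
    if i < p // 2:
        return 2 * i
    return 2 * i + 1 - p

# One stage of an 8 x 8 Omega network:
print([perfect_shuffle(i, 8) for i in range(8)])
# -> [0, 2, 4, 6, 1, 3, 5, 7]
```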
8 × 8 Omega network (figure): the complete network combines the perfect-shuffle interconnects with log p stages of 2 × 2 switches

• An Omega network has (p/2) log p switching nodes, and the cost of such a network grows as Θ(p log p)
Multistage Omega Network (Cont.)
• Machines like the IBM eServer p575 and SGI Altix 4000 use the Ω-network
• The Ω-network is much more interesting for large numbers of processors
• Problem: the switches have to be fast enough, and the width of the links is also important; 16-bit parallel links are better than serial links
• Multiprocessor vector processors use crossbars instead (because they have at most only 32 processors)
• Synchronization is usually performed with special communication registers (CPU to CPU); if there is little synchronization, shared memory suffices
Shared Memory Interconnection Network (Revisited)
There are three main network topologies available:
• Crossbar (n² connections; dedicated datapaths without sharing)
• Ω-network (n log2 n connections; log2 n switching stages, shared on a path)
• Central data bus (1 connection, shared by all n)


Advantages of shared memory machines
• User-friendly programming environment due to the global address space
• Data sharing is fast and uniform due to the proximity of memory to the CPUs
• Memory coherence is managed by the operating system

Memory coherence: keeping the contents of a memory location consistent when multiple processors access it
Disadvantages of shared memory machines
• Performance degradation due to “memory contention” (several processors trying to access the same memory location)
• The programmer is responsible for synchronization constructs (correct access to memory)
• Expensive to design shared memory computers with increasing numbers of processors
• Adding processors can geometrically increase traffic on the shared memory-CPU path and for cache coherence management

Cache coherence: keeping the multiple cached copies of a memory location consistent across processors
Distributed-Memory SIMD
• These machines are sometimes also known as processor-array
machines
• They work in lock-step, i.e., all processors execute the same instruction at the same time (but on different data items); no synchronization is required
• A control processor issues the instructions that are to be
executed by the processors in the processor array
• Processors are sometimes of a very simple bit-serial type (i.e., they operate on data items bitwise, irrespective of their type), which can operate on operands of any length (when operands are short, this may result in speedups)
Distributed-Memory SIMD (Cont.)
• GPUs are similar to processor array systems
• DM-SIMD machines specialize in digital signal and image processing and in certain types of Monte Carlo simulations with virtually no data exchange between processors
Monte Carlo simulation is a mathematical technique used to estimate the possible outcomes of an uncertain event.
• Operations that cannot be executed by the processor array or
by the control processor are off-loaded to the front-end system
• I/O may be through:
the front-end system
the processor array
both
Distributed-Memory MIMD
• Processors have their own local memory
• No concept of global address space across all processors
• Distributed memory systems require a communication
network to connect inter-processor memory
Examples
Shared-memory SIMD systems
• The Hitachi S3600 series

Distributed-memory SIMD systems
• The Alenia Quadrics
• The Cambridge Parallel Processing Gamma II
• The Digital Equipment Corp. MPP series
• The MasPar MP-1
• The MasPar MP-2

Shared-memory MIMD systems
• The Cray Research Inc. Cray J90-series, T90 series
• The Hitachi S3800 series
• The HP/Convex C4 series
• The Digital Equipment Corp. AlphaServer
• The NEC SX-4
• The Silicon Graphics Power Challenge
• The Tera MTA

Examples (Cont.)
Distributed-memory MIMD systems
• The Alex AVX 2
• The Avalon A12
• The C-DAC PARAM 9000/SS
• The Cray Research Inc. T3E
• The Fujitsu AP1000
• The Fujitsu VPP300 series
• The Hitachi SR2201 series
• The HP/Convex Exemplar SPP-1200
• The IBM 9076 SP2
• The Intel Paragon XP
• The Matsushita ADENART
• The Meiko Computing Surface 2
• The nCUBE 3
• The NEC Cenju-3
• The Parallel Computing Industries system
• The Parsys TA9000
• The Parsytec GC/Power Plus

DM-MIMD Routing Mechanism
• The routing mechanism determines the path a message takes through the network from the source to the destination node.
• Data and task decomposition have to be dealt with explicitly!
• The topology and speed of the data paths are crucial and have to be balanced against cost.
• Routing can be classified as:
Minimal
Non-minimal
• It can also be classified as:
Deterministic routing
Adaptive routing
Linear Arrays
• In a linear array, each node has two neighbors, one to its left
and one to its right.
• If the nodes at either end are connected, we refer to it as a 1-D
torus or a ring.
K-d Meshes
• A generalization of the linear array has nodes with 4 neighbors: to the north, south, east, and west (a 2-D mesh).
• A further generalization to d dimensions has nodes with 2d neighbors (e.g., 6 neighbors in a 3-D mesh); see the sketch below.
• Applications: 3-D weather modeling, 3-D structure modeling
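A minimal Python sketch (not from the slides; the function name and coordinate convention are illustrative) of the 2d-neighbor rule:

```python
def mesh_neighbors(coords, n):
    """Neighbors of a node in a d-dimensional mesh of side n: one
    step in each direction along each dimension (boundary nodes
    have fewer than 2d neighbors)."""
    result = []
    for dim in range(len(coords)):
        for step in (-1, 1):
            c = list(coords)
            c[dim] += step
            if 0 <= c[dim] < n:        # stay inside the mesh
                result.append(tuple(c))
    return result

# An interior node of a 3-D mesh has 2 * 3 = 6 neighbors:
print(len(mesh_neighbors((1, 1, 1), 4)))   # -> 6
```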
Hypercube
• For a hypercube with 2^d nodes, the number of steps between any two nodes is at most d (logarithmic growth)
d = log p (dimensions = log(nodes))
• The distance between any two nodes is at most log p.
• Each node has log p neighbors.
Hypercube (Cont.)
• Rule of thumb: “a d-dimensional hypercube can be constructed by connecting corresponding nodes of two (d-1)-dimensional hypercubes”
Hypercube (Cont.)
• The processors are numbered with 3-bit binary numbers that represent their X-Y-Z coordinates
• The distance between two nodes is given by the number of bit positions at which their labels differ (the Hamming distance); see the sketch below.
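A minimal Python sketch (not from the slides; the function names are illustrative) of this labeling scheme:

```python
def hypercube_neighbors(node, d):
    """The d neighbors of a node: flip each of its d label bits."""
    return [node ^ (1 << bit) for bit in range(d)]

def hypercube_distance(a, b):
    """Hamming distance: number of bit positions where a, b differ."""
    return bin(a ^ b).count("1")

# In a 3-D hypercube, node 000 neighbors 001, 010, 100,
# and is 3 hops (the maximum, log p) from node 111:
print([format(n, "03b") for n in hypercube_neighbors(0b000, 3)])
# -> ['001', '010', '100']
print(hypercube_distance(0b000, 0b111))   # -> 3
```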
Tree-based Networks
• A tree network is one in which there is exactly one path between any pair of nodes
• Linear arrays and star-connected networks are special cases of tree-based networks
• In a static tree network, each node represents a processing element
• In a dynamic tree network, leaf nodes represent processing elements while internal nodes are switching elements
Tree-based Networks (Cont.)
• The source node sends the message up the tree until it reaches the node at the root of the smallest subtree containing both the source and destination nodes (see the sketch after this list).
• Trees can be laid out in 2-D with no wire crossings, which is an attractive property of trees.
• The distance between any two nodes is no more than 2 log p.
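A minimal Python sketch (not from the slides) of this routing rule, assuming a hypothetical heap-style numbering where the root is 1 and the children of node k are 2k and 2k+1:

```python
def tree_route(src, dst):
    """Path from src up to the root of the smallest subtree
    containing both nodes, then down to dst (heap numbering:
    the parent of node k is k // 2)."""
    up, down = [], []
    while src != dst:
        if src > dst:                 # move the larger label up a level
            up.append(src)
            src //= 2
        else:
            down.append(dst)
            dst //= 2
    return up + [src] + down[::-1]    # src == dst == common ancestor

print(tree_route(4, 5))   # -> [4, 2, 5]       (via their parent)
print(tree_route(4, 7))   # -> [4, 2, 1, 3, 7] (via the root)
```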
Fat Tree
• Another topology is the “fat tree”
• In an ordinary tree, a node talking to a node in another subtree must pass through the root, so traffic is highest at the root node
• A fat tree amends this by providing more bandwidth (multiple connections) in the higher levels of the tree
• In an N-ary fat tree, the levels towards the root have N times the number of connections of the level below
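As a small illustration (not from the slides; the function name is hypothetical), the link count per level of such an N-ary fat tree can be sketched as:

```python
def fat_tree_level_links(levels, n=2):
    """Parallel links available at each tree level, from the leaf
    level (index 0) up toward the root, in an N-ary fat tree."""
    return [n ** level for level in range(levels)]

# Binary fat tree (N = 2) with 4 levels: bandwidth doubles at every
# level toward the root, so the root link is no longer a bottleneck.
print(fat_tree_level_links(4))   # -> [1, 2, 4, 8]
```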
Evaluating Static Interconnections
The parameters used to evaluate a static interconnection:
• Cost: usually depends on the number of communication links.
E.g., the cost of a linear array is p − 1.
Lower values are favorable
• Diameter: the distance between the farthest two nodes in the network, i.e., the longest of all shortest paths. The diameter of a linear array is p − 1.
Lower values are favorable
• Bisection width: the minimum number of wires that must be cut to divide the network into two equal halves. The bisection width of a linear array is 1.
Higher values are favorable
Evaluating Static Interconnections (Cont.)
• Arc-connectivity: the minimum number of arcs (links) that must be removed from the network to break it into two disconnected networks
Higher values are desirable
Higher values mean that, in case of a link failure, there are multiple alternative routes to a node.
The arc-connectivity of a linear array is 1; for a ring it is 2.
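For reference, here is a minimal Python sketch (not from the slides) tabulating these four metrics for some of the topologies discussed; the closed-form values follow Chapter 2 of the Grama and Gupta book cited below, assuming p is a power of 2.

```python
# Closed-form evaluation metrics for p-node static interconnections.
from math import log2

def metrics(p):
    d = int(log2(p))   # hypercube dimension / binary-tree depth
    return {
        # topology: (diameter, bisection width, arc-connectivity, cost)
        "linear array": (p - 1, 1, 1, p - 1),
        "ring": (p // 2, 2, 2, p),
        "hypercube": (d, p // 2, d, (p * d) // 2),
        "complete binary tree (p leaves)": (2 * d, 1, 1, 2 * (p - 1)),
    }

for name, (diam, bw, arc, cost) in metrics(16).items():
    print(f"{name}: diameter={diam}, bisection width={bw}, "
          f"arc-connectivity={arc}, cost={cost}")
```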
Additional Resources
Book: Introduction to Parallel Computing by Ananth Grama and
Anshul Gupta
• Chapter 2: Parallel Programming Platforms
Section 2.4.3: Network Topologies
Section 2.6: Routing Mechanisms for Interconnection Networks
Thank You!
