cs668 Lec1 ParallelArch
[Figure: IBM dual-core memory hierarchy: registers, separate 1st-level instruction and data caches, unified 2nd-level cache (instructions & data)]
[Figure: An 8-input, 8-output Omega network of 2x2 switches]
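A hedged sketch of how such a network is commonly routed (destination-tag routing; the port numbers and helper name below are illustrative, not from the slides): at each of the log2(8) = 3 stages the label is perfect-shuffled, then the 2x2 switch forwards on the next bit of the destination address, most significant bit first.

#include <stdio.h>

#define N 8        /* 8 inputs, 8 outputs */
#define STAGES 3   /* log2(N) stages of 2x2 switches */

/* Perfect shuffle on log2(N) bits: rotate the label left by one bit. */
static unsigned shuffle(unsigned x) {
    return ((x << 1) | (x >> (STAGES - 1))) & (N - 1);
}

int main(void) {
    unsigned src = 2, dst = 6;   /* example input and output ports */
    unsigned label = src;

    printf("route %u -> %u:", src, dst);
    for (int stage = 0; stage < STAGES; stage++) {
        label = shuffle(label);
        /* Switch setting: replace the low bit with the next destination bit. */
        unsigned bit = (dst >> (STAGES - 1 - stage)) & 1;
        label = (label & ~1u) | bit;
        printf(" %u", label);
    }
    printf("\n");   /* after the last stage, label equals dst */
    return 0;
}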
Shared Memory
• One or more memories
• Global address space (all system memory visible to all processors)
• Transfer of data between processors is usually implicit: just read from (write to)
a given address (OpenMP); see the sketch after this list.
• Cache-coherency protocol to maintain consistency between processors.
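A minimal sketch of the implicit-transfer idea, assuming a C compiler with OpenMP support (the array names and sizes are illustrative, not from the slides): every thread reads and writes the same global address space, so no explicit communication appears in the code, and the cache-coherency protocol keeps the processors' views consistent.

#include <stdio.h>

#define N 1000

int main(void) {
    static double a[N], b[N];

    for (int i = 0; i < N; i++)
        a[i] = i;

    /* Each thread works on part of the loop; reading a[i] and writing b[i]
     * are ordinary loads and stores to the shared address space, so data
     * moves between processors implicitly. */
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        b[i] = 2.0 * a[i];

    printf("b[%d] = %f\n", N - 1, b[N - 1]);
    return 0;
}

Compile with an OpenMP flag (e.g. gcc -fopenmp) to run the loop in parallel; without it the pragma is ignored and the program still runs serially.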
[Figures: parallel machine organizations built around an interconnection network: CPUs with network interfaces and memory modules connected through the interconnection network]
Flynn's Taxonomy
• SISD (single instruction stream, single data stream): uniprocessor
• SIMD (single instruction stream, multiple data streams): processor arrays
• MISD (multiple instruction streams, single data stream): systolic array
• MIMD (multiple instruction streams, multiple data streams): multiprocessors, multicomputers
Top 500 List
• Some highlights from http://www.top500.org/
– On the new list, the IBM BlueGene/L system, installed at DOE’s
Lawrence Livermore National Laboratory (LLNL), retains the No. 1 spot
with a Linpack performance of 280.6 teraflops (trillions of calculations
per second, or Tflop/s).
– The new No. 2 system is Sandia National Laboratories’ Cray Red
Storm supercomputer, only the second system ever recorded to
exceed the 100 Tflop/s mark, with 101.4 Tflop/s. The initial Red Storm
system was ranked No. 9 in the last listing.
– Slipping to No. 3 from No. 2 last June is the IBM eServer Blue Gene
Solution system, installed at IBM’s Thomas Watson Research Center
with 91.20 Tflop/s Linpack performance.
– The new No. 5 is the largest system in Europe, an IBM JS21 cluster
installed at the Barcelona Supercomputing Center. The system reached
62.63 Tflop/s.
Linux/Beowulf cluster basics
• Goal
– Get supercomputing processing power at the
cost of a few PCs
• How
– Commodity components: PCs and networks
– Free, open-source software (see the sketch below)
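The open-source software stack on such a cluster typically includes a message-passing library; as a sketch only (the slides do not name MPI, and an implementation such as MPICH or Open MPI is assumed to be installed), each node runs its own copy of the program and data moves between nodes only through explicit messages:

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0 && size > 1) {
        int token = 42;
        /* Explicit transfer: unlike the shared-memory case above, the data
         * reaches another node only when it is sent over the network. */
        MPI_Send(&token, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        int token;
        MPI_Recv(&token, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("rank 1 of %d received %d\n", size, token);
    }

    MPI_Finalize();
    return 0;
}

Launched with something like mpirun -np 2 ./a.out, rank 0 sends and rank 1 receives; no shared address space is assumed.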
CPU nodes
• A typical configuration
– Dual-socket nodes
– Dual-core AMD or Intel processors
– 4 GB memory per node
Network Options