CS 213: Parallel Processing Architectures: Laxmi Narayan Bhuyan
PARALLEL PROCESSING ARCHITECTURES
CS 213 SYLLABUS, Winter 2008
INSTRUCTOR: L.N. Bhuyan (http://www.engr.ucr.edu/~bhuyan/)
PHONE: (951) 827-2347
E-mail: [email protected]
LECTURE TIME: TR 12:40pm-2:00pm
PLACE: HMNSS 1502
OFFICE HOURS: W 2:00-4:00 or by appointment
References:
John Hennessy and David Patterson, Computer Architecture: A Quantitative Approach, Morgan Kaufmann Publishers.
Research papers will be made available in class.
COURSE OUTLINE:
Introduction to Parallel Processing: Flynn's classification, SIMD and MIMD operations, shared memory vs. message passing multiprocessors, distributed shared memory
Shared Memory Multiprocessors: SMP and CC-NUMA architectures, cache coherence protocols, consistency protocols, data pre-fetching, CC-NUMA memory management, SGI 4700 multiprocessor, chip multiprocessors, network processors (IXP and Cavium)
Interconnection Networks: static and dynamic networks, switching techniques, Internet techniques
Message Passing Architectures: message passing paradigms, grid architecture, workstation clusters, user-level software
Multiprocessor Scheduling: scheduling and mapping, Internet web servers, P2P, content-aware load balancing
PREREQUISITE: CS 203A
GRADING:
Project I - 20 points
Project II - 30 points
Test 1 - 20 points
Test 2 - 30 points
Possible Projects
Experiments with the SGI Altix 4700 supercomputer
Algorithm design and FPGA offloading
I/O scheduling on the SGI
Chip Multiprocessor (CMP) design, analysis, and simulation
P2P using PlanetLab
Note: 2 students per group. Expect submission of a paper to a conference.
SimpleScalar: www.simplescalar.com (look for multiprocessor extensions)
NepSim: http://www.cs.ucr.edu/~yluo/nepsim/
Working in a cluster environment:
Beowulf Cluster: www.beowulf.org
MPI: www-unix.mcs.anl.gov/mpi (see the minimal sketch below)
Application Benchmarks: http://www-flash.stanford.edu/apps/SPLASH/
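For the cluster environment items above, a minimal MPI "hello world" sketch in C (assuming an MPI installation such as MPICH or Open MPI is available on the cluster; the file name is illustrative):

    /* hello_mpi.c -- each process reports its rank.
       Compile: mpicc hello_mpi.c -o hello_mpi
       Run:     mpirun -np 4 ./hello_mpi            */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        int rank, size;
        MPI_Init(&argc, &argv);               /* start the MPI runtime     */
        MPI_Comm_rank(MPI_COMM_WORLD, &rank); /* this process's id         */
        MPI_Comm_size(MPI_COMM_WORLD, &size); /* total number of processes */
        printf("Hello from process %d of %d\n", rank, size);
        MPI_Finalize();                       /* shut down the MPI runtime */
        return 0;
    }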
Parallel Computers
Definition: A parallel computer is a collection of processing elements that cooperate and communicate to solve large problems fast.
Almasi and Gottlieb, Highly Parallel Computing, 1989
Process-level or thread-level parallelism: mainstream for general-purpose computing?
Servers are parallel. High-end desktop: dual-processor PC soon? (Or just sell the socket?)
Why Multiprocessors?
1. Microprocessors as the fastest CPUs: collecting several is much easier than redesigning one
3. Slow (but steady) improvement in parallel software (scientific apps, databases, OS)
4. Emergence of embedded and server markets driving microprocessors in addition to desktops
- Embedded functional parallelism; network processors exploiting packet-level parallelism
- SMP servers and clusters of workstations for multiple users
- Less demand for parallel computing
Uniprocessors
MIMD is the current winner: concentrate the major design emphasis on <= 128-processor MIMD machines.
1. Message Passing Multiprocessor: Interprocessor communication through explicit message passing via send and receive operations (sketched below). EX: IBM SP2, Cray XD1, and clusters.
2. Shared Memory Multiprocessor: All processors share the same address space; interprocessor communication through load/store operations to a shared memory. EX: SMP servers, SGI Origin, HP V-Class, Cray T3E.
Their advantages and disadvantages?
(b) Hardware Distributed Shared Memory (DSM) Multiprocessor: Memory is distributed, but the address space is shared: Non-Uniform Memory Access (NUMA).
(c) Software DSM: A layer of the OS built on top of a message passing multiprocessor to give a shared-memory view to the programmer.
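To make the two classes concrete, here is a minimal C/MPI sketch of the message-passing style (the value and message tag are illustrative, not from the course material); in the shared-memory class the same exchange would be an ordinary store by one processor and a load by another:

    /* sendrecv.c -- process 0 sends a value, process 1 receives it.
       Run with at least 2 processes: mpirun -np 2 ./sendrecv        */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        int rank, value = 0;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            value = 42;                                   /* data produced on P0 */
            MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);                  /* data consumed on P1 */
            printf("Process 1 received %d\n", value);
        }

        MPI_Finalize();
        return 0;
    }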
Data parallel programming languages are still useful; do communication all at once: bulk-synchronous phases in which all processors communicate after a global barrier (see the sketch below the HPF notes).
Data mapping in HPF:
1. To reduce interprocessor communication
2. Load balancing among processors
http://www.npac.syr.edu/hpfa/
http://www.crpc.rice.edu/HPFF/
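A minimal sketch of the bulk-synchronous pattern in C/MPI (compute_local(), the step count, and the array size are placeholders introduced for illustration): each processor works on its local data, all processors meet at a global barrier, and then the communication happens all at once:

    /* bsp_skeleton.c -- local computation phases separated by a global
       barrier and an all-at-once exchange (here an all-reduce).        */
    #include <mpi.h>

    #define N 1024

    static void compute_local(double *x, int n)
    {
        for (int i = 0; i < n; i++)      /* purely local work, no communication */
            x[i] = x[i] * 0.5 + 1.0;
    }

    int main(int argc, char *argv[])
    {
        double local[N] = {0.0}, local_sum, global_sum;
        MPI_Init(&argc, &argv);

        for (int step = 0; step < 10; step++) {
            compute_local(local, N);          /* computation phase             */
            MPI_Barrier(MPI_COMM_WORLD);      /* global barrier: all arrive    */

            local_sum = 0.0;                  /* communication phase: everyone */
            for (int i = 0; i < N; i++)       /* exchanges at once             */
                local_sum += local[i];
            MPI_Allreduce(&local_sum, &global_sum, 1, MPI_DOUBLE,
                          MPI_SUM, MPI_COMM_WORLD);
        }

        MPI_Finalize();
        return 0;
    }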
Communication Models
Shared Memory
Processors communicate through a shared address space
Easy on small-scale machines
Advantages:
- Model of choice for uniprocessors, small-scale MPs
- Ease of programming
- Lower latency
- Easier to use hardware-controlled caching
Message passing
Processors have private memories and communicate via messages
Advantages:
- Less hardware, easier to design
- Good scalability
- Focuses attention on costly non-local operations
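For contrast with the message-passing sketches above, a minimal shared-address-space sketch using POSIX threads (the variable names are illustrative): communication is just a store by one thread and a load by another, with a mutex and condition variable providing the synchronization:

    /* shared_comm.c -- producer/consumer through a shared address space.
       Build: cc shared_comm.c -o shared_comm -lpthread                   */
    #include <pthread.h>
    #include <stdio.h>

    static int shared_value;                 /* lives in the shared address space */
    static int ready = 0;
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;

    static void *producer(void *arg)
    {
        (void)arg;
        pthread_mutex_lock(&lock);
        shared_value = 42;                   /* "communication" is just a store   */
        ready = 1;
        pthread_cond_signal(&cond);
        pthread_mutex_unlock(&lock);
        return NULL;
    }

    static void *consumer(void *arg)
    {
        (void)arg;
        pthread_mutex_lock(&lock);
        while (!ready)
            pthread_cond_wait(&cond, &lock);
        printf("consumer read %d\n", shared_value);   /* ...and a load            */
        pthread_mutex_unlock(&lock);
        return NULL;
    }

    int main(void)
    {
        pthread_t p, c;
        pthread_create(&c, NULL, consumer, NULL);
        pthread_create(&p, NULL, producer, NULL);
        pthread_join(p, NULL);
        pthread_join(c, NULL);
        return 0;
    }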
Based on timesharing: processes on multiple processors vs. sharing a single processor
Process: a virtual address space and ~1 thread of control
Multiple processes can overlap (share), but ALL threads share a process address space
Writes to shared address space by one thread are visible to reads of other threads
Usual model: share code, private stack, some shared heap, some private heap
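A small POSIX-threads sketch of this model (the thread count and counter names are illustrative): the heap-allocated counter is shared, so every thread's writes are visible to the others, while each thread's stack variable stays private:

    /* shared_vs_private.c -- threads share code and heap, keep private stacks. */
    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define NTHREADS 4

    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    static void *worker(void *arg)
    {
        int *shared_count = (int *)arg;   /* heap object: shared by all threads */
        int private_count = 0;            /* stack variable: private per thread */

        for (int i = 0; i < 1000; i++) {
            private_count++;              /* no other thread can see this       */
            pthread_mutex_lock(&lock);
            (*shared_count)++;            /* visible to every other thread      */
            pthread_mutex_unlock(&lock);
        }
        printf("private count = %d\n", private_count);
        return NULL;
    }

    int main(void)
    {
        pthread_t t[NTHREADS];
        int *shared_count = calloc(1, sizeof *shared_count);   /* shared heap */

        for (int i = 0; i < NTHREADS; i++)
            pthread_create(&t[i], NULL, worker, shared_count);
        for (int i = 0; i < NTHREADS; i++)
            pthread_join(t[i], NULL);

        printf("shared count = %d (expected %d)\n",
               *shared_count, NTHREADS * 1000);
        free(shared_count);
        return 0;
    }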