Introduction To Parallel Computing CIS 410/510 Department of Computer and Information Science

This document provides an overview of the CIS 410/510 "Introduction to Parallel Computing" course at the University of Oregon. The course was developed to bring parallel computing education to the undergraduate curriculum. It will cover key topics in parallel computing over 10 weeks. Students will learn parallel programming techniques hands-on through programming assignments and labs using shared memory parallel programming models and libraries like Cilk Plus, TBB, and OpenMP. The course involves collaboration with Intel and makes use of the department's parallel computing clusters.

Overview

Introduction to Parallel Computing

CIS 410/510
Department of Computer and Information Science

Lecture 1 – Overview
Outline
 Course Overview
❍ What is CIS 410/510?
❍ What is expected of you?

❍ What will you learn in CIS 410/510?

 Parallel Computing
❍ What is it?
❍ What motivates it?

❍ Trends that shape the field

❍ Large-scale problems and high-performance

❍ Parallel architecture types

❍ Scalable parallel computing and performance

Introduction to Parallel Computing, University of Oregon, IPCC Lecture 1 – Overview 2

How did the idea for CIS 410/510 originate?
 There has never been an undergraduate course in parallel computing
in the CIS Department at UO
 Only 1 course taught at the graduate level (CIS 631)

 Goal is to bring parallel computing education into the CIS undergraduate
curriculum, starting at the senior level

❍ CIS 410/510 (Spring 2014, “experimental” course)
❍ CIS 431/531 (Spring 2015, “new” course)

 CIS 607 – Parallel Computing Course Development

❍ Winter 2014 seminar to plan undergraduate course
❍ Develop 410/510 materials, exercises, labs, …

 Intel gave a generous donation ($100K) to the effort

 NSF and IEEE are spearheading a curriculum initiative for
undergraduate education in parallel processing
http://www.cs.gsu.edu/~tcpp/curriculum/
Who’s involved?
 Instructor
❍ Allen D. Malony
◆ scalable parallel computing
◆ parallel performance analysis
◆ taught CIS 631 for the last 10 years
 Faculty colleagues and course co-designers
❍ Boyana Norris
◆ Automated software analysis and transformation
◆ Performance analysis and optimization
❍ Hank Childs
◆ Large-scale, parallel scientific visualization
◆ Visualization of large data sets
 Intel scientists
❍ Michael McCool, James Reinders, Bob MacKay
 Graduate students doing research in parallel computing
Intel Partners
 James Reinders
❍ Director, Software Products
❍ Multi-core Evangelist

 Michael McCool
❍ Software architect
❍ Former Chief scientist, RapidMind

❍ Adjunct Assoc. Professor, University of Waterloo

 Arch Robison
❍ Architect of Threading Building Blocks
❍ Former lead developer of KAI C++

 David MacKay
❍ Manager of software product consulting team
CIS 410/510 Graduate Assistants
 Daniel Ellsworth
❍ 3rd year Ph.D. student
❍ Research advisor (Prof. Malony)

❍ Large-scale online system introspection

 David Poliakoff
❍ 2nd year Ph.D. student
❍ Research advisor (Prof. Malony)

❍ Compiler-based performance analysis

 Brandon Hildreth
❍ 1st year Ph.D. student
❍ Research advisor (Prof. Malony)

❍ Automated performance experimentation


Required Course Book
 “Structured Parallel Programming: Patterns for Efficient Computation,”
Michael McCool, Arch Robison, James Reinders,
1st edition, Morgan Kaufmann,
ISBN: 978-0-12-415993-8, 2012
http://parallelbook.com/
 Presents parallel programming
from the point of view of patterns
relevant to parallel computation
❍ Map, Collectives, Data
reorganization, Stencil and recurrence,
Fork-Join, Pipeline
 Focuses on the use of shared
memory parallel programming
languages and environments
❍ Intel Threading Building Blocks (TBB)
❍ Intel Cilk Plus
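
As a rough illustration of the Map pattern named above, here is a minimal sketch in Python rather than in TBB or Cilk Plus. The names `square` and `parallel_map` are hypothetical, and the standard-library thread pool is an assumption made for brevity, not the book's approach.

```python
from concurrent.futures import ThreadPoolExecutor

def square(x):
    """Elemental function, applied independently to each input."""
    return x * x

def parallel_map(func, items, workers=4):
    """Map pattern: apply func to every item; results keep input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(func, items))

squares = parallel_map(square, range(8))
total = sum(squares)  # a simple reduction, one kind of collective
```

Because each call to `square` is independent, the calls may run in any order or at the same time. In CPython, threads will not speed up CPU-bound work like this, so the sketch shows the structure of the pattern rather than a performance gain.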


Reference Textbooks
 Introduction to Parallel Computing, A. Grama,
A. Gupta, G. Karypis, V. Kumar, Addison Wesley,
2nd Ed., 2003
❍ Lecture slides from authors online
❍ Excellent reference list at end

❍ Used for CIS 631 before

❍ Getting old for latest hardware

 Designing and Building Parallel Programs,
Ian Foster, Addison Wesley, 1995.
❍ Entire book is online!!!
❍ Historical book, but very informative

 Patterns for Parallel Programming, T. Mattson,
B. Sanders, B. Massingill, Addison Wesley, 2005.
❍ Targets parallel programming
❍ Pattern language approach to parallel
program design and development

❍ Excellent references


What do you mean by experimental course?
 Given that this is the first offering of parallel
computing in the undergraduate curriculum, we
want to evaluate how well it worked
 We would like to receive feedback from students

throughout the course

❍ Lecture content and understanding
❍ Parallel programming learning experience

❍ Book and other materials

 Your experiences will help to update the course for
its debut offering in (hopefully) Spring 2015
Course Plan
 Organize the course so that the lectures cover the main areas of
parallel computing
❍ Architecture (1 week)
❍ Performance models and analysis (1 week)

❍ Programming patterns (paradigms) (3 weeks)

❍ Algorithms (2 weeks)

❍ Tools (1 week)

❍ Applications (1 week)

❍ Special topics (1 week)

 Augment lecture with a programming lab

❍ Students will take the lab with the course
◆ graded assignments and term project will be posted
❍ Targeted specifically to shared memory parallelism
Lectures
 Book and online materials are your main sources for
broader and deeper background in parallel computing
 Lectures should be more interactive
❍ Supplement other sources of information
❍ Cover higher-priority topics

❍ Intended to give you some of my perspective

❍ Will provide online access to lecture slides

 Lectures will complement the programming component,
but are intended to cover other aspects of parallel computing
 Will try to arrange a guest lecture or two during the quarter


Parallel Programming Lab
 Set up in the IPCC classroom
❍ Daniel Ellsworth and David Poliakoff leading the lab
 Shared memory parallel programming (everyone)
❍ Cilk Plus ( http://www.cilkplus.org/ )
◆ extension to the C and C++ languages to support data and task parallelism
❍ Threading Building Blocks (TBB) ( https://www.threadingbuildingblocks.org/ )
◆ C++ template library for task parallelism
❍ OpenMP ( http://openmp.org/wp/ )
◆ C/C++ and Fortran directive-based parallelism
 Distributed memory message passing (graduate)
❍ MPI ( http://en.wikipedia.org/wiki/Message_Passing_Interface )
◆ library for message communication on scalable parallel systems


WOPR (What Operational Parallel Resource)
 WOPR was built from whole cloth
❍ Constructed by UO graduate students
 Built Next Unit of Computing (NUC)
cluster with Intel funds
❍ 16x Intel NUC
◆ Haswell i5 CPU (2 cores, hyperthreading)
◆ Intel HD 4000 GPU (OpenCL programmable)
◆ 1 GigE, 16 GB memory, 240 GB mSATA
◆ 16x Logitech keyboard and mouse
❍ 16x ViewSonic 22” monitor
❍ Dell Edge GigE switch

❍ Dell head node


Other Parallel Resources – Mist Cluster
• Distributed memory cluster
• 16 8-core nodes
– 2x quad-core Pentium Xeon (2.33 GHz)
– 16 Gbyte memory
– 160 Gbyte disk
• Dual Gigabit ethernet adaptors
• Master node (same specs)
• Gigabit ethernet switch
• mist.nic.uoregon.edu


Other Parallel Resources – ACISS Cluster
 Applied Computational Instrument for Scientific Synthesis
❍ NSF MRI R2 award (2010)
 Basic nodes (1,536 total cores)
❍ 128 ProLiant SL390 G7
❍ Two Intel X5650 2.66 GHz 6-core CPUs per node
❍ 72GB DDR3 RAM per basic node
 Fat nodes (512 total cores)
❍ 16 ProLiant DL 580 G7
❍ Four Intel X7560 2.266 GHz 8-core CPUs per node
❍ 384GB DDR3 per fat node
 GPU nodes (624 total cores, 156 GPUs)
❍ 52 ProLiant SL390 G7 nodes, 3 NVidia M2070 GPUs per node (156 total GPUs)
❍ Two Intel X5650 2.66 GHz 6-core CPUs per node (624 total cores)
❍ 72GB DDR3 per GPU node
 ACISS has 2672 total cores
 ACISS is located in the UO Computing Center
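
The core and GPU totals quoted for ACISS can be checked from the per-node figures. The sketch below just redoes that arithmetic; the dictionary layout and variable names are illustrative choices, not part of any ACISS tooling.

```python
# (nodes, CPUs per node, cores per CPU) as listed on the slide
partitions = {
    "basic": (128, 2, 6),
    "fat": (16, 4, 8),
    "gpu": (52, 2, 6),
}

# Cores per partition = nodes * CPUs per node * cores per CPU
cores = {name: n * c * k for name, (n, c, k) in partitions.items()}
total_cores = sum(cores.values())

# 3 GPUs in each of the 52 GPU nodes
total_gpus = 52 * 3
```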
Course Assignments
 Homework
❍ Exercises primarily to prepare for midterm
 Parallel programming lab
❍ Exercises for parallel programming patterns
❍ Program using Cilk Plus, Threading Building Blocks, OpenMP

❍ Graduate students will also do assignments with MPI

 Team term project

❍ Programming, presentation, paper
❍ Graduate students distributed across teams

 Research summary paper (graduate students)

 Midterm exam later in the 7th week of the quarter

 No final exam
❍ Team project presentations during final period
Parallel Programming Term Project
 Major programming project for the course
❍ Non-trivial parallel application
❍ Include performance analysis

❍ Use NUC cluster and possibly Mist and ACISS clusters

 Project teams
❍ 5 person teams, 6 teams (depending on enrollment)
❍ Will try our best to balance skills

❍ Have 1 graduate student per team

 Project dates
❍ Proposal due at the end of the 4th week
❍ Project talk during last class

❍ Project due at the end of the term

 Need to get system accounts!!!

Term Paper (for graduate students)
• Investigate parallel computing topic of interest
– More in depth review
– Individual choice
– Summary of major points
• Requires minimum of ten references
– Book and other references have large bibliographies
– Google Scholar, Keywords: parallel computing
– NEC CiteSeer Scientific Literature Digital Library
• Paper abstract and references due by 3rd week
• Final term paper due at the end of the term
• Individual work
Grading
 Undergraduate
❍ 5% homework
❍ 10% pattern programming labs
❍ 20% programming assignments
❍ 30% midterm exam
❍ 35% project
 Graduate
❍ 15% programming assignments
❍ 30% midterm exam
❍ 35% project
❍ 20% research paper
Overview
 Broad/Old field of computer science concerned with:
❍ Architecture, HW/SW systems, languages, programming
paradigms, algorithms, and theoretical models
❍ Computing in parallel

 Performance is the raison d’être for parallelism

❍ High-performance computing
❍ Drives computational science revolution

 Topics of study
❍ Parallel architectures
❍ Parallel performance
❍ Parallel programming models and tools
❍ Parallel algorithms
❍ Parallel applications
What will you get out of CIS 410/510?
• In-depth understanding of parallel computer design
• Knowledge of how to program parallel computer
systems
• Understanding of pattern-based parallel
programming
• Exposure to different forms of parallel algorithms
• Practical experience using a parallel cluster
• Background on parallel performance modeling
• Techniques for empirical performance analysis
• Fun and new friends
Parallel Processing – What is it?
• A parallel computer is a computer system that uses
multiple processing elements simultaneously in a
cooperative manner to solve a computational problem
• Parallel processing includes techniques and
technologies that make it possible to compute in parallel
– Hardware, networks, operating systems, parallel libraries,
languages, compilers, algorithms, tools, …
• Parallel computing is an evolution of serial computing
– Parallelism is natural
– Computing problems differ in level / type of parallelism
• Parallelism is all about performance! Really?
Concurrency
• Consider multiple tasks to be executed in a computer
• Tasks are concurrent with respect to each other if
– They can execute at the same time (concurrent execution)
– Implies that there are no dependencies between the tasks
• Dependencies
– If a task requires results produced by other tasks in order to
execute correctly, the task’s execution is dependent
– If two tasks are dependent, they are not concurrent
– Some form of synchronization must be used to enforce (satisfy)
dependencies
• Concurrency is fundamental to computer science
– Operating systems, databases, networking, …
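
One way to make the definition above concrete is to record dependencies as a graph and call two tasks concurrent when neither depends on the other, directly or indirectly. This is only a sketch: the task names and the helpers `reachable` and `are_concurrent` are hypothetical.

```python
def reachable(deps, start):
    """Return every task that `start` depends on, directly or indirectly."""
    seen, stack = set(), list(deps.get(start, ()))
    while stack:
        task = stack.pop()
        if task not in seen:
            seen.add(task)
            stack.extend(deps.get(task, ()))
    return seen

def are_concurrent(deps, a, b):
    """Two distinct tasks are concurrent if neither depends on the other."""
    return a != b and b not in reachable(deps, a) and a not in reachable(deps, b)

# Hypothetical task graph: C needs the results of A and B; D needs C.
deps = {"A": [], "B": [], "C": ["A", "B"], "D": ["C"]}
```

Here A and B are concurrent, while D must wait for C, which in turn must wait for both A and B. A scheduler would enforce those waits with some form of synchronization.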
Concurrency and Parallelism
• Concurrent is not the same as parallel! Why?
• Parallel execution
– Concurrent tasks actually execute at the same time
– Multiple (processing) resources have to be available
• Parallelism = concurrency + “parallel” hardware
– Both are required
– Find concurrent execution opportunities
– Develop application to execute in parallel
– Run application on parallel hardware
• Is a parallel application a concurrent application?
• Is a parallel application run with one processor parallel?
Why or why not?
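
The last two questions can be explored with a small experiment. In this sketch, which uses Python's standard thread pool as a stand-in for "processing resources", the same set of concurrent tasks is run with one worker and with several. The function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def work(n):
    """An independent task: sum of the integers 0..n."""
    return sum(range(n + 1))

def run(tasks, workers):
    """Run the same concurrent tasks on a pool with the given worker count."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(work, tasks))

tasks = [10, 100, 1000]
serial_result = run(tasks, workers=1)    # concurrent tasks, one worker
parallel_result = run(tasks, workers=3)  # same tasks, may overlap in time
```

The results are identical in both runs. The program's concurrency does not change; only the resources available to execute it at the same time do.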
Parallelism
• There are granularities of parallelism (parallel execution) in
programs
– Processes, threads, routines, statements, instructions, …
– Think about what are the software elements that execute
concurrently
• These must be supported by hardware resources
– Processors, cores, … (execution of instructions)
– Memory, DMA, networks, … (other associated operations)
– All aspects of computer architecture offer opportunities for parallel
hardware execution
• Concurrency is a necessary condition for parallelism
– Where can you find concurrency?
– How is concurrency expressed to exploit parallel systems?
Why use parallel processing?
• Two primary reasons (both performance related)
– Faster time to solution (response time)
– Solve bigger computing problems (in same time)
• Other factors motivate parallel processing
– Effective use of machine resources
– Cost efficiencies
– Overcoming memory constraints
• Serial machines have inherent limitations
– Processor speed, memory bottlenecks, …
• Parallelism has become the future of computing
• Performance is still the driving concern
• Parallelism = concurrency + parallel HW + performance
Perspectives on Parallel Processing
• Parallel computer architecture
– Hardware needed for parallel execution?
– Computer system design
• (Parallel) Operating system
– How to manage systems aspects in a parallel computer
• Parallel programming
– Libraries (low-level, high-level)
– Languages
– Software development environments
• Parallel algorithms
• Parallel performance evaluation
• Parallel tools
– Performance, analytics, visualization, …
Why study parallel computing today?
• Computing architecture
– Innovations often drive novel programming models
• Technological convergence
– The “killer micro” is ubiquitous
– Laptops and supercomputers are fundamentally similar!
– Trends cause diverse approaches to converge
• Technological trends make parallel computing inevitable
– Multi-core processors are here to stay!
– Practically every computing system is operating in parallel
• Understand fundamental principles and design tradeoffs
– Programming, systems support, communication, memory, …
– Performance
• Parallelism is the future of computing
Inevitability of Parallel Computing
• Application demands
– Insatiable need for computing cycles
• Technology trends
– Processor and memory
• Architecture trends
• Economics
• Current trends:
– Today’s microprocessors have multiprocessor support
– Servers and workstations available as multiprocessors
– Tomorrow’s microprocessors are multiprocessors
– Multi-core is here to stay and #cores/processor is growing
– Accelerators (GPUs, gaming systems)
Application Characteristics
• Application performance demands hardware advances
• Hardware advances generate new applications
• New applications have greater performance demands
– Exponential increase in microprocessor performance
– Innovations in parallel architecture and integration
[Figure: cycle linking applications, performance, and hardware]
• Range of performance requirements
– System performance must also improve as a whole
– Performance requirements require computer engineering
– Costs addressed through technology advancements
Broad Parallel Architecture Issues
• Resource allocation
– How many processing elements?
– How powerful are the elements?
– How much memory?
• Data access, communication, and synchronization
– How do the elements cooperate and communicate?
– How are data transmitted between processors?
– What are the abstractions and primitives for cooperation?
• Performance and scalability
– How does it all translate into performance?
– How does it scale?
Leveraging Moore’s Law

• More transistors = more parallelism opportunities

• Microprocessors
– Implicit parallelism
• pipelining
• multiple functional units
• superscalar
– Explicit parallelism
• SIMD instructions
• long instruction words (VLIW)


What’s Driving Parallel Computing Architecture?

von Neumann bottleneck!!

(memory wall)


Microprocessor Transistor Counts (1971-2011)


What has happened in the last several years?
• Processing chip manufacturers increased processor
performance by increasing CPU clock frequency
– Riding Moore’s law
• Until the chips got too hot!
– Greater clock frequency → greater electrical power
– [Figures: Pentium 4 heat sink; frying an egg on a Pentium 4]

• Add multiple cores to add performance

– Keep clock frequency same or reduced
– Keep lid on power requirements
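
A back-of-the-envelope model shows why adding cores can hold power down. The sketch assumes the common approximation that dynamic power scales as V² · f per core. It also assumes, as an idealization, that supply voltage can be lowered in proportion to frequency and that the workload parallelizes perfectly. None of these numbers come from the slides.

```python
def relative_power(cores, freq, volt):
    """Dynamic power relative to one core at freq=1, volt=1 (P ~ cores * V^2 * f)."""
    return cores * volt ** 2 * freq

def relative_throughput(cores, freq):
    """Ideal throughput, assuming perfectly parallel work."""
    return cores * freq

# One core at full clock vs. two cores at half clock and (assumed) half voltage.
one_fast = relative_power(cores=1, freq=1.0, volt=1.0)
two_slow = relative_power(cores=2, freq=0.5, volt=0.5)
```

Under these assumptions the two slower cores match the ideal throughput of the single fast core at a quarter of the dynamic power. Real chips cannot scale voltage this freely, so the actual saving is smaller.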
Power Density Growth

Figure courtesy of Pat Gelsinger, Intel Developer Forum, Spring 2004


What’s Driving Parallel Computing Architecture?


What’s Driving Parallel Computing Architecture?

power wall


Classifying Parallel Systems – Flynn’s Taxonomy

• Distinguishes multi-processor computer architectures
along two independent dimensions
– Instruction and Data
– Each dimension can have one of two states: Single or Multiple
• SISD: Single Instruction, Single Data
– Serial (non-parallel) machine
• SIMD: Single Instruction, Multiple Data
– Processor arrays and vector machines
• MISD: Multiple Instruction, Single Data (weird)
• MIMD: Multiple Instruction, Multiple Data
– Most common parallel computer systems
Parallel Architecture Types
• Instruction-Level Parallelism
– Parallelism captured in instruction processing
• Vector processors
– Operations on multiple data stored in vector registers
• Shared-memory Multiprocessor (SMP)
– Multiple processors sharing memory
– Symmetric Multiprocessor (SMP)
• Multicomputer
– Multiple computers connected via a network
– Distributed-memory cluster
• Massively Parallel Processor (MPP)
Phases of Supercomputing (Parallel) Architecture

• Phase 1 (1950s): sequential instruction execution

• Phase 2 (1960s): sequential instruction issue
– Pipeline execution, reservation stations
– Instruction Level Parallelism (ILP)
• Phase 3 (1970s): vector processors
– Pipelined arithmetic units
– Registers, multi-bank (parallel) memory systems
• Phase 4 (1980s): SIMD and SMPs
• Phase 5 (1990s): MPPs and clusters
– Communicating sequential processors
• Phase 6 (>2000): many cores, accelerators, scale, …
Performance Expectations
 If each processor is rated at k MFLOPS and there are p
processors, should we expect to see k*p MFLOPS
performance? Correct?
 If it takes 100 seconds on 1 processor, should it take 10
seconds on 10 processors? Correct?

 Several causes affect performance
❍ Each must be understood separately
❍ But they interact with each other in complex ways

◆ solution to one problem may create another

◆ one problem may mask another
 Scaling (system, problem size) can change conditions
 Need to understand performance space
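
One simple way to see why the "10 seconds on 10 processors" expectation often fails is Amdahl's law, which the slide does not name and which is used here only as an illustrative model. The sketch assumes that a fixed fraction of the work is serial and the rest parallelizes perfectly. The 10% serial fraction is a made-up value.

```python
def amdahl_time(t1, p, serial_fraction):
    """Predicted run time on p processors if `serial_fraction` of the
    single-processor time t1 cannot be parallelized (Amdahl's law)."""
    return t1 * (serial_fraction + (1.0 - serial_fraction) / p)

ideal = amdahl_time(100.0, 10, serial_fraction=0.0)        # the hoped-for case
with_serial = amdahl_time(100.0, 10, serial_fraction=0.1)  # hypothetical 10% serial
```

With no serial work the model gives the hoped-for 10 seconds. With 10% serial work it gives about 19 seconds, before counting communication, load imbalance, or memory effects.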


Scalability
• A program can scale up to use many processors
– What does that mean?
• How do you evaluate scalability?
• How do you evaluate scalability goodness?
• Comparative evaluation
– If you double the number of processors, what should you expect?
– Is scalability linear?
• Use parallel efficiency measure
– Is efficiency retained as problem size increases?
• Apply performance metrics
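
The parallel efficiency measure mentioned above is commonly defined from speedup: S = T1 / Tp and E = S / p. A minimal sketch follows; the function names are illustrative.

```python
def speedup(t1, tp):
    """Speedup of a run taking tp seconds over the 1-processor time t1."""
    return t1 / tp

def efficiency(t1, tp, p):
    """Parallel efficiency: speedup divided by the number of processors."""
    return speedup(t1, tp) / p
```

For example, a program that takes 100 seconds on 1 processor and 25 seconds on 10 processors has a speedup of 4 and an efficiency of 0.4. Linear scaling would be a speedup of 10 and an efficiency of 1.0.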
Top 500 Benchmarking Methodology
• Listing of the world’s 500 most powerful computers
• Yardstick for high-performance computing (HPC)
– Rmax : maximal performance achieved on the Linpack benchmark
• dense linear system of equations (Ax = b)
• Data listed
– Rpeak : theoretical peak performance
– Nmax : problem size needed to achieve Rmax
– N1/2 : problem size needed to achieve 1/2 of Rmax
– Manufacturer and computer type
– Installation site, location, and year
• Updated twice a year at SC and ISC conferences
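
For a sense of the computation the benchmark times, here is a tiny serial sketch that solves a dense system Ax = b by Gaussian elimination with partial pivoting. It is not the benchmark code: real Linpack runs use highly tuned, distributed implementations, and this version does not check for singular matrices.

```python
def solve(a, b):
    """Solve Ax = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the caller's data is untouched.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate the entries below the pivot.
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

x = solve([[2.0, 1.0], [1.0, 3.0]], [3.0, 5.0])
```

The work grows roughly with the cube of the matrix dimension, which is why the problem size needed to reach Rmax (Nmax) is reported alongside the performance figure.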
Top 10 (November 2013)
Different architectures


Top 500 – Performance (November 2013)


#1: NUDT Tianhe-2 (MilkyWay-2)
 Compute nodes have 3.432 Tflop/s per node
❍ 16,000 nodes
❍ 32,000 Intel Xeon CPUs
❍ 48,000 Intel Xeon Phi coprocessors

 Operations Nodes
❍ 4096 FT CPUs
 Proprietary interconnect
❍ TH2 express
 1PB memory
❍ Host memory only
 Global shared parallel storage is 12.4 PB
 Cabinets: 125+13+24 =162
❍ Compute, communication, storage
❍ ~750 m²
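
The per-node figures above can be combined into system totals. This sketch only redoes the arithmetic from the slide; the variable names are illustrative.

```python
nodes = 16_000
tflops_per_node = 3.432

# Aggregate peak of the compute nodes, converted from Tflop/s to Pflop/s
peak_pflops = nodes * tflops_per_node / 1000

cpus_per_node = 32_000 // nodes   # Intel Xeon CPUs per node
phis_per_node = 48_000 // nodes   # Intel Xeon Phi cards per node
cabinets = 125 + 13 + 24          # compute + communication + storage
```

The result is about 54.9 Pflop/s of peak performance, with 2 Xeon CPUs and 3 Xeon Phi cards in each node.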


#2: ORNL Titan Hybrid System (Cray XK7)

 Peak performance of 27.1 PF

❍ 24.5 GPU + 2.6 CPU
 18,688 Compute Nodes each with:
❍ 16-Core AMD Opteron CPU
❍ NVIDIA Tesla “K20x” GPU

❍ 32 + 6 GB memory

 512 Service and I/O nodes
 4,352 ft² footprint

 200 Cabinets

 710 TB total system memory

 Cray Gemini 3D Torus Interconnect

 8.9 MW peak power
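
The Titan totals follow from the per-node figures. The sketch assumes that "32 + 6 GB" means 32 GB of CPU memory plus 6 GB of GPU memory per node, and that the 710 TB figure uses decimal units. Both are readings of the slide, not confirmed facts.

```python
gpu_pf, cpu_pf = 24.5, 2.6
peak_pf = gpu_pf + cpu_pf                  # total peak in Pflop/s

nodes = 18_688
memory_per_node_gb = 32 + 6                # assumed: CPU + GPU memory per node
total_memory_gb = nodes * memory_per_node_gb
total_memory_tb = total_memory_gb / 1000   # decimal terabytes
```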


#3: LLNL Sequoia (IBM BG/Q)
 Compute card
❍ 16-core PowerPC A2 processor
❍ 16 GB DDR3
 System has 98,304 compute cards
 Total system size:
❍ 1,572,864 processing cores
❍ 1.5 PB memory
 5-dimensional torus interconnection network
 Area of 3,000 ft²
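
The Sequoia totals are the per-card figures multiplied by the number of compute cards. In this sketch the memory total is expressed in binary units (1 PB taken as 1024 × 1024 GB), which is an assumption about how the slide rounds.

```python
cards = 98_304
cores_per_card = 16
memory_per_card_gb = 16

total_cores = cards * cores_per_card
total_memory_pb = cards * memory_per_card_gb / (1024 * 1024)
```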


#4: RIKEN K Computer

 80,000 CPUs
❍ SPARC64 VIIIfx
❍ 640,000 cores

 800 water-cooled racks

 6D mesh/torus interconnect (Tofu)
❍ 12 links between nodes
❍ 12x higher scalability than 3D torus


Contemporary HPC Architectures
Date  System               Location             Comp                    Comm         Peak (PF)  Power (MW)
2009  Jaguar; Cray XT5     ORNL                 AMD 6c                  Seastar2     2.3        7.0
2010  Tianhe-1A            NSC Tianjin          Intel + NVIDIA          Proprietary  4.7        4.0
2010  Nebulae              NSCS Shenzhen        Intel + NVIDIA          IB           2.9        2.6
2010  Tsubame 2            TiTech               Intel + NVIDIA          IB           2.4        1.4
2011  K Computer           RIKEN/Kobe           SPARC64 VIIIfx          Tofu         10.5       12.7
2012  Titan; Cray XK6      ORNL                 AMD + NVIDIA            Gemini       27         9
2012  Mira; BlueGeneQ      ANL                  SoC                     Proprietary  10         3.9
2012  Sequoia; BlueGeneQ   LLNL                 SoC                     Proprietary  20         7.9
2012  Blue Waters; Cray    NCSA/UIUC            AMD + (partial) NVIDIA  Gemini       11.6
2013  Stampede             TACC                 Intel + MIC             IB           9.5        5
2013  Tianhe-2             NSCC-GZ (Guangzhou)  Intel + MIC             Proprietary  54         ~20
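
Dividing peak performance by power gives a rough energy-efficiency figure for each system in the table. The sketch below copies the peak and power values as read from the table and treats Tianhe-2's "~20" as 20. Blue Waters is omitted because no power figure is listed for it.

```python
# (system, peak in PF, power in MW) as read from the table above
systems = [
    ("Jaguar", 2.3, 7.0),
    ("Tianhe-1A", 4.7, 4.0),
    ("Nebulae", 2.9, 2.6),
    ("Tsubame 2", 2.4, 1.4),
    ("K Computer", 10.5, 12.7),
    ("Titan", 27.0, 9.0),
    ("Mira", 10.0, 3.9),
    ("Sequoia", 20.0, 7.9),
    ("Stampede", 9.5, 5.0),
    ("Tianhe-2", 54.0, 20.0),  # power is approximate
]

def gflops_per_watt(peak_pf, power_mw):
    """1 PF / 1 MW = 1e15 / 1e6 flop/s per W = 1 Gflop/s per W."""
    return peak_pf / power_mw

# Systems ordered from most to least peak flops per watt
ranked = sorted(systems, key=lambda s: gflops_per_watt(s[1], s[2]), reverse=True)
```

By this measure the 2012 to 2013 systems deliver roughly 2 to 3 peak Gflop/s per watt, against about 0.3 for Jaguar in 2009.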


Top 10 (Top500 List, June 2011)

Figure credit: http://www.netlib.org/utk/people/JackDongarra/SLIDES/korea-2011.pdf


Japanese K Computer (#1 in June 2011)


Top 500 Top 10 (2006)


Top 500 Linpack Benchmark List (June 2002)


Japanese Earth Simulator
• World’s fastest supercomputer!!! (2002)
– 640 NEC SX-6 nodes
• 8 vector processors
– 5120 total processors
– Single stage crossbar
• ~2900 meters of cables
– 10 TB memory
– 700 TB disk space
– 1.6 PB mass storage
– 40 Tflops peak performance
– 35.6 Tflops Linpack performance
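
The ratio of Linpack performance to peak performance shows how much of the machine the benchmark could actually use. This sketch applies that ratio to the two figures on the slide; the variable names are illustrative.

```python
peak_tflops = 40.0      # theoretical peak (Rpeak)
linpack_tflops = 35.6   # measured Linpack performance (Rmax)

# Fraction of peak achieved on the benchmark
linpack_efficiency = linpack_tflops / peak_tflops
```

The Earth Simulator reached about 89% of its peak on Linpack.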
Prof. Malony and colleagues at Japanese ES

Mitsuhisa Sato
Barbara Chapman
Matthias Müller


Performance Development in Top 500

Figure credit: http://www.netlib.org/utk/people/JackDongarra/SLIDES/korea-2011.pdf


Exascale Initiative
 Exascale machines are targeted for 2019
 What are the potential differences and problems?

???


Major Changes to Software and Algorithms
 What were we concerned about before and now?
 Must rethink the design for exascale
❍ Data movement is expensive (Why?)
❍ Flops per second are cheap (Why?)

 Need to reduce communication and synchronization

 Need to develop fault-resilient algorithms

 How do we deal with massive parallelism?

 Software must adapt to the hardware (autotuning)


Supercomputing and Computational Science
• By definition, a supercomputer is one of a class of computer
systems that are the most powerful computing platforms at
that time
• Computational science has always lived at the leading (and
bleeding) edge of supercomputing technology
• “Most powerful” depends on performance criteria
– Performance metrics related to computational algorithms
– Benchmark “real” application codes
• Where does the performance come from?
– More powerful processors
– More processors (cores)
– Better algorithms
Computational Science
• Traditional scientific methodology
– Theoretical science
• Formal systems and theoretical models
• Insight through abstraction, reasoning through proofs
– Experimental science
• Real system and empirical models
• Insight from observation, reasoning from experiment design
• Computational science
– Emerging as a principal means of scientific research
– Use of computational methods to model scientific problems
• Numerical analysis plus simulation methods
• Computer science tools
– Study and application of these solution techniques
Computational Challenges
• Computational science thrives on computer power
– Faster solutions
– Finer resolution
– Bigger problems
– Improved interaction
– BETTER SCIENCE!!!
• How to get more computer power?
– Scalable parallel computing
• Computational science also thrives on better integration
– Couple computational resources
– Grid computing
Scalable Parallel Computing
• Scalability in parallel architecture
– Processor numbers
– Memory architecture
– Interconnection network
– Avoid critical architecture bottlenecks
• Scalability in computational problem
– Problem size
– Computational algorithms
• Computation to memory access ratio
• Computation to communication ratio
• Parallel programming models and tools
• Performance scalability
Next Lectures

• Parallel computer architectures

• Parallel performance models

CIS 410/510: Parallel Computing, University of Oregon, Spring 2014 Lecture 1 – Overview 65

No ratings yet
Jupyter Notebooks—a Publishing Format for Reproducible Computational Workflows
4 pages
ComputerOrganizationAndSoftwareSystems Flipped HO
No ratings yet
ComputerOrganizationAndSoftwareSystems Flipped HO
10 pages
Java PDF
No ratings yet
Java PDF
382 pages
01 Intro ELEC462
No ratings yet
01 Intro ELEC462
42 pages
1stand2nd Class UNIT 1 OOP.pptx
No ratings yet
1stand2nd Class UNIT 1 OOP.pptx
50 pages
Computer Science: Object Oriented Programming (OOP)
No ratings yet
Computer Science: Object Oriented Programming (OOP)
4 pages
Data Structures & Algorithms
89% (18)
Data Structures & Algorithms
431 pages
Assembly Language For Intel Based Computers 5 e de 59d6abac1723ddb5d0d18770
No ratings yet
Assembly Language For Intel Based Computers 5 e de 59d6abac1723ddb5d0d18770
5 pages
Foundations of Programming Languages (Cuuduongthancong - Com)
100% (4)
Foundations of Programming Languages (Cuuduongthancong - Com)
382 pages
Big Data Applications, Software, Hardware and Curricula
No ratings yet
Big Data Applications, Software, Hardware and Curricula
71 pages
Lectures 01 Introduction
No ratings yet
Lectures 01 Introduction
46 pages
1.6 Final Thoughts: 1 Parallel Programming Models 49
No ratings yet
1.6 Final Thoughts: 1 Parallel Programming Models 49
5 pages
BCS 413 Course Outline
No ratings yet
BCS 413 Course Outline
3 pages
Dr. R. Manikandan Assistant Professor (Senior Grade) Vit Bhopal University
No ratings yet
Dr. R. Manikandan Assistant Professor (Senior Grade) Vit Bhopal University
92 pages
Introduction to Scientific Programming
No ratings yet
Introduction to Scientific Programming
17 pages
Software List
No ratings yet
Software List
21 pages
Research at Northeastern University: - I/O Storage Modeling and Performance
No ratings yet
Research at Northeastern University: - I/O Storage Modeling and Performance
41 pages
NLP Mini Project
No ratings yet
NLP Mini Project
19 pages
CS526_1_Intro
No ratings yet
CS526_1_Intro
10 pages
COMP1126 Course Information
No ratings yet
COMP1126 Course Information
7 pages
(Ebook) An Introduction to Programming with IDL: Interactive Data Language by Kenneth P. Bowman ISBN 9780080489278, 9780120885596, 012088559X, 0080489273 download
100% (2)
(Ebook) An Introduction to Programming with IDL: Interactive Data Language by Kenneth P. Bowman ISBN 9780080489278, 9780120885596, 012088559X, 0080489273 download
46 pages
Teaching The Compilers Course: Alfred V. Aho
No ratings yet
Teaching The Compilers Course: Alfred V. Aho
4 pages
Eee 3132 Part 1 Lecture 1 to 8
No ratings yet
Eee 3132 Part 1 Lecture 1 to 8
366 pages
L01 - Introduction
No ratings yet
L01 - Introduction
51 pages
Advanced Programming Techniques: Course 689-???
No ratings yet
Advanced Programming Techniques: Course 689-???
5 pages
Operating System Lab
No ratings yet
Operating System Lab
3 pages
Geophysical Data Analysis Using Python
No ratings yet
Geophysical Data Analysis Using Python
9 pages
2016/2017 Design of Integrated Systems For Digital Processing
No ratings yet
2016/2017 Design of Integrated Systems For Digital Processing
4 pages
Ip 1
No ratings yet
Ip 1
42 pages
Introduction and Course Outline: Advanced Operating Systems (M)
No ratings yet
Introduction and Course Outline: Advanced Operating Systems (M)
21 pages
Cit 421
No ratings yet
Cit 421
240 pages
A1750496823 - 28897 - 26 - 2023 - Zero Lecture - CSE111 (Updated) PPT
No ratings yet
A1750496823 - 28897 - 26 - 2023 - Zero Lecture - CSE111 (Updated) PPT
22 pages
Academic English for Computer Science: Academic English
From Everand
Academic English for Computer Science: Academic English
Disigma Publications
No ratings yet
Advanced Unix Programming
From Everand
Advanced Unix Programming
Prof. N. B Venkateswarlu
No ratings yet
Edpm Sba 2
No ratings yet
Edpm Sba 2
7 pages
BNPL Report
No ratings yet
BNPL Report
38 pages
Lecture On Lis - Bosy Enrollment Sy. 2021-2022
100% (1)
Lecture On Lis - Bosy Enrollment Sy. 2021-2022
34 pages
TSLB3152 Format For Proposal
No ratings yet
TSLB3152 Format For Proposal
2 pages
Training Project On Analysis of Customer Satisfaction Level in Real Estate Sector
No ratings yet
Training Project On Analysis of Customer Satisfaction Level in Real Estate Sector
64 pages
The Anatomy of A Meeting - How The Worlds Best Companies Run Productive Team Meetings
100% (2)
The Anatomy of A Meeting - How The Worlds Best Companies Run Productive Team Meetings
44 pages
G10 Language and Literature Unit 2: " Purpose and Structure Are Key Components To Effective Communication "
No ratings yet
G10 Language and Literature Unit 2: " Purpose and Structure Are Key Components To Effective Communication "
5 pages
Section 1 Reading Comprehension
No ratings yet
Section 1 Reading Comprehension
7 pages
Appendixes To TCRP Report 118:: Bus Rapid Transit Practitioner's Guide
No ratings yet
Appendixes To TCRP Report 118:: Bus Rapid Transit Practitioner's Guide
51 pages
Che 3101 Group 7-Industrial Packaging
No ratings yet
Che 3101 Group 7-Industrial Packaging
83 pages
Personal Training Contract Agreement PDF
100% (1)
Personal Training Contract Agreement PDF
1 page
Authentications
No ratings yet
Authentications
35 pages
Labour Law
No ratings yet
Labour Law
80 pages
Configuration of Tracking Area Code (TAC) For Paging Optimization in Mobile Communication Systems
No ratings yet
Configuration of Tracking Area Code (TAC) For Paging Optimization in Mobile Communication Systems
2 pages
Academic Freedom
No ratings yet
Academic Freedom
4 pages
Idbi JD
No ratings yet
Idbi JD
2 pages
Chantel Spade 2018
No ratings yet
Chantel Spade 2018
3 pages
Sustainable Planning in Reykjavik
No ratings yet
Sustainable Planning in Reykjavik
96 pages
Hardik Sir Memo
No ratings yet
Hardik Sir Memo
13 pages
Strategic Management Process
No ratings yet
Strategic Management Process
12 pages
Graffiti Reading Comprehention
No ratings yet
Graffiti Reading Comprehention
3 pages
406 1186 1 PB PDF
No ratings yet
406 1186 1 PB PDF
30 pages
Petitioner Garner Full Report Filed
No ratings yet
Petitioner Garner Full Report Filed
85 pages
Indigo: Value Based Questions
No ratings yet
Indigo: Value Based Questions
7 pages
Mishandling of Sexual Harassment Complaints at Symbiosis Law School Hyderabad
No ratings yet
Mishandling of Sexual Harassment Complaints at Symbiosis Law School Hyderabad
4 pages
Teaching Journalism Ethics Through "The Newsroom": An Enhanced Learning Experience
No ratings yet
Teaching Journalism Ethics Through "The Newsroom": An Enhanced Learning Experience
16 pages
Group 3 Report
No ratings yet
Group 3 Report
66 pages