CSCE569 Parallel Computing
Lecture 1
TTH 3:30PM-4:45PM
Dr. Jianjun Hu
http://mleg.cse.sc.edu/edu/csce569/
University of South Carolina
Department of Computer Science and
Engineering
CSCE569 Course Information
Meeting time: TTH 3:30PM-4:45PM, Swearingen 2A21
4 homework assignments
Use the CSE turn-in system (https://dropbox.cse.sc.edu) to submit your homework
Deadline policy
1 Midterm Exam (conceptual understanding)
1 Final Project (deliverable to your future employer!)
Teamwork
Implementation project/research project
TA: No TA.
CSCE569 Course Information
Textbook and references
Parallel Programming: For Multicore and Cluster Systems
by Thomas Rauber and Gudula Rünger
Publisher: Springer, 1st edition (March 10, 2010)
Good reference book: Parallel Programming in C with
MPI and OpenMP
by Michael J. Quinn
Most important information source: the lecture slides.
Grading policy
4 homeworks, 1 midterm, 1 final project, in-class
participation
About Your Instructor
Dr. Jianjun Hu ([email protected])
Office hours: TTH 2:30-3:20PM, or drop by any time
Office phone: 803-777-7304; office: 3A66 Swearingen
Background:
Mechanical Engineering/CAD
Machine learning/Computational intelligence/Genetic
Algorithms/Genetic Programming (PhD)
Bioinformatics and Genomics (Postdoc)
Multi-disciplinary, just like parallel computing applications
Outline
Motivation
Modern scientific method
Evolution of supercomputing
Modern parallel computers
Seeking concurrency
Data clustering case study
Programming parallel computers
Why Are You Here?
Solve BIG problems
Use Supercomputers
Write parallel programs
Why Faster Computers?
Solve compute-intensive problems faster
  Make infeasible problems feasible
  Reduce design time
Solve larger problems in the same amount of time
  Improve answer's precision
  Reduce design time
Gain competitive advantage
Why Parallel Computing?
The massively parallel architecture of GPUs, coming
from their graphics heritage, is now delivering
transformative results for scientists and researchers all
over the world. For some of the world's most
challenging problems in medical research, drug
discovery, weather modeling, and seismic
exploration, computation is the ultimate tool.
Without it, research would still be confined to
trial-and-error physical experiments and observation.
What problems need Parallel Computing?
Parallel Computing in the Real World
Engineering
Science
Business
Games
Cloud computing
Definitions
Parallel computing
Using a parallel computer to solve single
problems faster
Parallel computer
Multiple-processor/core system supporting
parallel programming
Parallel programming
Programming in a language that supports
concurrency explicitly
Classical Science
Diagram: Nature → Observation → Theory → Physical Experimentation, in a cycle
Modern Scientific Method
Diagram: Nature → Observation → Theory → Numerical Simulation and Physical Experimentation, in a cycle
Evolution of Supercomputing
World War II
Hand-computed artillery tables
Need to speed up computations
ENIAC
Cold War
Nuclear weapon design
Intelligence gathering
Code-breaking
Supercomputer
General-purpose computer
Solves individual problems at high speeds, compared
with contemporary systems
Typically costs $10 million or more
Traditionally found in government labs
Commercial Supercomputing
Started in capital-intensive industries
Petroleum exploration
Automobile manufacturing
Other companies followed suit
Pharmaceutical design
Consumer products
CPUs 1 Million Times Faster
Faster clock speeds
Greater system concurrency
Multiple functional units
Concurrent instruction execution
Speculative instruction execution
Systems 1 Billion Times Faster
Processors are 1 million times faster
Combine thousands of processors
Parallel computer
Multiple processors
Supports parallel programming
Parallel computing = Using a parallel computer to
execute a program faster
Beowulf Concept
NASA (Sterling and Becker)
Commodity processors
Commodity interconnect
Linux operating system
Message Passing Interface (MPI) library
High performance/$ for certain applications
Computing speed of supercomputers
Projected computing speed of supercomputers
Top 10 Supercomputers (November 2010)
GPU
What you can use
Hardware
Multicore chips (2011: mostly 2 or 4 cores, but core
counts are doubling) (cores = processors)
Servers (often 2 or 4 multicores sharing memory)
Clusters (often several to tens of servers, sometimes
many more, not sharing memory)
Supercomputers at USC CEC
64 Nodes: Dual CPU
76 compute nodes with dual 3.4 GHz CPUs
Supercomputers at USC CEC
SGI Altix 4700 Shared-memory system
Hardware
128 Itanium Cores @ 1.6 GHz/ 8MB Cache
256 GB RAM
8TB storage
NUMAlink Interconnect Fabric
Software
SUSE Linux 10 with SGI ProPack
Intel C/C++ and Fortran Compilers
VASP
PBSPro scheduling software
Message Passing Toolkit
Intel Math Kernel Library
GNU Scientific Library
Boost library
Some historical machines
Earth Simulator was #1
Some interesting hardware
IBM Cell processor
SiCortex: "Teraflops from Milliwatts"
http://www.sicortex.com/products/sc648
http://www.gizmag.com/mit-cycling-human-powered-computation/8503/
GPU-based supercomputing + CUDA
Topic 1: Hardware architecture of parallel
computing systems
Topic 2: Programming/Software
Common parallel computing methods
PBS: job scheduling system
MPI: the Message Passing Interface
A low-level, "lowest common denominator"
standard that the world has stuck with for nearly
20 years
Can deliver performance, but can be a hindrance as
well
Pthreads for multicore shared-memory parallel
programming (a minimal sketch follows this list)
CUDA for GPU programming
MapReduce: Google-style high-performance
computing
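To make the Pthreads item above concrete, here is a minimal sketch (illustrative, not course-provided code): it spawns two POSIX threads running a hypothetical function named worker and waits for both to finish. Compile with, e.g., gcc -pthread.

/* Minimal Pthreads sketch: two threads each print their id.
   The function name "worker" is an illustrative choice. */
#include <pthread.h>
#include <stdio.h>

static void *worker(void *arg)
{
    long id = (long)arg;            /* thread id passed in via the arg pointer */
    printf("hello from thread %ld\n", id);
    return NULL;
}

int main(void)
{
    pthread_t t[2];
    for (long i = 0; i < 2; i++)
        pthread_create(&t[i], NULL, worker, (void *)i);  /* spawn threads */
    for (int i = 0; i < 2; i++)
        pthread_join(t[i], NULL);   /* wait for both threads to finish */
    return 0;
}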
Why MPI?
MPI = “Message Passing Interface”
Standard specification for message-passing libraries
Libraries available on virtually all parallel computers
Free libraries also available for networks of
workstations or commodity clusters
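A minimal MPI program in C may make this concrete (an illustrative sketch, not taken from the slides): every process reports its rank and the total process count. Typical build/run commands are mpicc and mpirun, though exact names vary by MPI installation.

/* Minimal MPI sketch: each process prints its rank. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, size;
    MPI_Init(&argc, &argv);                /* start the MPI runtime */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* this process's id */
    MPI_Comm_size(MPI_COMM_WORLD, &size);  /* total number of processes */
    printf("Hello from process %d of %d\n", rank, size);
    MPI_Finalize();                        /* shut down the MPI runtime */
    return 0;
}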
Why OpenMP?
OpenMP is an application programming interface (API)
for shared-memory systems
Supports higher-performance parallel programming of
symmetric multiprocessors
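A minimal OpenMP sketch (illustrative, not from the slides) shows the flavor of the API: a single pragma parallelizes a loop across the cores of a shared-memory machine. Compile with, e.g., gcc -fopenmp.

/* Minimal OpenMP sketch: parallel array sum with a reduction. */
#include <omp.h>
#include <stdio.h>

#define N 1000000

int main(void)
{
    static double a[N];   /* static so the 8 MB array is not on the stack */
    double sum = 0.0;

    for (int i = 0; i < N; i++)
        a[i] = 1.0;

    /* Each thread sums a chunk of the iterations; reduction(+:sum)
       combines the per-thread partial sums safely. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < N; i++)
        sum += a[i];

    printf("sum = %.0f using up to %d threads\n", sum, omp_get_max_threads());
    return 0;
}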
Topic 3: Performance
Single-processor speeds are no longer growing.
Moore's law still allows for more real estate per core
(transistor counts double roughly every two years)
http://www.intel.com/technology/mooreslaw/index.htm
People want performance, but it is hard to get
Slowdowns are often seen before speedups
Flops (floating-point operations per second)
Gigaflops (10^9), Teraflops (10^12), Petaflops (10^15)
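As a worked example (the numbers are illustrative, not from the slides): a quad-core CPU clocked at 3 GHz that completes 8 floating-point operations per core per cycle peaks at 4 × 3×10^9 × 8 = 9.6×10^10 flops, i.e. 96 Gigaflops.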
Summary (1/2)
High performance computing
U.S. government
Capital-intensive industries
Many companies and research labs
Parallel computers
Commercial systems
Commodity-based systems
Summary (2/2)
Power of CPUs keeps growing exponentially
Parallel programming environments changing very
slowly
Two standards have emerged
MPI library, for processes that do not share
memory
OpenMP directives, for processes that do share
memory
Places to Look
Best current news:
http://www.hpcwire.com/
Huge Conference:
http://sc09.supercomputing.org/
http://www.interactivesupercomputing.com
Top500.org