Numerical Libraries For Petascale Computing: Brett Bode, William Gropp
Performance Estimate
How fast should this run?
Standard complexity analysis in numerical analysis counts floating point operations
Our matrix-matrix multiply algorithm has 2n^3 floating point operations
  3 nested loops, each with n iterations
  1 multiply, 1 add in each inner iteration
For n = 100, that is 2x10^6 operations, or about 1 msec on a 2 GHz processor :)
For n = 1000, 2x10^9 operations, or about 1 sec
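For reference, a minimal C sketch (my own illustration, not taken from the slides) of the triple loop being counted:

/* Triple-loop matrix-matrix multiply: n*n results, each needing
   n multiplies and n adds, hence 2*n^3 flops in total. */
void matmul(int n, const double *a, const double *b, double *c)
{
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            double sum = 0.0;
            for (int k = 0; k < n; k++)
                sum += a[i*n + k] * b[k*n + j];   /* 1 multiply + 1 add */
            c[i*n + j] = sum;
        }
    }
}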
The Reality
Measured performance of matrix-matrix multiply:
  N=100:  1818 MF (1.1 ms)
  N=1000: 335 MF (6 s)
[Chart compares hand-tuned code, compiler-generated code, and the routine from ATLAS; the remaining values did not survive extraction]
Enormous effort required to get good performance
Sometimes Slower
Using a library routine is not always the best choice:
Library routines add overhead
Fewer routines (simpler for the user) means more overhead in determining the exact operation
Apply the usual rules:
  Instrument your code (a timing sketch follows below)
  Know what performance you need/expect
  Only worry about code that takes a significant fraction of the total run time
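A minimal instrumentation sketch (my own illustration; compute_kernel is a hypothetical routine standing in for the code under study):

/* Time the candidate region before deciding whether a library call,
   or a hand-written replacement, is worth the effort. */
#include <stdio.h>
#include <mpi.h>

void compute_kernel(void);   /* hypothetical routine under study */

void profile_kernel(void)
{
    double t0 = MPI_Wtime();
    compute_kernel();
    double t1 = MPI_Wtime();
    printf("kernel took %.6f seconds\n", t1 - t0);
}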
Algorithms and Moore's Law
This advance took place over a span of about 36 years, or 24 doubling times for Moore's Law
2^24 ≈ 16 million, the same as the factor from algorithms alone!
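Spelling out the arithmetic (assuming the conventional 18-month doubling time):

\frac{36 \text{ years}}{1.5 \text{ years per doubling}} = 24 \text{ doublings}, \qquad 2^{24} = 16{,}777{,}216 \approx 16 \text{ million}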
[Figure: relative speedup vs. year, comparing algorithmic improvements with Moore's Law]
Example: Multigrid
Multigrid can be a very effective algorithm for certain classes of problems
Efficient implementations must address:
  Algorithmic choices (e.g., smoother)
  Implementation for memory locality
  Use as a preconditioner within a Krylov method
And that's just on a single processor
Parallel versions add questions about efficient coarse-grid solves, data exchange, etc.
Libraries such as hypre (https://computation.llnl.gov/casc/linear_solvers/sls_hypre.html) contain efficient implementations for parallel systems
Correct
Some operations are subtle and require care to get them right
Example: (pseudo) random number generation in parallel
  Using a local random generator such as srand produces correlated values, not random at all
  Simply using different seeds for each thread/process in a parallel program isn't enough (unless the seeds are picked very carefully)
SPRNG: Scalable Parallel Random Number Generator
Provides good pseudo-random number generators, suitable for use in a parallel program
http://sprng.cs.fsu.edu/
Greater Productivity
Parallel programming is widely viewed as difficult
Much effort has gone into programming languages that make parallel programming easy
But what is really needed is a way to provide the data structures, algorithms, and methods needed by the computational scientist
A general-purpose language is not the best way to do this (though it may be a good way to implement it)
An alternative is through carefully designed libraries
PETSc objects hide the details of distributed data structures and function parameters
/* Get the mesh size. Use 10 by default */
n = 10;
PetscOptionsGetInt( PETSC_NULL, "-n", &n, 0 );
/* Get the process decomposition. Default is the same as without DAs */
px = 1;
PetscOptionsGetInt( PETSC_NULL, "-px", &px, 0 );
MPI_Comm_size( PETSC_COMM_WORLD, &worldSize );
py = worldSize / px;
/* Create a distributed array */
DACreate2d( PETSC_COMM_WORLD, DA_NONPERIODIC, DA_STENCIL_STAR,
            n, n, px, py, 1, 1, 0, 0, &grid );
/* Form the matrix and the vector corresponding to the DA */
A = FormLaplacianDA2d( grid, n );
b = FormVecFromFunctionDA2d( grid, n, func );
VecDuplicate( b, &x );
PETSc provides routines to create, allocate, and manage distributed data structures
SLESCreate( PETSC_COMM_WORLD, &sles );
SLESSetOperators( sles, A, A, DIFFERENT_NONZERO_PATTERN );
SLESSetFromOptions( sles );
SLESSolve( sles, b, x, &its );
PetscPrintf( PETSC_COMM_WORLD, "Solution is:\n" );
VecView( x, PETSC_VIEWER_STDOUT_WORLD );
PetscPrintf( PETSC_COMM_WORLD, "Required %d iterations\n", its );
MatDestroy( A ); VecDestroy( b ); VecDestroy( x );
SLESDestroy( sles ); DADestroy( grid );
PetscFinalize( );
return 0;
}
PETSc provides routines that solve linear systems
PETSc provides coordinated I/O (the behavior is as if there were a single process), including the output of the distributed Vec object
/* -*- Mode: C; c-basic-offset:4 ; -*- */
#include "petsc.h"
#include "petscvec.h"
#include "petscda.h"

/* Form a vector based on a function for a 2-d regular mesh on the unit square */
Vec FormVecFromFunctionDA2d( DA grid, int n,
                             double (*f)( double, double ) )
{
    Vec    V;
    int    is, ie, js, je, in, jn, i, j;
    double h;
    double **vval;

    h = 1.0 / (n + 1);
    DACreateGlobalVector( grid, &V );
    DAVecGetArray( grid, V, (void **)&vval );
    /* Get global coordinates of this patch in the DA grid */
    DAGetCorners( grid, &is, &js, 0, &in, &jn, 0 );
    ie = is + in - 1;
    je = js + jn - 1;
    for (i=is ; i<=ie ; i++) {
        for (j=js ; j<=je ; j++) {
            vval[j][i] = (*f)( (i + 1) * h, (j + 1) * h );
        }
    }
    DAVecRestoreArray( grid, V, (void **)&vval );
    return V;
}
Almost the uniprocess code
for (i=is; i<=ie; i++) {
    for (j=js; j<=je; j++) {
        row.j = j; row.i = i; nelm = 0;
        if (j - 1 > 0) { vals[nelm] = oneByh2; cols[nelm].j = j - 1; cols[nelm++].i = i; }
        if (i - 1 > 0) { vals[nelm] = oneByh2; cols[nelm].j = j;     cols[nelm++].i = i - 1; }
        vals[nelm] = - 4 * oneByh2; cols[nelm].j = j; cols[nelm++].i = i;
        if (i + 1 < n - 1) { vals[nelm] = oneByh2; cols[nelm].j = j;     cols[nelm++].i = i + 1; }
        if (j + 1 < n - 1) { vals[nelm] = oneByh2; cols[nelm].j = j + 1; cols[nelm++].i = i; }
        MatSetValuesStencil( A, 1, &row, nelm, cols, vals, INSERT_VALUES );
    }
}
MatAssemblyBegin( A, MAT_FINAL_ASSEMBLY );
MatAssemblyEnd( A, MAT_FINAL_ASSEMBLY );
return A;
}
Just the usual code for setting the elements of the sparse matrix (the complexity comes, as it often does, from the boundary conditions)
Sequential write
Parallel writes are carried out by shipping data to a single process
PnetCDF: parallel read/write to a shared netCDF file
Built on top of MPI-IO, which uses the optimal I/O facilities of the parallel file system and the MPI-IO implementation
Allows MPI-IO hints and datatypes for further optimization (see the sketch below)
[Diagram: processes P0, P1, P2, P3 accessing a shared file]
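A minimal PnetCDF sketch (my own illustration; the file name, variable name, and row decomposition are assumptions, and error checking is omitted) in which every process collectively writes its block of a global 2-D array to a shared file:

#include <stdlib.h>
#include <mpi.h>
#include <pnetcdf.h>

#define NX 1024          /* assumed global array sizes */
#define NY 1024

int main(int argc, char *argv[])
{
    int        rank, nprocs, ncid, dimids[2], varid;
    MPI_Offset start[2], count[2];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* Collective create; an MPI_Info object could carry MPI-IO hints */
    ncmpi_create(MPI_COMM_WORLD, "field.nc", NC_CLOBBER | NC_64BIT_OFFSET,
                 MPI_INFO_NULL, &ncid);
    ncmpi_def_dim(ncid, "x", NX, &dimids[0]);
    ncmpi_def_dim(ncid, "y", NY, &dimids[1]);
    ncmpi_def_var(ncid, "field", NC_DOUBLE, 2, dimids, &varid);
    ncmpi_enddef(ncid);

    /* 1-D decomposition: each rank owns NX/nprocs consecutive rows
       (assumes nprocs divides NX evenly) */
    count[0] = NX / nprocs;     count[1] = NY;
    start[0] = rank * count[0]; start[1] = 0;

    double *local = (double *)malloc(count[0] * count[1] * sizeof(double));
    for (MPI_Offset i = 0; i < count[0] * count[1]; i++)
        local[i] = (double)rank;

    /* Collective write: all processes write their pieces in parallel */
    ncmpi_put_vara_double_all(ncid, varid, start, count, local);

    ncmpi_close(ncid);
    free(local);
    MPI_Finalize();
    return 0;
}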
Performance inconsistencies
MPI-IO and PnetCDF should have similar performance
MPI-IO and HDF5 should have similar performance for data
POSIX I/O and comparable MPI-IO patterns should have similar performance
Performance consistency is important (but not sufficient) for scalability
But performance inconsistencies are common in practice
Recommendations
Don't do it yourself!
Use frameworks and libraries where possible
Exploit the principles used in those libraries if you need to write your own
Summary
There are many reasons to use libraries:
Faster
Correct
Real parallel I/O
More productive programming
The best reason: they let you focus on getting your science done
There are many libraries available
Only a few were mentioned in this talk
Many other good ones are available: ask!