Basic Parallel Programming Methods
Parallel Programming for Speed-up
Single-Core Sharpen Demonstrations
Scaling and Bottlenecks
Compiler Optimization
1 - Simple and Effective: turn on compiler optimization (~3x)
– Turn on higher levels of optimization
– Level 3 optimization: -O3 for gcc or g++
– Highest is -O4, but requires feedback optimization
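As a concrete illustration (sharpen.c is a hypothetical file name), the optimization level is selected with a single build flag:

    # baseline, no optimization
    gcc -O0 -o sharpen sharpen.c
    # level 3 optimization; often ~3x for compute-bound loops
    gcc -O3 -o sharpen sharpen.c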
SIMD Vector Instructions
2 - Simple and Sometimes Effective: turn on NEON SIMD (~1.f x)
– Turn on SIMD (NEON) instruction generation on ARM A-Series
– Flynn's taxonomy: SIMD = Single Instruction, Multiple Data
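A minimal sketch of NEON SIMD written with C intrinsics (assuming arm_neon.h and array lengths that are a multiple of 4); the same effect can come from compiler auto-vectorization with, e.g., gcc -O3 -mfpu=neon on 32-bit ARM:

    #include <arm_neon.h>

    /* Add two float arrays four lanes at a time: one NEON instruction
     * operates on four data elements (SIMD). n must be a multiple of 4. */
    void vec_add(const float *a, const float *b, float *c, int n)
    {
        for (int i = 0; i < n; i += 4) {
            float32x4_t va = vld1q_f32(&a[i]);   /* load 4 floats */
            float32x4_t vb = vld1q_f32(&b[i]);
            vst1q_f32(&c[i], vaddq_f32(va, vb)); /* add and store 4 at once */
        }
    }

Each vaddq_f32 adds four single-precision lanes in one instruction, which is exactly the Single Instruction, Multiple Data cell of Flynn's taxonomy.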
Using Multiple Cores
3 - Harder and Mostly Effective: Grid to Map and Reduce (~3.2x)
– Shared Memory POSIX Threads
~70x
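A minimal map/reduce sketch with shared-memory POSIX threads (the thread count, array size, and squared-sum "work" are hypothetical stand-ins): each thread maps over its own slice of the grid, then the main thread reduces the partial results.

    #include <pthread.h>
    #include <stdio.h>

    #define NTHREADS 4
    #define N 4096

    static double data[N];
    static double partial[NTHREADS];

    /* Map: each thread reduces its own slice of the grid. */
    static void *worker(void *arg)
    {
        long t = (long)arg;
        long lo = t * (N / NTHREADS), hi = lo + (N / NTHREADS);
        double sum = 0.0;
        for (long i = lo; i < hi; i++)
            sum += data[i] * data[i];   /* stand-in for real per-element work */
        partial[t] = sum;
        return NULL;
    }

    int main(void)
    {
        pthread_t tid[NTHREADS];
        for (long i = 0; i < N; i++) data[i] = 1.0;
        for (long t = 0; t < NTHREADS; t++)
            pthread_create(&tid[t], NULL, worker, (void *)t);
        double total = 0.0;
        for (long t = 0; t < NTHREADS; t++) {   /* Reduce: join and combine */
            pthread_join(tid[t], NULL);
            total += partial[t];
        }
        printf("total = %f\n", total);
        return 0;
    }

Build with gcc -O3 -pthread. Because each thread writes only its own partial[t] slot, no mutex is needed until the final reduce.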
Theoretical Speed-Up – Linear at Best
[Figure: speed-up vs. number of cores; ideal speed-up is linear, actual speed-up is < linear]
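The "linear at best" bound is Amdahl's law (not stated explicitly on the slide); with s the serial fraction of the work and p the number of cores:

    S(p) = \frac{1}{\,s + \frac{1 - s}{p}\,} \;\le\; \frac{1}{s}

With no serial fraction (s = 0) this reduces to S(p) = p, i.e., linear; any s > 0 pushes the curve below linear and flattens it at 1/s, so s = 0.1 caps the speed-up at 10x on any number of cores.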
Parallel Processing Speed-up
Grid Data Processing Speed-up
1. Multi-Core, Multi-threaded, Macro-blocks/Frames (see the macro-block sketch after this list)
2. SIMD, Vector Instructions Operating over Large Words (Many Times Instruction Set Size)
3. Co-Processor Operates in Parallel to CPU(s)
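For item 1 above, a sketch of gridding a frame into macro-blocks (the 16x16 size is a hypothetical choice, as in common video codecs); each block is an independent unit of work that can be mapped to any core or handed to a co-processor:

    #define MB 16   /* macro-block edge length (hypothetical) */

    /* Visit each macro-block of a W x H frame. Each call to
     * process_block() is independent, so blocks can be distributed
     * across threads, cores, or a co-processor. */
    void for_each_block(unsigned char *frame, int W, int H,
                        void (*process_block)(unsigned char *, int, int, int))
    {
        for (int by = 0; by < H; by += MB)
            for (int bx = 0; bx < W; bx += MB)
                process_block(frame, W, bx, by);  /* e.g., sharpen this block */
    }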
Conceptual View of Hardware Resources
Three-Space View of CPU-bound HPC vs. RT or Fair Utilization
Goal is to fully use all resources to scale!
Requirements:
– CPU Margin?
– I/O Latency (and Bandwidth) Margin?
– Memory Capacity (and Latency) Margin?
[Figure: utilization cube with axes CPU-Use, I/O-Use, and Memory-Use; the origin is high-margin, while the upper right front corner is low-margin, i.e., CPU-, I/O-, and memory-bound at once. CPU + I/O + Memory Bound?! – Bad day!]
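One Linux-specific way to read two of the three margins at run time (a sketch using the standard getrusage call; the I/O-Use axis would need additional sources such as /proc, not shown here):

    #include <stdio.h>
    #include <sys/resource.h>

    int main(void)
    {
        struct rusage ru;
        if (getrusage(RUSAGE_SELF, &ru) == 0) {
            /* CPU-Use axis: user + system time consumed so far */
            printf("user CPU: %ld.%06ld s\n",
                   (long)ru.ru_utime.tv_sec, (long)ru.ru_utime.tv_usec);
            printf("sys  CPU: %ld.%06ld s\n",
                   (long)ru.ru_stime.tv_sec, (long)ru.ru_stime.tv_usec);
            /* Memory-Use axis: peak resident set size (KB on Linux) */
            printf("max RSS:  %ld KB\n", ru.ru_maxrss);
        }
        return 0;
    }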
Copyright © 2019 University of Colorado