
Parallel and Distributed Computing

Module 1: Parallelism Fundamentals


Outline
• Motivation
• Key Concepts
• Challenges
• Parallel computing
• Flynn's Taxonomy
• Multi-core Processors
• Shared vs. Distributed Memory
Motivation
• To address computationally intensive problems and reduce their response time
• Weather forecasting
• Genome sequencing
• Crash simulation testing



Moore’s Law
• The transistor count on an integrated circuit doubles every 24 months, independent of the technology used
– Gordon Moore, co-founder of Intel Corp. (1965)



Moore’s Law
• Pipelined functional units: instruction-level parallelism
• Superscalar architecture
• Out-of-order execution
• Larger Caches



Moore’s Law
• More complexity does not automatically yield more efficiency
• The more functional units are packed into a CPU, the higher the probability that "average" code cannot use them
– The number of independent instructions in a sequential instruction stream is limited
• A faster clock boosts power dissipation, making idling transistors even more wasteful



Power-Performance Dilemma
• Simplify the processor design
• Spend the additional transistors on larger caches
• Hence, multi-core processors came into existence



Parallelization
• Misconception: more hardware ⇒ faster response time
• Billions of CPU hours are wasted as a result
• Many supercomputer users are unaware of the limitations of parallel execution



Classes of Parallelism
• Data-Level Parallelism (DLP): many data items can be operated on in parallel
• Task-Level Parallelism (TLP): large, independent tasks can run in parallel at the same time
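As a concrete illustration (a minimal sketch, not from the slides; it assumes OpenMP, and compress_file/index_file are hypothetical stand-ins for independent tasks), DLP applies one operation across many array elements, while TLP runs distinct tasks side by side:

#include <stdio.h>

#define N 1000000
static double a[N], b[N], c[N];

/* Hypothetical stand-ins for two independent, coarse-grained tasks. */
static void compress_file(void) { /* ... */ }
static void index_file(void)    { /* ... */ }

int main(void)
{
    /* Data-level parallelism: the same operation applied
       to many array elements at once. */
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        c[i] = a[i] + b[i];

    /* Task-level parallelism: two independent tasks
       running at the same time. */
    #pragma omp parallel sections
    {
        #pragma omp section
        compress_file();
        #pragma omp section
        index_file();
    }
    printf("done\n");
    return 0;
}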



Ways to exploit parallelism
• Instruction-Level Parallelism (ILP)
– Exploits data-level parallelism at modest levels
– e.g., pipelining
• Vector Architectures and Graphics Processing Units (GPUs)
– Exploit data-level parallelism
– Apply a single instruction to a collection of data in parallel
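To make the vector/SIMD idea concrete, here is a minimal sketch assuming an x86 CPU with SSE (the slides do not tie the idea to any particular instruction set, and GPUs organize this differently): a single intrinsic adds four floats with one instruction.

#include <stdio.h>
#include <xmmintrin.h>  /* SSE intrinsics (x86) */

int main(void)
{
    float a[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    float b[8] = {8, 7, 6, 5, 4, 3, 2, 1};
    float c[8];

    /* One SIMD instruction operates on four floats at a time:
       a single instruction stream, multiple data elements. */
    for (int i = 0; i < 8; i += 4) {
        __m128 va = _mm_loadu_ps(&a[i]);
        __m128 vb = _mm_loadu_ps(&b[i]);
        _mm_storeu_ps(&c[i], _mm_add_ps(va, vb));
    }
    for (int i = 0; i < 8; i++)
        printf("%.0f ", c[i]);
    printf("\n");
    return 0;
}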



Ways to exploit parallelism
• Thread-Level Parallelism
– Exploits either data-level or task-level parallelism
– Allows for interaction among parallel threads
• Request-Level Parallelism
– Exploits parallelism among largely decoupled tasks
– Specified by the programmer or the operating system
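A minimal sketch of thread-level parallelism (assuming OpenMP, which the slides do not prescribe): the threads partition the data, but they interact through the shared variable sum, and the reduction clause coordinates that interaction safely.

#include <stdio.h>

#define N 1000000

int main(void)
{
    static double x[N];
    for (int i = 0; i < N; i++)
        x[i] = 1.0 / (i + 1);

    double sum = 0.0;
    /* Threads work on different parts of the array but interact
       through the shared variable 'sum'. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < N; i++)
        sum += x[i];

    printf("sum = %f\n", sum);
    return 0;
}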



Flynn’s Taxonomy
• Michael Flynn (1966) proposed a simple classification of how hardware supports data-level and task-level parallelism; its abbreviations are still in use today, some 50 years later
• The four classes: SISD, SIMD, MISD, MIMD

[Figure: 2×2 quadrant showing the four classes SISD, SIMD, MISD, MIMD]
Single instruction stream, single data stream (SISD)
• This category is the uniprocessor
• No exploitation of parallelism across instruction or data streams
• Can still have concurrent processing internally
• e.g., pipelined and superscalar processors

[Figure: one instruction stream and one data stream feeding a single processing unit]
Single instruction stream, multiple data streams (SIMD)

[Figure: one instruction stream driving several processing units, each operating on its own data stream (Data Streams 1–3)]


Multiple instruction streams, single data stream (MISD)

[Figure: several processing units, each fed by its own instruction stream (Instruction Streams 1–3), all operating on a single data stream]


Multiple instruction streams, multiple data streams (MIMD)

[Figure: several processing units, each with its own instruction stream (Instruction Streams 1–3) and its own data stream (Data Streams 1–3)]


Parallel Scalability
• Factors that limit parallel execution
• Scalability metrics
• Scalability laws:
– Amdahl's Law
– Gustafson's Law
• Parallel efficiency



Factors that limit parallelism
[Figure: top — a sequence of 12 tasks that needs to be parallelized; bottom — the same tasks distributed across 3 workers (W1, W2, W3), reducing the total time]


• In practice this ideal time reduction is not achieved; one factor is load imbalance
• Some of the resources are underutilized
• Communication between co-workers adds cost
• Tools (shared resources) must be shared among the co-workers
• These factors add overhead that serializes part of the parallel execution


Load imbalance
• Because of load imbalance, the workers execute their tasks at different speeds, so some finish earlier than others
• The gaps in the timeline are idle (unused) regions

[Figure: workers W1–W3 finishing tasks 1–12 at different times, leaving idle gaps]
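One common remedy, shown here as a sketch (dynamic scheduling is a standard OpenMP technique, not something the slide names): hand tasks to whichever worker becomes free, instead of pre-assigning equal chunks.

#include <stdio.h>

/* Tasks with very different costs: task i does about i*i units of work. */
static double do_task(int i)
{
    double s = 0.0;
    for (long k = 0; k < (long)i * i; k++)
        s += 1.0 / (k + 1);
    return s;
}

int main(void)
{
    double total = 0.0;
    /* schedule(dynamic) hands out iterations on demand, so a worker
       that finishes a cheap task immediately grabs the next one
       instead of sitting idle. */
    #pragma omp parallel for schedule(dynamic) reduction(+:total)
    for (int i = 0; i < 1000; i++)
        total += do_task(i);
    printf("total = %f\n", total);
    return 0;
}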


Scalability Metrics
• Scalability is limited by:
– Algorithmic limitations due to mutual dependencies
– Bottlenecks caused by shared resources
– Startup overhead
– Communication


Run-time
• For a fixed problem solved by N workers, normalize the total work so that s + p = 1
• Single-worker runtime: T(1) = s + p
• N-worker runtime: T(N) = s + p/N
• Here s is the fraction of work that must be serialized and p = 1 - s is the fraction that can be parallelized
Scalability Laws
• Application speedup is defined as the quotient of parallel and serial performance for a fixed problem size
• Performance is defined as "work done over time"
• Serial performance: P(1) = (s + p)/T(1) = 1
• Parallel performance: P(N) = (s + p)/T(N) = 1/(s + (1 - s)/N)
Amdahl’s Law
• Gene Amdahl: application speedup is limited to 1/s as N tends to infinity
• "What is the improvement in runtime of an application when the problem is put on N CPUs?"
• Serial performance: P(1) = 1
• Parallel performance: P(N) = 1/(s + (1 - s)/N)
• Speedup: S(N) = P(N)/P(1) = 1/(s + (1 - s)/N); as N → ∞, S(N) → 1/s
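A small, illustrative C program (not part of the lecture) that evaluates Amdahl's formula and shows the speedup saturating at 1/s:

#include <stdio.h>

/* Amdahl's law: speedup for serial fraction s on n workers. */
static double amdahl(double s, int n)
{
    return 1.0 / (s + (1.0 - s) / n);
}

int main(void)
{
    double s = 0.05;               /* 5% of the work is serial */
    for (int n = 1; n <= 1024; n *= 4)
        printf("N = %4d  speedup = %6.2f\n", n, amdahl(s, n));
    printf("limit 1/s = %.1f\n", 1.0 / s);
    return 0;
}

With s = 0.05, the speedup never exceeds 20 no matter how many workers are added.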
Problems
• Bob is given the job to write a program that
will get a speedup of 3.8 on 4 processors.
• He makes it 95% parallel, and goes home
dreaming of a big pay raise.
• Using Amdahl’s law, and assuming the
problem size is the same as the serial
version, and ignoring communication costs,
what speedup will Bob actually get?
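Worked check: with s = 0.05 and N = 4, Amdahl's law gives S = 1/(0.05 + 0.95/4) = 1/0.2875 ≈ 3.48, so Bob falls short of the promised 3.8.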
Gustafson’s Law
• "How much more work can my program do in a given amount of time when I put a larger problem on N CPUs?"
• Scaled speedup: S(N) = s + (1 - s)·N
• Where 0 < s < 1 is the serial fraction and N is the number of workers


Parallel Efficiency
• Parallel efficiency is the speedup per worker: ε = S(N)/N
• ε = 1 means perfect scaling; serial work and overheads push it below 1
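A companion sketch (illustrative) that evaluates Gustafson's scaled speedup and the corresponding parallel efficiency ε = S(N)/N; note how efficiency approaches 1 - s, rather than collapsing toward zero as under Amdahl's law:

#include <stdio.h>

/* Gustafson's law: scaled speedup S = s + (1 - s) * n. */
static double gustafson(double s, int n)
{
    return s + (1.0 - s) * n;
}

int main(void)
{
    double s = 0.05;
    for (int n = 1; n <= 64; n *= 2) {
        double sp  = gustafson(s, n);
        double eff = sp / n;       /* parallel efficiency = S / N */
        printf("N = %2d  scaled speedup = %6.2f  efficiency = %.2f\n",
               n, sp, eff);
    }
    return 0;
}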


Multi-core Processors
• Cores on a die can have separate caches, or share caches at certain levels
• A shared cache reduces latency and improves bandwidth for communication between cores
• But it may lead to cache bandwidth bottlenecks
• Recent multi-core designs integrate a memory controller that connects directly to the memory modules, without a chipset
• This reduces main memory latency and allows inter-socket networks such as HyperTransport or QuickPath
• It also enables efficient cache coherence communication


Dual-core Processor
• Separate L1, L2, and L3 caches
• Example: Intel Montecito

[Figure: two cores (P), each with its own private L1D, L2, and L3 cache]


Quad-core Processor
• Two dual-core chips with separate L1 caches and an L2 cache shared within each dual-core pair (Intel Harpertown)

[Figure: four cores (P), each with a private L1D; each pair of cores shares one L2 cache]


Hexa-core Processor
• L2 caches are shared within dual-core groups; a single L3 cache is shared by the whole chip
• Example: Intel Dunnington

[Figure: six cores (P) with private L1D caches; three L2 caches, each shared by a dual-core group; one chip-wide L3]
Quad-core Processor
• Separate L1 and L2 caches per core
• Shared L3 cache; the L3 group is the whole chip
• A built-in memory interface allows memory and other sockets to be attached without a chipset (via HT/QPI links)
• Examples: AMD Shanghai and Intel Nehalem

[Figure: four cores (P) with private L1D and L2 caches; a chip-wide shared L3; an on-chip memory interface plus HT/QPI links]
Shared-memory
• A system where a number of CPUs work on a common, shared physical address space
• Two varieties:
– Uniform Memory Access (UMA)
– Cache-coherent Non-Uniform Memory Access (ccNUMA)



UMA
• Also known as Symmetric Multiprocessing (SMP)
• Latency and bandwidth are the same for all processors and all memory locations
• The simplest implementation is a dual-core chip



UMA System with two Single-core Chips

[Figure: two single-core sockets (P), each with private L1D and L2 caches, sharing a common front-side bus (FSB) to the chipset and memory]


UMA System with two Dual-core Chips

[Figure: two dual-core sockets; each core has a private L1D, each socket's cores share an L2; both sockets connect through the chipset to a common memory]


Cache Coherence
• A cache coherence mechanism is required
• Because copies of the same cache line may reside in several caches
• If one copy is modified, the contents of the other caches must be updated or invalidated
MESI Protocol
• M: modified
• E: exclusive
• S: shared
• I: invalid

[Figure: two processors P1 and P2 with caches C1 and C2; memory locations A1 and A2 share one cache line, with copies in both caches and in memory]


• 1. C1 requests exclusive ownership of the cache line (CL)
• 2. The CL in C2 is set to state I
• 3. The CL has state E in C1 → A1 is modified in C1 and the CL is set to state M
• 4. C2 requests exclusive CL ownership
• 5. The CL is evicted from C1 and set to state I
• 6. The CL is loaded into C2 and set to state E
• 7. A2 is modified in C2 and the CL is set to state M in C2

[Figure: the same two-cache diagram, annotated with these steps]

Courtesy: Hager, G. (2017). Introduction to High Performance Computing for Scientists and Engineers. CRC Press.
NUMA System

[Figure: two locality domains, each with four cores (private L1D and L2 caches), a shared L3, and a local memory interface with its own memory; the domains are connected by a coherent link]


Distributed Memory

[Figure: several nodes, each with a processor (P), cache (C), local memory (M), and network interface (NI), connected by a communication network]
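Distributed-memory systems are programmed with explicit message passing; the sketch below assumes MPI, the de facto standard interface (not introduced in these slides). Each rank owns private memory, and data moves only through the network.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Each process owns its private memory; data moves only via
       explicit messages over the communication network. */
    if (rank == 0) {
        double x = 42.0;
        for (int p = 1; p < size; p++)
            MPI_Send(&x, 1, MPI_DOUBLE, p, 0, MPI_COMM_WORLD);
    } else {
        double x;
        MPI_Recv(&x, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("rank %d received %.1f\n", rank, x);
    }
    MPI_Finalize();
    return 0;
}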


Reference
• Hager, G. (2017). Introduction to High Performance Computing for Scientists and Engineers. CRC Press.
• Hennessy, J. L., & Patterson, D. A. Computer Architecture: A Quantitative Approach (5th ed.). Morgan Kaufmann.

