
Week_5

The document discusses the cache coherence problem in multi-core computing, highlighting the importance of maintaining data consistency across private caches. It introduces common cache coherence protocols like MESI and MOESI, and emphasizes the significance of thread safety and affinity in programming for multi-core systems. Additionally, it covers Flynn's Taxonomy, which categorizes computer architectures based on instruction and data streams, and outlines the characteristics of symmetric multiprocessors (SMP).

Uploaded by

malikayan575

Parallel Computing Landscape (CS 526)

Department of Computer Science, The University of Lahore
The cache coherence problem
• Since we have private caches: how do we keep the data consistent across caches?
• Each core should perceive the memory as a monolithic array, shared by all the cores
The cache coherence problem
Suppose variable x initially contains 15213.
[Diagram: four cores (Core 1–4), each with one or more levels of private cache, on one multi-core chip; main memory holds x = 15213.]
The cache coherence problem
Core 1 reads x.
[Diagram: Core 1's cache now holds x = 15213, fetched from main memory.]
The cache coherence problem
Core 2 reads x.
[Diagram: Core 1's and Core 2's caches each hold x = 15213; main memory holds x = 15213.]
The cache coherence problem
Core 1 writes to x, setting it to 21660.
[Diagram: Core 1's cache now holds x = 21660, but Core 2's cache and main memory still hold x = 15213.]
The cache coherence problem
Core 3 attempts to read x… and gets a stale copy.
The caches now contain inconsistent data, leading to unpredictable behavior.
[Diagram: Core 1's cache holds x = 21660; Core 2's cache and main memory still hold x = 15213.]
Cache coherence protocols offer many solutions to this problem.
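The stale-read scenario above can be reproduced with a toy Python model (the dict-based `read`/`write` helpers are illustrative, not any real hardware API):

```python
# Toy model of private caches with NO coherence protocol:
# each core's cache is a dict, backed by one shared main memory.
memory = {"x": 15213}
caches = {core: {} for core in range(4)}

def read(core, var):
    # On a miss, fetch the value from main memory into the private cache.
    if var not in caches[core]:
        caches[core][var] = memory[var]
    return caches[core][var]

def write(core, var, value):
    # Write only into the local cache: other caches are never told.
    caches[core][var] = value

read(0, "x")             # Core 1 reads x -> 15213 cached
read(1, "x")             # Core 2 reads x -> 15213 cached
write(0, "x", 21660)     # Core 1 writes x = 21660 (only in its own cache)
stale = read(1, "x")     # Core 2 still sees the old value
print(stale)             # -> 15213, not 21660: the coherence problem
```

Without invalidation or update messages between caches, Core 2's copy silently diverges from Core 1's, exactly as in the slides.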
The cache coherence problem
To address the cache coherence problem, various cache coherence protocols have been developed. Two common cache coherence protocols are:
• MESI Protocol: MESI stands for Modified, Exclusive, Shared, and Invalid.
• MOESI Protocol: MOESI extends the MESI protocol with an "Owned" state.
The cache coherence problem
• MESI Protocol: a widely used cache coherence protocol that defines a state for each cache line (data block) in a cache, allowing caches to coordinate reads and writes to maintain data consistency.
• MOESI Protocol: the Owned state helps improve performance by letting a cache hold a modified line and supply it directly to other caches, without first writing it back to main memory.
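A minimal sketch of a subset of the MESI state transitions (the event names here are invented for illustration; a real protocol distinguishes many more bus events):

```python
from enum import Enum

class State(Enum):
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"

# Simplified next-state table: (current state, event) -> next state.
# "local_*" events come from this core; "remote_*" events are snooped
# from other caches on the shared bus.
TRANSITIONS = {
    (State.INVALID,   "local_read_shared"):    State.SHARED,     # others hold a copy
    (State.INVALID,   "local_read_exclusive"): State.EXCLUSIVE,  # no other copy
    (State.INVALID,   "local_write"):          State.MODIFIED,
    (State.EXCLUSIVE, "local_write"):          State.MODIFIED,
    (State.EXCLUSIVE, "remote_read"):          State.SHARED,
    (State.SHARED,    "local_write"):          State.MODIFIED,   # invalidates other copies
    (State.SHARED,    "remote_write"):         State.INVALID,
    (State.MODIFIED,  "remote_read"):          State.SHARED,     # after a write-back
    (State.MODIFIED,  "remote_write"):         State.INVALID,
}

def next_state(state, event):
    # Events not in the table leave the line's state unchanged.
    return TRANSITIONS.get((state, event), state)

# Core 1 writes a Shared line: its copy becomes Modified...
print(next_state(State.SHARED, "local_write"))   # State.MODIFIED
# ...while Core 2's copy of the same line is invalidated.
print(next_state(State.SHARED, "remote_write"))  # State.INVALID
```

This is how the protocol prevents the stale read from the earlier slides: once Core 1 writes the line, every other cached copy moves to Invalid and must be re-fetched.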
Programming for Multi-core
• Programmers must use threads or processes

• Spread the workload across multiple cores

• Write parallel algorithms

• OS will map threads/processes to cores


Thread safety is very important
• Pre-emptive context switching: a context switch can happen AT ANY TIME
• True concurrency, not just uniprocessor time-slicing
• Concurrency bugs are exposed much faster with multi-core
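A minimal sketch of why thread safety matters: four threads increment a shared counter, and only the lock makes the result deterministic, because `counter += 1` is a read-modify-write that a context switch (or true parallelism) can interleave:

```python
import threading

counter = 0
lock = threading.Lock()

def safe_increment(n):
    global counter
    for _ in range(n):
        # Without the lock, two threads can read the same old value,
        # both add 1, and one update is lost.
        with lock:
            counter += 1

threads = [threading.Thread(target=safe_increment, args=(100_000,))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 400000, guaranteed only because of the lock
```

Dropping the `with lock:` line turns this into a classic data race: the program still runs, but the final count becomes timing-dependent, which is exactly the kind of bug multi-core exposes much faster.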
Assigning threads to the cores
• Each thread/process has an affinity mask
• The affinity mask specifies which cores the thread is allowed to run on
• Different threads can have different masks
• Affinities are inherited across fork()

Affinity masks are bit vectors
• Example: 4-way multi-core, without SMT:

      1        1        0        1
   core 3   core 2   core 1   core 0

• The process/thread is allowed to run on cores 0, 2, and 3, but not on core 1
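The bit-vector interpretation above can be sketched in a few lines (the `allowed_cores` helper is a hypothetical name, for illustration only):

```python
# Affinity mask for a 4-way multi-core: bit i set -> may run on core i.
mask = 0b1101  # cores 0, 2, 3 allowed; core 1 masked out

def allowed_cores(mask, num_cores=4):
    # Test each bit of the mask, least-significant bit = core 0.
    return [core for core in range(num_cores) if mask & (1 << core)]

print(allowed_cores(mask))       # [0, 2, 3]
print(1 in allowed_cores(mask))  # False: core 1 is not allowed
```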
Affinity masks when multi-core and SMT are combined
• Example: 4-way multi-core with 2-way SMT, one mask bit per hardware thread:

              core 3      core 2      core 1      core 0
   thread:    1    0      1    0      1    0      1    0
   mask:      1    1      0    0      1    0      1    1

• Core 2 can’t run the process
• Core 1 can only use one simultaneous thread
Default Affinities
• The default affinity mask is all 1s: all threads can run on all processors
• Then the OS scheduler decides which threads run on which core
• The OS scheduler detects skewed workloads, migrating threads to less busy processors
Process migration is costly
• Need to restart the execution pipeline
• Cached data is invalidated
• The OS scheduler tries to avoid migration as much as possible: it tends to keep a thread on the same core
• This is called soft affinity
Hard Affinities
• The programmer can prescribe her own affinities (hard affinities)
• Rule of thumb: use the default scheduler unless there is a good reason not to
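On Linux, hard affinities can be set directly from the standard library; this sketch assumes a Linux system, since `os.sched_getaffinity`/`os.sched_setaffinity` (thin wrappers over the corresponding syscalls) are not available on all platforms:

```python
import os

if hasattr(os, "sched_getaffinity"):  # Linux only
    current = os.sched_getaffinity(0)      # 0 = the calling process
    print("allowed cores:", sorted(current))

    # Pin this process to a single core (a hard affinity)...
    one_core = {min(current)}
    os.sched_setaffinity(0, one_core)
    assert os.sched_getaffinity(0) == one_core

    # ...then restore the original mask.
    os.sched_setaffinity(0, current)
```

A dedicated real-time thread, like the robot-controller example on the next slide, would simply keep the single-core mask instead of restoring the default.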
When to set your own affinities
• Two (or more) threads share data structures in memory
  – map them to the same core so that they can share the cache
• Real-time threads, e.g. a thread running a robot controller:
  – must not be context-switched, or else the robot can go unstable
  – dedicate an entire core just to this thread
  [Image source: Sensable.com]
Flynn’s Taxonomy
• Michael Flynn (from Stanford)
  – Made a characterization of computer systems which became known as Flynn’s Taxonomy
[Diagram: a computer operates on instruction streams and data streams.]
Multiple Processor Organization
Flynn’s Taxonomy:
1. Single Instruction, Single Data stream – SISD
2. Single Instruction, Multiple Data stream – SIMD
3. Multiple Instruction, Single Data stream – MISD
4. Multiple Instruction, Multiple Data stream – MIMD
1. Single Instruction, Single Data Stream – SISD
• Single processor
• Single instruction stream
• Data stored in a single memory
• Example: uni-processor systems
[Diagram: one instruction stream (SI) feeds a single processing unit with a single data stream (SD).]
2. Single Instruction, Multiple Data Stream – SIMD
• A single machine instruction controls simultaneous execution
• Large number of processing elements
• Each processing element has associated memory
• Each instruction is executed on a different set of data by different processors
• Examples: GPUs
[Diagram: one instruction stream (SI) feeds multiple processing units, each with its own data stream (SD).]
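SIMD execution can be sketched in miniature: one instruction applied in lockstep to many data elements (`simd_op` is an illustrative stand-in for a vector instruction, not a real SIMD API):

```python
# One element per "processing element" (lane).
data_lanes = [3, 1, 4, 1, 5, 9, 2, 6]

def simd_op(lanes):
    # Every lane executes the SAME instruction ("multiply by 2, add 1")
    # on DIFFERENT data, in lockstep.
    return [2 * x + 1 for x in lanes]

print(simd_op(data_lanes))  # [7, 3, 9, 3, 11, 19, 5, 13]
```

Real SIMD hardware (GPU warps, CPU vector units) does the same thing, except the lanes are physical datapaths executing the one instruction simultaneously rather than a Python loop.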
3. Multiple Instruction, Single Data Stream – MISD
• A sequence of data is transmitted to a set of processors
• Each processor executes a different instruction sequence, using the same data
• Few examples: systolic array processors
[Diagram: multiple instruction streams (SI) feed multiple processing units sharing a single data stream (SD).]
4. Multiple Instruction, Multiple Data Stream – MIMD
• A set of processors simultaneously executes different instruction sequences on different sets of data
• Examples: multi-cores, SMPs, clusters
[Diagram: multiple instruction streams (SI) feed multiple processing units, each with its own data stream (SD).]
MIMD - Overview
• General purpose processors
• Each can process all instructions necessary
• Further classified by method of processor
communication:
1. Via Shared Memory
2. Message Passing (Distributed Memory)
Taxonomy of Processor Architectures
Tightly Coupled – SMP
• Processors share memory and communicate via that shared memory
• Symmetric Multiprocessor (SMP):
  – Single shared memory
  – Shared bus to access memory
  – Memory access time to a given area of memory is approximately the same for each processor
Symmetric Multiprocessors (SMPs)
– Two or more similar processors
– Processors share same memory and I/O
– Processors are connected by a bus or other internal
connection
– Memory access time is approximately the same for
each processor
– All processors share access to I/O
SMP Advantages
• Performance
– If some work can be done in parallel
• Availability
– Failure of a single processor does not halt the system
• Incremental growth
– User can enhance performance by adding additional
processors
• Scaling
– Vendors can offer range of products based on number
of processors
Block Diagram of Tightly Coupled Multiprocessor (SMP)
[Diagram omitted.]
Symmetric Multiprocessor Organization
[Diagram omitted.]
Multithreading and Chip Multiprocessors
• The instruction stream is divided into smaller streams called "threads"
• Executed in parallel
Definitions of Threads and Processes
• Process:
  – An instance of a program running on a computer
  – A unit of resource ownership:
    • a virtual address space to hold the process image
  – Process switch
• Thread: a dispatchable unit of work within a process
  – Includes processor context (which includes the PC register and stack pointer) and a data area for the stack
  – Interruptible: the processor can turn to another thread
• Thread switch:
  – Switching the processor between threads within the same process
  – Typically less costly than a process switch
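The distinction can be seen in a short sketch: threads within one process share the same address space, which is what makes a thread switch cheaper than a process switch (no address-space change) and why thread safety matters:

```python
import threading

# One process, one address space: this dict is visible to every thread.
shared = {"value": 0}

def worker():
    # A write by one thread mutates memory the whole process can see.
    shared["value"] = 42

t = threading.Thread(target=worker)
t.start()
t.join()

print(shared["value"])  # 42: the main thread sees the worker's write
```

Two separate processes would each get their own copy of `shared`, and passing the update between them would require explicit message passing or shared-memory machinery.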
Implicit and Explicit Multithreading
• All commercial processors use explicit multithreading:
  – Concurrently execute instructions from different explicit threads
  – Interleave instructions from different threads on shared pipelines OR parallel execution on parallel pipelines
• Implicit multithreading: concurrent execution of multiple threads extracted from a single sequential program:
  – Implicit threads defined statically by the compiler or
