Lecture-11 Dynamic Scheduling A

The document discusses dynamic scheduling, which allows hardware to rearrange instruction execution to reduce stalls while maintaining data flow and exception behavior. It has three main advantages: 1) code can run efficiently on different pipelines without recompilation, 2) it handles dependencies not known at compile time, and 3) it efficiently schedules instructions against unpredictable delays like cache misses.

Uploaded by

Yumna Shahzad

Dynamic Scheduling

Summary of contents covered
• Introduction to RISC-V processor
• Assembly and Machine language of RISC-V
• RISC-V Single Cycle Implementation
• Pipelining concepts and Hazards
• Pipelined RISC-V Implementation
• Cache memory design

Forwarding and stalls in Pipeline
• In a pipelined processor, the pipeline stalls on a data hazard when the dependence cannot be resolved by the bypass (forwarding) paths.
• The dependent instruction waits in the decode stage until the data dependence is resolved.
• All instructions execute in order, so every instruction behind the waiting one also keeps waiting.
• The compiler or assembly-language programmer is responsible for re-ordering code to minimize these stalls.
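The stall rule above can be sketched as a single check. This is a minimal illustration, not a real pipeline's interface: the dict-based instruction encoding (`op`, `rd`, `srcs`) and register names are assumptions for the sketch.

```python
# Minimal sketch of the in-order stall rule for a load-use hazard:
# the instruction in decode waits while the load ahead of it has not
# yet produced a needed source register; everything behind it waits too.

def needs_stall(in_decode, in_execute):
    """Stall if the instruction in EX is a load producing a register
    that the instruction in ID reads (forwarding cannot resolve a
    load-use dependence in time)."""
    return (in_execute["op"] == "lw"
            and in_execute["rd"] in in_decode["srcs"])

load  = {"op": "lw",  "rd": "x5", "srcs": ["x2"]}
use   = {"op": "add", "rd": "x6", "srcs": ["x5", "x7"]}
indep = {"op": "sub", "rd": "x8", "srcs": ["x1", "x2"]}

print(needs_stall(use, load))    # True: add reads x5 right behind the load
print(needs_stall(indep, load))  # False: no dependence, no stall
```

Note that the second instruction stalls even though `indep` behind it is ready to go; dynamic scheduling exists to let such independent work proceed.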
Benefits of Dynamic Scheduling

• The hardware re-orders instruction execution to reduce stalls while maintaining data flow.
• Dynamic scheduling has three advantages:
1. Code compiled for one pipeline runs efficiently on other pipelined architectures.
• No need to compile a binary for each different pipeline.
• Third-party software is distributed as binary files.
Benefits of Dynamic Scheduling

• The hardware re-orders instruction execution to reduce stalls while maintaining data flow.
• Dynamic scheduling has three advantages:
2. It efficiently handles dependences that are not known at compile time,
• such as memory dependences.
Benefits of Dynamic Scheduling

• The hardware re-orders instruction execution to reduce stalls while maintaining data flow.
• Dynamic scheduling has three advantages:
3. It efficiently schedules instructions around unpredictable delays,
• such as cache misses.
Instruction Level Parallelism
(Dynamic Scheduling & Tomasulo Algorithm)
Advantages of Dynamic Scheduling
• Dynamic scheduling - hardware rearranges the
instruction execution to reduce stalls while maintaining
data flow and exception behavior
• Handles cases where dependences are unknown at compile time
– allows the processor to tolerate unpredictable delays, such as cache misses, by executing other code while waiting for the miss to resolve
• Allows code compiled for one pipeline to run efficiently on a different pipeline
• Simplifies the compiler
• Leads to hardware speculation, a technique with significant performance advantages (discussed later)

CMSC 411 - 8 (from Patterson) 8


HW Schemes: Instruction Parallelism
• Key idea: Allow instructions behind a stall to proceed
DIVD F0,F2,F4
ADDD F10,F0,F8
SUBD F12,F8,F14
• Enables out-of-order execution and allows out-of-order completion (e.g., SUBD)
– In a dynamically scheduled pipeline, all instructions still pass through the issue stage in order (in-order issue)
• We distinguish when an instruction begins execution and when it completes execution; between the two times, the instruction is in execution
• Note: dynamic execution creates WAR and WAW hazards and makes handling exceptions harder
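A toy timing model makes the out-of-order completion concrete for the DIVD/ADDD/SUBD sequence above. The latencies are the ones assumed later in this lecture (DIV 40, ADD 2); SUBD gets the same latency as ADDD, an assumption for the sketch.

```python
# Toy model: each instruction starts as soon as its source registers
# are ready; independent instructions need not wait for earlier ones.

LAT = {"DIVD": 40, "ADDD": 2, "SUBD": 2}

def completion_times(program):
    ready = {}   # register -> cycle at which its value becomes available
    done = {}
    for op, dst, srcs in program:
        start = max((ready.get(r, 0) for r in srcs), default=0)
        done[op] = start + LAT[op]
        ready[dst] = done[op]
    return done

prog = [("DIVD", "F0",  ["F2", "F4"]),
        ("ADDD", "F10", ["F0", "F8"]),   # depends on DIVD's F0
        ("SUBD", "F12", ["F8", "F14"])]  # independent of both

t = completion_times(prog)
print(t)  # SUBD completes at cycle 2, DIVD at 40, ADDD at 42
```

SUBD completes 38 cycles before the DIVD that was issued ahead of it, which is exactly the out-of-order completion the slide describes.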



Dynamic Scheduling Step 1
• Simple pipeline had 1 stage to check both structural
and data hazards: Instruction Decode (ID), also
called Instruction Issue
• Split the ID pipe stage of simple 5-stage pipeline into
2 stages:
– Issue
» Decode instructions, check for structural hazards
– Read operands
» Wait until no data hazards, then read operands
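As a sketch, the two halves of the split decode stage reduce to two independent checks; all names below are illustrative, not a real pipeline's signals.

```python
# Issue checks only for structural hazards; Read-operands waits out
# data hazards. Splitting them lets issue stay in order while operand
# reads happen whenever the data becomes ready.

def can_issue(fu_busy):
    # Issue stage: structural-hazard check only (is a unit free?)
    return not fu_busy

def can_read_operands(pending_writers, srcs):
    # Read-operands stage: wait until no earlier, still-executing
    # instruction will write one of our source registers
    return not any(r in pending_writers for r in srcs)

print(can_issue(False))                         # True: unit free, issue
print(can_read_operands({"F0"}, ["F0", "F8"]))  # False: F0 still pending
print(can_read_operands(set(), ["F8", "F14"]))  # True: operands readable
```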



A Dynamic Algorithm: Tomasulo’s
• For IBM 360/91 (before caches!)
– Long memory latency
• Goal: High Performance without special compilers
• Small number of floating point registers (4 in 360)
prevented interesting compiler scheduling of
operations
– This led Tomasulo to try to figure out how to get
more effective registers — renaming in hardware!

• Why study a 1966 computer?
– The descendants of this have flourished!
» Alpha 21264, Pentium 4, AMD Opteron, Power 5, …



Tomasulo Algorithm
• Control & buffers distributed with Function Units (FU)
– FU buffers called “reservation stations”; have
pending operands
• Registers in instructions are replaced by values or by pointers to reservation stations (RS); this is called register renaming
– Renaming avoids WAR, WAW hazards
– More reservation stations than registers, so can
do optimizations compilers can’t
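A sketch of the renaming step: each source register becomes either a value, if the register file already holds it, or a pointer to the reservation station that will produce it. The table layout, RS tag, and register values below are illustrative assumptions.

```python
# Register renaming in hardware: sources are split into V (values
# available now) and Q (tags of the reservation stations that will
# produce the still-pending values).

def rename(srcs, reg_status, regfile):
    """reg_status is the register result status table (register ->
    RS tag of the pending writer); regfile holds committed values."""
    V, Q = {}, {}
    for r in srcs:
        if r in reg_status:
            Q[r] = reg_status[r]   # wait on this reservation station
        else:
            V[r] = regfile[r]      # value can be read immediately
    return V, Q

regfile = {"F2": 3.5, "F4": 2.0, "F6": 1.0}
reg_status = {"F0": "Mult1"}       # Mult1 will write F0

V, Q = rename(["F0", "F6"], reg_status, regfile)
print(V)  # {'F6': 1.0}: read now
print(Q)  # {'F0': 'Mult1'}: wait for Mult1's broadcast
```

Because later instructions name the tag `Mult1` rather than the architectural register F0, a subsequent writer of F0 cannot create a WAR or WAW hazard with them.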



Tomasulo Algorithm (cont.)
• Results to FU from RS, not through registers, over
Common Data Bus that broadcasts results to all FUs
– Avoids RAW hazards by executing an instruction
only when its operands are available
• Loads and stores are treated as FUs with RSs as well
• Integer instructions can go past branches (use
branch prediction), also allow FP ops beyond basic
block in FP queue



Tomasulo Organization (From H&P, Figure 2.9)

[Figure: the FP Op Queue feeds instructions toward the FP registers; six load buffers (Load1–Load6) bring data from memory and store buffers send results to memory; reservation stations (Add1–Add3 in front of the FP adders, Mult1–Mult2 in front of the FP multipliers) hold waiting operations; the Common Data Bus (CDB) broadcasts results back to the reservation stations, buffers, and register file.]
Reservation Station Components
• Op: Operation to perform in the unit (e.g., + or –)
• Vj, Vk: Value of Source operands
– Store buffers have V field, result to be stored
• Qj, Qk: Reservation stations producing source
registers (value to be written)
– Note: Qj,Qk=0 => ready
– Store buffers only have Qi for RS producing result
• Busy: Indicates reservation station or FU is busy

In addition
• Register result status table—Indicates which functional unit will write each register, if one exists. Blank when no pending instruction will write that register.
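The field list above transcribes directly into a small structure. This Python sketch just mirrors the slide's fields, with empty-string tags standing in for "Qj, Qk = 0"; it is an illustration, not the actual hardware encoding.

```python
# Reservation-station fields as listed on the slide: Op, Vj/Vk, Qj/Qk,
# Busy, plus the register result status table alongside.

from dataclasses import dataclass

@dataclass
class ReservationStation:
    busy: bool = False
    op: str = ""      # operation to perform, e.g. "+" or "-"
    Vj: float = 0.0   # value of the first source operand
    Vk: float = 0.0   # value of the second source operand
    Qj: str = ""      # RS producing Vj ("" plays the role of 0 = ready)
    Qk: str = ""      # RS producing Vk ("" = ready)

    def ready(self):
        # Qj, Qk = 0 => both operands available, may begin execution
        return self.busy and not self.Qj and not self.Qk

# Register result status: register -> RS name; absent = no pending writer
register_result_status = {}

rs = ReservationStation(busy=True, op="+", Vj=1.5, Qk="Mult1")
print(rs.ready())        # False: still waiting on Mult1 for Vk
rs.Qk, rs.Vk = "", 2.5   # Mult1's result arrives over the CDB
print(rs.ready())        # True
```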
Three Stages of the Tomasulo Algorithm
1. Issue—get instruction from FP Op Queue
– If reservation station free (no structural hazard),
control issues instr & sends operands (renames
registers).
2. Execute—operate on operands (EX)
– When both operands ready then execute;
if not ready, watch Common Data Bus for result
3. Write result—finish execution (WB)
– Write on Common Data Bus to all awaiting units;

mark reservation station available
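The three stages above can be sketched as a toy control loop over dictionary-based reservation stations. This is a simplified, non-cycle-accurate model under assumed names and structures, not the real machine's control logic.

```python
# Toy model of the three stages: Issue (structural check + renaming),
# Execute (wait for operands), Write result (CDB broadcast).

OPS = {"+": lambda a, b: a + b, "-": lambda a, b: a - b}

def issue(instr, stations, reg_status, regfile):
    # 1. Issue: needs a free RS (else structural hazard); renames sources.
    free = next((n for n, rs in stations.items() if not rs["busy"]), None)
    if free is None:
        return None                      # stall issue
    rs = stations[free]
    rs.update(busy=True, op=instr["op"])
    for side, reg in (("j", instr["src1"]), ("k", instr["src2"])):
        rs["Q" + side] = reg_status.get(reg, "")
        rs["V" + side] = regfile.get(reg) if not rs["Q" + side] else None
    reg_status[instr["dst"]] = free      # this RS will write dst
    return free

def execute(name, stations):
    # 2. Execute: only when both operands present (Qj = Qk = empty).
    rs = stations[name]
    if rs["Qj"] or rs["Qk"]:
        return None                      # keep watching the CDB
    return OPS[rs["op"]](rs["Vj"], rs["Vk"])

def write_result(name, value, stations, reg_status, regfile):
    # 3. Write result: broadcast (tag, value) to all waiting units,
    # update the register file, and free the reservation station.
    for rs in stations.values():
        for side in ("j", "k"):
            if rs.get("Q" + side) == name:
                rs["V" + side], rs["Q" + side] = value, ""
    for reg, tag in list(reg_status.items()):
        if tag == name:
            regfile[reg] = value
            del reg_status[reg]
    stations[name]["busy"] = False

stations = {"Add1": {"busy": False}, "Add2": {"busy": False}}
reg_status, regfile = {}, {"F6": 3.0, "F2": 1.0, "F8": 0.0}
tag = issue({"op": "-", "dst": "F8", "src1": "F6", "src2": "F2"},
            stations, reg_status, regfile)
v = execute(tag, stations)
write_result(tag, v, stations, reg_status, regfile)
print(regfile["F8"])  # 2.0
```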



Common Data Bus
• Normal data bus: data + destination (“go to” bus)
• Common data bus: data + source (“come from” bus)
– 64 bits of data + 4 bits of Functional Unit source
address
– Write if matches expected Functional Unit
(produces result)
– Does the broadcast
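A sketch of the "come from" behavior: the producer broadcasts (source tag, data), and every consumer that recorded that tag captures the value. The 4-bit tag range mirrors the slide's "4 bits of functional unit source address"; the waiter dictionaries are illustrative.

```python
# CDB sketch: one broadcast, many listeners; a listener takes the data
# only if the source tag matches the producer it is waiting on.

def cdb_broadcast(tag, data, waiters):
    assert 0 <= tag < 16        # 4-bit functional-unit source address
    for w in waiters:
        if w.get("Q") == tag:   # matches the expected producer
            w["V"] = data       # capture the broadcast value
            w["Q"] = None       # no longer waiting
    return waiters

waiters = [{"Q": 5}, {"Q": 3}, {"Q": 5}]
cdb_broadcast(5, 7.25, waiters)
print(waiters)  # both listeners on tag 5 captured 7.25; tag 3 still waits
```

Unlike a "go to" bus, the producer does not need to know who its consumers are; any number of them capture the same broadcast in one cycle.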



Tomasulo Example

Instruction stream (assumed latencies: LD 1, ADD 2, MULT 10, DIV 40):

Instruction status (nothing issued yet; Issue / Exec Comp / Write Result all blank):
Instruction   j    k
LD    F6   34+  R2
LD    F2   45+  R3
MULTD F0   F2   F4
SUBD  F8   F6   F2
DIVD  F10  F0   F6
ADDD  F6   F8   F2

Load buffers (3): Load1, Load2, Load3 — all Busy = No, no Address.

Reservation stations (3 FP adder RS, 2 FP multiplier RS; Time is the FU count-down):
Time  Name   Busy  Op  Vj  Vk  Qj  Qk
      Add1   No
      Add2   No
      Add3   No
      Mult1  No
      Mult2  No

Register result status (clock-cycle counter at 0):
Clock  F0  F2  F4  F6  F8  F10  F12  ...  F30
0      FU entries all blank (no pending writes)
