Lecture-11 Dynamic Scheduling A

The document discusses dynamic scheduling, which allows hardware to rearrange instruction execution to reduce stalls while maintaining data flow and exception behavior. It has three main advantages: 1) code can run efficiently on different pipelines without recompilation, 2) it handles dependencies not known at compile time, and 3) it efficiently schedules instructions against unpredictable delays like cache misses.

Uploaded by

Yumna Shahzad

Dynamic Scheduling

Summary of contents covered
• Introduction to RISC-V processor
• Assembly and Machine language of RISC-V
• RISC-V Single Cycle Implementation
• Pipelining concepts and Hazards
• Pipelined RISC-V Implementation
• Cache memory design

Forwarding and stalls in Pipeline
• In a pipelined processor, the pipeline stalls on a data hazard when the dependence cannot be resolved by the bypass (forwarding) paths.
• The dependent instruction waits in the decode stage until the data dependence is resolved.
• All instructions execute in order, so every instruction behind the waiting one also keeps waiting.
• The compiler or assembly-language programmer is responsible for re-ordering code to minimize these stalls.
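The stall rule above can be sketched as a single check. This is a minimal illustration, not a real pipeline's interface: the dict-based instruction encoding (`op`, `rd`, `srcs`) and register names are assumptions for the sketch.

```python
# Minimal sketch of the in-order stall rule for a load-use hazard:
# the instruction in decode waits while the load ahead of it has not
# yet produced a needed source register; everything behind it waits too.

def needs_stall(in_decode, in_execute):
    """Stall if the instruction in EX is a load producing a register
    that the instruction in ID reads (forwarding cannot resolve a
    load-use dependence in time)."""
    return (in_execute["op"] == "lw"
            and in_execute["rd"] in in_decode["srcs"])

load  = {"op": "lw",  "rd": "x5", "srcs": ["x2"]}
use   = {"op": "add", "rd": "x6", "srcs": ["x5", "x7"]}
indep = {"op": "sub", "rd": "x8", "srcs": ["x1", "x2"]}

print(needs_stall(use, load))    # True: add reads x5 right behind the load
print(needs_stall(indep, load))  # False: no dependence, no stall
```

Note that the second instruction stalls even though `indep` behind it is ready to go; dynamic scheduling exists to let such independent work proceed.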
Benefits of Dynamic Scheduling

• The hardware re-orders instruction execution to reduce stalls while maintaining data flow.
• Dynamic scheduling has three advantages:
1. Code compiled for one pipeline runs efficiently on other pipelined architectures.
• No need to compile a binary for each different pipeline.
• Third-party software is distributed as binary files.
Benefits of Dynamic Scheduling

• The hardware re-orders instruction execution to reduce stalls while maintaining data flow.
• Dynamic scheduling has three advantages:
2. It efficiently handles dependences that are not known at compile time,
• such as memory dependences.
Benefits of Dynamic Scheduling

• The hardware re-orders instruction execution to reduce stalls while maintaining data flow.
• Dynamic scheduling has three advantages:
3. It efficiently schedules instructions around unpredictable delays,
• such as cache misses.
Instruction Level Parallelism
(Dynamic Scheduling & Tomasulo Algorithm)
Advantages of Dynamic Scheduling
• Dynamic scheduling - hardware rearranges the
instruction execution to reduce stalls while maintaining
data flow and exception behavior
• Handles cases where dependences are unknown at compile time
– allows the processor to tolerate unpredictable delays, such as cache misses, by executing other code while waiting for the miss to resolve
• Allows code compiled for one pipeline to run efficiently on a different pipeline
• Simplifies the compiler
• Leads to hardware speculation, a technique with significant performance advantages (discussed later)

CMSC 411 - 8 (from Patterson) 8


HW Schemes: Instruction Parallelism
• Key idea: Allow instructions behind a stall to proceed
DIVD F0,F2,F4
ADDD F10,F0,F8
SUBD F12,F8,F14
• Enables out-of-order execution and allows out-of-order completion (e.g., SUBD)
– In a dynamically scheduled pipeline, all instructions still pass through the issue stage in order (in-order issue)
• We distinguish when an instruction begins execution and when it completes execution; between the two times, the instruction is in execution
• Note: dynamic execution creates WAR and WAW hazards and makes handling exceptions harder
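A toy timing model makes the out-of-order completion concrete for the DIVD/ADDD/SUBD sequence above. The latencies are the ones assumed later in this lecture (DIV 40, ADD 2); SUBD gets the same latency as ADDD, an assumption for the sketch.

```python
# Toy model: each instruction starts as soon as its source registers
# are ready; independent instructions need not wait for earlier ones.

LAT = {"DIVD": 40, "ADDD": 2, "SUBD": 2}

def completion_times(program):
    ready = {}   # register -> cycle at which its value becomes available
    done = {}
    for op, dst, srcs in program:
        start = max((ready.get(r, 0) for r in srcs), default=0)
        done[op] = start + LAT[op]
        ready[dst] = done[op]
    return done

prog = [("DIVD", "F0",  ["F2", "F4"]),
        ("ADDD", "F10", ["F0", "F8"]),   # depends on DIVD's F0
        ("SUBD", "F12", ["F8", "F14"])]  # independent of both

t = completion_times(prog)
print(t)  # SUBD completes at cycle 2, DIVD at 40, ADDD at 42
```

SUBD completes 38 cycles before the DIVD that was issued ahead of it, which is exactly the out-of-order completion the slide describes.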



Dynamic Scheduling Step 1
• Simple pipeline had 1 stage to check both structural
and data hazards: Instruction Decode (ID), also
called Instruction Issue
• Split the ID pipe stage of simple 5-stage pipeline into
2 stages:
– Issue
» Decode instructions, check for structural hazards
– Read operands
» Wait until no data hazards, then read operands
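As a sketch, the two halves of the split decode stage reduce to two independent checks; all names below are illustrative, not a real pipeline's signals.

```python
# Issue checks only for structural hazards; Read-operands waits out
# data hazards. Splitting them lets issue stay in order while operand
# reads happen whenever the data becomes ready.

def can_issue(fu_busy):
    # Issue stage: structural-hazard check only (is a unit free?)
    return not fu_busy

def can_read_operands(pending_writers, srcs):
    # Read-operands stage: wait until no earlier, still-executing
    # instruction will write one of our source registers
    return not any(r in pending_writers for r in srcs)

print(can_issue(False))                         # True: unit free, issue
print(can_read_operands({"F0"}, ["F0", "F8"]))  # False: F0 still pending
print(can_read_operands(set(), ["F8", "F14"]))  # True: operands readable
```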



A Dynamic Algorithm: Tomasulo’s
• For IBM 360/91 (before caches!)
– Long memory latency
• Goal: High Performance without special compilers
• Small number of floating point registers (4 in 360)
prevented interesting compiler scheduling of
operations
– This led Tomasulo to try to figure out how to get
more effective registers — renaming in hardware!

• Why study a 1966 computer?
– The descendants of this have flourished!
» Alpha 21264, Pentium 4, AMD Opteron, Power 5, …



Tomasulo Algorithm
• Control & buffers distributed with Function Units (FU)
– FU buffers called “reservation stations”; have
pending operands
• Registers in instructions are replaced by values or by pointers to reservation stations (RS); this is called register renaming
– Renaming avoids WAR, WAW hazards
– More reservation stations than registers, so can
do optimizations compilers can’t
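A sketch of the renaming step: each source register becomes either a value, if the register file already holds it, or a pointer to the reservation station that will produce it. The table layout, RS tag, and register values below are illustrative assumptions.

```python
# Register renaming in hardware: sources are split into V (values
# available now) and Q (tags of the reservation stations that will
# produce the still-pending values).

def rename(srcs, reg_status, regfile):
    """reg_status is the register result status table (register ->
    RS tag of the pending writer); regfile holds committed values."""
    V, Q = {}, {}
    for r in srcs:
        if r in reg_status:
            Q[r] = reg_status[r]   # wait on this reservation station
        else:
            V[r] = regfile[r]      # value can be read immediately
    return V, Q

regfile = {"F2": 3.5, "F4": 2.0, "F6": 1.0}
reg_status = {"F0": "Mult1"}       # Mult1 will write F0

V, Q = rename(["F0", "F6"], reg_status, regfile)
print(V)  # {'F6': 1.0}: read now
print(Q)  # {'F0': 'Mult1'}: wait for Mult1's broadcast
```

Because later instructions name the tag `Mult1` rather than the architectural register F0, a subsequent writer of F0 cannot create a WAR or WAW hazard with them.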



Tomasulo Algorithm (cont.)
• Results to FU from RS, not through registers, over
Common Data Bus that broadcasts results to all FUs
– Avoids RAW hazards by executing an instruction
only when its operands are available
• Loads and stores are treated as FUs with RSs as well
• Integer instructions can go past branches (use
branch prediction), also allow FP ops beyond basic
block in FP queue



Tomasulo Organization (From H&P, Figure 2.9)

[Figure: the FP Op Queue feeds instructions toward the FP registers; six load buffers (Load1–Load6) bring data from memory and store buffers send results to memory; reservation stations (Add1–Add3 in front of the FP adders, Mult1–Mult2 in front of the FP multipliers) hold waiting operations; the Common Data Bus (CDB) broadcasts results back to the reservation stations, buffers, and register file.]
Reservation Station Components
• Op: Operation to perform in the unit (e.g., + or –)
• Vj, Vk: Value of Source operands
– Store buffers have V field, result to be stored
• Qj, Qk: Reservation stations producing source
registers (value to be written)
– Note: Qj,Qk=0 => ready
– Store buffers only have Qi for RS producing result
• Busy: Indicates reservation station or FU is busy

In addition
• Register result status table—Indicates which functional unit will write each register, if one exists. Blank when no pending instruction will write that register.
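The field list above transcribes directly into a small structure. This Python sketch just mirrors the slide's fields, with empty-string tags standing in for "Qj, Qk = 0"; it is an illustration, not the actual hardware encoding.

```python
# Reservation-station fields as listed on the slide: Op, Vj/Vk, Qj/Qk,
# Busy, plus the register result status table alongside.

from dataclasses import dataclass

@dataclass
class ReservationStation:
    busy: bool = False
    op: str = ""      # operation to perform, e.g. "+" or "-"
    Vj: float = 0.0   # value of the first source operand
    Vk: float = 0.0   # value of the second source operand
    Qj: str = ""      # RS producing Vj ("" plays the role of 0 = ready)
    Qk: str = ""      # RS producing Vk ("" = ready)

    def ready(self):
        # Qj, Qk = 0 => both operands available, may begin execution
        return self.busy and not self.Qj and not self.Qk

# Register result status: register -> RS name; absent = no pending writer
register_result_status = {}

rs = ReservationStation(busy=True, op="+", Vj=1.5, Qk="Mult1")
print(rs.ready())        # False: still waiting on Mult1 for Vk
rs.Qk, rs.Vk = "", 2.5   # Mult1's result arrives over the CDB
print(rs.ready())        # True
```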
Three Stages of the Tomasulo Algorithm
1. Issue—get instruction from FP Op Queue
– If reservation station free (no structural hazard),
control issues instr & sends operands (renames
registers).
2. Execute—operate on operands (EX)
– When both operands ready then execute;
if not ready, watch Common Data Bus for result
3. Write result—finish execution (WB)
– Write on Common Data Bus to all awaiting units;

mark reservation station available
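The three stages above can be sketched as a toy control loop over dictionary-based reservation stations. This is a simplified, non-cycle-accurate model under assumed names and structures, not the real machine's control logic.

```python
# Toy model of the three stages: Issue (structural check + renaming),
# Execute (wait for operands), Write result (CDB broadcast).

OPS = {"+": lambda a, b: a + b, "-": lambda a, b: a - b}

def issue(instr, stations, reg_status, regfile):
    # 1. Issue: needs a free RS (else structural hazard); renames sources.
    free = next((n for n, rs in stations.items() if not rs["busy"]), None)
    if free is None:
        return None                      # stall issue
    rs = stations[free]
    rs.update(busy=True, op=instr["op"])
    for side, reg in (("j", instr["src1"]), ("k", instr["src2"])):
        rs["Q" + side] = reg_status.get(reg, "")
        rs["V" + side] = regfile.get(reg) if not rs["Q" + side] else None
    reg_status[instr["dst"]] = free      # this RS will write dst
    return free

def execute(name, stations):
    # 2. Execute: only when both operands present (Qj = Qk = empty).
    rs = stations[name]
    if rs["Qj"] or rs["Qk"]:
        return None                      # keep watching the CDB
    return OPS[rs["op"]](rs["Vj"], rs["Vk"])

def write_result(name, value, stations, reg_status, regfile):
    # 3. Write result: broadcast (tag, value) to all waiting units,
    # update the register file, and free the reservation station.
    for rs in stations.values():
        for side in ("j", "k"):
            if rs.get("Q" + side) == name:
                rs["V" + side], rs["Q" + side] = value, ""
    for reg, tag in list(reg_status.items()):
        if tag == name:
            regfile[reg] = value
            del reg_status[reg]
    stations[name]["busy"] = False

stations = {"Add1": {"busy": False}, "Add2": {"busy": False}}
reg_status, regfile = {}, {"F6": 3.0, "F2": 1.0, "F8": 0.0}
tag = issue({"op": "-", "dst": "F8", "src1": "F6", "src2": "F2"},
            stations, reg_status, regfile)
v = execute(tag, stations)
write_result(tag, v, stations, reg_status, regfile)
print(regfile["F8"])  # 2.0
```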



Common Data Bus
• Normal data bus: data + destination (“go to” bus)
• Common data bus: data + source (“come from” bus)
– 64 bits of data + 4 bits of Functional Unit source
address
– Write if matches expected Functional Unit
(produces result)
– Does the broadcast
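A sketch of the "come from" behavior: the producer broadcasts (source tag, data), and every consumer that recorded that tag captures the value. The 4-bit tag range mirrors the slide's "4 bits of functional unit source address"; the waiter dictionaries are illustrative.

```python
# CDB sketch: one broadcast, many listeners; a listener takes the data
# only if the source tag matches the producer it is waiting on.

def cdb_broadcast(tag, data, waiters):
    assert 0 <= tag < 16        # 4-bit functional-unit source address
    for w in waiters:
        if w.get("Q") == tag:   # matches the expected producer
            w["V"] = data       # capture the broadcast value
            w["Q"] = None       # no longer waiting
    return waiters

waiters = [{"Q": 5}, {"Q": 3}, {"Q": 5}]
cdb_broadcast(5, 7.25, waiters)
print(waiters)  # both listeners on tag 5 captured 7.25; tag 3 still waits
```

Unlike a "go to" bus, the producer does not need to know who its consumers are; any number of them capture the same broadcast in one cycle.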



Tomasulo Example

Instruction stream (assumed latencies: LD 1, ADD 2, MULT 10, DIV 40):

Instruction status (nothing issued yet; Issue / Exec Comp / Write Result all blank):
Instruction   j    k
LD    F6   34+  R2
LD    F2   45+  R3
MULTD F0   F2   F4
SUBD  F8   F6   F2
DIVD  F10  F0   F6
ADDD  F6   F8   F2

Load buffers (3): Load1, Load2, Load3 — all Busy = No, no Address.

Reservation stations (3 FP adder RS, 2 FP multiplier RS; Time is the FU count-down):
Time  Name   Busy  Op  Vj  Vk  Qj  Qk
      Add1   No
      Add2   No
      Add3   No
      Mult1  No
      Mult2  No

Register result status (clock-cycle counter at 0):
Clock  F0  F2  F4  F6  F8  F10  F12  ...  F30
0      FU entries all blank (no pending writes)
