Pipelining for Enhanced CPU Performance

The document discusses pipelining in CPU design to improve performance. It describes how a pipeline works like an assembly line with distinct stages, and presents a 5-stage MIPS pipeline with stages for instruction fetch, decode, execute, memory access, and write back. It explains how instructions flow through the pipeline concurrently and how data and control hazards arise, and it covers the forwarding and stalling techniques used to resolve hazards caused by dependences between instructions.

Uploaded by

api-26072581

CSE 313

Enhancing Performance with Pipelining
Pipeline motivation
• Need both low CPI and high frequency for best performance
  • Want a multicycle implementation for high frequency, but need better CPI
• The idea behind pipelining is to have a multicycle implementation that operates like a factory assembly line
• Each “worker” in the pipeline performs a particular task and hands off to the next “worker” while getting new work
Pipeline motivation
• Tasks should take about the same time: if one “worker” is much slower than the rest, the other “workers” will stand idle
• Once the assembly line is full, a new “product” (instruction) comes out of the back end of the line each time period
• In a computer assembly line (pipeline), each task is called a stage and the time period is one clock cycle
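The assembly-line timing can be made concrete with a little arithmetic. The following is a minimal sketch, not part of the original slides: the function names and the 1000-instruction workload are illustrative assumptions, and it assumes an ideal pipeline with no stalls and a non-overlapped baseline in which every instruction uses every stage.

```python
def pipelined_cycles(num_instructions: int, num_stages: int) -> int:
    """Ideal pipeline: fill the line once, then finish one instruction per cycle."""
    if num_instructions == 0:
        return 0
    return num_stages + (num_instructions - 1)


def unpipelined_cycles(num_instructions: int, num_stages: int) -> int:
    """No overlap: each instruction walks through every stage by itself."""
    return num_instructions * num_stages


def ideal_cpi(num_instructions: int, num_stages: int) -> float:
    """Average cycles per instruction for the ideal pipeline."""
    return pipelined_cycles(num_instructions, num_stages) / num_instructions


if __name__ == "__main__":
    n, k = 1000, 5  # hypothetical workload size and stage count
    print(pipelined_cycles(n, k))    # 1004
    print(unpipelined_cycles(n, k))  # 5000
    print(ideal_cpi(n, k))           # 1.004
```

Under these assumptions the CPI approaches 1 as the instruction count grows, while the clock period stays that of a single stage.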
MIPS 5-stage pipeline
• Like the single-cycle datapath, but with registers separating each stage
MIPS 5-stage pipeline
• 5 stages for each instruction
  • IF: instruction fetch
  • ID: instruction decode and register file read
  • EX: instruction execution or effective address calculation
  • MEM: memory access for load and store
  • WB: write back results to register file
• Delays of all 5 stages are roughly the same
• Staging registers hold data and control as instructions pass between stages
• All instructions pass through all 5 stages
• As an instruction leaves a stage in a particular clock period, the next instruction enters it
Pipeline operation for lw
• Stage 1: Instruction fetch
Pipeline operation for lw
• Stage 2: Instruction decode and register file read
• What happens to the instruction info in IF/ID?
Pipeline operation for lw
• Stage 3: Effective address calculation
Pipeline operation for lw
• Stage 4: Memory access
Pipeline operation for lw
• Stage 5: Write back
• Instruction info in IF/ID is gone by now, so the write back won’t work
Modified pipeline with write back fix
• Write register bits from the instruction must be carried through the pipeline with the instruction
Pipeline operation for lw
• Pipeline usage in each stage for lw
Pipeline operation for sw
• Stage 3: Effective address calculation
Pipeline operation for sw
• Stage 4: Memory access
Pipeline operation for sw
• Stage 5: Write back (nothing)
Pipeline operation for lw, sub sequence
[Figures: six pipeline diagram slides for the lw, sub sequence]
Graphical pipeline representation
• Represent overlap of pipelined instructions as multiple pipelines skewed by a cycle
Another useful shorthand form
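One way to picture the skewed representation is as a text table with one row per instruction and one column per clock cycle. The sketch below is an illustration only and not taken from the slides; `pipeline_diagram` and its layout are hypothetical, and it assumes no stalls.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_diagram(instructions: list[str]) -> list[str]:
    """Return one text row per instruction, each skewed one cycle to the right."""
    total_cycles = len(instructions) + len(STAGES) - 1
    width = max((len(text) for text in instructions), default=0)
    rows = []
    for index, text in enumerate(instructions):
        cells = [""] * total_cycles
        for offset, stage in enumerate(STAGES):
            # Instruction number `index` (0-based) is in `stage` during cycle index + offset.
            cells[index + offset] = stage
        row = text.ljust(width) + " | " + " ".join(cell.ljust(3) for cell in cells)
        rows.append(row.rstrip())
    return rows


if __name__ == "__main__":
    program = ["lw $10, 20($1)", "sub $11, $2, $3", "and $12, $4, $5"]
    for row in pipeline_diagram(program):
        print(row)
```

Reading down any column shows which instruction occupies each stage in that clock cycle.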
Pipeline control
• Basic pipeline control is similar to the single-cycle implementation
Pipeline control
• Control for an instruction is generated in ID and travels with the instruction and data through the pipeline
• When an instruction enters a stage, its control signals set the operation of that stage
Multiple instruction example
• For the following code fragment
lw $10, 20($1)
sub $11, $2, $3
and $12, $4, $5
or $13, $6, $7
add $14, $8, $9
show the datapath and control usage as the instruction sequence travels down the pipeline
[Figures: nine pipeline diagram slides for this example]
How the MIPS ISA simplifies pipelining
• Fixed-length instructions simplify
  • Fetch: just get the next 32 bits
  • Decode: a single step; don’t have to decode the opcode before figuring out where to get the rest of the fields
• Source register fields always in the same location
  • Can read source registers during decode
• Load/store architecture
  • ALU can be used for both arithmetic and EA calculation
  • Memory instructions require about the same amount of work as arithmetic ones, easing pipelining of the two together
• Memory data must be aligned
  • Read or write accesses can be done in one cycle
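The fixed field positions can be illustrated by slicing a 32-bit instruction word without looking at the opcode first. This is a minimal sketch rather than anything from the slides; `decode_fields` is a hypothetical helper, and the bit positions are those of the standard MIPS32 R-type and I-type formats.

```python
def decode_fields(word: int) -> dict[str, int]:
    """Split a 32-bit MIPS instruction word into its fixed-position fields."""
    return {
        "opcode": (word >> 26) & 0x3F,  # bits 31..26
        "rs": (word >> 21) & 0x1F,      # bits 25..21, first source register
        "rt": (word >> 16) & 0x1F,      # bits 20..16, second source or load destination
        "rd": (word >> 11) & 0x1F,      # bits 15..11, R-type destination
        "funct": word & 0x3F,           # bits 5..0, R-type operation
        "imm": word & 0xFFFF,           # bits 15..0, I-type immediate
    }


if __name__ == "__main__":
    sub_word = 0x00435822  # sub $11, $2, $3
    lw_word = 0x8C2A0014   # lw $10, 20($1)
    print(decode_fields(sub_word))
    print(decode_fields(lw_word))
```

Because `rs` and `rt` sit in the same bits for both formats, the register file can be read in ID before the instruction type is known.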
Pipeline hazards
• A hazard is a conflict regarding data, control, or hardware resources
• Data hazards are conflicts for register values
• Control hazards occur due to the delay in executing branch and jump instructions
• Structural hazards are conflicts for hardware resources, such as
  • A single memory for instructions and data
  • A multi-cycle, non-pipelined functional unit (such as a divider)
Data dependences
• A read after write (RAW) dependence occurs when the register written by an instruction is a source register of a subsequent instruction
lw $10, 20($1)
sub $11, $10, $3
and $12, $4, $11
or $13, $11, $4
add $14, $13, $9
• Also have write after read (WAR) and write after write (WAW) data dependences (later)
Pipelining and RAW dependences
• RAW dependences that are close by may cause data hazards in the pipeline
• Consider the following code sequence:
sub $2, $1, $3
and $12, $2, $6
or $13, $6, $2
add $14, $2, $2
sw $15, 100($2)
• What are the RAW dependences?
Pipelining and RAW dependences
• Data hazards with the first three instructions
[Figure labels: hazard, ok]
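The dependence question can be answered mechanically by tracking the most recent writer of each register. The sketch below is illustrative and not from the slides: `Instr`, `raw_dependences`, and `is_hazard_without_forwarding` are hypothetical names, and the hazard test assumes the register file is written in the first half of a clock cycle and read in the second half, so that only producers one or two instructions ahead of the consumer cause a hazard.

```python
from typing import NamedTuple, Optional


class Instr(NamedTuple):
    text: str
    dest: Optional[str]       # register written, or None (for example sw)
    sources: tuple[str, ...]  # registers read


def raw_dependences(program: list[Instr]) -> list[tuple[int, int, str]]:
    """Return (producer index, consumer index, register) for every RAW dependence."""
    last_writer: dict[str, int] = {}
    found = []
    for index, instr in enumerate(program):
        for reg in dict.fromkeys(instr.sources):  # de-duplicate while keeping order
            if reg in last_writer:
                found.append((last_writer[reg], index, reg))
        if instr.dest is not None and instr.dest != "$0":
            last_writer[instr.dest] = index
    return found


def is_hazard_without_forwarding(producer: int, consumer: int) -> bool:
    """Assumed rule: only distances of 1 or 2 instructions are hazards."""
    return consumer - producer <= 2


PROGRAM = [
    Instr("sub $2, $1, $3", "$2", ("$1", "$3")),
    Instr("and $12, $2, $6", "$12", ("$2", "$6")),
    Instr("or $13, $6, $2", "$13", ("$6", "$2")),
    Instr("add $14, $2, $2", "$14", ("$2", "$2")),
    Instr("sw $15, 100($2)", None, ("$15", "$2")),
]

if __name__ == "__main__":
    for producer, consumer, reg in raw_dependences(PROGRAM):
        status = "hazard" if is_hazard_without_forwarding(producer, consumer) else "ok"
        print(f"{PROGRAM[consumer].text} reads {reg} from {PROGRAM[producer].text}: {status}")
```

Under the stated assumption, the dependences of `and` and `or` on `sub` are hazards, while those of `add` and `sw` are not.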
Forwarding
• Most RAW hazards can be eliminated by forwarding results between pipe stages
[Figure label: at this point, the result of sub is available]
Detecting forwarding
• Rd of the instruction in MEM or WB must match Rs and/or Rt of the instruction in EX
• The instruction in MEM or WB must have RegWrite=1 (why?)
• Rd must not be $0 (why?)
Detecting forwarding from MEM to EX
• To the upper ALU input (ALUupper)
  • EX/MEM.RegWrite = 1
  • EX/MEM.RegisterRd not equal to 0
  • EX/MEM.RegisterRd = ID/EX.RegisterRs
• To the lower ALU input (ALUlower)
  • EX/MEM.RegWrite = 1
  • EX/MEM.RegisterRd not equal to 0
  • EX/MEM.RegisterRd = ID/EX.RegisterRt
Forwarding datapaths
• Bypass paths feed data from MEM and WB back to MUXes at the EX ALU inputs
• Do we still have to write the register file in WB?
Detecting forwarding from WB to EX
• To the upper ALU input
  • MEM/WB.RegWrite = 1
  • MEM/WB.RegisterRd not equal to 0
  • MEM/WB.RegisterRd = ID/EX.RegisterRs
  • The value is not being forwarded from MEM (why?)
• To the lower ALU input
  • MEM/WB.RegWrite = 1
  • MEM/WB.RegisterRd not equal to 0
  • MEM/WB.RegisterRd = ID/EX.RegisterRt
  • The value is not being forwarded from MEM
Forwarding control
• Control is handled by the forwarding unit
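The detection rules for the two forwarding sources can be combined into one selection function per ALU input. The following is a rough sketch of such a forwarding unit, not the slides' hardware: `PipeReg`, `forward_select`, and the `"MEM"`/`"WB"`/`"REG"` labels are hypothetical names chosen for illustration. It checks that the older instruction writes a register, that its Rd is not $0, and that Rd matches the source register, and it prefers the value from MEM because that value is the more recent one.

```python
from dataclasses import dataclass


@dataclass
class PipeReg:
    reg_write: bool  # RegWrite control bit carried with the instruction
    rd: int          # destination register number carried with the instruction


def forward_select(src: int, in_mem: PipeReg, in_wb: PipeReg) -> str:
    """Choose where one ALU input gets the value of register number `src`."""
    if in_mem.reg_write and in_mem.rd != 0 and in_mem.rd == src:
        return "MEM"  # newest value: forward from the instruction in MEM
    if in_wb.reg_write and in_wb.rd != 0 and in_wb.rd == src:
        return "WB"   # reached only when MEM is not forwarding this register
    return "REG"      # no hazard: use the value read from the register file


def forwarding_unit(rs: int, rt: int, in_mem: PipeReg, in_wb: PipeReg) -> tuple[str, str]:
    """Return the selections for the upper (Rs) and lower (Rt) ALU inputs."""
    return forward_select(rs, in_mem, in_wb), forward_select(rt, in_mem, in_wb)


if __name__ == "__main__":
    # or $4, $4, $2 in EX; and $4, $2, $5 in MEM; sub $2, $1, $3 in WB
    print(forwarding_unit(4, 2, PipeReg(True, 4), PipeReg(True, 2)))  # ('MEM', 'WB')
```

Checking MEM before WB is what makes the extra WB condition ("the value is not being forwarded from MEM") hold automatically in this sketch.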
Forwarding example
• Show forwarding for the code sequence:
sub $2, $1, $3
and $4, $2, $5
or $4, $4, $2
add $9, $4, $2
Forwarding example
• sub produces result in EX
Forwarding example
• sub forwards result from MEM to ALUupper
Forwarding example
• sub forwards result from WB to ALUlower
• and forwards result from MEM to ALUupper
Forwarding example
• or forwards result from MEM to ALUupper
RAW hazards involving loads
• Loads produce results in MEM, so they can’t forward to an immediately following R-type instruction
• Called a load-use hazard
RAW hazards involving loads
• Solution: stall the stages behind the load for one cycle, after which the result can be forwarded
Detecting load-use hazards
• Instruction in EX is a load
  • ID/EX.MemRead = 1
• Instruction in ID has a source register that matches the load destination register
  • ID/EX.RegisterRt = IF/ID.RegisterRs OR ID/EX.RegisterRt = IF/ID.RegisterRt
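The two conditions, a load in EX and a matching source register in ID, reduce to a single boolean expression. This is a minimal sketch with hypothetical parameter names, not the slides' hardware; it assumes the load's destination is its Rt field, as in MIPS lw.

```python
def load_use_hazard(ex_is_load: bool, ex_rt: int, id_rs: int, id_rt: int) -> bool:
    """True when the instruction in ID reads the register a load in EX will write."""
    return bool(ex_is_load and ex_rt in (id_rs, id_rt))


if __name__ == "__main__":
    # lw $2, 20($1) in EX (Rt = 2); and $4, $2, $5 in ID (Rs = 2, Rt = 5)
    print(load_use_hazard(True, 2, 2, 5))   # True
    # sub $2, $1, $3 in EX is not a load, so forwarding alone is enough
    print(load_use_hazard(False, 2, 2, 5))  # False
```

The check is conservative: it compares both register fields of the instruction in ID even when that instruction does not use Rt as a source.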
Stalling the stages behind the load
• Force nop (“no operation”) instruction into EX stage on next clock cycle
  • Force ID/EX.RegWrite input to zero
  • Force ID/EX.MemWrite input to zero
• Hold instructions in ID and IF stages for one clock cycle
  • Hold the contents of PC
  • Hold the contents of IF/ID
Control for load-use hazards
• Control is handled by the hazard detection unit
Load-use stall example
• Code sequence:
lw $2, 20($1)
and $4, $2, $5
or $4, $4, $2
add $9, $4, $2
Load-use stall example
• lw enters ID
Load-use stall example
• Load-use hazard detected
Load-use stall example
• Force nop into EX and hold ID and IF stages
Load-use stall example
• The lw result in WB is forwarded to the and instruction in EX
• The or instruction reads operand $2 from the register file
Load-use stall example
• Pipeline advances normally
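The cost of the stall in this example can be counted with a short function. The sketch below is an illustration under stated assumptions, not a full simulator: `count_cycles` is a hypothetical helper, the pipeline has five stages with full forwarding, and the only stall modeled is the one-cycle load-use stall between a lw and the instruction immediately after it.

```python
def count_cycles(program, num_stages=5):
    """Return (total cycles, stall cycles) for a list of (op, dest, sources) tuples."""
    if not program:
        return 0, 0
    stalls = 0
    for prev, curr in zip(program, program[1:]):
        prev_op, prev_dest, _ = prev
        _, _, curr_sources = curr
        # A load followed immediately by a consumer of its result costs one bubble.
        if prev_op == "lw" and prev_dest in curr_sources:
            stalls += 1
    total = num_stages + (len(program) - 1) + stalls
    return total, stalls


if __name__ == "__main__":
    example = [
        ("lw", "$2", ("$1",)),
        ("and", "$4", ("$2", "$5")),
        ("or", "$4", ("$4", "$2")),
        ("add", "$9", ("$4", "$2")),
    ]
    print(count_cycles(example))  # (9, 1)
```

Without the stall the four instructions would finish in 8 cycles, so the load-use hazard adds exactly one.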
Control hazards
• Taken branches and jumps change the PC to the target address from which the next instruction is to be fetched
• In our pipeline, the PC is changed when the taken beq instruction is in the MEM stage
• This creates a control hazard in which sequential instructions in earlier stages must be discarded
beq instruction that is taken
instr(i+3)  instr(i+2)  instr(i+1)  beq $2,$3,7
• instr(i+1), instr(i+2), and instr(i+3) must be discarded
beq instruction that is taken
• In this example, the branch delay is three: one discarded instruction for each of the three stages behind beq (EX, ID, IF)
Reducing the branch delay
• Reducing the branch delay reduces the number of instructions that have to be discarded on a taken branch
• We can reduce the branch delay to one for beq by moving both the equality test and the branch target address calculation into ID
• We need to insert a nop between the beq and the correctly fetched instruction
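The benefit of a shorter branch delay can be estimated with simple arithmetic. The sketch below is not from the slides: the function names and the workload numbers are hypothetical, and it assumes a base CPI of 1 with the penalty paid only on taken branches.

```python
def cycles_lost_to_branches(taken_branches: int, branch_delay: int) -> int:
    """Each taken branch discards `branch_delay` instructions, one wasted cycle apiece."""
    return taken_branches * branch_delay


def effective_cpi(instructions: int, taken_branches: int, branch_delay: int,
                  base_cpi: float = 1.0) -> float:
    """Base CPI plus the average number of wasted cycles per instruction."""
    return base_cpi + cycles_lost_to_branches(taken_branches, branch_delay) / instructions


if __name__ == "__main__":
    # Hypothetical workload: 1000 instructions, 100 of them taken branches.
    print(effective_cpi(1000, 100, branch_delay=3))  # 1.3
    print(effective_cpi(1000, 100, branch_delay=1))  # 1.1
```

With these made-up numbers, resolving beq in ID instead of MEM removes two of the three wasted cycles per taken branch.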
beq with one branch delay
• Register equality test done in ID by exclusive ORing the register values and NORing the result
• Instruction in ID forced to nop by zeroing the IF/ID register
• Next fetched instruction will be from PC+4 or branch target depending on the beq outcome
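The XOR-then-NOR equality test can be mirrored in software to see why it works. This is a minimal sketch rather than a hardware description; `registers_equal` and its 32-bit default width are assumptions made for illustration.

```python
def registers_equal(a: int, b: int, width: int = 32) -> bool:
    """Equality test built from a bitwise XOR followed by a NOR of all result bits."""
    mask = (1 << width) - 1
    diff = (a ^ b) & mask    # XOR: a 1 bit wherever the two operands differ
    any_bit_set = diff != 0  # OR of all difference bits
    return not any_bit_set   # NOR: true only when no bit differs


if __name__ == "__main__":
    print(registers_equal(7, 7))  # True
    print(registers_equal(7, 8))  # False
```

No subtraction or carry chain is involved, which is why the test is fast enough to sit in the ID stage.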
beq with one branch delay
• beq in ID; next sequential instruction (and) in IF
beq with one branch delay
• bubble in ID; lw (from taken address) in IF
