16.482 / 16.561
Computer Architecture and Design
Instructor: Dr. Michael Geiger
Fall 2013

Lecture 5: Pipelining
Lecture outline
- Announcements/reminders
  - HW 4 to be posted; due 10/16
  - Lecture next week on Wednesday, 10/16
- Review: Processor datapath and control
- Today’s lecture: Pipelining

02/23/18 Computer Architecture Lecture 5 2

Review: Simple MIPS datapath
[Figure: single-cycle MIPS datapath, with three multiplexers labeled: chooses PC+4 or branch target; chooses ALU output or memory output; chooses register or sign-extended immediate]


Datapath for R-type instructions

EXAMPLE:
add $4, $10, $30
($4 = $10 + $30)


Datapath for I-type ALU instructions

EXAMPLE:
addi $4, $10, 15
($4 = $10 + 15)


Datapath for beq (not taken)

EXAMPLE:
beq $1,$2,label
(branch to label if $1 == $2)


Datapath for beq (taken)

EXAMPLE:
beq $1,$2,label
(branch to label if $1 == $2)


Datapath for lw instruction

EXAMPLE:
lw $2, 10($3)
($2 = mem[$3 + 10])


Datapath for sw instruction

EXAMPLE:
sw $2, 10($3)
(mem[$3 + 10] = $2)


Motivating pipelining
- We’ve seen the basic single-cycle datapath
  - Offers a CPI of 1 ...
  - ... but the cycle time is determined by the longest instruction
    - Load essentially uses all stages
- We’d like both a low CPI and a short cycle time
- Solution: pipelining
  - Simultaneously execute multiple instructions
  - Use a multi-cycle “assembly line” approach


Pipelining is like …
- … doing laundry (no, really)
- Say 4 people (Ann, Brian, Cathy, Don) want to use a laundry service that has four components:
  - Washer, which takes 30 minutes
  - Dryer, which takes 30 minutes
  - “Folder,” which takes 30 minutes
  - “Storer,” which takes 30 minutes


Sequential laundry service
- Each person starts when the previous one finishes
- 4 loads take 8 hours (4 loads × 4 components × 30 minutes)


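Pipelined laundry service
- As soon as a particular component is free, the next person can use it
- 4 loads take 3½ hours (the first load needs 4 × 30 minutes, and each of the other 3 loads finishes 30 minutes after the one before it: 7 × 30 minutes)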


Pipelining questions
- Does pipelining improve latency or throughput?
  - Throughput: the time for each instruction is the same, but more instructions complete per unit time
- What’s the maximum potential speedup of pipelining?
  - The number of stages, N: before, each instruction took N cycles; now we can (theoretically) finish 1 instruction per cycle
- If one stage can run faster, how does that affect the speedup?
  - No effect: the cycle time depends on the longest stage, because you may be using hardware from all stages at once


Principles of pipelining
- Every instruction takes the same number of steps
- Pipeline stages
  - 1 stage per cycle (like the multi-cycle datapath)
  - MIPS (like most simple processors) has 5 stages
    - IF: Instruction fetch
    - ID: Instruction decode and register read
    - EX: Execution / address calculation
    - MEM: Memory access
    - WB: Write back result to register


Pipeline Performance
- Assume the time for each stage is
  - 100 ps for register read or write
  - 200 ps for the other stages
- Compare the pipelined datapath with the single-cycle datapath

    Instr     Instr fetch  Register read  ALU op  Memory access  Register write  Total time
    lw        200 ps       100 ps         200 ps  200 ps         100 ps          800 ps
    sw        200 ps       100 ps         200 ps  200 ps                         700 ps
    R-format  200 ps       100 ps         200 ps                 100 ps          600 ps
    beq       200 ps       100 ps         200 ps                                 500 ps
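The two clock periods can be derived mechanically from the stage-latency table. The following is a minimal Python sketch that assumes exactly those latencies and ignores the delay of the pipeline registers themselves. That is a simplifying assumption, because real pipeline registers add some overhead. `STAGE_PS` and the helper functions are hypothetical names.

```python
from __future__ import annotations

from typing import Optional

# Stage latencies in ps, in order: IF, register read, ALU, memory access,
# register write. None marks a stage the instruction does not use.
STAGE_PS: dict[str, list[Optional[int]]] = {
    "lw":       [200, 100, 200, 200, 100],
    "sw":       [200, 100, 200, 200, None],
    "R-format": [200, 100, 200, None, 100],
    "beq":      [200, 100, 200, None, None],
}


def total_ps(instr: str) -> int:
    """Single-cycle latency: the sum of the stages the instruction uses."""
    return sum(t for t in STAGE_PS[instr] if t is not None)


def single_cycle_clock_ps() -> int:
    """A single-cycle clock must fit the slowest instruction."""
    return max(total_ps(i) for i in STAGE_PS)


def pipelined_clock_ps() -> int:
    """A pipelined clock must fit the slowest stage (pipeline-register delay ignored)."""
    return max(t for times in STAGE_PS.values() for t in times if t is not None)


if __name__ == "__main__":
    for instr in STAGE_PS:
        print(f"{instr:9s} {total_ps(instr)} ps")
    print("single-cycle Tc =", single_cycle_clock_ps(), "ps")
    print("pipelined Tc    =", pipelined_clock_ps(), "ps")
```

This also illustrates the earlier point about faster stages: shrinking the 100 ps register stages does not change `pipelined_clock_ps()`, because the 200 ps stages still set the clock.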

Pipeline Performance
[Figure: single-cycle execution (Tc = 800 ps) vs. pipelined execution (Tc = 200 ps)]

Pipeline diagram

    Cycle  1   2   3   4   5   6   7   8
    lw     IF  ID  EX  MEM WB
    add        IF  ID  EX  MEM WB
    beq            IF  ID  EX  MEM WB
    sw                 IF  ID  EX  MEM WB

- Pipeline diagram shows execution of multiple instructions
  - Instructions listed vertically
  - Cycles shown horizontally
  - Each instruction divided into stages
  - Can see what instructions are in a particular stage at any cycle
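The staircase shape of the diagram follows from one rule. In an ideal pipeline, instruction i (counting from 0) enters IF in cycle i + 1 and advances one stage per cycle. The following is a small Python sketch of that rule. It assumes no hazards or stalls, and `stage_in_cycle` and `pipeline_diagram` are hypothetical helper names.

```python
from __future__ import annotations

from typing import Optional

STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def stage_in_cycle(instr_index: int, cycle: int) -> Optional[str]:
    """Stage occupied by instruction instr_index (0-based) in a 1-based cycle, if any."""
    offset = cycle - 1 - instr_index
    return STAGES[offset] if 0 <= offset < len(STAGES) else None


def pipeline_diagram(instrs: list[str], col: int = 4) -> str:
    """Render an ideal (hazard-free) pipeline diagram as plain text."""
    n_cycles = len(STAGES) + len(instrs) - 1
    lines = ["Cycle".ljust(7) + "".join(str(c).ljust(col) for c in range(1, n_cycles + 1))]
    for i, name in enumerate(instrs):
        # Each row is the stage (or blank) this instruction occupies in every cycle.
        cells = [stage_in_cycle(i, c) or "" for c in range(1, n_cycles + 1)]
        lines.append(name.ljust(7) + "".join(cell.ljust(col) for cell in cells))
    return "\n".join(line.rstrip() for line in lines)


if __name__ == "__main__":
    print(pipeline_diagram(["lw", "add", "beq", "sw"]))
```

Reading one column answers the question "which instruction is in which stage?" For example, `stage_in_cycle(2, 5)` gives the stage that beq occupies in cycle 5.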


Performance example
- Say we have the following code:

    loop: add $t1, $t2, $t3
          lw  $t4, 0($t1)
          beq $t4, $t3, end
          sw  $t3, 4($t1)
          add $t2, $t2, 8
          j   loop
    end:  ...

- Assume each pipeline stage takes 4 ns
- How long would one loop iteration take in an ideal pipelined processor (i.e., no delays between instructions)?


Solution

    Cycle  1   2   3   4   5   6   7   8   9   10
    add    IF  ID  EX  MEM WB
    lw         IF  ID  EX  MEM WB
    beq            IF  ID  EX  MEM WB
    sw                 IF  ID  EX  MEM WB
    add                    IF  ID  EX  MEM WB
    j                          IF  ID  EX  MEM WB

- Can draw a pipeline diagram to show the number of cycles
- In ideal pipelining, with M instructions and N pipeline stages, total time = N + (M - 1) cycles (N cycles for the first instruction, then one more cycle for each of the remaining M - 1)
  - Here, M = 6, N = 5 ⇒ 5 + (6 - 1) = 10 cycles
  - Total time = (10 cycles) × (4 ns/cycle) = 40 ns
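The cycle-count formula can be written as a short Python sketch, together with the speedup it implies over an unpipelined design that spends N cycles on every instruction. The sketch assumes ideal pipelining (no stalls) and balanced stages, and the function names are hypothetical.

```python
def ideal_cycles(m: int, n: int) -> int:
    """Cycles for m instructions on an ideal n-stage pipeline: N + (M - 1)."""
    return n + (m - 1) if m > 0 else 0


def ideal_time_ns(m: int, n: int, cycle_ns: float) -> float:
    return ideal_cycles(m, n) * cycle_ns


def speedup_vs_unpipelined(m: int, n: int) -> float:
    """Unpipelined: each instruction uses all n stages before the next starts (m * n cycles)."""
    return (m * n) / ideal_cycles(m, n)


if __name__ == "__main__":
    m, n = 6, 5                      # one loop iteration on the 5-stage MIPS pipeline
    print(ideal_cycles(m, n))
    print(ideal_time_ns(m, n, 4.0))
    # As m grows, the speedup approaches n, the number of stages.
    for count in (6, 100, 10_000):
        print(count, round(speedup_vs_unpipelined(count, n), 3))
```

The same formula reproduces the laundry numbers. Four loads through four 30-minute components take 4 + (4 - 1) = 7 slots, or 3½ hours.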


Pipelined datapath principles
[Figure: pipelined datapath; right-to-left flow (from the MEM and WB stages back to earlier stages) leads to hazards]


Pipeline registers
- Need registers between stages for info from previous cycles
- Register must be able to hold all needed info for a given stage
  - For example, IF/ID must be 64 bits: 32 bits for the instruction, 32 bits for PC+4
- May need to propagate info through multiple stages for later use
  - For example, the destination register number is determined in ID but not used until WB
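To make this propagation concrete, the following Python sketch walks one R-type instruction (add $4, $10, $30 from the review slides) through the four pipeline registers, one latch at a time. The field sets are a simplified assumption, since real latches also carry control signals. The class and function names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class IFID:
    instruction: int  # 32-bit instruction word
    pc_plus_4: int    # 32-bit PC+4 (IF/ID total: 64 bits)


@dataclass
class IDEX:
    rs_value: int
    rt_value: int
    dest_reg: int     # destination register number, determined in ID
    pc_plus_4: int


@dataclass
class EXMEM:
    alu_result: int
    dest_reg: int     # copied forward unchanged


@dataclass
class MEMWB:
    write_data: int
    dest_reg: int     # finally used in WB


def encode_rtype(rs: int, rt: int, rd: int, funct: int, shamt: int = 0) -> int:
    """Pack a MIPS R-type word (opcode 0)."""
    return (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


def rtype_fields(instr: int) -> tuple[int, int, int]:
    """Extract the rs, rt and rd fields of an R-type word."""
    return (instr >> 21) & 0x1F, (instr >> 16) & 0x1F, (instr >> 11) & 0x1F


def run_add(regs: dict[int, int], instr: int, pc: int) -> MEMWB:
    """Move one add through IF, ID, EX, MEM and WB (no overlap with other instructions)."""
    if_id = IFID(instr, pc + 4)                                   # IF
    rs, rt, rd = rtype_fields(if_id.instruction)                  # ID
    id_ex = IDEX(regs[rs], regs[rt], rd, if_id.pc_plus_4)
    # EX: wrap to 32 bits (the overflow trap of MIPS add is ignored here)
    ex_mem = EXMEM((id_ex.rs_value + id_ex.rt_value) & 0xFFFFFFFF, id_ex.dest_reg)
    mem_wb = MEMWB(ex_mem.alu_result, ex_mem.dest_reg)            # MEM: no access for add
    regs[mem_wb.dest_reg] = mem_wb.write_data                     # WB
    return mem_wb


if __name__ == "__main__":
    regs = {10: 5, 30: 7}
    word = encode_rtype(rs=10, rt=30, rd=4, funct=0x20)  # add $4, $10, $30
    print(run_add(regs, word, pc=0x00400000))
    print(regs[4])
```

The destination register number is extracted once in ID and then copied through ID/EX, EX/MEM and MEM/WB untouched, until WB finally uses it.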


Pipeline hazards
- A hazard is a situation that prevents an instruction from executing during its designated clock cycle
- 3 types:
  - Structure hazards: two instructions attempt to use the same hardware simultaneously
  - Data hazards: an instruction attempts to use data before it’s ready
  - Control hazards: an attempt to make a decision before the condition is evaluated


Structure hazards
- Examples in MIPS pipeline
  - May need to calculate addresses and perform operations ⇒ need multiple adders + ALU
  - May need to access memory for both instructions and data ⇒ need separate instruction & data memories (caches)
  - May need to read and write the register file ⇒ write in first half of cycle, read in second

    Cycle  1   2   3   4   5   6   7   8
    lw     IF  ID  EX  MEM WB
    add        IF  ID  EX  MEM WB
    beq            IF  ID  EX  MEM WB
    sw                 IF  ID  EX  MEM WB

- In cycle 4, lw accesses data memory while sw is fetched from instruction memory; in cycle 5, lw writes the register file while sw reads it


Data Hazard Example
- Consider this sequence:

    sub $2, $1, $3
    and $12, $2, $5
    or  $13, $6, $2
    add $14, $2, $2
    sw  $15, 100($2)

- Can’t use the value of $2 until it’s actually computed and stored
  - No hazard for sw
  - Register hardware takes care of add (sub writes $2 in the first half of the cycle in which add reads it)
  - What about and, or?
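The reasoning on this slide can be sketched as a check over instruction pairs. This Python sketch assumes no forwarding. It also assumes that a register written in WB can be read in ID during the same cycle, as stated on the structure-hazard slide. Under those assumptions, a consumer must be at least 3 instructions after its producer. Instructions are described by hand as (text, destination, sources), and `Instr` and `raw_hazards` are hypothetical names.

```python
from __future__ import annotations

from typing import NamedTuple, Optional


class Instr(NamedTuple):
    text: str
    dest: Optional[str]       # register written, or None (e.g. sw)
    sources: tuple[str, ...]  # registers read


# With write-first-half / read-second-half, a consumer 3 or more
# instructions after the producer reads the new value from the register file.
MIN_DISTANCE = 3

PROGRAM = [
    Instr("sub $2, $1, $3", "$2", ("$1", "$3")),
    Instr("and $12, $2, $5", "$12", ("$2", "$5")),
    Instr("or $13, $6, $2", "$13", ("$6", "$2")),
    Instr("add $14, $2, $2", "$14", ("$2", "$2")),
    Instr("sw $15, 100($2)", None, ("$15", "$2")),
]


def raw_hazards(program: list[Instr]) -> list[tuple[str, str, str]]:
    """Return (producer, consumer, register) for reads that would see a stale value."""
    hazards = []
    for i, producer in enumerate(program):
        if producer.dest is None:
            continue
        for j in range(i + 1, min(i + MIN_DISTANCE, len(program))):
            consumer = program[j]
            if producer.dest in consumer.sources:
                hazards.append((producer.text, consumer.text, producer.dest))
            if consumer.dest == producer.dest:
                break  # later readers see this newer write instead
    return hazards


if __name__ == "__main__":
    for producer, consumer, reg in raw_hazards(PROGRAM):
        print(f"{consumer} reads {reg} too early (written by {producer})")
```

For this sequence only and and or are flagged. This matches the slide: add is covered by the register file, and sw is far enough away.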
Software solution: no-ops
- No-ops: instructions that do nothing
  - Effectively “stall” the pipeline until data is ready
- Compiler can recognize hazards ahead of time and insert nop instructions

    Cycle  1   2   3   4   5   6   7   8
    sub    IF  ID  EX  MEM WB
    nop        IF  ID  EX  MEM WB
    nop            IF  ID  EX  MEM WB
    and                IF  ID  EX  MEM WB

- Result written to reg file in cycle 5, the same cycle in which and reads it in ID


No-op example
- Given the following code, where are no-ops needed?

    add $t2, $t3, $t4
    sub $t5, $t1, $t2
    or  $t6, $t2, $t7
    slt $t8, $t9, $t5


Solution
- Given the following code, where are no-ops needed?

    add $t2, $t3, $t4    # $t2 used by sub, or
    nop
    nop
    sub $t5, $t1, $t2    # $t5 used by slt
    or  $t6, $t2, $t7
    nop                  # could also be before or
    slt $t8, $t9, $t5
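A compiler pass that inserts no-ops can be sketched greedily. Before emitting each instruction, keep adding nop until neither of the two previously emitted instructions writes a register it reads. This Python sketch makes the same assumptions as the slides: no forwarding, and a register file written in the first half of a cycle and read in the second. The names are hypothetical. The pass places the nop immediately before the consumer, which is one of the two valid positions noted in the solution.

```python
from __future__ import annotations

from typing import NamedTuple, Optional


class Instr(NamedTuple):
    text: str
    dest: Optional[str]       # register written, or None
    sources: tuple[str, ...]  # registers read


NOP = Instr("nop", None, ())
MIN_DISTANCE = 3  # producer-to-consumer distance that needs no stall


def insert_nops(program: list[Instr]) -> list[Instr]:
    out: list[Instr] = []
    for instr in program:
        # Emit nops while an instruction in the last MIN_DISTANCE - 1 slots
        # writes a register that instr reads.
        while any(p.dest is not None and p.dest in instr.sources
                  for p in out[-(MIN_DISTANCE - 1):]):
            out.append(NOP)
        out.append(instr)
    return out


PROGRAM = [
    Instr("add $t2, $t3, $t4", "$t2", ("$t3", "$t4")),
    Instr("sub $t5, $t1, $t2", "$t5", ("$t1", "$t2")),
    Instr("or $t6, $t2, $t7", "$t6", ("$t2", "$t7")),
    Instr("slt $t8, $t9, $t5", "$t8", ("$t9", "$t5")),
]

if __name__ == "__main__":
    for instr in insert_nops(PROGRAM):
        print(instr.text)
```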


Avoiding stalls
- Inserting no-ops is rarely the best solution
  - Complicates the compiler
  - Reduces performance
- Can we solve the problem in hardware? (Hint: when do we know the value of $2?)


Dependencies & Forwarding
- The value (e.g., $2 from sub) is computed at the end of the EX stage
- Use pipeline registers to forward it
  - Add additional paths to the ALU inputs from EX/MEM and MEM/WB
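The selection logic for one ALU operand can be sketched in Python. The conditions below follow the usual textbook formulation. Forward from EX/MEM if that instruction writes the source register, otherwise from MEM/WB, and never forward $0. These conditions are an assumption here, because this slide does not spell them out. `Latch` and `forward_select` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Latch:
    reg_write: bool  # does the instruction in this pipeline register write a register?
    rd: int          # its destination register number


def forward_select(src_reg: int, ex_mem: Latch, mem_wb: Latch) -> str:
    """Pick the source for one ALU operand of the instruction now in EX.

    EX/MEM is checked first because it holds the more recent result.
    Register $0 is never forwarded, since it always reads as zero.
    """
    if ex_mem.reg_write and ex_mem.rd != 0 and ex_mem.rd == src_reg:
        return "EX/MEM"
    if mem_wb.reg_write and mem_wb.rd != 0 and mem_wb.rd == src_reg:
        return "MEM/WB"
    return "ID/EX"  # no hazard: use the value read from the register file


if __name__ == "__main__":
    # and $12, $2, $5 in EX while sub $2, $1, $3 is in MEM
    print(forward_select(2, ex_mem=Latch(True, 2), mem_wb=Latch(False, 0)))
    # or $13, $6, $2 in EX: and is in MEM, sub is in WB
    print(forward_select(2, ex_mem=Latch(True, 12), mem_wb=Latch(True, 2)))
```

With these two paths, the and and or instructions from the data-hazard example need no stalls.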


Load-Use Data Hazard
[Figure: a load followed by an instruction that uses the loaded value; even with forwarding, need to stall for one cycle]

Chapter 4 — The
Processor — 31
How to Stall the Pipeline
- Force control values in the ID/EX register to 0
  - EX, MEM and WB do nop (no-operation)
- Prevent update of the PC and IF/ID register
  - Using instruction is decoded again
  - Following instruction is fetched again
  - 1-cycle stall allows MEM to read data for lw
    - Can subsequently forward to EX stage
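The check that triggers this stall can be sketched in Python. It assumes the usual hazard-detection condition: the instruction in EX is a load, and its destination register rt matches either source field of the instruction in ID. This slide implies that condition but does not write it out. The names and the lw/and pair in the example are hypothetical illustrations. Comparing both rs and rt is conservative, because some I-type instructions do not actually read rt.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class IDEXState:
    mem_read: bool  # True if the instruction now in EX is a load
    rt: int         # a load writes its rt register


@dataclass
class IFIDState:
    rs: int         # source fields of the instruction now in ID
    rt: int


def hazard_unit(id_ex: IDEXState, if_id: IFIDState) -> dict[str, bool]:
    """Return the stall actions listed on the slide for the current cycle."""
    stall = id_ex.mem_read and id_ex.rt in (if_id.rs, if_id.rt)
    return {
        "zero_id_ex_controls": stall,  # EX, MEM and WB then do a nop
        "pc_write": not stall,         # hold PC: following instruction fetched again
        "if_id_write": not stall,      # hold IF/ID: using instruction decoded again
    }


if __name__ == "__main__":
    # lw $2, 20($1) in EX, and $4, $2, $5 in ID: one-cycle stall
    print(hazard_unit(IDEXState(mem_read=True, rt=2), IFIDState(rs=2, rt=5)))
```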

Stall/Bubble in the Pipeline
[Figure: pipeline diagram annotated “Stall inserted here”]

Stall/Bubble in the Pipeline
[Figure: alternate view of the stall, captioned “Or, more accurately…”]
Datapath with Hazard Detection

Code Scheduling to Avoid Stalls
- Reorder code to avoid use of a load result in the next instruction
- C code for A = B + E; C = B + F;

    # Original order: 13 cycles
    lw  $t1, 0($t0)
    lw  $t2, 4($t0)
    add $t3, $t1, $t2     # stall: uses $t2 from the preceding lw
    sw  $t3, 12($t0)
    lw  $t4, 8($t0)
    add $t5, $t1, $t4     # stall: uses $t4 from the preceding lw
    sw  $t5, 16($t0)

    # Reordered: 11 cycles
    lw  $t1, 0($t0)
    lw  $t2, 4($t0)
    lw  $t4, 8($t0)
    add $t3, $t1, $t2
    sw  $t3, 12($t0)
    add $t5, $t1, $t4
    sw  $t5, 16($t0)
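The 13- and 11-cycle totals can be reproduced with a short Python sketch. It assumes full forwarding, so the only stall is one cycle whenever a lw is immediately followed by an instruction that reads the loaded register. That stall count is added to the ideal N + (M - 1) cycle count for a 5-stage pipeline. `Instr` and `count_cycles` are hypothetical names.

```python
from __future__ import annotations

from typing import NamedTuple, Optional


class Instr(NamedTuple):
    op: str
    dest: Optional[str]       # register written, or None for sw
    sources: tuple[str, ...]  # registers read


def count_cycles(program: list[Instr], stages: int = 5) -> int:
    """Ideal cycle count plus one stall per load-use pair in adjacent slots."""
    stalls = sum(
        1
        for prev, cur in zip(program, program[1:])
        if prev.op == "lw" and prev.dest in cur.sources
    )
    return stages + (len(program) - 1) + stalls


LW_T1 = Instr("lw", "$t1", ("$t0",))           # lw  $t1, 0($t0)
LW_T2 = Instr("lw", "$t2", ("$t0",))           # lw  $t2, 4($t0)
LW_T4 = Instr("lw", "$t4", ("$t0",))           # lw  $t4, 8($t0)
ADD_T3 = Instr("add", "$t3", ("$t1", "$t2"))   # add $t3, $t1, $t2
SW_T3 = Instr("sw", None, ("$t3", "$t0"))      # sw  $t3, 12($t0)
ADD_T5 = Instr("add", "$t5", ("$t1", "$t4"))   # add $t5, $t1, $t4
SW_T5 = Instr("sw", None, ("$t5", "$t0"))      # sw  $t5, 16($t0)

ORIGINAL = [LW_T1, LW_T2, ADD_T3, SW_T3, LW_T4, ADD_T5, SW_T5]
REORDERED = [LW_T1, LW_T2, LW_T4, ADD_T3, SW_T3, ADD_T5, SW_T5]

if __name__ == "__main__":
    print(count_cycles(ORIGINAL))
    print(count_cycles(REORDERED))
```

Both versions contain 7 instructions (5 + 6 = 11 cycles). The original order adds two load-use stalls.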


Control Hazards
- Branch determines flow of control
  - Fetching the next instruction depends on the branch outcome
  - Pipeline can’t always fetch the correct instruction
    - Still working on ID stage of branch
- In MIPS pipeline
  - Need to compare registers and compute target early in the pipeline
  - Add hardware to do it in ID stage


Stall on Branch
- Wait until the branch outcome is determined before fetching the next instruction


Branch Prediction
- Longer pipelines can’t readily determine branch outcome early
  - Stall penalty becomes unacceptable
- Predict outcome of branch
  - Only stall if prediction is wrong
- In MIPS pipeline
  - Can predict branches not taken
  - Fetch instruction after branch, with no delay

MIPS with Predict Not Taken
[Figure: pipeline timing when the prediction is correct and when the prediction is incorrect]


More-Realistic Branch Prediction
- Static branch prediction
  - Based on typical branch behavior
  - Example: loop and if-statement branches
    - Predict backward branches taken
    - Predict forward branches not taken
- Dynamic branch prediction
  - Hardware measures actual branch behavior
    - e.g., record recent history of each branch
  - Assume future behavior will continue the trend
    - When wrong, stall while re-fetching, and update history
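Both strategies can be sketched in Python. The static rule is the backward-taken, forward-not-taken heuristic from the slide. For the dynamic case, the sketch assumes the simplest possible history: one bit per branch recording its last outcome. Real predictors often keep more history. The class and function names, the branch addresses and the outcome pattern are all hypothetical.

```python
from __future__ import annotations


def static_predict_taken(branch_pc: int, target_pc: int) -> bool:
    """Backward branches (loops) predicted taken; forward branches not taken."""
    return target_pc < branch_pc


class LastOutcomePredictor:
    """Dynamic prediction: remember each branch's most recent outcome."""

    def __init__(self, default_taken: bool = False) -> None:
        self.history: dict[int, bool] = {}  # branch address -> last outcome
        self.default_taken = default_taken

    def predict(self, branch_pc: int) -> bool:
        return self.history.get(branch_pc, self.default_taken)

    def update(self, branch_pc: int, taken: bool) -> None:
        self.history[branch_pc] = taken


def dynamic_mispredictions(pred: LastOutcomePredictor, branch_pc: int,
                           outcomes: list[bool]) -> int:
    misses = 0
    for taken in outcomes:
        if pred.predict(branch_pc) != taken:
            misses += 1                 # wrong: stall while re-fetching
        pred.update(branch_pc, taken)   # update history either way
    return misses


if __name__ == "__main__":
    # A backward loop branch taken 9 times then not taken; the loop runs twice.
    outcomes = ([True] * 9 + [False]) * 2
    static_misses = sum(static_predict_taken(0x40, 0x20) != t for t in outcomes)
    print(static_misses)
    print(dynamic_mispredictions(LastOutcomePredictor(), 0x40, outcomes))
```

For this loop pattern, the one-bit scheme mispredicts on both entry and exit of each loop execution. That weakness is one reason real dynamic predictors usually record more than a single outcome.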


Final notes
- Next time:
  - Instruction scheduling issues
  - Dynamic branch prediction
  - Dynamic scheduling
  - Multiple issue
  - Midterm exam preview
- Announcements/reminders
  - HW 4 to be posted; due 10/16
  - Lecture next week on Wednesday, 10/16
