
Lecture17 PDF

The document discusses pipelining in microprocessors. It describes how an instruction can be broken down into stages - fetch, decode, execute, memory, and writeback. Pipelining allows overlapping the execution of multiple instructions by having different instructions in different stages at the same time. This improves performance by completing one instruction per clock cycle on average, compared to multiple cycles for each instruction in a single-cycle processor. The document outlines the operations that occur in each of the pipeline stages.

Uploaded by

Timothy Eng
Copyright
© © All Rights Reserved

ECE 2300

Digital Logic & Computer Organization


Spring 2018

Pipelined Microprocessor

Lecture 17: 1
Announcements
• Prelab 5(b) deadline is on Friday

• Prelim 2: Tues April 17, 7:30-9:00pm, PHL 101


– Coverage: Lectures 8~16
• FSMs, timing analysis, binary arithmetic, memories, single-
cycle microprocessor
• Supplementary notes on timing analysis posted on course
web
– Closed book, closed notes, closed internet
– A sample exam is posted on CMS
– Instructor OH moved from Thursday 4/12 to
Monday 4/16, 4:00-5:30pm (one-time change)

Lecture 17: 2
Programmable Single-Cycle Processor
[Figure: single-cycle processor datapath. Recoverable labels: PC, +2 incrementer, Adder with SE(OFF,0), Inst. RAM, Decoder, register file RF (SA, SB, DR, LD, DataA, DataB), ALU (FS, flags V, C, Z, N), data RAM (M_address, Data_in, MW), sign extender SE, and the MB, MD, MP, BS, and IMM fields]

• Instruction RAM holds the program to be run


• Decoder derives control word from the instruction

Lecture 17: 3
[Figure: instructions of the instruction set mapped to control words. Recoverable labels: Instructions, Instruction Set, Control Words, LD]

• Instruction decoder derives the control word from the instruction fields
• Combinational circuit: instruction input, CW output

Lecture 17: 4
Instruction Set Architecture (ISA)
• The ISA describes a set of instructions supported by a
family of machines

• The ISA specification tells hardware and software (compiler and operating system) developers
– Instruction formats
– Operation of each instruction
– Ways to form memory addresses
– Data formats
– Many other details

• Examples: x86, ARM, MIPS, POWER, SPARC, RISC-V

Lecture 17: 5
Steps in Instruction Execution
• Instruction Fetch (IF)
– Fetch instruction; Update PC
• Instruction Decode (ID)
– Decode instruction; Read register file
• Execute (EX)
– Perform ALU operation
• Memory (MEM)
– Perform memory operation
• Write Back (WB)
– Put result into register file

• In a single-cycle processor, the clock period is limited by the longest path
– If each step takes 1 ns, the clock period is 5 ns

Lecture 17: 6
Pipelining: Basic Idea

IF ID EX MEM WB

IF/ID ID/EX EX/MEM MEM/WB

• Overlap instruction execution by performing each step in successive clock cycles

Lecture 17: 7
Pipelining: Overlapped Instructions
• Single-cycle execution
CC1 CC2

IF-ID-EX-MEM-WB IF-ID-EX-MEM-WB

• Pipelined execution
               CC1  CC2  CC3  CC4  CC5  CC6  CC7  CC8  CC9
Instruction 1  IF   ID   ALU  MEM  WB
Instruction 2       IF   ID   ALU  MEM  WB
Instruction 3            IF   ID   ALU  MEM  WB
Instruction 4                 IF   ID   ALU  MEM  WB
Instruction 5                      IF   ID   ALU  MEM  WB
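
The staggered chart above can be generated programmatically. The following is a minimal Python sketch, assuming an ideal pipeline with no stalls; the function names `pipeline_diagram` and `format_diagram` are illustrative and not part of the lecture.

```python
# Stage names as used in the chart on this slide.
STAGES = ["IF", "ID", "ALU", "MEM", "WB"]


def pipeline_diagram(instructions, stages=STAGES):
    """Return one (name, cells) row per instruction.

    Instruction i (0-based) enters the first stage in clock cycle i + 1,
    so its row is shifted right by i columns. Assumes no stalls.
    """
    total_cycles = len(instructions) + len(stages) - 1
    rows = []
    for i, name in enumerate(instructions):
        cells = [""] * total_cycles
        for s, stage in enumerate(stages):
            cells[i + s] = stage
        rows.append((name, cells))
    return rows


def format_diagram(rows):
    """Render the rows as fixed-width text with a CC1..CCn header."""
    width = max(len(name) for name, _ in rows) + 2
    n_cycles = len(rows[0][1])
    header = " " * width + "".join(f"CC{c + 1}".ljust(5) for c in range(n_cycles))
    lines = [header.rstrip()]
    for name, cells in rows:
        line = name.ljust(width) + "".join(cell.ljust(5) for cell in cells)
        lines.append(line.rstrip())
    return "\n".join(lines)


if __name__ == "__main__":
    names = [f"Instruction {k}" for k in range(1, 6)]
    print(format_diagram(pipeline_diagram(names)))
```

Five instructions through five stages occupy 5 + 5 - 1 = 9 clock cycles, which is why the chart ends at CC9.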

Lecture 17: 8
Pipelining: Performance
• Faster clock frequency than the single-cycle processor
• Each instruction still takes 5 cycles from fetch to write back
• Average number of cycles per instruction (CPI) approaches 1
• ~1 instruction completed every cycle (ideally)

               CC1  CC2  CC3  CC4  CC5  CC6  CC7  CC8  CC9
Instruction 1  IF   ID   ALU  MEM  WB
Instruction 2       IF   ID   ALU  MEM  WB
Instruction 3            IF   ID   ALU  MEM  WB
Instruction 4                 IF   ID   ALU  MEM  WB
Instruction 5                      IF   ID   ALU  MEM  WB
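
To make the performance claim concrete, here is a rough Python sketch comparing total execution time for the two designs. It assumes the 1 ns per step figure from the earlier slide, ignores pipeline-register overhead and hazards, and uses illustrative function names that do not come from the lecture.

```python
def single_cycle_time_ns(n_instructions, stage_delays_ns):
    """One long clock cycle per instruction: period = sum of all step delays."""
    return n_instructions * sum(stage_delays_ns)


def pipelined_time_ns(n_instructions, stage_delays_ns):
    """Clock period is set by the slowest stage.

    The first instruction needs one cycle per stage; each later
    instruction completes one cycle after its predecessor (ideal case).
    """
    period = max(stage_delays_ns)
    cycles = len(stage_delays_ns) + n_instructions - 1
    return cycles * period


if __name__ == "__main__":
    delays = [1, 1, 1, 1, 1]  # IF, ID, EX, MEM, WB at 1 ns each (assumed)
    for n in (5, 1000):
        t_single = single_cycle_time_ns(n, delays)
        t_pipe = pipelined_time_ns(n, delays)
        cpi = (len(delays) + n - 1) / n
        print(f"n={n}: single-cycle {t_single} ns, pipelined {t_pipe} ns, CPI {cpi:.3f}")
```

For 5 instructions the pipelined version needs 9 cycles of 1 ns instead of 5 cycles of 5 ns; for long programs the fill time is amortized and the CPI approaches 1.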

Lecture 17: 9
Instruction Fetch Stage

[Figure: instruction fetch stage. Recoverable labels: PC, +2 incrementer, MUX, Inst RAM, PCJ, PCL, IF/ID pipeline register]

• Fetch the instruction into IF/ID (based on current PC)


• Load PC+2 into PC
• Place PC+2 into IF/ID

Lecture 17: 10
Instruction Decode Stage
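[Figure: datapath through the instruction decode stage. Labels added to the fetch stage: CU (control signals), Decoder, register file RF (SA, SB, DR, LD, D_in), sign extender SE, Adder, VCZN, ID/EX pipeline register]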

• Read source operands from RF into ID/EX


• Place SE(IMM) into ID/EX
• Place SE(IMM)+(PC+2) into ID/EX
• Place DR into ID/EX
Lecture 17: 11
Execute Stage
[Figure: datapath through the execute stage. Labels added to the decode stage: ALU (Fm … F0, VCZN), MUX selected by MB, EX/MEM pipeline register]

• Perform ALU operation and place result into EX/MEM


• Pass DataB from the RF to EX/MEM
• Pass DR to EX/MEM
• If the branch is taken, update the PC (is it too late?)
Lecture 17: 12
Memory Stage
[Figure: datapath through the memory stage. Labels added to the execute stage: Data RAM (D_IN, MW), MUX selected by MD, MEM/WB pipeline register]

• Store: Write DataB into RAM


• Load: Read data from RAM into MEM/WB
• ALU operation: pass ALU result from EX/MEM to MEM/WB
• Pass DR to MEM/WB
Lecture 17: 13
Writeback Stage
[Figure: datapath through the writeback stage. Same labels as the memory stage figure: the result from MEM/WB returns to the register file (RF D_in, DR, LD)]

• Load or ALU operation: Write register file

Lecture 17: 14
Pipelined Microprocessor
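[Figure: complete pipelined microprocessor datapath. Labels: PC, +2, MUX (PCJ, PCL), Inst RAM, Decoder, CU (control signals), RF (SA, SB, DR, LD, D_in), SE, Adder, ALU (Fm … F0, VCZN), MUX MB, Data RAM (D_IN, MW), MUX MD, and the IF/ID, ID/EX, EX/MEM, MEM/WB pipeline registers]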

Lecture 17: 15
Abstract Representation

Stages:              IM  |  Reg  |  ALU  |  DM  |  Reg
Pipeline registers:    IF/ID   ID/EX   EX/MEM  MEM/WB

Lecture 17: 16
Example Instruction Sequence

Program:
ADD R1,R2,R3
OR R4,R4,R3
SUB R5,R2,R3
AND R6,R6,R2
ADDI R7,R7,3

Stages: IM | Reg | ALU | DM | Reg (WB)

Lecture 17: 17
Example Instruction Sequence

Program:
ADD R1,R2,R3
OR R4,R4,R3
SUB R5,R2,R3
AND R6,R6,R2
ADDI R7,R7,3

Stages: IM | Reg | ALU | DM | Reg (WB)
IM: ADD R1,R2,R3

Lecture 17: 18
Example Instruction Sequence

Program:
ADD R1,R2,R3
OR R4,R4,R3
SUB R5,R2,R3
AND R6,R6,R2
ADDI R7,R7,3

Stages: IM | Reg | ALU | DM | Reg (WB)
IM: OR R4,R4,R3 | Reg: ADD R1,R2,R3

Lecture 17: 19
Example Instruction Sequence

Program:
ADD R1,R2,R3
OR R4,R4,R3
SUB R5,R2,R3
AND R6,R6,R2
ADDI R7,R7,3

Stages: IM | Reg | ALU | DM | Reg (WB)
IM: SUB R5,R2,R3 | Reg: OR R4,R4,R3 | ALU: ADD R1,R2,R3

Lecture 17: 20
Example Instruction Sequence

Program:
ADD R1,R2,R3
OR R4,R4,R3
SUB R5,R2,R3
AND R6,R6,R2
ADDI R7,R7,3

Stages: IM | Reg | ALU | DM | Reg (WB)
IM: AND R6,R6,R2 | Reg: SUB R5,R2,R3 | ALU: OR R4,R4,R3 | DM: ADD R1,R2,R3

Lecture 17: 21
Example Instruction Sequence

Program:
ADD R1,R2,R3
OR R4,R4,R3
SUB R5,R2,R3
AND R6,R6,R2
ADDI R7,R7,3

Stages: IM | Reg | ALU | DM | Reg (WB)
IM: ADDI R7,R7,3 | Reg: AND R6,R6,R2 | ALU: SUB R5,R2,R3 | DM: OR R4,R4,R3 | Reg (WB): ADD R1,R2,R3

Lecture 17: 22
Example Instruction Sequence

Program:
ADD R1,R2,R3
OR R4,R4,R3
SUB R5,R2,R3
AND R6,R6,R2
ADDI R7,R7,3

Stages: IM | Reg | ALU | DM | Reg (WB)
Reg: ADDI R7,R7,3 | ALU: AND R6,R6,R2 | DM: SUB R5,R2,R3 | Reg (WB): OR R4,R4,R3

Lecture 17: 23
Example Instruction Sequence

Program:
ADD R1,R2,R3
OR R4,R4,R3
SUB R5,R2,R3
AND R6,R6,R2
ADDI R7,R7,3

Stages: IM | Reg | ALU | DM | Reg (WB)
ALU: ADDI R7,R7,3 | DM: AND R6,R6,R2 | Reg (WB): SUB R5,R2,R3
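
The cycle-by-cycle snapshots on the preceding slides follow a simple rule: with no stalls, the instruction at program position i (0-based) is in stage s (0-based) during clock cycle i + s + 1. A small Python sketch of that rule follows; the function name `occupancy` and the "Reg (WB)" label for the final stage are illustrative choices, not taken from the lecture.

```python
STAGE_NAMES = ["IM", "Reg", "ALU", "DM", "Reg (WB)"]

PROGRAM = [
    "ADD R1,R2,R3",
    "OR R4,R4,R3",
    "SUB R5,R2,R3",
    "AND R6,R6,R2",
    "ADDI R7,R7,3",
]


def occupancy(program, cycle, n_stages=len(STAGE_NAMES)):
    """Return the instruction in each stage during a 1-based clock cycle.

    Empty stages are reported as None. Assumes an ideal pipeline
    (one instruction fetched per cycle, no stalls).
    """
    slots = []
    for s in range(n_stages):
        idx = cycle - 1 - s  # program index of the instruction now in stage s
        slots.append(program[idx] if 0 <= idx < len(program) else None)
    return slots


if __name__ == "__main__":
    for cc in range(1, len(PROGRAM) + len(STAGE_NAMES)):
        filled = [
            f"{stage}: {inst}"
            for stage, inst in zip(STAGE_NAMES, occupancy(PROGRAM, cc))
            if inst is not None
        ]
        print(f"CC{cc}  " + " | ".join(filled))
```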

Lecture 17: 24
Example Instruction Sequence
              CC1  CC2  CC3  CC4  CC5  CC6  CC7  CC8  CC9
ADD R1,R2,R3  IM   Reg  ALU  DM   Reg
OR R4,R4,R3        IM   Reg  ALU  DM   Reg
SUB R5,R2,R3            IM   Reg  ALU  DM   Reg
AND R6,R6,R2                 IM   Reg  ALU  DM   Reg
ADDI R7,R7,3                      IM   Reg  ALU  DM   Reg

Lecture 17: 25
What About This Sequence?
              CC1  CC2  CC3  CC4  CC5  CC6  CC7  CC8  CC9
ADD R1,R2,R3  IM   Reg  ALU  DM   Reg
OR R4,R1,R3        IM   Reg  ALU  DM   Reg
SUB R5,R2,R1            IM   Reg  ALU  DM   Reg
AND R6,R1,R2                 IM   Reg  ALU  DM   Reg
ADDI R7,R7,3                      IM   Reg  ALU  DM   Reg

The OR, SUB, and AND instructions are data dependent on the ADD instruction
Lecture 17: 26
Data Hazard
• Occurs when an instruction reads a register before an earlier instruction has written its new value back to that register

ADD R1,R2,R3  IF   ID   ALU  MEM  WB
OR R4,R1,R3        IF   ID   ALU  MEM  WB
SUB R5,R2,R1            IF   ID   ALU  MEM  WB
AND R6,R1,R2                 IF   ID   ALU  MEM  WB

• What should happen


– The 1st instruction calculates a new value for R1
– The 2nd, 3rd, and 4th instructions use this new value

• What actually happens


– The 2nd, 3rd, and 4th instructions read the old value of R1
– The first instruction then writes the new value into R1
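
The hazard condition can be checked mechanically. The sketch below is one possible formulation in Python, assuming (consistent with this lecture's diagrams) that a register read sees the new value only if it happens in a cycle after the producer's write back, so any consumer within three instructions of the producer is affected. The tuple encoding of instructions and the name `find_data_hazards` are hypothetical.

```python
# Each instruction: (text, destination register or None, list of source registers).
PROGRAM = [
    ("ADD R1,R2,R3", "R1", ["R2", "R3"]),
    ("OR R4,R1,R3", "R4", ["R1", "R3"]),
    ("SUB R5,R2,R1", "R5", ["R2", "R1"]),
    ("AND R6,R1,R2", "R6", ["R1", "R2"]),
    ("ADDI R7,R7,3", "R7", ["R7"]),
]


def find_data_hazards(program, window=3):
    """Return (consumer_index, register, producer_index) triples.

    A hazard is reported when the nearest earlier writer of a source
    register is at most `window` instructions before the reader.
    window=3 is an assumption matching a 5-stage pipeline without
    forwarding, where ID reads must follow the producer's WB cycle.
    """
    hazards = []
    for i, (_, _, sources) in enumerate(program):
        for src in dict.fromkeys(sources):  # de-duplicate, keep order
            for j in range(i - 1, max(i - window, 0) - 1, -1):
                if program[j][1] == src:
                    hazards.append((i, src, j))
                    break
    return hazards


if __name__ == "__main__":
    for consumer, reg, producer in find_data_hazards(PROGRAM):
        print(f"{PROGRAM[consumer][0]} reads {reg} too early "
              f"(written by {PROGRAM[producer][0]})")
```

For the sequence on this slide the sketch reports the OR, SUB, and AND instructions, all on R1, and nothing for ADDI.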

Lecture 17: 27
Solution 1: SW (compiler) Inserts NOPs
              CC1  CC2  CC3  CC4  CC5  CC6  CC7  CC8  CC9
ADD R1,R2,R3  IM   Reg  ALU  DM   Reg
NOP                IM   Reg  ALU  DM   Reg
NOP                     IM   Reg  ALU  DM   Reg
NOP                          IM   Reg  ALU  DM   Reg
OR R4,R1,R3                       IM   Reg  ALU  DM   Reg
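
A compiler pass for this solution could look roughly like the following Python sketch. It is an illustration under the same assumption as the diagram (a consumer must be at least four instruction slots after its producer, i.e. three NOPs between adjacent dependent instructions); the instruction encoding and the name `insert_nops` are hypothetical.

```python
NOP = ("NOP", None, [])


def insert_nops(program, window=3):
    """Return a new instruction list with NOPs inserted to avoid data hazards.

    Each instruction is (text, destination or None, source registers).
    A producer found `back` slots before the consumer (back <= window)
    requires window - back + 1 NOPs so the distance becomes window + 1.
    """
    out = []
    for text, dest, sources in program:
        needed = 0
        recent = reversed(out[-window:])  # nearest earlier instruction first
        for back, (_, prev_dest, _) in enumerate(recent, start=1):
            if prev_dest is not None and prev_dest in sources:
                needed = max(needed, window - back + 1)
        out.extend([NOP] * needed)
        out.append((text, dest, sources))
    return out


if __name__ == "__main__":
    program = [
        ("ADD R1,R2,R3", "R1", ["R2", "R3"]),
        ("OR R4,R1,R3", "R4", ["R1", "R3"]),
        ("SUB R5,R2,R1", "R5", ["R2", "R1"]),
    ]
    for text, _, _ in insert_nops(program):
        print(text)
```

Once the three NOPs separate ADD from OR, the later SUB is already far enough from ADD and needs no additional padding.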

Lecture 17: 28
Solution 2: HW Stalls the Pipeline
              CC1     CC2     CC3     CC4     CC5     CC6     CC7     CC8     CC9
ADD R1,R2,R3  IM      Reg     ALU     DM      Reg
OR R4,R1,R3           IM      bubble  bubble  bubble  Reg     ALU     DM      Reg
SUB R5,R2,R1                                          IM      Reg     ALU     DM
AND R6,R1,R2                                                  IM      Reg     ALU
ADDI R7,R7,3                                                          IM      Reg

The pipeline is stalled for three cycles
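
The stall behavior can be approximated with a short scheduling sketch in Python. It assumes in-order issue, no forwarding, and that a register read must occur in a cycle after the producer's write back, as in the diagram above; the name `schedule_with_stalls` and the instruction encoding are illustrative only.

```python
def schedule_with_stalls(program):
    """Compute register-read (Reg) and write-back (WB) cycles with stalls.

    Each instruction is (text, destination or None, source registers).
    Returns a list of (text, reg_cycle, wb_cycle, stall_cycles).
    The first instruction is fetched in CC1 and reads registers in CC2.
    """
    result = []
    last_write = {}  # register -> WB cycle of its most recent producer
    prev_reg_cycle = 1
    for text, dest, sources in program:
        ideal = prev_reg_cycle + 1  # one cycle behind the previous instruction
        reg_cycle = ideal
        for src in sources:
            if src in last_write:
                reg_cycle = max(reg_cycle, last_write[src] + 1)
        wb_cycle = reg_cycle + 3  # Reg -> ALU -> DM -> Reg (WB)
        if dest is not None:
            last_write[dest] = wb_cycle
        result.append((text, reg_cycle, wb_cycle, reg_cycle - ideal))
        prev_reg_cycle = reg_cycle
    return result


if __name__ == "__main__":
    program = [
        ("ADD R1,R2,R3", "R1", ["R2", "R3"]),
        ("OR R4,R1,R3", "R4", ["R1", "R3"]),
        ("SUB R5,R2,R1", "R5", ["R2", "R1"]),
        ("AND R6,R1,R2", "R6", ["R1", "R2"]),
        ("ADDI R7,R7,3", "R7", ["R7"]),
    ]
    for text, reg, wb, stalls in schedule_with_stalls(program):
        print(f"{text:<14} Reg in CC{reg}, WB in CC{wb}, stalled {stalls} cycle(s)")
```

Under these assumptions only the OR instruction stalls, for three cycles; the instructions behind it are delayed by the same amount but need no further stalls.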

Lecture 17: 29
Example: Data Hazards
• Identify all data hazards in the following instruction sequence by circling each source register that is read before the updated value is written back
ADD R1, R2, R3
NOP
ADDI R2, R1, 1
SUB R3, R1, R2
SUB R4, R3, R1

Lecture 17: 30
Before Next Class
• H&H 7.5.3-7.5.5

Next Time

More Pipelined Microprocessor

Lecture 17: 31
