A4 Solution

This document summarizes the solution to an assignment on pipelined processors. It identifies data dependencies and hazards in sample code, calculates the CPI for a program with dependencies under different forwarding conditions, analyzes performance impacts of bubbles in a 10-stage pipeline, discusses how forwarding can avoid stalls for store-after-load dependencies, and lists predictions and accuracies for different branch prediction schemes given sample branch outcomes.

Uploaded by

Assam Ahmed


COE 308 – Computer Architecture

Assignment 6: Pipelined Processor

Solution
1. (4 pts) Identify all the RAW data dependencies in the following code. Which dependencies are
data hazards that will be resolved by forwarding? Which dependencies are data hazards that will
cause a stall? Using a graphical representation of the pipeline, show the forwarding paths and
stalled cycles if any.
add $3, $4, $2
sub $5, $3, $1
lw $6, 200($3)
add $7, $3, $6

Solution:

RAW dependencies:

add $3, $4, $2 and sub $5, $3, $1 (forwarding)

add $3, $4, $2 and lw $6, 200($3) (forwarding)
lw $6, 200($3) and add $7, $3, $6 (stall 1, forward)
add $3, $4, $2 and add $7, $3, $6 (from register)

Instruction       CC1     CC2     CC3     CC4     CC5     CC6     CC7     CC8     CC9
add $3, $4, $2    IM      Reg     ALU     DM      Reg
sub $5, $3, $1            IM      Reg     ALU     DM      Reg
lw $6, 200($3)                    IM      Reg     ALU     DM      Reg
add $7, $3, $6                            IM      bubble  Reg     ALU     DM      Reg
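
The dependency list above can be reproduced mechanically. The following is a minimal Python sketch, not part of the original solution: the tuple encoding of each instruction and the helper names `raw_dependencies` and `classify` are assumptions made for illustration. The classification rule assumes the 5-stage pipeline used here, where the register file is written and read in the same cycle, so a consumer three or more instructions after its producer reads the value from the register file.

```python
# Each instruction: (text, destination register, list of source registers).
# This encoding is a hypothetical convenience, not a real assembler format.
program = [
    ("add $3, $4, $2", "$3", ["$4", "$2"]),
    ("sub $5, $3, $1", "$5", ["$3", "$1"]),
    ("lw $6, 200($3)", "$6", ["$3"]),
    ("add $7, $3, $6", "$7", ["$3", "$6"]),
]


def raw_dependencies(prog):
    """Return (producer_index, consumer_index, register) triples."""
    deps = []
    for j, (_, _, sources) in enumerate(prog):
        for reg in sources:
            # Walk backwards to find the most recent earlier writer of reg.
            for i in range(j - 1, -1, -1):
                if prog[i][1] == reg:
                    deps.append((i, j, reg))
                    break
    return deps


def classify(prog, i, j):
    """Classify a RAW dependency from instruction i to instruction j."""
    distance = j - i
    producer_is_load = prog[i][0].startswith("lw")
    if producer_is_load and distance == 1:
        return "stall 1, forward"   # load-use: value is available only after MEM
    if distance <= 2:
        return "forwarding"         # value is still in a pipeline register
    return "from register"          # value is already written (or being written)


for i, j, reg in raw_dependencies(program):
    print(f"{program[i][0]} -> {program[j][0]} on {reg}: {classify(program, i, j)}")
```

Under these assumptions the sketch reports the same four dependencies and the same labels as the list in the solution.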

2. (4 pts) We have a program of 10^6 instructions in the format of “lw,add,lw,add,…”. The
add instruction depends only on the lw instruction right before it. The lw instruction also
depends only on the add instruction right before it. If this program is executed on the 5-stage
MIPS pipeline:
a) Without forwarding, what would be the actual CPI?
b) With forwarding, what would be the actual CPI?

Prepared by Dr. Muhamed Mudawar Page 1 of 4

Solution:

a) Without forwarding, a dependent instruction can read a register value no earlier than
the cycle in which the producing instruction writes it (its WB stage). As a result, there
will be a bubble of 2 cycles between a LW and the dependent ADD, so that the ADD reads
its registers in the cycle in which the LW is in WB. Similarly, there will be a bubble
of 2 cycles between an ADD and the dependent LW.

Therefore, it takes 6 cycles on average to complete one LW and one ADD.

1 cycle (to complete LW) + 2 cycles (bubbles) + 1 cycle (to complete ADD) + 2 cycles
(bubbles) = 6 cycles

So, it takes 6 cycles to complete 2 instructions

Average CPI = 6/2 = 3.

b) With forwarding, there will be a bubble of 1 cycle between a LW and the dependent
ADD. However, no bubble exists between an ADD and the dependent LW.

Therefore, it takes only 3 cycles on average to complete one LW and one ADD.
1 cycle (to complete LW) + 1 cycle (bubble) + 1 cycle (to complete ADD) = 3 cycles

So, it takes 3 cycles to complete 2 instructions

Average CPI = 3/2 = 1.5.
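
The two averages can be checked by counting cycles over a long lw/add sequence. This is a rough Python sketch under stated assumptions: the function name `average_cpi` and its parameters are hypothetical, the pipeline fill time at start-up is ignored (as in the solution), and the instruction count is illustrative, since any long sequence of even length gives essentially the same averages.

```python
def average_cpi(n_instructions, load_use_bubbles, alu_use_bubbles):
    """Average CPI for the pattern lw, add, lw, add, ...

    Every instruction depends on the one right before it. Pipeline fill
    cycles are ignored, so this is a steady-state estimate.
    """
    cycles = 0
    for k in range(n_instructions):
        cycles += 1                          # one cycle to complete the instruction
        has_dependent = k + 1 < n_instructions
        if has_dependent:
            is_load = (k % 2 == 0)           # even positions are lw, odd are add
            cycles += load_use_bubbles if is_load else alu_use_bubbles
    return cycles / n_instructions


n = 10**6
print(round(average_cpi(n, load_use_bubbles=2, alu_use_bubbles=2), 3))  # without forwarding
print(average_cpi(n, load_use_bubbles=1, alu_use_bubbles=0))            # with forwarding
```

The bubble counts passed in are the ones derived in parts (a) and (b); the sketch only confirms the arithmetic, it does not model the pipeline stages themselves.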

3. (4 pts) A 10-stage instruction pipeline runs at a clock rate of 1 GHz. The instruction mix is such
that 15% of instructions cause one bubble to be inserted into the pipeline, and 10% of
instructions cause two bubbles to be inserted. The equivalent single-cycle implementation would
lead to a clock rate of 150 MHz.
a) What is the increase in the pipeline CPI over the ideal CPI as a result of bubbles?
b) What is the speedup of pipelined implementation over single-cycle?

Solution:

a) Ideal pipeline CPI = 1 cycle per instruction (if no bubbles)

Increase in CPI due to bubbles = 0.15 * 1 + 0.1 * 2 = 0.35 cycles per instruction

Pipeline CPI with bubbles = 1 + 0.35 = 1.35 (35% increase over ideal CPI)

b) Speedup of pipelined implementation

= (Pipeline Clock Rate / Single-Cycle Clock Rate) * (Single-Cycle CPI / Pipeline CPI)

= (1000 MHz / 150 MHz) * (1 / 1.35) = 4.94
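
The arithmetic for both parts can be written out as a short Python sketch. The function names `pipeline_cpi` and `speedup` are assumptions introduced for illustration; the numbers are the ones given in the problem.

```python
def pipeline_cpi(ideal_cpi, bubble_mix):
    """Ideal CPI plus the average number of bubbles per instruction.

    bubble_mix is a list of (fraction_of_instructions, bubbles_inserted) pairs.
    """
    return ideal_cpi + sum(fraction * bubbles for fraction, bubbles in bubble_mix)


def speedup(pipe_clock_mhz, single_clock_mhz, single_cpi, pipe_cpi):
    """Speedup = ratio of clock rates times the inverse ratio of CPIs."""
    return (pipe_clock_mhz / single_clock_mhz) * (single_cpi / pipe_cpi)


cpi = pipeline_cpi(1.0, [(0.15, 1), (0.10, 2)])
print(round(cpi, 2))                                  # pipeline CPI with bubbles
print(round(speedup(1000, 150, 1.0, cpi), 2))         # speedup over single-cycle
```

Note that the speedup stays below the clock ratio of about 6.67 because the bubbles raise the pipelined CPI above 1.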


4. (4 pts) Store-after-load data dependence. Consider copying an array of n words from one
address in memory to another. This can be accomplished by placing a sequence of lw and sw
instructions in a loop, with each loop iteration copying one word. In the current pipelined
implementation shown in the lecture slides, this leads to one bubble (stall cycle) between lw and
sw. Is it possible to avoid this stalling via additional data forwarding hardware? Discuss how this
can be done or explain how the bubble is unavoidable.

Solution: Yes, forwarding is possible and we can avoid stalling the pipeline. Consider:

LW $8, ... # LW instruction writes $8

SW $8, ... # SW instruction uses $8

We need a multiplexer at the input of EX/MEM.B register as shown below. The data read from the
data memory in the MEM stage should be fed back at the input of this multiplexer. A control
signal “ForwardC” is needed to control the selection of this multiplexer. The Forwarding unit in
the DECODE stage will generate the “ForwardC” signal and pipeline it, after detecting the
dependency between a SW and a previous LW instruction. The SW instruction is currently in the
DECODE stage (MemWrite = 1). The LW instruction is in the EXE stage (ID/EX.MemRead = 1).
The ID/EX.RW register for the LW instruction contains the same register number as Rt for the
SW instruction. The Forwarding Unit can detect this situation and generate the ForwardC signal.

[Figure: pipelined datapath with the IF/ID, ID/EX, EX/MEM, and MEM/WB pipeline registers. Labeled elements: Instruction, Rs, Rt, Rd, Rw, Imm26, Register File, Ext, ALUSrc, ALU, ALU result, A, B, WriteData, Data Memory (Data_in), MemtoReg, multiplexers, Forwarding Unit (ForwardA, ForwardB, ForwardC), Control Unit (Op, MemWrite, MemRead).]
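
The detection condition described in the solution can be expressed as a small predicate. This Python sketch is illustrative only: the class names `DecodeStage` and `IdExRegister`, their fields, and the function `forward_c` are hypothetical stand-ins for the hardware signals named in the text. The extra check that the register is not $0 is an assumption added here (in MIPS, $0 is hardwired to zero) and is not stated in the solution.

```python
from dataclasses import dataclass


@dataclass
class DecodeStage:
    """Signals of the instruction currently in the DECODE stage."""
    mem_write: bool   # MemWrite = 1 for a SW
    rt: int           # Rt register number (the value SW will store)


@dataclass
class IdExRegister:
    """Signals of the instruction currently in the EXE stage (ID/EX)."""
    mem_read: bool    # ID/EX.MemRead = 1 for a LW
    rw: int           # ID/EX.RW, the destination register number


def forward_c(decode: DecodeStage, id_ex: IdExRegister) -> bool:
    """True when a SW in DECODE stores the register a LW in EXE will load."""
    return (
        decode.mem_write
        and id_ex.mem_read
        and id_ex.rw != 0            # assumption: never forward register $0
        and id_ex.rw == decode.rt
    )


# LW $8, ... followed by SW $8, ...
print(forward_c(DecodeStage(mem_write=True, rt=8), IdExRegister(mem_read=True, rw=8)))
```

In hardware this would be combinational logic inside the Forwarding Unit, with the resulting ForwardC bit carried along the pipeline until the SW needs it.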


5. (4 pts) We have a program core consisting of five conditional branches. The program core will
be executed millions of times. Below are the outcomes of each branch for one execution of the
program core (T for taken and N for not taken).
Branch 1: T-T-T
Branch 2: N-N-N-N
Branch 3: T-N-T-N-T-N
Branch 4: T-T-T-N-T
Branch 5: T-T-N-T-T-N-T
Assume that the behavior of each branch remains the same for each program core execution. For
dynamic branch prediction schemes, assume that each branch has its own prediction buffer and
each buffer is initialized to the same state before each execution. List the predictions and the
accuracies for each of the following branch prediction schemes:
a) Always taken
b) Always not taken
c) 1-bit predictor, initialized to predict taken
d) 2-bit predictor, initialized to weakly predict taken
Solution:

Prediction accuracy = 100% * Correct Predictions / Total Branches

a) Branch 1: prediction: T-T-T, right = 3, wrong = 0
Branch 2: prediction: T-T-T-T, right = 0, wrong = 4
Branch 3: prediction: T-T-T-T-T-T, right = 3, wrong = 3
Branch 4: prediction: T-T-T-T-T, right = 4, wrong = 1
Branch 5: prediction: T-T-T-T-T-T-T, right = 5, wrong = 2
Total right = 15, Total wrong = 10, Accuracy = 100% * 15/25 = 60%

b) Branch 1: prediction: N-N-N, right = 0, wrong = 3

Branch 2: prediction: N-N-N-N, right = 4, wrong = 0
Branch 3: prediction: N-N-N-N-N-N, right = 3, wrong = 3
Branch 4: prediction: N-N-N-N-N, right = 1, wrong = 4
Branch 5: prediction: N-N-N-N-N-N-N, right = 2, wrong = 5
Total right = 10, Total wrong = 15, Accuracy = 100% * 10/25 = 40%

c) Branch 1: prediction: T-T-T, right = 3, wrong = 0

Branch 2: prediction: T-N-N-N, right = 3, wrong = 1
Branch 3: prediction: T-T-N-T-N-T, right = 1, wrong = 5
Branch 4: prediction: T-T-T-T-N, right = 3, wrong = 2
Branch 5: prediction: T-T-T-N-T-T-N, right = 3, wrong = 4
Total right = 13, Total wrong = 12, Accuracy = 100% * 13/25 = 52%

d) Branch 1: prediction: T-T-T, right = 3, wrong = 0

Branch 2: prediction: T-N-N-N, right = 3, wrong = 1
Branch 3: prediction: T-T-T-T-T-T, right = 3, wrong = 3
Branch 4: prediction: T-T-T-T-T, right = 4, wrong = 1
Branch 5: prediction: T-T-T-T-T-T-T, right = 5, wrong = 2
Total right = 18, Total wrong = 7, Accuracy = 100% * 18/25 = 72%
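
The four tables above can be verified with a short simulation. This is a minimal Python sketch, not the course's reference code: the function names are hypothetical, and the 2-bit predictor is modeled as a saturating counter (0 and 1 predict not taken, 2 and 3 predict taken, starting at 2 for "weakly taken"), which is an assumption, since the solution does not spell out the state machine.

```python
# Branch outcomes for one execution of the program core (from the problem).
outcomes = {1: "TTT", 2: "NNNN", 3: "TNTNTN", 4: "TTTNT", 5: "TTNTTNT"}


def always(direction):
    """Static predictor that always predicts the given direction."""
    return lambda history: direction * len(history)


def one_bit(history, start="T"):
    """1-bit predictor: predict whatever the branch did last time."""
    predictions, state = [], start
    for actual in history:
        predictions.append(state)
        state = actual
    return "".join(predictions)


def two_bit(history, start=2):
    """2-bit saturating counter (assumed model); counter >= 2 predicts taken."""
    predictions, counter = [], start
    for actual in history:
        predictions.append("T" if counter >= 2 else "N")
        counter = min(counter + 1, 3) if actual == "T" else max(counter - 1, 0)
    return "".join(predictions)


def accuracy(predict):
    """Percentage of correct predictions over all five branches."""
    right = total = 0
    for history in outcomes.values():
        predictions = predict(history)
        right += sum(p == a for p, a in zip(predictions, history))
        total += len(history)
    return 100 * right / total


for name, scheme in [("always taken", always("T")), ("always not taken", always("N")),
                     ("1-bit", one_bit), ("2-bit", two_bit)]:
    print(name, accuracy(scheme))
```

Each branch gets a fresh predictor state on every call, which mirrors the assumption that every branch has its own buffer initialized to the same state before each execution.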

