hw4 Cse490-590-Sp2025 Sol

The document outlines a homework assignment for CSE490/590, focusing on MIPS instruction scheduling, loop unrolling, and hazard detection in pipelined processors. It includes examples of instruction sequences, IPC calculations, and the application of Tomasulo's algorithm for dynamic scheduling. Additionally, it discusses register renaming to eliminate data hazards in floating-point operations.

Uploaded by

ILLANDULA RITHVIK

CSE490/590, Spring 2025 Homework 4 (not graded) Solution

1.
(a) Schedule the following instruction sequence for a dual-issue MIPS processor. Assume that
one ALU/branch instruction and one load/store instruction can execute in parallel
when there are no data dependences between them:
Loop: lw   $t0, 0($s1)        // $t0 = array element
      add  $t0, $t0, $s2      // add scalar in $s2
      sw   $t0, 0($s1)        // store result
      addi $s1, $s1, -4       // decrement pointer
      add  $s4, $s5, $s4      // update $s4
      bne  $s1, $zero, Loop   // branch when $s1 != 0

      ALU/Branch               Load/Store         Cycle
Loop: add  $s4, $s5, $s4       lw  $t0, 0($s1)    1
      addi $s1, $s1, -4        nop                2
      add  $t0, $t0, $s2       nop                3
      bne  $s1, $zero, Loop    sw  $t0, 4($s1)    4

(b) Compute the IPC for the schedule in part (a).

IPC = 6 instructions / 4 cycles = 1.5 (the nop slots are not counted as instructions)
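
The IPC figure can be checked mechanically. The following is a minimal Python sketch, not part of the original solution: the `schedule` list transcribes the four issue cycles of the table in part (a), and the helper name `ipc` is hypothetical.

```python
# Schedule from part (a): one (ALU/branch slot, load/store slot) pair per cycle.
schedule = [
    ("add $s4, $s5, $s4", "lw $t0, 0($s1)"),
    ("addi $s1, $s1, -4", "nop"),
    ("add $t0, $t0, $s2", "nop"),
    ("bne $s1, $zero, Loop", "sw $t0, 4($s1)"),
]


def ipc(schedule):
    """Useful instructions per cycle; nop slots do not count as work."""
    useful = sum(1 for pair in schedule for slot in pair if slot != "nop")
    return useful / len(schedule)


print(ipc(schedule))  # 1.5
```

Counting only non-nop slots matters here: counting all eight issue slots would report the peak rate of 2 instead of the achieved 1.5.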

2.
Assume a MIPS 5-stage pipelined processor without forwarding. Unroll the loop below so that
there are four copies of the loop body. Schedule the
instructions to avoid stalls.

Loop: lw   $s0, 0($t0)
      add  $t2, $t2, $s0      // $t2 contains the sum of the array
      addi $t0, $t0, 4
      bne  $t0, $t1, Loop

The above code iterates over an array of integers and computes its sum.
Assume that $t0 contains the address of the first word, ($t1 - 4) is the address of the last word,
and that the difference between the addresses in $t0 and $t1 is a multiple of 16.
Eliminate any obviously redundant computations. Assume registers $s0 to $s7 and $t3 to $t9
are free to use for any purpose.
Fewer-instructions solution (causes unnecessary stalls, but is still an acceptable answer):
Loop: lw   $s0, 0($t0)
      add  $t2, $t2, $s0
      lw   $s0, 4($t0)
      add  $t2, $t2, $s0
      lw   $s0, 8($t0)
      add  $t2, $t2, $s0
      lw   $s0, 12($t0)
      add  $t2, $t2, $s0
      addi $t0, $t0, 16
      bne  $t0, $t1, Loop

Time-efficient solution (avoids stalls in the loop body where possible):

      // Load all the values needed from memory
Loop: lw   $s0, 0($t0)        // lw instructions are bunched together so that
      lw   $s1, 4($t0)        // their corresponding add instructions are
      lw   $s2, 8($t0)        // separated from them to avoid stalls
      lw   $s3, 12($t0)
      addi $t0, $t0, 16       // addi needs to be at least 3 instructions
                              // before the bne to avoid a stall
      // Compute all the partial sums for this iteration
      add  $s4, $s4, $s0
      add  $s5, $s5, $s1      // Partial sum registers are needed to help
      add  $s6, $s6, $s2      // avoid stalls (each add writes to a
      add  $s7, $s7, $s3      // different register)
      bne  $t0, $t1, Loop
      // Compute the final sum from all the partial sums
      add  $t3, $s4, $s5
      add  $t4, $s6, $s7      // Last instruction stalls but that’s ok
      add  $t2, $t3, $t4      // since it is not part of the loop
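
The partial-sum transformation can be sanity-checked in a higher-level language. This is a minimal Python sketch under stated assumptions: the function names are hypothetical, the four partial sums stand in for $s4 to $s7 and are assumed to start at zero, and the array length is assumed to be a multiple of 4, matching the multiple-of-16-bytes condition in the question.

```python
def sum_rolled(values):
    """Original loop: one load and one add per iteration."""
    total = 0
    for v in values:
        total += v
    return total


def sum_unrolled_by_4(values):
    """Unrolled loop: four loads, then four independent partial sums."""
    if len(values) % 4 != 0:
        raise ValueError("length must be a multiple of 4")
    p0 = p1 = p2 = p3 = 0          # assumed zero on entry, like $s4..$s7
    for i in range(0, len(values), 4):
        a, b, c, d = values[i:i + 4]   # the four bunched loads
        p0 += a                        # each add targets a different
        p1 += b                        # accumulator, so no add depends
        p2 += c                        # on the add just before it
        p3 += d
    return (p0 + p1) + (p2 + p3)       # final reduction after the loop


data = list(range(1, 17))
print(sum_rolled(data), sum_unrolled_by_4(data))  # 136 136
```

For integers the two versions always agree; the regrouping of additions only becomes observable with floating-point data, where rounding can differ.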

3. For the code sequence shown below

loop: l.d    $f12, 0($f5)
      add.d  $f6, $f6, $f12
      daddui $f5, $f5, -8
      bne    $f5, $f9, loop   // $f9 holds the address of the last value to be operated on
a) Unroll the loop so that there are four copies of the loop body. Assume $f5 - $f9
(that is, the size of the array) is initially a multiple of 32, which means that the number of loop
iterations is a multiple of 4. Eliminate any obviously redundant computations and do not reuse
any of the registers.
loop: l.d    $f12, 0($f5)
      add.d  $f7, $f7, $f12
      l.d    $f13, -8($f5)
      add.d  $f8, $f8, $f13
      l.d    $f14, -16($f5)
      add.d  $f10, $f10, $f14
      l.d    $f15, -24($f5)
      add.d  $f11, $f11, $f15
      daddui $f5, $f5, -32    // -32 to account for the 4 copies in the loop
      bne    $f5, $f9, loop
      add.d  $f16, $f7, $f8
      add.d  $f17, $f10, $f11
      add.d  $f18, $f16, $f17

b) Compute the number of cycles needed for 4 iterations

1. l.d $f12, 0($f5)
2. stall
3. add.d $f7, $f7, $f12
4. l.d $f13, -8($f5)
5. stall
6. add.d $f8, $f8, $f13
7. l.d $f14, -16($f5)
8. stall
9. add.d $f10, $f10, $f14
10. l.d $f15, -24($f5)
11. stall
12. add.d $f11, $f11, $f15
13. daddui $f5, $f5, -32 // latency between daddui and bne is not given; a 1-cycle stall is assumed
14. stall
15. bne $f5, $f9, loop
16. add.d $f16, $f7, $f8
17. add.d $f17, $f10, $f11
18. stall
19. stall
20. stall
21. add.d $f18, $f16, $f17

The unrolled loop body (four original iterations) occupies cycles 1 to 15; including the final
reduction after the loop, the sequence ends in cycle 21.
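
The cycle listing can be reproduced with a small in-order, single-issue model. This is a sketch under stated assumptions, not the course's official method: the stall counts in `STALLS_AFTER` are read off the listing itself (1 after l.d, 3 after add.d, 1 after daddui) rather than given in the problem, and the names `PROGRAM` and `issue_cycles` are hypothetical. The taken branch is not modeled; only one pass through the code is timed.

```python
# Stalls a dependent instruction must wait after its producer issues.
# Assumed values, inferred from the cycle listing, not stated in the problem.
STALLS_AFTER = {"l.d": 1, "add.d": 3, "daddui": 1}

# (opcode, destination register or None, source registers)
PROGRAM = [
    ("l.d", "$f12", ["$f5"]),
    ("add.d", "$f7", ["$f7", "$f12"]),
    ("l.d", "$f13", ["$f5"]),
    ("add.d", "$f8", ["$f8", "$f13"]),
    ("l.d", "$f14", ["$f5"]),
    ("add.d", "$f10", ["$f10", "$f14"]),
    ("l.d", "$f15", ["$f5"]),
    ("add.d", "$f11", ["$f11", "$f15"]),
    ("daddui", "$f5", ["$f5"]),
    ("bne", None, ["$f5", "$f9"]),
    ("add.d", "$f16", ["$f7", "$f8"]),
    ("add.d", "$f17", ["$f10", "$f11"]),
    ("add.d", "$f18", ["$f16", "$f17"]),
]


def issue_cycles(program):
    """Return the issue cycle of each instruction, in program order."""
    ready = {}    # register -> first cycle in which a consumer may issue
    cycles = []
    cycle = 0
    for op, dest, sources in program:
        # Issue one cycle after the previous instruction, or later if a
        # source register is not ready yet.
        cycle = max([cycle + 1] + [ready.get(s, 1) for s in sources])
        cycles.append(cycle)
        if dest is not None:
            ready[dest] = cycle + 1 + STALLS_AFTER[op]
    return cycles


cycles = issue_cycles(PROGRAM)
print(cycles)      # [1, 3, 4, 6, 7, 9, 10, 12, 13, 15, 16, 17, 21]
print(cycles[-1])  # 21
```

Every gap in the printed list corresponds to a stall line in the listing, so changing an entry in `STALLS_AFTER` shows directly how a different latency assumption would shift the total.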

4. Consider the following code sequence.

I1: lw $s4, 0($s1)
I2: or $s2, $s4, $s1
I3: and $s6, $s5, $s3
Highlight the hazard and discuss how an out-of-order processor helps when lw $s4, 0($s1)
encounters a cache miss.

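I1: lw $s4, 0($s1)
I2: or $s2, $s4, $s1     // RAW hazard: I2 reads $s4, which I1 loads
I3: and $s6, $s5, $s3
I2 must wait until the load in I1 brings the data into $s4, which is then forwarded to I2. I3 does
not depend on I1 or I2, so an out-of-order processor can execute I3 during the cache miss; I3 then
waits before its write-back stage until the data for $s4 has been loaded and forwarded to I2.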

5. In the following instruction sequence, find the hazards. Rename the registers to eliminate
the anti and output dependences

div.s  r1, r2, r3
mult.s r4, r1, r5
add.s  r1, r3, r6
sub.s  r3, r1, r4

div.s  r1, r2, r3
mult.s r4, r1, r5   // instr1 → instr2 on r1: RAW (data dependence)
add.s  r1, r3, r6   // instr1 → instr3 on r1: WAW (output dependence); instr2 → instr3 on r1: WAR (anti dependence)
sub.s  r3, r1, r4   // instr2 → instr4 on r4: RAW (data dependence); instr3 → instr4 on r1: RAW (data dependence)
                    // instr1 → instr4 and instr3 → instr4 on r3: WAR (anti dependence), since instr1 and instr3 only read r3
After register renaming:
div.s  r1, r2, r3
mult.s r4, r1, r5
add.s  r8, r3, r6
sub.s  r9, r8, r4
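
The dependence analysis and the renaming can be cross-checked with a short script. This is a minimal Python sketch, not the course's method: `PROGRAM`, `find_dependences` and `rename` are hypothetical names, instructions are reduced to (destination, sources) pairs, and r8 and r9 are assumed to be free registers, as in the renamed code above.

```python
# (destination, sources) for div.s, mult.s, add.s, sub.s, in program order.
PROGRAM = [
    ("r1", ("r2", "r3")),
    ("r4", ("r1", "r5")),
    ("r1", ("r3", "r6")),
    ("r3", ("r1", "r4")),
]


def find_dependences(program):
    """Return a set of (kind, earlier, later, register), 1-indexed."""
    deps = set()
    for j, (dest_j, srcs_j) in enumerate(program):
        for i in range(j):
            dest_i, srcs_i = program[i]
            # RAW: j reads what i wrote, with no write to it in between.
            overwritten = any(program[k][0] == dest_i for k in range(i + 1, j))
            if dest_i in srcs_j and not overwritten:
                deps.add(("RAW", i + 1, j + 1, dest_i))
            # WAW: both write the same register.
            if dest_i == dest_j:
                deps.add(("WAW", i + 1, j + 1, dest_j))
            # WAR: j overwrites a register that i reads.
            if dest_j in srcs_i:
                deps.add(("WAR", i + 1, j + 1, dest_j))
    return deps


def rename(program, free_regs):
    """Give a fresh register to every destination whose name was used before."""
    free = list(free_regs)   # assumed disjoint from the names in the program
    mapping = {}             # original name -> current name
    seen = set()             # original names read or written so far
    renamed = []
    for dest, srcs in program:
        new_srcs = tuple(mapping.get(s, s) for s in srcs)
        seen.update(srcs)
        new_dest = free.pop(0) if dest in seen else dest
        mapping[dest] = new_dest
        seen.add(dest)
        renamed.append((new_dest, new_srcs))
    return renamed


before = find_dependences(PROGRAM)
renamed = rename(PROGRAM, ["r8", "r9"])
after = find_dependences(renamed)
print(sorted(before))
print(renamed)
print(sorted(after))
```

After renaming, only RAW entries remain, which is the expected outcome: renaming removes name dependences (WAR and WAW) but cannot remove true data dependences.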

6. Consider the following instruction sequence (floating point) on a processor (shown
below) which uses Tomasulo’s algorithm to dynamically schedule instructions (dual issue
per cycle, no speculation).

The processor has the following non-pipelined execution units:

A 2-cycle FP add unit
A 3-cycle FP multiply unit
Assume an instruction can begin to execute in the same cycle in which it is dispatched, as soon
as it resides in a reservation station.
Trace the execution by showing the Reservation Stations and FP Registers at the end of cycles
1, 2, 3, 5 and 6.

The Reservation Stations of the Multiplier/Divider are numbered 4 and 5, as opposed to 1 and 2 as
shown in the processor diagram above.

FP Registers
Busy -> Denotes whether the operand is used in other operations or is ready
Tag -> Denotes the reservation station number that uses the operand
Reservation Stations
S1, S2 -> Denote the source operands used in the instruction sequence
Tag1 -> 0 indicates that the value is available and the instruction can be dispatched to the ALU if it is free
     -> Any other number indicates that the value depends on the completion of the
        instruction in that reservation station
[Tag, Tag1 and Busy bits are added for explanation purposes only. You can ignore them if you
find them confusing.]

Cycle# 1:

Cycle# 2:

Cycle# 3:
After 2 cycles, the ADD instruction (w) has computed 6.0 + 7.8 = 13.8 for register R4. The value is then
forwarded to the reservation stations that are waiting for the value of R4.

Cycle# 5:
After 2 more cycles, the ADD instruction (y) has computed 13.8 + 7.8 = 21.6 for register R4. This R4 value is broadcast to all
the reservation stations waiting for R4. Since no other writes to R4 remain pending, the FP registers are updated
immediately.

Cycle# 6:
At the start of cycle 3, the MUL instruction (x) had all its operands available and began multiplying. It computes
R0 * R4 = 6.0 * 13.8 = 82.8 for register R2. The result is broadcast and the FP registers are updated 3 cycles
after the 3rd cycle (i.e. in the 6th cycle).

More details: Solution for 1-9 cycles:

Similarly, in the 9th cycle, according to instruction (z), R8 is calculated as 21.6 * 82.8 = 1788.48 and updated in the
FP registers accordingly.
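
The values and completion cycles quoted in the trace can be re-derived with a small model. This is a sketch under stated assumptions rather than a full Tomasulo simulator: the instruction list is reconstructed from the cycle descriptions (w and y are adds, x and z are multiplies), w and x are assumed to issue in cycle 1 and y and z in cycle 2 because of dual issue, each unit is granted in program order, and a result is taken to be visible `latency` cycles after execution starts. The names `PROGRAM` and `trace` are hypothetical.

```python
from decimal import Decimal

LATENCY = {"add": 2, "mul": 3}   # non-pipelined FP add and FP multiply units

# (name, unit, operand a, operand b, issue cycle); a string operand names
# the instruction that produces the value.
PROGRAM = [
    ("w", "add", Decimal("6.0"), Decimal("7.8"), 1),   # first write to R4
    ("x", "mul", Decimal("6.0"), "w", 1),              # R2 = R0 * R4
    ("y", "add", "w", Decimal("7.8"), 2),              # second write to R4
    ("z", "mul", "y", "x", 2),                         # R8 = R4 * R2
]


def trace(program):
    """Return (values, cycle in which each result is broadcast)."""
    value, done = {}, {}
    unit_free = {"add": 1, "mul": 1}   # first cycle each unit is available
    for name, unit, a, b, issue in program:
        start = max(issue, unit_free[unit])
        operands = []
        for src in (a, b):
            if isinstance(src, str):           # wait for the producer
                start = max(start, done[src])
                operands.append(value[src])
            else:
                operands.append(src)
        done[name] = start + LATENCY[unit]
        unit_free[unit] = done[name]           # unit is busy until then
        if unit == "add":
            value[name] = operands[0] + operands[1]
        else:
            value[name] = operands[0] * operands[1]
    return value, done


value, done = trace(PROGRAM)
print(done)    # {'w': 3, 'x': 6, 'y': 5, 'z': 9}
print(value["z"])
```

`Decimal` is used instead of binary floating point so that the comparison against 13.8, 21.6, 82.8 and 1788.48 is exact.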
