HW3 Sol PDF
Homework 3
1) For the following code snippet, list the data dependencies and rewrite the code to resolve
the name dependencies. (15 points)
0 & 1, 3 & 5 on R2
4 & 6 on R4
2) Use the instruction latencies indicated in the selected slide to first show all the stalls
that are present in the following piece of code if the branch is not taken.
Now unroll this loop as many times as needed and schedule the instructions to remove all
the stalls. You may rename registers (i.e., use new registers) and/or change the
immediate/offset values. You can ignore structural hazards as well.
What is the speedup that you achieved after unrolling and scheduling?
Loop: LD F0, 0(r0)
LD F2, 0(r2)
MULTD F4, F0, F2
LD F6, 0(r4)
ADDD F8, F4, F6
SD 0(r6), F8
ADDI r0, r0, #4
ADDI r2, r2, #4
ADDI r4, r4, #4
ADDI r6, r6, #4
SUBI r8, r8, #1
BNEZ r8, Loop
Instruction                  Clock
Loop: LD    F0, 0(r0)          1
      LD    F2, 0(r2)          2
      Stall                    3
      MULTD F4, F0, F2         4
      LD    F6, 0(r4)          5
      Stall                    6
      Stall                    7
      Stall                    8
      Stall                    9
      Stall                   10
      ADDD  F8, F4, F6        11
      Stall                   12
      Stall                   13
      Stall                   14
      SD    0(r6), F8         15
      ADDI  r0, r0, #4        16
      ADDI  r2, r2, #4        17
      ADDI  r4, r4, #4        18
      ADDI  r6, r6, #4        19
      SUBI  r8, r8, #1        20
      Stall                   21
      BNEZ  r8, Loop          22
      Stall                   23
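The stall schedule above can be reproduced with a small in-order issue model. This is only a sketch: the slide with the instruction latencies is not reproduced here, so the stall counts in `STALLS` and the one-cycle `BRANCH_DELAY` are assumptions read off the schedule itself, and the names `STALLS`, `BRANCH_DELAY`, `LOOP`, and `schedule` are hypothetical.

```python
# ASSUMED stall cycles between a producer and a dependent consumer,
# inferred from the schedule above (not taken from the referenced slide).
STALLS = {
    ("LD", "MULTD"): 1,
    ("LD", "ADDD"): 1,
    ("MULTD", "ADDD"): 6,
    ("ADDD", "SD"): 3,
    ("SUBI", "BNEZ"): 1,
}
BRANCH_DELAY = 1  # ASSUMED: one stall cycle after the branch

# (opcode, destination register or None, source registers)
LOOP = [
    ("LD", "F0", ["r0"]),
    ("LD", "F2", ["r2"]),
    ("MULTD", "F4", ["F0", "F2"]),
    ("LD", "F6", ["r4"]),
    ("ADDD", "F8", ["F4", "F6"]),
    ("SD", None, ["r6", "F8"]),
    ("ADDI", "r0", ["r0"]),
    ("ADDI", "r2", ["r2"]),
    ("ADDI", "r4", ["r4"]),
    ("ADDI", "r6", ["r6"]),
    ("SUBI", "r8", ["r8"]),
    ("BNEZ", None, ["r8"]),
]


def schedule(program):
    """Return (issue cycle of each instruction, total cycles per iteration)."""
    last_writer = {}  # register -> (opcode, issue cycle) of its latest producer
    issue = []
    cycle = 0
    for op, dest, srcs in program:
        earliest = cycle + 1  # in-order: at most one instruction per cycle
        for reg in srcs:
            if reg in last_writer:
                p_op, p_cycle = last_writer[reg]
                gap = STALLS.get((p_op, op), 0)
                earliest = max(earliest, p_cycle + gap + 1)
        cycle = earliest
        issue.append(cycle)
        if dest is not None:
            last_writer[dest] = (op, cycle)
    return issue, cycle + BRANCH_DELAY


issue_cycles, total_cycles = schedule(LOOP)
for (op, _, _), c in zip(LOOP, issue_cycles):
    print(f"{op:6s} issues in cycle {c}")
print("cycles per iteration:", total_cycles)
```

Under these assumed stall counts the model issues the instructions in the same cycles as the schedule above, which makes it easy to try a reordered or unrolled instruction list and see which stalls remain.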
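Two iterations of the original loop take 23*2 = 46 cycles, while the unrolled loop covers the
same two iterations in 18 cycles. The unrolled loop is therefore (23*2)/18 ≈ 2.56 times faster
than the original in terms of clock cycles consumed (ignoring pipeline fill time).
CPI for a) is 23/12 ≈ 1.92, i.e. roughly 2.
CPI for b) is 1.
After unrolling, the CPI is the same as the ideal CPI.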
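The speedup and CPI figures can be rechecked with a few lines of arithmetic. In this sketch the cycle counts (23 and 18) and the 12 instructions of the original loop come from the solution above. The count of 18 instructions in the unrolled loop is an assumption, inferred from the stated CPI of 1 over 18 cycles. All variable names are hypothetical.

```python
# Figures taken from the solution above.
cycles_per_iteration = 23       # original loop, including stalls
instructions_per_iteration = 12
unrolled_cycles = 18            # unrolled and scheduled loop
iterations_covered = 2          # from the (23*2)/18 ratio

# ASSUMED: 18 instructions in the unrolled loop, inferred from CPI = 1.
unrolled_instructions = 18

speedup = (cycles_per_iteration * iterations_covered) / unrolled_cycles
cpi_original = cycles_per_iteration / instructions_per_iteration
cpi_unrolled = unrolled_cycles / unrolled_instructions

print(f"speedup      = {speedup:.2f}")
print(f"CPI original = {cpi_original:.2f}")
print(f"CPI unrolled = {cpi_unrolled:.2f}")
```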
3) Tomasulo’s Algorithm
Consider the following specifications. (40 points)
Fill in the execution profile for the code given in the table which includes the
cycles that each instruction occupies in the IS, EX, and WR stages and comments
to justify your answer such as type of hazards and the registers involved.
#  Instruction          IS  EX     WR  Comments
1  LD    F0, 0(r0)      1   2      3
2  ADDD  F2, F0, F4     2   4-8    9   RAW on F0 from #1
3  MULTD F4, F2, F6     3   17-24  25  RAW on F2 from #2; structural hazard on the multiplier from MULTD #7 (only one functional unit)
4  ADDD  F6, F8, F10    4   9-13   14  Structural hazard on the adder from ADDD #2
5  DADDI r0, r0, #8     5   6      7
6  LD    F1, 0(r1)      6   7      8
7  MULTD F1, F1, F8     7   9-16   17  RAW on F1 from #6
8  ADDD  F6, F3, F5     8   14-18  19  Structural hazard on the adder from ADDD #4
9  DADDI r1, r1, #8     9   10     11
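The execution profile above can be cross-checked with a simplified cycle-by-cycle model. This is a sketch under stated assumptions, not the specification from the problem, which is not reproduced here. The latencies in `LATENCY`, the single non-pipelined unit per class in `UNIT`, one issue per cycle, and the rule that a dependent instruction may begin EX in the cycle after its producer's WR are all inferred from the table. Reservation-station limits and common-data-bus contention are ignored. All names are hypothetical.

```python
# ASSUMED execution latencies in cycles, inferred from the EX column above.
LATENCY = {"LD": 1, "DADDI": 1, "ADDD": 5, "MULTD": 8}
# ASSUMED: one non-pipelined functional unit per class.
UNIT = {"LD": "load", "DADDI": "int", "ADDD": "add", "MULTD": "mult"}

# (opcode, destination register, source registers)
PROGRAM = [
    ("LD", "F0", ["r0"]),
    ("ADDD", "F2", ["F0", "F4"]),
    ("MULTD", "F4", ["F2", "F6"]),
    ("ADDD", "F6", ["F8", "F10"]),
    ("DADDI", "r0", ["r0"]),
    ("LD", "F1", ["r1"]),
    ("MULTD", "F1", ["F1", "F8"]),
    ("ADDD", "F6", ["F3", "F5"]),
    ("DADDI", "r1", ["r1"]),
]


def simulate(program):
    """Return a list of (IS, EX start, EX end, WR) tuples, one per instruction."""
    n = len(program)
    # Renaming: each source depends only on the latest EARLIER writer (RAW),
    # so WAR and WAW hazards never delay an instruction.
    last_writer = {}
    producers = []
    for i, (_, dest, srcs) in enumerate(program):
        producers.append([last_writer[s] for s in srcs if s in last_writer])
        last_writer[dest] = i

    issue = [i + 1 for i in range(n)]  # one instruction issued per cycle
    start = [None] * n
    end = [None] * n
    write = [None] * n
    unit_free = {}  # unit name -> first cycle in which it is free again

    cycle = 1
    while any(w is None for w in write):
        cycle += 1
        for i, (op, _, _) in enumerate(program):  # oldest instruction first
            if start[i] is not None or issue[i] >= cycle:
                continue  # already executing, or not yet issued
            unit = UNIT[op]
            if unit_free.get(unit, 0) > cycle:
                continue  # structural hazard: unit still busy
            if any(write[p] is None or write[p] >= cycle for p in producers[i]):
                continue  # RAW hazard: operand not yet broadcast
            start[i] = cycle
            end[i] = cycle + LATENCY[op] - 1
            write[i] = end[i] + 1
            unit_free[unit] = end[i] + 1
    return list(zip(issue, start, end, write))


for num, (row, (op, dest, srcs)) in enumerate(zip(simulate(PROGRAM), PROGRAM), 1):
    is_, ex_s, ex_e, wr = row
    ex = str(ex_s) if ex_s == ex_e else f"{ex_s}-{ex_e}"
    print(f"{num}  {op:5s} {dest:3s} IS={is_}  EX={ex}  WR={wr}")
```

Because older instructions are considered first in each cycle, ADDD #4 wins the adder over ADDD #8 at cycle 9, and MULTD #7 takes the multiplier at cycle 9 because MULTD #3 is still waiting on F2 at that point.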