HW3 Sol PDF

The document summarizes homework assignments for an advanced computer architecture course. It includes two problems, the first asking to identify data dependencies and resolve name dependencies in a code snippet, and the second asking to analyze stalls and schedule instructions to remove stalls for a floating point pipeline code. It also provides specifications for applying Tomasulo's algorithm to another code sample and completing its execution profile.

Uploaded by

Anonymous GJK5Z8Z

Advanced Computer Architecture CMSC 611

Homework 3

Due in class Oct 17th, 2012

(Show your work to receive partial credit)

1) For the following code snippet, list the data dependencies and rewrite the code to resolve
name dependencies. (15 points)

Loop: LD   R2, 0(R7)     ----- 0
      Add  R1, R2, R3    ----- 1
      Sub  R4, R6, R1    ----- 2
      Add  R2, R5, R6    ----- 3
      LD   R4, 32(R1)    ----- 4
      SD   36(R1), R2    ----- 5
      BEQ  R4, Loop      ----- 6

Data dependencies (true dependencies):
  1 & 2, 1 & 4 and 1 & 5 on R1
  0 & 1, 3 & 5 on R2
  4 & 6 on R4

Name dependencies:
  Anti-dependence between 1 and 3 because of R2
  Output dependence between 2 and 4 because of R4
  Output dependence between 0 and 3 because of R2

Name dependencies can be resolved by renaming the second definitions of R2 and R4
(instructions 3 and 4), together with their later uses, to temporary registers T2 and T4:

Loop: LD   R2, 0(R7)     ----- 0
      Add  R1, R2, R3    ----- 1
      Sub  R4, R6, R1    ----- 2
      Add  T2, R5, R6    ----- 3
      LD   T4, 32(R1)    ----- 4
      SD   36(R1), T2    ----- 5
      BEQ  T4, Loop      ----- 6
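
The dependence lists above can be cross-checked mechanically. The following is a minimal Python sketch, not part of the original solution: the `PROGRAM` encoding and the `dependences` helper are illustrative names, and the check only looks within one loop iteration (loop-carried dependences are ignored).

```python
from itertools import combinations

# (destination register or None, source registers), in program order.
# Stores and branches write no register.
PROGRAM = [
    ("R2", ["R7"]),        # 0: LD  R2, 0(R7)
    ("R1", ["R2", "R3"]),  # 1: Add R1, R2, R3
    ("R4", ["R6", "R1"]),  # 2: Sub R4, R6, R1
    ("R2", ["R5", "R6"]),  # 3: Add R2, R5, R6
    ("R4", ["R1"]),        # 4: LD  R4, 32(R1)
    (None, ["R1", "R2"]),  # 5: SD  36(R1), R2
    (None, ["R4"]),        # 6: BEQ R4, Loop
]


def dependences(program):
    """Return (RAW, WAR, WAW) sets of (earlier, later, register) triples."""
    raw, war, waw = set(), set(), set()
    for i, j in combinations(range(len(program)), 2):
        dst_i, src_i = program[i]
        dst_j, src_j = program[j]
        # Registers redefined strictly between i and j hide the dependence.
        between = [program[k][0] for k in range(i + 1, j)]
        if dst_i is not None and dst_i in src_j and dst_i not in between:
            raw.add((i, j, dst_i))   # true (data) dependence
        if dst_j is not None and dst_j in src_i and dst_j not in between:
            war.add((i, j, dst_j))   # anti-dependence
        if dst_i is not None and dst_i == dst_j and dst_i not in between:
            waw.add((i, j, dst_i))   # output dependence
    return raw, war, waw


raw, war, waw = dependences(PROGRAM)
print("RAW:", sorted(raw))
print("WAR:", sorted(war))
print("WAW:", sorted(waw))
```

Only the WAR and WAW sets are name dependences, which is why renaming the destinations of instructions 3 and 4 is enough to remove them.
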
2) From the course slides. (Multi-cycle FP pipeline for MIPS) (45 points)

- Use the instruction latencies indicated in the selected slide to first show all the stalls
  that are present in the following piece of code if the branch is not taken.
- Now unroll this loop as many times as needed and schedule the instructions to remove all
  the stalls. You may rename registers, i.e. use new registers, and/or change the
  immediate/offset values. You can ignore structural hazards as well.
- What is the speedup that you achieved after unrolling and scheduling?
Loop: LD    F0, 0(r0)
      LD    F2, 0(r2)
      MULTD F4, F0, F2
      LD    F6, 0(r4)
      ADDD  F8, F4, F6
      SD    0(r6), F8
      ADDI  r0, r0, #4
      ADDI  r2, r2, #4
      ADDI  r4, r4, #4
      ADDI  r6, r6, #4
      SUBI  r8, r8, #1
      BNEZ  r8, Loop
Instruction                 Clock
Loop: LD    F0, 0(r0)         1
      LD    F2, 0(r2)         2
      Stall                   3
      MULTD F4, F0, F2        4
      LD    F6, 0(r4)         5
      Stall                   6
      Stall                   7
      Stall                   8
      Stall                   9
      Stall                  10
      ADDD  F8, F4, F6       11
      Stall                  12
      Stall                  13
      Stall                  14
      SD    0(r6), F8        15
      ADDI  r0, r0, #4       16
      ADDI  r2, r2, #4       17
      ADDI  r4, r4, #4       18
      ADDI  r6, r6, #4       19
      SUBI  r8, r8, #1       20
      Stall                  21
      BNEZ  r8, Loop         22
      Stall                  23
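
The stall table above can be reproduced with a small in-order issue model. This is a hedged sketch in Python: the gaps in `required_gap` are inferred from the stall counts in the table (one stall after a load, six after MULTD, three after ADDD, one between an integer op and a dependent branch), not copied from the course slides, and all names are illustrative.

```python
def required_gap(producer, consumer):
    """Stall cycles needed between a producer and a dependent consumer.

    Values are inferred from the solution's stall table, not from the slides.
    """
    if producer == "LD":
        return 1
    if producer == "MULTD":
        return 6
    if producer == "ADDD":
        return 3
    if producer in ("ADDI", "SUBI") and consumer == "BNEZ":
        return 1
    return 0


# (opcode, destination register or None, source registers)
LOOP = [
    ("LD", "F0", ["r0"]),
    ("LD", "F2", ["r2"]),
    ("MULTD", "F4", ["F0", "F2"]),
    ("LD", "F6", ["r4"]),
    ("ADDD", "F8", ["F4", "F6"]),
    ("SD", None, ["r6", "F8"]),
    ("ADDI", "r0", ["r0"]),
    ("ADDI", "r2", ["r2"]),
    ("ADDI", "r4", ["r4"]),
    ("ADDI", "r6", ["r6"]),
    ("SUBI", "r8", ["r8"]),
    ("BNEZ", None, ["r8"]),
]


def issue_cycles(program):
    """Issue one instruction per cycle, in order, stalling on RAW hazards."""
    last_writer = {}  # register -> (opcode, issue cycle) of its latest writer
    cycles = []
    clock = 0
    for op, dst, srcs in program:
        earliest = clock + 1
        for reg in srcs:
            if reg in last_writer:
                p_op, p_cycle = last_writer[reg]
                earliest = max(earliest, p_cycle + required_gap(p_op, op) + 1)
        clock = earliest
        cycles.append(clock)
        if dst is not None:
            last_writer[dst] = (op, clock)
    return cycles


def total_cycles(program):
    # One extra cycle for the unfilled branch delay stall after BNEZ.
    return issue_cycles(program)[-1] + 1


print(issue_cycles(LOOP))
print(total_cycles(LOOP))
```

Under these assumed gaps the model issues the instructions in the same clock cycles as the table and arrives at the same per-iteration total.
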

Unrolling the loop by a factor of two (two iterations per pass) is sufficient to remove all the stalls.

Instruction                  Clock
Loop: LD    F0, 0(r0)          1
      LD    F2, 0(r2)          2
      LD    F10, 4(r0)         3
      LD    F12, 4(r2)         4
      MULTD F4, F0, F2         5
      MULTD F14, F10, F12      6
      LD    F6, 0(r4)          7
      LD    F16, 4(r4)         8
      ADDI  r0, r0, #8         9
      ADDI  r2, r2, #8        10
      ADDI  r4, r4, #8        11
      ADDD  F8, F4, F6        12
      ADDD  F18, F14, F16     13
      ADDI  r6, r6, #8        14
      SUBI  r8, r8, #2        15
      SD    -8(r6), F8        16
      BNEZ  r8, Loop          17
      SD    -4(r6), F18       18
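
One way to gain confidence that the schedule above is stall-free is to check every producer/consumer pair against the required spacing. The Python sketch below does that. As an assumption, it reuses spacing values inferred from the solution's stall table for the original loop (one cycle after a load, six after MULTD, three after ADDD, one between an integer op and a dependent branch). The names `required_gap`, `UNROLLED` and `violations` are illustrative.

```python
def required_gap(producer, consumer):
    """Inferred stall cycles needed between dependent instructions."""
    if producer == "LD":
        return 1
    if producer == "MULTD":
        return 6
    if producer == "ADDD":
        return 3
    if producer in ("ADDI", "SUBI") and consumer == "BNEZ":
        return 1
    return 0


# The unrolled schedule; instruction k in the list issues in clock cycle k + 1.
UNROLLED = [
    ("LD", "F0", ["r0"]),
    ("LD", "F2", ["r2"]),
    ("LD", "F10", ["r0"]),
    ("LD", "F12", ["r2"]),
    ("MULTD", "F4", ["F0", "F2"]),
    ("MULTD", "F14", ["F10", "F12"]),
    ("LD", "F6", ["r4"]),
    ("LD", "F16", ["r4"]),
    ("ADDI", "r0", ["r0"]),
    ("ADDI", "r2", ["r2"]),
    ("ADDI", "r4", ["r4"]),
    ("ADDD", "F8", ["F4", "F6"]),
    ("ADDD", "F18", ["F14", "F16"]),
    ("ADDI", "r6", ["r6"]),
    ("SUBI", "r8", ["r8"]),
    ("SD", None, ["r6", "F8"]),
    ("BNEZ", None, ["r8"]),
    ("SD", None, ["r6", "F18"]),  # fills the branch delay slot
]


def violations(schedule):
    """Return (producer cycle, consumer cycle, register) for pairs placed too close."""
    last_writer = {}
    bad = []
    for cycle, (op, dst, srcs) in enumerate(schedule, start=1):
        for reg in srcs:
            if reg in last_writer:
                p_op, p_cycle = last_writer[reg]
                if cycle - p_cycle - 1 < required_gap(p_op, op):
                    bad.append((p_cycle, cycle, reg))
        if dst is not None:
            last_writer[dst] = (op, cycle)
    return bad


print(len(UNROLLED), violations(UNROLLED))
```

An empty violation list means that, under the assumed spacing, every instruction can issue in consecutive cycles, with the final SD occupying the branch delay slot.
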

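The original loop takes 23 cycles per iteration, so two iterations take 23*2 = 46 cycles;
the unrolled loop does the same work in 18 cycles. The unrolled loop is therefore
(23*2)/18 ≈ 2.56 times faster than the original with respect to clock cycles consumed
(ignoring pipeline fill time).
CPI for a) is 23/12 ≈ 1.92, i.e. about 2
CPI for b) is 18/18 = 1
After unrolling and scheduling, the CPI equals the ideal CPI of 1.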
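
The arithmetic can be reproduced exactly with a few lines of Python. This is only a sketch; the cycle and instruction counts are taken from the two schedules above, and the variable names are illustrative.

```python
from fractions import Fraction

original_cycles = 23        # cycles per iteration of the original loop
original_instructions = 12  # instructions per iteration of the original loop
unrolled_cycles = 18        # cycles per pass of the unrolled loop (two iterations)
unrolled_instructions = 18  # instructions per pass of the unrolled loop

# Two original iterations do the same work as one unrolled pass.
speedup = Fraction(2 * original_cycles, unrolled_cycles)
cpi_original = Fraction(original_cycles, original_instructions)
cpi_unrolled = Fraction(unrolled_cycles, unrolled_instructions)

print(f"speedup      = {speedup} = {float(speedup):.2f}")
print(f"CPI original = {cpi_original} = {float(cpi_original):.2f}")
print(f"CPI unrolled = {cpi_unrolled}")
```
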

3) Tomasulo’s Algorithm
Consider the following specifications. (40 points)

FU type         Cycles in EX   Number of FUs
Integer         1              1
FP adder        5              1
FP multiplier   8              1
FP divide       24             1

Assume the following.

- Functional units are not pipelined.
- All stages except EX take one cycle to complete.
- No limit on reservation stations.
- There is no forwarding between functional units. Both integer and floating point
  results are communicated through the CDB.
- Memory accesses use the integer functional unit to perform effective address
  calculation.
- All loads and stores will access memory during the EX stage. Pipeline stage EX
  does both the effective address calculation and memory access for loads/stores.
- There are unlimited load/store buffers and an infinite instruction queue.
- Loads and stores take one cycle to execute. Loads and stores share a memory
  access unit.
- If an instruction is in the WR stage in cycle x, then an instruction that is waiting
  on the same functional unit (due to a structural hazard) can begin execution in
  cycle x, unless it needs to read the CDB, in which case it can only start executing
  in cycle x + 1.
- Only one instruction can write to the CDB in a clock cycle. Branches and stores
  do not need the CDB since they don't have a WR stage.
- Whenever there is a conflict for a functional unit or the CDB, assume program
  order.
- When an instruction is done executing in its functional unit and is waiting for the
  CDB, it is still occupying the functional unit and its reservation station (meaning
  no other instruction may enter).
- Treat the BNEZ instruction as an integer instruction. Assume the LD instruction after
  the BNEZ can be issued the cycle after the BNEZ instruction is issued, due to branch
  prediction.

Fill in the execution profile for the code given in the table which includes the
cycles that each instruction occupies in the IS, EX, and WR stages and comments
to justify your answer such as type of hazards and the registers involved.

#  Instruction          IS  EX     WR  Comments
1  LD    F0, 0(r0)      1   2      3
2  ADDD  F2, F0, F4     2   4-8    9   RAW on F0 from #1
3  MULTD F4, F2, F6     3   17-24  25  RAW on F2 from #2; structural hazard on the FP multiplier from MULTD #7 (only one functional unit)
4  ADDD  F6, F8, F10    4   9-13   14  Structural hazard on the FP adder from ADDD #2
5  DADDI r0, r0, #8     5   6      7
6  LD    F1, 0(r1)      6   7      8
7  MULTD F1, F1, F8     7   9-16   17  RAW on F1 from #6
8  ADDD  F6, F3, F5     8   14-18  19  Structural hazard on the FP adder from ADDD #4
9  DADDI r1, r1, #8     9   10     11
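
The execution profile above can be checked against the stated assumptions without writing a full Tomasulo simulator. The Python sketch below is a consistency checker only. It verifies EX lengths, stage ordering, one CDB write per cycle, non-pipelined functional units that are held until WR, and the RAW waits. The `PROFILE` encoding, the `RAW` list and the `check` helper are illustrative names. Loads are mapped to the integer unit as the assumptions specify, and the shared memory access unit is not modelled separately.

```python
FU_LATENCY = {"int": 1, "add": 5, "mul": 8}  # cycles in EX, from the table

# (number, functional unit, IS, first EX cycle, last EX cycle, WR)
PROFILE = [
    (1, "int", 1, 2, 2, 3),     # LD    F0, 0(r0)
    (2, "add", 2, 4, 8, 9),     # ADDD  F2, F0, F4
    (3, "mul", 3, 17, 24, 25),  # MULTD F4, F2, F6
    (4, "add", 4, 9, 13, 14),   # ADDD  F6, F8, F10
    (5, "int", 5, 6, 6, 7),     # DADDI r0, r0, #8
    (6, "int", 6, 7, 7, 8),     # LD    F1, 0(r1)
    (7, "mul", 7, 9, 16, 17),   # MULTD F1, F1, F8
    (8, "add", 8, 14, 18, 19),  # ADDD  F6, F3, F5
    (9, "int", 9, 10, 10, 11),  # DADDI r1, r1, #8
]

# (consumer, producer) pairs with a true dependence through a register.
RAW = [(2, 1), (3, 2), (7, 6)]


def check(profile, raw):
    """Return a list of rule violations; an empty list means consistent."""
    problems = []
    rows = {row[0]: row for row in profile}
    wr_cycles = [row[5] for row in profile]
    if len(set(wr_cycles)) != len(wr_cycles):
        problems.append("two instructions write the CDB in the same cycle")
    for num, fu, issue, ex_start, ex_end, wr in profile:
        if ex_end - ex_start + 1 != FU_LATENCY[fu]:
            problems.append(f"#{num}: EX length does not match the {fu} latency")
        if not (issue < ex_start and ex_end < wr):
            problems.append(f"#{num}: stages out of order")
    # A unit is held from EX start until WR; the next user may start in that WR cycle.
    for fu in FU_LATENCY:
        users = sorted((r for r in profile if r[1] == fu), key=lambda r: r[3])
        for prev, nxt in zip(users, users[1:]):
            if nxt[3] < prev[5]:
                problems.append(f"#{nxt[0]} starts before #{prev[0]} frees the {fu} unit")
    # A consumer reads the CDB in the producer's WR cycle and starts EX afterwards.
    for consumer, producer in raw:
        if rows[consumer][3] <= rows[producer][5]:
            problems.append(f"#{consumer} starts before #{producer} writes its result")
    return problems


print(check(PROFILE, RAW))
```

The checker confirms internal consistency only; it does not prove that each instruction starts at the earliest possible cycle.
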
