
CS398 Exam 3, 2nd Chance

December 17th, 2012

Name:
NETID:
Circle the section that you attend (so we can hand back your exam).
Monday                  Tuesday
AYA (1-3pm) Craig       AYE (9-11am) Maria
AYB (2-4pm) Jon         AYF (10am-noon) Ting
AYC (3-5pm) Michael     AYG (11am-1pm) Ting
AYD (4-6pm) Ting        AYH (noon-2pm) Jon
                        AYI (1-3pm) Ryan
                        AYJ (2-4pm) Michael
                        AYK (3-5pm) Michael
                        AYL (4-6pm) Ryan
                        AYM (5-7pm) Ryan

This exam has 7 pages; the final sheet is provided as a reference to you.
You have 120 minutes.
No calculators or other electronics are allowed. You may bring one 8.5 x 11 sheet of handwritten notes.
To make sure you receive credit, please write clearly and show your work.
We will not answer questions regarding course material.
The 2nd chance test is done at the granularity of the 3 questions; if you choose to have a
question graded, we will grade ALL PARTS of that question and use that to update your score.

Question      1     2     3     Total
Maximum       40    25    20    100
Your Score

Question 1: Pipelining (40 points)


Consider the 6-stage pipeline shown below; in this pipeline, the MEM stage occupies two pipeline stages
(MEM1, MEM2) with loads completing in the MEM2 stage (arithmetic instructions complete in the EX stage as
normal). Full bypassing is provided. Assume that all branches and jumps are predicted as not-taken.
Conditional branches and indirect branches (e.g., jr) are resolved in the EX stage. Unconditional jumps are
resolved in the ID stage. Assume mul is a single real instruction that is executed by the ALU and completes in
the EX stage.

[Figure: datapath of the 6-stage pipeline, with pipeline registers IF/ID, EX/MEM1, MEM1/MEM2, and
MEM2/WB. A forwarding unit, which takes EX/MEM1.RegisterDst, MEM1/MEM2.RegisterDst, and
MEM2/WB.RegisterDst as inputs, controls the forwarding muxes forwA and forwB at the ALU inputs; each
mux has inputs labeled a, b, c, and d.]
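
The timing rules stated above can be turned into a small helper. The following is a minimal sketch in C and is not part of the exam: the names `producer_kind` and `stall_cycles` are hypothetical, and the sketch assumes that the consumer needs its operand at the start of EX, that full bypassing delivers a result in the cycle after it is produced, and that no other stalls occur between the two instructions.

```c
/* Which kind of instruction produces the value (hypothetical names). */
enum producer_kind { PRODUCER_ALU, PRODUCER_LOAD };

/*
 * Stall cycles for a consumer that reads, in EX, a value written by the
 * instruction `distance` slots earlier in program order (distance >= 1).
 * Assumes full bypassing and no other stalls between the two instructions.
 */
int stall_cycles(enum producer_kind kind, int distance)
{
    /* Number of stages past EX at which the result becomes available:
     * ALU results complete in EX (0), loads complete in MEM2 (2). */
    int ready_after_ex = (kind == PRODUCER_LOAD) ? 2 : 0;

    /* The consumer's EX must come at least one cycle after the result
     * is produced, so it needs distance >= ready_after_ex + 1. */
    int missing = ready_after_ex + 1 - distance;
    return missing > 0 ? missing : 0;
}
```

Under these assumptions, `stall_cycles(PRODUCER_LOAD, 1)` models an instruction that uses a loaded value immediately after the load, and `stall_cycles(PRODUCER_ALU, 1)` models back-to-back arithmetic instructions.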



For all of this question, consider the MIPS assembly code on the following page. The corresponding C
code is shown below.

Part (a) Annotate the MIPS assembly to indicate all of the true data dependences. (5 points)

For the next two parts, use your scantron form. We recommend that you first answer on the next page and then
copy your answers to the scantron form. In any case, we won't give credit for any answers not on the scantron.

Part (b) Indicate which instructions will be stalled: a) no stall b) 1-cycle stall c) 2-cycle stall (5 points)

Part (c) Indicate how each of the forwarding muxes (forwA, forwB) will be set in the cycles when each
instruction is in the EX stage. Answer a-d as labeled in the diagram above. (10 points)

typedef struct pixel {
    int x, y, z;
    struct pixel *next;
} pixel_t;

void
map (pixel_t *pixel, int scale) {
    while (pixel != NULL) {
        pixel->z = scale*pixel->x + pixel->y;
        pixel = pixel->next;
    }
}

Question 1: Pipelining, cont.

Mark this box if you want this question graded.


It will replace your score for this question, even if lower.

                                  stall     forwA     forwB
map:
        beq   $a0, $0, done
loop:
        lw    $t1, 0($a0)
        mul   $t1, $t1, $a1       ____1     ____10    ____11
        lw    $t2, 4($a0)         ____2     ____12
        add   $t2, $t2, $t1       ____3     ____13    ____14
        sw    $t2, 8($a0)         ____4     ____15    ____16
        lw    $a0, 12($a0)        ____5     ____17
        bne   $a0, $0, loop       ____6     ____18    ____19
done:
        jr    $ra

Part (d) Compute how many cycles each loop iteration takes on average. Explain your answer for
partial credit. (10 points)

Part (e) Re-schedule/re-write the function to make it faster. Faster code will achieve more
points, but your answer must fit in the space below. (10 points)

Question 2: Cache Analysis (25 points)


For an 8KB 2-way set-associative, write-back cache with 32B blocks on a machine with 32-bit address
spaces (both virtual and physical) and no hardware prefetching, consider the following code:
struct hoof { int has_horseshoe, shoe_size; };
struct unicorn {
    int horn_length;
    char *name;
    struct hoof *hooves[4];        // this is an array of pointers
};
struct unicorn unicorns[1000];     // that's a whole lotta unicorns
int longest_horn = 0, biggest_shoe = 0;
for (int i = 0 ; i < 1000 ; i ++) {
    if (unicorns[i].horn_length >= longest_horn) {
        longest_horn = unicorns[i].horn_length;
    }
    for (int j = 0 ; j < 4 ; j ++) {
        if (unicorns[i].hooves[j]->has_horseshoe &&
            (unicorns[i].hooves[j]->shoe_size >= biggest_shoe)) {
            biggest_shoe = unicorns[i].hooves[j]->shoe_size;
        }
    }
}

Assume that everything is in registers, except the data structure unicorns.
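
The cache parameters above fix the cache geometry, and the 32-bit address space fixes the size of each array element. The following minimal sketch in C, which is not part of the exam, works out that arithmetic. The helper names and the `unicorn32` mirror struct are hypothetical; the sketch assumes a 4-byte `int` and models each 32-bit pointer as a `uint32_t` so that the result does not depend on the machine compiling it.

```c
#include <stdint.h>

/* Parameters taken from the question statement. */
enum {
    CACHE_BYTES  = 8 * 1024,
    WAYS         = 2,
    BLOCK_BYTES  = 32,
    ADDRESS_BITS = 32
};

/* floor(log2(x)) for x >= 1. */
static int log2_int(unsigned x)
{
    int bits = 0;
    while (x > 1) {
        x >>= 1;
        bits++;
    }
    return bits;
}

int num_sets(void)    { return CACHE_BYTES / (WAYS * BLOCK_BYTES); }
int offset_bits(void) { return log2_int(BLOCK_BYTES); }
int index_bits(void)  { return log2_int((unsigned)num_sets()); }
int tag_bits(void)    { return ADDRESS_BITS - index_bits() - offset_bits(); }

/* Set that a given byte address maps to. */
unsigned set_index(uint32_t addr)
{
    return (addr / BLOCK_BYTES) % (unsigned)num_sets();
}

/* Hypothetical mirror of struct unicorn with 32-bit pointers. */
struct unicorn32 {
    int32_t  horn_length;
    uint32_t name;       /* stands in for char *name */
    uint32_t hooves[4];  /* stands in for struct hoof *hooves[4] */
};
```

With these parameters the cache has 128 sets, addresses split into a 20-bit tag, a 7-bit index, and a 5-bit block offset, and `sizeof(struct unicorn32)` is 24 bytes, so array elements do not line up with 32B block boundaries.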

Do not write here. Really.

Question 2: Cache Analysis (25 points)

Mark this box if you want this question graded.


It will replace your score for this question, even if lower.

Part (a) Compute the MINIMUM number of cache misses per outer-loop iteration that is possible for the
code on the previous page. Explain how you computed it and the assumptions you made! (15 points)

Part (b) Compute the MAXIMUM number of cache misses per outer-loop iteration that is possible for the
code on the previous page. Explain how you computed it and the assumptions you made! (10 points)

Question 3: Cache-aware Programming (20 points)


Mark this box if you want this question graded.
It will replace your score for this question, even if lower.
Rewrite the following code to optimize cache performance on a system with hardware stream prefetching
(i.e., if you are fetching sequentially, no software prefetching is necessary) and a single-level cache. The
cache is a 2-way set-associative 16KB cache with 32B blocks. Software prefetch syntax, if you choose to
use it, is shown below.

void __builtin_prefetch(const void *addr, unsigned rw, unsigned locality);

addr: the address of the memory to prefetch.
rw: (optional) a compile-time constant 1 or 0; 1: the program anticipates writing the data soon,
0: (default) in the near term, the program expects to only read the data.
locality: (optional) a compile-time constant from 0 to 3. 0: the data has no temporal locality, so need
not be left in the cache after the access, 3: (default) the data has a high degree of temporal locality
and should be retained in all levels of cache if possible. Use 0 or 3 based on the expected reuse.

a) (10 points)

#define N 5000

for (int i = 0 ; i < N ; i ++) {
    for (int j = 0 ; j < N ; j += 2) {
        A[i][j] = A[i][j+1];
        B[j][i] = B[j+1][i];
    }
}
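
To illustrate the prefetch syntax described above, here is a minimal sketch in C that is not part of the exam and is not a solution to part a). The function `column_sum` and the constant `PREFETCH_ROWS_AHEAD` are hypothetical, the prefetch distance of 8 rows is an arbitrary assumption rather than a tuned value, and `__builtin_prefetch` is a GCC/Clang builtin, so the sketch assumes one of those compilers.

```c
#include <stddef.h>

/* Hypothetical tuning knob: how many rows ahead to prefetch. */
#define PREFETCH_ROWS_AHEAD 8

/*
 * Sums column `col` of a row-major rows x cols matrix. Consecutive accesses
 * are cols * sizeof(double) bytes apart, so the access pattern is strided
 * rather than sequential.
 */
double column_sum(const double *m, size_t rows, size_t cols, size_t col)
{
    double sum = 0.0;
    for (size_t r = 0; r < rows; r++) {
        if (r + PREFETCH_ROWS_AHEAD < rows) {
            /* rw = 0: the data is only read.
             * locality = 0: each element is used once, so it need not
             * stay in the cache after the access. */
            __builtin_prefetch(&m[(r + PREFETCH_ROWS_AHEAD) * cols + col], 0, 0);
        }
        sum += m[r * cols + col];
    }
    return sum;
}
```

The prefetch is only a hint: the result is the same with or without it, and whether it helps depends on the stride, the prefetch distance, and the hardware.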

Question 3: Cache-aware Programming (20 points), cont.


b) (10 points)

#define N 5000
double A[N][N][N], B[N][N], C[N][N];

for (int i = 0 ; i < N ; i ++) {
    for (int j = 0 ; j < N ; j ++) {
        double temp = 0.0;
        for (int k = 0 ; k < N ; k ++) {
            temp += B[i][0] * A[k][j][i];
        }
        C[i][j] = temp;
    }
}
