Cse320 Final Exam Practice Solutions

This document provides practice questions for a CSE320 final exam on single-cycle and multi-cycle datapaths. It includes instructions to add to the datapaths, such as load/store and branch instructions. It also asks how to calculate clock cycle times and determine control signals when these new instructions are added. Short answer questions cover topics like pipelining, performance, and cache modifications.

Uploaded by

samad
CSE320 Final Exam Practice Questions

Single Cycle Datapath / Multi Cycle Datapath: Adding instructions
Modify the datapath and control signals to perform the new instructions in the corresponding datapath.
Use the minimal amount of additional hardware and clock cycles/control states.
Remember:

- When adding new instructions, don't break the operation of the standard ones.
- Avoid adding ALUs, adders, RegFiles, or memories to the datapath.
- You can add MUXes, logic gates, etc., but try to do so minimally (these cost in terms of area, cycle time, etc.).

a. Load Word Register (uses R instruction format)
   lwr Rt, Rd(Rs)          # Reg[Rt] = Mem[Reg[Rd] + Reg[Rs]]
b. Add 3 operands (new instruction format: opcode(6), rs(5), rt(5), rd(5), rx(5), (6 bits not used))
   add3 Rd, Rs, Rt, Rx     # Reg[Rd] = Reg[Rs] + Reg[Rt] + Reg[Rx]
c. Add to Memory (new instruction format: opcode(6), rs(5), rt(5), rd(5), offset(11))
   addm Rd, Rt, Offset(Rs) # Reg[Rd] = Reg[Rt] + Mem[sign-extended offset + Reg[Rs]]
d. Branch on Less Than or Equal (uses I instruction format)
   blez Rs, label          # if Reg[Rs] <= 0, PC = PC + 4 + (sign-extended offset << 2)
e. Branch Equal to Memory (new instruction format: opcode(6), rs(5), rt(5), rd(5), offset(11))
   beqm Rd, Rt, Offset(Rs) # if Reg[Rt] = Mem[Offset + Reg[Rs]], PC = PC + 4 + Reg[Rd]
f. Branch Equal to 0 to Immediate (uses R instruction format)
   beqzi (Rs), Label       # if Mem[Reg[Rs]] = 0, then PC = PC + (sign-extended offset)
   (NOTE: This is not PC + 4, and not shifted by 2)
g. Store Word and Increment
   swinc Rt, offset(Rs)    # Mem[Reg[Rs] + sign-extended offset] = Reg[Rt], Reg[Rs] = Reg[Rs] + 4
h. Store Word and Decrement
   swdec Rt, offset(Rs)    # Mem[Reg[Rs] + sign-extended offset] = Reg[Rt], Reg[Rs] = Reg[Rs] - 4
What if you were to add (g) and (h) simultaneously to the datapaths?
Datapath Timing
1. Calculate the delay in the modified datapaths when performing the instructions above. Assume the following delays:

- Memory: 200 ps
- Register file access (read/write): 50 ps
- ALU and adders: 100 ps
- Logic gates and multiplexors: 1 ps
- All other times are negligible

2. Calculate the minimal clock cycle time if all of the new instructions were added in the single-cycle and multi-cycle cases.

Other Datapath Questions
Given MIPS code, can you determine:

- What is happening at clock cycle X in the single-cycle datapath? Or in what cycle is operation X happening?
- What is happening at clock cycle X in the multi-cycle datapath? Or in what cycle is operation X happening?
- How many cycles will it take to execute the code?
- Can you identify the signals (control and values) in the datapath for a given clock cycle?
- And other questions of this nature.

Short Answer Misc Questions
1. What is the primary advantage of fixed-size opcodes?

Instruction decode is faster and more efficient. Control does not need to determine the length/position of the opcode in the instruction.

2. Will a speedup of 20 on 50% of a program result in an overall speedup of at least 2 times? Explain your answer.
The new overall speedup is calculated according to Amdahl's Law. For an overall speedup of 2, the new execution time must be 50% or less of the old execution time.
No. The new overall execution time = 50% + 50%/20 = 52.5% of the old, which is an overall speedup of 1/0.525, or about 1.9.
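The Amdahl's Law arithmetic in question 2 can be checked with a minimal Python sketch. The function name `amdahl_speedup` is hypothetical and used only for illustration; the inputs (50% of the program, local speedup of 20) come from the question.

```python
def amdahl_speedup(fraction: float, factor: float) -> float:
    """Overall speedup when `fraction` of the run time is sped up by `factor`."""
    # New time relative to the old time: untouched part + accelerated part.
    new_time = (1.0 - fraction) + fraction / factor
    return 1.0 / new_time

overall = amdahl_speedup(0.5, 20)
print(f"relative time = {1 / overall:.3f}, speedup = {overall:.3f}")
```

Even as the local speedup grows without bound, the overall speedup only approaches 2 and never reaches it, because the untouched 50% still takes its full time.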

3. What are the 5 components of a modern computer system? (Hint: two of them can be combined and called the processor.)
Datapath + control (= processor), memory, input, output
4. What is a stored-program computer?
A computer in which the instructions of the program are stored in memory; the CPU is assigned the task of fetching the instructions from memory, decoding them, and executing them.

5. True or False:
- Program execution time increases when the instruction count (IC) increases. TRUE
- In a load/store architecture, the only instructions that access memory are load and store types. TRUE
- More powerful instructions lead to higher performance since the total number of instructions executed is smaller for a given task with more powerful instructions. FALSE
- An add operation has 3 operands (2 input and 1 output), therefore add instructions must be 3-address instructions. FALSE
6. In a system executing jobs, when is throughput = 1/latency?
Throughput of a machine is the number of instructions executed per second. Latency is the length of time taken to execute one instruction.

Throughput = 1/latency when a system is executing one task at a time, e.g. in a single-cycle or multi-cycle datapath.

Pipelining increases throughput because multiple instructions are executing simultaneously. The average time between instruction completions is therefore shorter than the latency of a single instruction, so throughput is greater than 1/latency.

7. What are the advantages and disadvantages of write-through and write-back cache modifications in shared-memory systems?
Write-through will slow the system down, taking more time for each write. However, with a write-back cache there may be data contention, since multiple references could be referencing the data while it is dirty.

Pipelining

1. What are the main benefits and disadvantages of pipelining?
2. Name the types of pipelining hazards. Define how and when they can occur in systems (in general). Define how/when they occur in MIPS. Give a MIPS datapath or code example of each type.
3. Many processors have 5- or 6-stage pipelines. A typical value for the CPI (cycles per instruction) in such processors is in the range of 1.0 to 1.5. Does it mean that the latency of execution of most instructions is 1 or 2 clock cycles? Why, or why not?
4. Why do conditional branches impact the performance of a pipelined implementation?
5. Briefly describe 2 solutions to reduce the performance impact of conditional branch instructions in a pipelined implementation.
6. Given sequences of instructions, determine the forwarding paths and required stalls.

Performance

1. The computer spends 82% of the time computing and 18% waiting for the disk. The instruction mix and the average cycles per instruction (CPI) for each type are:

Type     Instruction %   CPI
int      40%             1
FP       30%             5
Other    30%             2

a. Consider 3 modifications to the computer. Compute the speedup for each.
i. The processor is replaced with a new one that reduces the total computation time by 35%.
Speedup = 1 / ((1 - 0.82) + 0.82 * 0.65) = 1.40

ii. The disk is replaced with a solid-state device that reduces the disk waiting time by 85%.
Speedup = 1 / ((1 - 0.18) + 0.18 * 0.15) = 1.18

iii. The processor is replaced with a new one that has improved floating-point performance. The average floating-point CPI is reduced to 3; all other aspects are unchanged.

Average CPI (old) = 0.40 * 1 + 0.30 * 5 + 0.30 * 2 = 2.5
Average CPI (enhanced) = 0.40 * 1 + 0.30 * 3 + 0.30 * 2 = 1.9
Speedup (computation) = 2.5 / 1.9 = 1.315
Speedup = 1 / ((1 - 0.82) + 0.82 / 1.315) = 1.244

b. Which modification gave the best speedup?

Modification (i) provides the best speedup.

c. For the two modifications in part (a) that did not result in the best speedup, is it possible for them to achieve the speedup achieved by the best modification, (i)? Show your work and explain your answer.
Modification (ii), with an infinitely fast disk: Speedup = 1 / (1 - 0.18) = 1.22, which is still slower than (i).
Modification (iii), if FP takes only 1 clock cycle:
Average CPI (enhanced) = 0.40 * 1 + 0.30 * 1 + 0.30 * 2 = 1.3
Speedup (computation) = 2.5 / 1.3 = 1.92
Speedup = 1 / ((1 - 0.82) + 0.82 / 1.92) = 1.647, which is faster than (i).
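The speedups in problem 1 can be recomputed with a short Python sketch. The helper names (`overall_speedup`, `average_cpi`) are hypothetical; the CPI values 1, 5 and 2 are the ones used in the worked average-CPI lines.

```python
COMPUTE, DISK = 0.82, 0.18                      # fractions of total time
MIX = {"int": 0.40, "FP": 0.30, "Other": 0.30}  # instruction mix
CPI = {"int": 1, "FP": 5, "Other": 2}           # baseline CPI per type

def overall_speedup(fraction, local_speedup):
    """Amdahl's Law: `fraction` of the time is sped up by `local_speedup`."""
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)

def average_cpi(cpi):
    return sum(MIX[kind] * cpi[kind] for kind in MIX)

def fp_speedup(new_fp_cpi):
    """Overall speedup when only the FP CPI changes."""
    enhanced = average_cpi({**CPI, "FP": new_fp_cpi})
    return overall_speedup(COMPUTE, average_cpi(CPI) / enhanced)

s_i = overall_speedup(COMPUTE, 1 / 0.65)    # computation time reduced by 35%
s_ii = overall_speedup(DISK, 1 / 0.15)      # disk wait reduced by 85%
s_iii = fp_speedup(3)                       # FP CPI reduced from 5 to 3
limit_ii = 1.0 / (1.0 - DISK)               # infinitely fast disk
limit_iii = fp_speedup(1)                   # FP CPI of 1

for name, value in [("i", s_i), ("ii", s_ii), ("iii", s_iii),
                    ("ii limit", limit_ii), ("iii limit", limit_iii)]:
    print(f"{name}: {value:.3f}")
```

Because the sketch does not round intermediate values, the results for (iii) differ slightly in the third decimal place from the hand calculation, which rounds the computation speedup before substituting it.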

2. You have two RiSC-16 processors X and Z, with the following characteristics. They are both multi-cycle processors, in which an instruction executes in a variable number of processor cycles. X and Z execute variations on the same instruction set (RiSC) in the following way:
a. Processor X implements the base instruction set, including LUI. Processor X implements multiplication in software, meaning there is no MULT instruction.
b. Processor Z eliminates the LUI instruction in favor of a MULT instruction, getting LUI functionality from LW.
c. Processor Z's MULT instruction uses the ALU over and over again in a loop, performing shifts and conditional adds, and requires 80 processor cycles per multiply.
d. Executing one MULT instruction on Processor Z eliminates on average 30 instructions that would be executed on Processor X when implemented in software. However, Processor Z then needs additional ALU functionality, which increases the ALU's critical path from 10 ns to 12 ns.
Also, assume the following:
- Cache read/write: 10 ns
- Register file read/write: 8 ns
- ALU operation: 10 ns for processor X, 12 ns for processor Z

Assume the following distribution of instruction types (assume that LUI requires 3 cycles):

              MULT   LUI   LW    SW    R-Type   BEQ
Processor X   0%     5%    20%   10%   45%      20%
Processor Z   5%     0%    25%   10%   40%      20%

For example, processor Z executes 5 MULT instructions out of every 100. For each MULT instruction, processor X executes an additional 30 instructions.
a. Compare the execution times of the two processors.
Execution time = T * I * C = Cycle time * Instruction count * Average CPI
Exec time (X) = Tx * Ix * Cx
Exec time (Z) = Tz * Iz * Cz
Tx = 10 ns; Tz = 12 ns

If Iz = 100, Ix = 95 + (5 * 30) = 245
CPI for each instruction type:
MULT = 80 cycles, LUI = 3, LW = 5, SW = 4, R-Type = 4, BEQ = 3

Therefore:
Average CPI for X = Cx = (0.05 * 3) + (0.2 * 5) + (0.1 * 4) + (0.45 * 4) + (0.2 * 3) = 3.95
Average CPI for Z = Cz = (0.05 * 80) + (0.25 * 5) + (0.1 * 4) + (0.4 * 4) + (0.2 * 3) = 7.85

Comparing execution times:
Exec time (X) = 10 ns/cycle * 3.95 cycles/instr * 245 instr = 9677.5 ns
Exec time (Z) = 12 ns/cycle * 7.85 cycles/instr * 100 instr = 9420 ns

Processor Z with the multiply instruction is about 1.03 times faster than processor X for this instruction mix.
b. At what clock speed for processor Z are the two designs equal in performance?

Equating the two execution times and solving for Tz:
10 ns * 3.95 cpi * 245 instructions = Tz * 7.85 cpi * 100 instructions
Tz = (10 ns * 3.95 * 245) / (7.85 * 100)
Tz = 12.33 ns

For smaller Tz (faster clock), processor Z has better performance; for larger Tz (slower clock), processor X has better performance.
c. (more difficult) Assuming the original ALU latency for processor Z (12 ns), how fast would your software-emulated multiply have to be (on average) for processor X to be just as fast as processor Z? In other words, how many instructions would processor X execute in place of 1 MULT?
Remember that Ix was defined to be 95 + (# of multiplications) * (cost of each).
We first need to find the instruction count of processor X necessary for equal performance.
10 ns * 3.95 * Ix = 12 ns * 7.85 * 100
which implies Ix = 238.5
Total number of multiply-emulation instructions is (238.5 - 95) = 143.5
Therefore number of instructions per multiply = 143.5 / 5 = 28.7, i.e. about 28 instructions
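The three parts of problem 2 follow from one formula, execution time = cycle time * instruction count * average CPI. A minimal Python sketch of the calculation is below; the variable and function names are hypothetical, and the figures are the ones stated in the problem.

```python
CYCLES = {"MULT": 80, "LUI": 3, "LW": 5, "SW": 4, "RTYPE": 4, "BEQ": 3}
MIX_X = {"LUI": 0.05, "LW": 0.20, "SW": 0.10, "RTYPE": 0.45, "BEQ": 0.20}
MIX_Z = {"MULT": 0.05, "LW": 0.25, "SW": 0.10, "RTYPE": 0.40, "BEQ": 0.20}

def average_cpi(mix):
    return sum(share * CYCLES[kind] for kind, share in mix.items())

def exec_time_ns(cycle_ns, instructions, cpi):
    return cycle_ns * instructions * cpi

cx, cz = average_cpi(MIX_X), average_cpi(MIX_Z)

# (a) Z runs 100 instructions; X replaces each of the 5 MULTs with 30 more.
time_x = exec_time_ns(10, 95 + 5 * 30, cx)
time_z = exec_time_ns(12, 100, cz)

# (b) Cycle time of Z at which both designs take the same time.
break_even_tz = time_x / (cz * 100)

# (c) Instruction count of X that matches Z, then instructions per multiply.
ix_equal = time_z / (10 * cx)
per_multiply = (ix_equal - 95) / 5

print(f"X: {time_x:.1f} ns, Z: {time_z:.1f} ns, ratio {time_x / time_z:.3f}")
print(f"break-even Tz: {break_even_tz:.2f} ns")
print(f"instructions per multiply: {per_multiply:.1f}")
```

Part (c) keeps the average CPI of X fixed at 3.95 while the instruction count changes, which is the same simplification the hand calculation makes.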

3. Two important parameters control the performance of a processor: cycle time and cycles per instruction. There is an enduring tradeoff between these two parameters in the design process of microprocessors. While some designers prefer to increase the processor frequency at the expense of large CPI, other designers follow a different school of thought in which reducing the CPI comes at the expense of lower processor frequency. Consider the following machines, and compare their performance using the following instruction mix: 25% loads, 13% stores, 47% ALU instructions, and 15% branches/jumps. Assume the unmodified multi-cycle datapath and finite state machine.
M1: The multi-cycle datapath is designed with a 1 GHz clock.
M2: A machine like M1 except that register updates are done in the same clock cycle as a memory read or ALU operation. Thus in the finite state machine, states 6 and 7 and states 3 and 4 are combined. This machine has a 3.2 GHz clock, since the register update increases the length of the critical path.
M3: A machine like M2 except that effective address calculations are done in the same clock cycle as a memory access. Thus states 2, 3, and 4 can be combined, as can 2 and 5, as well as 6 and 7. This machine has a 2.8 GHz clock because of the long cycle created by combining address calculation and memory access.
Find out which of the machines is fastest. Are there instruction mixes that would make another machine faster, and if so, what are they?
In the original multi-cycle datapath the CPI for each instruction is as follows:

- Loads: 5 cycles
- Stores: 4 cycles
- ALU: 4 cycles
- Branch/Jumps: 3 cycles

Performance M1:
Average CPI = 0.25 * 5 + 0.13 * 4 + 0.47 * 4 + 0.15 * 3 = 4.1
Execution time = (CPI * # instructions) / clock rate = 4.1 I / 1 GHz = 4.1 I * 10^-9 seconds

Performance M2:
- Loads shorten to 4 cycles
- ALUs shorten to 3 cycles
Average CPI = 0.25 * 4 + 0.13 * 4 + 0.47 * 3 + 0.15 * 3 = 3.38
Execution time = (CPI * # instructions) / clock rate = 3.38 I / 3.2 GHz = 1.06 I * 10^-9 seconds

Performance M3:
- Loads shorten to 3 cycles
- Stores shorten to 3 cycles
- ALUs shorten to 3 cycles
Average CPI = 0.25 * 3 + 0.13 * 3 + 0.47 * 3 + 0.15 * 3 = 3
Execution time = (CPI * # instructions) / clock rate = 3 I / 2.8 GHz = 1.07 I * 10^-9 seconds

M2 is fastest.

M1 can never be faster than M2: even if all the instructions are branch instructions, the CPI will be 3 in all three cases, and the clock rate is faster on the other two processors.

M3 can be faster than M2 if all instructions are loads, or all are stores.

Ex (all loads):
M2: Average CPI = 1 * 4 + 0 * 4 + 0 * 3 + 0 * 3 = 4
M3: Average CPI = 1 * 3 + 0 * 3 + 0 * 3 + 0 * 3 = 3
M2 Execution time = (CPI * # instructions) / clock rate = 4 I / 3.2 GHz = 1.25 I * 10^-9 seconds
M3 Execution time = (CPI * # instructions) / clock rate = 3 I / 2.8 GHz = 1.07 I * 10^-9 seconds
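The comparison of M1, M2 and M3 reduces to average CPI divided by clock rate, which gives nanoseconds per instruction when the clock is in GHz. A minimal Python sketch, with hypothetical names and the clock rates and cycle counts taken from the problem and its solution:

```python
MIX = {"load": 0.25, "store": 0.13, "alu": 0.47, "branch": 0.15}

# machine -> (clock rate in GHz, cycles per instruction class)
MACHINES = {
    "M1": (1.0, {"load": 5, "store": 4, "alu": 4, "branch": 3}),
    "M2": (3.2, {"load": 4, "store": 4, "alu": 3, "branch": 3}),
    "M3": (2.8, {"load": 3, "store": 3, "alu": 3, "branch": 3}),
}

def ns_per_instruction(machine, mix=MIX):
    ghz, cycles = MACHINES[machine]
    cpi = sum(share * cycles[kind] for kind, share in mix.items())
    return cpi / ghz  # cycles / (cycles per ns) = ns per instruction

for name in MACHINES:
    print(name, round(ns_per_instruction(name), 3))

fastest = min(MACHINES, key=ns_per_instruction)

# A mix made only of loads favors M3, whose loads take 3 cycles.
all_loads = {"load": 1.0, "store": 0.0, "alu": 0.0, "branch": 0.0}
m2_loads = ns_per_instruction("M2", all_loads)
m3_loads = ns_per_instruction("M3", all_loads)
```

Passing a different `mix` dictionary makes it easy to explore which instruction mixes move the advantage from M2 to M3.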

Review Adder and ALU Creation and Building larger ALUs from units
Consider the 4-bit ALU below, which can perform the following 5 operations: add, sub, AND, OR and negate B.
Inputs are A = {A3,A2,A1,A0}, B = {B3,B2,B1,B0}, and Cin. Outputs are result R = {R3,R2,R1,R0} and Cout. Numbers are in 2's complement form. Fill in the table below with what the values of the control signals should be for each operation. Indicate don't-cares where appropriate.

Operation   M1   M0   Cin   BINV   Az
Add         0    0    0     0      1
Sub         0    0    1     1      1
OR          1    0    X     0      1
AND         0    1    X     0      1
Negate B    0    0    0     0      1

Digital Logic
1. Using Boolean algebra, prove the following:

a. bd' + cd = ((b'd') + (c'd))'
   bd' + cd = (b'd')'(c'd)'                  DeMorgan's
   bd' + cd = (b + d)(c + d')                DeMorgan's
   bd' + cd = bc + cd + bd' + dd'            Distributive
   bd' + cd = bc + cd + bd' + 0              Complementary
   bd' + cd = bc(d + d') + cd + bd'          Null
   bd' + cd = bcd + bcd' + cd + bd'          Commutative
   bd' + cd = cd(1 + b) + bd'(c + 1)         Distributive
   bd' + cd = bd' + cd                       Null

b. abc + bcd + a'bd = abc + a'bd
   abc + bcd(a + a') + a'bd = abc + a'bd     Null
   abc + abcd + a'bcd + a'bd = abc + a'bd    Distributive
   abc(1 + d) + a'bd(c + 1) = abc + a'bd     Commutative
   abc + a'bd = abc + a'bd                   Null

c. a + a'(ab + b'c')' = a + b + c
   a + a'((ab)'(b'c')') = a + b + c          DeMorgan's
   a + a'((a' + b')(b + c)) = a + b + c      DeMorgan's
   a + a'(a'b + b'b + a'c + b'c) = a + b + c Distributive
   a + a'(a'b + 0 + a'c + b'c) = a + b + c   Complementary
   a + a'(a'b + a'c + b'c) = a + b + c
   a + a'a'b + a'a'c + a'b'c = a + b + c     Distributive
   a + a'b + a'c + a'b'c = a + b + c         Idempotence (twice)
   a + a'(b + c + b'c) = a + b + c
   a + b + c + b'c = a + b + c               No name (a + a'x = a + x)
   a + b + c(1 + b') = a + b + c
   a + b + c = a + b + c                     Null
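An algebraic proof can be sanity-checked by exhaustive evaluation, since three variables give only eight cases. The Python sketch below does this for the three identities. The placement of the complements is an assumption of this sketch (one assignment that makes every step of the derivations hold), and the helper names `NOT` and `equivalent` are hypothetical.

```python
from itertools import product

def NOT(x):
    return 1 - x

def equivalent(f, g, nvars):
    """True if f and g agree on every 0/1 assignment of nvars inputs."""
    return all(f(*v) == g(*v) for v in product((0, 1), repeat=nvars))

# (a) bd' + cd = ((b'd') + (c'd))'   [assumed complement placement]
a_ok = equivalent(
    lambda b, c, d: (b & NOT(d)) | (c & d),
    lambda b, c, d: NOT((NOT(b) & NOT(d)) | (NOT(c) & d)),
    3,
)

# (b) abc + bcd + a'bd = abc + a'bd  (consensus theorem on a)
b_ok = equivalent(
    lambda a, b, c, d: (a & b & c) | (b & c & d) | (NOT(a) & b & d),
    lambda a, b, c, d: (a & b & c) | (NOT(a) & b & d),
    4,
)

# (c) a + a'(ab + b'c')' = a + b + c
c_ok = equivalent(
    lambda a, b, c: a | (NOT(a) & NOT((a & b) | (NOT(b) & NOT(c)))),
    lambda a, b, c: a | b | c,
    3,
)

print(a_ok, b_ok, c_ok)
```

A brute-force check confirms that an identity is true but does not replace the step-by-step proof the question asks for.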

2. Consider the following function: z(x3,x2,x1,x0) = x3'x2 + x3x1x0 + x3'x2x0 + x3x2x0'
a. How many literals does z contain?
11
b. Is z minimal? If not, find the minimal expression using Boolean algebra.
No.

x3'x2 + x3x1x0 + x3'x2x0 + x3x2x0'
x3'x2(1 + x0) + x3x1x0 + x3x2x0'
x3'x2(1 + x0') + x3x1x0 + x3x2x0'
x3'x2 + x3'x2x0' + x3x1x0 + x3x2x0'
x3'x2 + x3x1x0 + x2x0'

c. Find the equivalent sum of minterms (SOP) for z (using m notation)

z(x3,x2,x1,x0) = Σm(4,5,6,7,11,12,14,15)

d. Find the equivalent product of maxterms (POS) for z (using M notation)

z(x3,x2,x1,x0) = ΠM(0,1,2,3,8,9,10,13)
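The minterm and maxterm lists can be generated by evaluating z on all 16 input combinations. In the Python sketch below, the complement placement in `z` is an assumption (x3'x2 + x3x1x0 + x3'x2x0 + x3x2x0'), chosen because it has 11 literals and reproduces the minterm list given in part (c); `z_min` is the corresponding simplified form.

```python
from itertools import product

def n(v):
    """Complement of a 0/1 value."""
    return 1 - v

def z(x3, x2, x1, x0):
    # Assumed form: x3'x2 + x3x1x0 + x3'x2x0 + x3x2x0'
    return (n(x3) & x2) | (x3 & x1 & x0) | (n(x3) & x2 & x0) | (x3 & x2 & n(x0))

def z_min(x3, x2, x1, x0):
    # Simplified form: x3'x2 + x3x1x0 + x2x0'
    return (n(x3) & x2) | (x3 & x1 & x0) | (x2 & n(x0))

# product() yields inputs in counting order, so the index is the minterm number.
rows = list(product((0, 1), repeat=4))
minterms = [i for i, bits in enumerate(rows) if z(*bits)]
maxterms = [i for i, bits in enumerate(rows) if not z(*bits)]
same = all(z(*bits) == z_min(*bits) for bits in rows)

print("minterms:", minterms)
print("maxterms:", maxterms)
print("simplified form equivalent:", same)
```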

3. You were recently hired as an engineer in a company that designs alarm systems custom-made to meet the customer's specifications. You are asked to design a system that uses the inputs of three sensors A, B, and C. The alarm should go off (be activated) when any of the following criteria is met:

- When A is off, or
- When B is on and C is off, or
- When both A and C are on.

a. Write the truth table for the function.

A  B  C  | Alarm
0  0  0  | 1
0  0  1  | 1
0  1  0  | 1
0  1  1  | 1
1  0  0  | 0
1  0  1  | 1
1  1  0  | 1
1  1  1  | 1

b. Write the Boolean expression in Product of Sums (POS) form.
A' + B + C
c. Draw/implement the function using 2-selector MUX gates.

d. Draw/implement the function using 2-level NAND-NAND gates.

The straightforward solution using the minterms/SOP expression:

A better solution using DeMorgan's law: A' + B + C = ((A' + B + C)')' = (AB'C')'
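The alarm function has only eight input combinations, so the three forms can be compared directly. The Python sketch below encodes the stated criteria, the single-maxterm POS form A' + B + C that follows from them, and the NAND form (AB'C')'; the function names are hypothetical.

```python
from itertools import product

def alarm(a, b, c):
    """The three criteria as stated: A off, or B on and C off, or A and C on."""
    return int((not a) or (b and not c) or (a and c))

def alarm_pos(a, b, c):
    """Product-of-sums form with a single maxterm: A' + B + C."""
    return int((not a) or b or c)

def alarm_nand(a, b, c):
    """DeMorgan form: (A B' C')', i.e. one NAND of A, B', C'."""
    return int(not (a and (not b) and (not c)))

table = {bits: alarm(*bits) for bits in product((0, 1), repeat=3)}
for (a, b, c), out in table.items():
    print(a, b, c, "|", out)
```

The only row where the alarm stays off is A on with B and C off, which is why a single maxterm is enough.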

4. Find the minimal 2-level implementation using NOR-NOR gates of a system with two 2-bit inputs (A = {a1,a0} and B = {b1,b0}) which outputs the following: if A+B is even, then the output is their product; if A+B is odd, then the output is their sum.

a1 a0 b1 b0 | A+B (decimal) | Z (decimal) | Z3 Z2 Z1 Z0
0  0  0  0  | 0             | 0           | 0  0  0  0
0  0  0  1  | 1             | 1           | 0  0  0  1
0  0  1  0  | 2             | 0           | 0  0  0  0
0  0  1  1  | 3             | 3           | 0  0  1  1
0  1  0  0  | 1             | 1           | 0  0  0  1
0  1  0  1  | 2             | 1           | 0  0  0  1
0  1  1  0  | 3             | 3           | 0  0  1  1
0  1  1  1  | 4             | 3           | 0  0  1  1
1  0  0  0  | 2             | 0           | 0  0  0  0
1  0  0  1  | 3             | 3           | 0  0  1  1
1  0  1  0  | 4             | 4           | 0  1  0  0
1  0  1  1  | 5             | 5           | 0  1  0  1
1  1  0  0  | 3             | 3           | 0  0  1  1
1  1  0  1  | 4             | 3           | 0  0  1  1
1  1  1  0  | 5             | 5           | 0  1  0  1
1  1  1  1  | 6             | 9           | 1  0  0  1

Z3 = a1a0b1b0
Z2 = a1a0'b1 + a1a0b1b0' = a1a0'b1 + a1b1b0'
Z1 = a1'b1b0 + a1'a0b1 + a1a0b1' + a1b1'b0
Z0 = a0 + b0
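The truth table for problem 4 can be generated from the stated rule and used to check candidate output equations. In the Python sketch below, `spec` encodes the rule (product when A+B is even, sum when it is odd); the sum-of-products expressions in `sop` are one set derived here from that rule and are an assumption of this sketch, not the NOR-NOR form the question ultimately asks for.

```python
from itertools import product

def spec(a, b):
    """Product when A+B is even, sum when A+B is odd."""
    return a * b if (a + b) % 2 == 0 else a + b

def n(v):
    """Complement of a 0/1 value."""
    return 1 - v

def sop(a1, a0, b1, b0):
    """Candidate sum-of-products equations for Z3..Z0 (derived from spec)."""
    z3 = a1 & a0 & b1 & b0
    z2 = (a1 & n(a0) & b1) | (a1 & b1 & n(b0))
    z1 = ((n(a1) & b1 & b0) | (n(a1) & a0 & b1)
          | (a1 & a0 & n(b1)) | (a1 & n(b1) & b0))
    z0 = a0 | b0
    return 8 * z3 + 4 * z2 + 2 * z1 + z0

mismatches = []
for a1, a0, b1, b0 in product((0, 1), repeat=4):
    a, b = 2 * a1 + a0, 2 * b1 + b0
    expected = spec(a, b)
    print(a1, a0, b1, b0, "|", a + b, "|", expected, "|", format(expected, "04b"))
    if sop(a1, a0, b1, b0) != expected:
        mismatches.append((a1, a0, b1, b0))
```

An empty `mismatches` list means the candidate equations agree with the rule on all 16 rows; a NOR-NOR implementation would start from the zeros of each output instead.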

5. Implement the functionality of a 2-input decoder using minimal AND, OR and NOT gates.
Decoders take a binary number and map this value to an output line. 2 input values means 4 different values (4 outputs).

S1 S0 | F3 F2 F1 F0
0  0  | 0  0  0  1
0  1  | 0  0  1  0
1  0  | 0  1  0  0
1  1  | 1  0  0  0
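Each decoder output is 1 on exactly one row of the truth table, so each is a single AND of the two select inputs or their complements. A minimal Python sketch of that gate-level reading (two NOT gates, four AND gates) follows; the function name `decode2` is hypothetical.

```python
def decode2(s1, s0):
    """2-to-4 decoder: returns (F3, F2, F1, F0) for select inputs S1, S0."""
    n1, n0 = 1 - s1, 1 - s0   # two NOT gates
    f0 = n1 & n0              # F0 = S1' S0'
    f1 = n1 & s0              # F1 = S1' S0
    f2 = s1 & n0              # F2 = S1  S0'
    f3 = s1 & s0              # F3 = S1  S0
    return f3, f2, f1, f0

for s1 in (0, 1):
    for s0 in (0, 1):
        print(s1, s0, "|", *decode2(s1, s0))
```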

6. Implement a 7-segment controller (truth table, Boolean expressions, gate logic). Practice with any combination of logic units.

[Figure: 7-segment controller block with inputs q0, q1, q2, q3 and outputs A, B, C, D, E, F, G]

7. Implement the 7-segment controller using a 4-selector DEMUX and OR gates.
