Module 3.3 - Problems On Performance

Here are the steps to solve this problem: 1) R1 clock rate = 2 GHz = 2 billion cycles/second 2) R1 instruction time = 1 billion instructions / 2 GHz = 0.5 seconds 3) R2 clock rate = 1 GHz = 1 billion cycles/second 4) R2 CPI = 1.5 5) R2 instruction time = 1 billion instructions * 1.5 CPI / 1 GHz = 1.5 seconds 6) R1 is faster than R2. 7) R1 takes 0.5 seconds while R2 takes 1.5 seconds to execute the same program. 8) Therefore, R1 is 3 times (1.5/

Uploaded by

Kshitiz Sharma

Performance of Processor

Processor Performance - Terminologies


 Clock Rate CR
 Cycle Count CC
 Cycle Time CT
 Cycles Per Instruction CPI
 Instruction Count IC
 Million Instructions Per Second MIPS
 Millions of Floating-point Operations Per Second MFLOPS
Machine Clock Rate

 Clock Rate (CR), measured in MHz or GHz, is the inverse of the clock Cycle Time (CT), also called the clock period
CT = 1 / CR



10 nsec clock cycle => 100 MHz clock rate
5 nsec clock cycle => 200 MHz clock rate
2 nsec clock cycle => 500 MHz clock rate
1 nsec clock cycle => 1 GHz clock rate
500 psec clock cycle => 2 GHz clock rate
250 psec clock cycle => 4 GHz clock rate
200 psec clock cycle => 5 GHz clock rate
Performance of computer
 A clock is used to synchronize the operation of a unit.
 Clock cycle: one of the discrete time intervals at which events happen in a computer system.
 The length of each clock cycle is called the clock period.
 A clock period is also called a tick or clock tick.


Computer Clock
 The clock rate is the inverse of the clock cycle time.

 i.e., Clock Rate = 1 / Clock Cycle Time

 The clock cycle time is the amount of time for one clock period to elapse
(e.g. 5 ns).
 Question

 If a computer has a clock cycle time of 5 ns, what is the clock rate?
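The question above can be checked directly from Clock Rate = 1 / Clock Cycle Time. This is a minimal sketch; the helper name `clock_rate_hz` is an illustrative assumption, not something defined in the slides.

```python
def clock_rate_hz(cycle_time_s: float) -> float:
    """Return the clock rate in Hz for a clock cycle time given in seconds."""
    return 1.0 / cycle_time_s


rate = clock_rate_hz(5e-9)       # 5 ns clock cycle
print(f"{rate / 1e6:.0f} MHz")   # prints "200 MHz"
```

The result agrees with the table entry "5 nsec clock cycle => 200 MHz clock rate".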

Performance metrics of a computer system
 Execution time: the time taken to finish a task
 CPU execution time is the sum of user CPU time and system CPU time
 Throughput: the total quantity of work completed in a specific period of time
Processor Performance Metrics
 Execution time: It is the time taken to finish a task
 Response time: the time between the start and the completion of a task (in time
units)
 Throughput: the total amount of tasks done in a given time period (in number of
tasks per unit of time)

 Example: car assembly factory
 4 hours to produce a car (response time)
 6 cars produced per hour (throughput)
Defining (Speed) Performance
 Normally interested in reducing
 Response time (execution time) – the time between the start and the completion of a task
 Important to individual users
 Thus, to maximize performance, need to minimize execution time

performanceX = 1 / execution_timeX

If X is n times faster than Y, then

performanceX / performanceY = execution_timeY / execution_timeX = n
How many times faster is machine A?
 Problem:
 machine A runs a program in 20 seconds
 machine B runs the same program in 25 seconds
 how many times faster is machine A than machine B?
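Applying the ratio n = execution_timeB / execution_timeA to this problem gives the answer. The sketch below uses a hypothetical helper, `times_faster`, introduced only for illustration.

```python
def times_faster(time_slow_s: float, time_fast_s: float) -> float:
    """How many times faster the quicker machine is (ratio of execution times)."""
    return time_slow_s / time_fast_s


n = times_faster(25, 20)   # machine B takes 25 s, machine A takes 20 s
print(n)                   # prints 1.25
```

Machine A is therefore 1.25 times faster than machine B on this program.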

Performance Factors
 Want to distinguish elapsed time and the time spent on our task
 CPU execution time (CPU time) – time the CPU spends working on a task
 Does not include time waiting for I/O or running other programs

CPU execution time for a program = # CPU clock cycles for a program x clock cycle time
or
CPU execution time for a program = # CPU clock cycles for a program / clock rate
 Can improve performance by reducing either the length of the clock cycle or
the number of clock cycles required for a program
Performance Equation
 Our basic performance equation is then

CPU time = Instruction_count x CPI x clock_cycle

or

CPU time = (Instruction_count x CPI) / clock_rate
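A small sketch can confirm that the two forms of the performance equation agree. The function names and the sample numbers (2 million instructions, CPI of 2.5, 100 MHz clock) are made up for illustration and do not come from the slides.

```python
def cpu_time_from_rate(instruction_count: float, cpi: float, clock_rate_hz: float) -> float:
    """CPU time = (Instruction_count x CPI) / clock_rate, in seconds."""
    return instruction_count * cpi / clock_rate_hz


def cpu_time_from_cycle(instruction_count: float, cpi: float, clock_cycle_s: float) -> float:
    """CPU time = Instruction_count x CPI x clock_cycle, in seconds."""
    return instruction_count * cpi * clock_cycle_s


# Hypothetical workload: 2 million instructions, CPI 2.5, 100 MHz clock (10 ns cycle).
t_rate = cpu_time_from_rate(2e6, 2.5, 100e6)
t_cycle = cpu_time_from_cycle(2e6, 2.5, 10e-9)
print(f"{t_rate:.3f} s, {t_cycle:.3f} s")   # both forms give 0.050 s
```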
Factors that affect Performance
 These equations separate the three key factors that affect
performance
 Can measure the CPU execution time by running the program
 The clock rate is usually given
 Can measure overall instruction count by using profilers/ simulators
without knowing all of the implementation details
 CPI varies by instruction type and ISA implementation for which we
must know the implementation details
Cycles Per Instruction (CPI)

 Computing the CPI is done by looking at the different types of instructions and their individual cycle counts

CPI = Σ (i = 1 to n) (CPIi x ICi)

 Where ICi is the fraction (percentage) of executed instructions that belong to class i
 CPIi is the (average) number of clock cycles per instruction for that instruction class
 n is the number of instruction classes
Effective CPI
 The overall effective CPI for an application can then be calculated as

Effective CPI = ( Σ (i = 1 to n) (CPIi x ICi) ) / ICtotal

 Where ICi is the count of instructions of class i executed
 ICtotal is the total number of machine instructions executed
 CPIi is the (average) number of clock cycles per instruction for that instruction class
 n is the number of instruction classes
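The definitions above describe a weighted average: sum CPIi x ICi over all instruction classes and divide by ICtotal. The sketch below assumes a made-up instruction mix (class names, counts and cycle counts are hypothetical) purely to show the arithmetic.

```python
def effective_cpi(classes: list[tuple[int, float]]) -> float:
    """Weighted-average CPI from (instruction_count_i, cpi_i) pairs."""
    total_instructions = sum(count for count, _ in classes)
    total_cycles = sum(count * cpi for count, cpi in classes)
    return total_cycles / total_instructions


# Hypothetical mix: (ICi, CPIi) for ALU, load/store and branch instructions.
mix = [(500, 1.0), (300, 2.0), (200, 3.0)]
cpi = effective_cpi(mix)
print(f"{cpi:.2f}")   # (500*1 + 300*2 + 200*3) / 1000 = 1.70
```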
F - Frequency
T - Processor Execution Time
Ic - Instruction Count
MIPS: millions of instructions per second
 For example, a program that executes 3 million instructions in 2 seconds has a MIPS rating of 1.5
 Advantage: Easy to understand and measure
 Disadvantage: May not reflect actual performance, since programs made of simple instructions score higher
MFLOPS: millions of floating-point operations per second
 For example, a program that executes 4 million floating-point instructions in 5 seconds has an MFLOPS rating of 0.8
 Advantage: Easy to understand and measure
 Disadvantage: Same as MIPS, and it measures only floating-point operations
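Both ratings are a count divided by the execution time and by one million. The sketch below reproduces the two slide examples; the function name `millions_per_second` is an assumption made for illustration.

```python
def millions_per_second(count: float, execution_time_s: float) -> float:
    """Rate in millions per second: count / (execution time x 10^6)."""
    return count / (execution_time_s * 1e6)


mips = millions_per_second(3e6, 2)     # 3 million instructions in 2 seconds
mflops = millions_per_second(4e6, 5)   # 4 million floating-point operations in 5 seconds
print(f"MIPS = {mips:.1f}, MFLOPS = {mflops:.1f}")   # MIPS = 1.5, MFLOPS = 0.8
```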
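Amdahl’s law uses two factors of an enhancement
 Fraction enhanced: the fraction of execution time that benefits, e.g. the part that can use N processors
 Speedup enhanced: how much faster that fraction runs after the enhancement than before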
Problem
 Suppose that we are considering developing a parallel program to improve on an existing
sequential program and that we determine that 10% of the execution time of the sequential
program is spent in inherently sequential code. (We have to inspect the code to determine this.)
The remaining code can be parallelized, although we do not as yet know how many processors
would be optimal. What is the maximum possible speedup that could be obtained if we were to
develop a parallel version that used ten processors?

Max. Speedup=
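One way to fill in the blank is Amdahl's law for p processors, speedup = 1 / (s + (1 - s) / p), where s is the inherently sequential fraction. The function below is a sketch with a hypothetical name, applied to s = 0.10 and p = 10.

```python
def amdahl_speedup(sequential_fraction: float, processors: int) -> float:
    """Maximum speedup when only the parallel part is spread over `processors`."""
    parallel_fraction = 1.0 - sequential_fraction
    return 1.0 / (sequential_fraction + parallel_fraction / processors)


max_speedup = amdahl_speedup(0.10, 10)   # 1 / (0.10 + 0.90 / 10) = 1 / 0.19
print(f"{max_speedup:.2f}")              # prints 5.26
```

Even with ten processors the speedup stays near 5.26 because the sequential 10% does not shrink.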
Problem
 Suppose that we know that 20% of inherently sequential computation in the
problem of interest is made parallel. What is the least number of processors
that we need to use to obtain a speedup of 6.0?
Amdahl’s Law - Speedup Based Problem
 Consider a CPU used for Web serving. An enhancement makes computation 10 times faster, and computation accounts for 30% of the execution time in web service applications. What is the overall speedup?

Solution
Fraction_enhanced = f = 30% = 0.3
Speedup_enhanced = SUf = 10
Speedup_overall = 1 / ((1 - f) + (f / SUf))
= 1 / ((1 - 0.3) + (0.3 / 10))
= 1 / (0.7 + 0.03)
= 1 / 0.73
≈ 1.37
MODULE 1
Introduction and Overview of Computer Architecture
Problems to be solved
Problem 1
 Assume that the number of instructions in the program is 1,000,000,000. Suppose we have two implementations of the same instruction set architecture (ISA). For some program,
 Machine A has a clock cycle time of 10 ns and a CPI of 2.0
 Machine B has a clock cycle time of 20 ns and a CPI of 1.2
 Which machine is faster for this program, and by how much?
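A possible worked solution applies CPU time = Instruction_count x CPI x clock_cycle to each machine. The variable and function names below are illustrative choices, not part of the problem statement.

```python
INSTRUCTION_COUNT = 1_000_000_000


def cpu_time_s(instruction_count: float, cpi: float, cycle_time_s: float) -> float:
    """CPU time = Instruction_count x CPI x clock cycle time."""
    return instruction_count * cpi * cycle_time_s


time_a = cpu_time_s(INSTRUCTION_COUNT, 2.0, 10e-9)   # Machine A: 20 s
time_b = cpu_time_s(INSTRUCTION_COUNT, 1.2, 20e-9)   # Machine B: 24 s
ratio = time_b / time_a                              # how many times faster A is
print(f"A: {time_a:.1f} s, B: {time_b:.1f} s, A is {ratio:.2f}x faster")
```

Machine A finishes in 20 s against 24 s for Machine B, so A is 1.2 times faster for this program.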
Problem 2
Solve the following
 CPU clock rate is 1 MHz. The program takes 45 million cycles to execute. What is the CPU time?
 CPU clock rate is 500 MHz. The program takes 45 million cycles to execute. What is the CPU time?
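Both parts follow from CPU time = clock cycles / clock rate. The short sketch below, with an assumed helper name, works them out.

```python
def cpu_time_from_cycles(cycles: float, clock_rate_hz: float) -> float:
    """CPU time in seconds = number of clock cycles / clock rate."""
    return cycles / clock_rate_hz


time_1mhz = cpu_time_from_cycles(45e6, 1e6)       # 45 million cycles at 1 MHz
time_500mhz = cpu_time_from_cycles(45e6, 500e6)   # 45 million cycles at 500 MHz
print(f"{time_1mhz:.2f} s, {time_500mhz:.2f} s")  # prints "45.00 s, 0.09 s"
```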
Problem 3
 You are on the design team for a new processor. The clock of the processor runs at 200 MHz. The
following table gives instruction frequencies for Benchmark B, as well as how many cycles the
instructions take, for the different classes of instructions. Calculate the CPI and MIPS.
Problem 4
Problem 5
 Suppose a program (or a program task) takes 1 billion instructions to
execute on a processor running at 2 GHz. Suppose also that 50% of the
instructions execute in 3 clock cycles, 30% execute in 4 clock cycles, and
20% execute in 5 clock cycles. What is the execution time for the
program or task?

 Note: We have the instruction count: 10^9 instructions. The clock cycle time can be computed quickly from the clock rate to be 0.5×10^-9 seconds. So we only need to compute the clocks per instruction as an effective value:
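Following the note, the effective CPI is the weighted sum of the three cycle counts, and the execution time is instruction count x CPI x clock cycle time. The names in this sketch are illustrative assumptions.

```python
INSTRUCTION_COUNT = 1e9
CLOCK_RATE_HZ = 2e9

# (fraction of instructions, clock cycles) for each group in the problem.
mix = [(0.50, 3), (0.30, 4), (0.20, 5)]

effective_cpi = sum(fraction * cycles for fraction, cycles in mix)
execution_time_s = INSTRUCTION_COUNT * effective_cpi / CLOCK_RATE_HZ
print(f"CPI = {effective_cpi:.1f}, time = {execution_time_s:.2f} s")
# prints "CPI = 3.7, time = 1.85 s"
```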
Problem 6
Problem 7
 We want to compare the computers R1 and R2, which differ in that R1 has machine instructions for floating-point operations, while R2 does not (FP operations are implemented in software using several non-FP instructions). Both computers have a clock frequency of 400 MHz. On both we run the same program, which has the following mixture of instructions:
 a) Calculate the MIPS for the computers R1 and R2.
 b) Calculate the CPU execution time of the program on the computers R1 and R2, if there are 12000 instructions in the program.
Problem 8
 The clock of the processor runs at 200 MHz with 4.4 cycles per instruction. Compute the processor speed for the benchmark in millions of instructions per second (MIPS).
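Since CPU time = Instruction_count x CPI / clock_rate, the MIPS rating reduces to clock_rate / (CPI x 10^6), independent of the instruction count. The sketch below uses an assumed function name to evaluate it.

```python
def mips_from_cpi(clock_rate_hz: float, cpi: float) -> float:
    """MIPS = clock rate / (CPI x 10^6)."""
    return clock_rate_hz / (cpi * 1e6)


mips = mips_from_cpi(200e6, 4.4)   # 200 MHz clock, 4.4 cycles per instruction
print(f"{mips:.2f} MIPS")          # prints "45.45 MIPS"
```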
Problem 9
 Suppose that we are considering developing a parallel program to
improve on an existing sequential program and that we determine
that 20% of the execution time of the sequential program is spent
in inherently sequential code. (We have to inspect the code to
determine this.) The remaining code can be parallelized, although
we do not as yet know how many processors would be optimal.
What is the maximum possible speedup that could be obtained if
we were to develop a parallel version that used ten processors?
Problem 10
 Suppose that we know that the fraction of inherently sequential computation is 0.12 in the problem of interest. What is the least number of processors that we need to use to obtain a speedup of 5.0?
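One way to answer this is to search for the smallest p with 1 / (s + (1 - s) / p) >= 5, where s = 0.12. The sketch below uses exact fractions to avoid floating-point rounding at the boundary; the function name is hypothetical and it assumes s is greater than 0.

```python
from fractions import Fraction


def min_processors(sequential_fraction: Fraction, target_speedup: Fraction) -> int:
    """Smallest processor count whose Amdahl speedup reaches the target."""
    # The speedup approaches 1/s as p grows, so targets at or above 1/s are unreachable.
    if target_speedup >= 1 / sequential_fraction:
        raise ValueError("target speedup is not reachable with any processor count")
    p = 1
    while 1 / (sequential_fraction + (1 - sequential_fraction) / p) < target_speedup:
        p += 1
    return p


needed = min_processors(Fraction(12, 100), Fraction(5))
print(needed)   # prints 11
```

With 11 processors the speedup is exactly 1 / (0.12 + 0.88 / 11) = 5.0, while 10 processors give about 4.81.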
Text Book(s)
Friday, March 24, 2023

 David A. Patterson and John L. Hennessy, “Computer Organization and Design: The Hardware/Software Interface”, 5th edition, Morgan Kaufmann, 2011.

 Carl Hamacher, Zvonko Vranesic, Safwat Zaky, “Computer Organization”, McGraw-Hill, Fifth edition, Reprint 2011.
Reference Books
 W. Stallings, Computer organization and
architecture, Prentice-Hall, 8th edition, 2009
Time for Discussion

Any Queries??
THANK YOU