L-2 (Computer Performance)
Computer Architecture
What do we measure?
Define performance….
Computer Performance: TIME, TIME, TIME!!!
• Response Time (elapsed time, latency): individual user concerns
• How long does it take to complete (start to finish) a task?
• E.g., how long must I wait for the database query?
Relative Performance
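A sketch of the usual textbook definition, stated as an assumption because only the slide title is given here: performance is taken as the reciprocal of execution time, and X and Y are placeholder names for two machines.

```latex
\[
\text{Performance}_X = \frac{1}{\text{Execution time}_X},
\qquad
\frac{\text{Performance}_X}{\text{Performance}_Y}
  = \frac{\text{Execution time}_Y}{\text{Execution time}_X} = n
\]
```

Under this definition, "X is n times faster than Y" means Y's execution time is n times X's.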
Execution Time
• Elapsed Time
• counts everything (disk and memory accesses, waiting for I/O, running other
programs, etc.) from start to finish
• a useful number, but often not good for comparison purposes
Elapsed time = CPU time + wait time (I/O, other programs, etc.)
• CPU time
• doesn't count waiting for I/O or time spent running other programs
• can be divided into user CPU time and system CPU time (OS calls)
CPU time = user CPU time + system CPU time
• Our focus:
• user CPU time (CPU execution time or, simply, execution time)
• time spent executing the lines of code that are in our program
• For brevity, user CPU time is referred to simply as CPU time in the rest of
the material.
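The split between elapsed time, user CPU time, and system CPU time can be observed directly. The following is a minimal Python sketch, not a benchmarking tool; `busy_work` and `measure` are hypothetical helper names introduced only for illustration.

```python
import os
import time


def busy_work(n):
    """CPU-bound loop (hypothetical workload): sum of squares below n."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def measure(func, *args):
    """Return (elapsed, user_cpu, system_cpu) in seconds for one call of func."""
    wall_start = time.perf_counter()  # wall-clock (elapsed) time
    cpu_start = os.times()            # CPU time used by this process so far
    func(*args)
    cpu_end = os.times()
    wall_end = time.perf_counter()
    elapsed = wall_end - wall_start
    user_cpu = cpu_end.user - cpu_start.user
    system_cpu = cpu_end.system - cpu_start.system
    return elapsed, user_cpu, system_cpu


# A CPU-bound call: elapsed time is mostly user CPU time.
print("busy :", measure(busy_work, 2_000_000))
# A sleeping call: elapsed time is mostly wait time, with almost no CPU time.
print("sleep:", measure(time.sleep, 0.2))
```

For the sleeping call, elapsed time is dominated by wait time, which is why elapsed time alone is often a poor basis for comparing processors.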
CPU Clocking
Clock cycle. The speed of a computer processor, or CPU, is determined
by the clock cycle, which is the amount of time between two pulses of an
oscillator.
CPU Time
Performance Equation - I
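A minimal sketch of the standard relation between clock cycles, clock rate, and CPU time (CPU time = clock cycles × clock cycle time = clock cycles / clock rate), assumed here to be what this slide title refers to. The function names and the numbers in the usage line are hypothetical.

```python
def clock_cycle_time(clock_rate_hz):
    """Clock cycle time in seconds is the reciprocal of the clock rate in Hz."""
    return 1.0 / clock_rate_hz


def cpu_time(clock_cycles, clock_rate_hz):
    """CPU time in seconds = clock cycles / clock rate."""
    return clock_cycles / clock_rate_hz


# Hypothetical program: 20 billion cycles on a 2 GHz clock.
print(clock_cycle_time(2e9), "seconds per cycle")
print(cpu_time(20e9, 2e9), "seconds of CPU time")
```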
Example
CPU Time Example
Instruction Count and CPI
Performance Equation - II
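The standard textbook form of this relation is sketched below as an assumption, since only the slide title is given here. IC denotes the instruction count and CPI the average clock cycles per instruction.

```latex
\[
\text{Clock cycles} = \text{IC} \times \text{CPI}
\]
\[
\text{CPU time} = \text{IC} \times \text{CPI} \times \text{Clock cycle time}
               = \frac{\text{IC} \times \text{CPI}}{\text{Clock rate}}
\]
```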
Factors Influencing Performance
CPU Time Example
Self Help
• Which code sequence has the most instructions? Which sequence will be
faster? How much? What is the CPI for each sequence?
CPI Example
Which code sequence has the most instructions?
• Two different compilers are being tested on a 500 MHz machine with
three different classes of instructions: Class A, Class B, and Class C,
which require 1, 2, and 3 cycles, respectively. Both compilers are used
to produce code for a large piece of software.
• Compiler 1 generates code with 5 billion Class A instructions, 1 billion
Class B instructions, and 1 billion Class C instructions.
• Compiler 2 generates code with 10 billion Class A instructions, 1 billion
Class B instructions, and 1 billion Class C instructions.
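The compiler comparison above can be worked through with a short Python sketch. The figures come from the problem statement; the names `CYCLES_PER_CLASS` and `analyze` are hypothetical, and the calculation assumes CPU time = total cycles / clock rate.

```python
CLOCK_RATE_HZ = 500e6                       # 500 MHz machine
CYCLES_PER_CLASS = {"A": 1, "B": 2, "C": 3}  # cycles per instruction class

COMPILER_1 = {"A": 5e9, "B": 1e9, "C": 1e9}   # instruction counts
COMPILER_2 = {"A": 10e9, "B": 1e9, "C": 1e9}


def analyze(counts):
    """Return (instruction count, average CPI, CPU time in seconds)."""
    instructions = sum(counts.values())
    cycles = sum(n * CYCLES_PER_CLASS[cls] for cls, n in counts.items())
    cpi = cycles / instructions
    cpu_time = cycles / CLOCK_RATE_HZ
    return instructions, cpi, cpu_time


for name, counts in (("Compiler 1", COMPILER_1), ("Compiler 2", COMPILER_2)):
    ic, cpi, seconds = analyze(counts)
    print(f"{name}: {ic:.0f} instructions, CPI = {cpi:.2f}, time = {seconds:.1f} s")
```

Compiler 2 produces more instructions (12 billion against 7 billion) and a lower CPI (1.25 against about 1.43), yet Compiler 1's code is faster: 20 s against 30 s, a factor of 1.5.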
• Benchmark suites
• Each vendor announces a SPEC rating for their system
• a measure of execution time for a fixed collection of programs
• is a function of a specific CPU, memory system, I/O system, operating
system, compiler
• enables easy comparison of different systems
• The SPEC rating specifies how much faster a system is, compared to
a baseline machine – a system with SPEC rating 600 is 1.5 times
faster than a system with SPEC rating 400
• Note that this rating incorporates the behavior of all 29 programs – this
may not necessarily predict performance for your favorite program!
Summary
Power Wall
■ Gene Amdahl [AMDA67]
A program needs 20 hours using a single processor core, and a particular part of the program
that takes one hour to execute cannot be parallelized. Assuming there is no limit on the number
of processors available, calculate the minimum execution time of the program.
Answer:
The remaining 19 hours (p = 0.95) of execution time can be parallelized, but regardless of
how many processors are devoted to the parallelized execution of this program, the execution
time cannot drop below that critical one hour, so the minimum execution time is 1 hour. Hence,
the theoretical speedup is limited to at most 20 times (1/(1 − p) = 20).
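The limit in this answer can be checked numerically. The sketch below assumes the standard form of Amdahl's law for n processors, speedup = 1 / ((1 − p) + p/n), of which 1/(1 − p) is the limit as n grows; the function names are hypothetical.

```python
def amdahl_speedup(p, n):
    """Speedup on n processors when a fraction p of the work is parallelizable."""
    return 1.0 / ((1.0 - p) + p / n)


def execution_time(total_hours, p, n):
    """Execution time on n processors for a job taking total_hours on one core."""
    return total_hours * ((1.0 - p) + p / n)


# The example above: 20 hours on one core, p = 0.95.
for n in (1, 2, 10, 100, 10_000, 1_000_000):
    print(f"n = {n:>9}: speedup = {amdahl_speedup(0.95, n):6.2f}, "
          f"time = {execution_time(20, 0.95, n):6.3f} h")
```

As n grows, the time approaches 1 hour and the speedup approaches 20 without reaching either.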
Amdahl’s Law
Little’s Law
■ Fundamental and simple relation with broad applications
■ Queuing system
■ If the server is idle, an item is served immediately; otherwise an
arriving item joins a queue
■ There can be a single queue for a single server or for multiple
servers, or multiple queues, one for each of multiple
servers
■ Average number of items in a queuing system equals the
average rate at which items arrive multiplied by the time that
an item spends in the system
■ Relationship requires very few assumptions
■ Because of its simplicity and generality it is extremely useful
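L = A × W
where L is the average number of items in the system, A is the average arrival rate,
and W is the average time an item spends in the system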
Little’s Law
You are estimating the number of threads your server needs to execute client requests
efficiently, and initially you start 4 threads on the server. The request arrival rate is 4
requests/sec, and each request takes a fixed amount of time to complete, as listed below.
The arrival rate is fixed and all new request arrivals have fixed service times.
Request_1 = 0.2 sec; Request_2 = 0.9 sec; Request_3 = 0.6 sec; Request_4 = 0.5 sec
What improvement would you suggest to maximize thread utilization?
Little’s Law
Since each of the 4 requests is assigned to its own thread, no request waits, so the average
response time is W = (0.2 + 0.9 + 0.6 + 0.5)/4 = 0.55 sec.
Applying Little's law to estimate the number of threads the server needs to serve the requests:
L = A × W = 4 × 0.55 = 2.2 ≈ 2
So, on average only about 2 threads are needed for an arrival rate of 4 req/sec with the given
service times, and 2 of the 4 started threads will remain idle. If the server ran only 2
threads, then 2 of the 4 requests arriving each second would have to wait, because there are
only 2 resources (threads), and this would increase the response time.
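The thread estimate can be reproduced with a few lines of Python. This is a minimal sketch using the figures from the example; `littles_law` is a hypothetical helper name, and it assumes the four service times are equally likely.

```python
def littles_law(arrival_rate, avg_time_in_system):
    """Average number of items in the system: L = A * W."""
    return arrival_rate * avg_time_in_system


ARRIVAL_RATE = 4                      # requests per second
SERVICE_TIMES = [0.2, 0.9, 0.6, 0.5]  # seconds per request
STARTED_THREADS = 4

avg_response_time = sum(SERVICE_TIMES) / len(SERVICE_TIMES)
busy_threads = littles_law(ARRIVAL_RATE, avg_response_time)

print(f"Average response time W = {avg_response_time:.2f} s")
print(f"Average busy threads  L = {busy_threads:.1f}")
print(f"Average idle threads    = {STARTED_THREADS - busy_threads:.1f}")
```

The result, about 2.2 busy threads on average, is rounded to 2 in the discussion above.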