02 Performance
Architecture
Performance
Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides
Some material adapted from Hennessy & Patterson / © 2003 Elsevier Science
Moore’s Law
• Transistor count per chip doubles every year
– Later revised to every two years
– Sometimes restated in terms of performance instead of transistor count
(Source: Intel)
(Wgsimon - Wikipedia)
Defining Performance
• Performance means different things to different
people, so its assessment is subtle
Analogy from the airline industry:
• How to measure performance for an airplane?
– Cruising speed (How fast it gets to the destination)
– Flight range (How far it can reach)
– Passenger capacity (How many passengers it can carry)
Performance Metrics
• Response (execution) time:
– The time between the start and the completion of a task
– Measures user perception of the system speed
– Common in reactive and time-critical systems
– The main concern on a single-user computer
• Throughput:
– The total number of tasks done in a given time
– Relevant to batch processing (billing, credit card processing)
– Also many-user services (web servers)
• Power:
– Power consumed or battery life
– Especially relevant for mobile
Response-time Metric
• Maximizing performance means
minimizing response (execution) time
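The relation behind this statement is usually written as below. This is the standard textbook formulation, not something recoverable from the slide text, so the symbols X, Y, and n are assumed notation.

```latex
\[
\text{Performance}_X = \frac{1}{\text{Execution time}_X}
\]
\[
\frac{\text{Performance}_X}{\text{Performance}_Y}
  = \frac{\text{Execution time}_Y}{\text{Execution time}_X} = n
\]
```

Read this as "X is n times faster than Y": a machine that halves the execution time has twice the performance.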
Example
A program runs in 10 seconds on a computer “A” with a 400 MHz clock.
We desire a faster computer “B” that could run the program in 6 seconds.
The designer has determined that a substantial increase in the clock speed is
possible, but it would cause computer “B” to require 1.2 times as many clock
cycles as computer “A”. What should be the clock rate of computer “B”?
To get the clock rate of the faster computer, we use the same formula:
CPU time = CPU clock cycles / clock rate
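The arithmetic for this example can be sketched as follows, assuming CPU time = clock cycles / clock rate. The function and parameter names are illustrative and do not come from the slides.

```python
# Worked version of the clock-rate example.
# Assumes: CPU time = clock cycles / clock rate.

def required_clock_rate(time_a_s, rate_a_hz, cycle_factor, target_time_s):
    """Return the clock rate (Hz) machine B needs to meet target_time_s."""
    cycles_a = time_a_s * rate_a_hz      # cycles the program takes on A
    cycles_b = cycle_factor * cycles_a   # B needs cycle_factor times as many
    return cycles_b / target_time_s      # rate = cycles / time


rate_b = required_clock_rate(
    time_a_s=10.0, rate_a_hz=400e6, cycle_factor=1.2, target_time_s=6.0
)
print(f"Clock rate of B: {rate_b / 1e6:.0f} MHz")
```

With the numbers from the example, machine A spends 4 × 10^9 cycles, so B must execute 4.8 × 10^9 cycles in 6 seconds, which works out to 800 MHz, twice the clock rate of A.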
Example
Suppose we have two implementations of the same instruction set architecture.
Machine “A” has a clock cycle time of 1 ns and a CPI of 2.0 for some program, and
machine “B” has a clock cycle time of 2 ns and a CPI of 1.2 for the same program.
Which machine is faster for this program and by how much?
Both machines execute the same instructions for the program. Assume the
number of instructions is “I”,
CPU clock cycles (A) = I × 2.0
CPU clock cycles (B) = I × 1.2
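The comparison can be finished with a short calculation, assuming CPU time = instruction count × CPI × clock cycle time. The instruction count used here is a hypothetical placeholder, since it cancels out of the ratio.

```python
# Compare machines A and B from the example.
# Assumes: CPU time = instruction count * CPI * clock cycle time.

def cpu_time_ns(instruction_count, cpi, cycle_time_ns):
    """Return the CPU time in nanoseconds."""
    return instruction_count * cpi * cycle_time_ns


instruction_count = 1_000_000  # hypothetical; any positive value works

time_a = cpu_time_ns(instruction_count, cpi=2.0, cycle_time_ns=1.0)
time_b = cpu_time_ns(instruction_count, cpi=1.2, cycle_time_ns=2.0)

# Performance ratio A/B equals execution time ratio B/A.
speedup_a_over_b = time_b / time_a
print(f"A is {speedup_a_over_b:.1f} times faster than B")
```

Machine A takes 2.0 × I ns and machine B takes 2.4 × I ns, so A is 1.2 times faster for this program even though B has the lower CPI.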
Which code sequence executes the most instructions? Which will be faster?
What is the CPI for each sequence?
Answer:
Sequence 1: executes 2 + 1 + 2 = 5 instructions
Sequence 2: executes 4 + 1 + 1 = 6 instructions
Since Sequence 2 takes fewer overall clock cycles but has more
instructions, it must have a lower CPI
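The cycle counts and CPI values can be checked with a small sketch. The per-class CPI table for this example did not survive in the text, so the values below (1, 2, and 3 cycles for classes A, B, and C) are an assumption, as is the mapping of the instruction counts in the answer to those classes in that order.

```python
# ASSUMPTION: cycles per instruction for each class. These values are not
# in the slide text; they are chosen for illustration.
CLASS_CPI = {"A": 1, "B": 2, "C": 3}

# Instruction counts per class, taken from the answer above
# (2 + 1 + 2 and 4 + 1 + 1), assumed to be in class order A, B, C.
SEQUENCE_1 = {"A": 2, "B": 1, "C": 2}
SEQUENCE_2 = {"A": 4, "B": 1, "C": 1}


def summarize(counts, class_cpi):
    """Return (instruction count, clock cycles, average CPI) for a sequence."""
    instructions = sum(counts.values())
    cycles = sum(n * class_cpi[cls] for cls, n in counts.items())
    return instructions, cycles, cycles / instructions


for name, seq in (("Sequence 1", SEQUENCE_1), ("Sequence 2", SEQUENCE_2)):
    instructions, cycles, cpi = summarize(seq, CLASS_CPI)
    print(f"{name}: {instructions} instructions, {cycles} cycles, CPI = {cpi}")
```

Under these assumed class CPIs, Sequence 1 takes 10 cycles (CPI 2.0) and Sequence 2 takes 9 cycles (CPI 1.5), which matches the conclusion that the sequence with more instructions is still the faster one.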
Amdahl’s Law
The performance enhancement possible with a given improvement
is limited by the amount that the improved feature is used
$\text{Time}_{\text{old}} = \text{notFP} + \text{FP}$
$\text{Time}_{\text{new}} = \text{notFP} + \text{FP} / S$
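A minimal sketch of these two relations follows, where notFP and FP are the times spent outside and inside the improved feature and S is the speedup of that feature. The sample numbers are hypothetical and only illustrate the limit.

```python
# Amdahl's Law: only the improved part of the run time is divided by S.

def time_new(not_fp, fp, s):
    """Run time after speeding up the FP portion by a factor of s."""
    return not_fp + fp / s


def overall_speedup(not_fp, fp, s):
    """Ratio Time_old / Time_new."""
    time_old = not_fp + fp
    return time_old / time_new(not_fp, fp, s)


# Hypothetical program: 5 s outside the improved feature, 5 s inside it.
for s in (2, 10, 1000):
    print(f"S = {s}: overall speedup = {overall_speedup(5.0, 5.0, s):.3f}")
```

In this hypothetical case the overall speedup never reaches 2, however large S becomes, because the 5 seconds of unimproved work remain.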