Performance Chap4
Today's topics: IEEE 754 representations, FP arithmetic, evaluating a system
Reminder: assignment 4 is due in a week; start early!
Examples
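Final representation: $(-1)^{S} \times (1 + \text{Fraction}) \times 2^{(\text{Exponent} - \text{Bias})}$, where Bias = 127 in single precision and 1023 in double precision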
Represent $-0.75_{\text{ten}}$ in single- and double-precision formats
Single: 1 + 8 + 23 bits (sign + exponent + fraction)
Double: 1 + 11 + 52 bits (sign + exponent + fraction)
What decimal number is represented by the following single-precision number? 1 1000 0001 0100 0000 0000 0000 0000 000
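Solutions: $-0.75_{\text{ten}} = -0.11_{\text{two}} = -1.1_{\text{two}} \times 2^{-1}$, so S = 1 and the fraction bits are 1000…0
Single: 1 0111 1110 1000 0000 0000 0000 0000 000 (Exponent = -1 + 127 = 126)
Double: 1 011 1111 1110 1000 0000 … 0000 (Exponent = -1 + 1023 = 1022; the 52-bit fraction is a 1 followed by 51 zeros)
Decoding 1 1000 0001 0100…0: S = 1, Exponent = 129 - 127 = 2, Fraction = $0.01_{\text{two}}$ = 0.25, so the value is $-(1 + 0.25) \times 2^{2} = -5.0$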
FP Addition
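Consider the following decimal example, where only 4 significant digits and 2 exponent digits can be kept:
Align: shift the operand with the smaller exponent right until the exponents match; here the second operand becomes $0.016 \times 10^{1}$
Add: $9.999 \times 10^{1} + 0.016 \times 10^{1} = 10.015 \times 10^{1}$
Normalize: $1.0015 \times 10^{2}$
Check for overflow/underflow: the exponent 2 fits in 2 digits
Round to 4 digits: $1.002 \times 10^{2}$
Re-normalize if rounding leaves the result unnormalized (not needed here)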
FP Multiplication
Similar steps:
Compute the exponent (careful! adding two biased exponents counts the bias twice, so subtract it once)
Multiply the significands (set the binary point correctly)
Normalize
Round (potentially re-normalize)
Assign the sign
MIPS Instructions
The usual arithmetic instructions: add.s, add.d, and likewise sub, mul, div in .s (single) and .d (double) forms
Comparison instructions: c.eq.s, c.neq.s, c.lt.s. These comparisons set an internal bit in hardware that is then inspected by the branch instructions bc1t and bc1f
Separate register file $f0 - $f31: a double-precision value is stored in a register pair (say, $f4-$f5) and is referred to by $f4
Load/store instructions (lwc1, swc1) must still use integer registers for address computation
Code Example
float f2c (float fahr) {
    return ((5.0/9.0) * (fahr - 32.0));
}
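(argument fahr is stored in $f12; the result is returned in $f0)
lwc1  $f16, const5($gp)     # $f16 = 5.0
lwc1  $f18, const9($gp)     # $f18 = 9.0
div.s $f16, $f16, $f18      # $f16 = 5.0 / 9.0
lwc1  $f18, const32($gp)    # $f18 = 32.0
sub.s $f18, $f12, $f18      # $f18 = fahr - 32.0
mul.s $f0, $f16, $f18       # $f0 = (5.0 / 9.0) * (fahr - 32.0)
jr    $ra                   # return to the caller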
Performance Metrics
Possible measures:
response time: time elapsed between the start and end of a program
throughput: amount of work done in a fixed time
The two measures are usually linked: a faster processor will improve both, while adding more processors will likely improve only throughput
What influences performance?
Execution Time
Consider a system X executing a fixed workload W
$\text{Performance}_X = 1 / \text{Execution time}_X$
Execution time = response time = wall-clock time. Note that this includes the time to execute the workload as well as the time the operating system spends coordinating various events
The UNIX time command reports the wall-clock (real) time along with the user and system CPU time
Performance Equation - I
$\text{CPU execution time} = \text{CPU clock cycles} \times \text{Clock cycle time}$
$\text{Clock cycle time} = 1 / \text{Clock speed}$
If a processor has a frequency of 3 GHz, the clock ticks 3 billion times a second; as we'll soon see, more or fewer than one instruction may complete with each clock tick
If a program runs for 10 seconds on a 3 GHz processor, how many clock cycles did it run for?
If a program runs for 2 billion clock cycles on a 1.5 GHz processor, what is the execution time in seconds?
Performance Equation - II
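$\text{CPU clock cycles} = \text{number of instrs} \times \text{avg clock cycles per instruction (CPI)}$
Substituting in the previous equation:
$\text{Execution time} = \text{clock cycle time} \times \text{number of instrs} \times \text{avg CPI}$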
CPI depends on the nature of each instruction and the quality of the architecture implementation
Example
$\text{Execution time} = \text{clock cycle time} \times \text{number of instrs} \times \text{avg CPI}$
Which of the following two systems is better?
A program is converted into 4 billion MIPS instructions by a compiler; the MIPS processor is implemented such that each instruction completes in an average of 1.5 cycles and the clock speed is 1 GHz
The same program is converted into 2 billion x86 instructions; the x86 processor is implemented such that each instruction completes in an average of 6 cycles and the clock speed is 1.5 GHz
Benchmark Suites
Measuring performance components is difficult for most users: average CPI requires simulation/hardware counters, instruction count requires profiling tools/hardware counters, OS interference is hard to quantify, etc.
Each vendor announces a SPEC rating for their system. The rating is a measure of execution time for a fixed collection of programs; it is a function of a specific CPU, memory system, I/O system, operating system, and compiler; and it enables easy comparison of different systems
SPEC CPU
SPEC: Standard Performance Evaluation Corporation, an industry consortium that creates a collection of relevant programs
The 2006 version includes 12 integer and 17 floating-point applications
The SPEC rating specifies how much faster a system is compared to a baseline machine: a system with a SPEC rating of 600 is 1.5 times faster than a system with a SPEC rating of 400
Note that this rating incorporates the behavior of all 29 programs, so it may not necessarily predict performance for your favorite program!
Amdahl's Law
Architecture design is very bottleneck-driven: make the common case fast, and do not waste resources on a component that has little impact on overall performance/power
Amdahl's Law: the performance improvement from an enhancement is limited by the fraction of time the enhancement comes into play
Example: a web server spends 40% of its time in the CPU and 60% of its time doing I/O. A new processor that is ten times faster results in a 36% reduction in execution time (0.6 + 0.4/10 = 0.64; speedup of 1/0.64 = 1.56). Amdahl's Law states that the maximum execution time reduction is 40% (max speedup of 1/0.6 = 1.67)