2.Week
components
Reduce the frequency of memory access by incorporating increasingly complex and efficient cache structures between the processor and main memory
• Architectural examples include:
[Figure: data rates of I/O devices. Horizontal axis: Data Rate (bps), logarithmic, 10^1 to 10^11. Devices shown: Graphics display, Wi-Fi modem (max speed), Hard disk, Optical disc, Laser printer, Scanner, Mouse, Keyboard]
[Figure: processor trends, 1970 to 2010, on a logarithmic vertical axis. Series: Transistors (Thousands), Frequency (MHz), Power (W), Cores]
Speedup = (Time to execute program on a single processor) / (Time to execute program on N parallel processors)

$$
\text{Speedup} = \frac{T(1 - f) + Tf}{T(1 - f) + \dfrac{Tf}{N}} = \frac{1}{(1 - f) + \dfrac{f}{N}}
$$

Figure 2.3 Illustration of Amdahl's Law (execution time T divided into (1 – f)T and fT)
T is the total execution time of the program using a single processor
f is the fraction of the execution time that involves code that is infinitely
parallelizable with no scheduling overhead
[Figure: Speedup versus Number of Processors, with curves for f = 0.95, f = 0.90, f = 0.75, and f = 0.5]
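The behavior of Amdahl's Law, Speedup = 1 / ((1 – f) + f/N), can be explored numerically. The following is a minimal sketch; the function names `amdahl_speedup` and `speedup_limit` and the chosen values of N are illustrative and do not come from the slides.

```python
def amdahl_speedup(f: float, n: int) -> float:
    """Speedup = 1 / ((1 - f) + f / N) for parallelizable fraction f on N processors."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must be between 0 and 1")
    if n < 1:
        raise ValueError("n must be at least 1")
    return 1.0 / ((1.0 - f) + f / n)


def speedup_limit(f: float) -> float:
    """Upper bound on speedup as N grows without bound: 1 / (1 - f)."""
    return float("inf") if f == 1.0 else 1.0 / (1.0 - f)


if __name__ == "__main__":
    # Same fractions as the curves in the figure; N values are arbitrary samples.
    for f in (0.5, 0.75, 0.90, 0.95):
        row = ", ".join(f"N={n}: {amdahl_speedup(f, n):.2f}" for n in (1, 16, 256, 4096))
        print(f"f={f:.2f} -> {row} (limit {speedup_limit(f):.0f})")
```

Even with f = 0.95 the speedup can never exceed 1 / (1 – f) = 20, which is why the curves flatten as processors are added.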
Compiler technology X X X
Processor implementation X X
Ic: Instruction count
p: The number of processor cycles needed to decode and execute the instruction
m: The number of memory references needed
k: The ratio between memory cycle time and processor cycle time
τ: Cycle time (1/f)
CPI: Clock cycles per instruction
T: The time needed to execute a given program
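As a hedged illustration of how these quantities combine, the sketch below assumes the commonly used relation T = Ic × [p + (m × k)] × τ, which does not appear in the extracted text above. The function names and the sample workload are hypothetical.

```python
def effective_cpi(p: float, m: float, k: float) -> float:
    """Cycles per instruction once memory references are folded in: p + m * k."""
    return p + m * k


def execution_time(ic: int, p: float, m: float, k: float, tau: float) -> float:
    """Assumed relation: T = Ic * (p + m * k) * tau.

    ic  : instruction count
    p   : processor cycles to decode and execute one instruction
    m   : memory references needed per instruction
    k   : ratio of memory cycle time to processor cycle time
    tau : processor cycle time in seconds (1 / clock frequency)
    """
    return ic * effective_cpi(p, m, k) * tau


if __name__ == "__main__":
    # Hypothetical workload: 2 million instructions on a 400 MHz processor.
    tau = 1 / 400e6
    t = execution_time(ic=2_000_000, p=2, m=0.5, k=4, tau=tau)
    print(f"CPI = {effective_cpi(2, 0.5, 4):.1f}, T = {t * 1e3:.2f} ms")
```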
Calculating the Mean
The use of benchmarks to compare systems involves calculating the mean value of a set of data points related to execution time
The three common formulas used for calculating a mean are:
• Arithmetic
• Geometric
• Harmonic
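For reference, the standard definitions of these three means for n positive data points are written out below. The symbols n and x_i are notation introduced here and are not taken from the slides.

```latex
\begin{align*}
\mathrm{AM} &= \frac{1}{n}\sum_{i=1}^{n} x_i \\
\mathrm{GM} &= \left(\prod_{i=1}^{n} x_i\right)^{1/n} \\
\mathrm{HM} &= \frac{n}{\displaystyle\sum_{i=1}^{n} \frac{1}{x_i}}
\end{align*}
```

For positive values, AM ≥ GM ≥ HM, with equality only when all data points are equal.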
[Figure: MD, AM, GM, and HM values plotted on a scale of 0 to 11 for each of the data sets (a) through (g) listed below]
MD = median, AM = arithmetic mean, GM = geometric mean, HM = harmonic mean
(a) Constant (11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11)
(b) Clustered around a central value (3, 5, 6, 6, 7, 7, 7, 8, 8, 9, 11)
(c) Uniform distribution (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11)
(d) Large-number bias (1, 4, 4, 7, 7, 9, 9, 10, 10, 11, 11)
(e) Small-number bias (1, 1, 2, 2, 3, 3, 5, 5, 8, 8, 11)
(f) Upper outlier (11, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1)
(g) Lower outlier (1, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11)
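The four statistics can be recomputed for data sets (a) through (g) with a short script. This is a minimal sketch using only the Python standard library; the helper names and the `DATA_SETS` dictionary are introduced here for illustration.

```python
import math
from statistics import median


def arithmetic_mean(xs):
    return sum(xs) / len(xs)


def geometric_mean(xs):
    # Computed through logarithms to avoid overflow of a long product.
    return math.exp(sum(math.log(x) for x in xs) / len(xs))


def harmonic_mean(xs):
    return len(xs) / sum(1.0 / x for x in xs)


# Data sets (a) to (g) as listed on the slide.
DATA_SETS = {
    "a constant": [11] * 11,
    "b clustered": [3, 5, 6, 6, 7, 7, 7, 8, 8, 9, 11],
    "c uniform": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11],
    "d large-number bias": [1, 4, 4, 7, 7, 9, 9, 10, 10, 11, 11],
    "e small-number bias": [1, 1, 2, 2, 3, 3, 5, 5, 8, 8, 11],
    "f upper outlier": [11] + [1] * 10,
    "g lower outlier": [1] + [11] * 10,
}

if __name__ == "__main__":
    for label, xs in DATA_SETS.items():
        print(f"{label:22s} MD={median(xs):5.2f} AM={arithmetic_mean(xs):5.2f} "
              f"GM={geometric_mean(xs):5.2f} HM={harmonic_mean(xs):5.2f}")
```

Running it shows how a single outlier pulls the arithmetic mean far more than the geometric or harmonic mean in set (f), while the harmonic mean is pulled the most by the single small value in set (g).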
Arithmetic Mean
◼ An Arithmetic Mean (AM) is an appropriate measure
if the sum of all the measurements is a meaningful
and interesting value
◼ The AM used for a time-based variable, such as program execution time, has the
important property that it is directly proportional to the total time
◼ If the total time doubles, the mean value doubles
A Comparison of Arithmetic and Harmonic Means for Rates

                                          Computer A  Computer B  Computer C  Computer A  Computer B  Computer C
                                          time        time        time        rate        rate        rate
                                          (secs)      (secs)      (secs)      (MFLOPS)    (MFLOPS)    (MFLOPS)
Program 1 (10^8 FP ops)                   2.0         1.0         0.75        50          100         133.33
Program 2 (10^8 FP ops)                   0.75        2.0         4.0         133.33      50          25
Total execution time                      2.75        3.0         4.75        –           –           –
Arithmetic mean of times                  1.38        1.5         2.38        –           –           –
Inverse of total execution time (1/sec)   0.36        0.33        0.21        –           –           –
Arithmetic mean of rates                  –           –           –           91.67       75.00       79.17
Harmonic mean of rates                    –           –           –           72.72       66.67       42.11
A Comparison of Arithmetic and Geometric Means for Normalized Results
(a) Results normalized to Computer A
• Benchmark suite
– A collection of programs, defined in a high-level language
– Together attempt to provide a representative test of a computer in a
particular application or system programming area
• SPEC
– An industry consortium
– Defines and maintains the best known collection of benchmark suites
aimed at evaluating computer systems
– Performance measurements are widely used for comparison and
research purposes
SPEC CPU2017
• Best known SPEC benchmark suite
• Industry standard suite for processor intensive applications
• Appropriate for measuring performance for applications that
spend most of their time doing computation rather than I/O
• Consists of 20 integer benchmarks and 23 floating-point
benchmarks written in C, C++, and Fortran
• For all of the integer benchmarks and most of the floating-
point benchmarks, there are both rate and speed benchmark
programs
• The suite contains over 11 million lines of code
SPEC CPU2017 Benchmarks

(A)
Rate              Speed             Language     Kloc   Application Area
523.xalancbmk_r   623.xalancbmk_s   C++           520   XML to HTML conversion via XSLT

(B)
Rate              Speed             Language     Kloc   Application Area
511.povray_r      –                 C++           170   Ray tracing
519.lbm_r         619.lbm_s         C               1   Fluid dynamics
521.wrf_r         621.wrf_s         Fortran, C    991   Weather forecasting
526.blender_r     –                 C++          1577   3D rendering and animation
527.cam4_r        627.cam4_s        Fortran, C    407   Atmosphere modeling
–                 628.pop2_s        Fortran, C    338   Wide-scale ocean modeling (climate level)

Kloc = line count (including comments/whitespace) for source files used in a build/1000
SPEC CPU 2017 Integer Benchmarks for HP Integrity Superdome X

(a) Rate Result (768 copies)
Benchmark          Base Seconds   Base Rate   Peak Seconds   Peak Rate
541.leela_r        892            1410        896            1420
548.exchange2_r    833            2420        770            2610

(b) Speed Result (384 threads)
Benchmark          Base Seconds   Base Ratio  Peak Seconds   Peak Ratio
602.gcc_s          546            7.29        535            7.45
605.mcf_s          866            5.45        700            6.75
620.omnetpp_s      276            5.90        247            6.61
623.xalancbmk_s    188            7.52        179            7.91
625.x264_s         283            6.23        271            6.51
631.deepsjeng_s    407            3.52        343            4.18
641.leela_s        469            3.63        439            3.88
648.exchange2_s    329            8.93        299            9.82
[Flowchart: SPEC evaluation steps]
1. Get next program
2. Run program three times
3. Select median value
4. Ratio(prog) = Tref(prog)/TSUT(prog)
5. End
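The per-program steps can be sketched in a few lines of Python. This is an illustration under stated assumptions, not SPEC's tooling: the function names are invented here, the timings are hypothetical, and the final geometric mean over all ratios is the usual SPEC summary step, which does not appear in the extracted steps.

```python
import math
from statistics import median


def spec_ratio(t_ref: float, sut_runs: list) -> float:
    """Ratio(prog) = Tref(prog) / TSUT(prog), with TSUT the median of three runs."""
    if len(sut_runs) != 3:
        raise ValueError("the procedure runs each program three times")
    return t_ref / median(sut_runs)


def overall_metric(ratios: list) -> float:
    """Geometric mean of the per-program ratios (assumed summary step)."""
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# Hypothetical measurements: program -> (reference time, three run times), in seconds.
MEASUREMENTS = {
    "prog_a": (1000.0, [260.0, 250.0, 240.0]),
    "prog_b": (2700.0, [310.0, 300.0, 290.0]),
}

RATIOS = {name: spec_ratio(t_ref, runs) for name, (t_ref, runs) in MEASUREMENTS.items()}

if __name__ == "__main__":
    for name, ratio in RATIOS.items():
        print(f"{name}: ratio = {ratio:.2f}")
    print(f"overall = {overall_metric(list(RATIOS.values())):.2f}")
```

Taking the median of the three runs keeps a single unusually slow or fast run from distorting the ratio.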