Fundamentals of Computer Design - 1

This document discusses key concepts in computer architecture:
1. It defines computer architecture as including the instruction set architecture, organization, hardware, and more. The instruction set provides a critical interface between software and hardware.
2. Dependability and performance are important principles of computer design. Dependability can be measured through metrics like mean time to failure and availability; performance is measured in units of work completed per second.
3. Quantitative principles for optimizing computer design include taking advantage of parallelism, exploiting locality of reference, focusing on the common case using Amdahl's Law, and understanding the processor performance equation.


12Z602 - Computer Architecture II

• Dependability
• Performance
• Quantitative Principles of Computer Design
Defining Computer Architecture
• “Old” view of computer architecture:
– Instruction Set Architecture (ISA) design
– i.e. decisions regarding:
» registers, memory addressing, addressing modes,
instruction operands, available operations, control
flow instructions, instruction encoding

• “Real” computer architecture:
– Specific requirements of the target machine
– Design to maximize performance within constraints: cost, power, and availability
– Includes ISA, microarchitecture, hardware
What is *Computer Architecture*?
Computer Architecture = Instruction Set Architecture + Organization + Hardware + …
The Instruction Set: a Critical Interface

[Figure: the instruction set is the interface layer between software (above) and hardware (below).]
Define and quantify dependability
• How do we decide when a system is operating properly?
• Infrastructure providers now offer Service Level Agreements (SLAs) to guarantee that their networking or power service will be dependable
• Systems alternate between 2 states of service with
respect to an SLA:
1. Service accomplishment, where the service is delivered
as specified in SLA
2. Service interruption, where the delivered service is
different from the SLA
• Failure = transition from state 1 to state 2
• Restoration = transition from state 2 to state 1
Measures of dependability
• Module reliability = measure of continuous service
accomplishment (or time to failure) from a
reference initial instant.
2 metrics
1. Mean Time To Failure (MTTF) measures Reliability
2. Failures In Time (FIT) = the failure rate (1/MTTF)
• Traditionally reported as failures per billion hours of operation, i.e., FIT = 10^9 / MTTF with MTTF in hours
• Mean Time To Repair (MTTR) measures Service Interruption
– Mean Time Between Failures (MTBF) = MTTF+MTTR
• Module availability measures service as the alternation between the two states of accomplishment and interruption (a number between 0 and 1, e.g., 0.9)
• Module availability = MTTF / (MTTF + MTTR)
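
A minimal sketch of these measures in Python; the MTTF and MTTR values below are made-up, purely for illustration:

def availability(mttf_hours, mttr_hours):
    # Module availability = MTTF / (MTTF + MTTR)
    return mttf_hours / (mttf_hours + mttr_hours)

# Hypothetical module: 1,000,000-hour MTTF, 24-hour MTTR (illustrative only)
mttf, mttr = 1_000_000, 24
print("MTBF =", mttf + mttr, "hours")                        # MTBF = MTTF + MTTR
print("Availability =", round(availability(mttf, mttr), 6))  # ~0.999976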
Example calculating reliability
• If modules have exponentially distributed
lifetimes (age of module does not affect
probability of failure), overall failure rate is the
sum of failure rates of the modules
• Calculate FIT and MTTF for 10 disks (1M hour
MTTF per disk), 1 disk controller (0.5M hour
MTTF), and 1 power supply (0.2M hour MTTF):
FailureRate = 10 × (1/1,000,000) + 1/500,000 + 1/200,000
            = (10 + 2 + 5) / 1,000,000
            = 17 / 1,000,000 failures per hour
            = 17,000 FIT
MTTF = 1,000,000,000 / 17,000 ≈ 59,000 hours
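
The same arithmetic as a short Python sketch, using the component counts and per-module MTTFs from the example:

# Failure rates add when module lifetimes are exponentially distributed.
components = [
    (10, 1_000_000),  # 10 disks, 1M-hour MTTF each
    (1, 500_000),     # 1 disk controller, 0.5M-hour MTTF
    (1, 200_000),     # 1 power supply, 0.2M-hour MTTF
]
failure_rate = sum(n / mttf for n, mttf in components)  # failures per hour
fit = failure_rate * 1e9                                # failures per 10^9 hours
print(fit)        # 17000.0 FIT
print(1e9 / fit)  # system MTTF ~ 59,000 hours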
Definition: Performance
• Performance is in units of things per second
– bigger is better
• If we are primarily concerned with response time:
Performance = 1 / Execution time

" X is n times faster than Y" means

Performance(X) Execution_time(Y)
n = =
Performance(Y) Execution_time(X)
Performance: What to measure
• Usually rely on benchmarks.
• To increase predictability, collections of benchmark
applications, called benchmark suites, are popular
• SPEC CPU: popular desktop benchmark suite
– Measures processing time
• Server benchmarks: processor throughput oriented
– SPECSFS (NFS file server) and SPECWeb (web server) added as server benchmarks
• The Transaction Processing Performance Council (TPC) measures server performance and ability to handle transactions for databases (e.g., airline reservations)
– TPC-C: complex query environment for Online Transaction Processing
– TPC-H: models ad hoc decision support
– TPC-E: OLTP (brokerage firm workload)
– TPC-App: application server and web services benchmark
How to Summarize Suite Performance
• Designer vs. user view (users rely on benchmarks to select a system)
• Arithmetic average of execution time of all programs?
– But programs vary by 4X in speed, so some would count more than others in an arithmetic average
• Could add a weight per program, but how to pick the weights?
– Different companies want different weights for their products
• SPECRatio: normalize execution times to a reference computer, yielding a ratio proportional to performance:

SPECRatio = time on reference computer / time on computer being rated
How to Summarize Suite Performance
• If program SPECRatio on Computer A is 1.25
times bigger than Computer B, then
1.25 = SPECRatio_A / SPECRatio_B
     = (ExecutionTime_reference / ExecutionTime_A) / (ExecutionTime_reference / ExecutionTime_B)
     = ExecutionTime_B / ExecutionTime_A
     = Performance_A / Performance_B
• Note that when comparing 2 computers as a ratio,
execution times on the reference computer drop
out, so choice of reference computer is irrelevant
How to Summarize Suite Performance
• Since SPECRatios are ratios (unitless), the proper mean is the geometric mean (an arithmetic mean would be meaningless):

GeometricMean = (Π_{i=1}^{n} SPECRatio_i)^{1/n}

1. The geometric mean of the ratios is the same as the ratio of the geometric means
2. The ratio of geometric means = geometric mean of the performance ratios
⇒ choice of reference computer is irrelevant!
• These two points make geometric mean of ratios
attractive to summarize performance
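
A short Python sketch of the two points above; the SPECRatio values are invented for illustration:

import math

def geometric_mean(values):
    # nth root of the product of n values
    return math.prod(values) ** (1 / len(values))

# Invented SPECRatios for a 4-program suite on machines A and B
ratios_a = [10.0, 12.0, 8.0, 15.0]
ratios_b = [8.0, 10.0, 6.0, 12.0]

# Ratio of geometric means == geometric mean of the per-program ratios,
# so the reference computer drops out of any A-vs-B comparison.
print(geometric_mean(ratios_a) / geometric_mean(ratios_b))
print(geometric_mean([a / b for a, b in zip(ratios_a, ratios_b)]))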
Quantitative Principles of Computer
Design
Outline
• Quantitative Principles of Computer Design
1. Take Advantage of Parallelism
2. Principle of Locality
3. Focus on the Common Case
4. Amdahl’s Law
5. The Processor Performance Equation
1) Taking Advantage of Parallelism
PARALLELISM @ SYSTEM LEVEL
• To increase the throughput of a server computer, multiple processors or multiple disks can be used.

PARALLELISM @ INDIVIDUAL PROCESSOR
• Pipelining: overlap instruction execution to reduce the total time to complete an instruction sequence.
– Not every instruction depends on its immediate predecessor ⇒ instructions can be executed completely or partially in parallel
– Classic 5-stage pipeline:
1) Instruction Fetch (Ifetch),
2) Register Read (Reg),
3) Execute (ALU),
4) Data Memory Access (Dmem),
5) Register Write (Reg)
Pipelined Instruction Execution

[Figure: four instructions in program order flow through the Ifetch, Reg, ALU, DMem, Reg pipeline stages across clock cycles 1-7, each instruction starting one cycle behind its predecessor.]
2) The Principle of Locality
• The Principle of Locality:
– Programs access a relatively small portion of the address space at any instant of time.
– Programs tend to reuse data and instructions that they have used recently.
• Two Different Types of Locality:
– Temporal Locality (Locality in Time): If an item is referenced, it will
tend to be referenced again soon (e.g., loops, reuse)
– Spatial Locality (Locality in Space): If an item is referenced, items
whose addresses are close by tend to be referenced soon
(e.g., array access)
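
A minimal Python illustration, using a contiguous array so that index order actually corresponds to memory order (in C the effect is the same but more direct):

import array

data = array.array('d', range(1_000_000))  # one contiguous block of doubles

total = 0.0
for x in data:   # spatial locality: consecutive elements occupy adjacent
    total += x   # memory, so each fetched cache line is fully used;
                 # temporal locality: 'total' and the loop body are reused
                 # every iteration, so they stay in registers/cache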
3) Focus on the Common Case
• In making a design trade-off, favor the frequent
case over the infrequent case
– E.g., Instruction fetch and decode unit used more frequently
than multiplier, so optimize it 1st
– E.g., If database server has 50 disks / processor, storage
dependability dominates system dependability, so optimize it 1st
• Frequent case is often simpler and can be done
faster than the infrequent case
• E.g., adding 2 numbers: overflow is rare, so optimize the common case where no overflow occurs
• What is the frequent case, and how much can performance improve by making that case faster? ⇒ Amdahl's Law
Amdahl’s Law
• It states that the performance improvement gained from using an enhancement is limited by the fraction of the time the enhancement can be used.
• Speedup depends on two factors:
1. Fraction_enhanced = fraction of the execution time that can take advantage of the enhancement
2. Speedup_enhanced = speedup achieved in the enhanced fraction
Amdahl’s Law
• Execution time using the computer with the enhancement = time in the unenhanced portion + time spent using the enhancement:

ExTime_new = ExTime_old × [ (1 − Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced ]
Amdahl’s Law

Speedup_overall = ExTime_old / ExTime_new
                = 1 / [ (1 − Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced ]

• Amdahl's Law expresses the law of diminishing returns: the improvement gained by an enhancement diminishes as additional enhancements are made.

• Best you could ever hope to do (as Speedup_enhanced → ∞):

Speedup_maximum = 1 / (1 − Fraction_enhanced)
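
A small helper makes the law easy to experiment with; this is a sketch, reused in the examples that follow:

def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    # Speedup_overall = 1 / ((1 - F) + F / S)
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)

print(amdahl_speedup(0.4, 10))  # 1.5625 (the web-server example below)
print(amdahl_speedup(0.9, 10))  # ~5.26  (make the common case fast)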
Amdahl’s Law example
• Suppose the CPU of a web server is replaced with one 10X faster
• Assume it is an I/O-bound server, so 60% of the time is spent waiting for I/O; the CPU is used the remaining 40%

Speedup_overall = 1 / [ (1 − Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced ]
                = 1 / [ (1 − 0.4) + 0.4/10 ]
                = 1 / 0.64 ≈ 1.56
• Apparently, it's human nature to be attracted by "10X faster" rather than keeping in perspective that the system is just 1.56X faster
Amdahl’s Law example: Make the
common case fast
• Fraction_enhanced = 0.1, Speedup_enhanced = 10:

Speedup_overall = 1 / [ (1 − 0.1) + 0.1/10 ] = 1 / 0.91 ≈ 1.1

• Fraction_enhanced = 0.9, Speedup_enhanced = 10:

Speedup_overall = 1 / [ (1 − 0.9) + 0.9/10 ] = 1 / 0.19 ≈ 5.3
Problem
• Suppose FP square root (FPSQR) is responsible for 20% of the execution time for a graphics application. One proposal is to enhance the FPSQR hardware and speed up this operation by a factor of 10. The other alternative is just to try to make all FP instructions in the graphics processor run faster by a factor of 1.6; FP instructions are responsible for half of the execution time for the application. The design team believes that they can make all FP instructions run 1.6 times faster with the same effort as required for the fast square root. Compare these two design alternatives.
Design 1: FPSQR enhancement
Fraction_enhanced = 20%, Speedup_enhanced = 10
Design 2: FP enhancement
Fraction_enhanced = 50%, Speedup_enhanced = 1.6
Solution:

Speedup_FPSQR = 1 / [ (1 − 0.2) + 0.2/10 ] = 1 / 0.82 ≈ 1.22
Speedup_FP = 1 / [ (1 − 0.5) + 0.5/1.6 ] = 1 / 0.8125 ≈ 1.23

Improving the performance of the FP operations overall is slightly better because of the higher frequency.
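
Checking both designs with the amdahl_speedup helper sketched earlier:

speedup_fpsqr = amdahl_speedup(0.2, 10)  # 1 / (0.8 + 0.02)   ~ 1.22
speedup_fp = amdahl_speedup(0.5, 1.6)    # 1 / (0.5 + 0.3125) ~ 1.23
print(speedup_fpsqr, speedup_fp)         # the FP enhancement wins narrowly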
What’s a Clock Cycle?

[Figure: a latch or register feeding combinational logic; the cycle time is set by the longest delay through the combinational logic between state elements.]
Processor performance equation
• Microprocessors are driven by a clock running at a constant rate

CPU time = CPU clock cycles for a program × Clock cycle time    (1)

CPI = CPU clock cycles for a program / Instruction count    (2)

CPU time = Instruction count × CPI × Clock cycle time

CPU time = Seconds / Program = (Instructions / Program) × (Cycles / Instruction) × (Seconds / Cycle)
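
The equation as a short Python sketch; the instruction count, CPI, and clock rate below are made-up values:

def cpu_time(instruction_count, cpi, clock_rate_hz):
    # seconds = instructions x (cycles / instruction) x (seconds / cycle)
    return instruction_count * cpi / clock_rate_hz

# Hypothetical program: 2 billion instructions, CPI 1.5, 2 GHz clock
print(cpu_time(2e9, 1.5, 2e9))  # 1.5 seconds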
Problem
• Consider a graphics card, with
– FP operations (excluding FPSQR): frequency 25%, average CPI 4.0
– FPSQR operations only: frequency 2%, average CPI 20
– all other instructions (remaining 73%): average CPI 1.33
• Design option 1: decrease the CPI of FPSQR to 2
• Design option 2: decrease the CPI of all FP operations to 2.5
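
A sketch of the comparison; note it assumes "all FP operations" in option 2 covers both the 25% FP instructions and the 2% FPSQR instructions, which the slide leaves ambiguous:

def average_cpi(mix):
    # mix: (frequency, cpi) pairs that should cover 100% of instructions
    return sum(freq * cpi for freq, cpi in mix)

base = average_cpi([(0.25, 4.0), (0.02, 20.0), (0.73, 1.33)])     # ~2.37
option1 = average_cpi([(0.25, 4.0), (0.02, 2.0), (0.73, 1.33)])   # FPSQR CPI -> 2
option2 = average_cpi([(0.27, 2.5), (0.73, 1.33)])                # all FP CPI -> 2.5
# Instruction count and clock rate are unchanged, so speedup = CPI ratio:
print(base / option1, base / option2)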
Problem
• Suppose a program (or a program task) takes 1 billion instructions to execute on a processor running at 2 GHz. 50% of the instructions execute in 3 clock cycles, 30% execute in 4 clock cycles, and 20% execute in 5 clock cycles. What is the execution time for the program or task? If the processor is redesigned so that all instructions that initially executed in 5 cycles now execute in 4 cycles, what is the overall percentage improvement?
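
One way to work the problem, as a Python sketch:

CLOCK_HZ = 2e9  # 2 GHz
N = 1e9         # 1 billion instructions

cpi_old = 0.5 * 3 + 0.3 * 4 + 0.2 * 5  # 3.7 cycles/instruction
cpi_new = 0.5 * 3 + 0.3 * 4 + 0.2 * 4  # 3.5 (the 5-cycle class now takes 4)

t_old = N * cpi_old / CLOCK_HZ  # 1.85 s
t_new = N * cpi_new / CLOCK_HZ  # 1.75 s
print(t_old, t_new, (t_old - t_new) / t_old)  # ~5.4% improvement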
