Introduction to ACA 2021
COMPUTER ARCHITECTURE
What is COMPUTER ARCHITECTURE?
•Hardware Designer – thinks about circuits, components, timing, functionality, and ease of debugging – a “construction engineer”
Why do I Care?
•You may actually do computer architecture someday
What is COMPUTER ARCHITECTURE?
•Computer Architecture =
Machine Organization (what the machine looks like)
+
Instruction Set Architecture (How you talk to the machine)
The Instruction Set Architecture
•is the agreed-upon interface between all the software that runs
on the machine and the hardware that executes it.
Machine Organization
•Once you have decided on an ISA, you must decide how to design the hardware so that programs written in that ISA execute as fast as possible.
The Challenge of Computer Architecture
•The industry changes faster than any other.
•It’s all about making programs run faster than the next guy’s
machine.
What is the first computer?
•The Z1, created by Konrad Zuse (Germany) in 1936–1938, is often cited as the first programmable binary computer
•During the first 25 years of the computer era, performance grew by about 25% per year
•The growth rate and cost advantage of microprocessors increased their fraction of the computer business
EFFECTS:
1. Enhanced the capability available to computer users
2. Influenced software development
•Revolution in computer design – emphasized both architectural innovation and efficient use of technology improvements
Evolution of computers
•1960s: large mainframes
•1970s: minicomputers and supercomputers
•Late 1980s: desktops and workstations, which led to the rise of servers
•1990s: the Internet, the WWW, and the first handheld computing devices (PDAs)
•2000s: cell phones, then embedded computers
1. Classification
2. Short history
3. Performance growth
4. Measuring performance
5. CPU performance equation
6. Cost of chip
7. Dependability
8. Trends in technology
CLASSIFICATION
1) Personal Mobile Device (PMD)
• $100–$1,000 per system, $10–$100 per processor
• Cost, energy, media performance, responsiveness
2) Desktop
• $500–$5,000 per system, $50–$500 per processor
• Price/performance ratio, graphics performance, energy
3) Server
• $5,000–$10,000,000 per system, $200–$2,000 per processor
• High throughput, scalability, availability, energy
4) Clusters/warehouse-scale computers
• $100,000–$200,000,000 per system, $50–$250 per processor
• Price/performance ratio, high throughput, energy, availability
5) Embedded system
• $10–$100,000 per system, $0.01–$100 per processor
• Price, energy, application-specific performance
Contd…
•Computer architecture deals with architecture on several levels:
1. Digital logic
• describes the implementation of digital components by logic gates
2. Microarchitecture (μA)
• defines the structure of the CPU, including the ALU, control logic, internal buses, cache hierarchy, ...
• this level is also called computer organization
3. Instruction set architecture (ISA)
• defines the machine instructions, registers, and memory model visible to software
4. System architecture
• defines the system bus, memory system, I/O buses, the interconnect for several CPUs, ...
Instruction Set Architectures
•Classifying ISAs: based on how the operands are accessed, ISAs fall into 4 classes (see the sketch below)
•a) Stack
– all operands are on the top positions of the stack
•b) Accumulator
– the first operand is in the accumulator, the second operand in memory
•c) Register-memory
– the first operand is in a register, the second operand in memory
•d) Register-register
– both operands have been loaded into registers beforehand (therefore also called a load/store architecture)
•Some earlier architectures (like the DEC VAX architecture) also implemented memory-memory access
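As an illustration of the four classes, here is the classic textbook-style example of how the single statement c = a + b might be encoded in each style; the mnemonics are generic and hypothetical, not any real machine's assembly:

```c
/* How c = a + b; might be encoded under each operand-access class
   (illustrative, generic mnemonics):

   a) Stack:             PUSH a; PUSH b; ADD; POP c
   b) Accumulator:       LOAD a; ADD b; STORE c
   c) Register-memory:   LOAD R1,a; ADD R3,R1,b; STORE R3,c
   d) Register-register: LOAD R1,a; LOAD R2,b; ADD R3,R1,R2; STORE R3,c
*/
int a = 2, b = 3, c;

int main(void) {
    c = a + b; /* one high-level statement, four machine-level encodings above */
    return 0;
}
```

Note how the load/store (register-register) style needs the most instructions but keeps each instruction simple and uniform.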
Seven Dimensions of an ISA
1. Class of ISA
2. Memory addressing
3. Addressing modes
4. Type and size of operands
5. Operations
6. Control flow instructions
7. Encoding an ISA
ARCHITECTURE
•Implementation of computer components:
–Organization – high-level aspects: memory design, memory interconnect, design of the CPU
–Hardware – detailed logic design
–ISA
Short History
•Some important milestones of advanced computer architecture:
1965: IBM ships the first computer based on integrated circuits
1976: first vector CPU in the supercomputer Cray-1
1978: Intel introduces the 8086 architecture
1984: Motorola 68020 is the first CPU with a cache
1987: first RISC processors (SPARC and MIPS)
1989: Intel ships the 80486 CPU with over 1 million transistors
1992: DEC Alpha 21064 is the first 64-bit microprocessor
1995: Sun's UltraSPARC is the first CPU with a SIMD unit (DLP)
1998: DEC Alpha 21264 with 4-way superscalar execution (ILP) and out-of-order instruction execution
2000: first dual-core CPU (Power4) presented by IBM
Trends in Technology
Five implementation technologies have changed at a dramatic pace:
•Integrated circuit logic technology
•Semiconductor DRAM (dynamic random-access memory)
•Semiconductor Flash (electrically erasable programmable read-only memory)
•Magnetic disk technology
•Network technology
Trends in Technology (Contd…)
•Feature size decreased from 10 μm in 1971 to 0.09 μm in 2006
•Average annual increases in recent years:
–transistor density: ~35%
–chip (die) area: ~15%
–transistors per chip (die): ~55%
–CPU clock rate: ~30%
–DRAM capacity: ~40%
–hard disk density: ~30% (until 1990), ~100% (since 1990)
•Moore's law states that the number of transistors per chip doubles every 18 months
•IC logic technology
–Transistor density: 35%
–Die size: 10 to 20%
–Growth rate in transistor count: 40 to 55%
•Semiconductor DRAM
–25% to 40% per year
•Semiconductor Flash
–50% to 60% per year
•Network technology
–Performance of switches and the transmission system
Performance Trends: Bandwidth vs. Latency
•Bandwidth / throughput
–Total amount of work done in a given time
•e.g., megabytes per second for a disk transfer
•Latency / response time
–Time between the start and completion of an event
•e.g., milliseconds for a disk access
Trends in Power & Energy in Integrated Circuits
Power also provides challenges:
1. Power must be brought in and distributed around the chip
2. Power is dissipated as heat and must be removed
Dynamic power depends on:
▪Load capacitance
▪Voltage
•Most microprocessors now turn off the clock of inactive modules – this saves energy and dynamic power
How can the vast number of additional transistors be used most profitably?
–larger caches?
–increased instruction-level parallelism?
–more speculative instruction execution?
–replication of processor cores?
•However, power consumption increasingly becomes the limiting factor:
–the dynamic power P_dynamic per transistor can be estimated by
P_dynamic = 1/2 × capacitive load × voltage² × frequency
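To make the quadratic voltage dependence concrete, here is a minimal C sketch that evaluates the formula above; the capacitance, voltage, and frequency values are assumed purely for illustration:

```c
#include <stdio.h>

/* Dynamic power per the slide's formula:
   P_dynamic = 1/2 * capacitive_load * voltage^2 * frequency */
static double dynamic_power(double cap_load_f, double voltage_v, double freq_hz) {
    return 0.5 * cap_load_f * voltage_v * voltage_v * freq_hz;
}

int main(void) {
    double cap = 1e-9; /* assumed aggregate switched capacitance: 1 nF */
    double f   = 3e9;  /* assumed clock rate: 3 GHz */

    double p_full = dynamic_power(cap, 1.0, f);  /* 1.00 V -> 1.50 W  */
    double p_low  = dynamic_power(cap, 0.85, f); /* 0.85 V -> ~1.08 W */

    printf("P at 1.00 V: %.2f W\n", p_full);
    printf("P at 0.85 V: %.2f W (%.0f%% of full)\n", p_low, 100.0 * p_low / p_full);
    return 0;
}
```

Because voltage enters squared, a 15% voltage reduction alone cuts dynamic power by roughly 28% at the same frequency.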
How to improve energy efficiency?
•Do nothing well
•Turn off the clock of inactive modules
•Overclocking
–it is safe to run at a higher clock rate for a short time, possibly on just a few cores, until the temperature starts to rise
Why Such Change in 10 Years?
•Performance
–Technology advances
•CMOS VLSI dominates older technologies (TTL, ECL) in cost AND performance
–Computer architecture advances improve the low end
•RISC, superscalar, RAID (redundant array of inexpensive disks), …
Trends in technology
•IC logic technology – 35%, later 40 to 55%
•Semiconductor DRAM – 40%
•Magnetic disk technology – 30%, then 60%, then 100%, recently back to about 30% per year
•Network technology – depends on the performance of switches and the transmission system
Performance Growth
• Exponential growth of CPU's performance
- About 50% per year increase until 2002,
- only about 20% per year increase since 2002
⇒ no more growth in future?
Contd…
•Fast DRAM memory chips (DDR2, DDR3, RDRAM) appeared in the last 20 years:
–they raise the effective frequency for data transfer
•The gap between CPU and memory speed increases by about 50% per year
•The high DRAM latency must be hidden by exploiting multilevel cache hierarchies
Trends in Cost
•Manufacturing costs decrease over time – the learning curve
•Increasing volumes affect cost in several ways:
–cost decreases about 10% for each doubling of volume
•Much of the low end of the computer business is a commodity business:
–overall product cost is lower because of the competition among the suppliers of the components and the volume efficiencies the suppliers can achieve
Cost of an Integrated Circuit
N is a parameter called the process-complexity factor, a measure of manufacturing difficulty.
•Die yield = Wafer yield × (1 + (Defects per unit area × Die area) / α)^(−α)
•α = 4
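The formula can be evaluated directly; here is a minimal C sketch with assumed illustrative numbers (0.4 defects per cm², a 1.5 cm² die, ideal wafer yield):

```c
#include <math.h>
#include <stdio.h>

/* Die yield per the slide's formula:
   yield = wafer_yield * (1 + defect_density * die_area / alpha)^(-alpha) */
static double die_yield(double wafer_yield, double defects_per_cm2,
                        double die_area_cm2, double alpha) {
    return wafer_yield * pow(1.0 + defects_per_cm2 * die_area_cm2 / alpha, -alpha);
}

int main(void) {
    /* assumed illustrative values */
    double y = die_yield(1.0 /* wafer yield */, 0.4 /* defects/cm^2 */,
                         1.5 /* die area, cm^2 */, 4.0 /* alpha */);
    printf("Die yield: %.3f\n", y); /* ~0.57: almost half the dies are lost */
    return 0;
}
```

Note how a larger die area lowers the yield, which is one reason the cost of an IC grows faster than linearly with die size.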
Why are we interested?
Cost vs. Price
Cost of Operation
•Warehouse-scale computers contain tens of thousands of servers
•For these, the cost to operate the computers is significant, in addition to the cost of purchase
Dependability
•Defined with respect to an SLA (Service Level Agreement)
•Architects must design systems to cope with the challenges of dependability
•Providers guarantee that their networking or power service will be dependable
•Transitions between the 2 states (service accomplishment and service interruption) are caused by
–Failures (from state 1 to state 2)
–Restorations (from state 2 to state 1)
Module Reliability
•Is a measure of continuous service accomplishment from a reference point.
–MTTF (Mean Time To Failure)
•The reciprocal of MTTF is the rate of failures – FIT (failures per billion (10^9) hours of operation)
•An MTTF of 1,000,000 hours equals 1000 FIT
–MTTR (Mean Time To Repair)
•Measures the time to restore service after an interruption
–MTBF (Mean Time Between Failures)
•MTTF + MTTR
Module availability
•Measure of service accomplishment with respect to the alternation between the 2 states of accomplishment and interruption.
Module availability = MTTF / (MTTF + MTTR)
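A minimal C sketch tying these measures together, using the earlier example MTTF of 1,000,000 hours plus an assumed MTTR of 24 hours:

```c
#include <stdio.h>

int main(void) {
    double mttf = 1000000.0; /* hours (the slide's example value) */
    double mttr = 24.0;      /* hours (assumed for illustration)  */

    double fit   = 1e9 / mttf;   /* failures per 10^9 hours -> 1000 FIT */
    double mtbf  = mttf + mttr;  /* mean time between failures */
    double avail = mttf / (mttf + mttr);

    printf("FIT:          %.0f\n", fit);
    printf("MTBF:         %.0f hours\n", mtbf);
    printf("Availability: %.6f\n", avail); /* ~0.999976 */
    return 0;
}
```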
8. Measuring, Reporting, and Summarizing Performance
•Measure performance
–Benchmarks
•Report performance
–Benchmark reports
Measurement of Performance
•The performance p_x of a program X can be determined by measuring its execution time t_x
•A program Y is faster than a program X by a factor c if
c = t_x / t_y = (1 / p_x) / (1 / p_y) = p_y / p_x
Contd…
•Performance is typically given for benchmarks
•Several kinds of benchmarks:
–synthetic benchmarks: small programs that try to simulate the behavior of real programs
•examples: Dhrystone, Whetstone
–kernels: small key fragments of real programs
–toy problems: short complete programs
–benchmark suite: a collection of real programs, from one or various application areas
•example: the SPEC benchmarks
•The main problem with the first three benchmark kinds: a chip architect can optimize the hardware for the short benchmark codes
Contd…
•The SPEC benchmarks (Standard Performance Evaluation Corporation) consist of
–real application programs
–that are easily portable
–with the effect of I/O minimized as far as possible
–written in C, C++, and Fortran
•well suited for measuring the performance of desktop CPUs
•often updated to avoid manipulation: SPEC89, 92, 95, 2000, and SPEC2006
•SPEC2000 contains 12 integer and 14 floating-point programs
•The SPEC integer benchmarks vary from part of a C compiler to a chess program to a quantum computer simulation
Benchmarks (Contd…)
Desktop:
•Processor intensive
•Graphics intensive
Server:
•Processor throughput-oriented
–Processing rate is measured
•File server
–Tests the performance of the I/O system and processor
•Web server
–Requests for static and dynamic pages
–Clients post data to the server
•Transaction processing
–Handles transactions with databases
Report Performance
•Reproducibility
•List everything another experimenter would need to duplicate results.
•Actual Performance time
•Both baseline and optimized results
•Tabular and graph format
•Audit and Cost information
•High performance
•Cost performance
Summarize Performance Results
•Summarize the performance results of the suite in a single number.
•Compare arithmetic means of the execution times of the programs in the suite
•Add a weighting factor to each benchmark and use a weighted arithmetic mean.
•Normalize execution times to a reference computer (SPECRatio)
•Evaluate the geometric mean of the SPECRatios (as sketched below)
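A minimal C sketch of this summary method; the reference and measured times below are invented purely for illustration:

```c
#include <math.h>
#include <stdio.h>

int main(void) {
    /* assumed execution times (seconds) for 4 benchmarks */
    double ref_time[]  = {100.0, 250.0, 80.0, 400.0}; /* reference computer */
    double meas_time[] = { 25.0, 100.0, 16.0, 200.0}; /* machine under test */
    int n = 4;

    /* SPECRatio = reference time / measured time, per benchmark */
    double log_sum = 0.0;
    for (int i = 0; i < n; i++)
        log_sum += log(ref_time[i] / meas_time[i]); /* sum logs, then exp */

    double geo_mean = exp(log_sum / n); /* geometric mean of the ratios */
    printf("Geometric mean of SPECRatios: %.2f\n", geo_mean); /* ~3.16 */
    return 0;
}
```

Unlike the arithmetic mean, the geometric mean of ratios gives the same ranking regardless of which computer is chosen as the reference.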
9. Quantitative Principles of Computer Design
•Take advantage of parallelism
•Parallelism at the system level
–Scalability: multiple processors and disks
•Parallelism among instructions
–Pipelining
•Parallelism at the detailed-design level
–Set-associative caches use multiple banks of memory
–Modern ALUs use carry look-ahead
•A carry look-ahead adder improves speed by reducing the time required to determine carry bits (a sketch follows this list)
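A minimal C sketch of the carry look-ahead idea: all carries are computed directly from per-bit generate and propagate signals instead of rippling through each stage (a 4-bit adder, written for clarity rather than speed):

```c
#include <stdio.h>

/* 4-bit carry look-ahead: carries come straight from
   generate (g = a & b) and propagate (p = a ^ b) signals. */
static unsigned cla_add4(unsigned a, unsigned b, unsigned c0) {
    unsigned g = a & b; /* generate:  stage produces a carry  */
    unsigned p = a ^ b; /* propagate: stage passes a carry on */
    unsigned c1 = (g & 1)        | ((p & 1)        & c0);
    unsigned c2 = ((g >> 1) & 1) | (((p >> 1) & 1) & c1);
    unsigned c3 = ((g >> 2) & 1) | (((p >> 2) & 1) & c2);
    unsigned c4 = ((g >> 3) & 1) | (((p >> 3) & 1) & c3);
    unsigned carries = c0 | (c1 << 1) | (c2 << 2) | (c3 << 3);
    unsigned sum = (p ^ carries) & 0xF; /* sum bit = p XOR carry-in */
    return sum | (c4 << 4);            /* 5-bit result with carry-out */
}

int main(void) {
    printf("9 + 7 = %u\n", cla_add4(9, 7, 0)); /* prints 16 */
    return 0;
}
```

Because each carry is a two-level logic function of the inputs, the delay no longer grows linearly with the adder width as it does in a ripple-carry design.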
•Principle of Locality
•Programs tend to reuse data and instructions that they have used recently.
–A program spends 90% of its execution time in only 10% of its code.
•Temporal locality
–Recently accessed items are likely to be accessed in the near future
•Spatial locality
–Items whose addresses are near one another tend to be referenced close together in time (see the sketch below)
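A minimal C sketch of spatial locality: both loops sum the same matrix, but the first walks memory in the order it is laid out, while the second strides across rows (the size is assumed for illustration):

```c
#include <stdio.h>

#define N 1024

static double m[N][N]; /* C stores this array row by row */

int main(void) {
    double sum = 0.0;

    /* Row-major traversal: consecutive accesses hit adjacent
       addresses, so each fetched cache line is fully used. */
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            sum += m[i][j];

    /* Column-major traversal: each access jumps N*8 bytes ahead,
       so spatial locality (and the cache hit rate) is much worse. */
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            sum += m[i][j];

    printf("%f\n", sum);
    return 0;
}
```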
•Focus on the common case
•When deciding how to spend resources:
–Power
–Resource allocation
–Performance
•Decide what the frequent case is and how much performance can be improved by making it faster
•Amdahl’s law
Amdahl’s law
•Measures the performance gain obtained by improving some portion of a computer.
•“Performance improvement to be gained from using some faster mode of execution is limited by the fraction of the time the faster mode can be used”
Speedup due to enhancement E:
Speedup(E) = ExTime without E / ExTime with E = Performance with E / Performance without E
–Fraction_enhanced
•the fraction of the computation time that can be converted to use the enhancement (e.g., 20 sec of a total of 60 sec)
•≤ 1
–Speedup_enhanced
•the improvement gained by the enhanced execution mode (e.g., 5 sec in the original mode vs. 2 sec in the enhanced mode)
•> 1
Amdahl’s Law
Speedup_overall = ExTime_old / ExTime_new = 1 / ((1 − Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced)
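A minimal C sketch that evaluates the law using the example numbers from the previous slide (20 sec of a total 60 sec can be enhanced; the enhanced mode is 2.5× faster, i.e., 5 sec reduced to 2 sec):

```c
#include <stdio.h>

/* Amdahl's Law: speedup_overall = 1 / ((1 - f) + f / s), where
   f = fraction of time the enhancement can be used,
   s = speedup of the enhanced portion */
static double amdahl(double f, double s) {
    return 1.0 / ((1.0 - f) + f / s);
}

int main(void) {
    double f = 20.0 / 60.0; /* fraction enhanced */
    double s = 5.0 / 2.0;   /* speedup enhanced (5 sec -> 2 sec) */

    printf("Overall speedup: %.2f\n", amdahl(f, s)); /* 1.25 */
    return 0;
}
```

Even though the enhanced portion runs 2.5× faster, the overall speedup is only 1.25, because two-thirds of the time is unaffected.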
Processor performance equation
•Ticks, clock ticks, clock periods, clocks, cycles, clock cycles – discrete time events
•CPU time = CPU clock cycles for a program × Clock cycle time
or
•CPU time = CPU clock cycles for a program / Clock rate
•Avg. CPI = CPU clock cycles for a program / Instruction count (IC)
•Clock cycles = CPI × IC
•CPU time = IC × CPI × Clock cycle time (see the sketch below)
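A minimal C sketch of the performance equation with assumed illustrative numbers (one billion instructions, an average CPI of 1.5, a 2 GHz clock):

```c
#include <stdio.h>

int main(void) {
    /* assumed illustrative workload and machine */
    double ic   = 1e9; /* instruction count */
    double cpi  = 1.5; /* average clock cycles per instruction */
    double rate = 2e9; /* clock rate: 2 GHz */

    double cycles   = ic * cpi;      /* clock cycles = IC x CPI */
    double cpu_time = cycles / rate; /* = IC x CPI x clock cycle time */

    printf("Clock cycles: %.0f\n", cycles); /* 1.5e9 cycles */
    printf("CPU time: %.3f s\n", cpu_time); /* 0.750 s */
    return 0;
}
```

The equation exposes the three levers on CPU time: instruction count (ISA and compiler), CPI (microarchitecture), and clock cycle time (technology and organization).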
Pitfalls
•Falling prey to Amdahl’s law
–Too much effort for disappointing results
•A single point of failure
–One component failure brings down the whole system
•Fault detection can lower availability
–Faults are detected, but there is no mechanism to correct them
Fallacies
•The cost of the processor dominates the cost of the system
–It is only 20% for servers and 10% for PCs
•Benchmarks remain valid indefinitely
–Updates become available from time to time
•The rated MTTF is 1,200,000 hours (140 years), so disks practically never fail
•Peak performance tracks observed performance
–Observed performance varies from 5% to 58% of peak
Assignment 1