Introduction to ACA 2021

FUNDAMENTALS OF COMPUTER ARCHITECTURE
What is COMPUTER ARCHITECTURE?
• Hardware Designer – thinks about circuits, components, timing, functionality, ease of debugging – the “construction engineer”

• Computer Architect – thinks about high-level components, how they fit together, and how they work together to deliver performance – the “building architect”

3
Why do I Care?
•You may actually do computer architecture someday

•You may actually care about system performance someday

• The ability of application programs, compilers, operating systems, etc. to deliver performance depends critically on an understanding of the underlying computer organization.

4
What is COMPUTER ARCHITECTURE?
•Computer Architecture =
Machine Organization (what the machine looks like)
+
Instruction Set Architecture (how you talk to the machine)

5
The Instruction Set Architecture
•is the agreed-upon interface between all the software that runs
on the machine and the hardware that executes it.

9
Machine Organization
•Once you have decided on an ISA, you must decide how to
design the hardware to execute those programs written in the
ISA as fast as possible.

• This must be done every time a new implementation of the architecture is released, typically under very different technological constraints.

10
The Challenge of Computer Architecture
•The industry changes faster than any other.

•The ground rules change every year.


– new problems
– new opportunities
– different tradeoffs

•It’s all about making programs run faster than the next guy’s
machine.
11
What was the first computer?
• Z1 – created by Konrad Zuse (Germany), 1936-1938 – the first programmable binary computer (an electrically driven mechanical machine)

• Atanasoff-Berry Computer (ABC) – John Vincent Atanasoff and Clifford Berry, 1937-1942, at Iowa State College – the first electronic digital computer

• ENIAC (Electronic Numerical Integrator and Computer) – J. Presper Eckert and John Mauchly at the University of Pennsylvania; construction began in 1943 and was not completed until 1946

12
• During the first 25 years of the computer era, performance grew by about 25% per year

• IC technology and the microprocessor then improved growth to about 35% per year

• Two significant changes followed:
– the virtual elimination of assembly language programming
– the creation of standardized, vendor-independent operating systems (UNIX, Linux)

13
• The growth rate and cost advantage of microprocessors increased their fraction of the computer business

• These changes made it possible to develop a new architecture, RISC, which focuses on ILP and the use of caches

• RISC raised the performance bar, lifting the growth rate to about 50% per year by the end of the 20th century

14
EFFECT:
1. Enhanced the capability available to computer users

2. Improvements in cost-performance led to new classes of computers

3. Dominance of microprocessor-based computers across the entire range of computer design
• minicomputers were replaced by servers

4. Effects on software development
15
• Revolution in computer design – emphasized both architectural innovation and efficient use of technology improvements

• Since 2002, performance growth has dropped to about 20% per year

• Reasons:
a) the maximum power dissipation of air-cooled chips
b) little ILP left to exploit efficiently – hence the turn to TLP and DLP
c) nearly unchanged memory latency

16
Evolution of computers
• 1960s: large mainframes
• 1970s: minicomputers and supercomputers
• late 1980s: desktops and workstations – led to the rise of servers
• 1990s: the Internet, the WWW, and the first handheld computing devices (PDAs)
• 2000s: cell phones, then embedded computers

17
1. Classification
2. Short history
3. Performance growth
4. Measuring performance
5. CPU performance equation
6. Cost of chip
7. Dependability
8. Trends in technology
18
CLASSIFICATION
1) Personal Mobile Device (PMD)
• $100 to $1,000 per system, $10 to $100 per processor
• cost, energy, media performance, responsiveness
2) Desktop
• $500 to $5,000 per system, $50 to $500 per processor
• price/performance ratio, graphics performance, energy
3) Server
• $5,000 to $10,000,000 per system, $200 to $2,000 per processor
• high throughput, scalability, availability, energy
4) Clusters / warehouse-scale computers
• $100,000 to $200,000,000 per system, $50 to $250 per processor
• price/performance ratio, high throughput, energy, availability
5) Embedded systems
• $10 to $100,000 per system, $0.01 to $100 per processor
• price, energy, application-specific performance
19
Contd…
•Computer architecture deals with architecture on several levels:

1. Digital Logic
• describes the implementation of digital components by logic gates

2. Microarchitecture (μA)
• defines the structure of the CPU including the ALU, control logic, internal buses, cache hierarchy, ...
• this level is also called computer organization

3. Instruction Set Architecture (ISA)
• defines instructions, registers and addressing modes
• represents the boundary between hardware and software

4. System Architecture
• defines system bus, memory system, I/O buses, the interconnect for several CPUs, ...

20
Instruction Set Architectures
• Classifying ISAs: based on how operands are accessed, into 4 classes
• a) Stack
- all operands are on the upper stack positions
• b) Accumulator
- the first operand is in the accumulator, the second operand in memory
• c) Register-memory
- the first operand is in a register, the second operand in memory
• d) Register-register
- both operands have been loaded into registers before (therefore also called
load/store architecture)

• In some earlier architectures (like the DEC VAX architecture), memory-memory access was also implemented
21
Seven Dimensions of an ISA
1. Class of ISA
2. Memory addressing
3. Addressing modes
4. Type and size of operands
5. Operations
6. Control flow instructions
7. Encoding an ISA

22
ARCHITECTURE
• Implementation of computer components:
• Organization – high-level aspects: memory design, memory interconnect, design of the CPU
• Hardware – detailed logic design
• ISA

• All 3 collectively constitute the ARCHITECTURE.

23
Short History
• Some important milestones of advanced computer architecture:
1965: IBM ships the first computer based on integrated circuits
1976: first vector CPU in supercomputer Cray-1
1978: Intel introduces the 8086 architecture
1984: Motorola 68020 is the first CPU with cache
1987: first RISC processors (SPARC and MIPS)
1989: Intel ships the 80486 CPU with over 1 million transistors
1992: DEC Alpha 21064 is the first 64-bit microprocessor
1995: Sun's UltraSPARC is the first CPU with a SIMD unit (DLP)
1998: DEC Alpha 21264 with 4-way superscalar execution (ILP) and out-of-order instruction execution
2000: first dual-core CPU (Power4) presented by IBM
24
Trends in Technology
Five implementation technologies have changed at a dramatic pace:
• Integrated circuit logic technology
• Semiconductor DRAM (dynamic random-access memory)
• Semiconductor Flash (electrically erasable programmable read-only
memory)
• Magnetic disk technology
• Network technology
Trends in Technology
• Feature size decreased from 10 μm in 1971 to 0.09 μm in 2006
• Average annual increases over recent years:
– transistor density: ~35%
– chip (die) area: ~15%
– transistors per chip (die): ~55%
– CPU clock rate: ~30%
– DRAM capacity: ~40%
– hard disk density: ~30% (until 1990), ~100% (since 1990)

• Moore's law states that the number of transistors per chip doubles every 18 months (a quick check against the growth rates above follows below)

26
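As a sanity check, the ~55% annual growth in transistors per chip quoted above implies a doubling time close to Moore's 18 months. A one-line Python sketch confirms this:

```python
import math

def years_to_double(annual_growth):
    """Doubling time for exponential growth at `annual_growth` per year."""
    return math.log(2) / math.log(1 + annual_growth)

print(years_to_double(0.55))  # ~1.58 years, i.e. roughly 19 months
```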
• IC logic technology
– transistor density: 35% per year
– die size: 10 to 20% per year
– growth rate in transistor count: 40 to 55% per year

• Semiconductor DRAM
– 25% to 40% per year

• Semiconductor Flash
– 50% to 60% per year

• Magnetic disk technology
– density: 30% per year until 1990, then 60% per year
– ~100% per year around 1996
– ~40% per year by 2004

• Network technology
– performance depends on switches and the transmission system
27
Performance Trends: Bandwidth vs. Latency
• Bandwidth / throughput
– total amount of work done in a given time
• e.g., megabytes per second for a disk transfer

• Latency / response time
– time between the start and completion of an event
• e.g., milliseconds for a disk access

• Bandwidth grows by at least the square of the improvement in latency (e.g., a 10× improvement in latency comes with at least a 100× improvement in bandwidth)
28
Trends in Power & Energy in Integrated Circuits
Power also poses challenges:
1. Power must be brought in and distributed around the chip
2. Power is dissipated as heat and must be removed

• System architect’s perspective:
• the maximum power a processor ever requires
• sustained power consumption (determines the cooling requirement)
• energy efficiency (energy = power × time)

• Power required per transistor:
• dynamic power ∝ ½ × capacitive load × voltage² × frequency switched

➢ Power and energy were greatly reduced by lowering the voltage from 5 V to 1 V over 20 years.
➢ As technology improves, so do:
▪ the number of transistors
▪ load capacitance
▪ voltage
▪ the frequency at which they switch
▪ and hence the overall growth in power consumption and energy
▪ Most microprocessors now turn off the clock of inactive modules to save energy and dynamic power
How can the vast number of additional transistors be used to best effect?
– larger caches?
– increased instruction-level parallelism?
– more speculative instruction execution?
– replication of processor cores?

• However, power consumption more and more becomes the limiting factor:
– dynamic power P_dynamic per transistor can be estimated by
P_dynamic = ½ × capacitive load × voltage² × frequency
– capacitive load depends on feature size and number of outputs
– static power depends on leakage current:
P_static = current × voltage
(a short Python sketch of both formulas follows below)

31
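A minimal sketch of the two power formulas above; the capacitance, frequency, and leakage values are illustrative assumptions, not data for any real chip:

```python
# Dynamic and static power, per the estimates on the slide above.

def dynamic_power(cap_load_farads, voltage_v, frequency_hz):
    """P_dynamic = 1/2 * C * V^2 * f (per switched transistor)."""
    return 0.5 * cap_load_farads * voltage_v ** 2 * frequency_hz

def static_power(leakage_current_a, voltage_v):
    """P_static = I_leakage * V."""
    return leakage_current_a * voltage_v

# Assumed values: 1 fF load, 2 GHz switching, 10 nA leakage.
C, F, I = 1e-15, 2e9, 1e-8
# Lowering the supply from 5 V to 1 V cuts dynamic power 25x (the V^2 term):
print(dynamic_power(C, 5.0, F) / dynamic_power(C, 1.0, F))  # 25.0
print(static_power(I, 1.0))                                  # 1e-08 W
```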
How to improve energy efficiency?
• Do nothing well
• turn off the clock of inactive modules

• Dynamic Voltage-Frequency Scaling (DVFS)
• in low-activity periods there is no need to operate at the highest clock frequency and voltage (see the energy sketch below)

• Design for the typical case
• low-power modes to save energy

• Overclocking
• it is safe to run at a higher clock rate for a short time, possibly on just a few cores, until the temperature starts to rise
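A small sketch of why DVFS saves energy even though the work takes longer; both operating points (2 GHz at 1.0 V vs. 1 GHz at 0.8 V) and the capacitance are assumed values:

```python
def dynamic_energy(cap_f, voltage_v, frequency_hz, seconds):
    """E = P * t, with P_dynamic = 1/2 * C * V^2 * f."""
    return 0.5 * cap_f * voltage_v ** 2 * frequency_hz * seconds

C = 1e-9       # assumed total switched capacitance
cycles = 2e9   # a fixed amount of work

e_fast = dynamic_energy(C, 1.0, 2e9, cycles / 2e9)  # full speed: takes 1 s
e_slow = dynamic_energy(C, 0.8, 1e9, cycles / 1e9)  # scaled down: takes 2 s
print(e_slow / e_fast)  # 0.64 -> ~36% less energy for the same work
```

The V² term is what makes this pay off: halving the frequency alone would leave the energy per task unchanged, but the lower frequency permits a lower voltage, and that saving is quadratic.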
Why Such Change in 10 Years?
• Performance
– technology advances
• CMOS VLSI dominates older technologies (TTL, ECL) in cost AND performance
– computer architecture advances improve the low end
• RISC, superscalar, RAID (redundant arrays of inexpensive disks), ...

• Price: lower costs due to ...
– simpler development
• CMOS VLSI: smaller systems, fewer components
– higher volumes
• CMOS VLSI: the same development cost over 10,000 vs. 10,000,000 units
– lower margins by class of computer, due to fewer services

• Function
– the rise of networking / local interconnection technology
33
Trends in technology
• IC logic technology – 35%, then 40 to 55%, per year
• Semiconductor DRAM – 40% per year
• Magnetic disk technology – 30%, then 60%, then 100% per year, later falling back (to ~30-40%)
• Network technology – depends on the performance of switches and the transmission system

34
Performance Growth
• Exponential growth of CPU performance
– about 50% per year until 2002,
– only about 20% per year since 2002
⇒ no more growth in the future?

• CPU performance growth is partly due to semiconductor technology:
– semiconductor feature size is shrinking
– transistors get smaller
⇒ more transistors per chip
⇒ higher clock rates

• What about memory performance?

35
Contd…
• Fast DRAM memory chips (DDR2, DDR3, RDRAM) appeared over the last 20 years:
• effective frequency for data transfer
• 33 MHz in 1985 → 800 MHz in 2006

• bandwidth
• 132 MByte/s in 1985 → 6.4 GByte/s in 2006

• latency for accessing a single memory word
• 125 ns in 1985 → 45 ns in 2006

• DRAM chips are optimized for throughput, not for latency!

• The gap between CPU and memory speed is increasing by about 50% per year
• The high DRAM latency must be hidden by exploiting multilevel cache hierarchies.

36
Trends in Cost
• Manufacturing costs decrease over time – the learning curve
• Increasing volume affects cost in several ways:
– cost decreases about 10% for each doubling of volume
• Much of the low end of the computer business is a commodity business:
– overall product cost is lower because of the competition among the suppliers of the components and the volume efficiencies the suppliers can achieve

37
Cost of an Integrated Circuit
• Cost of IC =
(Cost of die + Cost of testing die + Cost of packaging and final test) / Final test yield
• Cost of die =
Cost of wafer / (Dies per wafer × Die yield)
• Dies per wafer =
[π × (Wafer diameter / 2)²] / Die area − [π × Wafer diameter] / (2 × Die area)^(1/2)

41
• Die yield = Wafer yield × (1 + [Defects per unit area × Die area] / α)^(−α), with α = 4
• In the newer formulation, Die yield = Wafer yield / (1 + Defects per unit area × Die area)^N, where N is the process-complexity factor, a measure of manufacturing difficulty; for 40 nm processes in 2010, N ranged from 11.5 to 15.5

42
Why are we interested?

• The manufacturing process determines the wafer cost, wafer yield, and defects per unit area; the die area is set by the design.
• The smaller the number of defects per unit area, the greater the number of good dies per wafer.
• The designer affects die size, and thereby cost, as well as die testing, packaging, and final-test costs.

43
Cost vs. Price

• Cost – what it takes to manufacture a product
• Price – what the product sells for
• The margin between cost and price is shrinking
• Margins cover R&D, marketing, sales, equipment maintenance, building rental, cost of financing, profits, and taxes

44
Cost of Operation
• For warehouse-scale computers, which contain tens of thousands of servers, the cost to operate the computers is significant in addition to the cost of purchase.

• The purchase price of servers and networks is just over 60% of the monthly cost to operate a warehouse-scale computer, assuming a short IT-equipment lifetime of 3 to 4 years.

• About 30% of the monthly operational costs go to power use and to distributing power and cooling the IT equipment.
Problem
1. Find the number of dies per 300 mm (30 cm) wafer for a die that is 1.5 cm on a side.
2. Find the die yield for dies that are 1.5 cm on a side and 1.0 cm on a side, assuming a defect density of 0.4 per cm² and α = 4. (A worked Python solution follows below.)

46
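A short worked solution, assuming a wafer yield of 1 and using the formulas from the preceding slides:

```python
import math

def dies_per_wafer(wafer_diameter_cm, die_area_cm2):
    """pi*(d/2)^2 / A  -  pi*d / sqrt(2*A); the second term discounts edge dies."""
    d, a = wafer_diameter_cm, die_area_cm2
    return math.pi * (d / 2) ** 2 / a - math.pi * d / math.sqrt(2 * a)

def die_yield(defects_per_cm2, die_area_cm2, alpha=4.0, wafer_yield=1.0):
    """Wafer yield * (1 + defect density * die area / alpha)^(-alpha)."""
    return wafer_yield * (1 + defects_per_cm2 * die_area_cm2 / alpha) ** -alpha

print(int(dies_per_wafer(30, 1.5 ** 2)))   # 1. ~270 dies per 30 cm wafer
print(round(die_yield(0.4, 1.5 ** 2), 2))  # 2. ~0.44 for the 1.5 cm die
print(round(die_yield(0.4, 1.0 ** 2), 2))  # 2. ~0.68 for the 1.0 cm die
```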
Dependability
• Defined with respect to an SLA (Service Level Agreement)
• Architects must design systems to cope with the challenges of dependability
• Providers guarantee that their networking or power service will be dependable

• Systems alternate between 2 states of service:
1. Service accomplishment
• service is delivered as specified
2. Service interruption
• delivered service differs from the SLA

49
• Transitions between these 2 states are caused by
– failures (from state 1 to state 2)
– restorations (from state 2 to state 1)

• Quantifying the transitions → measures of dependability:
– module reliability
– module availability

50
Module Reliability
• A measure of continuous service accomplishment from a reference point
– MTTF (mean time to failure)
• the reciprocal of MTTF is the rate of failures, expressed in FIT (failures per billion (10⁹) hours of operation)
• an MTTF of 1,000,000 hrs equals 1000 FIT
– MTTR (mean time to repair)
• measures the duration of a service interruption
– MTBF (mean time between failures)
• MTBF = MTTF + MTTR

51
Module Availability
• A measure of service accomplishment with respect to the alternation between the 2 states of accomplishment and interruption (see the sketch below):

Module availability = MTTF / (MTTF + MTTR)

52
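A minimal sketch of the two dependability measures; the MTTR of 24 hours is an assumed example value:

```python
def availability(mttf_hours, mttr_hours):
    """Module availability = MTTF / (MTTF + MTTR)."""
    return mttf_hours / (mttf_hours + mttr_hours)

def fit_rate(mttf_hours):
    """FIT = failures per billion (1e9) hours = 1e9 / MTTF."""
    return 1e9 / mttf_hours

print(fit_rate(1_000_000))          # 1000.0 FIT, as on the reliability slide
print(availability(1_000_000, 24))  # ~0.999976 with an assumed 24 h MTTR
```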
8. Measuring, Reporting, and Summarizing Performance
• Measure performance
– benchmarks

• Report performance
– benchmark reports

• Summarize performance results
– yield a ratio proportional to performance

53
Measurement of Performance
• The performance p_x of a program x can be determined by measuring its execution time t_x
• A program y is faster than a program x by a factor c, if

c = t_x / t_y = (1/p_x) / (1/p_y) = p_y / p_x

• Execution time can be measured in several ways (see the sketch below):
1) elapsed time, also called wall-clock time:
– total time for a complete task, including disk accesses and OS overhead
– typically measured in sec or msec
2) CPU time:
– time the processor is computing, not waiting for I/O
– typically measured in cycles

54
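A small Python illustration of the two measures; busy_work is a stand-in workload, and the exact times will vary by machine:

```python
import time

def busy_work(n=5_000_000):
    """Stand-in CPU-bound task."""
    return sum(i * i for i in range(n))

wall0, cpu0 = time.time(), time.process_time()
busy_work()
print("elapsed (wall-clock) time:", time.time() - wall0, "s")
print("CPU time:", time.process_time() - cpu0, "s")
# For a CPU-bound task the two are close; add I/O or sleep and the
# wall-clock time grows while the CPU time barely changes.
```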
Contd…
•Performance is typically given for benchmarks
• Several kinds of benchmarks:
•synthetic benchmarks: small programs that try to simulate the behavior of real
programs
examples: Dhrystone, Whetstone
•kernels: small key fragments of real programs
•toy problems: short complete programs
•benchmark suite: a collection of real programs, either from one or various application
areas
•example: SPEC benchmark
• Main problem of the first three benchmark kinds: the chip architect can optimize the hardware for the short benchmark codes

55
Contd…
• SPEC benchmark (Standard Performance Evaluation Corporation) consists of
•real application programs
•easily portable
•effect of I/O is minimized as far as possible
•written in C, C++ and Fortran
•well suited for measuring performance of desktop CPUs
•often updated to avoid manipulation: SPEC89, 92, 95, 2000 and SPEC2006
•SPEC2000 contains 12 integer and 14 float programs
•SPEC2000 integer benchmarks vary from part of a C compiler to a chess program to a
quantum computer simulation

56
Benchmarks (Contd…)
• Desktop
– processor intensive
– graphics intensive
• Server
– processor throughput-oriented
• processing rate measured
– file server
• tests performance of the I/O system and processor
– web server
• requests for static & dynamic pages
• clients post data to the server
– transaction processing
• handles transactions with databases

57
Report Performance
• Reproducibility
• list everything another experimenter would need to duplicate the results
• Actual performance time
• both baseline and optimized results
• tabular and graph formats
• Audit and cost information
• high performance
• cost-performance

58
Summarize Performance Results
• Summarize the performance results of the suite in a single number (see the sketch below):
• compare arithmetic means of the execution times of the programs in the suite
• add a weighting factor to each benchmark and take the weighted arithmetic mean
• normalize execution times to a reference computer (SPECRatio)
• evaluate the geometric mean of the normalized ratios

59
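A minimal sketch of the summarization step; the reference and measured times below are invented numbers for a hypothetical 3-program suite:

```python
import math

def geometric_mean(values):
    return math.prod(values) ** (1 / len(values))

ref_times      = [1000, 2000, 3000]  # seconds on the reference machine (assumed)
measured_times = [ 500, 1600, 1000]  # seconds on the machine under test (assumed)

# SPECRatio per program: reference time / measured time
spec_ratios = [r / m for r, m in zip(ref_times, measured_times)]
print(geometric_mean(spec_ratios))   # single summary number, ~1.96
```

Unlike the arithmetic mean, the geometric mean of normalized ratios is independent of which machine is used as the reference.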
9. Quantitative Principles of Computer Design
•Take advantage of parallelism
•Parallelism at system level
•Scalability - multiple processors & disks
•Parallelism among instructions
•Pipelining
•Parallelism at detailed design
• Set-associative caches use multiple banks of memory
• Modern ALUs use carry look-ahead
– a carry look-ahead adder improves speed by reducing the time required to determine carry bits

60
• Principle of Locality
• Programs tend to reuse data and instructions that they have used recently.
• A program spends about 90% of its execution time in only 10% of its code.

• Temporal locality
• recently accessed items are likely to be accessed in the near future
• Spatial locality
• items whose addresses are near one another tend to be referenced close together in time (demonstrated in the sketch below)
61
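A small demonstration of spatial locality: traversing a matrix row by row touches adjacent memory, while column-by-column traversal jumps around. The effect is far weaker in Python than in C because of its object indirection, but it is usually still measurable:

```python
import time

N = 2000
matrix = [[1] * N for _ in range(N)]

def row_major(m):   # good spatial locality: finishes each row before moving on
    return sum(m[i][j] for i in range(N) for j in range(N))

def col_major(m):   # poor spatial locality: touches one element per row, repeatedly
    return sum(m[i][j] for j in range(N) for i in range(N))

for f in (row_major, col_major):
    t0 = time.time()
    f(matrix)
    print(f.__name__, round(time.time() - t0, 3), "s")
```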
•Focus on the common case
•Determining how to spend resources
•Power
•Resource allocation
•Performance
• Decide what the frequent case is and
• how much performance you can improve by making it faster
•Amdahl’s law

62
Amdahl’s law
•Measures the performance gain obtained by improving some
portion of a computer.
•“Performance improvement to be gained from using some faster
mode of execution is limited by the fraction of the time the faster
mode can be used ”

63
Speedup due to enhancement E:

Speedup(E) = ExTime w/o E / ExTime w/ E = Performance w/ E / Performance w/o E

– Fraction_enhanced
• the fraction of computation time that can be converted to use the enhancement (e.g., 20 sec of a total 60 sec)
• ≤ 1

– Speedup_enhanced
• the improvement gained by the enhanced execution mode (e.g., 5 sec in the original mode vs. 2 sec enhanced gives 2.5)
• > 1

64
Amdahl’s Law

ExTime_new = ExTime_old × [(1 − Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced]

Speedup_overall = ExTime_old / ExTime_new = 1 / [(1 − Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced]

(a short Python sketch follows below)

65
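A minimal sketch of the formula; the 20-of-60-second fraction echoes the example two slides back, and the enhancement speedup of 5 is an assumed value:

```python
def speedup_overall(fraction_enhanced, speedup_enhanced):
    """Amdahl's law: 1 / ((1 - F) + F / S)."""
    return 1 / ((1 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)

print(speedup_overall(20 / 60, 5))     # ~1.36
# Even an infinitely fast enhancement of 1/3 of the time caps the speedup at 1.5:
print(speedup_overall(20 / 60, 1e12))  # -> 1.5
```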
Processor Performance Equation
• Ticks, clock ticks, clock periods, clocks, cycles, clock cycles – discrete time events

• CPU time = CPU clock cycles for a program × Clock cycle time
or
• CPU time = CPU clock cycles for a program / Clock rate

66
• Avg. CPI = CPU clock cycles for a program / Instruction count (IC)
• Clock cycles = CPI × IC
• CPU time = IC × CPI × Clock cycle time

CPU time = Seconds / Program = (Instructions / Program) × (Cycles / Instruction) × (Seconds / Cycle)

• Processor performance therefore depends on (see the sketch below):
• clock cycle time (rate)
• instruction count
• clock cycles per instruction
67
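A quick sketch of the equation with assumed numbers (10⁹ instructions, a CPI of 2, a 1 GHz clock):

```python
def cpu_time(instruction_count, cpi, clock_rate_hz):
    """CPU time = IC * CPI / clock rate (= IC * CPI * clock cycle time)."""
    return instruction_count * cpi / clock_rate_hz

print(cpu_time(1e9, 2.0, 1e9))  # 2.0 s
# A 10% better CPI gives 10% less CPU time, holding IC and clock rate fixed:
print(cpu_time(1e9, 1.8, 1e9))  # 1.8 s
```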
• A 10% improvement in one of them leads to a 10% improvement in CPU time… but…
• Clock cycle time – hardware technology and organization
• CPI – organization and ISA
• IC – ISA and compiler technology

68
Pitfalls
• Falling prey to Amdahl’s law
• too much effort for disappointing results
• A single point of failure
• one component failure brings down the whole system
• Fault detection can lower availability
• faults are detected, but there is no mechanism to correct them

70
Fallacies
• The cost of the processor dominates the cost of the system
• only about 20% for servers and 10% for PCs
• Benchmarks remain valid indefinitely
• updates appear from time to time
• The rated MTTF is 1,200,000 hrs (~140 yrs), so disks practically never fail
• Peak performance tracks observed performance
• the ratio varies from 5% to 58%

71
Assignment - 1

Exercises from the 5th edition, 1st chapter:
1.9, 1.10, 1.11, 1.15, 1.17
Law of Diminishing Returns
• If one picks optimally (by achieved speed-up) what to improve, one sees monotonically decreasing improvements with each successive enhancement.
• If, however, one picks non-optimally, then after improving a sub-optimal component and moving on to a more optimal one, the return can increase.

73
