CH6 - Computer Abstractions and Technology
Computer Organization
502044
Acknowledgement
These slides are intended for use in class and are not a complete document.
Students should refer to the book for more lessons and exercises.
Students may download and store the lecture slides for reference purposes;
do not redistribute them or use them for purposes outside of the course.
📧 trantrungtin.tdtu.edu.vn
Syllabus
● 6.1 Introduction
● 6.2 Great Ideas in Computer Architecture
● 6.3 Below Your Program
● 6.4 Under the Covers
● 6.5 Technologies for Building Processors and Memory
● 6.6 Performance
● 6.7 The Power Wall
● 6.8 The Switch from Uniprocessors to Multiprocessors
LEVELS OF REPRESENTATION
6.1 Introduction
● Our computers are digital systems, implemented as personal computers,
servers, and embedded computers.
● Binary numbers and codes suit digital circuits, which work with
LOW / HIGH signals.
● Decimal, binary, octal, and hexadecimal are the radixes used in computer
science.
● Numbers are used for calculation; codes are used for transferring information.
● Registers and memory are physical devices that store binary information.
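The radixes listed above can be illustrated with a short sketch. The value 202 is an arbitrary choice made for this example, not something taken from the course material; the code simply writes one integer in binary, octal, and hexadecimal and then parses each string back.

```python
# One value written in the radixes mentioned above.
# The number 202 is an arbitrary illustration.
value = 202

binary = format(value, "b")   # base 2
octal = format(value, "o")    # base 8
hexa = format(value, "X")     # base 16, upper-case digits

print(value, binary, octal, hexa)

# int() with an explicit base parses each string back to the same integer,
# showing that the radix changes the notation, not the number.
round_trip = [int(binary, 2), int(octal, 8), int(hexa, 16)]
print(round_trip)
```

Each group of three binary digits corresponds to one octal digit and each group of four to one hexadecimal digit, which is why the last two radixes are convenient shorthand for bit patterns.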
The Computer Revolution
● Progress in computer technology
○ Underpinned by Moore’s Law
● Makes novel applications feasible
○ Computers in automobiles
○ Cell phones
○ Human genome project
○ World Wide Web
○ Search Engines
● Computers are pervasive
Classes of Computers
● Personal computers
○ General purpose, variety of software
○ Subject to cost/performance tradeoff
● Server computers
○ Network based
○ High capacity, performance, reliability
○ Range from small servers to building sized
Classes of Computers
● Supercomputers
○ High-end scientific and engineering calculations
○ Highest capability but represent a small fraction of the overall computer market
● Embedded computers
○ Hidden as components of systems
○ Stringent power/performance/cost constraints
The PostPC Era
● Personal Mobile Device (PMD)
○ Battery operated
○ Connects to the Internet
○ Hundreds of dollars
○ Smart phones, tablets, electronic glasses
● Cloud computing
○ Warehouse Scale Computers (WSC)
○ Software as a Service (SaaS)
○ Portion of software run on a PMD and a portion run in the Cloud
○ Amazon and Google
What You Will Learn
● How programs are translated into the machine language
○ And how the hardware executes them
● The hardware/software interface
● What determines program performance
○ And how it can be improved
● How hardware designers improve performance
● What is parallel processing
Understanding Performance
● Algorithm
○ Determines number of operations executed
● Programming language, compiler, architecture
○ Determine number of machine instructions executed per operation
● Processor and memory system
○ Determine how fast instructions are executed
● I/O system (including OS)
○ Determines how fast I/O operations are executed
6.2 Eight Great Ideas in Computer Architecture
Some Great Ideas
● Design for Moore’s Law
● Use abstraction to simplify design
● Make the common case fast
● Performance via parallelism
● Performance via pipelining
● Performance via prediction
● Hierarchy of memories
● Dependability via redundancy
6.3 Below Your Program
Below Your Program
● Application software
○ Written in high-level language
● System software
○ Compiler: translates HLL code to machine code
○ Operating System: service code
■ Handling input/output
■ Managing memory and storage
■ Scheduling tasks & sharing resources
● Hardware
○ Processor, memory, I/O controllers
Levels of Program Code
● High-level language
○ Level of abstraction closer to problem domain
○ Provides for productivity and portability
● Assembly language
○ Textual representation of instructions
● Hardware representation
○ Binary digits (bits)
○ Encoded instructions and data
6.4 Under the Covers
Components of a Computer
The BIG Picture
● Same components for all kinds of computer
○ Desktop, server, embedded
● Input/output includes
○ User-interface devices
■ Display, keyboard, mouse
○ Storage devices
■ Hard disk, CD/DVD, flash
○ Network adapters
■ For communicating with other computers
Touchscreen
● PostPC device
● Supersedes keyboard and mouse
● Resistive and Capacitive types
○ Most tablets, smart phones use capacitive
○ Capacitive allows multiple touches simultaneously
Through the Looking Glass
● LCD screen: picture elements (pixels)
○ Mirrors content of frame buffer memory
Opening the Box
Capacitive multitouch LCD screen
Computer board
Inside the Processor (CPU)
● Datapath: performs operations on data
● Control: sequences datapath, memory, ...
● Cache memory
○ Small fast SRAM memory for immediate access to data
Inside the Processor
● Apple A5
Abstractions The BIG Picture
● Abstraction helps us deal with complexity
○ Hide lower-level detail
● Instruction set architecture (ISA)
○ The hardware/software interface
● Application binary interface
○ The ISA plus system software interface
● Implementation
○ The details underlying an interface
A Safe Place for Data
● Volatile main memory
○ Loses instructions and data when power off
● Non-volatile secondary memory
○ Magnetic disk
○ Flash memory
○ Optical disk (CDROM, DVD)
Networks
● Communication, resource sharing, nonlocal access
● Local area network (LAN): Ethernet
● Wide area network (WAN): the Internet
● Wireless network: WiFi, Bluetooth
6.5 Technologies for Building Processors and Memory
Technology Trends
● Electronics technology continues to evolve
○ Increased capacity and performance
○ Reduced cost
DRAM capacity
Semiconductor Technology
● Silicon: semiconductor
● Add materials to transform properties:
○ Conductors
○ Insulators
○ Switch
Manufacturing ICs
● Yield: proportion of working dies per wafer
Intel Core i7 Wafer
● 300mm wafer, 280 chips, 32nm technology
● Each chip is 20.7 x 10.5 mm
Integrated Circuit Cost
● Nonlinear relation to area and defect rate
○ Wafer cost and area are fixed
○ Defect rate determined by manufacturing process
○ Die area determined by architecture and circuit design
$$\text{Cost per die} = \frac{\text{Cost per wafer}}{\text{Dies per wafer} \times \text{Yield}}$$
$$\text{Dies per wafer} \approx \frac{\text{Wafer area}}{\text{Die area}}$$
$$\text{Yield} = \frac{1}{\left(1 + \text{Defects per area} \times \text{Die area}/2\right)^{2}}$$
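The cost relations on this slide can be sketched in a few lines of Python. The wafer diameter and die dimensions below come from the Intel Core i7 wafer slide; the wafer cost and defect density are hypothetical values chosen only for illustration, and the function names are assumptions of this sketch.

```python
import math

def dies_per_wafer(wafer_diameter_mm, die_area_mm2):
    """Approximate dies per wafer as wafer area / die area (ignores edge loss)."""
    wafer_area_mm2 = math.pi * (wafer_diameter_mm / 2) ** 2
    return wafer_area_mm2 / die_area_mm2

def die_yield(defects_per_mm2, die_area_mm2):
    """Yield = 1 / (1 + defects per area * die area / 2)^2."""
    return 1.0 / (1.0 + defects_per_mm2 * die_area_mm2 / 2.0) ** 2

def cost_per_die(wafer_cost, wafer_diameter_mm, die_area_mm2, defects_per_mm2):
    """Cost per die = cost per wafer / (dies per wafer * yield)."""
    dies = dies_per_wafer(wafer_diameter_mm, die_area_mm2)
    return wafer_cost / (dies * die_yield(defects_per_mm2, die_area_mm2))

die_area = 20.7 * 10.5            # mm^2, die size from the wafer slide
wafer_cost = 5000.0               # hypothetical cost per wafer
defect_density = 0.001            # hypothetical defects per mm^2

print(f"dies per wafer (approx.): {dies_per_wafer(300, die_area):.0f}")
print(f"yield: {die_yield(defect_density, die_area):.3f}")
print(f"cost per die: {cost_per_die(wafer_cost, 300, die_area, defect_density):.2f}")
```

The simple area ratio gives somewhat more dies than the 280 chips quoted for the real wafer, presumably because it ignores the partial dies lost at the round wafer's edge. Doubling the die area more than doubles the cost per die, which is the nonlinear relation the slide refers to.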
6.6 Performance
Defining Performance
● Which airplane has the best performance?
[Figure: bar charts comparing airliners, including the BAC/Sud Concorde and the Douglas DC-8-50, on several measures]
Response Time and Throughput
● Response time
○ How long it takes to do a task
● Throughput
○ Total work done per unit time
■ e.g., tasks/transactions/… per hour
● How are response time and throughput affected by
○ Replacing the processor with a faster version?
○ Adding more processors?
● We’ll focus on response time for now…
Relative Performance
● Define Performance = 1/Execution Time
● “X is n times faster than Y”
$$\frac{\text{Performance}_X}{\text{Performance}_Y} = \frac{\text{Execution time}_Y}{\text{Execution time}_X} = n$$
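As a small illustration of the definition, the sketch below computes n for two machines. The execution times (10 s and 15 s) are hypothetical numbers picked for this example, and the function names are assumptions of the sketch.

```python
def performance(execution_time_s):
    """Performance is defined as the reciprocal of execution time."""
    return 1.0 / execution_time_s

def times_faster(time_x_s, time_y_s):
    """Return n such that X is n times faster than Y."""
    return performance(time_x_s) / performance(time_y_s)

# Hypothetical measurements: X needs 10 s, Y needs 15 s for the same program.
n = times_faster(10.0, 15.0)
print(f"X is {n:.1f} times faster than Y")
```

Because performance is the reciprocal of time, the same n is obtained directly as the ratio of Y's execution time to X's.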
CPU Clocking
● Operation of digital hardware governed by a constant-rate clock
[Figure: clock timing diagram marking the clock period, clock cycles, data transfer and computation, and state update]
CPU Time Example
● Computer A: 2GHz clock, 10s CPU time
● Designing Computer B
○ Aim for 6s CPU time
○ Can do faster clock, but causes 1.2 × clock cycles
● How fast must Computer B clock be?
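$$\text{Clock Rate}_B = \frac{\text{Clock Cycles}_B}{\text{CPU Time}_B} = \frac{1.2 \times \text{Clock Cycles}_A}{6\,\text{s}}$$
$$\text{Clock Cycles}_A = \text{CPU Time}_A \times \text{Clock Rate}_A = 10\,\text{s} \times 2\,\text{GHz} = 20 \times 10^{9}$$
$$\text{Clock Rate}_B = \frac{1.2 \times 20 \times 10^{9}}{6\,\text{s}} = \frac{24 \times 10^{9}}{6\,\text{s}} = 4\,\text{GHz}$$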
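The same arithmetic can be checked with a short script. It uses only the numbers given in the example; the variable names are assumptions of this sketch.

```python
# Computer A, as given in the example.
clock_rate_a_hz = 2e9      # 2 GHz
cpu_time_a_s = 10.0

# Clock cycles = CPU time * clock rate.
clock_cycles_a = cpu_time_a_s * clock_rate_a_hz

# Computer B needs 1.2 times as many cycles and must finish in 6 s.
clock_cycles_b = 1.2 * clock_cycles_a
cpu_time_b_s = 6.0

# Clock rate = clock cycles / CPU time.
clock_rate_b_hz = clock_cycles_b / cpu_time_b_s

print(f"Clock cycles A: {clock_cycles_a:.3g}")
print(f"Required clock rate B: {clock_rate_b_hz / 1e9:.1f} GHz")
```

Computer B must double the clock rate to cut CPU time from 10 s to 6 s, because part of the gain is spent on the extra 20% of clock cycles.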
Instruction Count and CPI
● Instruction Count for a program
○ Determined by program, ISA and compiler
● Average cycles per instruction
○ Determined by CPU hardware
○ If different instructions have different CPI
■ Average CPI affected by instruction mix
CPI Example
● Computer A: Cycle Time = 250ps, CPI = 2.0
● Computer B: Cycle Time = 500ps, CPI = 1.2
● Same ISA
● Which is faster, and by how much?
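$$\text{CPU Time}_A = \text{Instruction Count} \times \text{CPI}_A \times \text{Cycle Time}_A = I \times 2.0 \times 250\,\text{ps} = I \times 500\,\text{ps}$$
$$\text{CPU Time}_B = \text{Instruction Count} \times \text{CPI}_B \times \text{Cycle Time}_B = I \times 1.2 \times 500\,\text{ps} = I \times 600\,\text{ps}$$
○ A is faster…
$$\frac{\text{CPU Time}_B}{\text{CPU Time}_A} = \frac{I \times 600\,\text{ps}}{I \times 500\,\text{ps}} = 1.2$$
○ …by this much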
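A minimal sketch of the comparison follows. The cycle times and CPI values are those given in the example; the instruction count is a hypothetical placeholder, since both machines run the same program on the same ISA and it cancels out of the ratio.

```python
def cpu_time_ps(instruction_count, cpi, cycle_time_ps):
    """CPU time = instruction count * CPI * clock cycle time."""
    return instruction_count * cpi * cycle_time_ps

# Hypothetical instruction count; identical for A and B (same ISA, same program).
instruction_count = 1_000_000

time_a = cpu_time_ps(instruction_count, cpi=2.0, cycle_time_ps=250)
time_b = cpu_time_ps(instruction_count, cpi=1.2, cycle_time_ps=500)

ratio = time_b / time_a
faster = "A" if time_a < time_b else "B"
print(f"Computer {faster} is faster by a factor of {ratio:.1f}")
```

Changing `instruction_count` leaves the ratio unchanged, which is why the worked example can carry it as the symbol I.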
CPI in More Detail
● If different instruction classes take different numbers of cycles
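$$\text{Clock Cycles} = \sum_{i=1}^{n} \left(\text{CPI}_i \times \text{Instruction Count}_i\right)$$
● Weighted average CPI
$$\text{CPI} = \frac{\text{Clock Cycles}}{\text{Instruction Count}} = \sum_{i=1}^{n} \left(\text{CPI}_i \times \frac{\text{Instruction Count}_i}{\text{Instruction Count}}\right)$$
○ Instruction Count_i / Instruction Count is the relative frequency of class i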
CPI Example
● Alternative compiled code sequences using instructions in classes A, B, C
Class              A   B   C
CPI for class      1   2   3
IC in sequence 1   2   1   2
IC in sequence 2   4   1   1
● Sequence 1: IC = 5
○ Clock Cycles = 2×1 + 1×2 + 2×3 = 10
○ Avg. CPI = 10/5 = 2.0
● Sequence 2: IC = 6
○ Clock Cycles = 4×1 + 1×2 + 1×3 = 9
○ Avg. CPI = 9/6 = 1.5
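The two sequences can be recomputed with a short script. The per-class CPI values and instruction counts are the ones in the table; the dictionary layout and function names are assumptions of this sketch.

```python
# CPI for each instruction class, from the table.
CPI_BY_CLASS = {"A": 1, "B": 2, "C": 3}

def clock_cycles(instruction_counts):
    """Sum of CPI_i * instruction count_i over all classes."""
    return sum(CPI_BY_CLASS[cls] * count for cls, count in instruction_counts.items())

def average_cpi(instruction_counts):
    """Total clock cycles divided by total instruction count."""
    return clock_cycles(instruction_counts) / sum(instruction_counts.values())

sequence_1 = {"A": 2, "B": 1, "C": 2}
sequence_2 = {"A": 4, "B": 1, "C": 1}

for name, seq in (("Sequence 1", sequence_1), ("Sequence 2", sequence_2)):
    print(name, "IC =", sum(seq.values()),
          "cycles =", clock_cycles(seq),
          "avg CPI =", average_cpi(seq))
```

Sequence 2 executes more instructions yet needs fewer clock cycles, so instruction count alone does not decide which sequence is faster.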
Performance Summary
● Performance depends on
○ Algorithm: affects IC, possibly CPI
○ Programming language: affects IC, CPI
○ Compiler: affects IC, CPI
○ Instruction set architecture: affects IC, CPI, and clock cycle time (Tc)
6.7 The Power Wall
Power Trends
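● In CMOS IC technology
$$\text{Power} = \text{Capacitive load} \times \text{Voltage}^{2} \times \text{Frequency}$$
○ Power: ×30
○ Voltage: 5V → 1V
○ Frequency: ×1000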
Reducing Power
● Suppose a new CPU has
○ 85% of capacitive load of old CPU
○ 15% voltage and 15% frequency reduction
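The effect of these reductions can be estimated with the standard CMOS dynamic power relation, Power = Capacitive load × Voltage² × Frequency. The sketch below normalizes the old CPU's values to 1.0, which is an assumption made so that only the ratio matters.

```python
def dynamic_power(capacitive_load, voltage, frequency):
    """Dynamic power in CMOS: capacitive load * voltage^2 * frequency."""
    return capacitive_load * voltage ** 2 * frequency

# Old CPU, normalized to 1.0 for every quantity (assumption of this sketch).
old_power = dynamic_power(1.0, 1.0, 1.0)

# New CPU: 85% of the capacitive load, and voltage and frequency each cut by 15%.
new_power = dynamic_power(0.85, 0.85, 0.85)

ratio = new_power / old_power   # equals 0.85 ** 4
print(f"New CPU uses {ratio:.2f} of the old CPU's power")
```

Voltage enters squared, so the 15% voltage cut contributes two of the four factors of 0.85 and the new CPU draws roughly half the power of the old one.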
6.8 The Switch from Uniprocessors to Multiprocessors
Uniprocessor Performance
Multiprocessors
● Multicore microprocessors
○ More than one processor per chip
● Requires explicitly parallel programming
○ Compare with instruction level parallelism
■ Hardware executes multiple instructions at once
■ Hidden from the programmer
○ Hard to do
■ Programming for performance
■ Load balancing
■ Optimizing communication and synchronization
SPEC CPU Benchmark
● Programs used to measure performance
○ Supposedly typical of actual workload
● Standard Performance Evaluation Corp (SPEC)
○ Develops benchmarks for CPU, I/O, Web, …
● SPEC CPU2006
○ Elapsed time to execute a selection of programs
■ Negligible I/O, so focuses on CPU performance
○ Normalize relative to reference machine
○ Summarize as geometric mean of performance ratios
■ CINT2006 (integer) and CFP2006 (floating-point)
$$\sqrt[n]{\prod_{i=1}^{n} \text{Execution time ratio}_i}$$
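The geometric mean of the performance ratios can be sketched as follows. The ratios in the example are made-up values for illustration, not SPEC results, and the function name is an assumption of this sketch.

```python
import math

def geometric_mean(ratios):
    """n-th root of the product of n ratios, computed via logarithms."""
    if not ratios:
        raise ValueError("at least one ratio is required")
    # exp(mean(log r)) equals the n-th root of the product and avoids
    # overflow when many ratios are multiplied together.
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))

# Hypothetical execution time ratios (reference time / measured time).
ratios = [2.0, 8.0]
print(f"geometric mean = {geometric_mean(ratios):.2f}")
```

One property of the geometric mean is that comparing two machines gives the same relative result whichever reference machine is used for normalization, which an arithmetic mean of ratios does not guarantee.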
CINT2006 for Intel Core i7 920
SPEC Power Benchmark
● Power consumption of server at different workload levels
○ Performance: ssj_ops/sec
○ Power: Watts (Joules/sec)
$$\text{Overall ssj\_ops per Watt} = \left(\sum_{i=0}^{10} \text{ssj\_ops}_i\right) \Big/ \left(\sum_{i=0}^{10} \text{power}_i\right)$$
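A minimal sketch of the summary metric is shown below. The eleven load levels mirror the index range 0 to 10 in the formula; the ssj_ops and power figures are invented for illustration and are not measurements of any real server.

```python
def overall_ssj_ops_per_watt(ssj_ops, power_watts):
    """Sum of ssj_ops over all load levels divided by sum of power."""
    if len(ssj_ops) != len(power_watts):
        raise ValueError("one power reading is needed per load level")
    return sum(ssj_ops) / sum(power_watts)

# Hypothetical readings at load levels 100%, 90%, ..., 0% (11 levels).
levels = list(range(10, -1, -1))
ssj_ops = [1000 * level for level in levels]        # throughput falls with load
power_watts = [100 + 10 * level for level in levels]  # idle power stays at 100 W

result = overall_ssj_ops_per_watt(ssj_ops, power_watts)
print(f"overall ssj_ops per Watt = {result:.1f}")
```

In this made-up data the server still draws 100 W while doing no work, so the low-load levels pull the overall figure well below the ratio achieved at full load.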
SPECpower_ssj2008 for Xeon X5650
Pitfall: Amdahl’s Law
● Improving an aspect of a computer and expecting a proportional
improvement in overall performance
$$T_{\text{improved}} = \frac{T_{\text{affected}}}{\text{improvement factor}} + T_{\text{unaffected}}$$
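The formula can be explored with a short sketch. The 80 s / 20 s split below is an illustrative assumption (a program in which one operation accounts for 80 s of a 100 s run), and the function name is hypothetical.

```python
def improved_time(t_affected, t_unaffected, improvement_factor):
    """Amdahl's Law: only the affected part of the time is divided by the factor."""
    return t_affected / improvement_factor + t_unaffected

# Illustrative split: 80 s can be improved, 20 s cannot.
t_affected, t_unaffected = 80.0, 20.0
total = t_affected + t_unaffected

for factor in (2, 4, 16, 1_000_000):
    t_new = improved_time(t_affected, t_unaffected, factor)
    print(f"factor {factor:>9}: time {t_new:8.3f} s, overall speedup {total / t_new:.2f}")
```

However large the improvement factor, the time never drops below the 20 s unaffected part, so the overall speedup in this example stays under 5. This is the pitfall the slide warns about.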
Pitfall: MIPS as a Performance Metric
● MIPS: Millions of Instructions Per Second
○ Doesn’t account for
■ Differences in ISAs between computers
■ Differences in complexity between instructions
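$$\text{MIPS} = \frac{\text{Instruction count}}{\text{Execution time} \times 10^{6}} = \frac{\text{Instruction count}}{\dfrac{\text{Instruction count} \times \text{CPI}}{\text{Clock rate}} \times 10^{6}} = \frac{\text{Clock rate}}{\text{CPI} \times 10^{6}}$$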
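The two forms of the MIPS formula can be compared numerically. The clock rate, CPI, and instruction count below are hypothetical values chosen so the arithmetic is easy to follow; the function names are assumptions of this sketch.

```python
def mips_from_time(instruction_count, execution_time_s):
    """MIPS = instruction count / (execution time * 10^6)."""
    return instruction_count / (execution_time_s * 1e6)

def mips_from_cpi(clock_rate_hz, cpi):
    """MIPS = clock rate / (CPI * 10^6)."""
    return clock_rate_hz / (cpi * 1e6)

# Hypothetical machine and program.
clock_rate_hz = 2e9          # 2 GHz
cpi = 2.0
instruction_count = 1e9

# Execution time = instruction count * CPI / clock rate.
execution_time_s = instruction_count * cpi / clock_rate_hz

print(mips_from_time(instruction_count, execution_time_s))
print(mips_from_cpi(clock_rate_hz, cpi))
```

Both forms give the same rating, and the second shows that MIPS depends on CPI, which varies from program to program on the same machine. A single MIPS figure therefore cannot characterize a computer.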