Computer Abstractions and Technology

intro to Computer Architecture

Chapter 1

Computer Abstractions and Technology

1.1 Introduction

The Computer Revolution

Progress in computer technology
  Underpinned by Moore's Law
Makes novel applications feasible
  Computers in automobiles
  Cell phones
  Human genome project
  World Wide Web
  Search Engines
Computers are pervasive

Chapter 1 Computer Abstractions and Technology 2

Classes of Computers

Desktop computers
  General purpose, variety of software
  Subject to cost/performance tradeoff
Server computers
  Network based
  High capacity, performance, reliability
  Range from small servers to building sized
Embedded computers
  Hidden as components of systems
  Stringent power/performance/cost constraints

The Processor Market


What You Will Learn

How programs are translated into the machine language
  And how the hardware executes them
The hardware/software interface
What determines program performance
  And how it can be improved
How hardware designers improve performance
What is parallel processing

Understanding Performance

Algorithm
  Determines number of operations executed
Programming language, compiler, architecture
  Determine number of machine instructions executed per operation
Processor and memory system
  Determine how fast instructions are executed
I/O system (including OS)
  Determines how fast I/O operations are executed

1.2 Below Your Program

Below Your Program

Application software
  Written in high-level language
System software
  Compiler: translates HLL code to machine code
  Operating System: service code
    Handling input/output
    Managing memory and storage
    Scheduling tasks & sharing resources
Hardware
  Processor, memory, I/O controllers

What is Computer Architecture?

[Figure: layers of a computer system, top to bottom]
  Application (Netscape)
  Software: Compiler, Assembler, Operating System (Unix; Windows 9x)
  Instruction Set Architecture
  Hardware: Processor, Memory, I/O system
  Datapath & Control
  Digital Design
  Circuit Design
  transistors, IC layout
  CS 161

Key Idea: levels of abstraction
  hide unnecessary implementation details
  help us cope with the enormous complexity of real systems

The Instruction Set: A Critical Interface

[Figure: the instruction set as the layer between software (above) and hardware (below)]
  software
  instruction set
  hardware

Levels of Program Code

High-level language
  Level of abstraction closer to problem domain
  Provides for productivity and portability
Assembly language
  Textual representation of instructions
Hardware representation
  Binary digits (bits)
  Encoded instructions and data

1.3 Under the Covers

Components of a Computer

The BIG Picture

Same components for all kinds of computer
  Desktop, server, embedded
Input/output includes
  User-interface devices
    Display, keyboard, mouse
  Storage devices
    Hard disk, CD/DVD, flash
  Network adapters
    For communicating with other computers

Anatomy of a Computer
[Figure labels: output device, network cable, input device, input device]

Inside the Processor

AMD Barcelona: 4 processor cores


A Safe Place for Data

Volatile main memory
  Loses instructions and data when power off
Non-volatile secondary memory
  Magnetic disk
  Flash memory
  Optical disk (CDROM, DVD)

Networks

Communication and resource sharing
Local area network (LAN): Ethernet
  Within a building
Wide area network (WAN): the Internet
Wireless network: WiFi, Bluetooth

The von Neumann Computer

Stored-Program Concept: storing programs as numbers, proposed by John von Neumann. Eckert and Mauchly worked on engineering the concept.

Idea: A program is written as a sequence of instructions, represented by binary numbers. The instructions are stored in memory just as data are. They are read one by one, decoded, and then executed by the CPU.

Historical Perspective

1945: ENIAC, the first general-purpose electronic computer, completed at the University of Pennsylvania (about 18,000 vacuum tubes)
Decade of 70s (Microprocessors)
  Programmable Controllers, Single Chip Microprocessors, Personal Computers
Decade of 80s (RISC Architecture)
  Instruction Pipelining, Fast Cache Memories, Compiler Optimizations
Decade of 90s (Instruction Level Parallelism)
  Superscalar Processors, Instruction Level Parallelism (ILP), Aggressive Code Scheduling, Out of Order Execution
Decade of 2000s (Multi-core processors)
  Thread Level Parallelism (TLP), Low Cost Supercomputing

Performance Growth In Perspective

If other technologies had improved at the same rate as computer performance:

Doubling every 18 months since 1982
  Cars travel at 11,000 mph; get 4,000 miles/gal
  Air travel LA-NY in 90 seconds (Mach 200)
  Wheat yield 20,000 bushels per acre
Doubling every 24 months since 1970
  Cars travel at 200,000 mph; get 50,000 miles/gal
  Air travel LA-NY in 6 seconds (Mach 3,000)
  Wheat yield 300,000 bushels per acre

Technology => Dramatic Change

Processor
  2X in performance every 1.5 years; 1000X performance in last decade (Moore's Law)
Main Memory
  DRAM capacity: 2X / 2 years; 1000X size in last decade
  Cost/bit: improves about 25% per year
Disk
  Capacity: > 2X in size every 1.5 years
  Cost/bit: improves about 60% per year

Technology Trends

Electronics technology continues to evolve
  Increased capacity and performance
  Reduced cost

[Figure: DRAM capacity]

Year   Technology                   Relative performance/cost
1951   Vacuum tube                  1
1965   Transistor                   35
1975   Integrated circuit (IC)      900
1995   Very large scale IC (VLSI)   2,400,000
2005   Ultra large scale IC         6,200,000,000

1.7 Real Stuff: The AMD Opteron X4

Manufacturing ICs

Yield: proportion of working dies per wafer



AMD Opteron X2 Wafer

X2: 300mm wafer, 117 chips, 90nm technology


X4: 45nm technology

Integrated Circuit Cost


$\text{Cost per die} = \dfrac{\text{Cost per wafer}}{\text{Dies per wafer} \times \text{Yield}}$

$\text{Dies per wafer} \approx \dfrac{\text{Wafer area}}{\text{Die area}}$

$\text{Yield} = \dfrac{1}{(1 + (\text{Defects per area} \times \text{Die area}/2))^2}$

Nonlinear relation to area and defect rate
  Wafer cost and area are fixed
  Defect rate determined by manufacturing process
  Die area determined by architecture and circuit design
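The three cost formulas on this slide can be tried out with a short Python sketch. The function names and all input numbers below (wafer cost, wafer area, die area, defect density) are hypothetical values chosen for illustration, not figures from the slides.

```python
def dies_per_wafer(wafer_area, die_area):
    """Approximate die count: wafer area divided by die area."""
    return wafer_area / die_area

def die_yield(defects_per_area, die_area):
    """Yield = 1 / (1 + defects_per_area * die_area / 2)^2."""
    return 1.0 / (1.0 + defects_per_area * die_area / 2.0) ** 2

def cost_per_die(cost_per_wafer, wafer_area, die_area, defects_per_area):
    """Cost per die = cost per wafer / (dies per wafer * yield)."""
    good_dies = dies_per_wafer(wafer_area, die_area) * die_yield(defects_per_area, die_area)
    return cost_per_wafer / good_dies

# Hypothetical inputs: areas in mm^2, defects per mm^2, cost in arbitrary units.
print(round(cost_per_die(5000.0, 70000.0, 100.0, 0.002), 2))  # 8.64
print(round(cost_per_die(5000.0, 70000.0, 200.0, 0.002), 2))
```

Doubling the die area in this sketch more than doubles the cost per die, which is the nonlinear relation the slide points out: fewer dies fit on the wafer and a smaller fraction of them work.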

1.4 Performance

Defining Performance

Which airplane has the best performance?

Response Time and Throughput

Response time
  How long it takes to do a task
Throughput
  Total work done per unit time
    e.g., tasks/transactions/... per hour
How are response time and throughput affected by
  Replacing the processor with a faster version?
  Adding more processors?
We'll focus on response time for now

Relative Performance

Define Performance = 1/Execution Time

"X is n times faster than Y"

$\dfrac{\text{Performance}_X}{\text{Performance}_Y} = \dfrac{\text{Execution time}_Y}{\text{Execution time}_X} = n$

Example: time taken to run a program
  10s on A, 15s on B
  $\text{Execution Time}_B / \text{Execution Time}_A = 15\text{s} / 10\text{s} = 1.5$
  So A is 1.5 times faster than B
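The relative performance definition translates directly into a small Python helper. The function name times_faster is an assumption for this sketch; the numbers are the ones from the example on this slide.

```python
def times_faster(time_x, time_y):
    """Return n such that X is n times faster than Y.

    Since performance = 1 / execution time, the performance ratio
    Performance_X / Performance_Y equals time_y / time_x.
    """
    return time_y / time_x

# A takes 10 s, B takes 15 s.
print(times_faster(10.0, 15.0))  # 1.5
```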

Measuring Execution Time

Elapsed time
  Total response time, including all aspects
    Processing, I/O, OS overhead, idle time
  Determines system performance
CPU time
  Time spent processing a given job
    Discounts I/O time, other jobs' shares
  Comprises user CPU time and system CPU time
  Different programs are affected differently by CPU and system performance
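The difference between elapsed time and CPU time can be observed with Python's standard time module. In this sketch, time.perf_counter measures elapsed (wall-clock) time and time.process_time measures user plus system CPU time of the current process. The helper name measure and the two sample jobs are assumptions for illustration.

```python
import time

def measure(job):
    """Run job() and return (elapsed_seconds, cpu_seconds)."""
    wall_start = time.perf_counter()   # elapsed (wall-clock) time
    cpu_start = time.process_time()    # user + system CPU time of this process
    job()
    cpu = time.process_time() - cpu_start
    elapsed = time.perf_counter() - wall_start
    return elapsed, cpu

def mostly_idle():
    time.sleep(0.2)                      # waiting counts as elapsed time, not CPU time

def mostly_compute():
    sum(i * i for i in range(200_000))   # processing counts toward both

for job in (mostly_idle, mostly_compute):
    elapsed, cpu = measure(job)
    print(f"{job.__name__}: elapsed={elapsed:.3f}s cpu={cpu:.3f}s")
```

The exact numbers depend on the machine, but the idle job should show a CPU time far below its elapsed time, while the compute job should show the two close together.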

CPU Clocking

Operation of digital hardware governed by a constant-rate clock

[Figure: clock timing diagram showing the clock period, clock cycles, data transfer and computation, and state update]

Clock period: duration of a clock cycle
  e.g., 250ps = 0.25ns = $250 \times 10^{-12}$ s
Clock frequency (rate): cycles per second
  e.g., 4.0GHz = 4000MHz = $4.0 \times 10^{9}$ Hz
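Clock period and clock rate are reciprocals of each other, so the two examples on this slide describe the same clock. A minimal Python sketch (the function names are assumptions for illustration):

```python
def clock_rate_hz(period_seconds):
    """Clock rate in Hz for a given clock period in seconds."""
    return 1.0 / period_seconds

def clock_period_seconds(rate_hz):
    """Clock period in seconds for a given clock rate in Hz."""
    return 1.0 / rate_hz

print(clock_rate_hz(250e-12) / 1e9)          # about 4.0 (GHz)
print(clock_period_seconds(4.0e9) * 1e12)    # about 250 (ps)
```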

CPU Time
$\text{CPU Time} = \text{CPU Clock Cycles} \times \text{Clock Cycle Time} = \dfrac{\text{CPU Clock Cycles}}{\text{Clock Rate}}$

Performance improved by
  Reducing number of clock cycles (good algorithm or hardware design)
  Increasing clock rate (good technology)
  Hardware designer must often trade off clock rate against cycle count

CPU Time Example

Computer A: 2GHz clock, 10s CPU time
Designing Computer B
  Aim for 6s CPU time
  Can do faster clock, but causes 1.2 × clock cycles
How fast must Computer B clock be?

$\text{Clock Rate}_B = \dfrac{\text{Clock Cycles}_B}{\text{CPU Time}_B} = \dfrac{1.2 \times \text{Clock Cycles}_A}{6\text{s}}$

$\text{Clock Cycles}_A = \text{CPU Time}_A \times \text{Clock Rate}_A = 10\text{s} \times 2\text{GHz} = 20 \times 10^9$

$\text{Clock Rate}_B = \dfrac{1.2 \times 20 \times 10^9}{6\text{s}} = \dfrac{24 \times 10^9}{6\text{s}} = 4\text{GHz}$
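The Computer B calculation can be checked step by step in Python. The variable and function names are assumptions for this sketch; the numbers are the ones given in the example.

```python
def required_clock_rate(clock_cycles, target_cpu_time):
    """Clock rate (Hz) needed to run clock_cycles in target_cpu_time seconds."""
    return clock_cycles / target_cpu_time

clock_rate_a = 2e9                        # Computer A: 2 GHz
cpu_time_a = 10.0                         # Computer A: 10 s
cycles_a = cpu_time_a * clock_rate_a      # 20 x 10^9 cycles
cycles_b = 1.2 * cycles_a                 # B needs 1.2x as many cycles
clock_rate_b = required_clock_rate(cycles_b, 6.0)

print(clock_rate_b / 1e9)  # about 4.0 (GHz)
```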

Instruction Count and CPI


$\text{Clock Cycles} = \text{Instruction Count} \times \text{Cycles per Instruction}$

$\text{CPU Time} = \text{Instruction Count} \times \text{CPI} \times \text{Clock Cycle Time} = \dfrac{\text{Instruction Count} \times \text{CPI}}{\text{Clock Rate}}$

Instruction Count for a program
  Determined by program, ISA and compiler
Average cycles per instruction
  Determined by CPU hardware
  If different instructions have different CPI
    Average CPI affected by instruction mix

CPI Example

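Computer A: Cycle Time = 250ps, CPI = 2.0
Computer B: Cycle Time = 500ps, CPI = 1.2
Same ISA
Which is faster, and by how much?

$\text{CPU Time}_A = \text{Instruction Count} \times \text{CPI}_A \times \text{Cycle Time}_A = I \times 2.0 \times 250\text{ps} = I \times 500\text{ps}$ (A is faster...)

$\text{CPU Time}_B = \text{Instruction Count} \times \text{CPI}_B \times \text{Cycle Time}_B = I \times 1.2 \times 500\text{ps} = I \times 600\text{ps}$

$\dfrac{\text{CPU Time}_B}{\text{CPU Time}_A} = \dfrac{I \times 600\text{ps}}{I \times 500\text{ps}} = 1.2$ (...by this much)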
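The comparison of computers A and B can be reproduced with the CPU time formula. In this Python sketch the function name cpu_time and the instruction count of one million are assumptions; since both machines share the same ISA, the instruction count cancels in the ratio.

```python
def cpu_time(instruction_count, cpi, cycle_time_seconds):
    """CPU time = instruction count x CPI x clock cycle time."""
    return instruction_count * cpi * cycle_time_seconds

instruction_count = 1_000_000   # arbitrary: the same program on both machines
time_a = cpu_time(instruction_count, 2.0, 250e-12)
time_b = cpu_time(instruction_count, 1.2, 500e-12)
ratio = time_b / time_a

print(round(ratio, 3))  # 1.2, so A is 1.2 times faster than B
```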

CPI in More Detail

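If different instruction classes take different numbers of cycles

$\text{Clock Cycles} = \sum_{i=1}^{n} (\text{CPI}_i \times \text{Instruction Count}_i)$

Weighted average CPI

$\text{CPI} = \dfrac{\text{Clock Cycles}}{\text{Instruction Count}} = \sum_{i=1}^{n} \left( \text{CPI}_i \times \dfrac{\text{Instruction Count}_i}{\text{Instruction Count}} \right)$

where $\text{Instruction Count}_i / \text{Instruction Count}$ is the relative frequency of class $i$.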

CPI Example

Alternative compiled code sequences using instructions in classes A, B, C

Class              A   B   C
CPI for class      1   2   3
IC in sequence 1   2   1   2
IC in sequence 2   4   1   1

Sequence 1: IC = 5
  Clock Cycles = 2×1 + 1×2 + 2×3 = 10
  Avg. CPI = 10/5 = 2.0
Sequence 2: IC = 6
  Clock Cycles = 4×1 + 1×2 + 1×3 = 9
  Avg. CPI = 9/6 = 1.5
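The weighted-average CPI calculation for the two sequences can be sketched in Python. The per-class CPI values (1, 2, 3) and the per-class instruction counts are the ones implied by the clock-cycle sums in this example; the function names and the dictionary layout are assumptions for illustration.

```python
def total_cycles(cpi_by_class, ic_by_class):
    """Clock cycles = sum over classes of CPI_i x instruction count_i."""
    return sum(cpi_by_class[c] * ic_by_class[c] for c in ic_by_class)

def average_cpi(cpi_by_class, ic_by_class):
    """Weighted average CPI = clock cycles / total instruction count."""
    return total_cycles(cpi_by_class, ic_by_class) / sum(ic_by_class.values())

cpi = {"A": 1, "B": 2, "C": 3}
sequence_1 = {"A": 2, "B": 1, "C": 2}
sequence_2 = {"A": 4, "B": 1, "C": 1}

print(total_cycles(cpi, sequence_1), average_cpi(cpi, sequence_1))  # 10 2.0
print(total_cycles(cpi, sequence_2), average_cpi(cpi, sequence_2))  # 9 1.5
```

Sequence 2 executes more instructions but fewer clock cycles, so instruction count alone does not decide which sequence is faster.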

Performance Summary
The BIG Picture

$\text{CPU Time} = \dfrac{\text{Instructions}}{\text{Program}} \times \dfrac{\text{Clock cycles}}{\text{Instruction}} \times \dfrac{\text{Seconds}}{\text{Clock cycle}}$

Performance depends on
  Algorithm: affects IC, possibly CPI
  Programming language: affects IC, CPI
  Compiler: affects IC, CPI
  Instruction set architecture: affects IC, CPI, Tc

1.5 The Power Wall

Power Trends

In CMOS IC technology

$\text{Power} = \text{Capacitive load} \times \text{Voltage}^2 \times \text{Frequency}$

  Power: ×30
  Voltage: 5V → 1V
  Frequency: ×1000

Reducing Power

Suppose a new CPU has
  85% of capacitive load of old CPU
  15% voltage and 15% frequency reduction

$\dfrac{P_{new}}{P_{old}} = \dfrac{C_{old} \times 0.85 \times (V_{old} \times 0.85)^2 \times F_{old} \times 0.85}{C_{old} \times V_{old}^2 \times F_{old}} = 0.85^4 = 0.52$

The power wall
  We can't reduce voltage further
  We can't remove more heat
How else can we improve performance?
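The power ratio follows from the CMOS power relation (power proportional to capacitive load × voltage² × frequency). A minimal Python sketch, with the function name power_ratio assumed for illustration:

```python
def power_ratio(load_scale, voltage_scale, frequency_scale):
    """Ratio P_new / P_old when load, voltage and frequency are each scaled.

    The old values cancel, leaving load_scale x voltage_scale^2 x frequency_scale.
    """
    return load_scale * voltage_scale ** 2 * frequency_scale

# 85% of the capacitive load, 15% lower voltage, 15% lower frequency.
print(round(power_ratio(0.85, 0.85, 0.85), 2))  # 0.52
```

Voltage enters squared, which is why lowering the supply voltage was the most effective lever until it could not be lowered further.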

1.6 The Sea Change: The Switch to Multiprocessors

Uniprocessor Performance

Constrained by power, instruction-level parallelism, memory latency

Multiprocessors

Multicore microprocessors
  More than one processor per chip
Requires explicitly parallel programming
  Compare with instruction level parallelism
    Hardware executes multiple instructions at once
    Hidden from the programmer
  Hard to do
    Programming for performance
    Load balancing
    Optimizing communication and synchronization

SPEC CPU Benchmark

Programs used to measure performance
  Supposedly typical of actual workload
Standard Performance Evaluation Corp (SPEC)
  Develops benchmarks for CPU, I/O, Web, ...
SPEC CPU2006
  Elapsed time to execute a selection of programs
    Negligible I/O, so focuses on CPU performance
  Normalize relative to reference machine
  Summarize as geometric mean of performance ratios
    CINT2006 (integer) and CFP2006 (floating-point)

$\sqrt[n]{\prod_{i=1}^{n} \text{Execution time ratio}_i}$

CINT2006 for Opteron X4 2356


Name         Description                     IC×10^9   CPI     Tc (ns)   Exec time (s)   Ref time (s)   SPECratio
perl         Interpreted string processing   2,118     0.75    0.40      637             9,777          15.3
bzip2        Block-sorting compression       2,389     0.85    0.40      817             9,650          11.8
gcc          GNU C Compiler                  1,050     1.72    0.40      724             8,050          11.1
mcf          Combinatorial optimization      336       10.00   0.40      1,345           9,120          6.8
go           Go game (AI)                    1,658     1.09    0.40      721             10,490         14.6
hmmer        Search gene sequence            2,783     0.80    0.40      890             9,330          10.5
sjeng        Chess game (AI)                 2,176     0.96    0.40      837             12,100         14.5
libquantum   Quantum computer simulation     1,623     1.61    0.40      1,047           20,720         19.8
h264avc      Video compression               3,102     0.80    0.40      993             22,130         22.3
omnetpp      Discrete event simulation       587       2.94    0.40      690             6,250          9.1
astar        Games/path finding              1,082     1.79    0.40      773             7,020          9.1
xalancbmk    XML parsing                     1,058     2.70    0.40      1,143           6,900          6.0
Geometric mean                                                                                          11.7

[Annotation on the table: High cache miss rates]
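The geometric mean reported for CINT2006 can be recomputed from the SPECratio column. In this Python sketch the function names are assumptions; a SPECratio is taken as reference time divided by execution time, and the list holds the twelve SPECratio values from the table.

```python
import math

def spec_ratio(ref_time, exec_time):
    """SPECratio: reference machine time divided by measured execution time."""
    return ref_time / exec_time

def geometric_mean(ratios):
    """n-th root of the product of n ratios, computed via logarithms."""
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))

spec_ratios = [15.3, 11.8, 11.1, 6.8, 14.6, 10.5, 14.5, 19.8, 22.3, 9.1, 9.1, 6.0]

print(round(spec_ratio(9777, 637), 1))         # 15.3 (perl)
print(round(geometric_mean(spec_ratios), 1))   # 11.7
```

Summing logarithms instead of multiplying the ratios directly gives the same result and avoids very large intermediate products.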

SPEC Power Benchmark

Power consumption of server at different workload levels
  Performance: ssj_ops/sec
  Power: Watts (Joules/sec)

$\text{Overall ssj\_ops per Watt} = \left( \sum_{i=0}^{10} \text{ssj\_ops}_i \right) \Big/ \left( \sum_{i=0}^{10} \text{power}_i \right)$

SPECpower_ssj2008 for X4
Target Load %       Performance (ssj_ops/sec)   Average Power (Watts)
100%                231,867                     295
90%                 211,282                     286
80%                 185,803                     275
70%                 163,427                     265
60%                 140,160                     256
50%                 118,324                     246
40%                 92,035                      233
30%                 70,500                      222
20%                 47,126                      206
10%                 23,066                      180
0%                  0                           141
Overall sum         1,283,590                   2,605
∑ssj_ops / ∑power   493
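The overall ssj_ops per Watt figure can be recomputed from the table in Python. Two readings are assumed here and marked in the code: the 40% entry is taken as 92,035 and the 0% entry as 0 ssj_ops, which is what the stated column sum of 1,283,590 requires.

```python
# Performance (ssj_ops/sec) from 100% load down to 0% load.
ssj_ops = [231867, 211282, 185803, 163427, 140160, 118324,
           92035,   # 40% load, read as 92,035 (assumption checked by the sum)
           70500, 47126, 23066,
           0]       # 0% load, assumed to perform no operations

# Average power (Watts) at the same load levels.
watts = [295, 286, 275, 265, 256, 246, 233, 222, 206, 180, 141]

overall = sum(ssj_ops) / sum(watts)
print(sum(ssj_ops), sum(watts), round(overall))  # 1283590 2605 493
```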

1.8 Fallacies and Pitfalls

Pitfall: Amdahl's Law

Improving an aspect of a computer and expecting a proportional improvement in overall performance

$T_{improved} = \dfrac{T_{affected}}{\text{improvement factor}} + T_{unaffected}$

Example: multiply accounts for 80s/100s
  How much improvement in multiply performance to get 5× overall?
  $20 = \dfrac{80}{n} + 20$
  Can't be done!
Corollary: make the common case fast
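The multiply example can be explored numerically. In this Python sketch the function names are assumptions; the 80 s affected and 20 s unaffected split is the one from the example. However large the improvement factor n becomes, the 20 s that multiply does not touch keep the overall speedup below 5.

```python
def improved_time(t_affected, t_unaffected, improvement_factor):
    """Amdahl's Law: T_improved = T_affected / improvement factor + T_unaffected."""
    return t_affected / improvement_factor + t_unaffected

def overall_speedup(t_affected, t_unaffected, improvement_factor):
    """Original total time divided by improved total time."""
    original = t_affected + t_unaffected
    return original / improved_time(t_affected, t_unaffected, improvement_factor)

for n in (2, 10, 100, 1_000_000):
    print(n, round(overall_speedup(80.0, 20.0, n), 3))
```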

Fallacy: Low Power at Idle

Look back at X4 power benchmark
  At 100% load: 295W
  At 50% load: 246W (83%)
  At 10% load: 180W (61%)
Google data center
  Mostly operates at 10% to 50% load
  At 100% load less than 1% of the time
Consider designing processors to make power proportional to load

Pitfall: MIPS as a Performance Metric

MIPS: Millions of Instructions Per Second
  Doesn't account for
    Differences in ISAs between computers
    Differences in complexity between instructions

$\text{MIPS} = \dfrac{\text{Instruction count}}{\text{Execution time} \times 10^6} = \dfrac{\text{Instruction count}}{\dfrac{\text{Instruction count} \times \text{CPI}}{\text{Clock rate}} \times 10^6} = \dfrac{\text{Clock rate}}{\text{CPI} \times 10^6}$

CPI varies between programs on a given CPU
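The MIPS formula shows why the metric depends on the program being run. In this Python sketch the function names, the 4 GHz clock, and the two CPI values are hypothetical inputs chosen for illustration: the same CPU reports different MIPS ratings for programs with different CPI.

```python
def mips(clock_rate_hz, cpi):
    """MIPS = clock rate / (CPI x 10^6)."""
    return clock_rate_hz / (cpi * 1e6)

def mips_from_counts(instruction_count, execution_time_seconds):
    """MIPS = instruction count / (execution time x 10^6)."""
    return instruction_count / (execution_time_seconds * 1e6)

# Same hypothetical 4 GHz CPU, two programs with different average CPI.
print(mips(4e9, 1.0))   # 4000.0
print(mips(4e9, 2.0))   # 2000.0
```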

1.9 Concluding Remarks

Concluding Remarks

Cost/performance is improving
  Due to underlying technology development
Hierarchical layers of abstraction
  In both hardware and software
Instruction set architecture
  The hardware/software interface
Execution time: the best performance measure
Power is a limiting factor
  Use parallelism to improve performance
