
Administrivia: ECE 252 / CPS 220 Advanced Computer Architecture I

This document provides an overview of an advanced computer architecture course including administrative details like instructors, topics, goals, activities, expectations, components and policies. It outlines the course structure, assignments, projects and expectations. It also details policies around grading, academic integrity and late work.

ECE 252 / CPS 220
Advanced Computer Architecture I

Fall 2009
Duke University

Prof. Daniel Sorin ([email protected])

based on slides developed by Profs. Roth (Penn), Hill, Wood, Sohi, Smith, Lipasti (Wisconsin), and Vijaykumar (Purdue)

Administrivia
• addresses, email, website, etc.
• list of topics
• expected background
• course requirements
• grading and academic misconduct

© 2009 by Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti. ECE 252 / CPS 220 Lecture Notes: Introduction

Instructors
professor: Dan Sorin ([email protected])
• research: fault-tolerant computer architecture, verification-aware microprocessor design, memory systems for multicore processors, plus other topics
• teaching: architecture (152, 252, 259), fault-tolerance (254)
• office: 209C Hudson Hall
• office hours: TBD
TAs: Meng Zhang (mz28@ee) and Jun Pang (pangjun@cs)
• PhD students in computer architecture
• office hours: TBD

Where to Get Answers
Consult course resources in this order:
• Course Website (http://www.ee.duke.edu/~sorin/ece252)
• ECE 252 Google Group
• TAs: Meng Zhang and Jun Pang
• Professor Sorin
Email to the TAs and the Professor must have a subject that begins with ECE252
• Otherwise I can’t promise it won’t end up in the spam folder

What is This Course All About?
State-of-the-art computer hardware design
Topics
• Microarchitecture of single core microprocessors
• Memory system architecture
• Multithreaded processors
• Multicore processors
Fundamentals, current systems, and future systems
• Major emphasis on cutting-edge issues
Will read from: classic papers, brand-new papers, textbook

Course Goals and Activities
Course Goals
+ Understand how current processors work
+ Understand how to evaluate/compare processors
+ Learn how to use a simulator to perform experiments
+ Learn research skills by performing a term project
+ Learn how to critically read research papers
Course Activities:
• Will loosely follow textbook
• Students will read and discuss many research papers
• Term project


What You Should Expect from Course
Things NOT to expect in this course:
• 100% of class = me lecturing to you
• Homework sets and exams where every question is either quantitative or has a single correct answer
Things to expect in this course:
• Active discussions/arguments about architecture ideas
• Essay questions
• Being asked to explain, discuss, defend, and argue
• Questions with multiple possible answers

What I Expect You to Know Already
Courses you should have taken already
• Basic architecture (ECE 152 / CPS 104 or equivalent)
• Programming in C/C++/Java (our simulator is in C)
• Basic OS (ECE 153 / CPS 110): not critical, but helpful
Topics you should remember fondly (I will not cover these in any detail in this course)
• Instruction sets, computer arithmetic, assembly programming, memory, I/O
Topics that will be briefly reviewed but that you should’ve seen before
• Pipelining, caches, virtual memory

Course Components
Reading Materials
• Computer Architecture: A Quantitative Approach by Hennessy and Patterson, 4th Edition
• (optional) Modern Processor Design by Shen and Lipasti
• Recent research papers (on course website)
Homework
• 4 to 6 homework assignments, performed in groups of 2
Term Project
• Groups of 2 or 3
Exams
• Midterm and final exam

Term Project
This is a semester-long research project
• Do not expect to do the whole thing in the last week, because E[project grade] < B
• I will suggest a bunch of possible project ideas, but many students choose to pursue their own ideas
• Project proposals due TBD
You may “combine” this project with a project from another class, but you MUST consult with me first
You must absolutely, positively reference prior work
• Please ask me if you have ANY questions
• Not knowing != valid excuse


Grading
Grading breakdown
• Homework: 30%
• Midterm: 15%
• Project: 25%
• Final: 30%

Academic Integrity and Late Policy
Academic Misconduct
• University policy will be followed strictly
• Zero tolerance for cheating and/or plagiarism
Late policy
• Late homeworks (except for dean’s excuses)
  • late <1 day = 50% off
  • late >1 day = zero
• No late term project will be accepted (except for dean’s excuses). Period.

A Friendly Warning
This is not an easy class.
Seriously.
If you’re an ECE undergrad: consider programming in C
If you’re a CS grad: consider having to think about circuits
Please see me if you think you might be getting in over your head!

Now, moving on to computer architecture ...

What is Computer Architecture?
“The term architecture is used here to describe the attributes of a system as seen by the programmer, i.e., the conceptual structure and functional behavior as distinct from the organization of the dataflow and controls, the logic design, and the physical implementation.”
–Gene Amdahl, IBM Journal of R&D, Apr 1964


Architecture and Other Disciplines
Layers, from top to bottom:
• Application Software
• Operating Systems, Compilers, Networking Software
• Computer Architecture
• Circuits, Wires, Devices, Network Hardware
Architecture interacts with many other fields
• Can’t be studied in a vacuum

Levels of Computer Architecture
architecture
• functional appearance to software
• opcodes, addressing modes, architected registers
microarchitecture (= focus of this course)
• logical structure that implements the architecture
• pipelining, functional units, caches, physical registers
realization (circuits)
• physical structure that embodies the implementation
• gates, cells, transistors, wires

The Role of the Microarchitect
architect: defines the hardware/software interface
microarchitect: defines the hardware implementation
• usually the same person as the architect
Two very important questions in this course:
What goals are we (microarchitects!) trying to achieve?
And what units do we use to measure our success?
Hint: how do you decide which computer to buy?
• desktop? laptop? smart phone? mp3 player?
• is a Dell box better/worse than an iMac?

Applications -> Requirements -> Designs
• scientific: weather prediction, molecular modeling
  • need: large memory, floating-point arithmetic
  • examples: CRAY XT4, IBM BlueGene/L
• commercial: inventory, payroll, web serving, e-commerce
  • need: integer arithmetic, high I/O
  • examples: SUN SPARCcenter, Enterprise, AlphaServer GS320
• desktop: multimedia, games, entertainment
  • need: high data bandwidth, graphics
  • examples: Intel Core2 Quad, AMD Opteron QuadCore, IBM Power6
• mobile: laptops, netbooks, tablet PCs
  • need: low power (battery), decent performance
  • examples: Intel Celeron, AMD Turion, Intel Atom
• embedded: cell phones, automobile engines, door knobs
  • need: low power (battery + heat), low cost
  • examples: ARM core, Intel Atom

Why Study Computer Architecture?
answer #1: requirements are always changing
aren’t computers fast enough already?
• are they?
• fast enough to do everything we will EVER want?
  • AI, gaming, virtual reality, gaming, protein sequencing, gaming, ...
• “if you build it, they will come”
is speed the only goal?
• power: heat dissipation + battery life + utility bill
• cost
• reliability
• etc.

Why Study Computer Architecture?
answer #2: technology playing field is always changing
• annual technology improvements (approximate)
  • SRAM (logic): density +25%, speed +20%
  • DRAM (memory): density +60%, speed +4%
  • disk (magnetic): density +25%, speed +4%
  • network interface: 10 Mb/s -> 100 Mb/s -> 1 Gb/s -> 10 Gb/s -> ?
• parameters change and change relative to one another!
• and that’s not even including “exotic” nanotechnologies
• or, for that matter, less exotic technologies like Flash memory
designs change even if requirements fixed
... but requirements are not fixed

Examples of Changing Designs
example I: caches
• 1970: 10K transistors, DRAM faster than logic -> bad idea
• 1990: 1M transistors, logic faster than DRAM -> good idea
• will caches ever be a bad idea again?
example II: out-of-order execution
• 1985: 100K transistors + no precise interrupts -> bad idea
• 1995: 2M transistors + precise interrupts -> good idea
• 2005: 500M transistors + 4GHz clock -> bad idea?
• 2009: >1B transistors + multiple cores -> ???
semiconductor technology is an incredible driving force

Moore’s Law
“Cramming More Components onto Integrated Circuits”
–G.E. Moore, Electronics, 1965
• observation: (DRAM) transistor density doubles annually
  • became known as “Moore’s Law”
  • wrong: density doubles every 18 months (had only 4 data points)
• corollaries
  • cost per transistor halves annually (18 months)
  • power per transistor decreases with scaling
  • speed increases with scaling
  • reliability starting to decrease with scaling


Moore’s Law
“performance doubles every 18 months”
• common interpretation of Moore’s Law, not original intent
• wrong! “performance” used to double every ~2 years
• self-fulfilling prophecy (Moore’s Curve)
  • 2X every 2 years = ~3% increase per month
  • 3% per month used to judge performance features
  • if feature adds 9 months to schedule...
  • ...it should add at least 30% to performance ($1.03^9 = 1.30$ → 30%)
  • e.g., Intel Itanium: under Moore’s Curve in a big way
performance improvements have slowed down in past few years
• architects haven’t figured out how to use the extra transistors to improve performance of single core without melting the chip --> multicore chips at lower frequencies

Evolution of Single-Chip Processors
                   1971–1980   1981–1990   1991–2000     2010
Transistor Count   10K–100K    100K–1M     1M–100M       4B?
Clock Frequency    0.2–2MHz    2–20MHz     20MHz–1GHz    2GHz?
IPC (per core)     < 0.1       0.1–0.9     0.9–2.0       2.0?
MIPS/MFLOPS        < 0.2       0.2–20      20–2,000      100,000?
Number of cores    1           1           1             64?
some perspective: 1971–2001 performance improved 35,000X!!!
• what if cars improved at this rate?
  • 1971: 60 MPH & 10 MPG, 2001: 2,100,000 MPH & 350,000 MPG
• but... what if cars crashed as often as computers did?
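
A quick sanity check of the Moore’s Curve arithmetic quoted above (2X every 2 years is about 3% per month, so a 9-month slip should buy about 30%). This is only a sketch: the function names are hypothetical, and the 3% default is the rounded figure from the notes.

```python
def monthly_growth_factor(doubling_period_months: float) -> float:
    """Per-month growth factor if performance doubles every `doubling_period_months`."""
    return 2.0 ** (1.0 / doubling_period_months)


def break_even_gain(delay_months: float, monthly_rate: float = 0.03) -> float:
    """Performance factor a feature must deliver to justify a schedule slip.

    Assumes everyone else keeps improving at `monthly_rate` per month
    (0.03 is the rounded value used in the notes).
    """
    return (1.0 + monthly_rate) ** delay_months


if __name__ == "__main__":
    print(f"growth per month when doubling every 24 months: {monthly_growth_factor(24):.4f}")
    print(f"a 9-month slip must buy at least: {break_even_gain(9):.2f}x")
```

Doubling every 24 months works out to roughly 1.029 per month, which is where the "~3%" comes from; compounding 3% for 9 months gives roughly 1.30.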

Performance
Much of the focus of this course is on improving performance
Topics:
• performance metrics
• CPU performance equation
• benchmarks and benchmarking
• reporting averages
• Amdahl’s Law
• Little’s Law
• concepts
  • balance
  • tradeoffs
  • bursty behavior (average and peak performance)

Readings
Hennessy & Patterson
• Chapter 1
R. P. Colwell et al. “Instruction Sets and Beyond: Computers, Complexity, and Controversy.” IEEE Computer, 18(9), 1985.


Performance Metrics
latency: response time, execution time
• good metric for fixed amount of work (minimize time)
throughput: bandwidth, work per time
• = (1 / latency) when there is NO OVERLAP
• > (1 / latency) when there is overlap
  • in real processors, there is always overlap (e.g., pipelining)
• good metric for fixed amount of time (maximize work)
comparing performance
• A is N times faster than B iff
  • perf(A)/perf(B) = time(B)/time(A) = N
• A is X% faster than B iff
  • perf(A)/perf(B) = time(B)/time(A) = 1 + X/100

Performance Metric I: MIPS
MIPS (millions of instructions per second)
• (instruction count / execution time in seconds) $\times 10^{-6}$
– but instruction count is not a reliable indicator of work
  • Prob #1: work per instruction varies (FP mult >> register move)
  • Prob #2: instruction sets aren’t equal (3 Pentium instrs != 3 Alpha instrs)
– may vary inversely with actual performance
– particularly bad metric for multicore chips
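
The “N times faster” and “X% faster” definitions, and the MIPS formula, are easy to mix up, so here is a minimal sketch of all three. The function names and the sample numbers are made up for illustration.

```python
def times_faster(time_a: float, time_b: float) -> float:
    """N such that A is N times faster than B: perf(A)/perf(B) = time(B)/time(A)."""
    return time_b / time_a


def percent_faster(time_a: float, time_b: float) -> float:
    """X such that A is X% faster than B: time(B)/time(A) = 1 + X/100."""
    return (time_b / time_a - 1.0) * 100.0


def mips(instruction_count: float, execution_seconds: float) -> float:
    """Millions of instructions per second."""
    return instruction_count / (execution_seconds * 1e6)


if __name__ == "__main__":
    # Hypothetical measurements: A takes 4 s, B takes 5 s on the same program.
    print(times_faster(4.0, 5.0))     # ratio of execution times
    print(percent_faster(4.0, 5.0))   # the same ratio expressed as a percentage
    # Hypothetical run: 50 million dynamic instructions in 2 seconds.
    print(mips(50_000_000, 2.0))
```

Note that both comparisons are ratios of execution times, with the slower machine’s time in the numerator.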

Performance Metric II: MFLOPS
MFLOPS (millions of floating-point operations per second)
• (FP ops / execution time) $\times 10^{-6}$
• like MIPS, but counts only FP operations
  • FP ops have longest latencies anyway (problem #1)
  • FP ops are the same across machines (problem #2)
– may have been valid in 1980 (most programs were FP)
  • most programs today are “integer” i.e., light on FP
  • load from memory takes longer than FP divide (prob #1)
  • Cray doesn’t implement divide, Motorola has SQRT, SIN, COS (#2)

CPU Performance Equation
processor performance = seconds / program
• separate into three components (for single core)

$$\frac{\text{seconds}}{\text{program}} = \frac{\text{instructions}}{\text{program}} \times \frac{\text{cycles}}{\text{instruction}} \times \frac{\text{seconds}}{\text{cycle}}$$

instructions / program   cycles / instruction    seconds / cycle
architecture             implementation          realization
(ISA)                    (micro-architecture)    (physical layout)
compiler-designer        processor-designer      circuit-designer
CPS 206                  ECE 252 / CPS 220       ECE 261


CPU Performance Equation
instructions / program: dynamic instruction count
• mostly determined by program, compiler, ISA
cycles / instruction: CPI
• mostly determined by ISA and CPU/memory organization
seconds / cycle: cycle time, clock time, 1 / clock frequency
• mostly determined by technology and CPU organization
uses of CPU performance equation
• high-level performance comparisons
• back of the envelope calculations
• helping architects think about compilers and technology

CPU Performance Comparison
famous example: “RISC Wars” (RISC vs. CISC)
• assume
  • instructions / program: CISC = P, RISC = 2P
  • CPI: CISC = 8, RISC = 2
  • T = clock period for CISC and RISC (assume they are equal)
• CISC time = P x 8 x T = 8PT
• RISC time = 2P x 2 x T = 4PT
• RISC time = CISC CPU time/2
the truth is much, much, much more complex
• actual data from IBM AS/400 (CISC -> RISC in 1995):
  • CISC time = P x 7 x T = 7PT
  • RISC time = 3.1P x 3 x T/3.1 = 3PT (+1 tech. gen.)
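
The RISC vs. CISC comparison above is just the CPU performance equation evaluated twice. A small sketch, with P and T normalized to 1 (an assumption made here so the results read directly as multiples of PT; the function name is hypothetical):

```python
def cpu_time(instructions: float, cpi: float, cycle_time: float) -> float:
    """seconds/program = instructions/program * cycles/instruction * seconds/cycle."""
    return instructions * cpi * cycle_time


# Normalize the CISC instruction count P and clock period T to 1.
P = 1.0
T = 1.0

# The idealized "RISC Wars" assumptions from the notes.
textbook_cisc = cpu_time(P, 8, T)          # 8PT
textbook_risc = cpu_time(2 * P, 2, T)      # 4PT

# The IBM AS/400 figures quoted in the notes; the RISC clock period is T/3.1.
as400_cisc = cpu_time(P, 7, T)             # 7PT
as400_risc = cpu_time(3.1 * P, 3, T / 3.1) # 3PT

if __name__ == "__main__":
    print("textbook: CISC/RISC time ratio =", textbook_cisc / textbook_risc)
    print("AS/400:   CISC/RISC time ratio =", as400_cisc / as400_risc)
```

In the AS/400 numbers the 3.1X instruction-count penalty is cancelled by a 3.1X shorter clock period, so the CPI difference alone decides the outcome.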

CPU Back-of-the-Envelope Calculation
base machine
• 43% ALU ops (1 cycle), 21% loads (1 cycle), 12% stores (2 cycles), 24% branches (2 cycles)
• note: pretending latency is 1 because of pipelining
Q: should 1-cycle stores be implemented if it slows clock 15%?
• old CPI = 0.43 + 0.21 + (0.12 x 2) + (0.24 x 2) = 1.36
• new CPI = 0.43 + 0.21 + 0.12 + (0.24 x 2) = 1.24
• speedup = (P x 1.36 x T) / (P x 1.24 x 1.15T) = 0.95
Answer: NO!

Actually Measuring Performance
how are execution-time & CPI actually measured?
• execution time: time (Unix cmd): wall-clock, CPU, system
• CPI = (CPU time x clock frequency) / # instructions
• more useful? CPI breakdown (compute, memory stall, etc.)
  • so we know what the performance problems are (what to fix)
measuring CPI breakdown
• hardware event counters (built into core)
  • calculate CPI using instruction frequencies/event costs
• cycle-level microarchitecture simulator (e.g., SimpleScalar)
  + measure exactly what you want
  – model microarchitecture faithfully (at least parts of interest)
  • method of choice for many architects (yours, too!)
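
The back-of-the-envelope store question above can be replayed in a few lines. The sketch below computes CPI as a weighted sum over the instruction mix from the notes; the dictionary layout and function names are assumptions made for this example.

```python
from typing import Dict, Tuple

# instruction class -> (fraction of dynamic instructions, cycles per instruction)
Mix = Dict[str, Tuple[float, int]]

BASE_MIX: Mix = {
    "alu": (0.43, 1),
    "load": (0.21, 1),
    "store": (0.12, 2),
    "branch": (0.24, 2),
}

# Proposed change: stores take 1 cycle instead of 2.
FAST_STORE_MIX: Mix = dict(BASE_MIX, store=(0.12, 1))


def average_cpi(mix: Mix) -> float:
    """CPI as the frequency-weighted sum of per-class cycle counts."""
    return sum(fraction * cycles for fraction, cycles in mix.values())


def speedup(old_cpi: float, new_cpi: float, clock_slowdown: float) -> float:
    """Old time / new time; instruction count cancels, cycle time grows by `clock_slowdown`."""
    return old_cpi / (new_cpi * clock_slowdown)


if __name__ == "__main__":
    old = average_cpi(BASE_MIX)
    new = average_cpi(FAST_STORE_MIX)
    print(f"old CPI = {old:.2f}, new CPI = {new:.2f}")
    print(f"speedup with a 15% slower clock = {speedup(old, new, 1.15):.2f}")
```

A speedup below 1 means the change is a net loss, which matches the “NO!” on the slide.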


Benchmarks and Benchmarking
“program” as unit of work
• millions of them, many different kinds, which to use?
benchmarks
• standard programs for measuring/comparing performance
  + represent programs people care about
  + repeatable!!
• benchmarking process
  • define workload
  • extract benchmarks from workload
  • execute benchmarks on candidate machines
  • project performance on new machine
  • run workload on new machine and compare
  • not close enough -> repeat

Let’s Choose Some Benchmarks!
What benchmarks would you put in your benchmark suite?

Benchmarks: Toys, Kernels, Synthetics
toy benchmarks: little programs that no one really runs
• e.g., fibonacci, 8 queens
– little value, what real programs do these represent?
• scary fact: used to prove the value of RISC in early 80’s
kernels: important (frequently executed) pieces of real programs
• e.g., Livermore loops, Linpack (inner product)
+ good for focusing on individual features, but not big picture
– over-emphasize target feature (for better or worse)
synthetic benchmarks: programs made up for benchmarking
• e.g., Whetstone, Dhrystone
• toy kernels++, which programs do these represent?

Benchmarks: Real Programs
real programs
+ only accurate way to characterize performance
– requires considerable work (porting)
Standard Performance Evaluation Corporation (SPEC)
• http://www.spec.org
• collects, standardizes and distributes benchmark suites
• consortium made up of industry leaders
• SPEC CPU (CPU intensive benchmarks)
  • SPEC89, SPEC92, SPEC95, SPEC2000, SPEC2006
• other benchmark suites
  • SPECjvm, SPECmail, SPECweb, SPEComp
Other benchmark suite examples: TPC-C, TPC-H for databases



SPEC CPU2006
12 integer programs (C, C++)
• gcc (compiler), perl (interpreter), hmmer (markov chain)
• bzip2 (compress), go (AI), sjeng (AI)
• libquantum (physics), h264ref (video)
• omnetpp (simulation), astar (path finding algs)
• xalanc (XML processing), mcf (network optimization)
17 floating point programs (C, C++, Fortran)
• fluid dynamics: bwaves, leslie3d, lbm
• quantum chemistry: gamess, tonto
• physics: milc, zeusmp, cactusADM
• gromacs (biochem)
• namd (bio, molec dynamics), dealII (finite element analysis)
• soplex (linear programming), povray (ray tracing)
• calculix (mechanics), GemsFDTD (computational E&M)
• wrf (weather), sphinx3 (speech recognition)

Benchmarking Pitfalls
• benchmark properties mismatch with features studied
  • e.g., using SPEC for large cache studies
• careless scaling
  • using only first few million instructions (initialization phase)
  • reducing program data size
• choosing performance from wrong application space
  • e.g., in a realtime environment, choosing gcc
• using old benchmarks
  • “benchmark specials”: benchmark-specific optimizations
Benchmarks must be continuously maintained and updated!

Reporting Average Performance
averages: one of the things architects frequently get wrong
+ pay attention now and you won’t get them wrong
important things about averages (i.e., means)
• ideally proportional to execution time (ultimate metric)
• Arithmetic Mean (AM) for times
• Harmonic Mean (HM) for rates (IPCs)
• Geometric Mean (GM) for ratios (speedups)
• there is no such thing as the average program
• use average when absolutely necessary

What Does The Mean Mean?
arithmetic mean (AM): average execution times of N programs
• $\frac{1}{N}\sum_{i=1}^{N} \mathrm{time}(i)$
harmonic mean (HM): average IPCs of N programs
• arithmetic mean cannot be used for rates (e.g., IPCs)
  • 30 MPH for 1 mile + 90 MPH for 1 mile != avg. 60 MPH
• $N \,/\, \sum_{i=1}^{N} \frac{1}{\mathrm{rate}(i)}$
geometric mean (GM): average speedups of N programs
• $\sqrt[N]{\prod_{i=1}^{N} \mathrm{speedup}(i)}$
what if programs run at different frequencies within workload?
• “weighting”
• weighted AM = $\sum_{i=1}^{N} w(i) \cdot \mathrm{time}(i) \,/\, \sum_{i=1}^{N} w(i)$
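
A minimal sketch of the three means, using the MPH example above to show why rates need the harmonic mean. The function names are hypothetical, and normalizing the weighted mean by the total weight is an assumption of this sketch.

```python
import math
from typing import Sequence


def arithmetic_mean(times: Sequence[float]) -> float:
    """For quantities proportional to time (e.g., execution times)."""
    return sum(times) / len(times)


def harmonic_mean(rates: Sequence[float]) -> float:
    """For rates (e.g., IPC, MPH) measured over equal amounts of work."""
    return len(rates) / sum(1.0 / r for r in rates)


def geometric_mean(ratios: Sequence[float]) -> float:
    """For unitless ratios (e.g., speedups)."""
    return math.prod(ratios) ** (1.0 / len(ratios))


def weighted_arithmetic_mean(times: Sequence[float], weights: Sequence[float]) -> float:
    """Weights express how often each program runs; normalized by total weight."""
    return sum(w * t for w, t in zip(weights, times)) / sum(weights)


if __name__ == "__main__":
    speeds = [30.0, 90.0]  # MPH, one mile driven at each speed
    print("arithmetic mean of speeds:", arithmetic_mean(speeds))  # misleading for rates
    print("harmonic mean of speeds:  ", harmonic_mean(speeds))    # true average speed
    print("geometric mean of 10 and 0.1:", geometric_mean([10.0, 0.1]))
```

Driving one mile at 30 MPH and one mile at 90 MPH takes 1/30 + 1/90 hours for 2 miles, which is 45 MPH, the harmonic mean rather than the arithmetic one.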


GM Weirdness
what about averaging ratios (speedups)?
• HM / AM change depending on which machine is the base

            machine A   machine B   B/A                         A/B
Program1    1           10          10                          0.1
Program2    1000        100         0.1                         10
AM                                  (10+.1)/2 = 5.05            (.1+10)/2 = 5.05
                                    B is 5.05 times faster!     A is 5.05 times faster!
HM                                  2/(1/10+1/.1) = 0.198       2/(1/.1+1/10) = 0.198
                                    A is 5.05 times faster!     B is 5.05 times faster!
GM                                  √(10*.1) = 1                √(.1*10) = 1

• AM and HM give the same number for either base machine (0.198 = 1/5.05), so the verdict flips with the choice of base
– geometric mean of ratios is not proportional to total time!
• if we take total execution time, B is 9.1 times faster
• GM says they are equal

Amdahl’s Law
“Validity of the Single-Processor Approach to Achieving Large-Scale Computing Capabilities” –G. Amdahl, AFIPS, 1967
• let optimization speed up fraction f of program by factor s
• speedup = old / ([(1-f) x old] + f/s x old) = 1 / (1 - f + f/s)
• f = 95%, s = 1.1 → 1/[(1-0.95) + (0.95/1.1)] = 1.094
• f = 5%, s = 10 → 1/[(1-0.05) + (0.05/10)] = 1.047
• f = 5%, s = ∞ → 1/[(1-0.05) + (0.05/∞)] = 1.052
• f = 95%, s = ∞ → 1/[(1-0.95) + (0.95/∞)] = 20
make common case fast, but...
...uncommon case eventually limits performance
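
The four Amdahl’s Law cases above can be reproduced with a one-line function. This is a sketch; `amdahl_speedup` is a name chosen here, and an infinite speedup factor is passed as `math.inf`.

```python
import math


def amdahl_speedup(f: float, s: float) -> float:
    """Overall speedup when a fraction `f` of execution time is sped up by factor `s`.

    speedup = 1 / ((1 - f) + f / s); with s = math.inf the f/s term becomes 0.
    """
    return 1.0 / ((1.0 - f) + f / s)


if __name__ == "__main__":
    cases = [(0.95, 1.1), (0.05, 10.0), (0.05, math.inf), (0.95, math.inf)]
    for f, s in cases:
        print(f"f = {f:.0%}, s = {s}: speedup = {amdahl_speedup(f, s):.3f}")
```

Even an infinitely fast optimization of 95% of the program tops out at about 20X, because the untouched 5% still takes its original time.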

Little’s Law
Key Relationship between latency and bandwidth:
Average number in system = arrival rate * mean holding time
Possibly the most useful equation I know
• Useful in design of computers, software, industrial processes, etc.
Example:
• How big of a wine cellar should we build?
• We drink (and buy) an average of 2 bottles per week
• On average, we want to age the wine for 5 years
• bottles in cellar = 2 bottles/week * 52 weeks/year * 5 years
• = 520 bottles

System Balance
each system component produces & consumes data
• make sure data supply and demand is balanced
• X demand >= X supply ⇒ computation is “X-bound”
  • e.g., memory bound, CPU-bound, I/O-bound
• goal: be bound everywhere at once (why?)
• X can be bandwidth or latency
  • X is bandwidth ⇒ buy more bandwidth
  • X is latency ⇒ much tougher problem
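
Little’s Law from the wine-cellar example, written out so the units are visible. The second use, sizing the number of outstanding memory requests, is a hypothetical illustration with made-up numbers and is not taken from the notes.

```python
def average_occupancy(arrival_rate: float, holding_time: float) -> float:
    """Little's Law: average number in system = arrival rate * mean holding time.

    The two arguments must use the same time unit (e.g., per year and years).
    """
    return arrival_rate * holding_time


# Wine cellar from the notes: 2 bottles/week * 52 weeks/year, aged for 5 years.
cellar_bottles = average_occupancy(arrival_rate=2 * 52, holding_time=5)

# Hypothetical hardware analogue: 0.5 cache misses issued per cycle, each
# taking 100 cycles, means this many misses are in flight on average.
outstanding_misses = average_occupancy(arrival_rate=0.5, holding_time=100)

if __name__ == "__main__":
    print("bottles in cellar:", cellar_bottles)
    print("misses in flight: ", outstanding_misses)
```

The same product sizes any buffer that sits between a producer and a consumer, which is why the law connects bandwidth (arrival rate) and latency (holding time).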


Tradeoffs
“Bandwidth problems can be solved with money. Latency problems are harder, because the speed of light is fixed and you can’t bribe God” – David Clark
well...
• can convert some latency problems to bandwidth problems
• solve those with money
• the famous “bandwidth/latency tradeoff”
• architecture is the art of making tradeoffs

Bursty Behavior
Q: to sustain 2 IPC... how many instructions should processor be able to
• fetch per cycle?
• execute per cycle?
• complete per cycle?
A: NOT 2 (more than 2)
• dependences will cause stalls (under-utilization)
• if desired performance is X, peak performance must be > X
programs don’t always obey “average” behavior
• can’t design processor only to handle average behavior

Performance in the Real World
A paper comparing performance of RISC vs. CISC and trying to show that RISC is not obviously better
• “Instruction Sets and Beyond: Computers, Complexity, and Controversy” by Colwell et al., IEEE Computer, 1985.

Roadmap for Rest of Semester
Primary topics for rest of course
• Pipelined processors
• Multiple-issue (superscalar), in-order processors
• Hardware managed out-of-order instruction execution
• Static (compiler) instruction scheduling, VLIW, EPIC
• Advanced cache/memory issues
• Multithreaded processors
• Intro to multicore chips and multi-chip multiprocessors
Advanced topics
• Power-efficiency, fault tolerance, security, virtual machines, grid processors, nanocomputing


Topics NOT covered in this course
These topics have been well-covered in your prior courses
• Digital logic design
• Computer arithmetic
• Instruction sets
• Cache/memory basics, including virtual memory
• I/O (disks, etc.)
If you’re uncomfortable with these topics, please see me

Other Courses I’d Recommend
Topics related to this course (non-exhaustive list!)
• Advanced comp. architecture II (ECE 259/CPS 221)
• VLSI design (ECE 261)
• Fault tolerant computing (ECE 254/CPS 225)
• Performance/reliability analysis (ECE 255/257)
• Advanced digital system design (ECE 251)
• Operating systems (CPS 210)
• Compilers (offered at UNC or NC State)

