Administrivia: ECE 252 / CPS 220 Advanced Computer Architecture I

This document provides an overview of an advanced computer architecture course including administrative details like instructors, topics, goals, activities, expectations, components and policies. It outlines the course structure, assignments, projects and expectations. It also details policies around grading, academic integrity and late work.


ECE 252 / CPS 220
Advanced Computer Architecture I

Fall 2009
Duke University

Prof. Daniel Sorin ([email protected])

based on slides developed by Profs. Roth (Penn), Hill, Wood, Sohi, Smith, Lipasti (Wisconsin), and Vijaykumar (Purdue)

Administrivia
• addresses, email, website, etc.
• list of topics
• expected background
• course requirements
• grading and academic misconduct

© 2009 by Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti. ECE 252 / CPS 220 Lecture Notes: Introduction

Instructors
professor: Dan Sorin ([email protected])
• research: fault-tolerant computer architecture, verification-aware microprocessor design, memory systems for multicore processors, plus other topics
• teaching: architecture (152, 252, 259), fault-tolerance (254)
• office: 209C Hudson Hall
• office hours: TBD
TAs: Meng Zhang (mz28@ee) and Jun Pang (pangjun@cs)
• PhD students in computer architecture
• office hours: TBD

Where to Get Answers
Consult course resources in this order:
• Course Website (http://www.ee.duke.edu/~sorin/ece252)
• ECE 252 Google Group
• TAs: Meng Zhang and Jun Pang
• Professor Sorin
Email to TAs and Professor must have a subject that begins with ECE252
• Otherwise I can’t promise it won’t end up in spam folder

What is This Course All About?
State-of-the-art computer hardware design
Topics
• Microarchitecture of single core microprocessors
• Memory system architecture
• Multithreaded processors
• Multicore processors
Fundamentals, current systems, and future systems
Will read from: classic papers, brand-new papers, textbook

Course Goals and Activities
Course Goals
+ Understand how current processors work
+ Understand how to evaluate/compare processors
+ Learn how to use simulator to perform experiments
+ Learn research skills by performing term project
+ Learn how to critically read research papers
Course Activities:
• Will loosely follow textbook
• Major emphasis on cutting-edge issues
• Students will read and discuss many research papers
• Term project


What You Should Expect from Course
Things NOT to expect in this course:
• 100% of class = me lecturing to you
• Homework sets and exams where every question is either quantitative or has a single correct answer
Things to expect in this course:
• Active discussions/arguments about architecture ideas
• Essay questions
• Being asked to explain, discuss, defend, and argue
• Questions with multiple possible answers

What I Expect You to Know Already
Courses you should have taken already
• Basic architecture (ECE 152 / CPS 104 or equivalent)
• Programming in C/C++/Java (our simulator is in C)
• Basic OS (ECE 153 / CPS 110): not critical, but helpful
Topics you should remember fondly - I will not cover these in any detail in this course
• Instruction sets, computer arithmetic, assembly programming, memory, I/O
Topics that will be briefly reviewed but that you should’ve seen before
• Pipelining, caches, virtual memory

Course Components
Reading Materials
• Computer Architecture: A Quantitative Approach by Hennessy and Patterson, 4th Edition
• (optional) Modern Processor Design by Shen and Lipasti
• Recent research papers (on course website)
Homework
• 4 to 6 homework assignments, performed in groups of 2
Term Project
• Groups of 2 or 3
Exams
• Midterm and final exam

Term Project
This is a semester-long research project
• Do not expect to do whole thing in last week, because E[project grade] < B
• I will suggest a bunch of possible project ideas, but many students choose to pursue their own ideas
• Project proposals due TBD
You may “combine” this project with a project from another class, but you MUST consult with me first
You must absolutely, positively reference prior work
• Please ask me if you have ANY questions
• Not knowing != valid excuse


Grading
Grading breakdown
• Homework: 30%
• Midterm: 15%
• Project: 25%
• Final: 30%

Academic Integrity and Late Policy
Academic Misconduct
• University policy will be followed strictly
• Zero tolerance for cheating and/or plagiarism
Late policy
• Late homeworks (except for dean’s excuses)
  • late <1 day = 50% off
  • late >1 day = zero
• No late term project will be accepted (except for dean’s excuses). Period.

A Friendly Warning
This is not an easy class.
Seriously.
If you’re an ECE undergrad: consider programming in C
If you’re a CS grad: consider having to think about circuits
Please see me if you think you might be getting in over your head!
Now, moving on to computer architecture ...

What is Computer Architecture?
The term architecture is used here to describe the attributes of a system as seen by the programmer, i.e., the conceptual structure and functional behavior as distinct from the organization of the dataflow and controls, the logic design, and the physical implementation.
–Gene Amdahl, IBM Journal of R&D, Apr 1964


Architecture and Other Disciplines
Layers, from top to bottom:
• Application Software
• Operating Systems, Compilers, Networking Software
• Computer Architecture
• Circuits, Wires, Devices, Network Hardware
Architecture interacts with many other fields
• Can’t be studied in a vacuum

Levels of Computer Architecture
architecture
• functional appearance to software
• opcodes, addressing modes, architected registers
microarchitecture (= focus of this course)
• logical structure that implements the architecture
• pipelining, functional units, caches, physical registers
realization (circuits)
• physical structure that embodies the implementation
• gates, cells, transistors, wires

The Role of the Microarchitect
architect: defines the hardware/software interface
microarchitect: defines the hardware implementation
• usually the same person as the architect
Two very important questions in this course:
What goals are we (microarchitects!) trying to achieve?
And what units do we use to measure our success?
Hint: how do you decide which computer to buy?
• desktop? laptop? smart phone? mp3 player?
• is a Dell box better/worse than an iMac?

Applications -> Requirements -> Designs
• scientific: weather prediction, molecular modeling
  • need: large memory, floating-point arithmetic
  • examples: CRAY XT4, IBM BlueGene/L
• commercial: inventory, payroll, web serving, e-commerce
  • need: integer arithmetic, high I/O
  • examples: SUN SPARCcenter, Enterprise, AlphaServer GS320
• desktop: multimedia, games, entertainment
  • need: high data bandwidth, graphics
  • examples: Intel Core2 Quad, AMD Opteron QuadCore, IBM Power6
• mobile: laptops, netbooks, tablet PCs
  • need: low power (battery), decent performance
  • examples: Intel Celeron, AMD Turion, Intel Atom
• embedded: cell phones, automobile engines, door knobs
  • need: low power (battery + heat), low cost
  • examples: ARM core, Intel Atom

Why Study Computer Architecture?
answer #1: requirements are always changing
aren’t computers fast enough already?
• are they?
• fast enough to do everything we will EVER want?
  • AI, gaming, virtual reality, gaming, protein sequencing, gaming, ...
• “if you build it, they will come”
is speed the only goal?
• power: heat dissipation + battery life + utility bill
• cost
• reliability
• etc.

Why Study Computer Architecture?
answer #2: technology playing field is always changing
• annual technology improvements (approximate)
  • SRAM (logic): density +25%, speed +20%
  • DRAM (memory): density +60%, speed +4%
  • disk (magnetic): density +25%, speed +4%
  • network interface: 10 Mb/s -> 100 Mb/s -> 1 Gb/s -> 10 Gb/s -> ?
• parameters change and change relative to one another!
• and that’s not even including “exotic” nanotechnologies
• or, for that matter, less exotic technologies like Flash memory
designs change even if requirements fixed
... but requirements are not fixed

Examples of Changing Designs
example I: caches
• 1970: 10K transistors, DRAM faster than logic -> bad idea
• 1990: 1M transistors, logic faster than DRAM -> good idea
• will caches ever be a bad idea again?
example II: out-of-order execution
• 1985: 100K transistors + no precise interrupts -> bad idea
• 1995: 2M transistors + precise interrupts -> good idea
• 2005: 500M transistors + 4GHz clock -> bad idea?
• 2009: >1B transistors + multiple cores -> ???
semiconductor technology is an incredible driving force

Moore’s Law
“Cramming More Components onto Integrated Circuits”
–G.E. Moore, Electronics, 1965
• observation: (DRAM) transistor density doubles annually
  • became known as “Moore’s Law”
  • wrong: density doubles every 18 months (had only 4 data points)
• corollaries
  • cost per transistor halves annually (18 months)
  • power per transistor decreases with scaling
  • speed increases with scaling
  • reliability starting to decrease with scaling


Moore’s Law
“performance doubles every 18 months”
• common interpretation of Moore’s Law, not original intent
• wrong! “performance” used to double every ~2 years
• self-fulfilling prophecy (Moore’s Curve)
  • 2X every 2 years = ~3% increase per month
  • 3% per month used to judge performance features
  • if feature adds 9 months to schedule...
  • ...it should add at least 30% to performance (1.03^9 = 1.30 → 30%)
  • e.g., Intel Itanium: under Moore’s Curve in a big way
performance improvements have slowed down in past few years
• architects haven’t figured out how to use the extra transistors to improve performance of single core without melting the chip -> multicore chips at lower frequencies

Evolution of Single-Chip Processors
                   1971–1980   1981–1990   1991–2000     2010
Transistor Count   10K–100K    100K–1M     1M–100M       4B?
Clock Frequency    0.2–2MHz    2–20MHz     20MHz–1GHz    2GHz?
IPC (per core)     < 0.1       0.1–0.9     0.9–2.0       2.0?
MIPS/MFLOPS        < 0.2       0.2–20      20–2,000      100,000?
Number of cores    1           1           1             64?
some perspective: 1971–2001 performance improved 35,000X!!!
• what if cars improved at this rate?
  • 1971: 60 MPH & 10 MPG, 2001: 2,100,000 MPH & 350,000 MPG
• but... what if cars crashed as often as computers did?
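
The "3% per month" rule of thumb from the Moore’s Curve bullets can be checked with a few lines of arithmetic. The sketch below is only an illustration; the function names `monthly_rate` and `break_even_gain` are made up for this example and do not come from the course materials.

```python
def monthly_rate(doubling_period_months: float) -> float:
    """Compound monthly growth rate implied by doubling every N months."""
    return 2.0 ** (1.0 / doubling_period_months) - 1.0


def break_even_gain(delay_months: float, rate_per_month: float = 0.03) -> float:
    """Gain a feature must deliver to make up for slipping the schedule.

    While the design slips, the rest of the industry keeps compounding at
    rate_per_month, so the feature has to beat that compounded growth.
    """
    return (1.0 + rate_per_month) ** delay_months - 1.0


if __name__ == "__main__":
    # "2X every 2 years" expressed as a per-month rate (roughly 3%).
    print(f"doubling every 24 months: {monthly_rate(24):.1%} per month")
    # A feature that costs 9 months must buy roughly 30% more performance.
    print(f"9-month slip at 3%/month: {break_even_gain(9):.0%} needed")
```

Under this reading, a feature that falls short of the break-even gain leaves the design below Moore’s Curve.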

Performance
Much of the focus of this course is on improving performance
Topics:
• performance metrics
• CPU performance equation
• benchmarks and benchmarking
• reporting averages
• Amdahl’s Law
• Little’s Law
• concepts
  • balance
  • tradeoffs
  • bursty behavior (average and peak performance)

Readings
Hennessy & Patterson
• Chapter 1
R. P. Colwell et al. “Instruction Sets and Beyond: Computers, Complexity, and Controversy.” IEEE Computer, 18(9), 1985.


Performance Metrics
latency: response time, execution time
• good metric for fixed amount of work (minimize time)
throughput: bandwidth, work per time
• = (1 / latency) when there is NO OVERLAP
• > (1 / latency) when there is overlap
  • in real processors, there is always overlap (e.g., pipelining)
• good metric for fixed amount of time (maximize work)
comparing performance
• A is N times faster than B iff
  • perf(A)/perf(B) = time(B)/time(A) = N
• A is X% faster than B iff
  • perf(A)/perf(B) = time(B)/time(A) = 1 + X/100

Performance Metric I: MIPS
MIPS (millions of instructions per second)
• (instruction count / execution time in seconds) x 10^-6
– but instruction count is not a reliable indicator of work
  • Prob #1: work per instruction varies (FP mult >> register move)
  • Prob #2: instruction sets aren’t equal (3 Pentium instrs != 3 Alpha instrs)
– may vary inversely with actual performance
– particularly bad metric for multicore chips
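
The two "comparing performance" definitions are easy to mix up, so here is a minimal sketch of both. The function names and the 10 s / 15 s timings are assumptions chosen for illustration, not data from the course.

```python
def times_faster(time_a: float, time_b: float) -> float:
    """N such that A is N times faster than B: time(B) / time(A)."""
    return time_b / time_a


def percent_faster(time_a: float, time_b: float) -> float:
    """X such that A is X% faster than B: time(B)/time(A) = 1 + X/100."""
    return (times_faster(time_a, time_b) - 1.0) * 100.0


if __name__ == "__main__":
    # Hypothetical measurements: A finishes in 10 s, B in 15 s.
    print(times_faster(10.0, 15.0))    # A is 1.5 times faster than B
    print(percent_faster(10.0, 15.0))  # equivalently, A is 50% faster
```

Note that both are defined from execution times, so "1.5 times faster" and "50% faster" describe the same pair of measurements.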

Performance Metric II: MFLOPS
MFLOPS (millions of floating-point operations per second)
• (FP ops / execution time) x 10^-6
• like MIPS, but counts only FP operations
  • FP ops have longest latencies anyway (problem #1)
  • FP ops are the same across machines (problem #2)
– may have been valid in 1980 (most programs were FP)
  • most programs today are “integer” i.e., light on FP
  • load from memory takes longer than FP divide (prob #1)
  • Cray doesn’t implement divide, Motorola has SQRT, SIN, COS (#2)

CPU Performance Equation
processor performance = seconds / program
• separate into three components (for single core)

seconds / program = (instructions / program) x (cycles / instruction) x (seconds / cycle)

factor                  level                                  designer             course
instructions/program    architecture (ISA)                     compiler-designer    CPS 206
cycles/instruction      implementation (micro-architecture)    processor-designer   ECE 252 / CPS 220
seconds/cycle           realization (physical layout)          circuit-designer     ECE 261


CPU Performance Equation
instructions / program: dynamic instruction count
• mostly determined by program, compiler, ISA
cycles / instruction: CPI
• mostly determined by ISA and CPU/memory organization
seconds / cycle: cycle time, clock time, 1 / clock frequency
• mostly determined by technology and CPU organization
uses of CPU performance equation
• high-level performance comparisons
• back of the envelope calculations
• helping architects think about compilers and technology

CPU Performance Comparison
famous example: “RISC Wars” (RISC vs. CISC)
• assume
  • instructions / program: CISC = P, RISC = 2P
  • CPI: CISC = 8, RISC = 2
  • T = clock period for CISC and RISC (assume they are equal)
• CISC time = P x 8 x T = 8PT
• RISC time = 2P x 2 x T = 4PT
• RISC time = CISC CPU time/2
the truth is much, much, much more complex
• actual data from IBM AS/400 (CISC -> RISC in 1995):
  • CISC time = P x 7 x T = 7PT
  • RISC time = 3.1P x 3 x T/3.1 = 3PT (+1 tech. gen.)
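
The RISC vs. CISC comparison is a direct application of the CPU performance equation. A minimal sketch follows, with P and T normalized to 1 (an assumption made only so the results print as multiples of PT); the helper name `cpu_time` is hypothetical.

```python
def cpu_time(instructions: float, cpi: float, cycle_time: float) -> float:
    """seconds/program = instructions/program x cycles/instruction x seconds/cycle."""
    return instructions * cpi * cycle_time


# Normalize the CISC instruction count P and clock period T to 1.
P, T = 1.0, 1.0

# Textbook "RISC Wars" assumptions from the slide.
cisc_simple = cpu_time(P, 8, T)        # 8PT
risc_simple = cpu_time(2 * P, 2, T)    # 4PT

# IBM AS/400 data from the slide: RISC runs 3.1x the instructions
# at CPI 3, on a clock that is 3.1x faster (one technology generation later).
cisc_as400 = cpu_time(P, 7, T)             # 7PT
risc_as400 = cpu_time(3.1 * P, 3, T / 3.1) # about 3PT

if __name__ == "__main__":
    print("simple model speedup:", cisc_simple / risc_simple)
    print("AS/400 speedup:      ", cisc_as400 / risc_as400)
```

Keeping the three factors as separate arguments makes it easy to see which one a given design change is trading against the others.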

CPU Back-of-the-Envelope Calculation
base machine
• 43% ALU ops (1 cycle), 21% loads (1 cycle), 12% stores (2 cycles), 24% branches (2 cycles)
• note: pretending latency is 1 because of pipelining
Q: should 1-cycle stores be implemented if it slows clock 15%?
• old CPI = 0.43 + 0.21 + (0.12 x 2) + (0.24 x 2) = 1.36
• new CPI = 0.43 + 0.21 + 0.12 + (0.24 x 2) = 1.24
• speedup = (P x 1.36 x T) / (P x 1.24 x 1.15T) = 0.95
Answer: NO!

Actually Measuring Performance
how are execution-time & CPI actually measured?
• execution time: time (Unix cmd): wall-clock, CPU, system
• CPI = (CPU time x clock frequency) / # instructions
• more useful? CPI breakdown (compute, memory stall, etc.)
  • so we know what the performance problems are (what to fix)
measuring CPI breakdown
• hardware event counters (built into core)
  • calculate CPI using instruction frequencies/event costs
• cycle-level microarchitecture simulator (e.g., SimpleScalar)
  + measure exactly what you want
  – model microarchitecture faithfully (at least parts of interest)
• method of choice for many architects (yours, too!)
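
The back-of-the-envelope question about 1-cycle stores can be reproduced as a weighted sum over the instruction mix. This is a sketch using the slide's numbers; the dictionary layout and function names are assumptions made for the example.

```python
# Instruction mix from the slide: class -> (dynamic frequency, cycles).
BASE_MIX = {
    "alu":    (0.43, 1),
    "load":   (0.21, 1),
    "store":  (0.12, 2),
    "branch": (0.24, 2),
}


def cpi(mix: dict) -> float:
    """Average cycles per instruction: sum of frequency x cycles per class."""
    return sum(freq * cycles for freq, cycles in mix.values())


def speedup(old_cpi: float, new_cpi: float, clock_period_ratio: float) -> float:
    """Old time / new time for the same instruction count.

    clock_period_ratio is new clock period / old clock period
    (1.15 means the clock got 15% slower).
    """
    return old_cpi / (new_cpi * clock_period_ratio)


# Proposed change: stores take 1 cycle instead of 2.
FAST_STORE_MIX = dict(BASE_MIX, store=(0.12, 1))

if __name__ == "__main__":
    old, new = cpi(BASE_MIX), cpi(FAST_STORE_MIX)
    print(f"old CPI = {old:.2f}, new CPI = {new:.2f}")
    print(f"speedup = {speedup(old, new, 1.15):.2f}")  # below 1 means slower
```

A speedup below 1 confirms the slide's answer: the CPI improvement does not pay for the slower clock.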


Benchmarks and Benchmarking
“program” as unit of work
• millions of them, many different kinds, which to use?
benchmarks
• standard programs for measuring/comparing performance
  + represent programs people care about
  + repeatable!!
• benchmarking process
  • define workload
  • extract benchmarks from workload
  • execute benchmarks on candidate machines
  • project performance on new machine
  • run workload on new machine and compare
  • not close enough -> repeat

Let’s Choose Some Benchmarks!
What benchmarks would you put in your benchmark suite?

Benchmarks: Toys, Kernels, Synthetics
toy benchmarks: little programs that no one really runs
• e.g., fibonacci, 8 queens
– little value, what real programs do these represent?
• scary fact: used to prove the value of RISC in early 80’s
kernels: important (frequently executed) pieces of real programs
• e.g., Livermore loops, Linpack (inner product)
+ good for focusing on individual features, but not big picture
– over-emphasize target feature (for better or worse)
synthetic benchmarks: programs made up for benchmarking
• e.g., Whetstone, Dhrystone
• toy kernels++, which programs do these represent?

Benchmarks: Real Programs
real programs
+ only accurate way to characterize performance
– requires considerable work (porting)
Standard Performance Evaluation Corporation (SPEC)
• http://www.spec.org
• collects, standardizes and distributes benchmark suites
• consortium made up of industry leaders
• SPEC CPU (CPU intensive benchmarks)
  • SPEC89, SPEC92, SPEC95, SPEC2000, SPEC2006
• other benchmark suites
  • SPECjvm, SPECmail, SPECweb, SPEComp
Other benchmark suite examples: TPC-C, TPC-H for databases


SPEC CPU2006
12 integer programs (C, C++)
• gcc (compiler), perl (interpreter), hmmer (markov chain)
• bzip2 (compress), go (AI), sjeng (AI)
• libquantum (physics), h264ref (video)
• omnetpp (simulation), astar (path finding algs)
• xalanc (XML processing), mcf (network optimization)
17 floating point programs (C, C++, Fortran)
• fluid dynamics: bwaves, leslie3d, lbm
• quantum chemistry: gamess, tonto
• physics: milc, zeusmp, cactusADM
• gromacs (biochem)
• namd (bio, molec dynamics), dealII (finite element analysis)
• soplex (linear programming), povray (ray tracing)
• calculix (mechanics), GemsFDTD (computational E&M)
• wrf (weather), sphinx3 (speech recognition)

Benchmarking Pitfalls
• benchmark properties mismatch with features studied
  • e.g., using SPEC for large cache studies
• careless scaling
  • using only first few million instructions (initialization phase)
  • reducing program data size
• choosing performance from wrong application space
  • e.g., in a realtime environment, choosing gcc
• using old benchmarks
  • “benchmark specials”: benchmark-specific optimizations
Benchmarks must be continuously maintained and updated!

Reporting Average Performance
averages: one of the things architects frequently get wrong
+ pay attention now and you won’t get them wrong
important things about averages (i.e., means)
• ideally proportional to execution time (ultimate metric)
  • Arithmetic Mean (AM) for times
  • Harmonic Mean (HM) for rates (IPCs)
  • Geometric Mean (GM) for ratios (speedups)
• there is no such thing as the average program
• use an average only when absolutely necessary

What Does The Mean Mean?
arithmetic mean (AM): average execution times of N programs
• (∑1..N time(i)) / N
harmonic mean (HM): average IPCs of N programs
• arithmetic mean cannot be used for rates (e.g., IPCs)
  • 30 MPH for 1 mile + 90 MPH for 1 mile != avg. 60 MPH (it is 45 MPH)
• N / (∑1..N 1 / rate(i))
geometric mean (GM): average speedups of N programs
• (∏1..N speedup(i))^(1/N)
what if programs run at different frequencies within workload?
• “weighting”
• weighted AM = (∑1..N w(i) * time(i)) / (∑1..N w(i))
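
The three means can be tried out with Python's standard `statistics` module. The sketch below reuses the 30 MPH / 90 MPH example from the slide; the speedup and weight values are made-up illustrations, and `weighted_mean` normalizes by the sum of the weights, which is an assumption of this sketch about how weights are scaled.

```python
from statistics import mean, harmonic_mean, geometric_mean


def weighted_mean(values, weights):
    """Weighted arithmetic mean, normalized by the sum of the weights."""
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


if __name__ == "__main__":
    # Rates: one mile at 30 MPH, then one mile at 90 MPH.
    speeds = [30.0, 90.0]
    print("AM of rates (misleading):", mean(speeds))
    print("HM of rates (true average speed):", harmonic_mean(speeds))

    # Ratios: hypothetical speedups of 2x and 8x on two programs.
    print("GM of speedups:", geometric_mean([2.0, 8.0]))

    # Times: hypothetical program A (10 s) runs 3x as often as B (20 s).
    print("weighted AM of times:", weighted_mean([10.0, 20.0], [3, 1]))
```

The harmonic mean matches total distance divided by total time, which is why it is the right average for rates such as IPC.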


GM Weirdness
what about averaging ratios (speedups)?
• HM / AM change depending on which machine is the base

            machine A   machine B   B/A                      A/B
Program1    1           10          10                       0.1
Program2    1000        100         0.1                      10
AM                                  (10+.1)/2 = 5.05         (.1+10)/2 = 5.05
                                    B is 5.05 times faster!  A is 5.05 times faster!
HM                                  2/(1/10+1/.1) = 0.198    2/(1/.1+1/10) = 0.198
GM                                  √(10*.1) = 1             √(.1*10) = 1

• HM gives 0.198 (= 1/5.05) in both columns, so its verdict also flips with the base machine
– geometric mean of ratios is not proportional to total time!
• if we take total execution time, B is 9.1 times faster
• GM says they are equal

Amdahl’s Law
“Validity of the Single-Processor Approach to Achieving Large-Scale Computing Capabilities” –G. Amdahl, AFIPS, 1967
• let optimization speed up fraction f of program by factor s
• speedup = old / ([(1-f) x old] + f/s x old) = 1 / (1 - f + f/s)
• f = 95%, s = 1.1 → 1/[(1-0.95) + (0.95/1.1)] = 1.094
• f = 5%, s = 10 → 1/[(1-0.05) + (0.05/10)] = 1.047
• f = 5%, s = ∞ → 1/[(1-0.05) + (0.05/∞)] = 1.053
• f = 95%, s = ∞ → 1/[(1-0.95) + (0.95/∞)] = 20
make common case fast, but...
...uncommon case eventually limits performance
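
Both halves of this slide pair can be checked numerically. The sketch below evaluates Amdahl's Law for the four (f, s) cases and recomputes the means of the B/A ratios from the GM Weirdness table. The function name `amdahl_speedup` is an assumption for this example; the mean functions come from Python's standard `statistics` module.

```python
import math
from statistics import mean, harmonic_mean, geometric_mean


def amdahl_speedup(f: float, s: float) -> float:
    """Overall speedup when fraction f of the program is sped up by factor s."""
    return 1.0 / ((1.0 - f) + f / s)


# Execution times from the GM Weirdness table (program 1, program 2).
TIMES_A = [1.0, 1000.0]
TIMES_B = [10.0, 100.0]
B_OVER_A = [b / a for a, b in zip(TIMES_A, TIMES_B)]  # [10.0, 0.1]

if __name__ == "__main__":
    for f, s in [(0.95, 1.1), (0.05, 10.0), (0.05, math.inf), (0.95, math.inf)]:
        print(f"f={f}, s={s}: speedup = {amdahl_speedup(f, s):.3f}")

    print("AM of B/A:", mean(B_OVER_A))
    print("HM of B/A:", harmonic_mean(B_OVER_A))
    print("GM of B/A:", geometric_mean(B_OVER_A))
    print("total time A / total time B:", sum(TIMES_A) / sum(TIMES_B))
```

Passing `math.inf` for s works because f / inf evaluates to 0.0, which leaves only the unoptimized fraction in the denominator.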

Little’s Law
Key Relationship between latency and bandwidth:
Average number in system = arrival rate * mean holding time
Possibly the most useful equation I know
• Useful in design of computers, software, industrial processes, etc.
Example:
• How big of a wine cellar should we build?
• We drink (and buy) an average of 2 bottles per week
• On average, we want to age the wine for 5 years
• bottles in cellar = 2 bottles/week * 52 weeks/year * 5 years
• = 520 bottles

System Balance
each system component produces & consumes data
• make sure data supply and demand is balanced
• X demand >= X supply ⇒ computation is “X-bound”
  • e.g., memory bound, CPU-bound, I/O-bound
• goal: be bound everywhere at once (why?)
• X can be bandwidth or latency
  • X is bandwidth ⇒ buy more bandwidth
  • X is latency ⇒ much tougher problem
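
Little's Law is a one-line formula, but it helps to see the units line up. The sketch below reproduces the wine cellar example and then applies the same formula to a hypothetical memory system; the 0.1 requests/cycle and 200-cycle latency figures are invented for illustration and are not from the course.

```python
def littles_law(arrival_rate: float, holding_time: float) -> float:
    """Average number in system = arrival rate x mean holding time.

    The two arguments must use the same time unit (per year and years,
    per cycle and cycles, and so on).
    """
    return arrival_rate * holding_time


if __name__ == "__main__":
    # Wine cellar: 2 bottles/week * 52 weeks/year, aged for 5 years.
    bottles_per_year = 2 * 52
    print("bottles in cellar:", littles_law(bottles_per_year, 5))

    # Hypothetical memory system: 0.1 requests issued per cycle,
    # each outstanding for 200 cycles on average.
    print("requests in flight:", littles_law(0.1, 200))
```

In the memory example, the result is the number of outstanding requests the hardware must be able to track to sustain that bandwidth at that latency.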


Tradeoffs
“Bandwidth problems can be solved with money. Latency problems are harder, because the speed of light is fixed and you can’t bribe God” – David Clark
well...
• can convert some latency problems to bandwidth problems
• solve those with money
• the famous “bandwidth/latency tradeoff”
• architecture is the art of making tradeoffs

Bursty Behavior
Q: to sustain 2 IPC... how many instructions should processor be able to
• fetch per cycle?
• execute per cycle?
• complete per cycle?
A: NOT 2 (more than 2)
• dependences will cause stalls (under-utilization)
• if desired performance is X, peak performance must be > X
programs don’t always obey “average” behavior
• can’t design processor only to handle average behavior

Performance in the Real World
A paper comparing performance of RISC vs. CISC and trying to show that RISC is not obviously better
• “Instruction Sets and Beyond: Computers, Complexity, and Controversy” by Colwell et al., IEEE Computer 1985.

Roadmap for Rest of Semester
Primary topics for rest of course
• Pipelined processors
• Multiple-issue (superscalar), in-order processors
• Hardware managed out-of-order instruction execution
• Static (compiler) instruction scheduling, VLIW, EPIC
• Advanced cache/memory issues
• Multithreaded processors
• Intro to multicore chips and multi-chip multiprocessors
Advanced topics
• Power-efficiency, fault tolerance, security, virtual machines, grid processors, nanocomputing


Topics NOT covered in this course
These topics have been well-covered in your prior courses
• Digital logic design
• Computer arithmetic
• Instruction sets
• Cache/memory basics, including virtual memory
• I/O (disks, etc.)
If you’re uncomfortable with these topics, please see me

Other Courses I’d Recommend
Topics related to this course (non-exhaustive list!)
• Advanced comp. architecture II (ECE 259/CPS 221)
• VLSI design (ECE 261)
• Fault tolerant computing (ECE 254/CPS 225)
• Performance/reliability analysis (ECE 255/257)
• Advanced digital system design (ECE 251)
• Operating systems (CPS 210)
• Compilers (offered at UNC or NC State)


No ratings yet
Bits and Bytes
11 pages
CH01 Intro
No ratings yet
CH01 Intro
34 pages
Lecture 1 - Introduction To Microcontroller Systems
No ratings yet
Lecture 1 - Introduction To Microcontroller Systems
17 pages
2 Isa
No ratings yet
2 Isa
50 pages
Lecture 1 - Introduction
No ratings yet
Lecture 1 - Introduction
26 pages
Course Handout CS2203 - 2022
No ratings yet
Course Handout CS2203 - 2022
7 pages
MTECH SEM 2 SYLLABUS-pages-deleted
No ratings yet
MTECH SEM 2 SYLLABUS-pages-deleted
34 pages
Introduction To CAO
No ratings yet
Introduction To CAO
6 pages
Course Title: Credit Value: 4 VENUE FOR LECTURE: University of Buea Venue For Tutorial: Same NAME (S) OF COURSE LECTRURER (S) : Nyanga Bernard Y
No ratings yet
Course Title: Credit Value: 4 VENUE FOR LECTURE: University of Buea Venue For Tutorial: Same NAME (S) OF COURSE LECTRURER (S) : Nyanga Bernard Y
3 pages
Computer Craft Coursebook 5
From Everand
Computer Craft Coursebook 5
Sarala Devi Dhanapal
No ratings yet
Computer Craft Coursebook 3
From Everand
Computer Craft Coursebook 3
Sarala Devi Dhanapal
No ratings yet
Computer Craft Coursebook 4
From Everand
Computer Craft Coursebook 4
Sarala Devi Dhanapal
No ratings yet
CS 211: Computer Architecture Cache Memory Design
No ratings yet
CS 211: Computer Architecture Cache Memory Design
32 pages
Design Rules, Layout and Stick Diagram
No ratings yet
Design Rules, Layout and Stick Diagram
69 pages
MultiProcessors Tanenbaum BP
No ratings yet
MultiProcessors Tanenbaum BP
29 pages
Mealy and Moore Sequential Circuits
No ratings yet
Mealy and Moore Sequential Circuits
12 pages
Introduction To Automobile Systems
No ratings yet
Introduction To Automobile Systems
18 pages
2: Transistors, Fabrication, Layout
No ratings yet
2: Transistors, Fabrication, Layout
44 pages
Optimum Part Deposition Orientation in Fused Deposition Modeling
No ratings yet
Optimum Part Deposition Orientation in Fused Deposition Modeling
10 pages
MATLAB-Based EM Notaros Contents Preface
0% (1)
MATLAB-Based EM Notaros Contents Preface
16 pages
Labsheet1 Updated
No ratings yet
Labsheet1 Updated
11 pages
Lecture 05
No ratings yet
Lecture 05
27 pages
Axpert EX-MEX-manual
No ratings yet
Axpert EX-MEX-manual
31 pages
LFTC Flanged Bearing Dimensions.
No ratings yet
LFTC Flanged Bearing Dimensions.
60 pages
Trip Curve IEC-UIT-10PU: Ultra Inverse Time 3000TC
No ratings yet
Trip Curve IEC-UIT-10PU: Ultra Inverse Time 3000TC
1 page
Practice Test 2
No ratings yet
Practice Test 2
7 pages
22-Electric-Fields-NT-Answers
No ratings yet
22-Electric-Fields-NT-Answers
5 pages
Skewness and Kurtosis
No ratings yet
Skewness and Kurtosis
21 pages
Cavitec GMA Recomend
No ratings yet
Cavitec GMA Recomend
2 pages
ANSYS Mechanical Advances (Using Command Objects) : APDL Commands
No ratings yet
ANSYS Mechanical Advances (Using Command Objects) : APDL Commands
37 pages
Kannur University Bca Programme: For I C
No ratings yet
Kannur University Bca Programme: For I C
44 pages
Enig M Acoustics Dharma Production 2015
No ratings yet
Enig M Acoustics Dharma Production 2015
1 page
Icecce49384 2020 9179470
No ratings yet
Icecce49384 2020 9179470
5 pages
MA1001 Dynamics
No ratings yet
MA1001 Dynamics
5 pages
BITSAT 2025 mathango
No ratings yet
BITSAT 2025 mathango
10 pages
UNIT 3 Bhs Inggris Teknik Mesin Unimed
No ratings yet
UNIT 3 Bhs Inggris Teknik Mesin Unimed
19 pages
4042-MI20-00DS-0009-001 R04 Available Utilities Data Sheet
No ratings yet
4042-MI20-00DS-0009-001 R04 Available Utilities Data Sheet
7 pages
P3T32 Manual PDF
No ratings yet
P3T32 Manual PDF
350 pages
AutoCAD Shortcu-WPS Office
No ratings yet
AutoCAD Shortcu-WPS Office
19 pages
2 Fundamental Solid-State Principles
No ratings yet
2 Fundamental Solid-State Principles
32 pages
f1 Endterm 2 Topmark Exams
No ratings yet
f1 Endterm 2 Topmark Exams
110 pages
Worksheet 2.1: Tangents To A Circle and Their Properties
No ratings yet
Worksheet 2.1: Tangents To A Circle and Their Properties
43 pages
Fundamental Characteristics of Arc Extinction by Magnetic Blow Out at DC Voltages
No ratings yet
Fundamental Characteristics of Arc Extinction by Magnetic Blow Out at DC Voltages
6 pages
ESS Design and Installation Manual-En
No ratings yet
ESS Design and Installation Manual-En
32 pages
Wave Optics
No ratings yet
Wave Optics
12 pages
Ehrenkranz 1999
No ratings yet
Ehrenkranz 1999
12 pages
21 To 25 Jan - Garagexpo
No ratings yet
21 To 25 Jan - Garagexpo
60 pages
testbank-chapter-3
No ratings yet
testbank-chapter-3
5 pages



