L01_Intro
• Course logistics
Course Resources
Textbooks (none required)
Patterson & Hennessy, Computer Organization and Design, 5th edition
The SystemVerilog Labs
• “Build your own processor” (pipelined 32-bit RISC-V CPU)
• Use SystemVerilog HDL (hardware description language)
A programming language that compiles to gates and wires, not instructions
• Implement and test on real hardware
FPGA (field-programmable gate array)
+ Instructive: learn by doing
+ Satisfying: “look, I built my own processor”
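To make "compiles to gates and wires, not instructions" concrete, here is a minimal sketch that models a 1-bit full adder as a small network of XOR, AND, and OR gates and chains several of them into a ripple-carry adder. It is only a software model written in Python for illustration; the labs themselves use SystemVerilog, and the names `full_adder` and `ripple_add` are hypothetical.

```python
def full_adder(a, b, cin):
    """1-bit full adder built only from gate operations (XOR, AND, OR)."""
    p = a ^ b                    # XOR gate
    s = p ^ cin                  # XOR gate: sum bit
    cout = (a & b) | (p & cin)   # two AND gates feeding an OR gate
    return s, cout


def ripple_add(x, y, width=4):
    """Chain `width` full adders; each carry-out wire feeds the next carry-in."""
    carry = 0
    result = 0
    for i in range(width):
        a = (x >> i) & 1
        b = (y >> i) & 1
        s, carry = full_adder(a, b, carry)
        result |= s << i
    return result, carry


if __name__ == "__main__":
    print(ripple_add(5, 6))   # (11, 0)
    print(ripple_add(9, 8))   # (1, 1): 17 overflows 4 bits, so carry-out is 1
```

In hardware every gate in the chain exists at once and evaluates continuously; the Python loop merely walks the wires in order.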
Lab Logistics
Tools
Quartus/Xilinx Vivado hardware compiler
DE-2/DE-10 FPGA boards
Logistics
Development and simulation can be done before final testing on the board
Labs
Labs done in groups of two or three
start forming groups now, be mindful of Pass/Fail
Lab descriptions
Lab 1: SystemVerilog debugging
Lab 2: Arithmetic unit
Lab 3: Single-cycle RISC-V & Register File
Lab 4: Pipelined RISC-V
Lab 5: Pipelined+ RISC-V
http://debuggingrules.com
Microprocessor vs Computer Architecture
In EE2039 you learned how a processor works; in EE3043 we
will tell you how to make it work well.
• “Computer Architecture”
– functional spec for software and programmers
– design spec for the hardware people
• Computer Organization
– take architecture to “micro”architecture
– how to assemble/evaluate/tune
• Computation Structures
– digital representations
– processing, storage and I/O elements
EE1009
18‐447‐S21‐L01‐S10, James C. Hoe, CMU/ECE/CALCM,
©2021
Abstraction, Layering, and Computers
[Figure: abstraction layers, top to bottom]
Software: applications (App, App, App) over system software
ISA
Hardware: Mem, CPU, I/O over transistors
What is a Computer?
• Computer, 2. a. A calculating‐machine; esp. an
automatic electronic device for performing
mathematical or logical operations; freq. with
defining word prefixed, as analogue, digital,
electronic computer.
‐‐‐ Oxford English Dictionary, circa 2000
[Figure: block diagram of a computer. Labels: processing (control/sequencing, datapath), storage (program and data), I/O; two CPUs (each with ALU, RF, cache), memory “bus”, state, logic]
Problem
Algorithm
Program/Language
System Software
SW/HW Interface
(figure brackets on this stack: Computer Architecture, expanded view and narrow view)
Micro-architecture
Logic
Devices
Electrons
What We Will Cover (I)
Combinational Logic Design
What We Will Cover (II)
Microarchitecture Fundamentals
Single-cycle Microarchitectures
Pipelining
Branch Prediction
Out-of-Order Execution
Superscalar Execution
Caches
Prefetching
Processing Paradigms We Will Cover
Pipelining
Out-of-order execution
Dataflow (at the ISA level)
Superscalar Execution
VLIW
Decoupled Access-Execute
Systolic Arrays
SIMD Processing (Vector & Array, GPUs)
Computer Architecture is Engineering
• An applied discipline of finding and optimizing
solutions under the joint constraints
of demand, technology, economics,
and ethics
• Thus, instances of what we practice
evolve continuously
• Need to learn the principles
that govern how to develop
solutions to meet constraints
• Don’t memorize instances;
understand why it is that way
Historical Perspectives:
prelude to modern computer architecture
Penn Legacy
• ENIAC: Electronic Numerical Integrator and Computer
• First operational general-purpose electronic computer
• Designed and built here by Eckert and Mauchly
• See it in Moore 100!
Computer Architecture
• Computer architecture
• Definition of ISA to facilitate implementation of software layers
• The hardware/software interface
• Computer micro-architecture
• Design processor, memory, I/O to implement ISA
• Efficiently implementing the interface
Application Specific Designs
• This class is about general-purpose CPUs
• Processor that can do anything, run a full OS, etc.
• E.g., Intel Atom/Core/Xeon, AMD Ryzen/EPYC, ARM M/A series
Forces on Innovation
• 64‐bit processor
• 1.7 billion transistors
• 1.7 GHz, issue up to 8
instructions per cycle
• 26 MByte of cache!!
[http://www.intel.com/research/silicon/mooreslaw.htm]
Original article at http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=658762
The “Other” Moore’s Law
[Figure: trend plotted against technology node “label” (14, 10, 7, 5, 3.5, 2.5, 1.8, ??)]
[Figure: technology-normalized power (Watt) versus technology-normalized performance (op/sec), with points for the 486 and the Pentium 4; Power ≈ Perf^1.75]
Plot annotation: better to replace 1 of this by 2 of these, or N of these
Source: Grochowski et al., “Energy per Instruction Trends in Intel® Microprocessors,” 2006
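The "replace 1 of this by 2 of these" remark follows from the Power ≈ Perf^1.75 relation on the plot. The sketch below works through that arithmetic under two assumptions that are not in the slides: the relation holds exactly for every core size, and the workload parallelizes perfectly across cores. The function names are hypothetical.

```python
ALPHA = 1.75  # exponent from the slide: Power ≈ Perf^1.75


def core_power(perf):
    """Technology-normalized power of one core with normalized performance `perf`."""
    return perf ** ALPHA


def split_same_power(n):
    """Replace one core (perf 1, power 1) by n equal cores with the same total power.

    Returns (per-core perf, aggregate perf), assuming the workload
    parallelizes perfectly across the n cores (an idealization).
    """
    per_core_power = 1.0 / n
    per_core_perf = per_core_power ** (1.0 / ALPHA)
    return per_core_perf, n * per_core_perf


if __name__ == "__main__":
    for n in (1, 2, 4):
        per_core, total = split_same_power(n)
        print(f"{n} core(s): per-core perf {per_core:.2f}, aggregate perf {total:.2f}")
```

Under these assumptions, two smaller cores within the same power budget deliver about 1.35x the aggregate throughput of the single large core, even though each one is slower on its own.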
Moore’s Law Scaling with Cores
Technology that Drives
• Basic element
• Solid-state transistor (i.e., electrical switch)
• Building block of integrated circuits (ICs)
[Figure: transistor with gate, source, drain, and channel]
• What’s so great about ICs? Everything
+ High performance, high reliability, low cost, low power
+ Lever of mass production
Today:
2^35 transistors
Moore’s Law today
First Microprocessor
• Intel 4004 (1971)
• Application: calculators
• Technology: 10,000 nm
• 2300 transistors
• 13 mm²
• 740 kHz
• 12 Volts
• 4-bit data
• Single-cycle datapath
Revolution II: Implicit Parallelism
• Then to extract implicit instruction-level parallelism
• Hardware provides parallel resources, figures out how to use them
• Software is oblivious
• 55M transistors
• 101 mm²
• 3.4 GHz
• 1.2 Volts
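As a rough illustration of what "hardware figures out how to use them" means, the sketch below computes the earliest cycle at which each instruction in a short sequence could issue if only true data dependences constrained it. The instruction format, register names, unit latency, and unlimited execution resources are all simplifying assumptions made here, not details of any real processor.

```python
def dataflow_schedule(program):
    """Earliest issue cycle per instruction, limited only by true (RAW) dependences.

    Each instruction is (dest, src1, src2) using hypothetical register names.
    Assumes unit latency and unlimited execution resources.
    """
    ready = {}                     # register -> cycle its latest value is available
    cycles = []
    for dest, src1, src2 in program:
        start = max(ready.get(src1, 0), ready.get(src2, 0))
        cycles.append(start)
        ready[dest] = start + 1    # result is usable one cycle later
    return cycles


if __name__ == "__main__":
    program = [
        ("r1", "r8", "r9"),     # independent
        ("r2", "r10", "r11"),   # independent
        ("r3", "r1", "r2"),     # needs r1 and r2
        ("r4", "r12", "r13"),   # independent
        ("r5", "r3", "r4"),     # needs r3 and r4
    ]
    cycles = dataflow_schedule(program)
    print(cycles)                             # [0, 0, 1, 0, 2]
    print(len(program) / (max(cycles) + 1))   # average instructions per cycle
```

The program is written sequentially, yet three of its five instructions can start in cycle 0. The software never says so; the parallelism is implicit in the dependences.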
Revolution III: Explicit Parallelism
• Then to support explicit data & thread level parallelism
• Hardware provides parallel resources, software specifies usage
• Why? diminishing returns on instruction-level-parallelism
Modern Multicore Processor
• AMD EPYC 7H12
• Application: server
• Technology: 7nm
• 39.5B transistors
• 1008 mm²
• 2.6 to 3.3 GHz
• 256-bit data (2x)
• 19-stage pipelined datapath
• 4 instructions per cycle
• 292MB of on-chip cache
• data-parallel vector (SIMD) instructions, simultaneous multithreading (SMT)
• 64-core multicore
Historical Microprocessor Evolution
Feature           Intel 4004   Intel Pentium 4 Prescott   AMD EPYC Rome
release date      1971         2004                       2019
transistor size   10,000 nm    90 nm                      7 nm, 14 nm
transistor count  2,300        125M                       39.5B
area              13 mm²       112 mm²                    1008 mm²
frequency         740 kHz      3.8 GHz                    2.6-3.3 GHz
data width        4-bit        64-bit                     256-bit
pipeline stages   n/a          31                         19
pipeline width    n/a          3                          4
core count        1            1                          64
on-chip cache     n/a          1MB                        292MB
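The transistor counts in the table give a quick sanity check on Moore's Law. The sketch below estimates the average doubling time between the Intel 4004 and AMD EPYC Rome, assuming steady exponential growth between those two data points; the function name is hypothetical. The two parts are not directly comparable (the table lists two process nodes for EPYC), so treat the result as a rough trend only.

```python
import math


def doubling_time_years(count_start, count_end, year_start, year_end):
    """Average time for the count to double, assuming steady exponential growth."""
    doublings = math.log2(count_end / count_start)
    return (year_end - year_start) / doublings


if __name__ == "__main__":
    # Figures taken from the table: Intel 4004 (1971) and AMD EPYC Rome (2019).
    t = doubling_time_years(2_300, 39.5e9, 1971, 2019)
    print(f"transistor count doubled about every {t:.1f} years")
```

With these numbers the count doubles roughly every two years, in line with the usual statement of Moore's Law.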
Revolution IV: Accelerators
• Combining multiple kinds of compute engines in one die
• not just a homogeneous collection of cores
• System-on-Chip (SoC) is one common example in mobile space
c/o Qualcomm
Zedboard SoC
Cerebras Wafer-Scale Engine
• giant 8.5” square chip!
• full of deep learning
accelerators
• 18GB on-chip memory
• 9 PB/sec on-chip memory
bandwidth
• TSMC 16nm transistors
• ~$2.5M each?
Technology Disruptions
• Classic examples:
• transistor
• microprocessor
• More recent examples:
• flash-based solid-state storage
• shift to accelerators
• Nascent disruptive technologies:
• non-volatile memory (“disks” as fast as DRAM)
• Chip stacking (also called “3D die stacking”)
• The end of Moore’s Law
• “If something can’t go on forever, it must stop eventually”
• Transistor speed/energy efficiency not improving like before
“Golden Age of Computer Architecture”
• Hennessy & Patterson, 2018 Turing Laureates
• the end of Dennard Scaling & Moore’s Law means no more
free performance
• “The next decade will see a Cambrian explosion of novel computer
architectures”
Themes
• Parallelism
• enhance system performance by doing multiple things at once
• instruction-level parallelism, multicore, GPUs, accelerators
• Caching
• exploiting locality of reference: storage hierarchies
• try to provide the illusion of a single large, fast memory
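To see why locality of reference matters for caching, the sketch below simulates a small direct-mapped cache and compares the hit rate of a sequential walk with that of a large-stride walk. The cache geometry (64 lines of 64 bytes) and the access patterns are illustrative assumptions, not values from the slides, and `hit_rate` is a hypothetical helper.

```python
def hit_rate(addresses, num_lines=64, block_bytes=64):
    """Simulate a direct-mapped cache and return the fraction of hits.

    num_lines and block_bytes are illustrative values, not taken from the slides.
    """
    tags = [None] * num_lines          # one stored tag per cache line
    hits = 0
    for addr in addresses:
        block = addr // block_bytes    # which memory block holds this byte
        index = block % num_lines      # cache line the block maps to
        tag = block // num_lines       # identifies the block within that line
        if tags[index] == tag:
            hits += 1
        else:
            tags[index] = tag          # miss: fetch the block, evict the old one
    return hits / len(addresses)


if __name__ == "__main__":
    n = 4096
    sequential = [4 * i for i in range(n)]    # walk 4-byte words in order
    strided = [4096 * i for i in range(n)]    # jump 4 KiB between accesses
    print(hit_rate(sequential))   # 0.9375: 15 of every 16 words hit
    print(hit_rate(strided))      # 0.0: every access touches a new block
```

The sequential walk benefits from spatial locality because each fetched block serves the next 15 accesses. The strided walk gets no reuse, so the cache cannot maintain the illusion of a fast memory.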