L1-intro
Computer Architecture
Winter 2015
Introduction
What is a computer?
The Computer Revolution
Classes of Computers
Personal computers
General purpose, variety of software
Subject to cost/performance tradeoff
Server computers
Network based
High capacity, performance, reliability
Range from small servers to building sized
Supercomputers
High-end scientific and engineering calculations
Highest capability but represent a small fraction of the overall computer market
Embedded computers
Hidden as components of systems
Stringent power/performance/cost constraints
The PostPC Era
Cloud computing
Warehouse Scale Computers (WSC)
Software as a Service (SaaS)
A portion of the software runs on a PMD (personal mobile device) and a portion runs in the cloud
e.g., Amazon and Google
Components of a Computer
[Figure: the five classic components of a computer: the processor (control and datapath), memory, and input and output devices]
Application software
Written in high-level language
System software
Compiler: translates HLL code to machine code
Operating System: service code
- Handling input/output
- Managing memory and storage
- Scheduling tasks & sharing resources
Hardware
Processor, memory, I/O controllers
Levels of Program Code
High-level language
Level of abstraction closer to problem domain
Provides for productivity and portability
Assembly language
Textual representation of instructions
Hardware representation
Binary digits (bits)
Encoded instructions and data
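A minimal sketch of these three levels for a single addition; the MIPS register assignment and encoding in the comments are illustrative assumptions, not actual compiler output:

#include <stdio.h>

int main(void) {
    int b = 2, c = 3;

    /* High-level language: one statement, close to the problem domain */
    int a = b + c;

    /* Assembly language (MIPS, assumed register assignment):
     *     add $s0, $s1, $s2      # a = b + c
     * Hardware representation, the same instruction as 32 bits (R-format):
     *     000000 10001 10010 10000 00000 100000
     */
    printf("a = %d\n", a);
    return 0;
}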
Old School Machine Structures
(Layers of Abstraction)
Software (e.g., Mac OSX):
- Operating System
- Compiler
- Assembler
Instruction Set Architecture
Hardware:
- Processor, Memory, I/O system
- Datapath & Control
- Digital Design
- Circuit Design
- Transistors
New-School Machine Structures
Harness parallelism to achieve high performance, from software requests down to hardware gates:
Parallel requests
Assigned to a computer, e.g., search "Katz" (handled by a Warehouse Scale Computer)
Parallel threads
Assigned to a core, e.g., lookup, ads
Parallel instructions
>1 instruction @ one time, e.g., 5 pipelined instructions
Parallel data
>1 data item @ one time, e.g., add of 4 pairs of words (A0+B0, A1+B1, A2+B2, A3+B3)
Hardware descriptions
All gates working in parallel at the same time
[Figure: hardware at each level, from warehouse scale computers and smart phones, to a computer with multiple cores, memory (cache), and input/output, to a core with instruction unit(s) and functional unit(s), down to main memory and logic gates]
Why have computers become so complicated? Pursuing performance!
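A minimal sketch of the "add of 4 pairs of words" above; whether this loop actually becomes a single SIMD instruction depends on the compiler and flags (auto-vectorization at -O3 is an assumption here, not something the slide guarantees):

#include <stdio.h>

int main(void) {
    /* Four independent word additions: A0+B0, A1+B1, A2+B2, A3+B3.
     * A vectorizing compiler may execute them as one data-parallel add. */
    int a[4] = {1, 2, 3, 4};
    int b[4] = {10, 20, 30, 40};
    int c[4];

    for (int i = 0; i < 4; i++)
        c[i] = a[i] + b[i];

    for (int i = 0; i < 4; i++)
        printf("%d ", c[i]);
    printf("\n");
    return 0;
}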
Eight Great Ideas
Design for Moore’s Law
Use abstraction to simplify design
Make the common case fast
Performance via parallelism
Performance via pipelining
Performance via prediction
Hierarchy of memories
Dependability via redundancy
Understanding Performance
Algorithm
Determines number of operations executed
Programming language, compiler, architecture
Determine number of machine instructions executed per operation
Processor and memory system
Determine how fast instructions are executed
I/O system (including OS)
Determines how fast I/O operations are executed
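These factors combine in the classic CPU-time equation (treated in detail later under Performance Metrics I):

CPU time = Instructions/Program × Clock cycles/Instruction × Seconds/Clock cycle
         = Instruction count × CPI × Clock cycle time

The algorithm and compiler mainly affect the instruction count, the ISA and hardware affect CPI, and the processor design and technology set the clock cycle time.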
What You Will Learn
How programs are translated into the machine language
And how the hardware executes them
The hardware/software interface
What determines program performance
And how it can be improved
How hardware designers improve performance
What is parallel processing
Course Topics
1. Introduction
Machine structures: layers of abstraction
Eight great ideas
2. Performance Metrics I
CPU performance
perf, a profiling tool
3. Hierarchical Memory
The principle of locality
DRAM and cache
Cache misses
Performance metrics II: memory performance and profiling
Cache design and cache mapping techniques
Course Topics (cont’d)
4. MIPS Instruction Set Architecture (ISA)
MIPS number representation
MIPS instruction format, addressing modes and procedures
SPIM assembler and simulator
5. Introduction to Logic Circuit Design
Switches and transistors
State circuits
Combinational logic circuits
Combinational logic blocks
MIPS single cycle and multiple cycle CPU datapath and control
6. Instruction Level Parallelism
Pipelining the MIPS ISA
Pipelining hazards and solutions
Multiple issue processors
Loop unrolling, SSE
Course Topics (cont’d)
7. Multicore Architecture
Multicore organization
Memory consistency and cache coherence
Thread level parallelism
8. Parallel Performance Metrics III: Parallelism and Profiling
Amdahl’s law, Graham and Brent’s theorem
Parallelism, Speedup
Cilkview, a parallel performance analyzer
9. GPU Architecture
Memory model
Execution model: scheduling and synchronization
Student Evaluation
Four assignments, each worth 10% of the final mark
Assignment 1 (memory hierarchy), due on Friday, Jan. 23rd
Assignment 2 (MIPS and circuits), due on Friday, Feb. 27th
Assignment 3 (ILP), due on Friday, March 13th
Assignment 4 (Multicore and TLP), due on Monday, April 6th
Four quizzes (key concepts, 30 minutes, in class), each worth 5% of the final mark
Quiz 1 (CPU/memory performance metrics and hierarchical memory), beginning of class on Thursday, Jan. 22nd
Quiz 2 (MIPS and logic circuits), Thursday, Feb. 26th
Quiz 3 (ILP), Thursday, March 12th
Quiz 4 (multicore and TLP), Thursday, April 2nd
One final exam (covering all the course contents), worth 40% of the final mark
Recommended Text Book
Patterson & Hennessy (2011), "Computer Organization and Design: The Hardware/Software Interface", revised 4th edition or 5th edition. ISBN: 978-0-12-374750-1
Teaching Assistants:
Xiaohui Chen ([email protected])
Ning Xie ([email protected])
Acknowledgements