CSE 305
Computer Architecture
Introduction
Prepared by
Madhusudan Basak
Assistant Professor
CSE, BUET
* Some modifications made by Saem Hasan
Why Computer Architecture?
To apply the Architectural sense to a computer
To apply a computer to Architectural design
To know about the basic Architecture of a computer
Why Computer Architecture?
Purpose
How hardware (processors, memories, disk drives, network infrastructure) plus
software (operating systems, compilers, libraries, network protocols) combine to
support the execution of application programs
How you as a programmer can best use these resources
Why Computer Architecture?
What to know?
What the computer does: the logical view, i.e., the Instruction Set Architecture (ISA)
How it does it: the physical view, i.e., the Computer Organization
Computer Architecture = Instruction Set Architecture + Computer Organization
Instruction Set Architecture
Instruction set architecture comprises the attributes of a computing system as seen by
the assembly-language programmer or compiler writer.
Instruction Set (what operations can be performed?)
Instruction Format (how are instructions specified?)
Data storage (where is data located?)
Addressing Modes (how is data accessed?)
Exceptional Conditions (what happens if something goes wrong?)
Machine Organization
Machine organization is the view of the computer that is seen by the logic
designer. This includes
Capabilities & performance characteristics of functional units (e.g., registers,
ALU, shifters, etc.)
Ways in which these components are interconnected
How information flows between components
Logic and means by which such information flow is controlled
Coordination of functional units
Components of a Computer
Control
• Gives directions to the other components
• e.g., bus controller, memory interface unit
Datapath
• Performs arithmetic and logic operations
• e.g., adders, multipliers, shifters
Memory
• Holds data and instructions
• e.g., cache, main memory, disk
Input
• Sends data to the computer
• e.g., keyboard, mouse
Output
• Gets data from the computer
• e.g., screen, sound card
Information in a computer -- Instructions
THREE TYPES OF COMMANDS in Instructions
Instructions specify commands to
Transfer information within a computer
• e.g., from memory to ALU
Transfer of information between the computer and I/O devices
• e.g., from keyboard to computer, or computer to printer
Perform arithmetic and logical operations
• e.g., add two numbers, perform a logical AND
Transfer within the computer:
LOAD R1, 0x1000: Load data from memory address 0x1000 into register R1.
Transfer between the computer and I/O:
INPUT R2: Read a value from the keyboard into register R2.
OUTPUT R3: Send the value in register R3 to the printer.
Arithmetic and logic:
ADD R4, R1, R2: Add the contents of registers R1 and R2, and store the result in R4.
AND R5, R1, R2: Perform a logical AND operation on R1 and R2, storing the result in R5.
Information in a computer -- Instructions
A sequence of instructions to perform a task is called a program, which is
stored in the memory.
The processor fetches instructions from memory and performs the
operations stated in those instructions.
What do the instructions operate upon?
Information in a computer -- Data
Data are the “operands” upon which instructions operate.
Data could be:
Numbers,
Encoded characters.
Data, in a broad sense, means any digital information.
Computers use data that is encoded as a string of binary digits called bits.
Classes of Computers
Desktop / Notebook Computers
Range from low-end systems to high-performance workstations
Subject to cost/performance tradeoff
Server Computers
Network based
High capacity, performance, reliability
Range from small servers to building sized
Embedded Computers
Hidden as components of systems
Minimize memory and power. Often not programmable
Eight Great Ideas in Computer Architecture
Design for Moore’s Law
Use Abstraction to Simplify Design
Make the Common Case Fast
Performance via Parallelism
Performance via Pipelining
Performance via Prediction
Hierarchy of Memories
Dependability via Redundancy
Design for Moore’s Law
Proposed by Gordon Moore (co-founder of Intel) in 1965
Moore's law is the observation that the number of transistors in a dense
integrated circuit (IC) doubles about every two years.
Exponential Growth:
This doubling means that computing power (or performance) and the complexity of electronic systems grow exponentially over time.
Effect on Costs:
Moore's Law also predicted a decline in cost-per-transistor. As manufacturing
scales improve, the cost of producing additional transistors decreases.
Examples:
1971 Intel 4004 Microprocessor: contained 2,300 transistors.
1989 Intel 80486 Microprocessor: contained more than 1 million transistors.
2024 Modern Processors: contain billions of transistors (e.g., Apple's M2 processor has over 20 billion transistors).
Use Abstraction to Simplify Design
A computing system maintains a hierarchical structure
Lower-level details are hidden from the higher levels
A higher level only gets the abstract view
Both Hardware and Software consist of hierarchical layers using abstraction
Make the Common Case Fast
More efficiency in the common case means more impact on the overall design.
Performance via Parallelism
Current multiprocessor systems exploit parallelism.
Parallelism enhances throughput and computational power but introduces challenges like synchronization and complexity.
Often needs special care for coordination.
Independent operations can run in parallel (sketched below):
x = a + b
y = c * d
Dependent operations cannot, since y needs the result x:
x = a + b
y = x * d
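A minimal sketch of this idea in Python (hypothetical values; it assumes only the standard library's concurrent.futures module): the independent pair can be submitted to run concurrently, while the dependent pair must run in order.

from concurrent.futures import ThreadPoolExecutor

a, b, c, d = 1, 2, 3, 4

# Independent operations: x = a + b and y = c * d may execute at the same time.
with ThreadPoolExecutor(max_workers=2) as pool:
    future_x = pool.submit(lambda: a + b)
    future_y = pool.submit(lambda: c * d)
    x, y = future_x.result(), future_y.result()
print(x, y)   # 3 12

# Dependent operations: y = x * d cannot start before x is known,
# so this pair offers no parallelism.
x = a + b
y = x * d
print(x, y)   # 3 12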
Performance via Pipelining
Pipelining
A special case of parallelism
Performing multiple non-dependent operations at the same time
(Figure: operations A, B, and C executed in overlapping pipeline stages)
Performance via Prediction
Perform an operation based on a prediction/assumption rather than waiting for the actual result
Applicable when recovering from a wrong prediction is not too costly
Hierarchy of Memories
We want faster and cheaper memory
Faster memory is costlier
Cheaper memory is slower
Trade-off is memory hierarchy
Dependability via Redundancy
Redundancy means keeping multiple copies
One fails, another exists => Dependable
Hierarchical Structure of Program Execution
Simplified view including Hardware
Application software (e.g., MS Word, PowerPoint)
System software (e.g., Compiler, Operating System)
Hardware (e.g., CPU, HDD, RAM)
Software Abstraction
Hierarchical structure for a program execution
High-level language: A + B
Assembly language: Add A, B
Machine language: 1000110010100000
Hardware Abstraction: Memory
Memory Hierarchy (speed and cost increase toward the top):
CPU Registers (volatile)
Cache Memory (SRAMs, volatile)
Main Memory (DRAMs, volatile)
Magnetic Disks (non-volatile)
Optical Disks (non-volatile)
Magnetic Tapes (non-volatile)
Simplified overall abstraction
Application Software
Compiler, Assembler, Linker, Loader translate it into Executable Programs
Operating System runs executable programs as Processes and provides abstractions such as Virtual Memory and Files
Instruction Set Architecture: the interface between software and hardware
Hardware: Processor, Main Memory, I/O Devices
Performance
What is the metric of the performance of a computing system?
Depends on the purpose
Two commonly used metrics are:
Execution or Response Time
• How long it takes to do a task
Throughput
• Total work done per unit time
– e.g., tasks/transactions/… per hour
Example
Do the following changes to a computer system increase throughput, decrease
response time, or both?
Replacing the processor in a computer with a faster version
• Response time decreases or improves
• Decreasing response time generally increases throughput
Adding additional processors to a system that uses multiple processors for separate
tasks—for example, searching the web
• Throughput increases
• Response time depends on scenario
– Generally no impact on response time
– But in case tasks were waiting in the queue previously, response time will decrease after the change
Relative Performance
We shall focus on Response or Execution time
“X is n times faster than Y” means
Performance_X / Performance_Y = Execution time_Y / Execution time_X = n
Example: Time taken to run a program
A: 10s, B: 15s
Performance_A / Performance_B = Execution time_B / Execution time_A = 15s / 10s = 1.5
So, A is 1.5 times faster than B
Measuring Execution Time
Elapsed time measures the total time taken for the program to run, from start to finish.
Counts everything (disk and memory accesses, I/O, operating system overhead
etc.)
A useful number, but often not good for comparison purposes
• Time sharing among multiple programs
CPU time
Doesn’t count I/O or time spent in running other programs
Can be broken into system CPU time and user CPU time
Our focus: user CPU time
CPU time spent in executing the lines of code that are “in” our program
User CPU Time: Time the CPU spends executing the user code in your program.
System CPU Time: Time the CPU spends on system calls, such as file I/O or process
management, on behalf of your program.
Clock Cycles
From a computer's perspective, time is not continuous but discrete
Activities are performed at discrete clock ticks
(Figure: clock signal showing cycle time along the time axis)
cycle time = time between ticks = seconds per cycle
clock rate (frequency) = cycles per second (1 Hz = 1 cycle/sec)
A 2 GHz clock has a 1 / (2×10^9) s = 0.5 nanosecond (ns) cycle time
So, for a program:
Clock Cycles
For a program:
CPU Time = CPU Clock Cycles × Clock Cycle Time = CPU Clock Cycles / Clock Rate
Performance improvement means
Decreasing the number of clock cycles
Increasing the clock rate
Hardware designers often trade off clock rate against cycle count
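As an illustration, a minimal Python sketch applying the formula above (the cycle count and clock rate are hypothetical, though they match Computer A in the next example):

# CPU time = CPU clock cycles / clock rate
clock_cycles = 20e9   # 20 x 10^9 clock cycles (hypothetical program)
clock_rate = 2e9      # 2 GHz clock
cpu_time = clock_cycles / clock_rate
print(cpu_time)       # 10.0 seconds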
CPU Time Example
Computer A: 2GHz clock, 10s CPU time
Designing Computer B
Aim for 6s CPU time
Can use a faster clock, but it causes 1.2× as many clock cycles
How fast must Computer B clock be?
Clock Rate_B = Clock Cycles_B / CPU Time_B = (1.2 × Clock Cycles_A) / 6s
Clock Cycles_A = CPU Time_A × Clock Rate_A = 10s × 2GHz = 20 × 10^9
Clock Rate_B = (1.2 × 20 × 10^9) / 6s = (24 × 10^9) / 6s = 4GHz
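A quick numeric check of this result in Python, using only the numbers from the example:

clock_rate_a = 2e9                          # Computer A: 2 GHz
cpu_time_a = 10.0                           # Computer A: 10 s
clock_cycles_a = cpu_time_a * clock_rate_a  # 20 x 10^9 cycles
clock_cycles_b = 1.2 * clock_cycles_a       # 24 x 10^9 cycles on Computer B
clock_rate_b = clock_cycles_b / 6.0         # rate required for a 6 s CPU time
print(clock_rate_b / 1e9)                   # 4.0 (GHz)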
Instructions vs Cycles
Is the number of cycles identical to the number of instructions?
No!
Why?
Different operations take different amounts of time (numbers of cycles)
Multiplication takes longer than addition
Floating point operations take longer than integer operations
The access time to a register is much shorter than to a memory location
Instruction Count and CPI
Clock Cycles = Instruction Count × Cycles per Instruction (CPI)
CPU Time = Instruction Count × CPI × Clock Cycle Time = (Instruction Count × CPI) / Clock Rate
Instruction Count for a program
Determined by program, ISA and compiler
CPI is an average since the number of cycles per instruction varies from
instruction to instruction
CPI varies by application, as well as among implementations of the same
instruction set, depending on:
Number of cycles for each instruction
Frequency of instructions (instruction mix)
Memory access time
CPI Example
Computer A: Cycle Time = 250ps, CPI = 2.0
Computer B: Cycle Time = 500ps, CPI = 1.2
Same ISA
Which is faster, and by how much?
CPU Time_A = Instruction Count × CPI_A × Cycle Time_A = I × 2.0 × 250ps = I × 500ps   ← A is faster…
CPU Time_B = Instruction Count × CPI_B × Cycle Time_B = I × 1.2 × 500ps = I × 600ps
CPU Time_B / CPU Time_A = (I × 600ps) / (I × 500ps) = 1.2   ← …by this much
i.e., the program execution time on B is 1.2 times that on A
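A quick check of the same calculation in Python (the instruction count I is arbitrary because it cancels in the ratio):

I = 1_000_000                     # arbitrary instruction count
cpu_time_a = I * 2.0 * 250e-12    # seconds on Computer A
cpu_time_b = I * 1.2 * 500e-12    # seconds on Computer B
print(cpu_time_b / cpu_time_a)    # 1.2 -> A is 1.2 times faster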
CPI Example
Alternative compiled code sequences using instructions in classes A, B, C
Class             A   B   C
CPI for class     1   2   3
IC in sequence 1  2   1   2
IC in sequence 2  4   1   1

Sequence 1: IC = 5
Clock Cycles = 2×1 + 1×2 + 2×3 = 10
Avg. CPI = 10/5 = 2.0

Sequence 2: IC = 6
Clock Cycles = 4×1 + 1×2 + 1×3 = 9
Avg. CPI = 9/6 = 1.5
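The same calculation expressed as a small Python sketch (class CPIs and instruction counts taken from the table above):

cpi = {"A": 1, "B": 2, "C": 3}    # cycles per instruction for each class

def avg_cpi(mix):
    # Average CPI = total clock cycles / total instruction count
    cycles = sum(cpi[c] * n for c, n in mix.items())
    return cycles / sum(mix.values())

print(avg_cpi({"A": 2, "B": 1, "C": 2}))   # Sequence 1 -> 2.0
print(avg_cpi({"A": 4, "B": 1, "C": 1}))   # Sequence 2 -> 1.5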
Performance Summary
The BIG Picture
Performance depends on
Algorithm
Programming language
Compiler
Instruction set architecture
Tradeoffs
Instruction count, CPI, and clock cycle time present tradeoffs
RISC – reduced instruction set computer (MIPS)
• Simple instructions
• Higher instruction counts for an application
• Lower CPI
CISC – complex instruction set computer (IA-32)
• More complex instructions
• Lower instruction counts for an application
• Higher CPI
Comparing Computing Systems
Comparing systems requires comparing the execution time of the same workload on each
Benchmarks can also help to evaluate and measure performance
The 12 benchmarks of SPEC CINT2006 are given in the next slide
The SPECratio can be used to measure performance
The geometric mean of the SPECratios across the benchmarks gives a single summary figure, as sketched below
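A minimal sketch of the summary step in Python (the SPECratio values here are made up purely for illustration):

import math

spec_ratios = [15.2, 9.8, 21.4, 12.1]   # hypothetical SPECratios (reference time / measured time)
# Geometric mean: the n-th root of the product of the n ratios
geo_mean = math.prod(spec_ratios) ** (1 / len(spec_ratios))
print(geo_mean)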
Benchmarks
Fallacies and Pitfalls
Fallacy
Commonly held misconceptions
Pitfall
A hidden or unsuspected danger or difficulty
Easily made mistakes
Fallacies
Fallacy 1
Computers at low utilization use little power
Fallacy 2
Designing for performance and designing for energy efficiency are
unrelated goals
Pitfalls
Pitfall 1
Expecting the improvement of one aspect of a computer to
increase overall performance by an amount proportional to the
size of the improvement.
Pitfall 2
Using a subset of the performance equation as a
performance metric.
Amdahl’s Law
Execution time after improvement = (Execution time affected by improvement / Amount of improvement) + Execution time unaffected
Example
Suppose a program runs in 100 seconds on a computer, with multiply
operations responsible for 80 seconds of this time. How much do I have to
improve the speed of multiplication if I want my program to run five times
faster?
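A worked solution following Amdahl’s Law (the well-known conclusion is that the target cannot be met):
Target time = 100 s / 5 = 20 s. Only the 80 s spent on multiplication can be improved; the remaining 20 s is unaffected.
20 s = 80 s / n + 20 s  =>  80 s / n = 0
No finite improvement factor n satisfies this, so making the program run five times faster by improving multiplication alone is impossible.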
Acknowledgements
These slides contain material developed and copyright by:
Krste Asanovic (UCB), James Hoe (CMU), Li-Shiuan Peh (MIT), Sudhakar
Yalamanchili (GATECH), and Amirali Baniasadi (UVIC) in part of their respective
courses
Lecture slides by Dr. Tanzima Hashem, Professor, CSE, BUET
Lecture slides by Ms. Mehnaz Tabassum Mahin, Assistant Professor, CSE, BUET
Thank You