Lecture 1: Introduction
[Course stack: ELT3047E Computer Architecture, ELT3048 Digital Design & Comp. Org., ELT2032 Semiconductor, EPN1095 Physics]
Course information
❑ Lecturers
➢ Hoàng Gia Hưng, Dept. Elect. Comp. Eng. (R702-E3, 144 Xuan Thuy)
➢ Dương Minh Ngọc, Dept. Elect. Comp. Eng. (R702-E3, 144 Xuan Thuy)
➢ Appointment-based consultation
❑ Pre-requisites: INT1008/2290, ELT2041.
❑ Text book: David Patterson and John Hennessy, “Computer
Organization & Design: The Hardware/Software Interface” (5th
Ed.), Morgan Kaufmann Publishers, 2014.
❑ Grading:
➢ Quizzes/essays: 20%
➢ Midterm: 20%
➢ Final: 60%
❑ Some ground rules:
➢ Respect
➢ Honesty
➢ Proactivity
Old-time computers
❑ Mechanical computers
➢ Schickard (1623), Pascal (1642)
➢ Babbage (1823 – Difference
Engine, 1834 – Analytical
Engine)
❑ Modern computers
➢ The PC Era (1975-2009)
➢ The Post-PC Era (2010-present)
Device trend
Modern computers
❑ Personal Mobile Device (PMD)
✓ e.g. smartphones, tablet computers
✓ Emphasis: energy efficiency & real-time
❑ Desktop Computing
✓ Emphasis on price-performance
❑ Servers
✓ Emphasis: availability, scalability, throughput
❑ IoT/Embedded Computers
✓ Emphasis: price
What exactly is this course about?
❑ Application software
✓ Written in high-level language
❑ System software
✓ Compiler: translates HLL code to machine code
✓ Operating System: service code
▪ Handling input/output
▪ Managing memory and storage
▪ Scheduling tasks & sharing resources
❑ Hardware
✓ Electronic components organized in
accordance with a certain design
✓ Examples of principal components are:
Processor, memory, I/O controllers
Levels of Program Code
❑ High-level language
➢ Level of abstraction closer to problem
domain
➢ Provides for productivity and
portability
❑ Assembly language
➢ Human-readable format of
instructions
❑ Machine language
➢ Computer-readable format
➢ Binary digits (bits)
➢ Encoded instructions and data (see the sketch below)
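❖ A concrete illustration of the three levels, as a hedged sketch in C: one high-level statement, with the MIPS-style assembly and 32-bit encoding that a compiler and assembler might produce shown in comments (the register choices and encoding are illustrative assumptions, not the output of any particular toolchain).

#include <stdio.h>

int main(void) {
    int b = 5, c = 7;

    /* High-level language: one statement, close to the problem domain. */
    int a = b + c;

    /* Assembly language (MIPS-style, assuming b and c live in $s1 and $s2):
     *     add $t0, $s1, $s2
     * Machine language: the assembler encodes that instruction as 32 bits, e.g.
     *     000000 10001 10010 01000 00000 100000   (R-format fields)
     */
    printf("a = %d\n", a);
    return 0;
}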
The Five Classic Components of a (Von
Neumann) Computer
❑ A central arithmetical unit capable of performing the elementary operations of arithmetic (Datapath)
❑ A central control unit capable of logical control of the device, i.e. proper sequencing of its operations (Control)
❑ A main memory, which stores both data and instructions (Memory)
➢ Stored-program concept
❑ Input units to transfer information from the outside recording medium (R) into its specific parts C (CA+CC) and M (Input).
❑ Output units to transfer information from C (CA+CC) and M to R (Output).
❖ Major components are interconnected through a system bus
How is a stored program executed?
❑ A program written in HLL is a series of instructions, which will be turned into binary numbers, just like data, and stored in memory.
➢ c.f. Harvard architecture
❑ To run or execute the stored program,
the processor fetches the instructions
from memory sequentially.
➢ The fetched instructions are then
decoded and executed by the digital
hardware.
➢ Large, complex program execution is a
series of memory reads and instruction
executions.
➢ Operation of the HW is governed by a clock; the length of one cycle is the clock period (see the sketch below)
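❖ A minimal sketch of the fetch-decode-execute loop on a made-up 16-bit ISA (the opcodes, instruction fields, and program below are illustrative assumptions, not a real instruction set):

#include <stdio.h>
#include <stdint.h>

/* Toy ISA: one 16-bit instruction = 4-bit opcode | 4-bit rd | 4-bit rs | 4-bit rt */
enum { OP_ADD = 1, OP_SUB = 2, OP_HALT = 0xF };

int main(void) {
    /* The stored program: instructions sit in memory as plain binary numbers,
     * exactly like data (stored-program concept). */
    uint16_t memory[] = {
        0x1123,   /* ADD r1, r2, r3  ->  r1 = r2 + r3 */
        0x2412,   /* SUB r4, r1, r2  ->  r4 = r1 - r2 */
        0xF000    /* HALT */
    };
    int32_t reg[16] = {0};
    reg[2] = 10; reg[3] = 32;                       /* initial data              */
    uint16_t pc = 0;                                /* program counter           */

    for (;;) {
        uint16_t inst = memory[pc++];               /* 1. fetch (sequentially)   */
        uint8_t op = inst >> 12;                    /* 2. decode the fields      */
        uint8_t rd = (inst >> 8) & 0xF;
        uint8_t rs = (inst >> 4) & 0xF;
        uint8_t rt = inst & 0xF;
        if (op == OP_HALT) break;                   /* 3. execute                */
        if (op == OP_ADD) reg[rd] = reg[rs] + reg[rt];
        if (op == OP_SUB) reg[rd] = reg[rs] - reg[rt];
    }
    printf("r1 = %d, r4 = %d\n", reg[1], reg[4]);   /* prints r1 = 42, r4 = 32   */
    return 0;
}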
Computer Architecture
❑ Objective: design a computer to meet
functional requirements as well as price,
power, and performance goals.
➢ inspired by the target market (PC/embedded/…)
➢ Use abstract layer representation to hide lower-
level detail → reduce design complexity
❑ Throughput
➢ Total work done per unit time e.g., tasks/transactions/… per hour
❑ Our focus: CPU time = time the CPU spent processing a job
➢ Does NOT count I/O time, other jobs’ shares
➢ Can be measured in seconds or number of CPU clock cycles
[Figure: clock waveform. Each clock cycle (one clock period) consists of data transfer & computation followed by a state update.]
❑ Example: two code sequences built from three instruction classes with CPI = 1, 2, and 3 (cross-checked in the sketch below)
❑ Sequence 1: IC = 5
➢ Clock Cycles = 2×1 + 1×2 + 2×3 = 10
➢ Avg. CPI = 10/5 = 2.0
❑ Sequence 2: IC = 6
➢ Clock Cycles = 4×1 + 1×2 + 1×3 = 9
➢ Avg. CPI = 9/6 = 1.5
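❖ A quick cross-check of the two sequences in code (a sketch; the per-class CPIs of 1, 2 and 3 cycles are the ones used in the arithmetic above):

#include <stdio.h>

/* Total cycles = sum(count_i * CPI_i); average CPI = total cycles / instruction count. */
static double avg_cpi(const int count[], const int cpi[], int nclasses, int *ic_out) {
    int cycles = 0, ic = 0;
    for (int i = 0; i < nclasses; i++) {
        cycles += count[i] * cpi[i];
        ic     += count[i];
    }
    *ic_out = ic;
    return (double)cycles / ic;
}

int main(void) {
    int cpi[3]  = {1, 2, 3};     /* cycles per instruction for the three classes */
    int seq1[3] = {2, 1, 2};     /* sequence 1: instruction counts per class     */
    int seq2[3] = {4, 1, 1};     /* sequence 2: instruction counts per class     */
    int ic1, ic2;
    double c1 = avg_cpi(seq1, cpi, 3, &ic1);
    double c2 = avg_cpi(seq2, cpi, 3, &ic2);
    printf("Sequence 1: IC = %d, avg CPI = %.1f\n", ic1, c1);   /* IC = 5, CPI = 2.0 */
    printf("Sequence 2: IC = %d, avg CPI = %.1f\n", ic2, c2);   /* IC = 6, CPI = 1.5 */
    return 0;
}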
❑ Performance depends on
➢ Hardware: affects clock rate (limited by the power wall!), CPI (parallelism)
➢ Algorithm: affects IC, possibly CPI
➢ Programming language: affects IC, CPI
➢ Compiler: affects IC, CPI
➢ Instruction set architecture: affects IC, CPI, Clock cycle time
Quiz 1
❑ Suppose we have two implementations of the same ISA for a
given program. Which one is faster and by how much?
Machine   Clock cycle time   CPI
A         250 ps             2.0
B         500 ps             1.2
❑ Solution:
➢ CPU Time_A = Instruction Count × CPI_A × Cycle Time_A = I × 2.0 × 250 ps = I × 500 ps
➢ CPU Time_B = Instruction Count × CPI_B × Cycle Time_B = I × 1.2 × 500 ps = I × 600 ps
➢ A is faster, by CPU Time_B / CPU Time_A = (I × 600 ps) / (I × 500 ps) = 1.2
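❖ The same comparison as a code sketch: the instruction count I cancels out, and is set to an arbitrary 10^9 below only to print concrete times.

#include <stdio.h>

int main(void) {
    double I = 1e9;                       /* instruction count: arbitrary, it cancels  */
    /* CPU time = instruction count x CPI x clock cycle time */
    double timeA = I * 2.0 * 250e-12;     /* machine A: CPI = 2.0, cycle time = 250 ps */
    double timeB = I * 1.2 * 500e-12;     /* machine B: CPI = 1.2, cycle time = 500 ps */
    printf("A: %.2f s   B: %.2f s   B/A = %.1f\n", timeA, timeB, timeB / timeA);
    return 0;
}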
Quiz 2
❑ Given: instruction mix of a program on a computer
Class_i   Freq_i   CPI_i
ALU       50%      1
Load      20%      5
Store     10%      3
Branch    20%      2
➢ What is the average CPI? What is the percentage of time used by each instruction class?
➢ How much faster would the machine be if loads took 2 cycles? What if two ALU instructions could be executed at once?
❑ Solution:
➢ Average CPI = 0.5×1 + 0.2×5 + 0.1×3 + 0.2×2 = 2.2. Time percentages: 23%, 45%, 14%, 18%.
➢ If loads take 2 cycles: new CPI = 0.5×1 + 0.2×2 + 0.1×3 + 0.2×2 = 1.6, so speedup = old CPU time / new CPU time = old CPI / new CPI = 2.2 / 1.6 = 1.38.
➢ If two ALU instructions could be executed at once: the ALU class effectively takes 0.5 cycles, so new CPI = 0.5×0.5 + 0.2×5 + 0.1×3 + 0.2×2 = 1.95, and speedup = old CPI / new CPI = 2.2 / 1.95 = 1.13. (These numbers are recomputed in the sketch below.)
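❖ A sketch that recomputes the quiz numerically; the frequencies and CPIs come from the table above, and the two what-if cases are modelled by editing the CPI vector.

#include <stdio.h>

/* Weighted CPI = sum over classes of (frequency_i * CPI_i). */
static double weighted_cpi(const double freq[], const double cpi[], int n) {
    double sum = 0.0;
    for (int i = 0; i < n; i++) sum += freq[i] * cpi[i];
    return sum;
}

int main(void) {
    const char *name[4] = {"ALU", "Load", "Store", "Branch"};
    double freq[4] = {0.50, 0.20, 0.10, 0.20};
    double cpi[4]  = {1, 5, 3, 2};
    double base = weighted_cpi(freq, cpi, 4);            /* 2.2                     */

    double cpi_fast_load[4] = {1, 2, 3, 2};              /* loads now take 2 cycles */
    double cpi_dual_alu[4]  = {0.5, 5, 3, 2};            /* two ALU ops per cycle   */

    printf("base CPI          = %.2f\n", base);
    printf("speedup (load=2)  = %.2f\n", base / weighted_cpi(freq, cpi_fast_load, 4));
    printf("speedup (2x ALU)  = %.2f\n", base / weighted_cpi(freq, cpi_dual_alu, 4));

    /* Time share of class i = freq_i * CPI_i / average CPI. */
    for (int i = 0; i < 4; i++)
        printf("%-6s uses %4.0f%% of the time\n", name[i], 100.0 * freq[i] * cpi[i] / base);
    return 0;
}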
A brief review of binary numbers
❖ You’ll need to review this material in detail on your own after this session.
Binary representations of integers
❑ Natural numbers: unsigned binary, with bits written from the MSB (most significant bit) down to the LSB (least significant bit)
❑ Negative numbers
➢ Sign-magnitude: one bit is used for the sign, the remaining bits represent the magnitude of the number → several disadvantages.
➢ Two's complement: the positive half (from 0 to 2^31 − 1) uses the same bit patterns as unsigned binary. The negative half begins with 1000 . . . 0000₂ representing −2^31 and ends with 1111 . . . 1111₂ = −1.
Binary number conversions
❑ Given an n-bit two's complement number x_(n−1) x_(n−2) … x_1 x_0, its value is −x_(n−1)×2^(n−1) + x_(n−2)×2^(n−2) + … + x_1×2^1 + x_0×2^0
❑ Example: 1111 1111 1111 1111 1111 1111 1111 1100₂ = ?₁₀
1111 1111 1111 1111 1111 1111 1111 1100₂
= −1×2^31 + 1×2^30 + … + 1×2^2 + 0×2^1 + 0×2^0
= −2,147,483,648 + 2,147,483,644 = −4₁₀
❑ Example: −5₁₀ = ?₂ (two's complement)
1. Convert magnitude to binary: 5₁₀ = 0101
2. Invert bits: 1010
3. Add 1 to LSB: 1010 + 1 = 1011₂
Some useful shortcuts
❑ Sign extension
➢ How does a computer convert a two's complement number stored in 8-bit format to its 16-bit equivalent?
❑ Negation
➢ Is there a quick way to negate a two's complement binary number? (Both shortcuts are shown in the sketch below.)
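❖ The sketch below answers both questions for an 8-bit value widened to 16 bits: sign extension copies the sign bit into the new upper bits (a signed cast does exactly that), and negation is "invert all the bits, then add 1".

#include <stdio.h>
#include <stdint.h>

int main(void) {
    int8_t x = -5;                        /* 8-bit two's complement: 1111 1011   */

    /* Sign extension: replicate the MSB into the new high-order bits;
     * a cast between signed types performs exactly this copy. */
    int16_t wide = (int16_t)x;            /* 1111 1111 1111 1011 = -5 in 16 bits */

    /* Negation shortcut: -x == ~x + 1 (invert every bit, then add 1 to the LSB). */
    int16_t negated = (int16_t)(~wide + 1);

    printf("x             = %d (0x%02X)\n", x, (unsigned)(uint8_t)x);
    printf("sign-extended = %d (0x%04X)\n", wide, (unsigned)(uint16_t)wide);
    printf("~x + 1        = %d (0x%04X)\n", negated, (unsigned)(uint16_t)negated);
    return 0;
}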
Hexadecimal representation
❑ Binary numbers are written in long, tedious strings of 0s
and 1s
➢ Hexadecimal: a higher base that can be easily converted to binary
❑ Easy binary-hexadecimal conversion: each hexadecimal digit corresponds to exactly 4 bits (see the sketch below)
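❖ A small sketch of the 4-bits-per-hex-digit correspondence (0xEC is an arbitrary example value):

#include <stdio.h>

int main(void) {
    unsigned v = 0xEC;                   /* 1110 1100 in binary                */
    printf("0x%X = ", v);
    for (int i = 7; i >= 0; i--) {       /* print one bit at a time, MSB first */
        printf("%u", (v >> i) & 1u);
        if (i == 4) printf(" ");         /* each hex digit maps to 4 bits      */
    }
    printf("\n");                        /* output: 0xEC = 1110 1100           */
    return 0;
}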
Binary representation of fractions
❑ Binary point is implied
❑ Fixed point: the number of integer and fraction bits must be
agreed upon (fixed) beforehand.
➢ Example: What’s the binary representation of 6.7510 using 4 integer bits
and 4 fraction bits?
01101100, implying 0110.1100₂ = 2^2 + 2^1 + 2^-1 + 2^-2 = 6.75
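❖ The same 4.4 fixed-point example as a code sketch: with 4 fraction bits the scale factor is 2^4 = 16, so 6.75 is stored as the integer 6.75 × 16 = 108 = 0110 1100₂.

#include <stdio.h>
#include <stdint.h>

#define FRAC_BITS 4                                     /* 4 integer bits + 4 fraction bits */

int main(void) {
    double  x     = 6.75;
    uint8_t fixed = (uint8_t)(x * (1 << FRAC_BITS));    /* encode: 6.75 * 16 = 108 = 0x6C   */
    double  back  = fixed / (double)(1 << FRAC_BITS);   /* decode: divide by 16             */

    printf("6.75 in 4.4 fixed point: 0x%02X (%u)\n", (unsigned)fixed, (unsigned)fixed);
    printf("decoded back: %.2f\n", back);
    return 0;
}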
❑ Floating point: binary point floats to the right of the MSB
➢ Similar to decimal scientific notation: −2340 = −2.34 × 10^3 (normalized, i.e. exactly one non-zero digit appears before the point), or −0.234 × 10^4 (not normalized)
➢ Normalized binary representation: ±1.xxxxxxx₂ × 2^yyyy → significand = ±1.xxxxxxx₂, and fraction = xxxxxxx₂. Notice that the exponent is also binary, i.e. exponent = yyyy₂, but the notation was dropped in the above expression for simplification.
IEEE 754 Floating-Point Format
❑ Fields: S (sign, 1 bit) | Exponent (8 bits single, 11 bits double) | Fraction (23 bits single, 52 bits double)
❑ Value = (−1)^S × (1 + Fraction) × 2^(Exponent − Bias), with Bias = 127 (single) or 1023 (double)
❑ Special encodings:
Value   S   Exponent    Fraction
0       x   00000000    00000000000000000000000
+∞      0   11111111    00000000000000000000000
−∞      1   11111111    00000000000000000000000
❑ Example: represent −0.8125₁₀ in IEEE 754 format
➢ Convert the magnitude by repeatedly multiplying the fractional part by 2 and collecting the integer bits:
▪ 0.8125 × 2 = 1.625 → 1
▪ 0.625 × 2 = 1.25 → 1
▪ 0.25 × 2 = 0.5 → 0
▪ 0.5 × 2 = 1.0 → 1
▪ Stop when the fractional part is 0
➢ Fraction: 0.8125₁₀ = (0.1101)₂ = (1.101)₂ × 2^−1 (normalized)
➢ Exponent = −1 + Bias = 126 (single precision) and 1022 (double)
➢ Single precision: 1 01111110 10100000000000000000000
➢ Double precision: 1 01111111110 1010000000000000000000000000000000000000000000000000
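❖ A sketch for checking the bit patterns above on a real machine (assuming, as is usual, that C's float and double are IEEE 754 single and double precision): it copies the raw bits of −0.8125 into integers and prints them in hexadecimal.

#include <stdio.h>
#include <stdint.h>
#include <string.h>

int main(void) {
    float  f = -0.8125f;
    double d = -0.8125;
    uint32_t fbits;
    uint64_t dbits;

    /* Reinterpret the raw bits without changing them (assumes IEEE 754 formats). */
    memcpy(&fbits, &f, sizeof fbits);
    memcpy(&dbits, &d, sizeof dbits);

    printf("single: 0x%08X\n", (unsigned)fbits);               /* expect 0xBF500000         */
    printf("double: 0x%016llX\n", (unsigned long long)dbits);  /* expect 0xBFEA000000000000 */
    return 0;
}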
Underflow and denormal numbers
❑ Numbers below 2^−126 are too small to be represented (underflow) → how do we extend the range to 0?
❑ Denormalized: change the behavior of IEEE 754 numbers
when Exponent is 0 and Fraction ≠ 0
➢ The implicit "1." before the fraction now becomes "0." (not normalized)
➢ Ends up losing precision at small numbers but provides gradual underflow to 0 (demonstrated in the sketch below)
[Figure: real number line. Normalized negative numbers extend from about −2^128 up to −2^−126, denormals fill the gap between −2^−126 and +2^−126 around 0, and normalized positive numbers extend from +2^−126 up to about +2^128, with −∞ and +∞ beyond.]
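❖ A sketch of gradual underflow in single precision (again assuming IEEE 754 floats with the default rounding behaviour): repeatedly halving the smallest normalized number walks through the denormals instead of jumping straight to 0.

#include <stdio.h>
#include <float.h>
#include <math.h>

int main(void) {
    float x = FLT_MIN;                               /* smallest normalized float = 2^-126 */
    printf("smallest normal:   %g\n", x);
    printf("smallest denormal: %g\n", nextafterf(0.0f, 1.0f));   /* 2^-149 */

    /* Keep halving: 23 of the halvings land on denormals (2^-127 .. 2^-149);
     * the 24th finally rounds to 0. */
    int halvings = 0;
    while (x > 0.0f) {
        x /= 2.0f;
        halvings++;
    }
    printf("halvings until x == 0: %d\n", halvings);  /* prints 24 */
    return 0;
}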
Summary
❑ Course overview
➢ You’ll learn the HW/SW interface in a modern (=Von Neumann) computer
❑ Computer architecture
➢ Represented in abstract layers to manage complexity
➢ ISA = what the computer does; Organization = how the ISA is implemented;
Realization = implementation of the ISA on specific integrated circuits.
❑ Computer performance
➢ Is the reciprocal of CPU time
➢ Also follows the classical CPU performance equation