
ELT3047 Computer Architecture

Lecture 1: Introduction

Hoang Gia Hung


Faculty of Electronics and Telecommunications
University of Engineering and Technology, VNU Hanoi
Curriculum overview
❑ VNU-UET’s computer engineering curriculum
➢ INT1008 Intro. to prog.; INT2210 DS & algo.; INT2290 C prog.
➢ INT2241 Princ. of OS; INT3217 Sys. Prog.; INT3402 Compilers
➢ ELT3047E Computer Architecture; ELT3048 Digital Design & Comp. Org.
➢ ELT2041 Digital Elec.; ELT2040 Analog Elec.; ELT2032 Semiconductor; ELT2030 Elec. Eng.; EPN1095 Physics
Course information
❑ Lecturers
➢ Hoàng Gia Hưng, Dept. Elect. Comp. Eng. (R702-E3, 144 Xuan Thuy)
➢ Dương Minh Ngọc, Dept. Elect. Comp. Eng. (R702-E3, 144 Xuan Thuy)
➢ Appointment-based consultation
❑ Pre-requisites: INT1008/2290, ELT2041.
❑ Text book: David Patterson and John Hennessy, “Computer
Organization & Design: The Hardware/Software Interface” (5th
Ed.), Morgan Kaufmann Publishers, 2014.
❑ Grading:
➢ Quizzes/essays: 20%
➢ Midterm: 20%
➢ Final: 60%
❑ Some ground rules:
➢ Respect
➢ Honesty
➢ Proactivity
Old-time computers
❑ Mechanical computers
➢ Schickhard (1623), Pascal
(1642)
➢ Babbage (1823 – Difference
Engine, 1834 – Analytical
Engine)

❑ The Electronic Era (1946-1974)
➢ ENIAC (1943-1946)
➢ EDVAC (1944-1952) & Von
Neumann computer (1945)
➢ Mark I-IV (1944-1952) Harvard
architecture

❑ Modern computers
➢ The PC Era (1975-2009)
➢ The Post-PC Era (2010-present)
Device trend
Modern computers
❑ Personal Mobile Device (PMD)
✓ e.g. smartphones, tablet computers
✓ Emphasis: energy efficiency & real-time

❑ Desktop Computing
✓ Emphasis on price-performance

❑ Servers
✓ Emphasis: availability, scalability, throughput

❑ Warehouse Scale Computers


✓ Used for “Software as a Service (SaaS)”
✓ Emphasis: availability, price-performance

❑ IoT/Embedded Computers
✓ Emphasis: price
What exactly is this course about?

❑ Course contents: you’ll learn what’s under the hood of a modern computer
➢ How programs are translated into the machine language
✓ And how the hardware executes them
➢ The hardware/software interface
✓ How does software instruct the hardware to perform needed functions?
➢ What determines program performance
✓ And can a programmer improve the performance?
➢ How hardware designers improve performance
Below your program

❑ Application software
✓ Written in high-level language

❑ System software
✓ Compiler: translates HLL code to machine code
✓ Operating System: service code
▪ Handling input/output
▪ Managing memory and storage
▪ Scheduling tasks & sharing resources

❑ Hardware
✓ Electronic components organized in
accordance with a certain design
✓ Examples of principal components are:
Processor, memory, I/O controllers
Levels of Program Code

❑ High-level language
➢ Level of abstraction closer to problem
domain
➢ Provides for productivity and
portability
❑ Assembly language
➢ Human-readable format of
instructions
❑ Machine language
➢ Computer-readable format
➢ Binary digits (bits)
➢ Encoded instructions and data
The Five Classic Components of a (Von
Neumann) Computer
❑ A central arithmetical unit capable of performing the elementary operations of arithmetic (Datapath)
❑ A central control unit capable of logical control of the device, i.e. proper sequencing of its operations (Control)
❑ A main memory, which stores both data and instructions (Memory)
➢ Stored-program concept
❑ Input units to transfer information from the outside recording medium (R) into its specific parts C (CA+CC) and M (Input)
❑ Output units to transfer information from C (CA+CC) and M to R (Output)
❖ Major components are interconnected through the system bus
How is a stored program executed?
❑ A program written in HLL is a series of instructions, which will be turned into binary numbers, just like data, and stored in memory.
➢ c.f. Harvard architecture
❑ To run or execute the stored program, the processor fetches the instructions from memory sequentially.
➢ The fetched instructions are then decoded and executed by the digital hardware.
➢ Large, complex program execution is a series of memory reads and instruction executions.
➢ Operation of the HW is governed by a clock (with a fixed clock period)
Computer Architecture
❑ Objective: design a computer to meet functional requirements as well as price, power, and performance goals.
➢ Inspired by the target market (PC/embedded/…)
➢ Use abstract layer representation to hide lower-level detail → reduce design complexity
❑ Computer engineers are mainly concerned with:
➢ Instruction set architecture (ISA): what the computer does (programmer’s view).
➢ Organization: how the ISA is implemented (physical view of the computer).
➢ Hardware: implementation of the ISA on specific logic design & packaging technology.
➢ In this course, we’ll focus on ISA & organization only.
Defining computer performance
❑ Response time
➢ Time between start and completion of a task (on a machine X), as observed by the end user: Performance_X = 1 / Execution time_X

❑ Throughput
➢ Total work done per unit time e.g., tasks/transactions/… per hour

❑ Response time vs. Throughput
➢ Decreasing execution time improves throughput: less time to run a task ⇒ more tasks can be executed
➢ Increasing throughput can also improve response time: even if the execution time of individual sequential tasks is unchanged, more tasks can be executed in parallel ⇒ less waiting time in the scheduling queue.
❑ In this course, we will primarily focus on response time
➢ Performance is relative: engineers want the future generation of computers (X) they are designing to be n times faster than the current generation (Y):
Performance_X / Performance_Y = Execution time_Y / Execution time_X = n
Measuring Execution Time
❑ Elapsed time = Total response time, including all aspects
➢ Processing, I/O, OS overhead, idle time
➢ Useful, but often not very good for designers (why?)

❑ Our focus: CPU time = time the CPU spent processing a job
➢ Does NOT count I/O time, other jobs’ shares
➢ Can be measured in seconds or number of CPU clock cycles
[Figure: clock signal with the clock period marked; each cycle performs data transfer & computation, then a state update]

CPU Time = CPU Clock Cycles × Clock Cycle Time
         = CPU Clock Cycles / Clock Rate
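The relation above can be sketched as a small Python helper (function and variable names are illustrative):

```python
def cpu_time(clock_cycles, clock_rate_hz):
    """CPU time = clock cycles x cycle time = clock cycles / clock rate."""
    return clock_cycles / clock_rate_hz

# 20 x 10^9 cycles on a 2 GHz clock take 10 s
print(cpu_time(20e9, 2e9))  # 10.0
```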
Example 1
❑ Given Computer A: 2GHz clock, 10s CPU time. Task: design
Computer B with following specs:
➢ Aim for 6s CPU time
➢ Can do faster clock, but causes 1.2 × clock cycles
❑ How fast must Computer B clock be?
❑ Solution:
Clock Rate_B = Clock Cycles_B / CPU Time_B = (1.2 × Clock Cycles_A) / 6s
Clock Cycles_A = CPU Time_A × Clock Rate_A = 10s × 2GHz = 20 × 10⁹
Clock Rate_B = (1.2 × 20 × 10⁹) / 6s = (24 × 10⁹) / 6s = 4GHz
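The same calculation can be checked with a few lines of Python (variable names are illustrative):

```python
# Computer A: 2 GHz clock, 10 s CPU time; B must finish in 6 s at 1.2x the cycles
cpu_time_a, clock_rate_a = 10.0, 2e9          # seconds, Hz
clock_cycles_a = cpu_time_a * clock_rate_a    # 20e9 cycles
clock_cycles_b = 1.2 * clock_cycles_a         # 24e9 cycles
clock_rate_b = clock_cycles_b / 6.0           # required clock rate for B
print(clock_rate_b / 1e9)  # 4.0 (GHz)
```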
Instruction Count and CPI

❑ Instructions of a program are sequentially fetched & executed at a constant clock rate.
➢ #clock cycles depends on #instructions in a program (instruction count) & #cycles required by each instruction (e.g. multiplication > addition)
➢ In general, #clock cycles = IC × CPI (average #cycles per instruction).
CPU time = IC × CPI × Clock period = (IC × CPI) / Clock rate
➢ CPI provides one way of comparing different implementations of the same ISA.
CPI in More Detail
❑ The average number of cycles per instruction

➢ It’s unrealistic to count #clock cycles for every instruction in a program.

❑ In practice, CPI depends on a wide variety of design details


➢ HW factors: the memory system and the processor structure;
➢ SW factors: the mix of instruction types executed in an application

❑ Each instruction class (e.g. ALU, MEM, …) has a different CPI
➢ If a program has n different classes of instructions with corresponding CPI_i and instruction count IC_i, then Clock cycles = Σᵢ₌₁ⁿ (CPI_i × IC_i).
➢ The (weighted average) CPI of the program is
CPI = Clock cycles / IC = Σᵢ₌₁ⁿ CPI_i × (IC_i / IC), where IC_i / IC is the relative frequency of class i.
Example 2
❑ Alternative compiled code sequences using instructions in classes A, B,
C. Which code sequence executes the most instructions? Which will be
faster? What is the CPI for each sequence?
Class            | A | B | C
CPI for class    | 1 | 2 | 3
IC in sequence 1 | 2 | 1 | 2
IC in sequence 2 | 4 | 1 | 1

❑ Sequence 1: IC = 5; Clock cycles = 2×1 + 1×2 + 2×3 = 10; Avg. CPI = 10/5 = 2.0
❑ Sequence 2: IC = 6; Clock cycles = 4×1 + 1×2 + 1×3 = 9; Avg. CPI = 9/6 = 1.5
➢ Sequence 2 is faster, even though it executes one extra instruction.
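The weighted-CPI computation above can be sketched in Python (the dictionaries encode the example's table; names are illustrative):

```python
def total_cycles(cpi, ic):
    """Clock cycles = sum over classes of CPI_i x IC_i."""
    return sum(cpi[c] * ic[c] for c in ic)

cpi = {"A": 1, "B": 2, "C": 3}
seq1 = {"A": 2, "B": 1, "C": 2}
seq2 = {"A": 4, "B": 1, "C": 1}

for seq in (seq1, seq2):
    ic = sum(seq.values())  # instruction count
    print(ic, total_cycles(cpi, seq), total_cycles(cpi, seq) / ic)
# 5 10 2.0
# 6 9 1.5
```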


Performance summary
❑ The iron law of CPU performance
CPU time = IC × CPI × Clock period = (IC × CPI) / Clock rate
➢ Cycle time: reciprocal of the CPU frequency, provided by the manufacturer
➢ Instruction count: measured by software tools (profilers) or hardware counters
➢ CPI: obtained by simulation or hardware counters

❑ Performance depends on
➢ Hardware: affects clock rate (limited by the power wall!), CPI (parallelism)
➢ Algorithm: affects IC, possibly CPI
➢ Programming language: affects IC, CPI
➢ Compiler: affects IC, CPI
➢ Instruction set architecture: affects IC, CPI, Clock cycle time
Quiz 1
❑ Suppose we have two implementations of the same ISA for a
given program. Which one is faster and by how much?
Machine | Clock cycle | CPI
A       | 250 ps      | 2.0
B       | 500 ps      | 1.2

❑ Solution:
CPU Time_A = Instruction Count × CPI_A × Cycle Time_A = I × 2.0 × 250ps = I × 500ps
CPU Time_B = Instruction Count × CPI_B × Cycle Time_B = I × 1.2 × 500ps = I × 600ps
→ A is faster…
CPU Time_B / CPU Time_A = (I × 600ps) / (I × 500ps) = 1.2 …by this much
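Since the instruction count I cancels out, the comparison reduces to per-instruction time, which a short Python sketch can confirm (names are illustrative):

```python
# per-instruction time = CPI x cycle time; instruction count I cancels out
time_per_instr_a = 2.0 * 250   # 500 ps
time_per_instr_b = 1.2 * 500   # 600 ps
print(time_per_instr_b / time_per_instr_a)  # 1.2 -> A is 1.2x faster
```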
Quiz 2
❑ Given: instruction mix of a program on a computer
Class_i | Freq_i | CPI_i
ALU     | 50%    | 1
Load    | 20%    | 5
Store   | 10%    | 3
Branch  | 20%    | 2

➢ What is average CPI? What is the % of time used by each instruction class?
➢ How much faster would the machine be if load took 2 cycles? What if two ALU instructions could be executed at once?

❑ Solution:
➢ Average CPI = 0.5×1 + 0.2×5 + 0.1×3 + 0.2×2 = 2.2. Time percentages: 23%, 45%, 14%, 18%.
➢ If load takes 2 cycles: old CPU time / new CPU time = old CPI / new CPI = 2.2 / 1.6 = 1.38
➢ If two ALU instructions could be executed at once: old CPI / new CPI = 2.2 / 1.95 = 1.13
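A Python sketch of the quiz's arithmetic (the dictionaries encode the instruction mix; modeling "two ALU instructions at once" as halving the effective ALU CPI is an assumption consistent with the slide's 1.95):

```python
freq = {"ALU": 0.5, "Load": 0.2, "Store": 0.1, "Branch": 0.2}
cpi  = {"ALU": 1, "Load": 5, "Store": 3, "Branch": 2}

avg_cpi = sum(freq[c] * cpi[c] for c in freq)           # 2.2
time_share = {c: freq[c] * cpi[c] / avg_cpi for c in freq}

# load CPI drops from 5 to 2 cycles
faster_load = avg_cpi / (avg_cpi - freq["Load"] * (cpi["Load"] - 2))  # 2.2/1.6
# two ALU instructions per cycle -> effective ALU CPI halves
faster_alu = avg_cpi / (avg_cpi - freq["ALU"] * cpi["ALU"] / 2)       # 2.2/1.95
```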
A brief review of binary numbers
❖ You’ll have to conduct a detailed review by yourselves after this
session.
Binary representations of integers
❑ Natural numbers: unsigned binary (bits ordered from MSB to LSB)
❑ Negative numbers
➢ Sign-magnitude: one bit is used for the sign, the remaining bits represent the magnitude of the number → several disadvantages.
➢ Two’s complement: the positive half (from 0 to 2³¹ − 1, for 32 bits) uses the same bit patterns as unsigned binary. The negative half begins with 1000…0000₂, representing −2³¹, and ends with 1111…1111₂ = −1.
Binary number conversions
❑ Given an n-bit two’s complement number
x = −x_{n−1}·2ⁿ⁻¹ + x_{n−2}·2ⁿ⁻² + … + x₁·2¹ + x₀·2⁰
➢ The leading bit is the sign bit (0 → +ve, 1 → −ve)
❑ Example: 1111 1111 1111 1111 1111 1111 1111 1100₂ = ?₁₀
1111 1111 1111 1111 1111 1111 1111 1100₂
= −1×2³¹ + 1×2³⁰ + … + 1×2² + 0×2¹ + 0×2⁰
= −2,147,483,648 + 2,147,483,644 = −4₁₀
❑ Example: −5₁₀ = ?₂ (4-bit two’s complement)
1. Convert the magnitude to binary: 5₁₀ = 0101₂
2. Invert the bits: 1010
3. Add 1 to the LSB: 1010 + 1 = 1011₂
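Both directions of the conversion can be sketched in Python; the encoder uses masking, which is equivalent to the invert-and-add-1 recipe above (helper names are illustrative):

```python
def to_twos_complement(value, bits):
    """Encode a signed integer as an n-bit two's-complement bit string."""
    return format(value & ((1 << bits) - 1), f"0{bits}b")

def from_twos_complement(bit_string):
    """Decode by weighting the sign bit with -2^(n-1)."""
    n = len(bit_string)
    x = int(bit_string, 2)
    return x - (1 << n) if bit_string[0] == "1" else x

print(to_twos_complement(-5, 4))                                  # 1011
print(from_twos_complement("11111111111111111111111111111100"))   # -4
```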
Some useful shortcuts
❑ Sign extension
➢ How does a computer convert a two’s complement number stored in 8-bit format to its 16-bit equivalent?

❑ Negation
➢ Is there a quick way to negate a two’s complement binary number?
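Both shortcuts can be sketched in Python: sign extension replicates the sign bit into the new upper bits, and negation inverts all bits then adds 1 (helper names are illustrative):

```python
def sign_extend(bits, new_width):
    """Sign-extend a two's-complement bit string: replicate the sign bit."""
    return bits[0] * (new_width - len(bits)) + bits

def negate(bits):
    """Negate a two's-complement number: invert every bit, then add 1."""
    n = len(bits)
    inverted = int(bits, 2) ^ ((1 << n) - 1)
    return format((inverted + 1) & ((1 << n) - 1), f"0{n}b")

print(sign_extend("11111011", 16))  # 1111111111111011 (-5 in 8 -> 16 bits)
print(negate("0101"))               # 1011 (5 -> -5)
```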
Hexadecimal representation
❑ Binary numbers are written in long, tedious strings of 0s
and 1s
➢ Hexadecimal: a higher base that can be easily converted to binary
❑ Easy binary-hexadecimal conversion
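The conversion is easy because each hex digit corresponds to exactly 4 binary digits, as a quick Python check shows:

```python
# each hex digit maps to exactly 4 binary digits
value = int("11011100", 2)
print(format(value, "x"))    # dc
print(format(0xDC, "08b"))   # 11011100
```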
Binary representation of fractions
❑ Binary point is implied
❑ Fixed point: the number of integer and fraction bits must be
agreed upon (fixed) beforehand.
➢ Example: What’s the binary representation of 6.75₁₀ using 4 integer bits and 4 fraction bits?
01101100, implying 0110.1100₂ = 2² + 2¹ + 2⁻¹ + 2⁻² = 6.75
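The fixed-point encoding amounts to scaling by 2^(fraction bits) and rounding to an integer, as this sketch shows (helper name is illustrative; unsigned values only):

```python
def to_fixed_point(value, int_bits, frac_bits):
    """Encode a non-negative value with an implied binary point."""
    raw = round(value * (1 << frac_bits))  # scale so fraction bits become integer
    assert 0 <= raw < (1 << (int_bits + frac_bits)), "value out of range"
    return format(raw, f"0{int_bits + frac_bits}b")

print(to_fixed_point(6.75, 4, 4))  # 01101100, read as 0110.1100
```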
❑ Floating point: binary point floats to the right of the MSB
➢ Similar to decimal scientific notation: −2340 = −2.34 × 10³ (normalized, i.e. exactly one non-zero digit appears before the point), or −0.234 × 10⁴ (not normalized)
➢ Normalized binary representation: ±1.xxxxxxx₂ × 2^yyyy → significand = ±1.xxxxxxx₂, and fraction = xxxxxxx₂. Note that the exponent is also binary, i.e. exponent = yyyy₂, but the base subscript was dropped in the above expression for simplicity.
IEEE 754 Floating-Point Format
S | Exponent (single: 8 bits, double: 11 bits) | Fraction (single: 23 bits, double: 52 bits)

x = (−1)^S × (1 + Fraction) × 2^(Exponent − Bias)


❑ S: sign bit (0 → non-negative, 1 → negative)
❑ Normalize significand: 1.0 ≤ |significand| < 2.0
➢ Always has a leading pre-binary-point 1 bit, so no need to represent it
explicitly (hidden bit)
➢ Significand is Fraction with the “1.” restored
❑ Exponent = actual exponent + Bias (excess representation)
➢ Ensures exponent is unsigned
➢ Single: Bias = 127; Double: Bias = 1023
❑ Example: What number is represented by the single-precision float 11000000101000…00?
➢ S = 1, Fraction = 01000…00₂, Exponent = 10000001₂ = 129
➢ x = (−1)¹ × (1 + 0.01₂) × 2^(129 − 127) = (−1) × 1.25 × 2² = −5.0
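The decoding formula can be sketched in Python and cross-checked against the machine's own IEEE 754 interpretation via the standard-library struct module (helper name is illustrative; normalized values only):

```python
import struct

def decode_single(bits):
    """Decode a 32-bit pattern as a normalized IEEE 754 single."""
    s = int(bits[0])
    exponent = int(bits[1:9], 2)              # 8 exponent bits
    fraction = int(bits[9:], 2) / (1 << 23)   # 23 fraction bits
    return (-1) ** s * (1 + fraction) * 2.0 ** (exponent - 127)

bits = "11000000101000000000000000000000"
print(decode_single(bits))  # -5.0
# cross-check: reinterpret the same 4 bytes as a big-endian float
print(struct.unpack(">f", int(bits, 2).to_bytes(4, "big"))[0])  # -5.0
```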
IEEE 754 Special Cases

Number Sign Exponent Fraction

0 X 00000000 00000000000000000000000

∞ 0 11111111 00000000000000000000000

-∞ 1 11111111 00000000000000000000000

NaN X 11111111 non-zero


IEEE 754 Ranges
x = (−1)^S × (1 + Fraction) × 2^(Exponent − Bias), with Fraction of 23 bits (single) / 52 bits (double), Exponent of 8 bits (single) / 11 bits (double), and Bias = 127 (single) / 1023 (double)


❑ Exponents 000…00 and 111…11 are reserved
❑ Single-precision range:
➢ Smallest value: Exponent = 00000001 (i.e. actual exponent = 1 − 127 = −126), Fraction = 000…00 → significand = 1.0 → the smallest values are ±1.0 × 2⁻¹²⁶ ≈ ±1.2 × 10⁻³⁸.
➢ Largest value: Exponent = 11111110 (i.e. actual exponent = 254 − 127 = +127), Fraction = 111…11 → significand ≈ 2.0 → the largest values are ±2.0 × 2⁺¹²⁷ ≈ ±3.4 × 10⁺³⁸.
❑ Double-precision range:
➢ Smallest value: Exponent = 00000000001 (i.e. actual exponent = 1 − 1023 = −1022), Fraction = 000…00 → significand = 1.0 → the smallest values are ±1.0 × 2⁻¹⁰²² ≈ ±2.2 × 10⁻³⁰⁸.
➢ Largest value: Exponent = 11111111110 (i.e. actual exponent = 2046 − 1023 = +1023), Fraction = 111…11 → significand ≈ 2.0 → the largest values are ±2.0 × 2⁺¹⁰²³ ≈ ±1.8 × 10⁺³⁰⁸.
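These bounds can be verified numerically in Python; since Python floats are IEEE 754 doubles, sys.float_info reports the double-precision limits directly:

```python
import sys

# smallest/largest normalized single-precision magnitudes
smallest_single = 2.0 ** -126                   # ~1.18e-38
largest_single = (2 - 2 ** -23) * 2.0 ** 127    # ~3.40e+38
print(smallest_single, largest_single)

# Python floats are IEEE 754 doubles, so the runtime agrees with the slide
print(sys.float_info.min)  # smallest normalized double, ~2.2e-308
print(sys.float_info.max)  # largest double, ~1.8e+308
```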
Example

❑ Convert –0.8125 to binary in single and double precision.


❑ Solution:
➢ Fraction bits can be obtained using repeated multiplication by 2:
▪ 0.8125 × 2 = 1.625
▪ 0.625 × 2 = 1.25
▪ 0.25 × 2 = 0.5
▪ 0.5 × 2 = 1.0
▪ Stop when the fractional part is 0 → 0.8125 = (0.1101)₂
➢ (0.1101)₂ = (1.101)₂ × 2⁻¹ (normalized) → Fraction = 101000…0
➢ Exponent = −1 + Bias = 126 (single precision) or 1022 (double precision)

Single precision: 1 01111110 10100000000000000000000
Double precision: 1 01111111110 1010000000000000000000000000000000000000000000000000
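Both bit patterns can be confirmed with the standard-library struct module, which packs a value as IEEE 754 and lets us read back the raw bits (helper names are illustrative):

```python
import struct

def single_bits(x):
    """Bit pattern of x as an IEEE 754 single, via struct round-trip."""
    return format(struct.unpack(">I", struct.pack(">f", x))[0], "032b")

def double_bits(x):
    """Bit pattern of x as an IEEE 754 double."""
    return format(struct.unpack(">Q", struct.pack(">d", x))[0], "064b")

print(single_bits(-0.8125))  # 10111111010100000000000000000000
print(double_bits(-0.8125))  # 1 01111111110 1010...0 (sign, 11-bit exp, 52-bit frac)
```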
Underflow and denormal numbers
❑ Numbers below 2⁻¹²⁶ are too small to be represented (underflow) → how do we extend the range towards 0?
❑ Denormalized: change the behavior of IEEE 754 numbers when Exponent is 0 and Fraction ≠ 0
➢ The implicit 1. before the fraction now becomes 0. (not normalized)
➢ Ends up losing precision for small numbers but provides gradual underflow to 0

❑ Value of a denormalized number (S, 0, F)
➢ Single precision: (−1)^S × (0.F)₂ × 2⁻¹²⁶
➢ Double precision: (−1)^S × (0.F)₂ × 2⁻¹⁰²²
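The single-precision denormal formula can be sketched directly (helper name is illustrative); the smallest positive denormal, with only the last fraction bit set, comes out to 2⁻¹⁴⁹:

```python
def denormal_single(sign, fraction_bits):
    """Value of a single-precision denormal: (-1)^S x (0.F)_2 x 2^-126."""
    f = int(fraction_bits, 2) / (1 << 23)   # 0.F as a fraction
    return (-1) ** sign * f * 2.0 ** -126

# smallest positive single denormal: only the last of 23 fraction bits set
print(denormal_single(0, "0" * 22 + "1"))  # 2^-149, about 1.4e-45
```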

[Number line: negative overflow below −2¹²⁸ | normalized (−ve) | denormals around 0 | normalized (+ve) | positive overflow above 2¹²⁸; normalized singles span magnitudes 2⁻¹²⁶ to 2¹²⁸, and denormals fill the underflow gap between −2⁻¹²⁶ and 2⁻¹²⁶]
Summary
❑ Course overview
➢ You’ll learn the HW/SW interface in a modern (=Von Neumann) computer

❑ Von Neumann model


➢ Instructions are stored in a linear memory → can be modified like data.
➢ Instructions are fetched and executed in CPU sequentially.

❑ Computer architecture
➢ Represented in abstract layers to manage complexity
➢ ISA = what the computer does; Organization = how the ISA is implemented;
Realization = implementation of the ISA on specific integrated circuits.

❑ Computer performance
➢ Is the reciprocal of CPU time
➢ Also follows the classical CPU performance equation

❑ Next week: ISA design.
