CCS 1202 Lecture 2: Computer Evolution and Performance
Introduction to Computer Evolution
• Definition: The development and transformation of computer systems over
time.
• Key Milestones:
• First Generation (1946-1954): Vacuum tubes; examples: ENIAC, UNIVAC.
• Second Generation (1955-1965): Transistors; examples: IBM 1401.
• Third Generation (1968-1975): Integrated Circuits (ICs); examples: IBM
System/360.
• Fourth Generation (1976-1980): Microprocessors; examples: Intel 4004.
• Fifth Generation (1980-Present): Advanced microprocessors, AI, parallel
processing.
| Generation | Technology | Characteristics |
|---|---|---|
| Second Generation (1955 – 1965) | Transistors | Reliable, faster, still costly, generated less heat, smaller size than first generation |
| Third Generation (1968 – 1975) | Integrated Circuits (IC) | More reliable, faster, still costly, less maintenance, smaller size than second generation |
| Fourth Generation (1976 – 1980) | Very Large Scale Integrated Circuits (VLSI) | Very cheap, portable and reliable, very small size |
| Fifth Generation (1980 – today) | Ultra Large Scale Integrated Circuits (ULSI) | Very powerful and compact computers at cheaper rates |
Embedded Systems
• The use of electronics and software within a product
• Billions of computer systems are produced each year that are
embedded within larger devices
• Today many devices that use electric power have an embedded
computing system
• Often embedded systems are tightly coupled to their
environment
• This can give rise to real-time constraints imposed by the need to interact
with the environment
• Constraints such as required speeds of motion, required precision of
measurement, and required time durations, dictate the timing of software
operations
• If multiple activities must be managed simultaneously this imposes more
complex real-time constraints
Deeply Embedded Systems
• Subset of embedded systems
• Uses a microcontroller rather than a microprocessor
• Is not programmable once the program logic for the device has been
burned into ROM
• Dedicated, single-purpose devices that detect something in the
environment, perform a basic level of processing, and then do
something with the results
• Often have wireless capability and appear in networked configurations,
such as networks of sensors deployed over a large area
• Typically have extreme resource constraints in terms of memory,
processor size, time, and power consumption
• Examples: smart TVs, game consoles, pacemakers, voice assistants
The Internet of Things (IoT)
• Term that refers to the expanding interconnection of smart devices, ranging
from appliances to tiny sensors
• Is primarily driven by deeply embedded devices
• Generations of deployment culminating in the IoT:
• Information technology (IT)
• PCs, servers, routers, firewalls, and so on, bought as IT devices by enterprise IT people and
primarily using wired connectivity
• Operational technology (OT)
• Machines/appliances with embedded IT built by non-IT companies, such as medical machinery,
SCADA, process control, and kiosks, bought as appliances by enterprise OT people and
primarily using wired connectivity
• Personal technology
• Smartphones, tablets, and eBook readers bought as IT devices by consumers exclusively using
wireless connectivity and often multiple forms of wireless connectivity
• Sensor/actuator technology
• Single-purpose devices bought by consumers, IT, and OT people exclusively using wireless
connectivity, generally of a single form, as part of larger systems
Performance
• Performance is the key to understanding the underlying motivation for
the hardware and its organization
• Measure, report, and summarize performance to enable users to
• make intelligent choices
• see through the marketing hype!
The Role of Performance
• Hardware performance is key to the effectiveness of the entire
system
• Performance has to be measured and compared to evaluate
various design and technological approaches
• To optimize performance, the major factors affecting it have to be
known
• For different types of applications, different performance metrics
may be appropriate and different aspects of a computer system
may be most significant
• Instruction use and implementation, memory hierarchy, and I/O
handling are among the factors that affect performance
Historical Context and Performance Metrics
• Key Factors in Historical Performance:
• Clock Speed (MHz/GHz): The number of clock cycles per second; for a
given processor design, a higher clock speed executes instructions faster.
• Instruction Set Architecture (ISA): The set of commands a processor can
execute.
• Memory Size: Evolution from kilobytes to terabytes.
• Performance Benchmarks:
• MIPS (Millions of Instructions Per Second).
• FLOPS (Floating Point Operations Per Second) for high-performance
computing.
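As a rough illustration of the two rates listed above, the sketch below computes MIPS and FLOPS from an operation count and a measured execution time. The function names and the sample numbers are illustrative assumptions, not figures from the lecture.

```python
def mips(instruction_count: float, execution_time_s: float) -> float:
    """Millions of instructions executed per second."""
    return instruction_count / (execution_time_s * 1e6)


def flops(floating_point_ops: float, execution_time_s: float) -> float:
    """Floating-point operations executed per second."""
    return floating_point_ops / execution_time_s


# Hypothetical program: 2 billion instructions, of which 500 million are
# floating-point operations, finishing in 4 seconds.
print(mips(2e9, 4.0))   # 500.0
print(flops(5e8, 4.0))  # 125000000.0
```

Both figures depend on the program being run, so they are only comparable when the same workload is used on each machine.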
Cost-Performance
• Purchasing perspective: from a collection of machines, choose one
which has
• best performance?
• least cost?
• best performance/cost?
• Computer designer perspective: faced with design options, select
one which has
• best performance improvement?
• least cost?
• best performance/cost?
• Both require: basis for comparison and metric for evaluation
Two “notions” of performance
• Which computer has better performance?
• User: one which runs a program in less time
• Computer centre manager: one which completes more jobs in a given time
Computer Performance
• Response Time (elapsed time, latency): individual user concerns…
• how long does it take for my job to run?
• how long does it take to execute (start to finish) my job?
Performance Metrics
Response (execution) time:
• The time between the start and the completion of a task
• Measures user perception of the system speed
• Common in reactive and time critical systems, single-user computer, etc.
Throughput:
• The total number of tasks done in a given time
• Most relevant to batch processing (billing, credit card processing, etc.)
• Mainly used for input/output systems (disk access, printer, etc.)
Examples:
• Replacing the processor of a computer with a faster version
enhances BOTH response time and throughput
• Adding processors to a system that uses multiple processors
for separate tasks (e.g. handling of an airline reservation system)
improves throughput ONLY
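The two examples can be checked with a toy model. The sketch below assumes every task takes the same fixed time on one processor and that independent tasks spread perfectly across processors; these assumptions, and the function names, are illustrative only.

```python
def response_time(task_time_s: float, speed_factor: float = 1.0) -> float:
    """Time for one task on a processor that is `speed_factor` times faster."""
    return task_time_s / speed_factor


def throughput(task_time_s: float, processors: int = 1,
               speed_factor: float = 1.0) -> float:
    """Tasks completed per second when independent tasks run on separate processors."""
    return processors / response_time(task_time_s, speed_factor)


# Baseline: one processor, 2 seconds per task.
base = (response_time(2.0), throughput(2.0))

# A processor twice as fast: response time halves and throughput doubles.
faster_cpu = (response_time(2.0, 2.0), throughput(2.0, 1, 2.0))

# Four of the original processors: response time is unchanged,
# but four tasks finish in the time one used to.
more_cpus = (response_time(2.0), throughput(2.0, processors=4))

print(base, faster_cpu, more_cpus)
```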
Designer’s Performance Metrics
• Users and designers measure performance using different metrics
• Designers look at the bottom line of program execution
$$\text{CPU execution time for a program} = \text{CPU clock cycles for a program} \times \text{Clock cycle time}$$
Calculation of CPU Time
$$\text{CPU time} = \text{Instruction count} \times \text{CPI} \times \text{Clock cycle time}$$
or
$$\text{CPU time} = \frac{\text{Instruction count} \times \text{CPI}}{\text{Clock rate}}$$
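A minimal sketch of the two equivalent forms of the CPU time equation follows. The function names and the sample values (one billion instructions, a CPI of 2, a 500 MHz clock) are assumptions chosen for illustration.

```python
def cpu_time(instruction_count: float, cpi: float, clock_rate_hz: float) -> float:
    """CPU time in seconds: (instruction count x CPI) / clock rate."""
    return instruction_count * cpi / clock_rate_hz


def cpu_time_from_cycle_time(instruction_count: float, cpi: float,
                             cycle_time_s: float) -> float:
    """CPU time in seconds: instruction count x CPI x clock cycle time."""
    return instruction_count * cpi * cycle_time_s


# Hypothetical program: 1e9 instructions, CPI of 2, 500 MHz clock.
seconds = cpu_time(1e9, 2.0, 500e6)
print(seconds)  # 4.0

# The cycle time is the reciprocal of the clock rate, so both forms agree
# up to floating-point rounding.
print(cpu_time_from_cycle_time(1e9, 2.0, 1 / 500e6))
```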
Execution Time
• Elapsed Time
• counts everything (disk and memory accesses, waiting for I/O, running other programs, etc.)
from start to finish
• a useful number, but often not good for comparison purposes
elapsed time = CPU time + wait time (I/O, other programs, etc.)
• CPU time
• doesn't count waiting for I/O or time spent running other programs
• can be divided into user CPU time and system CPU time (OS calls)
CPU time = user CPU time + system CPU time
elapsed time = user CPU time + system CPU time + wait time
• Our focus: user CPU time (CPU execution time or, simply, execution time)
• time spent executing the lines of code that are in our program
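The split between elapsed, user CPU, system CPU, and wait time can be observed directly. The sketch below uses Python's standard `time.perf_counter` and `os.times`; the `measure` helper and the `workload` function are hypothetical names introduced here, and the wait time is simply whatever remains after subtracting CPU time from elapsed time.

```python
import os
import time


def measure(func):
    """Run func() once and report elapsed, user CPU, system CPU and wait time (seconds)."""
    wall_start = time.perf_counter()
    cpu_start = os.times()
    func()
    cpu_end = os.times()
    wall_end = time.perf_counter()

    elapsed = wall_end - wall_start
    user = cpu_end.user - cpu_start.user        # time spent in our own code
    system = cpu_end.system - cpu_start.system  # time spent in OS calls on our behalf
    # Clamp at zero: clock resolution (or multiple threads) can make the
    # CPU total slightly exceed the wall-clock reading.
    wait = max(0.0, elapsed - (user + system))
    return {"elapsed": elapsed, "user": user, "system": system, "wait": wait}


def workload():
    sum(i * i for i in range(200_000))  # CPU-bound work, counted as user CPU time
    time.sleep(0.05)                    # idle waiting, counted only in elapsed time


report = measure(workload)
print(report)
```

The sleep shows up in the elapsed and wait figures but not in the CPU figures, which is why elapsed time alone is often a poor basis for comparing processors.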
CPU Execution Time
$$\text{CPU time} = \text{CPU clock cycles for a program} \times \text{Clock cycle time}$$
$$\text{CPU time} = \frac{\text{CPU clock cycles for a program}}{\text{Clock rate}}$$
Definitions
• Instruction count (IC) = Number of instructions executed
• Clock cycles per instruction (CPI):
$$\text{CPI} = \frac{\text{CPU clock cycles for a program}}{\text{IC}}$$
• CPI is one way to compare two machines with the same instruction set,
since the instruction count would be the same
CPU Time
• CPU execution time can be measured by running the program
• The clock cycle time is usually published by the manufacturer
• Measuring the CPI and instruction count is not trivial
• Instruction counts can be measured by software profiling, by using an
architecture simulator, or by using hardware counters on some architectures
• The CPI depends on many factors, including processor structure, memory
system, the mix of instruction types, and the implementation of these
instructions
• Designers use the following formula, where $C_i$ is the number of executed
instructions of class $i$, $\text{CPI}_i$ is the average number of cycles per
instruction for that class, and $n$ is the number of instruction classes:
$$\text{CPU clock cycles} = \sum_{i=1}^{n} \text{CPI}_i \times C_i$$
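The summation over instruction classes can be sketched as below. The instruction mix is hypothetical, and `total_cycles` and `average_cpi` are names introduced here for illustration.

```python
def total_cycles(instruction_mix):
    """Sum CPI_i x C_i over a list of (cpi_i, count_i) pairs, one per instruction class."""
    return sum(cpi * count for cpi, count in instruction_mix)


def average_cpi(instruction_mix):
    """Overall CPI: total clock cycles divided by total instruction count."""
    instruction_count = sum(count for _, count in instruction_mix)
    return total_cycles(instruction_mix) / instruction_count


# Hypothetical mix as (CPI, instruction count) pairs for three classes.
mix = [(1, 50_000), (2, 30_000), (4, 20_000)]

print(total_cycles(mix))  # 190000
print(average_cpi(mix))   # 1.9
```

The overall CPI is pulled toward the classes that execute most often, which is why the instruction mix matters as much as the per-class cycle counts.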
[Figure: clock signal showing the cycle time as the interval between two successive ticks]
• cycle time = time between ticks = seconds per cycle
• clock rate (frequency) = cycles per second (1 Hz = 1 cycle/sec, 1 MHz = $10^{6}$ cycles/sec)
• Example: A 200 MHz clock has a cycle time of
$$\frac{1}{200 \times 10^{6}} \times 10^{9} = 5 \text{ nanoseconds}$$
How many cycles are required for a program?
Performance Equation
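$$\frac{\text{seconds}}{\text{program}} = \frac{\text{cycles}}{\text{program}} \times \frac{\text{seconds}}{\text{cycle}}$$
equivalently
$$\text{CPU execution time for a program} = \text{CPU clock cycles for a program} \times \text{Clock cycle time}$$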
Example
• Our favorite program runs in 10 seconds on computer A, which has a 400 MHz
clock.
• We are trying to help a computer designer build a new machine B that will run
this program in 6 seconds. The designer can use new (or perhaps more
expensive) technology to substantially increase the clock rate, but has informed
us that this increase will affect the rest of the CPU design, causing machine B to
require 1.2 times as many clock cycles as machine A for the same program.
What clock rate should machine B have?
Example: Answer
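$$\text{CPU time}(A) = \frac{\text{CPU clock cycles}}{\text{Clock rate}(A)}$$
$$10 \text{ seconds} = \frac{\text{CPU clock cycles of the program}}{400 \times 10^{6} \text{ cycles/second}}$$
$$\text{CPU clock cycles of the program} = 10 \text{ seconds} \times 400 \times 10^{6} \text{ cycles/second} = 4000 \times 10^{6} \text{ cycles}$$
To get the clock rate of the faster computer, we use the same formula:
$$6 \text{ seconds} = \frac{1.2 \times \text{CPU clock cycles of the program}}{\text{clock rate}(B)} = \frac{1.2 \times 4000 \times 10^{6} \text{ cycles}}{\text{clock rate}(B)}$$
$$\text{clock rate}(B) = \frac{1.2 \times 4000 \times 10^{6} \text{ cycles}}{6 \text{ seconds}} = 800 \times 10^{6} \text{ cycles/second}$$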
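The same calculation can be reproduced in a few lines. This is a sketch only; `required_clock_rate` and its parameter names are hypothetical, while the inputs are the values given in the example.

```python
def required_clock_rate(time_a_s: float, clock_rate_a_hz: float,
                        cycle_ratio: float, target_time_s: float) -> float:
    """Clock rate (Hz) machine B needs to finish in target_time_s.

    cycle_ratio is how many times more clock cycles B needs than A
    for the same program.
    """
    cycles_a = time_a_s * clock_rate_a_hz  # cycles = CPU time x clock rate
    cycles_b = cycle_ratio * cycles_a
    return cycles_b / target_time_s        # clock rate = cycles / CPU time


rate_b = required_clock_rate(time_a_s=10, clock_rate_a_hz=400e6,
                             cycle_ratio=1.2, target_time_s=6)
print(f"{rate_b / 1e6:.0f} MHz")  # 800 MHz
```

Machine B therefore needs twice the clock rate of machine A to gain a speedup of only 10/6, because part of the higher clock rate is spent on the extra cycles.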
Quantitative Principles of Design
• Where to spend time making improvements?
• Make the Common Case Fast
• Most important principle of computer design: spend your time on
improvements where those improvements will do the most good
• Example: if operation A accounts for only about 5% of execution time,
then even driving the time for A to 0 makes the CPU only about 5% faster
• Key questions
• What is the frequent case?
Amdahl’s Law
• Suppose that we make an enhancement to a machine that will
improve its performance. The speedup ratio is:
$$\text{Speedup} = \frac{\text{ExTime for entire task without enhancement}}{\text{ExTime for entire task using enhancement}}$$
$$\text{Speedup} = \frac{\text{Performance for entire task using enhancement}}{\text{Performance for entire task without enhancement}}$$
An Example
• Enhancement runs 10 times faster and it affects 40% of the
execution time
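• $\text{Fraction}_{\text{enhanced}} = 0.40$
• $\text{Speedup}_{\text{enhanced}} = 10$
• $\text{Speedup}_{\text{overall}} = ?$
$$\text{Speedup}_{\text{overall}} = \frac{1}{(1 - \text{Fraction}_{\text{enhanced}}) + \dfrac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}}}$$
$$\text{Speedup}_{\text{overall}} = \frac{1}{(1 - 0.4) + \dfrac{0.4}{10}} = \frac{1}{0.64} \approx 1.56$$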
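The worked example can be checked with a short function. This is a minimal sketch of Amdahl's Law as stated above; `amdahl_speedup` and the input validation are additions made here for illustration.

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup when `fraction_enhanced` of the original execution
    time is made `speedup_enhanced` times faster."""
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must be between 0 and 1")
    if speedup_enhanced <= 0:
        raise ValueError("speedup_enhanced must be positive")
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


# Values from the example: 40% of the execution time runs 10 times faster.
print(round(amdahl_speedup(0.40, 10), 2))  # 1.56
```

A tenfold improvement to 40% of the execution time yields well under a twofold gain overall, because the untouched 60% still runs at the original speed.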
Enhancing Performance: Hardware Strategies
• Multi-Core and Many-Core Processors:
• Increased parallelism by incorporating multiple processing units.
• Examples: Intel Core i7, AMD Ryzen.
• Vector Processing and SIMD (Single Instruction, Multiple Data):
• Specialized instructions that process multiple data points simultaneously.
• GPU Acceleration:
• Highly parallelized architecture, ideal for tasks like rendering and AI.
Enhancing Performance: Software and Algorithmic Strategies
• Optimizing Code:
• Techniques like loop unrolling, using efficient data structures.
• Parallel Programming:
• Using frameworks such as OpenMP (multi-threaded CPU code) and CUDA (GPU code) for parallel processing.
• Cache Optimization:
• Minimizing cache misses through data locality.
• Compiler Optimizations:
• Advanced compiler techniques that restructure code for better
performance.
• High-Level Abstractions:
• Use of parallel algorithms in higher-level programming languages.
Challenges in Performance Enhancement
• Power Consumption and Heat Dissipation:
• Challenges with keeping performance high while managing thermal
output.
• Memory Wall:
• The gap between CPU speed and memory access speed.
• Amdahl’s Law Limitation:
• Real-world scenarios often make it impossible to achieve theoretical
maximum speedups.
• Scalability Issues:
• Managing parallelism effectively as processor count increases.
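The Amdahl's Law limitation and the scalability issue can be illustrated together. The sketch below assumes the enhancement is parallel execution, so that the enhanced fraction speeds up by the number of processors; the 90% parallel fraction and the function names are assumptions made for illustration.

```python
def parallel_speedup(parallel_fraction: float, processors: int) -> float:
    """Amdahl's Law with the parallel part sped up by the processor count."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


def speedup_limit(parallel_fraction: float) -> float:
    """Upper bound on speedup as the processor count grows without limit."""
    return 1.0 / (1.0 - parallel_fraction)


# Hypothetical program in which 90% of the execution time can be parallelized.
for n in (1, 2, 4, 8, 16, 1024):
    print(n, round(parallel_speedup(0.9, n), 2))

print("limit:", round(speedup_limit(0.9), 2))
```

Each doubling of the processor count adds less speedup than the one before, and the serial 10% caps the gain no matter how many processors are added. Real systems usually fall short of even this bound because of communication and synchronization overhead.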