Chapter01

The document provides an overview of the evolution of computer technology, detailing the progression from vacuum tubes to modern microprocessors, including key developments like the Von Neumann architecture and Intel's x86 evolution. It discusses performance issues in computer design, including techniques for improving microprocessor speed and the impact of Moore's Law on chip development. Additionally, it covers embedded systems and ARM architecture, emphasizing their significance in current computing environments.

Uploaded by buiphat251399
Copyright © All Rights Reserved

Computer Evolution & Designing for Performance
CHAPTER 01

Contents

• An overview of the evolution of computer technology.
• The von Neumann (IAS) machine.
• The key performance issues relating to computer design.
• An overview of the evolution of the x86 architecture.
• The issues in computer performance assessment.

Evolution of Computer Technology
• The generations of computers
• Evolution of Intel processors

The First Generation

Vacuum tube

• The ENIAC (Electronic Numerical Integrator And Computer), designed and constructed at the University of Pennsylvania, was the world’s first general-purpose electronic digital computer.
• It weighed 30 tons, occupied 1500 square feet of floor space, and contained more than 18,000 vacuum tubes. When operating, it consumed 140 kilowatts of power. It was capable of 5000 additions per second.
• The major drawback of the ENIAC was that it had to be programmed manually by setting switches and plugging and unplugging cables.

The First Generation

The von Neumann machine:

• In 1946, von Neumann and his colleagues began the design of a new stored-program computer, referred to as the IAS computer, at the Institute for Advanced Study in Princeton.
• Although it was not completed until 1952, it is the prototype of all subsequent general-purpose computers.

The First Generation

The structure of the IAS computer:

The IAS memory structure

• The memory of the IAS consists of 1000 storage locations, called words, of 40 bits each; both data and instructions are stored there.
• Each instruction is a binary code. Each number is represented by a sign bit and a 39-bit value.
• A word may also contain two 20-bit instructions, each consisting of an 8-bit operation code (opcode) specifying the operation to be performed and a 12-bit address designating one of the words in memory (numbered from 0 to 999).
• The control unit operates the IAS by fetching instructions from memory and executing them one at a time.
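To make the two-instructions-per-word layout concrete, here is a minimal Python sketch that packs two 20-bit instructions into one 40-bit word and unpacks them again. It assumes bit 0 is the leftmost (most significant) bit, so the left-hand instruction occupies the high 20 bits; the function names `pack_word` and `unpack_word` are hypothetical, chosen only for illustration.

```python
def pack_word(left_op: int, left_addr: int, right_op: int, right_addr: int) -> int:
    """Pack two (8-bit opcode, 12-bit address) instructions into a 40-bit IAS word."""
    for op in (left_op, right_op):
        if not 0 <= op < 2 ** 8:
            raise ValueError("opcode must fit in 8 bits")
    for addr in (left_addr, right_addr):
        if not 0 <= addr < 1000:  # IAS memory words are numbered 0..999
            raise ValueError("address must be in 0..999")
    left = (left_op << 12) | left_addr     # 20-bit left-hand instruction
    right = (right_op << 12) | right_addr  # 20-bit right-hand instruction
    return (left << 20) | right            # left instruction in the high 20 bits


def unpack_word(word: int) -> tuple[tuple[int, int], tuple[int, int]]:
    """Split a 40-bit word into ((left_op, left_addr), (right_op, right_addr))."""
    left, right = word >> 20, word & 0xFFFFF
    return (left >> 12, left & 0xFFF), (right >> 12, right & 0xFFF)


word = pack_word(0x01, 500, 0x21, 501)
print(unpack_word(word))
```

Round-tripping a word this way shows that each half is simply an 8-bit field followed by a 12-bit field.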

The IAS memory formats



The IAS registers

• The control unit operates the IAS by fetching instructions from memory and executing them one at a time.
• Both the control unit and the ALU contain storage locations, called registers.

The IAS registers

• Memory buffer register (MBR): Contains a word to be stored in memory or sent to the I/O unit, or is used to receive a word from memory or from the I/O unit.
• Memory address register (MAR): Specifies the address in memory of the word to be written from or read into the MBR.
• Instruction register (IR): Contains the 8-bit opcode of the instruction being executed.
• Instruction buffer register (IBR): Employed to hold temporarily the right-hand instruction from a word in memory.
• Program counter (PC): Contains the address of the next instruction pair to be fetched from memory.
• Accumulator (AC) and multiplier quotient (MQ): Employed to hold temporarily operands and results of ALU operations. For example, the result of multiplying two 40-bit numbers is an 80-bit number; the most significant 40 bits are stored in the AC and the least significant 40 bits in the MQ.
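The way MAR, MBR, IR, IBR, and PC cooperate during instruction fetch can be sketched as a small Python model. This is a simplified illustration, not a full IAS simulator: it assumes bit 0 is the leftmost bit of a 40-bit word, always starts with the left-hand instruction, and does not model jumps or execution. The class name `IASFetchUnit` is hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IASFetchUnit:
    """Simplified IAS fetch cycle (assumption: no jumps, left instruction first)."""
    memory: list
    pc: int = 0
    mar: int = 0
    mbr: int = 0
    ir: int = 0
    ibr: Optional[int] = None  # pending right-hand instruction, if any

    def fetch(self) -> tuple:
        if self.ibr is not None:
            # Next instruction already buffered: IR <- IBR(0:7), MAR <- IBR(8:19)
            self.ir, self.mar = self.ibr >> 12, self.ibr & 0xFFF
            self.ibr = None
            self.pc += 1  # both halves of this word used; move to the next pair
        else:
            self.mar = self.pc                   # MAR <- PC
            self.mbr = self.memory[self.mar]     # MBR <- M(MAR)
            self.ibr = self.mbr & 0xFFFFF        # IBR <- MBR(20:39), right instruction
            self.ir = (self.mbr >> 32) & 0xFF    # IR  <- MBR(0:7), left opcode
            self.mar = (self.mbr >> 20) & 0xFFF  # MAR <- MBR(8:19), left address
        return self.ir, self.mar


word0 = (0x01 << 32) | (5 << 20) | (0x02 << 12) | 6
cpu = IASFetchUnit(memory=[word0, 0])
print(cpu.fetch(), cpu.fetch(), cpu.pc)
```

Because the IBR holds the right-hand instruction, only every second fetch needs a memory access.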

The IAS instruction set example



The First Generation

THE COMMERCIAL COMPUTERS
• The UNIVAC I (Universal Automatic Computer) (1950) was the first successful commercial computer. It was intended for both scientific and commercial applications.
• The UNIVAC II, which had greater memory capacity and higher performance than the UNIVAC I, was delivered in the late 1950s.
• IBM, then the major manufacturer of punched-card processing equipment, delivered its first electronic stored-program computer, the IBM 701, in 1953. The 701 was intended primarily for scientific applications.
• The IBM 702 (1955) had a number of hardware features that suited it to business applications.

The Second Generation


TRANSISTORS

The second generation brought more complex arithmetic and logic units and control units, the use of high-level programming languages, and the provision of system software with the computer.

The Second Generation


• PDP-1 (1957), delivered by DEC
• IBM 7094

The Third Generation


INTEGRATED CIRCUITS (IC)

The Third Generation


THE IBM SYSTEM/360

• The industry’s first planned family of computers, covering a wide range of performance and cost.

The Third Generation


THE DEC PDP-8

• Low cost: $16,000, compared with the hundreds of thousands of dollars of an IBM System/360 mainframe.
• Small size: another manufacturer could purchase a PDP-8 and integrate it into a total system for resale.

Later Generations

Processor Fabrication Process

The growth of transistor count

Moore’s Law

• The number of transistors that could be put on the same chip area was doubling every 18-24 months.
• Consequences:
  • Processor speed → faster
  • Energy per operation → lower, but packing so many transistors together makes the total heat large
  • Memory capacity → larger
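As a rough illustration of what doubling every 18-24 months implies, the sketch below computes the growth factor over a span of years. The 10-year horizon is an arbitrary example, not a figure from the slides.

```python
def growth_factor(years: float, doubling_months: float) -> float:
    """Factor by which a quantity grows if it doubles every `doubling_months` months."""
    return 2 ** (years * 12 / doubling_months)


for months in (18, 24):
    print(f"doubling every {months} months -> x{growth_factor(10, months):.1f} in 10 years")
```

Over a decade, the 18-month and 24-month rates differ by roughly a factor of three (about 100x versus 32x).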

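THE CONSEQUENCES OF MOORE’S LAW (p.30)

1. The cost of a chip has remained virtually unchanged with the rapid growth in density → the cost of computer logic and memory circuitry has fallen dramatically.
2. Logic and memory elements are placed closer together on more densely packed chips → the electrical path length is shortened, increasing operating speed.
3. The computer size → becomes smaller, making it more convenient to place in a variety of environments.
4. The interconnections on the integrated circuit → are much more reliable than solder connections; with more circuitry on each chip, there are fewer interchip connections.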

Memory Wall

According to Moore’s law:

• Instructions/second: 2x every 18-24 months
• Memory capacity: 2x every 18-24 months
• Memory performance: 1.1x every 18-24 months
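Compounding these two rates shows why the gap is called a wall. If processor throughput doubles each period while memory performance improves only 1.1x, the ratio between them grows by about 2/1.1 ≈ 1.82x per period. The number of periods printed below is an arbitrary example.

```python
PROCESSOR_GROWTH = 2.0  # instructions/second gain per 18-24 month period
MEMORY_GROWTH = 1.1     # memory performance gain per period


def performance_gap(periods: int) -> float:
    """Ratio of processor to memory performance improvement after `periods` periods."""
    return (PROCESSOR_GROWTH / MEMORY_GROWTH) ** periods


for n in range(0, 11, 2):
    print(f"after {n:2d} periods: processor/memory gap x{performance_gap(n):,.1f}")
```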

The Microprocessors
• In 1971, Intel developed the 4004, the first chip to contain all of the components of a CPU on a single chip. The 4004 can add two 4-bit numbers and can multiply only by repeated addition.
• In 1972, Intel developed the 8008, the first 8-bit microprocessor, which was almost twice as complex as the 4004.
• In 1974, Intel developed the 8080 (8-bit), which was designed to be the CPU of a general-purpose microcomputer.
• By the end of the 1970s, general-purpose 16-bit microprocessors had appeared. One of these was the 8086.

Evolution of Intel Processors


1970s

Evolution of Intel Processors


1980s

Evolution of Intel Processors


1990s

Evolution of Intel Processors


Recent processors

Designing for Performance
• Microprocessor speed techniques
• Performance balance
• Multicore, MICs and GPGPUs
• Evolution of Intel x86 Architecture

Microprocessor Speed Techniques

• Pipelining
• Branch prediction
• Speculative execution
• Data flow analysis

Pipelining
The processor works on multiple instructions simultaneously, each at a different stage of execution.
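A simple way to quantify the benefit, assuming an ideal k-stage pipeline (one stage per cycle, no stalls or hazards, which real pipelines do not fully achieve): n instructions finish in k + n − 1 cycles instead of n × k. The 5-stage, 100-instruction numbers below are only an example.

```python
def unpipelined_cycles(n: int, k: int) -> int:
    """Each instruction passes through all k stages before the next one starts."""
    return n * k


def pipelined_cycles(n: int, k: int) -> int:
    """Ideal pipeline: k cycles to fill, then one instruction completes per cycle."""
    return k + n - 1 if n > 0 else 0


n, k = 100, 5
print(unpipelined_cycles(n, k), pipelined_cycles(n, k))
print(f"speedup = {unpipelined_cycles(n, k) / pipelined_cycles(n, k):.2f}")
```

As n grows, the ideal speedup approaches k, the number of stages.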

Other techniques
• Branch prediction: The processor looks ahead in the instruction code fetched from memory and predicts which branches, or groups of instructions, are likely to be processed next.
• Speculative execution: Using branch prediction and data flow analysis, some processors speculatively execute instructions ahead of their actual appearance in the program execution, holding the results in temporary locations.
• Data flow analysis: The processor analyzes which instructions are dependent on each other’s results, or data, to create an optimized schedule of instructions.
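The slides do not say how a prediction is made. One common scheme, shown here purely as an illustration, is a 2-bit saturating counter per branch. States 0-1 predict "not taken", states 2-3 predict "taken", and the counter moves one step toward each actual outcome, so a single surprise does not flip a strongly held prediction. The class name and the initial state are hypothetical choices.

```python
class TwoBitPredictor:
    """2-bit saturating counter: states 0,1 predict not taken; 2,3 predict taken."""

    def __init__(self, state: int = 2) -> None:
        self.state = state  # start in "weakly taken" (an arbitrary choice)

    def predict(self) -> bool:
        return self.state >= 2

    def update(self, taken: bool) -> None:
        # Move one step toward the actual outcome, saturating at 0 and 3.
        self.state = min(self.state + 1, 3) if taken else max(self.state - 1, 0)


def accuracy(outcomes: list) -> float:
    predictor = TwoBitPredictor()
    correct = 0
    for taken in outcomes:
        correct += predictor.predict() == taken
        predictor.update(taken)
    return correct / len(outcomes)


# A loop branch taken 9 times, then not taken once at loop exit, repeated 3 times.
loop_branch = ([True] * 9 + [False]) * 3
print(f"accuracy: {accuracy(loop_branch):.0%}")
```

For this loop pattern, the predictor misses only the loop exit, because one "not taken" outcome is not enough to change its prediction for the next pass.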



Performance balance
Processor → Memory (p.39):
• Increase the number of bits retrieved at one time by making DRAMs “wider” → wider data bus.
• Reduce the frequency of memory access by incorporating increasingly complex and efficient cache structures between the processor and main memory, including one or more caches on the processor chip as well as an off-chip cache close to the processor chip.
• Increase the interconnect bandwidth between processors and memory by using higher-speed buses.
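Peak memory bandwidth is roughly bus width times transfer rate. That is why both "wider" DRAM and higher-speed buses help. The widths and rates below are made up purely for illustration.

```python
def peak_bandwidth_gbps(bus_width_bits: int, transfers_per_second: float) -> float:
    """Peak bandwidth in gigabytes per second (ignores protocol overhead)."""
    return bus_width_bits / 8 * transfers_per_second / 1e9


base = peak_bandwidth_gbps(32, 800e6)     # hypothetical 32-bit bus at 800 million transfers/s
wider = peak_bandwidth_gbps(64, 800e6)    # make the data path "wider"
faster = peak_bandwidth_gbps(64, 1600e6)  # and use a higher-speed bus
print(base, wider, faster)
```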

Performance balance
Handling of I/O devices (data rates in bps)

Performance balance
Improvements in Chip Organization and Architecture (p.41):
• Increase the hardware speed of the processor:
  • Shrink the size of the logic gates on the processor chip (fabrication process) → the propagation time for signals is significantly reduced → the processor speeds up.
  • Increase the clock rate → individual operations are executed more rapidly.

Performance balance
• Increase the size and speed of caches between the processor and main memory (in particular, by dedicating a portion of the processor chip itself to the cache) → cache access times drop significantly.
• Make changes to the processor organization and architecture that increase the effective speed of instruction execution. Typically, this involves using parallelism in one form or another.

Performance balance
Processor trends
Intel x86 microprocessors over 8 generations and 25 years

Multicore, MICs, GPUs

Improvements in Chip Organization and Architecture:
• Increasing the hardware speed (clock speed) of the processor runs into obstacles:
  • Heat dissipation (W/cm²)
  • RC delay
  • Memory latency
• The use of multiple processors on the same chip, also referred to as multiple cores, or multicore, provides the potential to increase performance without increasing the clock rate.
• Chip manufacturers are now in the process of making a huge leap forward in the number of cores per chip (more than 50).
• The leap in performance, as well as the challenges in developing software to exploit such a large number of cores, has led to the introduction of a new term: many integrated core (MIC).

Evolution of Intel x86 Architecture


Two processor families:

• Intel x86: incorporates the sophisticated design principles once found only on mainframes, supercomputers, and servers (CISC – Complex Instruction Set Computer).
• ARM: the ARM architecture is used in a wide variety of embedded systems and is one of the most powerful and best-designed RISC-based systems on the market (RISC – Reduced Instruction Set Computer).

CISC vs RISC
Topic for students to do research and present.

Evolution of Intel x86 Architecture


• 8080 (8-bit): The world’s first general-purpose microprocessor. The 8080 was used in the first personal computer, the Altair.
• 8086 (16-bit): Sported an instruction cache, or queue, that prefetches a few instructions before they are executed. A variant of this processor, the 8088, was used in IBM’s first personal computer. The 8086 is the first appearance of the x86 architecture.
• 80286: This extension of the 8086 enabled addressing a 16-MB memory instead of just 1 MB.
• 80386: Intel’s first 32-bit machine. With a 32-bit architecture, the 80386 rivaled the complexity and power of minicomputers and mainframes introduced just a few years earlier. It was the first Intel processor to support multitasking.

Evolution of Intel x86 Architecture


• 80486: The 80486 introduced the use of much more sophisticated and powerful cache technology and sophisticated instruction pipelining. The 80486 also offered a built-in math coprocessor, offloading complex math operations from the main CPU.
• Pentium: With the Pentium, Intel introduced the use of superscalar techniques, which allow multiple instructions to execute in parallel.
• Pentium Pro: The Pentium Pro continued the move into superscalar organization begun with the Pentium, with aggressive use of register renaming, branch prediction, data flow analysis, and speculative execution.
• Pentium II: The Pentium II incorporated Intel MMX technology, which is designed specifically to process video, audio, and graphics data efficiently.

Evolution of Intel x86 Architecture


• Pentium III: The Pentium III incorporates additional floating-point instructions to support 3D graphics software.
• Pentium 4: The Pentium 4 includes additional floating-point and other enhancements for multimedia.
• Core: This is the first Intel x86 microprocessor with a dual core, referring to the implementation of two processors on a single chip.
• Core 2: The Core 2 extends the architecture to 64 bits. The Core 2 Quad provides four processors on a single chip. More recent Core offerings have up to 10 processors per chip.

Evolution of Intel x86 Architecture


• The x86 provides an excellent illustration of the advances in computer hardware over the past 30 years. The 1978 8086 was introduced with a clock speed of 5 MHz and had 29,000 transistors.
• A quad-core Intel Core 2 introduced in 2008 operates at 3 GHz, a speedup of a factor of 600, and has 820 million transistors, about 28,000 times as many as the 8086.
• The Core 2 is in only a slightly larger package than the 8086 and has a comparable cost.
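The two ratios quoted for the 8086 and the Core 2 follow directly from the slide's figures:

```latex
\frac{3\ \text{GHz}}{5\ \text{MHz}} = \frac{3 \times 10^{9}}{5 \times 10^{6}} = 600,
\qquad
\frac{820 \times 10^{6}}{29{,}000} \approx 28{,}276 \approx 28{,}000
```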

Embedded Systems and ARM
• Embedded Systems
• ARM evolution

Embedded Systems
A combination of computer hardware and software, and perhaps additional mechanical or other parts, designed to perform a dedicated function. In many cases, embedded systems are part of a larger system or product, as in the case of an antilock braking system in a car.

Embedded Systems

Embedded Systems

Embedded Systems

ARM Evolution
• A family of RISC-based microprocessors and microcontrollers designed by ARM Inc., Cambridge, England.
• The company doesn’t make processors but instead designs microprocessor and multicore architectures and licenses them to manufacturers.
• ARM chips are high-speed processors that are known for their small die size and low power requirements.
• It is the most widely used embedded processor architecture and indeed the most widely used processor architecture of any kind in the world.

ARM Evolution
• ARM originated with the British company Acorn Computers.
• ARM1 (1985) was used for internal research and development as well as being used as a coprocessor in the BBC machine.
• Also in 1985, Acorn released the ARM2, which had greater functionality and speed within the same physical space.
ARM processors are designed to meet the needs of three system categories:
• Embedded real-time systems: Systems for storage, automotive body and power-train, industrial, and networking applications.
• Application platforms: Devices running open operating systems, including Linux, Palm OS, Symbian OS, and Windows CE, in wireless, consumer entertainment, and digital imaging applications.
• Secure applications: Smart cards, SIM cards, and payment terminals.

Performance Assessment

Parameters
In evaluating processor hardware and setting requirements for new systems, the parameters to consider are:

• Power consumption
• Performance
• Cost
• Size
• Security
• Reliability

Processor Power Consumption

Power consumption
Active power

$P \approx C \cdot V^2 \cdot f \cdot \alpha$

• C: Capacitance ≈ chip area
• V: Voltage
• f: Frequency
• α: Activity factor

Static power

Active Power Example

Active power: a processor can work at different voltage and frequency steps:

V: 0.9 … 1.5 V (0.1 V steps), f: 1.8 GHz … 3 GHz (0.2 GHz steps)

f (GHz)   1.8   2.0   2.2   …   3.0
V (V)     0.9   1.0   1.1   …   1.5

Suppose the measured power consumption is 30 W at 1.0 V, 2 GHz.
• What is P at the most power-efficient setting?
• What is P at the highest-performance setting?
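Here is a worked sketch of the question. It rests on several assumptions: active power dominates, C and α stay the same between settings, and the voltage and frequency steps pair up one-to-one as in the table (0.9 V with 1.8 GHz, up to 1.5 V with 3.0 GHz). Under those assumptions, P scales with $V^2 f$ relative to the 30 W reference point. The lowest setting gives both the lowest power and, since energy per cycle scales with $V^2$, the lowest energy per cycle.

```python
REF_POWER_W = 30.0
REF_V, REF_F_GHZ = 1.0, 2.0

# Paired operating points from the table: (voltage in V, frequency in GHz).
SETTINGS = [(round(0.9 + 0.1 * i, 1), round(1.8 + 0.2 * i, 1)) for i in range(7)]


def active_power(v: float, f_ghz: float) -> float:
    """Scale the reference measurement using P ~ C * V^2 * f * alpha (C, alpha fixed)."""
    return REF_POWER_W * (v / REF_V) ** 2 * (f_ghz / REF_F_GHZ)


for v, f in SETTINGS:
    print(f"{v:.1f} V, {f:.1f} GHz -> {active_power(v, f):6.2f} W")

lowest = min(SETTINGS, key=lambda s: active_power(*s))
highest = max(SETTINGS, key=lambda s: s[1])
print("most power-efficient:", lowest, f"{active_power(*lowest):.2f} W")
print("highest performance:", highest, f"{active_power(*highest):.2f} W")
```

Under these assumptions, the lowest setting draws about 21.87 W, and the highest-performance setting draws 101.25 W.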



Fabrication cost
Fabrication:
Silicon ingot → blank wafer → patterned wafer → test wafer → tested dies → bond die to package → test packages

Fabrication yield:
$\text{Yield} = \dfrac{\text{working chips}}{\text{chips on wafer}}$
Example:
small vs. large vs. huge chips per wafer
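The yield formula feeds directly into cost. If a processed wafer has a fixed cost, only the working chips share it. The sketch below uses an entirely hypothetical wafer cost and hypothetical chip counts to illustrate why larger chips cost more each: a wafer holds fewer of them, and typically a smaller fraction of them work.

```python
def yield_fraction(working_chips: int, chips_on_wafer: int) -> float:
    """Fabrication yield = working chips / chips on wafer."""
    return working_chips / chips_on_wafer


def cost_per_working_chip(wafer_cost: float, working_chips: int) -> float:
    """The whole wafer's cost is spread over the chips that actually work."""
    return wafer_cost / working_chips


WAFER_COST = 5000.0  # hypothetical cost of one processed wafer, in dollars
# Hypothetical (chips on wafer, working chips) for small, large, and huge dies.
examples = {"small": (400, 360), "large": (100, 70), "huge": (25, 10)}

for name, (total, working) in examples.items():
    print(f"{name:5s}: yield {yield_fraction(working, total):.0%}, "
          f"${cost_per_working_chip(WAFER_COST, working):,.2f} per working chip")
```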

Benchmark suites
• A benchmark suite consists of a set of programs that represent the characteristics of the programs that run on a particular system. After running benchmark suites, devices are given scores based on the time taken to execute them.
• The best-known benchmark suite is the SPEC suite, produced by the Standard Performance Evaluation Corporation.
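SPEC, for example, turns run times into a score by dividing a reference machine's time by the measured time for each program and combining those ratios with a geometric mean. The sketch below shows that calculation with made-up times; it is not SPEC's official tooling.

```python
import math


def benchmark_score(reference_times: list, measured_times: list) -> float:
    """Geometric mean of per-program speed ratios (reference time / measured time)."""
    ratios = [ref / run for ref, run in zip(reference_times, measured_times)]
    return math.prod(ratios) ** (1 / len(ratios))


# Hypothetical run times in seconds for a three-program suite.
reference = [100.0, 200.0, 400.0]
measured = [50.0, 100.0, 100.0]
print(f"score = {benchmark_score(reference, measured):.2f}")
```

The geometric mean keeps one unusually fast program from dominating the score the way an arithmetic mean of ratios could.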

Clock Speed and Cycles per Instruction

THE SYSTEM CLOCK

• Governs the operations performed by a processor, such as fetching an instruction, decoding the instruction, performing an arithmetic operation, and so on.
• The execution of an instruction involves fetching the instruction from memory, decoding the various portions of the instruction, loading/storing data, and performing arithmetic and logical operations. Most instructions on most processors require multiple clock cycles to complete. The cycles per instruction (CPI) is the number of clock cycles that the processor must take to complete an instruction.

Iron Law of Performance

• CPU time (execution time) = (number of instructions executed) × (cycles per instruction, CPI) × (clock cycle time)

Iron Law Quiz 1


• CPU time (execution time) = (number of instructions executed) × (cycles per instruction, CPI) × (clock cycle time)
• A program executes 3 billion instructions on a processor; the processor spends 2 cycles on each instruction and runs at 3 GHz. What is the execution time of this program?
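Plugging the quiz numbers into the Iron Law, with clock cycle time = 1 / clock rate:

```python
def cpu_time(instructions: float, cpi: float, clock_hz: float) -> float:
    """Iron Law: instructions x cycles per instruction x clock cycle time (1 / f)."""
    return instructions * cpi / clock_hz


print(cpu_time(3e9, 2, 3e9), "seconds")  # 2.0 seconds
```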

Iron Law Quiz 2


• CPU time (execution time) = (number of instructions executed) × (cycles per instruction, CPI) × (clock cycle time)
• A program contains 50 billion instructions with the following composition:
  • 10 billion branch instructions, CPI = 4
  • 15 billion load instructions, CPI = 2
  • 5 billion store instructions, CPI = 3
  • 20 billion integer-type instructions, CPI = 1
• Evaluate the execution time of the program if the system is clocked at 4 GHz.
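For a mixed program, total cycles are the sum over instruction classes of count × CPI. Dividing by the clock rate gives the time. The class names below simply mirror the quiz.

```python
# (instruction count, CPI) for each class in the quiz.
mix = {
    "branch": (10e9, 4),
    "load": (15e9, 2),
    "store": (5e9, 3),
    "integer": (20e9, 1),
}
CLOCK_HZ = 4e9

total_cycles = sum(count * cpi for count, cpi in mix.values())
total_instructions = sum(count for count, _ in mix.values())
average_cpi = total_cycles / total_instructions
execution_time = total_cycles / CLOCK_HZ

print(f"cycles = {total_cycles:.3g}, average CPI = {average_cpi:.2f}")
print(f"execution time = {execution_time} s")  # 26.25 s
```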

Iron Law Quiz 3


• Our favorite program runs in 10 seconds on computer A, which has a 2 GHz clock. We are trying to help a computer designer build a computer, B, that will run this program in 6 seconds. The designer has determined that a substantial increase in the clock rate is possible, but this increase will affect the rest of the CPU design, causing computer B to require 1.2 times as many clock cycles as computer A for this program. What clock rate should we tell the designer to target?
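Solving the Iron Law for the new clock rate: computer A needs 10 s × 2 GHz clock cycles, B needs 1.2 times as many, and those cycles must fit into 6 s.

```python
time_a, clock_a = 10.0, 2e9  # seconds, Hz
cycles_a = time_a * clock_a  # clock cycles the program needs on A
cycles_b = 1.2 * cycles_a    # B needs 1.2x as many cycles
target_time_b = 6.0          # seconds
clock_b = cycles_b / target_time_b

print(f"target clock rate for B: {clock_b / 1e9:.1f} GHz")  # 4.0 GHz
```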

Iron Law Quiz 4


• Suppose we have two implementations of the same instruction set architecture. Computer A has a clock cycle time of 250 ps and a CPI of 2.0 for some program, and computer B has a clock cycle time of 500 ps and a CPI of 1.2 for the same program. Which computer is faster for this program, and by how much?
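Both machines run the same program, so the instruction count is the same. Comparing CPI × clock cycle time, the average time per instruction, is therefore enough:

```python
time_per_instr_a = 2.0 * 250  # CPI x cycle time, in picoseconds
time_per_instr_b = 1.2 * 500

faster, slower = sorted([("A", time_per_instr_a), ("B", time_per_instr_b)], key=lambda t: t[1])
ratio = slower[1] / faster[1]
print(f"Computer {faster[0]} is faster by a factor of {ratio:.2f}")
```

Computer A takes 500 ps per instruction and B takes 600 ps, so A is 1.2 times as fast.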

Iron Law Quiz 5


• A given application written in Java runs in 15 seconds on a desktop processor. A new compiler is released that requires only 0.6 as many instructions as the old compiler. Unfortunately, it increases the CPI by a factor of 1.1. How fast can we expect the application to run using this new compiler?
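This sketch assumes the clock rate is unchanged and reads the CPI increase as a factor of 1.1. Under those assumptions, run time scales with instruction count × CPI:

```python
old_time = 15.0          # seconds
instruction_ratio = 0.6  # new compiler: 0.6x as many instructions
cpi_ratio = 1.1          # CPI grows by a factor of 1.1

new_time = old_time * instruction_ratio * cpi_ratio
print(f"new run time = {new_time:.1f} s")  # 9.9 s
```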

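Performance Factors and System Attributes

INSTRUCTION EXECUTION RATE

• The processor time T needed to execute a given program:

$T = I_c \times \text{CPI} \times \tau$

or

$T = I_c \times [p + (m \times k)] \times \tau$

Where:
• $I_c$: the instruction count, i.e., the number of instructions executed,
• $\tau$: the processor clock cycle time ($\tau = 1/f$),
• p: the number of processor cycles needed to decode and execute the instruction,
• m: the number of memory references needed,
• k: the ratio between memory cycle time and processor cycle time.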
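The sketch below evaluates $T = I_c \times [p + (m \times k)] \times \tau$ to show how memory references inflate the effective CPI. Every parameter value is hypothetical and does not come from the slides.

```python
def processor_time(ic: float, p: float, m: float, k: float, tau: float) -> float:
    """T = Ic * [p + (m * k)] * tau."""
    return ic * (p + m * k) * tau


# Hypothetical program and machine:
ic = 1e9       # instructions executed
p = 1.0        # processor cycles to decode and execute each instruction
m = 0.5        # memory references per instruction
k = 4.0        # memory cycle time / processor cycle time
tau = 1 / 2e9  # processor cycle time for a 2 GHz clock, in seconds

print(f"effective CPI = {p + m * k}")
print(f"T = {processor_time(ic, p, m, k, tau):.2f} s")
```

Here the memory term m × k adds 2 cycles to every instruction, so memory references triple the effective CPI.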

Performance Factors and System Attributes


The five performance factors ($I_c$, p, m, k, $\tau$) are influenced by four system attributes:
• The design of the instruction set (instruction set architecture),
• The compiler technology,
• The processor implementation, and
• The cache and memory hierarchy.

MIPS Rate and MFLOPS Rate

• A common measure of performance for a processor is the rate at which instructions are executed, expressed as millions of instructions per second (MIPS) and referred to as the MIPS rate.
• A common performance measure that deals only with floating-point instructions is expressed as millions of floating-point operations per second (MFLOPS).
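The MIPS rate can be computed either from a measured run, MIPS = $I_c / (T \times 10^6)$, or from the clock rate and average CPI, MIPS = $f / (\text{CPI} \times 10^6)$. MFLOPS is computed the same way from the number of floating-point operations executed. All numbers below are hypothetical.

```python
def mips_from_run(instruction_count: float, seconds: float) -> float:
    return instruction_count / (seconds * 1e6)


def mips_from_cpi(clock_hz: float, cpi: float) -> float:
    return clock_hz / (cpi * 1e6)


def mflops(fp_operations: float, seconds: float) -> float:
    return fp_operations / (seconds * 1e6)


# Hypothetical: a 1 GHz processor with an average CPI of 2.0 ...
print(mips_from_cpi(1e9, 2.0))
# ... which executes 500 million instructions (100 million of them FP ops) in 1 s.
print(mips_from_run(5e8, 1.0), mflops(1e8, 1.0))
```

The two MIPS formulas agree whenever T = $I_c$ × CPI / f, which is just the Iron Law rearranged.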

Example 1

Solution

Example
• Consider the execution of a program that consists of 2 million instructions on a 400-MHz processor. The program consists of four major types of instructions. The instruction mix and the CPI for each instruction type are given below, based on the result of a program trace experiment:

Example 2
• Assuming the following data, which code sequence will be faster?

Instruction type   CPI
A                  1
B                  2
C                  3

Code sequence   Instruction count for each type
                A    B    C
1               2    1    2
2               4    1    1
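This sketch assumes both sequences run at the same clock rate. Under that assumption, the sequence that needs fewer total clock cycles (the sum of count × CPI) is faster:

```python
CPI = {"A": 1, "B": 2, "C": 3}
sequences = {
    1: {"A": 2, "B": 1, "C": 2},
    2: {"A": 4, "B": 1, "C": 1},
}


def total_cycles(counts: dict) -> int:
    """Sum of instruction count x CPI over the instruction types."""
    return sum(CPI[t] * n for t, n in counts.items())


for seq, counts in sequences.items():
    instructions = sum(counts.values())
    cycles = total_cycles(counts)
    print(f"sequence {seq}: {instructions} instructions, {cycles} cycles, "
          f"average CPI = {cycles / instructions:.2f}")
```

Sequence 2 executes more instructions (6 vs. 5) but needs fewer cycles (9 vs. 10), so it is faster. Instruction count alone is a misleading measure of speed.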

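Amdahl’s law
• Amdahl’s law (Gene Amdahl) deals with the potential speedup of a program using multiple processors compared to a single processor.
• Let T be the total execution time of the program using a single processor, and let f be the fraction of that time spent in code that can be parallelized (so a fraction 1 − f is inherently sequential). Then the speedup using a parallel processor with N processors that fully exploits the parallel portion of the program is as follows:

$\text{Speedup} = \dfrac{T}{(1-f)T + \dfrac{fT}{N}} = \dfrac{1}{(1-f) + \dfrac{f}{N}}$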
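The code below is a direct transcription of the usual form of Amdahl's law, Speedup = 1 / ((1 − f) + f/N). Here f is the fraction of single-processor time that can be parallelized, and N is the number of processors. The choice f = 0.9 is only an example.

```python
def amdahl_speedup(f: float, n: int) -> float:
    """Speedup on n processors when a fraction f of the time is parallelizable."""
    if not 0.0 <= f <= 1.0 or n < 1:
        raise ValueError("need 0 <= f <= 1 and n >= 1")
    return 1.0 / ((1.0 - f) + f / n)


for n in (2, 8, 64, 1024):
    print(f"f = 0.9, N = {n:4d}: speedup = {amdahl_speedup(0.9, n):.2f}")
# As N grows, the speedup approaches 1 / (1 - f), which is 10 for f = 0.9.
```

The sequential fraction sets a hard ceiling. Even with 1024 processors, a program that is 10% sequential gains less than 10x.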

Amdahl’s law

Amdahl’s law implications


• Consider the following two enhancements:

1. A speedup of 20 on 10% of the execution time, vs.
2. A speedup of 1.6 on 80% of the execution time.
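For a single enhancement that speeds up a fraction f of the original execution time by a local factor s, Amdahl's law gives an overall speedup of 1 / ((1 − f) + f/s). Applying it to both enhancements:

```python
def overall_speedup(fraction: float, local_speedup: float) -> float:
    """Amdahl's law for one enhancement covering `fraction` of the original time."""
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)


e1 = overall_speedup(0.10, 20)   # speedup of 20 on 10% of the time
e2 = overall_speedup(0.80, 1.6)  # speedup of 1.6 on 80% of the time
print(f"enhancement 1: {e1:.3f}x, enhancement 2: {e2:.3f}x")
```

The modest speedup applied to the larger fraction wins, at about 1.43x versus about 1.10x.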



Amdahl’s law Quiz


• Consider the following processor, which is clocked at 2 GHz:

Instr type     % of time   CPI
Add integer    40%         1
Branch         20%         4
Load           30%         2
Store          10%         3

Possible improvements:
❑ Branch CPI: 4 → 3
❑ Increase clock frequency: 2 → 2.3 GHz
❑ Store CPI: 3 → 2

Which is best?
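This sketch treats the "% of time" column as each class's share of execution time. Under that reading, a CPI reduction speeds up only its own class, by a local factor of old CPI / new CPI, while a clock increase speeds up all of the time by 2.3 / 2.0. Amdahl's law then gives each option's overall speedup:

```python
def overall_speedup(fraction: float, local_speedup: float) -> float:
    """Amdahl's law for one enhancement covering `fraction` of the original time."""
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)


options = {
    "branch CPI 4 -> 3": overall_speedup(0.20, 4 / 3),
    "clock 2 -> 2.3 GHz": overall_speedup(1.00, 2.3 / 2.0),
    "store CPI 3 -> 2": overall_speedup(0.10, 3 / 2),
}

for name, speedup in options.items():
    print(f"{name:20s} speedup = {speedup:.3f}")
print("best:", max(options, key=options.get))
```

With these numbers, the clock increase (1.15x) beats the branch change (about 1.05x) and the store change (about 1.03x).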

Amdahl’s law
