Structure of Computer Systems
Course 7 – examples of CPU
implementations - Microprocessors
Microprocessors
Definition 1:
It is a VLSI circuit that integrates a central
processing unit (CPU)
Definition 2:
An integrated circuit that integrates:
• one or more central processing units (CPUs)
Symmetric multiprocessor architecture
Asymmetric multiprocessor architecture
• Cache memory
• Other components:
Interrupt controller,
Bus management unit,
Memory Management unit (MMU)
Microprocessors
First microprocessor:
Intel I4004 – 4-bit organization
First successful microprocessor:
Intel I8080 – 8-bit processor
First 16-bit processor:
Intel I8086
First 32-bit processor:
Intel I80386
Superscalar microprocessor architecture:
Pentium Pro
64-bit processors, multi-core architectures:
Pentium IV, dual core, Core Duo
Year    Processor     Structure  Memory space  Main characteristics
1971    I4004         4 bits     -             First μP
1972    I8008         8 bits     16 KB         First 8-bit μP
1974    8080          8 bits     64 KB         First successful μP
1978    8086, 8088    16 bits    1 MB          First 16-bit μP, basis for the first PC
1982    80286         16 bits    16 MB         PC-AT
1985    80386         32 bits    4 GB          First 32-bit μP
1989    80486         32 bits    4 GB          Integrated FPU
1993    Pentium       32 bits    4 GB          Pipeline
1995    P. Pro        32 bits    64 GB         P6 super-pipeline architecture
1997    P. II         32 bits    64 GB         MMX technology
1999    P. III        32 bits    70 TB         SSE technology
2002    P. IV         32 bits    70 TB         NetBurst architecture
2004    P. IV         64 bits    70 TB         Hyper-threading technology
2006    Core 2        64 bits    70 TB         Multicore architecture (2 cores/chip)
2007    Dual Core     64 bits    70 TB         2 processors/chip
2008-9  I5, I7        64 bits    70 TB         Nehalem architecture, multicore and hyper-threading, 4 cores/8 threads, 8 MB L3 cache
2011    Sandy Bridge
Components of a
microprocessor
Traditional components:
Control Unit (CU)
Arithmetical and Logical Unit (ALU)
General and special Registers (GR, SR)
Supplementary components:
Cache memories (Cache)
• high speed low capacity memories
• hierarchical organization on 2-3 levels
Mathematical co-processor (CoP)
• for floating point arithmetic
Memory Management Unit (MMU)
• translates virtual addresses into physical addresses and controls access
(instructions and data) to the main memory
Interrupt controller
• handles internal and external events
• synchronizes the processor with I/O interfaces
Signals of a microprocessor –
the System Bus
[Figure: the μP connected to the memory modules through the system bus: address, data and command lines]
Structure of a PC
(a more realistic view)
[Figure: PC block diagram: μP, chipset, PCI bus, network interface, keyboard and mouse]
Typical signals for a
microprocessor
[Figure: signal groups of a microprocessor: address signals, data signals, command signals, interrupt signals, bus arbitration signals, clock signal(s), power supply signals, other signals (e.g. status, control)]
Typical signals for a
microprocessor
Address signals: A0-An
Used for specifying memory locations or I/O ports (registers)
Generated by the microprocessor in order to address the other components
(read or write operations)
The number of address lines determines the maximum addressing
space of a microprocessor
• Ex: 20 lines => 1 MB
• 32 lines => 4 GB
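The relation between address lines and addressing space can be checked with a short sketch. It assumes a byte-addressable memory, so n lines reach 2^n bytes; the function names are illustrative only.

```python
def address_space_bytes(address_lines: int) -> int:
    """Number of distinct byte addresses reachable with the given address lines."""
    return 2 ** address_lines


def human_readable(size: int) -> str:
    """Format a byte count using binary units (1 KB = 1024 bytes)."""
    units = ["B", "KB", "MB", "GB", "TB"]
    index = 0
    while size >= 1024 and index < len(units) - 1:
        size //= 1024
        index += 1
    return f"{size} {units[index]}"


for lines in (16, 20, 24, 32):
    print(lines, "lines =>", human_readable(address_space_bytes(lines)))
```

With 20 lines the sketch reports 1 MB and with 32 lines 4 GB, matching the examples above.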
Data signals: D0-Dm
Bidirectional lines used to transfer instruction codes and data between
the microprocessor and the other components of the system
The number of data lines is usually in accordance with the internal
organization of the processor (there are also exceptions, see 8088,
Pentium Pro)
The number of data lines determines the maximum width of a data item
transferred on the bus
• Ex: 8, 16, 32, 64 lines
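The effect of the data bus width can be illustrated with a small sketch. It assumes an aligned operand and ignores wait states; the function name is hypothetical. The 8086 has 16 data lines and the 8088 only 8, although both are 16-bit processors internally.

```python
import math


def bus_cycles(operand_bits: int, data_lines: int) -> int:
    """Bus cycles needed to move one aligned operand over a bus with data_lines lines."""
    return math.ceil(operand_bits / data_lines)


# 8086: 16-bit internal organization, 16 data lines
print("8086, 16-bit word:", bus_cycles(16, 16), "cycle(s)")
# 8088: same internal organization, but only 8 data lines
print("8088, 16-bit word:", bus_cycles(16, 8), "cycle(s)")
```

The narrower bus of the 8088 doubles the number of transfer cycles for every 16-bit word.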
Typical signals for a
microprocessor
Command and control signals
Command signals:
• MRDC\, MWTC\, IORC\, IOWC\, INTA\
• determine memory and interface read and write cycles
• very important signals
• similar signals exist on any microprocessor
Control signals: ALE (Address Latch Enable), DEN (Data
Enable)
• help control the address latches and data buffers
• specific to each microprocessor
Interrupt signals: INTR, NMI
Clock signals: CLK, PCLK
Power supply signals: GND, +5 V, +3.3 V
Instruction execution
Steps:
Instruction fetch
Operands read
Operation execution
Write the result
Seen from outside:
Instruction fetch cycle – read from the memory - mandatory
Operand(s) read - optional
Write the result - optional
Transfer cycle (on the bus)
a transfer on the bus that involves:
• Processor and memory or
• Processor and an I/O interface
A cycle has a fixed number of clock periods (determined by the
microprocessor's architecture)
• it may be extended on request by an integer number of clock periods
(wait states), if a slow module is addressed (e.g. EPROM memory)
A cycle is a sequence of signal activations on the bus (address, data
and command)
• a cycle is described by a timing diagram
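The extension of a cycle for a slow module can be sketched as follows. This is a simplified model: it assumes the whole cycle must last at least as long as the module's access time and ignores setup and hold margins. The clock frequencies and access times below are hypothetical; the 4-clock base cycle corresponds to the 8086.

```python
import math


def wait_states(clock_mhz: float, base_clocks: int, access_time_ns: float) -> int:
    """Extra clock periods needed so that the cycle lasts at least access_time_ns."""
    period_ns = 1000.0 / clock_mhz
    base_cycle_ns = base_clocks * period_ns
    if access_time_ns <= base_cycle_ns:
        return 0
    return math.ceil((access_time_ns - base_cycle_ns) / period_ns)


# 5 MHz clock, 4-clock cycle (800 ns): a 450 ns EPROM needs no extension
print(wait_states(5, 4, 450))
# 10 MHz clock, 4-clock cycle (400 ns): the same EPROM needs one extra period
print(wait_states(10, 4, 450))
```

The result is always an integer number of clock periods, as stated above.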
Timing diagrams for transfers on a
classical bus
[Figure: timing diagram of the memory read cycle, showing A0-An (valid address), MRDC\, MWTC\, D0-Dm (valid data) and the intervals t_access and t_cycle; a second diagram shows the same signals]
Processors of the Intel x86
family
I8086 and I8088
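[Figure: internal structure of the I8086/I8088]
Execution Unit (EU): general registers AX (AH, AL), BX (BH, BL), CX (CH, CL), DX (DH, DL); SI, DI, BP, SP; temporary registers; ALU; state register; control unit
Bus Interface Unit (BIU): segment registers CS, DS, ES, SS; IP; IR; instruction queue (1, 2, 3, 4, ...); external bus control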
[Figure: internal structure of the I80286 processor: execution unit; instruction unit with instruction queue and instruction decoder]
I80386
32-bit processor, 32 data lines, 32 address lines (4 GB addressing
space)
General registers extended to 32 bits
2 extra segment registers (FS and GS)
Improved protected mode
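[Figure: block diagram with segmentation unit, paging unit, execution unit and bus interface unit]
[Figure: block diagram with segmentation unit, paging unit, integer execution unit, floating-point execution unit, cache unit, bus interface, instruction decoder and instruction prefetch unit]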
Pentium Processors
Pentium Pro
Superscalar P6 architecture (CPI<1)
Dynamic instruction execution:
• Data flow analysis
• Branch prediction
• Speculative execution of instructions
Pentium II
MMX technology:
• a SIMD execution unit dedicated for multimedia data
• Parallel (SIMD) execution of arithmetic operations
• 57 new MMX instructions
Pentium III
SSE technology
• Parallel (SIMD) execution on floating-point variables
• good for 2D/3D graphics
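The idea of SIMD execution on packed data can be sketched in plain Python. The example treats a 64-bit word as eight independent 8-bit lanes and adds two such words lane by lane with wraparound, in the spirit of a packed byte add. The function names and lane layout are illustrative assumptions, not the MMX programming interface.

```python
LANES = 8                         # eight 8-bit lanes in one 64-bit word
LANE_BITS = 8
LANE_MASK = (1 << LANE_BITS) - 1


def pack(values):
    """Pack eight 8-bit values into one 64-bit integer (lane 0 in the low byte)."""
    word = 0
    for i, value in enumerate(values):
        word |= (value & LANE_MASK) << (i * LANE_BITS)
    return word


def unpack(word):
    """Split a 64-bit integer back into its eight 8-bit lanes."""
    return [(word >> (i * LANE_BITS)) & LANE_MASK for i in range(LANES)]


def packed_add_wrap(a, b):
    """Add two packed words lane by lane; no carry crosses a lane boundary."""
    result = 0
    for i in range(LANES):
        shift = i * LANE_BITS
        lane = ((a >> shift) + (b >> shift)) & LANE_MASK
        result |= lane << shift
    return result


x = pack([1, 2, 3, 4, 250, 6, 7, 8])
y = pack([10] * LANES)
print(unpack(packed_add_wrap(x, y)))
```

In hardware the eight lane additions happen in one instruction; the loop here only models the result. Lane 4 wraps around (250 + 10 gives 4) without disturbing its neighbours.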
P6 superscalar architecture
3 autonomous units, 12 pipeline stages
Speculative execution
Instruction pool
[Figure: P6 block diagram: L1 ICache, L1 DCache and the instruction pool]
Instruction fetch and decoding unit
Fetches and decodes instructions in advance
[Figure: instruction path from the BIU (Bus Interface Unit) through the L1 ICache, addressed by Next_IP]
Retirement Unit
Reestablishes the normal order of the instructions
[Figure: retirement unit connected to the reservation station, the DCache and the memory interface unit]
RRF – Retirement register file
Solving hazard cases in the P6
architecture
Control hazard:
complex branch prediction, BTB, next address predictor
out-of-order instruction execution
execute both branches of an if
Data hazard:
alias registers: renaming of registers and more internal registers (40)
than those seen by the programmer
out-of-order instruction execution
data dependency tree
Structural hazard
multiple execution units (7 ALUs)
separate instruction and data cache
reservation stations
In essence it is an implementation of Tomasulo’s method
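Register renaming, listed above among the data hazard solutions, can be sketched as follows. This is a minimal model, not the P6 implementation: every destination gets a fresh physical register from a pool of 40, sources read the latest mapping, and physical registers are never freed. The instruction format and names are hypothetical.

```python
def rename(instructions, num_physical=40):
    """Rename architectural registers to physical ones.

    Each instruction is a tuple (dest, src1, src2) of architectural names.
    Sources that were never written keep their architectural name.
    """
    mapping = {}      # architectural name -> current physical register
    next_free = 0
    renamed = []
    for dest, src1, src2 in instructions:
        # Sources use the newest mapping, so true (RAW) dependencies are kept.
        p1 = mapping.get(src1, src1)
        p2 = mapping.get(src2, src2)
        if next_free >= num_physical:
            raise RuntimeError("out of physical registers")
        # A fresh destination removes WAR and WAW conflicts with older instructions.
        new_dest = f"p{next_free}"
        next_free += 1
        mapping[dest] = new_dest
        renamed.append((new_dest, p1, p2))
    return renamed


program = [
    ("eax", "ebx", "ecx"),   # eax = ebx op ecx
    ("edx", "eax", "ebx"),   # reads eax: true dependency on instruction 1
    ("eax", "esi", "edi"),   # rewrites eax: WAW with 1, WAR with 2
    ("ebx", "eax", "edx"),   # reads the new eax and edx
]
for line in rename(program):
    print(line)
```

After renaming, the third instruction writes p2 instead of reusing the register written by the first one, so it no longer has to wait for the second instruction to read its source.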
The P6 Bus
The main elements of the P6 bus:
the bus works in synchronous mode; every signal
is sampled on clock signal edges
transfers are made through transactions that may
be executed in parallel
it is a multiprocessor bus; several processors can share the
same bus
block transfers are preferred
there are error detection and correction mechanisms
there are mechanisms that ensure cache memory
consistency
a new digital technology (different amplifiers) that
ensures high-frequency transmission on the bus
Transfer on the P6 bus
Parallel transactions (pipeline)
Phases:
Arbitration – decides which master gets access to the bus
Transfer request – specifies the request (read or write, start
address, number of bytes)
Error – detects and solves transmission errors (ECC, error
correction code, on data and parity on address and command
signals)
Snooping – detects and solves cache inconsistencies
Response – specifies the type of the answer (immediate, delayed,
refused)
Transfer – data transfer in accordance with the request
Technology: GTL (instead of TTL)
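How pipelined transactions overlap can be sketched with a simple timeline. The sketch assumes, for illustration only, that every phase takes one time slot and that a new transaction starts in every slot; on the real bus the phases take several clock periods each. The phase order follows the timing diagram of the P6 bus (error before snooping).

```python
PHASES = ["Arbitration", "Request", "Error", "Snoop", "Response", "Transfer"]


def phase_at(transaction: int, slot: int, start_interval: int = 1):
    """Phase of the given transaction in the given time slot, or None if idle."""
    index = slot - transaction * start_interval
    if 0 <= index < len(PHASES):
        return PHASES[index]
    return None


def timeline(num_transactions: int, start_interval: int = 1):
    """One row per transaction, one column per time slot."""
    total_slots = (num_transactions - 1) * start_interval + len(PHASES)
    return [
        [phase_at(t, s, start_interval) for s in range(total_slots)]
        for t in range(num_transactions)
    ]


for row in timeline(3):
    # Print the first four letters of each phase, "-" for idle slots.
    print(" ".join(f"{(phase or '-')[:4]:<4}" for phase in row))
```

Under these assumptions three overlapped transactions finish in 8 slots instead of the 18 needed if they ran one after another.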
Timing diagram for the P6 bus
[Figure: timing diagram over 16 BCLK clock periods showing the overlapped phases of successive transactions: Arbitration, Request, Error, Snooping, Response, Transfer]
Pentium IV –
NetBurst Architecture (7th generation)
a 20-stage pipeline architecture
roughly double compared with P6
bus frequency is increased 4 times
400 MHz, with "quad pump" technology,
3.2 Gbytes/s transfer speed
doubles the speed of the ALU
2 arithmetical operations are executed in every clock period;
the ALU works with a double-frequency clock
the use of very high speed cache memory
Advanced Transfer Cache, which provides 64 Gbytes/s data transfer at 2 GHz
extension of the MMX technology
SSE2 – Streaming SIMD Extensions 2
144 new SIMD instructions that extend the data width to 128 bits (16 bytes
processed in parallel)
improvement of branch prediction by approx. 30%
through the extension of the BTB unit and
increasing the instruction queue to 126 instructions
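The two transfer rates quoted above can be reproduced with simple arithmetic. The sketch assumes values that are not stated on the slide: a 100 MHz base bus clock with 4 transfers per clock (quad pump), a 64-bit (8-byte) data bus, and a 256-bit (32-byte) cache interface. Rates use decimal units (1 Gbyte = 10^9 bytes).

```python
def bandwidth_bytes_per_s(clock_hz: int, transfers_per_clock: int, bytes_per_transfer: int) -> int:
    """Peak transfer rate in bytes per second."""
    return clock_hz * transfers_per_clock * bytes_per_transfer


# System bus: assumed 100 MHz base clock, quad pumped, 8 bytes wide
fsb = bandwidth_bytes_per_s(100_000_000, 4, 8)
# Advanced Transfer Cache: 2 GHz core clock, assumed 32 bytes per clock
cache = bandwidth_bytes_per_s(2_000_000_000, 1, 32)

print(fsb / 1e9, "Gbytes/s on the system bus")
print(cache / 1e9, "Gbytes/s from the cache")
```

Both results agree with the figures on the slide: 3.2 Gbytes/s for the bus and 64 Gbytes/s for the cache.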
Pentium IV
[Figure: the NetBurst Pentium IV architecture: interface with the external bus; L2 cache and control; instruction fetch and decode (BTB, decoder, trace cache, ROM); instruction scheduling and execution (schedulers, registers for floats, registers for integers, 2 ALU-F, 4 ALU, 2 AGU); L1 D-Cache]
Pentium IV
New tendencies:
Hyper-threading technology
• two threads executed in parallel on the same core
Multi-core technology
• more processors on the same chip
64 bits architecture
I7, I5, I3
Nehalem architecture - internal view
Nehalem architecture
external view
Nehalem architecture
multiprocessor configuration
Sandy Bridge architecture
1 processor
4 cores
2 processors
8 cores/processor
Evolution of Intel processor
architectures