Structure of Computer Systems

The document discusses various microprocessor implementations including early microprocessors like the Intel 4004 and 8080, and describes the components and architecture of microprocessors like the Intel 8086, 8088, 80286 and beyond. It provides timelines of processor developments and details the typical components, signals, and instruction execution process of microprocessors.

Uploaded by

Pop Ruxi
Copyright
© © All Rights Reserved

Structure of Computer Systems
Course 7 – Examples of CPU implementations: Microprocessors

1
Microprocessors
• Definition 1:
  • A VLSI circuit that integrates a central processing unit (CPU)
• Definition 2:
  • An integrated circuit that integrates:
    • one or more central processing units (CPUs)
      • symmetric multiprocessor architecture
      • asymmetric multiprocessor architecture
    • cache memory
    • other components:
      • interrupt controller
      • bus management unit
      • memory management unit (MMU)
2
Microprocessors
• First microprocessor:
  • Intel I4004: 4-bit organization
• First successful microprocessor:
  • Intel I8080: 8-bit processor
• First 16-bit processor:
  • Intel I8086
• First 32-bit processor:
  • Intel I80386
• Superscalar microprocessor architecture:
  • Pentium Pro
• 64-bit processors, multi-core architectures:
  • Pentium IV, dual core, Core Duo
3
Year    Processor     Structure  Memory space  Main characteristics
1971    I4004         4 bits                   First μP
1972    I8008         8 bits     16 KB         First 8-bit μP
1974    8080          8 bits     64 KB         First successful μP
1978    8086, 8088    16 bits    1 MB          First 16-bit μP, basis for the first PC
1982    80286         16 bits    16 MB         PC-AT
1985    80386         32 bits    4 GB          First 32-bit μP
1989    80486         32 bits    4 GB          Integrated FPU
1993    Pentium       32 bits    4 GB          Pipeline
1995    P. Pro        32 bits    64 GB         P6 super-pipeline architecture
1997    P. II         32 bits    64 GB         MMX technology
1999    P. III        32 bits    70 TB         SSE technology
2002    P. IV         32 bits    70 TB         NetBurst architecture
2004    P. IV         64 bits    70 TB         Hyper-threading technology
2006    Core 2        64 bits    70 TB         Multicore architecture (2 cores/chip)
2007    Dual Core     64 bits    70 TB         2 processors/chip
2008-9  I5, I7        64 bits    70 TB         Nehalem architecture, multicore and hyper-threading, 4 cores/8 threads, 8 MB L3 cache
2011    Sandy Bridge
4
Components of a microprocessor
• Traditional components:
  • Control Unit (CU)
  • Arithmetic and Logic Unit (ALU)
  • General and special registers (GR, SR)
• Supplementary components:
  • Cache memories (Cache)
    • high-speed, low-capacity memories
    • hierarchical organization on 2-3 levels
  • Mathematical co-processor (CoP)
    • for floating-point arithmetic
  • Memory Management Unit (MMU)
    • translates virtual addresses to physical addresses and enforces memory protection for instruction and data accesses
  • Interrupt controller
    • handles internal and external events
    • synchronizes the processor with the I/O interfaces

5
Signals of a microprocessor – the System Bus
[Figure: the system bus. Blocks shown: the μP, two memory modules, two I/O interfaces (each with an I/O device), and the address, data and command lines that link them.]

6
Structure of a PC (a more realistic view)
[Figure: blocks shown: μP, SVGA adapter (AGP), memory modules (Mem), north chipset (Chipset N), network (Net), PCI, south chipset (Chipset S), keyboard, mouse.]

7
Typical signals for a microprocessor
[Figure: signal groups of a microprocessor: address signals, data signals, command signals, interrupt signals, bus arbitration signals, clock signal(s), power supply signals, other signals (e.g. status, control).]

8
Typical signals for a microprocessor
• Address signals: A0-An
  • Used for specifying memory locations or I/O ports (registers)
  • Generated by the microprocessor to address the other components (read or write operations)
  • The number of address lines determines the maximum addressing space of a microprocessor
    • Ex: 20 lines => 1 MB
    • 32 lines => 4 GB
• Data signals: D0-Dm
  • Bidirectional lines used to transfer instruction codes and data between the microprocessor and the other components of the system
  • The number of data lines usually matches the internal organization of the processor (there are exceptions, see 8088, Pentium Pro)
  • The number of data lines determines the maximum width of a data item transferred on the bus
    • Ex: 8, 16, 32, 64 lines
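
The relation between the number of address lines and the addressing space can be checked with a few lines of Python. This is only a sketch, assuming a byte-addressable memory and binary units (1 KB = 1024 bytes); the function names are illustrative, not taken from the course.

```python
def address_space_bytes(address_lines: int) -> int:
    """Number of distinct byte addresses reachable with n address lines (2**n)."""
    return 2 ** address_lines


def human(n: int) -> str:
    """Format a power-of-two byte count using binary units."""
    units = ["B", "KB", "MB", "GB", "TB"]
    i = 0
    while n >= 1024 and i < len(units) - 1:
        n //= 1024
        i += 1
    return f"{n} {units[i]}"


if __name__ == "__main__":
    # 20 lines: 8086; 24 lines: 80286; 32 lines: 80386
    for lines in (16, 20, 24, 32):
        print(f"{lines} address lines -> {human(address_space_bytes(lines))}")
```

Each extra address line doubles the addressing space, which is why going from 20 to 24 lines moves the limit from 1 MB to 16 MB.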

9
Typical signals for a microprocessor
• Command and control signals
  • Command signals:
    • MRDC\, MWTC\, IORC\, IOWC\, INTA\
    • determine memory and interface read and write cycles
    • very important signals
    • similar for any microprocessor
  • Control signals: ALE (Address Latch Enable), DEN (Data Enable)
    • help control the address and data amplifiers (buffers)
    • specific to every microprocessor
• Interrupt signals: INTR, NMI
• Clock signals: CLK, PCLK
• Power supply signals: GND, +5 V, +3.3 V
10
Instruction execution
• Steps:
  • Instruction fetch
  • Operand read
  • Operation execution
  • Result write
• Seen from outside:
  • Instruction fetch cycle: read from memory (mandatory)
  • Operand(s) read (optional)
  • Result write (optional)
• Transfer cycle (on the bus)
  • a transfer on the bus that involves:
    • the processor and memory, or
    • the processor and an I/O interface
  • A cycle has a fixed number of clock periods (determined by the microprocessor's architecture)
    • it may be extended on request by an integer number of clock periods if a slow module is addressed (e.g. EPROM memory)
  • A cycle is a sequence of signal activations on the bus (address, data and command)
    • a cycle is described by a time diagram
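
The effect of extending a transfer cycle can be illustrated with a small calculation. This is a sketch under stated assumptions: the default of 4 clock periods per cycle corresponds to the 8086 bus cycle, and the 5 MHz clock used in the example is only an illustrative value.

```python
def bus_cycle_ns(clock_mhz: float, base_periods: int = 4, wait_states: int = 0) -> float:
    """Duration of one transfer cycle in nanoseconds.

    base_periods: fixed number of clock periods per cycle (assumed 4, as on the 8086).
    wait_states:  extra clock periods requested by a slow module.
    """
    if wait_states < 0:
        raise ValueError("wait_states must be non-negative")
    period_ns = 1000.0 / clock_mhz
    return (base_periods + wait_states) * period_ns


if __name__ == "__main__":
    print(bus_cycle_ns(5.0))                 # normal cycle at 5 MHz
    print(bus_cycle_ns(5.0, wait_states=2))  # cycle extended for a slow memory
```

The cycle can only grow in whole clock periods, so a module that is slightly too slow still costs a full extra period.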
11
Time diagrams for transfers on a classical bus
[Figure: Read Memory Cycle and Write Memory Cycle. Each diagram shows A0-An (valid address), MRDC, MWTC and D0-Dm (valid data), with the intervals t_access and t_cycle marked.]
12
Processors of the Intel x86 family
• I8086 and I8088
[Figure: Internal structure of the I8086 and I8088. EU: registers AX (AH, AL), BX (BH, BL), CX (CH, CL), DX (DH, DL), SI, DI, BP, SP, temporary register, ALU, control unit, state register. BIU: CS, DS, ES, SS, IP, IR, instruction queue (1, 2, 3, 4, ...), external bus control.]


13
I8086, I8088
• I8086
  • 16-bit processor with 16 data lines and 20 address lines (1 MB addressing space)
  • 40-pin integrated circuit
  • Supporting circuits:
    • 8087: math co-processor (floating point)
    • 8288: bus controller
    • 8289: bus arbiter
  • Structure:
    • EU (Execution Unit): dedicated to instruction execution
      • CU, ALU, general registers, state register
    • BIU (Bus Interface Unit): responsible for the operations (transfer cycles) on the external bus
      • transfers instructions (in advance) and data
      • contains:
        • special registers (segment registers, IP)
        • instruction queue, bus amplifiers
• 8088
  • identical to the 8086, but with 8 data lines on the external bus
14
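
The slide lists 16-bit registers next to 20 address lines without showing how the two fit together. On the 8086 the BIU forms the 20-bit physical address from a segment register and a 16-bit offset; the sketch below illustrates that computation (the function name is illustrative).

```python
def physical_address(segment: int, offset: int) -> int:
    """20-bit physical address formed from a 16-bit segment and a 16-bit offset."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must be 16-bit values")
    # The segment is shifted left by 4 bits (multiplied by 16), the offset is added,
    # and the result is truncated to the 20 available address lines.
    return ((segment << 4) + offset) & 0xFFFFF


if __name__ == "__main__":
    print(hex(physical_address(0x1234, 0x0010)))  # CS:IP style pair
    print(hex(physical_address(0xFFFF, 0x0000)))  # reset address of the 8086
```

Many segment:offset pairs map to the same physical address, and sums above 1 MB wrap around because only 20 address lines exist.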
I80286
• 16-bit processor
• 16 data lines, 24 address lines (16 MB addressing space)
• Working modes: real and protected (privileged)
[Figure: Internal structure of the I80286 processor. Blocks shown: addressing unit; interfacing unit (data amplifiers, address amplifiers, bus control) towards the external bus; execution unit; instruction unit (instruction queue, instruction decode).]

15
I80386
• 32-bit processor, 32 data lines, 32 address lines (4 GB addressing space)
• General registers extended to 32 bits
• 2 extra segment registers (FS and GS)
• Improved protected mode
[Figure: Internal structure of the I80386 processor. Blocks shown: segmenting unit, paging unit, execution unit, interface unit, decoding unit, instruction prefetch unit.]

16


I80486
• Integrates: processor + co-processor + MMU
• Integrates an on-chip cache memory
• Improved protected mode
[Figure: Internal structure of the I80486. Blocks shown: segmenting unit, paging unit, integer execution unit, floating-point execution unit, cache unit, bus interface unit, instruction decoder, instruction prefetch unit.]


17
Pentium
• Two integer pipelines: U (all instructions) and V (simple instructions)
• 64-bit external bus (for a 32-bit processor)
• Versions:
  • Pentium: 2-pipeline architecture
  • Pentium Pro
  • Pentium II: superscalar P6 architecture
  • Pentium III
  • Pentium IV: NetBurst architecture
  • I7, I5, I3: multicore and hyper-threading

18
Pentium Processors
• Pentium Pro
  • Superscalar P6 architecture (CPI < 1)
  • Dynamic instruction execution:
    • data flow analysis
    • branch prediction
    • speculative execution of instructions
• Pentium II
  • MMX technology:
    • a SIMD execution unit dedicated to multimedia data
    • parallel (SIMD) execution of arithmetic operations
    • 57 new MMX instructions
• Pentium III
  • SSE technology
    • parallel (SIMD) execution on floating-point variables
    • good for 2D/3D graphics
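
The idea of SIMD execution on packed multimedia data can be emulated in software. The sketch below mimics the behaviour of the MMX PADDB instruction (eight independent byte additions with wraparound on one 64-bit word); it is an illustration of the principle, not a description of how the hardware unit is built.

```python
def paddb(a: int, b: int) -> int:
    """Add eight unsigned bytes packed in two 64-bit words, lane by lane, with wraparound."""
    result = 0
    for lane in range(8):
        shift = 8 * lane
        x = (a >> shift) & 0xFF
        y = (b >> shift) & 0xFF
        # Each lane is added independently; a carry never spills into the next lane.
        result |= ((x + y) & 0xFF) << shift
    return result


if __name__ == "__main__":
    print(hex(paddb(0x01020304050607FF, 0x0101010101010101)))
```

The hardware performs the eight additions in a single instruction, whereas this emulation needs a loop; that difference is the gain SIMD offers for pixel or audio sample processing.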

19
P6 superscalar architecture
• 3 autonomous units, 12 pipeline stages
• Speculative execution
[Figure: Functional blocks of the P6 architecture. Blocks shown: instruction fetch and decode unit, instruction dispatch and execute unit, retirement unit, instruction pool.]

20


Detailed view of the P6 architecture
[Figure: blocks shown: system bus, L2 cache, bus interface unit (BIU), L1 ICache, L1 DCache, instruction fetch and decode unit, instruction dispatch and execute unit, retirement unit, instruction pool.]

21
Instruction fetch and decoding unit
• Fetches and decodes instructions in advance
• In-order unit
• 3 instructions decoded per clock
• Branch prediction
• Components:
  • Decoder (3 units)
  • Address generator unit (next_IP)
  • Branch target buffer
  • Micro-operation sequencer
  • Alias register allocator
[Figure: Instruction fetch and decoding unit. Blocks shown: input from the BIU (Bus Interface Unit), L1 ICache, Next_IP, branch target buffer, instruction decoder (x3), micro-operation sequencer, alias register allocator, output to the instruction pool.]
22
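
The role of a branch target buffer can be illustrated with a toy model. The sketch below uses a 2-bit saturating counter per branch, a simple textbook scheme chosen here for illustration; the actual P6 predictor is more elaborate, and the class and function names are hypothetical.

```python
class BranchTargetBuffer:
    """Toy BTB: maps a branch address to a 2-bit counter and the last seen target."""

    def __init__(self):
        self.entries = {}  # branch address -> [counter 0..3, target address]

    def predict(self, pc):
        """Return the predicted target, or None to predict 'not taken'."""
        entry = self.entries.get(pc)
        if entry is not None and entry[0] >= 2:
            return entry[1]
        return None

    def update(self, pc, taken, target):
        """Train the counter with the real outcome of the branch."""
        counter, _ = self.entries.get(pc, [1, target])
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
        self.entries[pc] = [counter, target]


def mispredictions(btb, pc, target, outcomes):
    """Count wrong predictions for a sequence of taken/not-taken outcomes."""
    misses = 0
    for taken in outcomes:
        if (btb.predict(pc) is not None) != taken:
            misses += 1
        btb.update(pc, taken, target)
    return misses


if __name__ == "__main__":
    btb = BranchTargetBuffer()
    loop = [True] * 9 + [False]  # loop branch: taken 9 times, then falls through
    print(mispredictions(btb, 0x400, 0x3F0, loop))
    print(mispredictions(btb, 0x400, 0x3F0, loop))
```

With the 2-bit counter, a single loop exit does not flip the prediction, so the second run of the loop mispredicts only at its exit.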
Instruction dispatch and execute unit
• Responsible for instruction execution
• Out-of-order unit
• 7 execution units + reservation station
  • IEU – Integer Execution Unit
  • FEU – Floating-point Execution Unit
  • MMX – Multimedia execution unit
  • AGU – Address generation unit
  • JEU – Jump execution unit
[Figure: Instruction dispatch and execute unit. The reservation station, fed from the instruction pool, issues through Port 0 (IEU, FEU, MMX), Port 1 (IEU, JEU, MMX), Port 2 (AGU, read) and Ports 3, 4 (AGU, write).]

23
Retirement Unit
• Reestablishes the normal order of the instructions (of the results)
• In-order unit
• Components:
  • MIU – memory interface unit
  • RRF – retirement register file
[Figure: Retirement unit. Blocks shown: DCache, reservation station, MIU, RRF, instruction pool.]

24
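
How an in-order retirement stage restores program order after out-of-order execution can be sketched as follows. This is a simplified model with hypothetical names; it ignores exceptions and mispredicted branches, which a real retirement unit also has to handle.

```python
def retire_in_order(program_order, completion_order):
    """Return a trace of (completed instruction, instructions retired at that moment).

    An instruction retires only when it and all earlier instructions have completed.
    """
    done = set()
    next_to_retire = 0
    trace = []
    for instr in completion_order:
        done.add(instr)
        retired = []
        while (next_to_retire < len(program_order)
               and program_order[next_to_retire] in done):
            retired.append(program_order[next_to_retire])
            next_to_retire += 1
        trace.append((instr, retired))
    return trace


if __name__ == "__main__":
    program = ["i1", "i2", "i3", "i4"]
    finished = ["i2", "i1", "i4", "i3"]  # order in which the execution units finish
    for step in retire_in_order(program, finished):
        print(step)
```

In the example, i2 finishes first but has to wait in the pool until i1 completes; only then are both results committed, in program order.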
Solving hazard cases in the P6 architecture
• Control hazard:
  • complex branch prediction, BTB, next address predictor
  • out-of-order instruction execution
  • speculative execution along the predicted branch of an if
• Data hazard:
  • alias registers: register renaming, with more internal registers (40) than those seen by the programmer
  • out-of-order instruction execution
  • data dependency tree
• Structural hazard:
  • multiple execution units (7)
  • separate instruction and data caches
  • reservation stations
• In essence, it is an implementation of Tomasulo's method
25
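
Register renaming with alias registers can be sketched in a few lines. This is a minimal illustration, assuming a pool of 40 internal registers as stated on the slide; the names are hypothetical and, unlike real hardware, the sketch never frees a register once allocated.

```python
def rename(instructions, num_physical=40):
    """Rename architectural registers to internal (alias) registers.

    instructions: list of (dest, src1, src2) tuples using architectural names.
    Returns the same instructions with every destination given a fresh register.
    """
    mapping = {}   # architectural name -> current alias register
    next_free = 0
    renamed = []
    for dest, *sources in instructions:
        # Sources read the most recent alias; unmapped names keep their original value.
        phys_sources = [mapping.get(s, s) for s in sources]
        if next_free >= num_physical:
            raise RuntimeError("no free alias register (this sketch never releases them)")
        phys_dest = f"p{next_free}"
        next_free += 1
        mapping[dest] = phys_dest
        renamed.append((phys_dest, *phys_sources))
    return renamed


if __name__ == "__main__":
    program = [
        ("eax", "ebx", "ecx"),  # eax = ebx op ecx
        ("edx", "eax", "ebx"),  # reads eax
        ("eax", "esi", "edi"),  # writes eax again (WAW with 1st, WAR with 2nd)
        ("ebx", "eax", "edx"),
    ]
    for line in rename(program):
        print(line)
```

After renaming, the two writes to eax target different internal registers, so the third instruction no longer has to wait for the first two; only true (read-after-write) dependencies remain.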
The P6 Bus
• The main elements of the P6 bus:
  • the bus works in synchronous mode; every signal is sampled on clock signal edges
  • transfers are made through transactions that may be executed in parallel
  • it is a multi-processor bus: several processors share the same bus
  • block transfers are preferred
  • there are error detection and correction mechanisms
  • there are mechanisms that ensure cache memory consistency
  • a new signaling technology (different amplifiers) that ensures high-frequency transmission on the bus
26
Transfer on the P6 bus
• Parallel transactions (pipeline)
• Phases:
  • Arbitration: decides which master has access to the bus
  • Transfer request: specifies the request (read or write, start address, number of bytes)
  • Snooping: detects and solves cache inconsistencies
  • Error: detects and solves transmission errors (ECC, error correction code, on data; parity on address and command signals)
  • Response: specifies the type of the answer (now, delayed, refused)
  • Transfer: data transfer in accordance with the request
• Technology: GTL (instead of TTL)

27
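
The benefit of overlapping transactions on the bus can be estimated with a simple model. The sketch below assumes every phase lasts the same number of clock periods (2 by default) and that each phase serves one transaction at a time; both are simplifying assumptions, since real phase lengths vary.

```python
PHASES = ["arbitration", "request", "error", "snoop", "response", "transfer"]


def schedule(num_transactions, clocks_per_phase=2):
    """Return {transaction: {phase: (first_clock, last_clock)}} for overlapped transactions."""
    table = {}
    for t in range(num_transactions):
        table[t] = {}
        for p, phase in enumerate(PHASES):
            # Transaction t enters phase p as soon as transaction t-1 has left it.
            start = (t + p) * clocks_per_phase + 1
            table[t][phase] = (start, start + clocks_per_phase - 1)
    return table


def total_clocks(num_transactions, clocks_per_phase=2, pipelined=True):
    """Clock periods needed for all transactions, with or without overlap."""
    if pipelined:
        return (num_transactions + len(PHASES) - 1) * clocks_per_phase
    return num_transactions * len(PHASES) * clocks_per_phase


if __name__ == "__main__":
    print(total_clocks(3), "clocks pipelined")
    print(total_clocks(3, pipelined=False), "clocks one after another")
    print(schedule(3)[2]["transfer"])
```

Under these assumptions, three overlapped transactions finish in less than half the time of three sequential ones, because a new transaction can start its arbitration while the previous ones are still in later phases.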
Time diagram for the P6 bus
[Figure: Concurrent transactions on the P6 bus. Time diagram over BCLK clock periods 1-16, with one row per phase: arbitration, request, error, snoop, response, transfer.]

28
Pentium IV – NetBurst Architecture (7th generation)
• a 20-stage pipeline architecture
  • roughly double compared with P6
• bus frequency is increased 4 times
  • 400 MHz, with "quad pump" technology
  • 3.2 GB/s transfer speed
• doubles the speed of the ALU
  • 2 arithmetic operations are executed in every clock period
  • the ALU works with a double-frequency clock
• the use of very high speed cache memory
  • Advanced Transfer Cache, which at 2 GHz provides a 64 GB/s data transfer rate
• extension of the MMX technology
  • SSE2 – Streaming SIMD Extensions 2
  • 144 new SIMD instructions that extend the data width to 128 bits (16 bytes processed in parallel)
• improvement of branch prediction by approx. 30%
  • through the extension of the BTB unit and
  • increasing the instruction queue to 126 instructions
29
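
The transfer rates quoted for the bus and for the Advanced Transfer Cache follow from a simple product of clock rate, transfers per clock and bus width. The sketch below reproduces them under assumptions not stated on the slide: a 100 MHz base bus clock with 4 transfers per clock on a 64-bit data bus, and a 256-bit cache interface transferring once per 2 GHz core clock.

```python
def bandwidth_gb_per_s(clock_mhz: float, transfers_per_clock: int, bus_width_bits: int) -> float:
    """Peak transfer rate in GB/s (1 GB = 10**9 bytes)."""
    bytes_per_transfer = bus_width_bits // 8
    return clock_mhz * 1e6 * transfers_per_clock * bytes_per_transfer / 1e9


if __name__ == "__main__":
    # Quad-pumped system bus: assumed 100 MHz base clock, 64 data lines
    print(bandwidth_gb_per_s(100, 4, 64))
    # Advanced Transfer Cache: assumed 256-bit interface at the 2 GHz core clock
    print(bandwidth_gb_per_s(2000, 1, 256))
```

These are peak figures; sustained rates are lower because of arbitration and other non-data phases on the bus.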
Pentium IV
[Figure: The NetBurst Pentium IV architecture. Blocks shown: interface with the external bus; L2 cache and control; instruction fetch and decode (BTB, decoder, trace cache, ROM); alias register allocator; instruction queues for micro-operations; instruction scheduling and execution (schedulers, registers for floats, registers for integers, 2 ALU-F, 4 ALU, 2 AGU); L1 D-Cache.]
30
Pentium IV
• New trends:
  • Hyper-threading technology
    • two threads executed in parallel on the same core
  • Multi-core technology
    • several processors on the same chip
  • 64-bit architecture

31
I7, I5, I3
Nehalem architecture - internal view

32
Nehalem architecture
external view

33
Nehalem architecture
multiprocessor configuration

[Figure: two panels, "Communication on FSB – Front side bus" and "Communication on QPI – QuickPath Interconnect".]

34
Sandy Bridge architecture
• The north bridge (memory controller, graphics controller and PCI Express controller) is integrated in the same chip as the rest of the CPU. The first models use a 32 nm manufacturing process
• Ring architecture: 256 bits/cycle
• Two load/store operations per CPU cycle for each memory channel
• New decoded microinstruction cache (L0 cache, capable of storing 1,536 microinstructions, roughly equivalent to 6 kB)
• 32 kB L1 instruction cache and 32 kB L1 data cache per CPU core (no change from Nehalem)
• L2 memory cache renamed "mid-level cache" (MLC), with 256 kB per CPU core
• L3 memory cache is now called LLC (Last Level Cache); it is not unified anymore and is shared by the CPU cores and the graphics engine
• Next-generation Turbo Boost technology
• New AVX (Advanced Vector Extensions) instruction set
• Up to 8 physical cores, or 16 logical cores through Hyper-threading

35
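
Two of the figures above can be checked with simple arithmetic. The sketch assumes about 4 bytes per stored microinstruction (an assumption implied by the "roughly 6 kB" equivalence, not a documented size) and 2 hardware threads per core for Hyper-threading.

```python
UOP_CACHE_ENTRIES = 1536
ASSUMED_BYTES_PER_UOP = 4  # assumption: chosen so that 1,536 entries come to about 6 kB


def uop_cache_kb(entries: int = UOP_CACHE_ENTRIES,
                 bytes_per_uop: int = ASSUMED_BYTES_PER_UOP) -> float:
    """Approximate capacity of the decoded microinstruction cache in kB (1 kB = 1024 bytes)."""
    return entries * bytes_per_uop / 1024


def logical_cores(physical_cores: int, threads_per_core: int = 2) -> int:
    """Logical cores seen by the operating system with Hyper-threading enabled."""
    return physical_cores * threads_per_core


if __name__ == "__main__":
    print(uop_cache_kb(), "kB")
    print(logical_cores(8), "logical cores")
```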
Sandy Bridge architecture
[Figure: two configurations: 1 processor with 4 cores; 2 processors with 8 cores per processor.]

36
Evolution of Intel processor architectures

37
