Unit 7 - Basic Processing
Processing Unit
Overview
Instruction Set Processor (ISP)
Central Processing Unit (CPU)
A typical computing task consists of a series of steps specified by a sequence of machine instructions that constitute a program.
An instruction is executed by carrying out a sequence of more rudimentary operations.
Some Fundamental Concepts
Fundamental Concepts
The processor fetches one instruction at a time and performs the operation specified.
Instructions are fetched from successive memory locations until a branch or a jump instruction is encountered.
The processor keeps track of the address of the memory location containing the next instruction to be fetched using the Program Counter (PC).
Instruction Register (IR)
Executing an Instruction
Fetch the contents of the memory location pointed to by the PC. The contents of this location are loaded into the IR (fetch phase).
IR ← [[PC]]
Assuming that the memory is byte addressable, increment the contents of the PC by 4 (fetch phase).
PC ← [PC] + 4
Carry out the actions specified by the instruction in the IR (execution phase).
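A minimal sketch of this fetch/execute cycle, assuming a 4-byte instruction size as in the slides; the memory dictionary, the decode/execute helper, and the halt encoding are illustrative assumptions:
# Sketch of the fetch/execute cycle described above.
def run(memory, pc):
    while True:
        ir = memory[pc]          # IR <- [[PC]]  (fetch phase)
        pc = pc + 4              # PC <- [PC] + 4 (byte-addressable memory)
        done = execute(ir)       # carry out the instruction in IR (execution phase)
        if done:
            break

def execute(ir):
    # Placeholder: decode ir and perform the specified operation.
    # Returns True when a (hypothetical) halt instruction is reached.
    return ir == 0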
Processor Organization
Executing an Instruction
Transfer a word of data from one processor register to another or to the ALU.
Perform an arithmetic or a logic operation and store the result in a processor register.
Fetch the contents of a given memory location and load them into a processor register.
Store a word of data from a processor register into a given memory location.
Register Transfers
All operations and data transfers are controlled by the processor clock.
Performing an Arithmetic or Logic Operation
The ALU is a combinational circuit that has no internal storage.
The ALU gets its two operands from the MUX and the bus.
The result is temporarily stored in register Z.
What is the sequence of operations to add the contents of register R1 to those of R2 and store the result in R3?
1. R1out, Yin
2. R2out, SelectY, Add, Zin
3. Zout, R3in
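A small sketch that mirrors these three control steps; register and latch names follow the slides, while the explicit "bus" variable is an illustrative stand-in for the shared processor bus:
# Sketch of the three control steps for R3 <- R1 + R2 on a single bus.
regs = {"R1": 5, "R2": 7, "R3": 0}
Y = Z = 0
bus = None

# Step 1: R1out, Yin -- gate R1 onto the bus and latch it into Y.
bus = regs["R1"]
Y = bus

# Step 2: R2out, SelectY, Add, Zin -- R2 on the bus, Y selected as the
# other ALU input, result latched into Z.
bus = regs["R2"]
Z = Y + bus

# Step 3: Zout, R3in -- gate Z onto the bus and load it into R3.
bus = Z
regs["R3"] = bus
assert regs["R3"] == 12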
Fetching a Word from Memory
Address into MAR; issue Read operation; data into MDR
Fetching a Word from Memory
The response time of each memory access varies (cache miss, memory-mapped I/O, …).
To accommodate this, the processor waits until it receives an indication that the requested operation has been completed (Memory-Function-Completed, MFC).
Move (R1), R2
MAR ← [R1]
Start a Read operation on the memory bus
Wait for the MFC response from the memory
Load MDR from the memory bus
R2 ← [MDR]
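A sketch of this handshake for Move (R1), R2; the wait_for_mfc, start_read, and data_bus names model the memory interface and are illustrative assumptions:
# Sketch of the memory-read handshake for Move (R1), R2.
def move_indirect(regs, memory):
    MAR = regs["R1"]            # MAR <- [R1]
    memory.start_read(MAR)      # start a Read operation on the memory bus
    memory.wait_for_mfc()       # wait for the MFC response from the memory
    MDR = memory.data_bus       # load MDR from the memory bus
    regs["R2"] = MDR            # R2 <- [MDR]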
Timing
Execution of a Complete Instruction
Add (R3), R1
Fetch the instruction
Fetch the first operand (the contents of the memory location pointed to by R3)
Perform the addition
Load the result into R1
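One possible single-bus control sequence for Add (R3), R1, written as a Python list using the same signal naming as the earlier R1 + R2 example; the exact step grouping is a sketch and depends on the datapath assumed:
# Sketch of a single-bus control sequence for Add (R3), R1.
control_sequence = [
    "PCout, MARin, Read, Select4, Add, Zin",   # fetch: address to memory, start PC + 4
    "Zout, PCin, Yin, WMFC",                   # update PC, wait for memory
    "MDRout, IRin",                            # load fetched instruction into IR
    "R3out, MARin, Read",                      # operand address taken from R3
    "R1out, Yin, WMFC",                        # bring R1 to Y, wait for the operand
    "MDRout, SelectY, Add, Zin",               # add memory operand to R1
    "Zout, R1in, End",                         # store the result in R1
]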
Architecture
Execution of a Complete Instruction
Execution of Branch Instructions
A branch instruction replaces the contents of the PC with the branch target address, which is usually obtained by adding an offset X given in the branch instruction.
The offset X is usually the difference between the branch target address and the address immediately following the branch instruction.
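A small worked example of that offset computation, assuming 4-byte instructions and illustrative addresses:
# Offset X = target minus the address immediately following the branch.
branch_address = 2000
target_address = 2100
X = target_address - (branch_address + 4)   # X = 96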
Conditional branch
Execution of Branch Instructions
Multiple-Bus Organization
Add R4, R5, R6
Quiz
What is the control sequence for execution of the instruction Add R1, R2, including the instruction fetch phase? (Assume single-bus architecture.)
Hardwired Control
Overview
To execute instructions, the processor must have some means of generating the control signals needed in the proper sequence.
Two categories: hardwired control and microprogrammed control.
A hardwired system can operate at high speed, but with little flexibility.
Control Unit Organization
Detailed Block Description
Generating Zin
Zin = T1 + T6 • ADD + T4 • BR + …
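A sketch of this signal as combinational logic; the timing-step signals T1/T4/T6 and the decoder outputs ADD/BR are illustrative boolean inputs:
# Sketch of the hardwired expression Zin = T1 + T6.ADD + T4.BR + ...
def zin(T1, T4, T6, ADD, BR):
    return T1 or (T6 and ADD) or (T4 and BR)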
Generating End
A Complete Processor
Microprogrammed Control
Overview
Control signals are generated by a program similar to machine language programs.
Control Word (CW); microroutine; microinstruction
Overview
The previous organization cannot handle the situation when the control unit is required to check the status of the condition codes or external inputs to choose between alternative courses of action.
Use conditional branch microinstructions.
Overview
Microinstructions
A straightforward way to structure microinstructions is to assign one bit position to each control signal.
However, this is very inefficient.
The length can be reduced: most signals are not needed simultaneously, and many signals are mutually exclusive.
All mutually exclusive signals are placed in the same group and encoded in binary.
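A sketch of this grouping idea; the specific groups, signal names, and field widths are illustrative assumptions, not the actual control-word layout of the processor in the slides:
# Sketch of grouped (vertically encoded) control-word fields.
ALU_OPS = {"Add": 0b00, "Sub": 0b01, "And": 0b10, "Xor": 0b11}            # mutually exclusive
REG_OUT = {"PCout": 0b00, "R1out": 0b01, "R2out": 0b10, "MDRout": 0b11}   # only one register drives the bus

def encode(alu_op, reg_out, read):
    # 2 bits for the ALU group, 2 bits for the register-out group, 1 bit for Read.
    return (ALU_OPS[alu_op] << 3) | (REG_OUT[reg_out] << 1) | int(read)

cw = encode("Add", "R2out", read=False)   # 5-bit control word instead of one bit per signal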
Microprogram Sequencing
If all microprograms require only straightforward sequential execution of microinstructions except for branches, letting a μPC govern the sequencing would be efficient.
However, there are two disadvantages:
Having a separate microroutine for each machine instruction results in a large total number of microinstructions and a large control store.
Longer execution time, because it takes more time to carry out the required branches.
Example: Add src, Rdst
Four addressing modes: register, autoincrement, autodecrement, and indexed (with indirect forms).
Microinstructions with Next Address Field
The microprogram we discussed requires several branch microinstructions, which perform no useful operation in the datapath.
A powerful alternative approach is to include an address field as a part of every microinstruction to indicate the location of the next microinstruction to be fetched.
Pros: separate branch microinstructions are virtually eliminated; there are few limitations in assigning addresses to microinstructions.
Cons: additional bits for the address field (around 1/6).
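A sketch of microinstructions that each carry an explicit next-address field; the field names, addresses, and dispatch behavior are illustrative assumptions:
# Sketch of a control store whose sequencing comes from the microinstructions themselves.
microstore = {
    0x00: {"signals": ["PCout", "MARin", "Read"], "next": 0x01},
    0x01: {"signals": ["MDRout", "IRin"],         "next": 0x25},  # a real dispatch would modify "next" from the opcode
    0x25: {"signals": ["R3out", "MARin", "Read"], "next": 0x26},
}

uPC = 0x00
for _ in range(3):
    ui = microstore[uPC]
    # ...assert ui["signals"] in the datapath...
    uPC = ui["next"]          # no separate branch microinstruction is needed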
Microinstructions with Next Address Field
Implementation of the Microroutine
bit-ORing
Prefetching
One drawback of microprogrammed control is that it leads to slower operating speed because of the time it takes to fetch microinstructions from the control store.
Faster operation is achieved if the next microinstruction is prefetched while the current one is executing.
In this way, execution time is overlapped with fetch time.
Prefetching – Disadvantages
Sometimes the status flags and the result of the currently executing microinstruction are needed to determine the next address.
Thus there is a probability of the wrong microinstruction being prefetched.
In this case, the fetch must be repeated with the correct address.
Emulation
Emulation allows us to replace obsolete equipment with more up-to-date machines.
It facilitates transitions to new computer systems with minimal disruption.
It is easiest when machines with similar architectures are involved.
Data Hazards
We must ensure that the results obtained when instructions are executed in a pipelined processor are identical to those obtained when the same instructions are executed sequentially.
Hazard occurs:
A ← 3 + A
B ← 4 × A
No hazard:
A ← 5 × C
B ← 20 + C
When two operations depend on each other, they must be executed sequentially in the correct order.
Another example:
Mul R2, R3, R4
Add R5, R4, R6
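A sketch of how this read-after-write dependence can be detected; the (op, src1, src2, dest) tuple encoding, with the destination written last as in the slides, is an illustrative assumption:
# Sketch of detecting the dependence between
#   Mul R2, R3, R4   (writes R4)
#   Add R5, R4, R6   (reads  R4)
i1 = ("Mul", "R2", "R3", "R4")
i2 = ("Add", "R5", "R4", "R6")

def raw_hazard(first, second):
    # A hazard exists if the second instruction reads the first's destination.
    return first[3] in second[1:3]

print(raw_hazard(i1, i2))   # True: R4 must be written before it is read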
Data Hazards
Operand Forwarding
Instead of reading from the register file, the second instruction can get the data directly from the output of the ALU after the previous instruction has completed.
A special arrangement is needed to “forward” the output of the ALU to the input of the ALU.
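A sketch of that forwarding choice; the single forwarding path and the way the previous instruction is represented are illustrative assumptions:
# Sketch of operand forwarding: if the previous instruction is still
# writing the register we need, take the value from the ALU output
# instead of reading the register file.
def read_operand(reg, regfile, prev_dest, prev_alu_result):
    if reg == prev_dest:
        return prev_alu_result      # forwarded from the ALU output
    return regfile[reg]             # normal register-file read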
Handling Data Hazards in Software
Let the compiler detect and handle the hazard:
I1: Mul R2, R3, R4
NOP
NOP
I2: Add R5, R4, R6
The compiler can reorder the instructions to perform some useful work during the NOP slots.
Side Effects
The previous example is explicit and easily detected.
Sometimes an instruction changes the contents of a register other than the one named as the destination.
When a location other than one explicitly named in an instruction as a destination operand is affected, the instruction is said to have a side effect. (Example?)
Example: condition code flags:
Add R1, R3
AddWithCarry R2, R4
Instructions designed for execution on pipelined hardware should have few side effects.
Instruction Hazards
Overview
Whenever the stream of instructions supplied by the instruction fetch unit is interrupted, the pipeline stalls.
Cache miss
Branch
Unconditional Branches
Branch Timing
- Branch penalty
- Reducing the penalty
Delayed Branch
Branch Prediction
To predict whether or not a particular branch will be taken.
Simplest form: assume the branch will not take place and continue to fetch instructions in sequential address order.
Until the branch is evaluated, instruction execution along the predicted path must be done on a speculative basis.
Speculative execution: instructions are executed before the processor is certain that they are in the correct execution sequence.
Care must be taken so that no processor registers or memory locations are updated until it is confirmed that these instructions should indeed be executed.
Incorrectly Predicted Branch
Branch Prediction
Better performance can be achieved if we arrange for some branch instructions to be predicted as taken and others as not taken.
Use hardware to observe whether the target address is lower or higher than that of the branch instruction.
Let the compiler include a branch prediction bit.
So far, the branch prediction decision is always the same every time a given instruction is executed – static branch prediction.
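A sketch of the address-comparison heuristic mentioned above: backward branches (target below the branch address) are typically loop closings and are predicted taken, forward branches not taken; treating this as the exact rule intended by the slides is an assumption:
# Sketch of static prediction based on branch direction.
def predict_taken(branch_address, target_address):
    return target_address < branch_address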
Influence on Instruction Sets
Overview
Some instructions are much better suited to pipelined execution than others.
Addressing modes
Condition code flags
Addressing Modes
Addressing modes include simple ones and complex ones.
In choosing the addressing modes to be implemented in a pipelined processor, we must consider the effect of each addressing mode on instruction flow in the pipeline:
Side effects
The extent to which complex addressing modes cause the pipeline to stall
Whether a given mode is likely to be used by compilers
Recall
Load X(R1), R2
Complex Addressing Mode
Load (X(R1)), R2
Simple Addressing Mode
Add #X, R1, R2
Load (R2), R2
Load (R2), R2
Addressing Modes
In a pipelined processor, complex addressing modes do not necessarily lead to faster execution.
Advantage: reducing the number of instructions / program space
Disadvantage: cause the pipeline to stall / more hardware to decode / not convenient for the compiler to work with
Conclusion: complex addressing modes are not suitable for pipelined execution.
Addressing Modes
Good addressing modes should have these features:
Access to an operand does not require more than one access to the memory
Only load and store instructions access memory operands
The addressing modes used do not have side effects
Register, register indirect, index
Condition Codes
If an optimizing compiler attempts to reorder instructions to avoid stalling the pipeline when branches or data dependencies between successive instructions occur, it must ensure that reordering does not cause a change in the outcome of a computation.
The dependency introduced by the condition-code flags reduces the flexibility available for the compiler to reorder instructions.
Condition Codes
Two conclusions:
To provide flexibility in reordering instructions, the condition-code flags should be affected by as few instructions as possible.
The compiler should be able to specify in which instructions of a program the condition codes are affected and in which they are not.
Superscalar Operation
Overview
The maximum throughput of a pipelined processor is one instruction per clock cycle.
If we equip the processor with multiple processing units to handle several instructions in parallel in each processing stage, several instructions start execution in the same clock cycle – multiple-issue.
Processors capable of achieving an instruction execution throughput of more than one instruction per cycle are called superscalar processors.
Multiple-issue requires a wider path to the cache and multiple execution units.
Superscalar
Timing
Out-of-Order Execution
Hazards
Exceptions
Imprecise exceptions
Precise exceptions
Execution Completion
It is desirable to use out-of-order execution, so that an execution unit is freed to execute other instructions as soon as possible.
At the same time, instructions must be completed in program order to allow precise exceptions.
The use of temporary registers
Commitment unit