Unit 7 - Basic Processing

The document discusses the basic processing unit (BPU) and central processing unit (CPU). It covers fundamental concepts like how the processor fetches and executes instructions from memory. It describes the components involved in executing an instruction like the instruction register, program counter, arithmetic logic unit, and registers. It also covers techniques to handle hazards in pipelined processors like data hazards, instruction hazards from branches, and how branch prediction works.


Basic Processing Unit
Overview
 Instruction Set Processor (ISP)
 Central Processing Unit (CPU)
 A typical computing task consists of a series of steps specified by a sequence of machine instructions that constitute a program.
 An instruction is executed by carrying out a sequence of more rudimentary operations.
Some Fundamental Concepts
Fundamental Concepts
 The processor fetches one instruction at a time and performs the operation specified.
 Instructions are fetched from successive memory locations until a branch or a jump instruction is encountered.
 The processor keeps track of the address of the memory location containing the next instruction to be fetched using the Program Counter (PC).
 The Instruction Register (IR) holds the instruction currently being executed.
Executing an Instruction
 Fetch the contents of the memory location pointed to by the PC. The contents of this location are loaded into the IR (fetch phase).
IR ← [[PC]]
 Assuming that the memory is byte-addressable, increment the contents of the PC by 4 (fetch phase).
PC ← [PC] + 4
 Carry out the actions specified by the instruction in the IR (execution phase).
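As a minimal sketch (Python, with a made-up three-instruction memory and instruction "encoding" used purely for illustration), the fetch/execute loop described above can be written as:

# Minimal sketch of the fetch/execute loop: byte-addressable memory, 4-byte words.
memory = {0: "Add (R3), R1", 4: "Move (R1), R2", 8: "Halt"}

PC = 0
while True:
    IR = memory[PC]        # fetch phase: IR <- [[PC]]
    PC = PC + 4            # fetch phase: PC <- [PC] + 4
    print("executing:", IR)
    if IR == "Halt":       # execution phase (just a stub here)
        break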
Processor Organization
Executing an Instruction
 Transfer a word of data from one processor register to another or to the ALU.
 Perform an arithmetic or a logic operation and store the result in a processor register.
 Fetch the contents of a given memory location and load them into a processor register.
 Store a word of data from a processor register into a given memory location.
Register Transfers
 All operations and data transfers are controlled by the processor clock.

Performing an Arithmetic or Logic Operation
 The ALU is a combinational circuit that has no internal storage.
 The ALU gets its two operands from the MUX and the bus. The result is temporarily stored in register Z.
 What is the sequence of operations to add the contents of register R1 to those of R2 and store the result in R3?
1. R1out, Yin
2. R2out, SelectY, Add, Zin
3. Zout, R3in
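A toy simulation of these three control steps (Python; the register and signal names follow the slide, but the single-bus datapath model itself is made up for illustration) might look like:

# Toy single-bus datapath: Y and Z are the ALU input and output latches.
regs = {"R1": 2, "R2": 5, "R3": 0, "Y": 0, "Z": 0}

# Step 1: R1out, Yin -- R1 drives the bus, value latched into Y.
bus = regs["R1"]; regs["Y"] = bus
# Step 2: R2out, SelectY, Add, Zin -- R2 drives the bus, ALU adds Y and the bus, result into Z.
bus = regs["R2"]; regs["Z"] = regs["Y"] + bus
# Step 3: Zout, R3in -- Z drives the bus, value latched into R3.
bus = regs["Z"]; regs["R3"] = bus

print(regs["R3"])   # 7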
Fetching a Word from Memory
 Address into MAR; issue Read operation; data into MDR
Fetching a Word from Memory
 The response time of each memory access varies (cache miss, memory-mapped I/O, …).
 To accommodate this, the processor waits until it receives an indication that the requested operation has been completed (Memory-Function-Completed, MFC).
 Move (R1), R2
 MAR ← [R1]
 Start a Read operation on the memory bus
 Wait for the MFC response from the memory
 Load MDR from the memory bus
 R2 ← [MDR]
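A rough sketch of this read handshake (Python, with a stub memory model whose variable delay stands in for MFC; everything here is illustrative) is:

import random

# Stub memory: returns (data, cycles_until_MFC); the delay varies per access.
def memory_read(addr):
    return 100 + addr, random.randint(1, 5)

MAR = 0x20                       # MAR <- [R1]
data, delay = memory_read(MAR)   # start a Read operation on the memory bus
for cycle in range(delay):       # wait for the MFC response from the memory
    pass                         # the processor simply waits (or does unrelated work)
MDR = data                       # load MDR from the memory bus
R2 = MDR                         # R2 <- [MDR]
print(R2)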

Timing
Execution of a Complete Instruction
 Add (R3), R1
 Fetch the instruction
 Fetch the first operand (the contents of the memory location pointed to by R3)
 Perform the addition
 Load the result into R1
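A sketch of the corresponding single-bus control sequence (following the usual textbook-style steps; the exact signal names and step boundaries may differ from the figure) is:

# Illustrative control sequence for Add (R3), R1 on a single-bus datapath.
control_sequence = [
    "PCout, MARin, Read, Select4, Add, Zin",  # fetch: send PC to memory, start PC + 4
    "Zout, PCin, Yin, WMFC",                  # update PC, wait for memory (MFC)
    "MDRout, IRin",                           # load the fetched instruction into IR
    "R3out, MARin, Read",                     # operand address from R3, start read
    "R1out, Yin, WMFC",                       # move R1 to Y, wait for the operand
    "MDRout, SelectY, Add, Zin",              # add the memory operand to R1
    "Zout, R1in, End",                        # store the result back into R1
]
for step, signals in enumerate(control_sequence, 1):
    print(step, signals)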
Architecture
Execution of a Complete Instruction
Execution of Branch Instructions
 A branch instruction replaces the contents of the PC with the branch target address, which is usually obtained by adding an offset X given in the branch instruction (see the sketch below).
 The offset X is usually the difference between the branch target address and the address immediately following the branch instruction.
 Conditional branch
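For example (a sketch assuming 4-byte instructions; the addresses and the offset are made up), the target of a taken branch is computed from the already-updated PC:

PC = 0x1000            # address of the branch instruction
PC = PC + 4            # after the fetch phase the PC points past the branch
X = 40                 # offset field encoded in the branch instruction
branch_target = PC + X # target = address following the branch + X
PC = branch_target     # taken branch: PC <- [PC] + X
print(hex(PC))         # 0x102c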
Execution of Branch Instructions
Multiple-Bus Organization

Multiple-Bus Organization
 Add R4, R5, R6
Quiz
 What is the control sequence for execution of the instruction Add R1, R2, including the instruction fetch phase? (Assume a single-bus architecture.)
Hardwired Control

Overview
 To execute instructions, the processor must have some means of generating the control signals needed, in the proper sequence.
 Two categories: hardwired control and microprogrammed control
 A hardwired system can operate at high speed, but with little flexibility.
Control Unit Organization
Detailed Block Description
Generating Zin

Zin = T1 + T6 • ADD + T4 • BR + …
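As a rough illustration (Python; the time-step and opcode names are assumptions, not taken from the figure), this kind of hardwired logic simply ORs together the conditions under which the signal must be asserted:

# Zin = T1 + T6.ADD + T4.BR : asserted in step 1 of every instruction,
# in step 6 of an Add instruction, and in step 4 of a Branch instruction.
def zin(step: int, opcode: str) -> bool:
    T = lambda n: step == n            # timing signal Tn from the step counter
    ADD = opcode == "ADD"              # instruction decoder outputs
    BR = opcode == "BR"
    return T(1) or (T(6) and ADD) or (T(4) and BR)

print(zin(1, "BR"))    # True: T1 is common to every instruction
print(zin(6, "ADD"))   # True
print(zin(4, "ADD"))   # False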
Generating End
A Complete Processor
Microprogrammed Control
Overview
 Control signals are generated by a program similar to machine language programs.
 Control Word (CW); microroutine; microinstruction

Overview
 The previous organization cannot handle the situation when the control unit is required to check the status of the condition codes or external inputs to choose between alternative courses of action.
 Use conditional branch microinstructions.

Overview
Microinstructions
 A straightforward way to structure microinstructions is to assign one bit position to each control signal.
 However, this is very inefficient.
 The length can be reduced: most signals are not needed simultaneously, and many signals are mutually exclusive.
 All mutually exclusive signals are placed in the same group and encoded in binary.
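A rough sketch of the saving (Python; the signal groups below are made up for illustration): since the signals within a group are mutually exclusive, each group can be encoded as a small binary field instead of one bit per signal.

from math import ceil, log2

# Hypothetical groups of mutually exclusive control signals.
groups = {
    "bus_out": ["PCout", "MDRout", "Zout", "R0out", "R1out", "R2out", "R3out"],
    "alu_op":  ["Add", "Sub", "And", "Or", "Xor"],
    "reg_in":  ["PCin", "IRin", "MARin", "MDRin", "Yin", "Zin"],
}

one_hot_bits = sum(len(g) for g in groups.values())
encoded_bits = sum(ceil(log2(len(g) + 1)) for g in groups.values())  # +1 code for "none"
print(one_hot_bits, encoded_bits)   # 18 bits one-hot vs 9 bits with grouped encoding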

Partial Format for the Microinstructions
Further Improvement
 Enumerate the patterns of required signals in all possible microinstructions. Each meaningful combination of active control signals can then be assigned a distinct code.
 Vertical organization
 Horizontal organization

Microprogram Sequencing
 If all microprograms require only straightforward sequential execution of microinstructions except for branches, letting a μPC govern the sequencing would be efficient.
 However, there are two disadvantages:
 Having a separate microroutine for each machine instruction results in a large total number of microinstructions and a large control store.
 Execution time is longer because it takes more time to carry out the required branches.
 Example: Add src, Rdst
 Four addressing modes: register, autoincrement, autodecrement, and indexed (with indirect forms).
Microinstructions with Next-Address Field
 The microprogram we discussed requires several branch microinstructions, which perform no useful operation in the datapath.
 A powerful alternative approach is to include an address field as a part of every microinstruction, to indicate the location of the next microinstruction to be fetched.
 Pros: separate branch microinstructions are virtually eliminated; there are few limitations in assigning addresses to microinstructions.
 Cons: additional bits for the address field (around 1/6)
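A minimal sketch (Python; the microinstruction format and the control-store contents are made up) of sequencing with a next-address field in every microinstruction:

# Each microinstruction carries its control signals plus the address of the
# next microinstruction, so no separate branch microinstructions are needed.
microstore = {
    0: {"signals": ["PCout", "MARin", "Read"], "next": 1},
    1: {"signals": ["Zout", "PCin", "WMFC"],   "next": 2},
    2: {"signals": ["MDRout", "IRin"],         "next": 0},   # wrap back to fetch
}

uPC = 0
for _ in range(6):                       # run a few clock cycles
    mi = microstore[uPC]
    print(f"uPC={uPC}: assert {mi['signals']}")
    uPC = mi["next"]                     # next address comes from the microinstruction itself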
Microinstructions with Next-Address Field
Implementation of the Microroutine
bit-ORing
Prefetching
 One drawback of microprogrammed control is that it leads to a slower operating speed because of the time it takes to fetch microinstructions from the control store.
 Faster operation is achieved if the next microinstruction is prefetched while the current one is being executed.
 In this way, execution time is overlapped with fetch time.
Prefetching – Disadvantages
 Sometimes the status flags and the result of the currently executing microinstruction are needed to determine the next address.
 Thus there is a chance that the wrong microinstruction is prefetched.
 In this case, the fetch must be repeated with the correct address.

Emulation
 Emulation allows us to replace obsolete equipment with more up-to-date machines.
 It facilitates transitions to new computer systems with minimal disruption.
 It is easiest when machines with similar architectures are involved.

Data Hazards
Data Hazards
 We must ensure that the results obtained when instructions are executed in a pipelined processor are identical to those obtained when the same instructions are executed sequentially.
 Hazard occurs:
A ← 3 + A
B ← 4 × A
 No hazard:
A ← 5 × C
B ← 20 + C
 When two operations depend on each other, they must be executed sequentially in the correct order.
 Another example:
Mul R2, R3, R4
Add R5, R4, R6
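A small sketch of detecting this read-after-write dependence (Python; it assumes the last operand of the assembly notation is the destination, so Mul writes R4 and Add reads it — an assumption made only for illustration):

i1 = ("Mul", "R4", "R2", "R3")   # Mul R2, R3, R4: writes R4
i2 = ("Add", "R6", "R5", "R4")   # Add R5, R4, R6: reads R4, writes R6

def raw_hazard(first, second):
    # Tuples are (op, dest, src1, src2): hazard if 'second' reads what 'first' writes.
    dest = first[1]
    sources = second[2:]
    return dest in sources

print(raw_hazard(i1, i2))   # True: Add needs R4 before Mul has written it back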

Data Hazards
Operand Forwarding
 Instead of reading it from the register file, the second instruction can get the data directly from the output of the ALU after the previous instruction is completed.
 A special arrangement needs to be made to “forward” the output of the ALU to the input of the ALU.
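A toy illustration of the forwarding idea (Python; the register values and the two-instruction example are made up, and the pipeline itself is not modelled):

regs = {"R2": 3, "R3": 4, "R4": 0, "R5": 10, "R6": 0}

alu_out = regs["R2"] * regs["R3"]    # Mul R2, R3, R4: result available at the ALU output
forwarded = alu_out                  # forwarding path: ALU output fed back to the ALU input
regs["R6"] = regs["R5"] + forwarded  # Add R5, R4, R6 uses the forwarded value, not stale R4
regs["R4"] = alu_out                 # the write-back to R4 happens later

print(regs["R6"])   # 22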
Handling Data Hazards in Software
 Let the compiler detect and handle the hazard:
I1: Mul R2, R3, R4
NOP
NOP
I2: Add R5, R4, R6
 The compiler can reorder the instructions to perform some useful work during the NOP slots.
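A naive sketch of such a pass (Python; instructions are tuples with the destination listed first internally, and a real compiler would try to fill the slots with useful work or reorder rather than just padding with NOPs):

def insert_nops(program, slots=2):
    # Insert NOPs when an instruction reads a register written by the one before it.
    out = []
    for instr in program:
        if out and out[-1] != "NOP" and out[-1][1] in instr[2:]:
            out.extend(["NOP"] * slots)
        out.append(instr)
    return out

prog = [("Mul", "R4", "R2", "R3"),   # Mul R2, R3, R4
        ("Add", "R6", "R5", "R4")]   # Add R5, R4, R6
for line in insert_nops(prog):
    print(line)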
Side Effects
 The previous example is explicit and easily detected.
 Sometimes an instruction changes the contents of a register other than the one named as the destination.
 When a location other than one explicitly named in an instruction as a destination operand is affected, the instruction is said to have a side effect. (Example?)
 Example: condition code flags:
Add R1, R3
AddWithCarry R2, R4
 Instructions designed for execution on pipelined hardware should have few side effects.

Instruction Hazards
Overview
 Whenever the stream of instructions supplied by the instruction fetch unit is interrupted, the pipeline stalls.
 Cache miss
 Branch

Unconditional Branches
Branch Timing
- Branch penalty
- Reducing the penalty

Instruction Queue and Prefetching
Conditional Branches
 A conditional branch instruction introduces the added hazard caused by the dependency of the branch condition on the result of a preceding instruction.
 The decision to branch cannot be made until the execution of that instruction has been completed.
 Branch instructions represent about 20% of the dynamic instruction count of most programs.
Delayed Branch
 The instructions in the delay slots are always fetched. Therefore, we would like to arrange for them to be fully executed whether or not the branch is taken.
 The objective is to place useful instructions in these slots.
 The effectiveness of the delayed branch approach depends on how often it is possible to reorder instructions.
Delayed Branch

Delayed Branch
Branch Prediction
 To predict whether or not a particular branch will be taken.
 Simplest form: assume the branch will not take place and continue to fetch instructions in sequential address order.
 Until the branch is evaluated, instruction execution along the predicted path must be done on a speculative basis.
 Speculative execution: instructions are executed before the processor is certain that they are in the correct execution sequence.
 Care must be taken that no processor registers or memory locations are updated until it is confirmed that these instructions should indeed be executed.

Incorrectly Predicted Branch
Branch Prediction
 Better performance can be achieved if we arrange for some branch instructions to be predicted as taken and others as not taken.
 Use hardware to observe whether the target address is lower or higher than that of the branch instruction (see the sketch below).
 Let the compiler include a branch prediction bit.
 So far, the branch prediction decision is always the same every time a given instruction is executed – static branch prediction.
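A sketch of the hardware heuristic mentioned above (Python; this is the common "backward taken, forward not taken" rule — loop-closing branches jump to lower addresses and are usually taken — and the addresses are made up):

def predict_taken(branch_addr: int, target_addr: int) -> bool:
    # Predict taken when the target lies below the branch (a backward branch).
    return target_addr < branch_addr

print(predict_taken(0x1040, 0x1000))   # True: backward branch, likely a loop
print(predict_taken(0x1040, 0x1080))   # False: forward branch, predicted not taken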
Influence on Instruction Sets
Overview
 Some instructions are much better suited to pipelined execution than others.
 Addressing modes
 Condition code flags

Addressing Modes
 Addressing modes include simple ones and complex ones.
 In choosing the addressing modes to be implemented in a pipelined processor, we must consider the effect of each addressing mode on instruction flow in the pipeline:
 Side effects
 The extent to which complex addressing modes cause the pipeline to stall
 Whether a given mode is likely to be used by compilers

Recall
Load X(R1), R2
Complex addressing mode:
Load (X(R1)), R2
Equivalent sequence using simple addressing modes:
Add #X, R1, R2
Load (R2), R2
Load (R2), R2
Addressing Modes
 In a pipelined processor, complex addressing modes do not necessarily lead to faster execution.
 Advantage: reduced number of instructions / program space
 Disadvantages: they cause the pipeline to stall, require more hardware to decode, and are not convenient for compilers to work with
 Conclusion: complex addressing modes are not suitable for pipelined execution.

Addressing Modes
 Good addressing modes should satisfy the following:
 Access to an operand does not require more than one access to the memory
 Only load and store instructions access memory operands
 The addressing modes used do not have side effects
 Register, register indirect, and index modes satisfy these conditions.

Condition Codes
 If an optimizing compiler attempts to reorder instructions to avoid stalling the pipeline when branches or data dependencies between successive instructions occur, it must ensure that reordering does not cause a change in the outcome of a computation.
 The dependency introduced by the condition-code flags reduces the flexibility available to the compiler to reorder instructions.
Condition Codes
Condition Codes
 Two conclusions:
 To provide flexibility in reordering instructions, the condition-code flags should be affected by as few instructions as possible.
 The compiler should be able to specify in which instructions of a program the condition codes are affected and in which they are not.

Superscalar Operation
Overview
 The maximum throughput of a pipelined processor is one instruction per clock cycle.
 If we equip the processor with multiple processing units to handle several instructions in parallel in each processing stage, several instructions start execution in the same clock cycle – multiple-issue.
 Processors capable of achieving an instruction execution throughput of more than one instruction per cycle are known as superscalar processors.
 Multiple-issue requires a wider path to the cache and multiple execution units.
Superscalar Timing
Out-of-Order Execution
 Hazards
 Exceptions
 Imprecise exceptions
 Precise exceptions
Execution Completion
 It is desirable to use out-of-order execution, so that an execution unit is freed to execute other instructions as soon as possible.
 At the same time, instructions must be completed in program order to allow precise exceptions.
 The use of temporary registers (see the sketch below)
 Commitment unit
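A small sketch of the idea (Python; the instruction names, registers, and values are made up): results are parked in temporary storage as they complete out of order, and a commitment step copies them to the architectural registers strictly in program order, so exceptions remain precise.

from collections import OrderedDict

arch_regs = {"R1": 0, "R2": 0}
# Pending results in program order: instruction -> (destination, value or None).
pending = OrderedDict([("I1", ("R1", None)), ("I2", ("R2", None))])

pending["I2"] = ("R2", 7)   # I2 finishes first (out of order); value held temporarily
pending["I1"] = ("R1", 5)   # now I1 finishes

# Commitment unit: retire results oldest-first, only once they are ready.
while pending and next(iter(pending.values()))[1] is not None:
    _, (reg, val) = pending.popitem(last=False)
    arch_regs[reg] = val

print(arch_regs)   # {'R1': 5, 'R2': 7}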
