CO Unit 4_Processing_Pipelining (1)

Uploaded by

basava644265

UNIT – IV

Basic Processing Unit


Overview
 Instruction Set Processor (ISP)
 Central Processing Unit (CPU)

 A typical computing task consists of a series of steps specified by a
sequence of machine instructions that constitute a program.
 An instruction is executed by carrying out a sequence of more
rudimentary operations.


Some Fundamental Concepts
 The processor fetches one instruction at a time and
performs the operation specified.
 Instructions are fetched from successive memory
locations until a branch or a jump instruction is
encountered.
 Processor keeps track of the address of the memory
location containing the next instruction to be fetched
using Program Counter (PC).
 The Instruction Register (IR) holds the instruction that is currently
being executed.
Executing an Instruction
 Fetch the contents of the memory location pointed
to by the PC. The contents of this location are
loaded into the IR (fetch phase).
IR ← [[PC]]
 Assuming that the memory is byte addressable and that each instruction
occupies 4 bytes, increment the contents of the PC by 4 (fetch phase).
PC ← [PC] + 4
 Carry out the actions specified by the instruction in
the IR (execution phase).
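The fetch and execution phases above can be sketched as a small loop. This is only an illustrative model: the opcodes (LOADI, ADDI, JUMP, HALT), the single accumulator and the dictionary used as memory are assumptions made for the example, not part of the processor described in these slides.

```python
# Toy model of the fetch/execute cycle (illustrative only).
# Assumptions: byte-addressable memory, every instruction is 4 bytes long,
# and instructions are stored as (opcode, operand) tuples keyed by address.

def run(memory, pc=0):
    """Run until a HALT instruction; return the accumulator and the final PC."""
    acc = 0
    while True:
        ir = memory[pc]          # fetch phase: IR <- [[PC]]
        pc = pc + 4              # fetch phase: PC <- [PC] + 4
        opcode, operand = ir     # execution phase: act on the contents of IR
        if opcode == "LOADI":
            acc = operand
        elif opcode == "ADDI":
            acc += operand
        elif opcode == "JUMP":
            pc = operand         # a branch overwrites the incremented PC
        elif opcode == "HALT":
            return acc, pc
        else:
            raise ValueError(f"unknown opcode {opcode!r}")

program = {
    0:  ("LOADI", 5),
    4:  ("JUMP", 12),
    8:  ("ADDI", 100),   # skipped by the jump
    12: ("ADDI", 3),
    16: ("HALT", None),
}
result, final_pc = run(program)
```

Instructions are taken from successive locations (0, 4, ...) until the jump at address 4 redirects the PC to 12.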
Processor Organization
[Figure 7.1. Single-bus organization of the datapath inside a processor. Labelled blocks: internal processor bus, PC, MAR (address lines of the memory bus), MDR (data lines of the memory bus), IR, instruction decoder and control logic (control signals), registers R0 to Rn-1, TEMP, constant 4 and Select MUX, ALU with inputs A and B, ALU control lines (Add, Sub, XOR), carry-in.]



Internal organization of the processor
 ALU
 Registers for temporary storage
 Various digital circuits for executing different micro-operations
(gates, multiplexers, decoders, counters).
 Internal path for movement of data between ALU
and registers.
 Driver circuits for transmitting signals to external
units.
 Receiver circuits for incoming signals from external
units.
PC:
 Keeps track of the execution of a program.
 Contains the memory address of the next instruction to be
fetched and executed.
MAR:
 Holds the address of the location to be accessed.
 The input of MAR is connected to the internal bus and its output to the
external bus.
MDR:
 Contains data to be written into or read out of the addressed
location.
 It has 2 inputs and 2 outputs.
 Data can be loaded into MDR either from the memory bus or from the
internal processor bus.
 The data and address lines of the memory bus are connected to the
internal bus via MDR and MAR.
Registers:
 The number and use of the processor registers R0 to Rn-1 vary
considerably from one processor to another.
 Some registers are provided for general-purpose use by the programmer.
 Others are special-purpose registers, such as index and stack registers.
 Registers Y, Z and TEMP are temporary registers used by the processor
during the execution of some instructions.
Multiplexer:
 Selects either the output of register Y or the constant value 4
to be provided as input A of the ALU.
 The constant 4 is used by the processor to increment the contents
of the PC.
ALU:
 Used to perform arithmetic and logic operations.
Data Path:
 The registers, the ALU and the interconnecting bus are
collectively referred to as the data path.
1. Register Transfers
[Figure 7.2. Input and output gating for the registers in Figure 7.1. Signals shown: Riin and Riout on register Ri, Yin, Select (MUX choosing between Y and the constant 4 for ALU input A), Zin, Zout.]
 The input and output gates for register Ri are
controlled by the signals Riin and Riout.
 Riin is set to 1: the data available on the common bus
are loaded into Ri.
 Riout is set to 1: the contents of register Ri are
placed on the bus.
 Riout is set to 0: the bus can be used for
transferring data from other registers.
Data transfer between two registers:
Example: Transfer the contents of R1 to R4.
1. Enable output of register R1 by setting
R1out=1. This places the contents of R1 on
the processor bus.
2. Enable input of register R4 by setting
R4in=1. This loads the data from the
processor bus into register R4.
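The gating described above can be modelled in a few lines. This is a sketch with hypothetical names (`Datapath`, `step`); it assumes that exactly one register drives the bus in a control step and that any number of registers may latch the bus value.

```python
# Minimal model of a single shared bus with gated registers (hypothetical names).
class Datapath:
    def __init__(self, **initial):
        self.regs = dict(initial)

    def step(self, out, ins):
        """One control step: the register named `out` drives the bus (its
        'out' signal is 1) and every register listed in `ins` latches the
        bus value (its 'in' signal is 1)."""
        bus = self.regs[out]
        for name in ins:
            self.regs[name] = bus
        return bus

dp = Datapath(R1=42, R4=0)
dp.step(out="R1", ins=["R4"])   # R1out, R4in
```

After the step, R4 holds a copy of R1 and R1 itself is unchanged.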
2. Performing an Arithmetic or Logic Operation
 The ALU is a combinational circuit that has no
internal storage.
 The ALU gets its two operands from the MUX (input A) and the bus (input B).
The result is temporarily stored in register Z.
 What is the sequence of operations to add the
contents of register R1 to those of R2 and store the
result in R3?
1. R1out, Yin
2. R2out, SelectY, Add, Zin
3. Zout, R3in
Step 1: The output of register R1 and the input of
register Y are enabled, causing the
contents of R1 to be transferred to Y.
Step 2: The multiplexer's select signal is set to
SelectY, causing the multiplexer to gate the
contents of register Y to input A of the ALU.
At the same time, the contents of R2 are gated onto
the bus and hence to input B. The Add line is set,
and the sum is loaded into register Z because Zin is set.
Step 3: The contents of Z are transferred to the
destination register R3.
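The three control steps can be traced with a short sketch. The function name and the dictionary of registers are assumptions for illustration; the comments give the control signals from the sequence above.

```python
# Sketch of the three control steps for R3 <- [R1] + [R2] on a single bus.
# Register and signal names follow the text; the function itself is hypothetical.
def add_r1_r2_into_r3(regs):
    regs = dict(regs)               # work on a copy
    # Step 1: R1out, Yin
    bus = regs["R1"]
    regs["Y"] = bus
    # Step 2: R2out, SelectY, Add, Zin
    bus = regs["R2"]                # input B comes straight from the bus
    a = regs["Y"]                   # SelectY routes Y to input A
    regs["Z"] = a + bus             # Add; the result is latched into Z
    # Step 3: Zout, R3in
    bus = regs["Z"]
    regs["R3"] = bus
    return regs

after = add_r1_r2_into_r3({"R1": 7, "R2": 5, "R3": 0, "Y": 0, "Z": 0})
```

Three steps are needed because the single bus can carry only one value per step, so Y and Z act as staging registers around the ALU.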
Register Transfers
 All operations and data transfers are controlled by the processor clock.
[Figure 7.3. Input and output gating for one register bit. Labels: Bus, D flip-flop (D, Q), Riin, Riout, Clock.]
Fetching a Word from Memory
 Address into MAR; issue Read operation; data into MDR.
[Figure 7.4. Connection and control signals for register MDR. Signals: MDRinE and MDRoutE on the memory-bus data lines, MDRin and MDRout on the internal processor bus.]
3. Fetching a Word from Memory
 The response time of each memory access varies
(cache miss, memory-mapped I/O,…).
 To accommodate this, the processor waits until it
receives an indication that the requested operation
has been completed (Memory-Function-Completed,
MFC).
 Move (R1), R2
 MAR ← [R1]
 Start a Read operation on the memory bus
 Wait for the MFC response from the memory
 Load MDR from the memory bus
 R2 ← [MDR]
 Move (R1), R2
1. R1out, MARin, Read
2. MDRinE, WMFC
3. MDRout, R2in
Assume the contents of MAR are always available on the address lines
of the memory bus.
[Figure 7.5. Timing of a memory Read operation. Signals plotted against the clock over steps 1 to 3: MARin, Address, Read, MR, MDRinE, Data, MFC, MDRout.]
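The effect of waiting for MFC can be illustrated with a memory whose response time is a parameter. The `Memory` class, its `latency` argument and the cycle counting are assumptions made for this sketch; the three steps are the control sequence for Move (R1), R2.

```python
# Sketch of the Read sequence for Move (R1), R2 with a variable-latency memory.
# The latency value and the class/function names are assumptions for illustration.
class Memory:
    def __init__(self, cells, latency):
        self.cells = cells
        self.latency = latency      # clock cycles until MFC is asserted
        self._countdown = None
        self._addr = None

    def start_read(self, addr):
        self._addr = addr
        self._countdown = self.latency

    def tick(self):
        """Advance one clock cycle; return (MFC, data)."""
        self._countdown -= 1
        if self._countdown <= 0:
            return True, self.cells[self._addr]
        return False, None

def move_indirect_load(regs, mem):
    cycles = 0
    # Step 1: R1out, MARin, Read
    regs["MAR"] = regs["R1"]
    mem.start_read(regs["MAR"])
    cycles += 1
    # Step 2: MDRinE, WMFC (the processor stays in this step until MFC arrives)
    while True:
        cycles += 1
        mfc, data = mem.tick()
        if mfc:
            regs["MDR"] = data
            break
    # Step 3: MDRout, R2in
    regs["R2"] = regs["MDR"]
    cycles += 1
    return cycles

regs = {"R1": 100, "R2": 0, "MAR": 0, "MDR": 0}
slow_memory = Memory({100: 77}, latency=3)
cycles_used = move_indirect_load(regs, slow_memory)
```

With a latency of one cycle the sequence takes three clock cycles; a slower memory only lengthens step 2.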


4. Storing a Word in Memory
 The address is loaded into MAR.
 The data to be written are loaded into MDR.
 The Write command is issued.
 Example: Move R2, (R1)
1. R1out, MARin
2. R2out, MDRin, Write
3. MDRoutE, WMFC
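A matching sketch for the store sequence follows. It assumes memory is a plain dictionary and does not model the wait for MFC; the function name is hypothetical.

```python
def move_store(regs, memory):
    """Control steps for Move R2,(R1); `memory` is a plain dict (an assumption)."""
    # Step 1: R1out, MARin
    regs["MAR"] = regs["R1"]
    # Step 2: R2out, MDRin, Write
    regs["MDR"] = regs["R2"]
    # Step 3: MDRoutE, WMFC (the wait itself is not modelled here)
    memory[regs["MAR"]] = regs["MDR"]
    return memory

store_regs = {"R1": 200, "R2": 55, "MAR": 0, "MDR": 0}
store_mem = move_store(store_regs, {200: 0})
```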
Execution of a Complete Instruction
 Add (R3), R1
 Fetch the instruction
 Fetch the first operand (the contents of the
memory location pointed to by R3)
 Perform the addition
 Load the result into R1

Execution of a Complete Instruction
Add (R3), R1

Step  Action
1     PCout, MARin, Read, Select4, Add, Zin
2     Zout, PCin, Yin, WMFC
3     MDRout, IRin
4     R3out, MARin, Read
5     R1out, Yin, WMFC
6     MDRout, SelectY, Add, Zin
7     Zout, R1in, End

Figure 7.6. Control sequence for execution of the instruction Add (R3),R1.
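The seven-step control sequence for Add (R3), R1 can be checked with a small interpreter for control signals. This is a simplified sketch: the function `run_steps`, the register dictionary and the one-step memory response are assumptions, and the fetched instruction word is stored in IR without being decoded.

```python
# Interpreter for single-bus control steps (a sketch; signal handling is simplified).
# Assumptions: memory answers within the WMFC step, Select4/SelectY pick ALU
# input A, and the instruction word itself is not decoded.
def run_steps(steps, regs, memory):
    regs = dict(regs)
    pending = None                          # address of a Read in flight
    for signals in steps:
        sig = set(signals)
        # 1) At most one register drives the bus.
        outs = [s[:-3] for s in sig if s.endswith("out")]
        assert len(outs) <= 1, "bus conflict"
        bus = regs[outs[0]] if outs else None
        # 2) ALU: input A from the MUX, input B from the bus.
        z = None
        if "Add" in sig:
            a = 4 if "Select4" in sig else regs["Y"]
            z = a + bus
        # 3) Latch every register whose 'in' signal is set.
        for s in sig:
            if s.endswith("in") and s != "Zin":
                regs[s[:-2]] = bus
        if "Zin" in sig:
            regs["Z"] = z
        # 4) Memory interface.
        if "Read" in sig:
            pending = regs["MAR"]
        if "WMFC" in sig:
            regs["MDR"] = memory[pending]
    return regs

ADD_R3_INDIRECT_R1 = [
    ["PCout", "MARin", "Read", "Select4", "Add", "Zin"],
    ["Zout", "PCin", "Yin", "WMFC"],
    ["MDRout", "IRin"],
    ["R3out", "MARin", "Read"],
    ["R1out", "Yin", "WMFC"],
    ["MDRout", "SelectY", "Add", "Zin"],
    ["Zout", "R1in", "End"],
]
regs0 = {"PC": 0, "MAR": 0, "MDR": 0, "IR": 0, "Y": 0, "Z": 0, "R1": 10, "R3": 200}
mem0 = {0: "Add (R3),R1", 200: 32}
final = run_steps(ADD_R3_INDIRECT_R1, regs0, mem0)
```

Steps 1 to 3 are the fetch phase (they also leave PC incremented by 4); steps 4 to 7 fetch the operand addressed by R3, add it to R1 and write the sum back to R1.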


Execution of Branch Instructions
 A branch instruction replaces the contents of the
PC with the branch target address, which is
usually obtained by adding an offset X, given
in the branch instruction, to the updated value of the PC.
 The offset X is usually the difference between
the branch target address and the address
immediately following the branch instruction.
 Unconditional branch
Execution of Branch Instructions

Step  Action
1     PCout, MARin, Read, Select4, Add, Zin
2     Zout, PCin, Yin, WMFC
3     MDRout, IRin
4     Offset-field-of-IRout, Add, Zin
5     Zout, PCin, End

Figure 7.7. Control sequence for an unconditional branch instruction.
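The relation between the offset X and the branch target can be written out as a short calculation. The addresses used here are made-up values, and the helper names are hypothetical; the 4-byte instruction size matches the PC increment used in these slides.

```python
INSTRUCTION_SIZE = 4  # bytes; matches the PC increment used in the text

def branch_offset(branch_addr, target_addr):
    """Offset X stored in the branch instruction."""
    return target_addr - (branch_addr + INSTRUCTION_SIZE)

def branch_target(branch_addr, offset):
    """Mirror of the control sequence: the PC is incremented first, then X is added."""
    updated_pc = branch_addr + INSTRUCTION_SIZE   # steps 1 and 2
    return updated_pc + offset                    # steps 4 and 5

x = branch_offset(branch_addr=1000, target_addr=1048)
target = branch_target(1000, x)
```

A backward branch simply has a negative offset, for example from address 1000 to address 980.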


Multiple-Bus Organization
• Allow the contents of two different registers to be accessed
simultaneously and have their contents placed on buses A and B.
• Allow the data on bus C to be loaded into a third register during
the same clock cycle.
• Incrementer unit.
• ALU simply passes one of its two input operands unmodified to bus C.
  Control signal: R=A or R=B
[Figure 7.8. Three-bus organization of the datapath. Labelled blocks: Bus A, Bus B, Bus C, PC with incrementer, register file, constant 4 and MUX at ALU input A, ALU (inputs A and B, output R), instruction decoder, IR, MDR, MAR, memory-bus data lines and address lines.]



 The general-purpose registers are combined into
a single block called the register file.
 The register file has 3 ports. Its 2 output ports allow the contents of
two different registers to be accessed simultaneously and placed on buses A
and B.
 The third port allows the data on bus C to be loaded into a third register
during the same clock cycle.
 Buses A and B are used to transfer the source
operands to the A and B inputs of the ALU.
 The ALU operation is performed.
 The result is transferred to the destination
over bus C.
 The ALU may simply pass one of its 2 input operands
unmodified to bus C.
 The ALU control signals for such an operation are R=A
or R=B.
 The incrementer unit is used to increment the PC by 4.
 Using the incrementer eliminates the need to add
the constant value 4 to the PC using the main ALU.
 The source for the constant 4 at the ALU input
multiplexer can still be used to increment other addresses,
such as those in LoadMultiple and StoreMultiple instructions.
Multiple-Bus Organization
 Add R4, R5, R6

Step  Action
1     PCout, R=B, MARin, Read, IncPC
2     WMFC
3     MDRoutB, R=B, IRin
4     R4outA, R5outB, SelectA, Add, R6in, End

Figure 7.9. Control sequence for the instruction Add R4,R5,R6
for the three-bus organization in Figure 7.8.
 Step 1: The contents of the PC are passed
through the ALU using the R=B control signal and
loaded into MAR to start a memory read
operation. At the same time the PC is incremented by 4.
 Step 2: The processor waits for MFC.
 Step 3: The data received are loaded into
MDR and then transferred to IR.
 Step 4: The execution phase of the
instruction requires only one control step to
complete.
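The four steps can be traced in the same style as the single-bus sequences. This is a sketch with a hypothetical function name; memory latency is not modelled and the fetched instruction word is not decoded.

```python
def add_three_bus(regs, memory):
    """Four control steps of Add R4,R5,R6 on a three-bus datapath (sketch)."""
    regs = dict(regs)
    # Step 1: PCout, R=B, MARin, Read, IncPC
    bus_b = regs["PC"]
    bus_c = bus_b                   # R=B: the ALU passes input B unchanged to bus C
    regs["MAR"] = bus_c
    regs["PC"] += 4                 # done by the incrementer, not by the ALU
    # Step 2: WMFC (memory latency is not modelled)
    regs["MDR"] = memory[regs["MAR"]]
    # Step 3: MDRoutB, R=B, IRin
    regs["IR"] = regs["MDR"]
    # Step 4: R4outA, R5outB, SelectA, Add, R6in, End
    bus_a, bus_b = regs["R4"], regs["R5"]
    regs["R6"] = bus_a + bus_b      # the result travels over bus C into R6
    return regs

start = {"PC": 0, "MAR": 0, "MDR": 0, "IR": 0, "R4": 3, "R5": 9, "R6": 0}
done = add_three_bus(start, {0: "Add R4,R5,R6"})
```

Because two source registers and one destination can be active in the same step, the add itself needs a single step, and no Y or Z staging registers are involved.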
Pipelining
 The processing of an instruction is divided into 4 steps.
Pipelined Performance
 One of the units may fail to complete its task in the allotted clock cycle.
 Some operations may require more than one clock cycle, e.g. Divide.
 As a result, steps D4 and D5 must be postponed.


 In this example the pipeline is stalled for 2 clock cycles.
 A condition that causes the pipeline to stall is called a hazard.
 1. Data hazard: the data needed by an instruction are not yet available.
 2. Control hazard / instruction hazard: a delay in the
availability of an instruction (for example, a miss in the cache).
Alternative Representation
 Idle periods in the pipeline are called stalls or bubbles.
 3. Structural hazard: two instructions require the use of the same
hardware resource at the same time (for example, access to memory).
Eg: If data and instructions are in the same cache memory, an instruction
fetch and a data access cannot proceed in the same cycle.
 Avoided by providing sufficient hardware resources.
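The cost of a stall can be estimated with a simple cycle count. The sketch assumes an idealized pipeline in which every stage takes exactly one clock cycle and each stall cycle delays completion by one cycle; the function names are hypothetical.

```python
def total_cycles(n_instructions, n_stages=4, stall_cycles=0):
    """Cycles to finish n instructions in an ideal k-stage pipeline, plus stalls."""
    return n_stages + (n_instructions - 1) + stall_cycles

def speedup(n_instructions, n_stages=4, stall_cycles=0):
    """Ratio of unpipelined time (k cycles per instruction) to pipelined time."""
    sequential = n_instructions * n_stages
    return sequential / total_cycles(n_instructions, n_stages, stall_cycles)

ideal = total_cycles(5)                       # 4 stages, 5 instructions
stalled = total_cycles(5, stall_cycles=2)     # the same with a 2-cycle stall
```

For long instruction sequences without stalls, the speedup approaches the number of stages; every hazard pulls it below that limit.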
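Data Hazard
 The pipeline is stalled because the data to be operated on are delayed.
Eg 1: A = 5
I1: A ← 3 + A
I2: B ← 4 * A
If executed in parallel, this leads to the wrong result, because I2 needs
the value of A produced by I1.

Eg 2:
I1: A ← 5 * C
I2: B ← 20 + C
These can be executed in parallel, because neither instruction uses the
result of the other.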
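The first example can be checked numerically. The variable names below are illustrative; the "overlapped" case models I2 reading A before I1 has written its result.

```python
A = 5

# In order: I2 sees the value of A written by I1.
a_seq = 3 + A            # I1
b_seq = 4 * a_seq        # I2

# Overlapped without care: I2 reads A before I1 has written it.
a_par = 3 + A            # I1
b_par = 4 * A            # I2 uses the stale value of A
```

The two values of B differ, which is why the pipeline must stall (or forward the operand) when I2 depends on I1.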


Data Dependency
Example

I1: MUL R2,R3,R4
I2: ADD R5,R4,R6
I2 uses R4, the result of I1, as a source operand.
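A dependency of this kind can be detected by comparing register names. The sketch assumes the operand order source, source, destination (the same order as in Add R4,R5,R6 earlier); the helper names `parse` and `raw_dependency` are hypothetical.

```python
# Operand order assumed: source1, source2, destination.
def parse(instr):
    op, operands = instr.split(maxsplit=1)
    *sources, dest = [r.strip() for r in operands.split(",")]
    return op, sources, dest

def raw_dependency(first, second):
    """True if `second` reads the register that `first` writes."""
    _, _, dest = parse(first)
    _, sources, _ = parse(second)
    return dest in sources

dependent = raw_dependency("MUL R2,R3,R4", "ADD R5,R4,R6")
independent = raw_dependency("MUL R2,R3,R4", "ADD R5,R7,R6")
```

Hardware interlocks and operand forwarding logic perform essentially this comparison between the destination of one instruction and the sources of the next.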
Operand Forwarding
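Handling Data Hazards
Alternative approach:
Leave the detection and handling of data dependencies to software.
The compiler introduces the needed two-cycle delay by inserting NOP
(no-operation) instructions between the dependent instructions.
This may lead to a larger program size.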
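The compiler-side approach can be sketched as a pass over the instruction list. This is a simplification: it only checks each instruction against the one immediately following it, assumes the operand order source, source, destination, and uses hypothetical helper names.

```python
def operands(instr):
    parts = instr.split(maxsplit=1)
    return [r.strip() for r in parts[1].split(",")] if len(parts) > 1 else []

def writes(instr):
    ops = operands(instr)
    return ops[-1] if ops else None     # last operand is the destination

def reads(instr):
    return operands(instr)[:-1]         # all operands except the destination

def insert_nops(program, delay=2):
    """Insert `delay` NOPs after any instruction whose result the next one reads."""
    out = []
    for i, instr in enumerate(program):
        out.append(instr)
        nxt = program[i + 1] if i + 1 < len(program) else None
        if nxt is not None and writes(instr) in reads(nxt):
            out.extend(["NOP"] * delay)
    return out

padded = insert_nops(["MUL R2,R3,R4", "ADD R5,R4,R6"])
```

The padded program is two instructions longer, which illustrates the code-size cost of handling hazards in software.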
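Side Effects
 An instruction may change the contents of a register other than the one
named as its destination.
 Examples are the autoincrement and autodecrement addressing modes.
 When a location other than the one explicitly named as the destination
operand is affected, the instruction is said to have a side effect.
Eg:
1. Push and Pop
2. Add with carry, which can give rise to multiple data dependencies

Solution: side effects should be kept to a minimum.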


Instruction Hazard
Unconditional Branching
Two-stage pipeline
Eg: Instructions I1 to I3 are in successive memory locations and I2
is a branch.
The time lost as a result of a branch instruction is called the branch
penalty; here it is one clock cycle.
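The branch penalty can be estimated with a simple model. The sketch assumes that one instruction is fetched per cycle, that every stage takes one cycle, and that instructions fetched after a taken branch are discarded once the branch is resolved; the function names and the `resolve_stage` parameter are hypothetical.

```python
def branch_penalty(resolve_stage):
    """Cycles lost per taken branch when the target is known at the end of
    `resolve_stage` (1 = fetch) and one instruction is fetched per cycle."""
    return resolve_stage - 1

def cycles_with_branches(n_instructions, n_stages, n_taken_branches, resolve_stage):
    """Ideal pipeline time plus the penalty of each taken branch."""
    ideal = n_stages + n_instructions - 1
    return ideal + n_taken_branches * branch_penalty(resolve_stage)

# Two-stage pipeline (fetch, execute): the branch is resolved in stage 2.
penalty = branch_penalty(2)
# I1, the branch I2 and the branch target are executed; I3 is discarded.
cycles = cycles_with_branches(3, n_stages=2, n_taken_branches=1, resolve_stage=2)
```

Under the same assumptions, a longer pipeline that resolves branches in a later stage loses more cycles per taken branch.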
Instruction queue and prefetching
 A cache miss or a branch instruction causes a stall of
one or more clock cycles.
 To reduce the effect of such stalls, instructions are fetched before they
are needed and placed in a queue.
Branch Folding
Conditional Branching
 A conditional branch instruction depends on the result of a preceding instruction.
